Chapter 02-S02: Normal–Normal Conjugacy for One Mean

library(glmbayes)
if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules)

1. The model

1.1 Likelihood

Suppose we observe \(n\) measurements \(y_1, \dots, y_n\) drawn independently from a Normal distribution with unknown mean \(\mu\) and known standard deviation \(\sigma\):

\[ Y_i \mid \mu \;\sim\; \mathcal{N}(\mu,\, \sigma^2), \qquad i = 1,\dots,n. \]

The sufficient statistic is the sample mean \(\bar y\), which itself follows

\[ \bar{y} \mid \mu \;\sim\; \mathcal{N}\!\left(\mu,\, \frac{\sigma^2}{n}\right). \]

In practice \(\sigma\) is rarely truly known, but treating it as fixed provides the cleanest template for understanding posterior updating. The full treatment — unknown mean and unknown variance — is handled in Chapter 3 with lmb().

1.2 Prior

Since \(\mu\) can take any real value, the Normal distribution is the natural conjugate prior:

\[ \mu \;\sim\; \mathcal{N}(\mu_0,\, \tau_0^2). \]

Here \(\mu_0\) is your prior best guess and \(\tau_0\) encodes your uncertainty about it.

1.3 Posterior derivation

The key is to work with precision (reciprocal variance): \(\kappa = 1/\sigma^2\).

Symbol Formula Meaning
\(\kappa_0 = 1/\tau_0^2\) Prior precision Information in the prior
\(\kappa_L = n/\sigma^2\) Likelihood precision Information in the data

Multiplying the Normal likelihood by the Normal prior kernel and completing the square gives the conjugate posterior:

\[ \mu \mid \bar{y} \;\sim\; \mathcal{N}\!\left(\mu_n,\; \frac{1}{\kappa_n}\right), \]

where

\[ \kappa_n \;=\; \kappa_0 + \kappa_L \qquad\text{(precisions add)} \]

\[ \mu_n \;=\; \frac{\kappa_0\,\mu_0 + \kappa_L\,\bar{y}}{\kappa_n} \;=\; \frac{\kappa_0}{\kappa_n}\,\mu_0 \;+\; \frac{\kappa_L}{\kappa_n}\,\bar{y}. \]

The posterior mean \(\mu_n\) is a precision-weighted average of the prior mean and the data mean.

Intuition.

  • Precisions add because information is additive: each independent source of knowledge about \(\mu\) contributes its own precision.
  • A very diffuse prior (\(\tau_0 \to \infty\), \(\kappa_0 \to 0\)) gives \(\mu_n \approx \bar y\) — the data take over entirely.
  • A very small sample (\(n\) small) or noisy data (\(\sigma\) large) keeps \(\kappa_L\) small, so the prior retains more influence.
  • As \(n \to \infty\), \(\mu_n \to \bar y\) and the posterior variance \(1/\kappa_n \to 0\) — the Bayesian estimate converges to the classical MLE.

2. bayesrules illustration

The bayesrules package provides plot_normal_normal() and summarize_normal_normal() for quickly visualizing and summarizing the Normal–Normal update.

Scenario. A sleep researcher’s prior belief is that undergraduates average about 7 hours of sleep per night. They encode this as \(\mu_0 = 7\), \(\tau_0 = 1.5\) — fairly uncertain, allowing for means anywhere from 4 to 10 hours. The population standard deviation is assumed known at \(\sigma = 1.8\) hours (from large-scale studies). A sample of \(n = 22\) students yields a sample mean of \(\bar y = 6.1\) hours.

prior_mean <- 7;    prior_sd <- 1.5   ## prior: N(7, 1.5^2)
sigma      <- 1.8                     ## known population SD
y_bar      <- 6.1;  n_sleep  <- 22L   ## observed sample
library(bayesrules)
## Overlay prior, likelihood, and posterior
plot_normal_normal(
  mean       = prior_mean,
  sd         = prior_sd,
  sigma      = sigma,
  y_bar      = y_bar,
  n          = n_sleep,
  prior      = TRUE,
  likelihood = TRUE,
  posterior  = TRUE
)

## Tabular summary of prior, likelihood, and posterior parameters
summarize_normal_normal(
  mean  = prior_mean,
  sd    = prior_sd,
  sigma = sigma,
  y_bar = y_bar,
  n     = n_sleep
)
#>       model    mean    mode       var        sd
#> 1     prior 7.00000 7.00000 2.2500000 1.5000000
#> 2 posterior 6.15529 6.15529 0.1382253 0.3717866

Reading the output. The summarize_normal_normal() table reports the prior mean and SD, the likelihood (data) mean and SE, and the resulting posterior mean and SD. The plot overlays all three: the wide prior, the sharper likelihood centered at 6.1, and the posterior — narrower than both and sitting between them, pulled strongly toward the data because 22 observations outweigh the prior.

Analytic posterior.

prec_0 <- 1 / prior_sd^2
prec_L <- n_sleep / sigma^2
prec_n <- prec_0 + prec_L

post_mean <- (prec_0 * prior_mean + prec_L * y_bar) / prec_n
post_sd   <- sqrt(1 / prec_n)

nn_analytic <- data.frame(
  Example = "Sleep hours (BayesRules)",
  n = n_sleep,
  `y-bar` = y_bar,
  Posterior = sprintf("N(%.4f, %.4f^2)", post_mean, post_sd),
  Mean = post_mean,
  SD = post_sd,
  check.names = FALSE
)
knitr::kable(nn_analytic, digits = 4,
  caption = "Conjugate Normal--Normal posterior (known sigma)")
Conjugate Normal–Normal posterior (known sigma)
Example n y-bar Posterior Mean SD
Sleep hours (BayesRules) 22 6.1 N(6.1553, 0.3718^2) 6.1553 0.3718

3. Real data: Old Faithful waiting times

The faithful dataset (built into R) records waiting times between eruptions of the Old Faithful geyser in Yellowstone National Park. We treat the waiting time \(Y_i\) as approximately Normal with unknown mean \(\mu\) and use a Normal–Normal conjugate model.

Prior. Based on decades of visitor records, a geologist believes the mean waiting time is around 72 minutes, with considerable uncertainty (\(\tau_0 = 15\) minutes). We estimate the population SD from the data and treat it as fixed at \(\hat\sigma\).

y_faith <- faithful$waiting
n_faith <- length(y_faith)

ybar_faith  <- mean(y_faith)
sigma_faith <- sd(y_faith)           ## treat as known

prior_mean_f <- 72;  prior_sd_f <- 15   ## N(72, 15^2) prior

cat(sprintf("n = %d,  y-bar = %.2f,  sigma = %.2f\n",
            n_faith, ybar_faith, sigma_faith))
#> n = 272,  y-bar = 70.90,  sigma = 13.59

Analytic posterior.

prec_0f <- 1 / prior_sd_f^2
prec_Lf <- n_faith / sigma_faith^2
prec_nf <- prec_0f + prec_Lf

post_mean_f <- (prec_0f * prior_mean_f + prec_Lf * ybar_faith) / prec_nf
post_sd_f   <- sqrt(1 / prec_nf)

faith_analytic <- data.frame(
  Dataset = "Old Faithful waiting",
  n = n_faith,
  `y-bar` = ybar_faith,
  Posterior = sprintf("N(%.4f, %.4f^2)", post_mean_f, post_sd_f),
  Mean = post_mean_f,
  SD = post_sd_f,
  check.names = FALSE
)
knitr::kable(faith_analytic, digits = 4,
  caption = "Conjugate Normal--Normal posterior (known sigma)")
Conjugate Normal–Normal posterior (known sigma)
Dataset n y-bar Posterior Mean SD
Old Faithful waiting 272 70.8971 N(70.9004, 0.8231^2) 70.9004 0.8231

Reading the output. With \(n = 272\) observations, the likelihood precision dominates the prior, so the posterior mean is almost identical to \(\bar y\). The posterior SD is small — the data overwhelm the prior.

3.1 Fitting with lmb()

In glmbayes, the intercept-only model uses lmb() with dNormal() and fixed dispersion \(\sigma^2\):

df_faith <- data.frame(y = y_faith)

mu_f    <- matrix(prior_mean_f, nrow = 1, ncol = 1,
                  dimnames = list(NULL, "(Intercept)"))
Sigma_f <- matrix(prior_sd_f^2, nrow = 1, ncol = 1,
                  dimnames = list("(Intercept)", "(Intercept)"))

pf_faith <- dNormal(mu = mu_f, Sigma = Sigma_f,
                    dispersion = sigma_faith^2)

set.seed(2026)
fit_faith <- lmb(
  n       = 20000,
  y ~ 1,
  data    = df_faith,
  pfamily = pf_faith
)
print(fit_faith)
#> 
#> Call:  
#> lmb(formula = y ~ 1, pfamily = structure(list(pfamily = "dNormal", 
#>     prior_list = structure(list(mu = structure(72, dim = c(1L, 
#>     1L), dimnames = list(NULL, "(Intercept)")), Sigma = structure(225, dim = c(1L, 
#>     1L), dimnames = list("(Intercept)", "(Intercept)")), dispersion = 184.823312350771, 
#>         ddef = FALSE), "`Prior Type`" = "dNormal"), okfamilies = c("gaussian", 
#>     "poisson", "binomial", "quasipoisson", "quasibinomial", "Gamma"
#>     ), plinks = function (family) 
#>     {
#>         if (family$family == "gaussian") 
#>             oklinks <- c("identity")
#>         if (family$family == "poisson" || family$family == "quasipoisson") 
#>             oklinks <- c("log")
#>         if (family$family == "binomial" || family$family == "quasibinomial") 
#>             oklinks <- c("logit", "probit", "cloglog")
#>         if (family$family == "Gamma") 
#>             oklinks <- c("log")
#>         return(oklinks)
#>     }, simfun = function (n, y, x, prior_list, offset = NULL, 
#>         weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, 
#>         use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE, 
#>         progbar = FALSE) 
#>     {
#>         offset2 = offset
#>         wt = weights
#>         if (missing(prior_list)) 
#>             stop("Prior Specification Missing")
#>         if (!missing(prior_list)) {
#>             if (!is.null(prior_list$mu)) 
#>                 mu = prior_list$mu
#>             if (!is.null(prior_list$Sigma)) 
#>                 Sigma = prior_list$Sigma
#>             if (!is.null(prior_list$P)) 
#>                 P = prior_list$P
#>             if (is.null(prior_list$P)) {
#>                 R <- chol(prior_list$Sigma)
#>                 Pinv <- chol2inv(R)
#>                 P <- 0.5 * (Pinv + t(Pinv))
#>             }
#>             if (!is.null(prior_list$dispersion)) 
#>                 dispersion = prior_list$dispersion
#>             else dispersion = NULL
#>             if (!is.null(prior_list$shape)) 
#>                 shape = prior_list$shape
#>             else shape = NULL
#>             if (!is.null(prior_list$rate)) 
#>                 rate = prior_list$rate
#>             else rate = NULL
#>         }
#>         if (is.numeric(n) == FALSE || is.numeric(y) == FALSE || 
#>             is.numeric(x) == FALSE || is.numeric(mu) == FALSE || 
#>             is.numeric(P) == FALSE) 
#>             stop("non-numeric argument to numeric function")
#>         if (is.null(n_envopt)) 
#>             n_envopt <- n
#>         n_envopt <- as.integer(n_envopt)
#>         x <- as.matrix(x)
#>         mu <- as.matrix(as.vector(mu))
#>         P <- as.matrix(P)
#>         start <- mu
#>         if (length(n) > 1) 
#>             n <- length(n)
#>         nobs <- NROW(y)
#>         nvars <- ncol(x)
#>         if (is.null(offset2)) 
#>             offset2 = rep(0, nobs)
#>         nvars2 <- length(mu)
#>         if (!nvars == nvars2) 
#>             stop("incompatible dimensions")
#>         if (!all(dim(P) == c(nvars2, nvars2))) 
#>             stop("incompatible dimensions")
#>         if (!isSymmetric(P)) 
#>             stop("matrix P must be symmetric")
#>         if (length(wt) == 1) 
#>             wt = rep(wt, nobs)
#>         nobs2 = NROW(wt)
#>         nobs3 = NROW(x)
#>         nobs4 = NROW(offset2)
#>         if (nobs2 != nobs) 
#>             stop("weighting vector must have same number of elements as y")
#>         if (nobs3 != nobs) 
#>             stop("matrix X must have same number of rows as y")
#>         if (nobs4 != nobs) 
#>             stop("offset vector must have same number of rows as y")
#>         tol <- 1e-06
#>         eS <- eigen(P, symmetric = TRUE, only.values = FALSE)
#>         ev <- eS$values
#>         if (!all(ev >= -tol * abs(ev[1L]))) 
#>             stop("'P' is not positive definite")
#>         if (is.character(family)) 
#>             family <- get(family, mode = "function", envir = parent.frame())
#>         if (is.function(family)) 
#>             family <- family()
#>         if (is.null(family$family)) {
#>             print(family)
#>             stop("'family' not recognized")
#>         }
#>         okfamilies <- c("gaussian", "poisson", "binomial", "quasipoisson", 
#>             "quasibinomial", "Gamma")
#>         if (family$family %in% okfamilies) {
#>             if (family$family == "gaussian") 
#>                 oklinks <- c("identity")
#>             if (family$family == "poisson" || family$family == 
#>                 "quasipoisson") 
#>                 oklinks <- c("log")
#>             if (family$family == "binomial" || family$family == 
#>                 "quasibinomial") 
#>                 oklinks <- c("logit", "probit", "cloglog")
#>             if (family$family == "Gamma") 
#>                 oklinks <- c("log")
#>             if (family$link %in% oklinks) {
#>                 famfunc <- glmbfamfunc(family)
#>                 f1 <- famfunc$f1
#>                 f2 <- famfunc$f2
#>                 f3 <- famfunc$f3
#>             }
#>             else {
#>                 stop(gettextf("link \"%s\" not available for selected family; available links are %s", 
#>                   family$link, paste(sQuote(oklinks), collapse = ", ")), 
#>                   domain = NA)
#>             }
#>         }
#>         else {
#>             stop(gettextf("family \"%s\" not available in glmb; available families are %s", 
#>                 family$family, paste(sQuote(okfamilies), collapse = ", ")), 
#>                 domain = NA)
#>         }
#>         if ("ddef" %in% names(prior_list)) {
#>             ddef <- prior_list$ddef
#>         }
#>         else {
#>             ddef <- is.null(prior_list$dispersion)
#>         }
#>         if (family$family %in% c("gaussian", "Gamma") && isTRUE(ddef)) {
#>             stop(paste0("For gaussian() and Gamma() models, dNormal() requires an explicit dispersion ", 
#>                 "(e.g. dispersion = ps$dispersion from Prior_Setup()). ", 
#>                 "Omitted or NULL dispersion is not allowed"))
#>         }
#>         if (family$family == "gaussian") {
#>             outlist <- .rNormalReg_cpp(n = n, y = y, x = x, mu = mu, 
#>                 P = P, offset = offset2, wt = wt, dispersion = dispersion, 
#>                 f2 = f2, f3 = f3, start = mu)
#>             class(outlist$fit) = "lm"
#>         }
#>         else {
#>             if (is.null(dispersion)) {
#>                 dispersion2 = 1
#>             }
#>             else {
#>                 dispersion2 = dispersion
#>             }
#>             outlist <- .rNormalGLM_cpp(n = n, y = y, x = x, mu = mu, 
#>                 P = P, offset = offset2, wt = wt, dispersion = dispersion2, 
#>                 f2 = f2, f3 = f3, start = start, family = family$family, 
#>                 link = family$link, Gridtype = Gridtype, n_envopt = n_envopt, 
#>                 use_parallel = use_parallel, use_opencl = use_opencl, 
#>                 verbose = verbose)
#>             betastar = outlist$coef.mode
#>             x = outlist$x
#>             y = outlist$y
#>             weights = outlist$prior.weights
#>             if (family$family == "quasipoisson" || family$family == 
#>                 "quasibinomial") {
#>                 linkinv <- family$linkinv
#>                 disp_temp = rep(0, n)
#>                 m = length(y)
#>                 k = ncol(x)
#>                 res_temp = matrix(0, nrow = n, ncol = m)
#>                 fit_temp = x %*% t(outlist$coefficients)
#>                 for (l in 1:n) {
#>                   fit_temp[1:m, l] = linkinv(offset2 + fit_temp[1:m, 
#>                     l])
#>                   res_temp[l, 1:m] = (y - fit_temp[1:m, l])
#>                   disp_temp[l] = (1/(m - k)) * sum(res_temp[l, 
#>                     1:m]^2 * wt/fit_temp[1:m, l])
#>                 }
#>                 outlist <- .rNormalGLM_cpp(n = n, y = y, x = x, 
#>                   mu = mu, P = P, offset = offset2, wt = wt, 
#>                   dispersion = mean(disp_temp), f2 = f2, f3 = f3, 
#>                   start = start, family = family$family, link = family$link, 
#>                   Gridtype = Gridtype, n_envopt = n_envopt, use_parallel = use_parallel, 
#>                   use_opencl = use_opencl, verbose = verbose)
#>                 outlist$call <- match.call()
#>                 outlist$dispersion = mean(disp_temp)
#>             }
#>             outlist$fit = glmb.wfit(x, y, weights, offset = offset2, 
#>                 family = family, Bbar = mu, P, betastar)
#>         }
#>         colnames(outlist$coefficients) <- colnames(x)
#>         pl_disp <- if (!is.null(dispersion)) 
#>             dispersion
#>         else outlist$dispersion
#>         pfamily_obj <- list(pfamily = "dNormal", prior_list = list(mu = as.numeric(mu), 
#>             Sigma = Sigma, dispersion = pl_disp))
#>         attr(pfamily_obj, "Prior Type") <- "dNormal"
#>         class(pfamily_obj) <- "pfamily"
#>         outlist$pfamily <- pfamily_obj
#>         rglmb_df = as.data.frame(cbind(y, x))
#>         rglmb_f = DF2formula(rglmb_df)
#>         rglmb_mf = model.frame(rglmb_f, rglmb_df)
#>         outlist$family = family
#>         outlist$famfunc = famfunc
#>         outlist$call <- match.call()
#>         outlist$offset2 <- offset2
#>         outlist$formula <- rglmb_f
#>         outlist$model <- rglmb_mf
#>         outlist$data <- rglmb_df
#>         class(outlist) <- c(outlist$class, c("rglmb", "glmb", 
#>             "glm", "lm"))
#>         outlist
#>     }, call = dNormal(mu = mu_f, Sigma = Sigma_f, dispersion = sigma_faith^2)), "`Prior Type`" = "dNormal", class = "pfamily"), 
#>     n = 20000, data = structure(list(y = c(79, 54, 74, 62, 85, 
#>     55, 88, 85, 51, 85, 54, 84, 78, 47, 83, 52, 62, 84, 52, 79, 
#>     51, 47, 78, 69, 74, 83, 55, 76, 78, 79, 73, 77, 66, 80, 74, 
#>     52, 48, 80, 59, 90, 80, 58, 84, 58, 73, 83, 64, 53, 82, 59, 
#>     75, 90, 54, 80, 54, 83, 71, 64, 77, 81, 59, 84, 48, 82, 60, 
#>     92, 78, 78, 65, 73, 82, 56, 79, 71, 62, 76, 60, 78, 76, 83, 
#>     75, 82, 70, 65, 73, 88, 76, 80, 48, 86, 60, 90, 50, 78, 63, 
#>     72, 84, 75, 51, 82, 62, 88, 49, 83, 81, 47, 84, 52, 86, 81, 
#>     75, 59, 89, 79, 59, 81, 50, 85, 59, 87, 53, 69, 77, 56, 88, 
#>     81, 45, 82, 55, 90, 45, 83, 56, 89, 46, 82, 51, 86, 53, 79, 
#>     81, 60, 82, 77, 76, 59, 80, 49, 96, 53, 77, 77, 65, 81, 71, 
#>     70, 81, 93, 53, 89, 45, 86, 58, 78, 66, 76, 63, 88, 52, 93, 
#>     49, 57, 77, 68, 81, 81, 73, 50, 85, 74, 55, 77, 83, 83, 51, 
#>     78, 84, 46, 83, 55, 81, 57, 76, 84, 77, 81, 87, 77, 51, 78, 
#>     60, 82, 91, 53, 78, 46, 77, 84, 49, 83, 71, 80, 49, 75, 64, 
#>     76, 53, 94, 55, 76, 50, 82, 54, 75, 78, 79, 78, 78, 70, 79, 
#>     70, 54, 86, 50, 90, 54, 54, 77, 79, 64, 75, 47, 86, 63, 85, 
#>     82, 57, 82, 67, 74, 54, 83, 73, 73, 88, 80, 71, 83, 56, 79, 
#>     78, 84, 58, 83, 43, 60, 75, 81, 46, 90, 46, 74)), class = "data.frame", row.names = c(NA, 
#>     -272L)), method = "qr", model = TRUE, x = TRUE, y = TRUE, 
#>     qr = TRUE, singular.ok = TRUE, contrasts = NULL, Gridtype = 2, 
#>     n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, 
#>     verbose = FALSE)
#> 
#> Posterior Mean Coefficients:
#> (Intercept)  
#>       70.91
faith_compare <- data.frame(
  Dataset = "Old Faithful waiting",
  Posterior = faith_analytic$Posterior,
  `Analytic Mean` = faith_analytic$Mean,
  `Analytic SD`   = faith_analytic$SD,
  `lmb Post.Mean` = fit_faith$coef.means["(Intercept)"],
  `lmb Post.Sd`   = sd(fit_faith$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(faith_compare, digits = 4,
  caption = "Analytic vs. lmb() posterior mean and SD")
Analytic vs. lmb() posterior mean and SD
Dataset Posterior Analytic Mean Analytic SD lmb Post.Mean lmb Post.Sd
(Intercept) Old Faithful waiting N(70.9004, 0.8231^2) 70.9004 0.8231 70.9091 0.8261

The lmb Post.Mean and lmb Post.Sd columns should match the analytic Mean and SD to Monte Carlo error.


4. Connection to glmbayes

This scalar case is the template for Gaussian linear models with lmb() (Chapter 3). In regression, the single unknown \(\mu\) is replaced by a coefficient vector \(\beta \in \mathbb{R}^p\), the scalar prior variance \(\tau_0^2\) becomes a \(p \times p\) prior covariance matrix \(\Sigma_0\), and the precision-weighted average becomes the matrix formula \(\mu_n = (\Sigma_0^{-1} + X^\top X / \sigma^2)^{-1}(\Sigma_0^{-1}\mu_0 + X^\top y / \sigma^2)\) — but the logic is identical.


See also

  • Chapter-02-S01 — introduction to Bayesian inference and conjugacy.
  • Chapter-02-S03 — Beta–Binomial conjugacy for one proportion.
  • Chapter-02-S04 — Gamma–Poisson conjugacy for one count rate.
  • Chapter-02-S05 — Gamma–Gamma conjugacy for a Gamma response rate.
  • Chapter 03lmb() for multivariate Gaussian linear models.