Suppose we observe \(n\) measurements \(y_1, \dots, y_n\) drawn independently from a Normal distribution with unknown mean \(\mu\) and known standard deviation \(\sigma\):
\[ Y_i \mid \mu \;\sim\; \mathcal{N}(\mu,\, \sigma^2), \qquad i = 1,\dots,n. \]
The sufficient statistic is the sample mean \(\bar y\), which itself follows
\[ \bar{y} \mid \mu \;\sim\; \mathcal{N}\!\left(\mu,\, \frac{\sigma^2}{n}\right). \]
In practice \(\sigma\) is rarely
truly known, but treating it as fixed provides the cleanest template for
understanding posterior updating. The full treatment — unknown mean
and unknown variance — is handled in Chapter 3 with
lmb().
Since \(\mu\) can take any real value, the Normal distribution is the natural conjugate prior:
\[ \mu \;\sim\; \mathcal{N}(\mu_0,\, \tau_0^2). \]
Here \(\mu_0\) is your prior best guess and \(\tau_0\) encodes your uncertainty about it.
The key is to work with precision (reciprocal variance): \(\kappa = 1/\sigma^2\).
| Symbol | Formula | Meaning |
|---|---|---|
| \(\kappa_0 = 1/\tau_0^2\) | Prior precision | Information in the prior |
| \(\kappa_L = n/\sigma^2\) | Likelihood precision | Information in the data |
Multiplying the Normal likelihood by the Normal prior kernel and completing the square gives the conjugate posterior:
\[ \mu \mid \bar{y} \;\sim\; \mathcal{N}\!\left(\mu_n,\; \frac{1}{\kappa_n}\right), \]
where
\[ \kappa_n \;=\; \kappa_0 + \kappa_L \qquad\text{(precisions add)} \]
\[ \mu_n \;=\; \frac{\kappa_0\,\mu_0 + \kappa_L\,\bar{y}}{\kappa_n} \;=\; \frac{\kappa_0}{\kappa_n}\,\mu_0 \;+\; \frac{\kappa_L}{\kappa_n}\,\bar{y}. \]
The posterior mean \(\mu_n\) is a precision-weighted average of the prior mean and the data mean.
Intuition.
bayesrules illustrationThe bayesrules package provides
plot_normal_normal() and
summarize_normal_normal() for quickly visualizing and
summarizing the Normal–Normal update.
Scenario. A sleep researcher’s prior belief is that undergraduates average about 7 hours of sleep per night. They encode this as \(\mu_0 = 7\), \(\tau_0 = 1.5\) — fairly uncertain, allowing for means anywhere from 4 to 10 hours. The population standard deviation is assumed known at \(\sigma = 1.8\) hours (from large-scale studies). A sample of \(n = 22\) students yields a sample mean of \(\bar y = 6.1\) hours.
prior_mean <- 7; prior_sd <- 1.5 ## prior: N(7, 1.5^2)
sigma <- 1.8 ## known population SD
y_bar <- 6.1; n_sleep <- 22L ## observed samplelibrary(bayesrules)
## Overlay prior, likelihood, and posterior
plot_normal_normal(
mean = prior_mean,
sd = prior_sd,
sigma = sigma,
y_bar = y_bar,
n = n_sleep,
prior = TRUE,
likelihood = TRUE,
posterior = TRUE
)## Tabular summary of prior, likelihood, and posterior parameters
summarize_normal_normal(
mean = prior_mean,
sd = prior_sd,
sigma = sigma,
y_bar = y_bar,
n = n_sleep
)
#> model mean mode var sd
#> 1 prior 7.00000 7.00000 2.2500000 1.5000000
#> 2 posterior 6.15529 6.15529 0.1382253 0.3717866Reading the output. The
summarize_normal_normal() table reports the prior mean and
SD, the likelihood (data) mean and SE, and the resulting posterior mean
and SD. The plot overlays all three: the wide prior, the sharper
likelihood centered at 6.1, and the posterior — narrower than both and
sitting between them, pulled strongly toward the data because 22
observations outweigh the prior.
Analytic posterior.
prec_0 <- 1 / prior_sd^2
prec_L <- n_sleep / sigma^2
prec_n <- prec_0 + prec_L
post_mean <- (prec_0 * prior_mean + prec_L * y_bar) / prec_n
post_sd <- sqrt(1 / prec_n)
nn_analytic <- data.frame(
Example = "Sleep hours (BayesRules)",
n = n_sleep,
`y-bar` = y_bar,
Posterior = sprintf("N(%.4f, %.4f^2)", post_mean, post_sd),
Mean = post_mean,
SD = post_sd,
check.names = FALSE
)
knitr::kable(nn_analytic, digits = 4,
caption = "Conjugate Normal--Normal posterior (known sigma)")| Example | n | y-bar | Posterior | Mean | SD |
|---|---|---|---|---|---|
| Sleep hours (BayesRules) | 22 | 6.1 | N(6.1553, 0.3718^2) | 6.1553 | 0.3718 |
The faithful dataset (built into R) records waiting
times between eruptions of the Old Faithful geyser in Yellowstone
National Park. We treat the waiting time \(Y_i\) as approximately Normal with unknown
mean \(\mu\) and use a Normal–Normal
conjugate model.
Prior. Based on decades of visitor records, a geologist believes the mean waiting time is around 72 minutes, with considerable uncertainty (\(\tau_0 = 15\) minutes). We estimate the population SD from the data and treat it as fixed at \(\hat\sigma\).
y_faith <- faithful$waiting
n_faith <- length(y_faith)
ybar_faith <- mean(y_faith)
sigma_faith <- sd(y_faith) ## treat as known
prior_mean_f <- 72; prior_sd_f <- 15 ## N(72, 15^2) prior
cat(sprintf("n = %d, y-bar = %.2f, sigma = %.2f\n",
n_faith, ybar_faith, sigma_faith))
#> n = 272, y-bar = 70.90, sigma = 13.59Analytic posterior.
prec_0f <- 1 / prior_sd_f^2
prec_Lf <- n_faith / sigma_faith^2
prec_nf <- prec_0f + prec_Lf
post_mean_f <- (prec_0f * prior_mean_f + prec_Lf * ybar_faith) / prec_nf
post_sd_f <- sqrt(1 / prec_nf)
faith_analytic <- data.frame(
Dataset = "Old Faithful waiting",
n = n_faith,
`y-bar` = ybar_faith,
Posterior = sprintf("N(%.4f, %.4f^2)", post_mean_f, post_sd_f),
Mean = post_mean_f,
SD = post_sd_f,
check.names = FALSE
)
knitr::kable(faith_analytic, digits = 4,
caption = "Conjugate Normal--Normal posterior (known sigma)")| Dataset | n | y-bar | Posterior | Mean | SD |
|---|---|---|---|---|---|
| Old Faithful waiting | 272 | 70.8971 | N(70.9004, 0.8231^2) | 70.9004 | 0.8231 |
Reading the output. With \(n = 272\) observations, the likelihood precision dominates the prior, so the posterior mean is almost identical to \(\bar y\). The posterior SD is small — the data overwhelm the prior.
lmb()In glmbayes, the intercept-only model uses
lmb() with dNormal() and fixed dispersion
\(\sigma^2\):
df_faith <- data.frame(y = y_faith)
mu_f <- matrix(prior_mean_f, nrow = 1, ncol = 1,
dimnames = list(NULL, "(Intercept)"))
Sigma_f <- matrix(prior_sd_f^2, nrow = 1, ncol = 1,
dimnames = list("(Intercept)", "(Intercept)"))
pf_faith <- dNormal(mu = mu_f, Sigma = Sigma_f,
dispersion = sigma_faith^2)
set.seed(2026)
fit_faith <- lmb(
n = 20000,
y ~ 1,
data = df_faith,
pfamily = pf_faith
)
print(fit_faith)
#>
#> Call:
#> lmb(formula = y ~ 1, pfamily = structure(list(pfamily = "dNormal",
#> prior_list = structure(list(mu = structure(72, dim = c(1L,
#> 1L), dimnames = list(NULL, "(Intercept)")), Sigma = structure(225, dim = c(1L,
#> 1L), dimnames = list("(Intercept)", "(Intercept)")), dispersion = 184.823312350771,
#> ddef = FALSE), "`Prior Type`" = "dNormal"), okfamilies = c("gaussian",
#> "poisson", "binomial", "quasipoisson", "quasibinomial", "Gamma"
#> ), plinks = function (family)
#> {
#> if (family$family == "gaussian")
#> oklinks <- c("identity")
#> if (family$family == "poisson" || family$family == "quasipoisson")
#> oklinks <- c("log")
#> if (family$family == "binomial" || family$family == "quasibinomial")
#> oklinks <- c("logit", "probit", "cloglog")
#> if (family$family == "Gamma")
#> oklinks <- c("log")
#> return(oklinks)
#> }, simfun = function (n, y, x, prior_list, offset = NULL,
#> weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL,
#> use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE,
#> progbar = FALSE)
#> {
#> offset2 = offset
#> wt = weights
#> if (missing(prior_list))
#> stop("Prior Specification Missing")
#> if (!missing(prior_list)) {
#> if (!is.null(prior_list$mu))
#> mu = prior_list$mu
#> if (!is.null(prior_list$Sigma))
#> Sigma = prior_list$Sigma
#> if (!is.null(prior_list$P))
#> P = prior_list$P
#> if (is.null(prior_list$P)) {
#> R <- chol(prior_list$Sigma)
#> Pinv <- chol2inv(R)
#> P <- 0.5 * (Pinv + t(Pinv))
#> }
#> if (!is.null(prior_list$dispersion))
#> dispersion = prior_list$dispersion
#> else dispersion = NULL
#> if (!is.null(prior_list$shape))
#> shape = prior_list$shape
#> else shape = NULL
#> if (!is.null(prior_list$rate))
#> rate = prior_list$rate
#> else rate = NULL
#> }
#> if (is.numeric(n) == FALSE || is.numeric(y) == FALSE ||
#> is.numeric(x) == FALSE || is.numeric(mu) == FALSE ||
#> is.numeric(P) == FALSE)
#> stop("non-numeric argument to numeric function")
#> if (is.null(n_envopt))
#> n_envopt <- n
#> n_envopt <- as.integer(n_envopt)
#> x <- as.matrix(x)
#> mu <- as.matrix(as.vector(mu))
#> P <- as.matrix(P)
#> start <- mu
#> if (length(n) > 1)
#> n <- length(n)
#> nobs <- NROW(y)
#> nvars <- ncol(x)
#> if (is.null(offset2))
#> offset2 = rep(0, nobs)
#> nvars2 <- length(mu)
#> if (!nvars == nvars2)
#> stop("incompatible dimensions")
#> if (!all(dim(P) == c(nvars2, nvars2)))
#> stop("incompatible dimensions")
#> if (!isSymmetric(P))
#> stop("matrix P must be symmetric")
#> if (length(wt) == 1)
#> wt = rep(wt, nobs)
#> nobs2 = NROW(wt)
#> nobs3 = NROW(x)
#> nobs4 = NROW(offset2)
#> if (nobs2 != nobs)
#> stop("weighting vector must have same number of elements as y")
#> if (nobs3 != nobs)
#> stop("matrix X must have same number of rows as y")
#> if (nobs4 != nobs)
#> stop("offset vector must have same number of rows as y")
#> tol <- 1e-06
#> eS <- eigen(P, symmetric = TRUE, only.values = FALSE)
#> ev <- eS$values
#> if (!all(ev >= -tol * abs(ev[1L])))
#> stop("'P' is not positive definite")
#> if (is.character(family))
#> family <- get(family, mode = "function", envir = parent.frame())
#> if (is.function(family))
#> family <- family()
#> if (is.null(family$family)) {
#> print(family)
#> stop("'family' not recognized")
#> }
#> okfamilies <- c("gaussian", "poisson", "binomial", "quasipoisson",
#> "quasibinomial", "Gamma")
#> if (family$family %in% okfamilies) {
#> if (family$family == "gaussian")
#> oklinks <- c("identity")
#> if (family$family == "poisson" || family$family ==
#> "quasipoisson")
#> oklinks <- c("log")
#> if (family$family == "binomial" || family$family ==
#> "quasibinomial")
#> oklinks <- c("logit", "probit", "cloglog")
#> if (family$family == "Gamma")
#> oklinks <- c("log")
#> if (family$link %in% oklinks) {
#> famfunc <- glmbfamfunc(family)
#> f1 <- famfunc$f1
#> f2 <- famfunc$f2
#> f3 <- famfunc$f3
#> }
#> else {
#> stop(gettextf("link \"%s\" not available for selected family; available links are %s",
#> family$link, paste(sQuote(oklinks), collapse = ", ")),
#> domain = NA)
#> }
#> }
#> else {
#> stop(gettextf("family \"%s\" not available in glmb; available families are %s",
#> family$family, paste(sQuote(okfamilies), collapse = ", ")),
#> domain = NA)
#> }
#> if ("ddef" %in% names(prior_list)) {
#> ddef <- prior_list$ddef
#> }
#> else {
#> ddef <- is.null(prior_list$dispersion)
#> }
#> if (family$family %in% c("gaussian", "Gamma") && isTRUE(ddef)) {
#> stop(paste0("For gaussian() and Gamma() models, dNormal() requires an explicit dispersion ",
#> "(e.g. dispersion = ps$dispersion from Prior_Setup()). ",
#> "Omitted or NULL dispersion is not allowed"))
#> }
#> if (family$family == "gaussian") {
#> outlist <- .rNormalReg_cpp(n = n, y = y, x = x, mu = mu,
#> P = P, offset = offset2, wt = wt, dispersion = dispersion,
#> f2 = f2, f3 = f3, start = mu)
#> class(outlist$fit) = "lm"
#> }
#> else {
#> if (is.null(dispersion)) {
#> dispersion2 = 1
#> }
#> else {
#> dispersion2 = dispersion
#> }
#> outlist <- .rNormalGLM_cpp(n = n, y = y, x = x, mu = mu,
#> P = P, offset = offset2, wt = wt, dispersion = dispersion2,
#> f2 = f2, f3 = f3, start = start, family = family$family,
#> link = family$link, Gridtype = Gridtype, n_envopt = n_envopt,
#> use_parallel = use_parallel, use_opencl = use_opencl,
#> verbose = verbose)
#> betastar = outlist$coef.mode
#> x = outlist$x
#> y = outlist$y
#> weights = outlist$prior.weights
#> if (family$family == "quasipoisson" || family$family ==
#> "quasibinomial") {
#> linkinv <- family$linkinv
#> disp_temp = rep(0, n)
#> m = length(y)
#> k = ncol(x)
#> res_temp = matrix(0, nrow = n, ncol = m)
#> fit_temp = x %*% t(outlist$coefficients)
#> for (l in 1:n) {
#> fit_temp[1:m, l] = linkinv(offset2 + fit_temp[1:m,
#> l])
#> res_temp[l, 1:m] = (y - fit_temp[1:m, l])
#> disp_temp[l] = (1/(m - k)) * sum(res_temp[l,
#> 1:m]^2 * wt/fit_temp[1:m, l])
#> }
#> outlist <- .rNormalGLM_cpp(n = n, y = y, x = x,
#> mu = mu, P = P, offset = offset2, wt = wt,
#> dispersion = mean(disp_temp), f2 = f2, f3 = f3,
#> start = start, family = family$family, link = family$link,
#> Gridtype = Gridtype, n_envopt = n_envopt, use_parallel = use_parallel,
#> use_opencl = use_opencl, verbose = verbose)
#> outlist$call <- match.call()
#> outlist$dispersion = mean(disp_temp)
#> }
#> outlist$fit = glmb.wfit(x, y, weights, offset = offset2,
#> family = family, Bbar = mu, P, betastar)
#> }
#> colnames(outlist$coefficients) <- colnames(x)
#> pl_disp <- if (!is.null(dispersion))
#> dispersion
#> else outlist$dispersion
#> pfamily_obj <- list(pfamily = "dNormal", prior_list = list(mu = as.numeric(mu),
#> Sigma = Sigma, dispersion = pl_disp))
#> attr(pfamily_obj, "Prior Type") <- "dNormal"
#> class(pfamily_obj) <- "pfamily"
#> outlist$pfamily <- pfamily_obj
#> rglmb_df = as.data.frame(cbind(y, x))
#> rglmb_f = DF2formula(rglmb_df)
#> rglmb_mf = model.frame(rglmb_f, rglmb_df)
#> outlist$family = family
#> outlist$famfunc = famfunc
#> outlist$call <- match.call()
#> outlist$offset2 <- offset2
#> outlist$formula <- rglmb_f
#> outlist$model <- rglmb_mf
#> outlist$data <- rglmb_df
#> class(outlist) <- c(outlist$class, c("rglmb", "glmb",
#> "glm", "lm"))
#> outlist
#> }, call = dNormal(mu = mu_f, Sigma = Sigma_f, dispersion = sigma_faith^2)), "`Prior Type`" = "dNormal", class = "pfamily"),
#> n = 20000, data = structure(list(y = c(79, 54, 74, 62, 85,
#> 55, 88, 85, 51, 85, 54, 84, 78, 47, 83, 52, 62, 84, 52, 79,
#> 51, 47, 78, 69, 74, 83, 55, 76, 78, 79, 73, 77, 66, 80, 74,
#> 52, 48, 80, 59, 90, 80, 58, 84, 58, 73, 83, 64, 53, 82, 59,
#> 75, 90, 54, 80, 54, 83, 71, 64, 77, 81, 59, 84, 48, 82, 60,
#> 92, 78, 78, 65, 73, 82, 56, 79, 71, 62, 76, 60, 78, 76, 83,
#> 75, 82, 70, 65, 73, 88, 76, 80, 48, 86, 60, 90, 50, 78, 63,
#> 72, 84, 75, 51, 82, 62, 88, 49, 83, 81, 47, 84, 52, 86, 81,
#> 75, 59, 89, 79, 59, 81, 50, 85, 59, 87, 53, 69, 77, 56, 88,
#> 81, 45, 82, 55, 90, 45, 83, 56, 89, 46, 82, 51, 86, 53, 79,
#> 81, 60, 82, 77, 76, 59, 80, 49, 96, 53, 77, 77, 65, 81, 71,
#> 70, 81, 93, 53, 89, 45, 86, 58, 78, 66, 76, 63, 88, 52, 93,
#> 49, 57, 77, 68, 81, 81, 73, 50, 85, 74, 55, 77, 83, 83, 51,
#> 78, 84, 46, 83, 55, 81, 57, 76, 84, 77, 81, 87, 77, 51, 78,
#> 60, 82, 91, 53, 78, 46, 77, 84, 49, 83, 71, 80, 49, 75, 64,
#> 76, 53, 94, 55, 76, 50, 82, 54, 75, 78, 79, 78, 78, 70, 79,
#> 70, 54, 86, 50, 90, 54, 54, 77, 79, 64, 75, 47, 86, 63, 85,
#> 82, 57, 82, 67, 74, 54, 83, 73, 73, 88, 80, 71, 83, 56, 79,
#> 78, 84, 58, 83, 43, 60, 75, 81, 46, 90, 46, 74)), class = "data.frame", row.names = c(NA,
#> -272L)), method = "qr", model = TRUE, x = TRUE, y = TRUE,
#> qr = TRUE, singular.ok = TRUE, contrasts = NULL, Gridtype = 2,
#> n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE,
#> verbose = FALSE)
#>
#> Posterior Mean Coefficients:
#> (Intercept)
#> 70.91faith_compare <- data.frame(
Dataset = "Old Faithful waiting",
Posterior = faith_analytic$Posterior,
`Analytic Mean` = faith_analytic$Mean,
`Analytic SD` = faith_analytic$SD,
`lmb Post.Mean` = fit_faith$coef.means["(Intercept)"],
`lmb Post.Sd` = sd(fit_faith$coefficients[, "(Intercept)", drop = TRUE]),
check.names = FALSE
)
knitr::kable(faith_compare, digits = 4,
caption = "Analytic vs. lmb() posterior mean and SD")| Dataset | Posterior | Analytic Mean | Analytic SD | lmb Post.Mean | lmb Post.Sd | |
|---|---|---|---|---|---|---|
| (Intercept) | Old Faithful waiting | N(70.9004, 0.8231^2) | 70.9004 | 0.8231 | 70.9091 | 0.8261 |
The lmb Post.Mean and
lmb Post.Sd columns should match the
analytic Mean and SD to Monte Carlo
error.
This scalar case is the template for Gaussian linear models
with lmb() (Chapter 3). In regression, the single
unknown \(\mu\) is replaced by a
coefficient vector \(\beta \in
\mathbb{R}^p\), the scalar prior variance \(\tau_0^2\) becomes a \(p \times p\) prior covariance matrix \(\Sigma_0\), and the precision-weighted
average becomes the matrix formula \(\mu_n =
(\Sigma_0^{-1} + X^\top X / \sigma^2)^{-1}(\Sigma_0^{-1}\mu_0 + X^\top y
/ \sigma^2)\) — but the logic is identical.
lmb()
for multivariate Gaussian linear models.