--- title: "Chapter 02-S02: Normal–Normal Conjugacy for One Mean" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: REFERENCES.bib reference-section-title: References vignette: > %\VignetteIndexEntry{Chapter 02-S02: Normal–Normal Conjugacy for One Mean} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(glmbayes) if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules) ``` --- ## 1. The model ### 1.1 Likelihood Suppose we observe \(n\) measurements \(y_1, \dots, y_n\) drawn independently from a Normal distribution with **unknown mean** \(\mu\) and **known** standard deviation \(\sigma\): \[ Y_i \mid \mu \;\sim\; \mathcal{N}(\mu,\, \sigma^2), \qquad i = 1,\dots,n. \] The sufficient statistic is the sample mean \(\bar y\), which itself follows \[ \bar{y} \mid \mu \;\sim\; \mathcal{N}\!\left(\mu,\, \frac{\sigma^2}{n}\right). \] In practice \(\sigma\) is rarely truly known, but treating it as fixed provides the cleanest template for understanding posterior updating. The full treatment — unknown mean **and** unknown variance — is handled in Chapter 3 with `lmb()`. ### 1.2 Prior Since \(\mu\) can take any real value, the **Normal distribution** is the natural conjugate prior: \[ \mu \;\sim\; \mathcal{N}(\mu_0,\, \tau_0^2). \] Here \(\mu_0\) is your prior best guess and \(\tau_0\) encodes your uncertainty about it. ### 1.3 Posterior derivation The key is to work with **precision** (reciprocal variance): \(\kappa = 1/\sigma^2\). | Symbol | Formula | Meaning | |--------|---------|---------| | \(\kappa_0 = 1/\tau_0^2\) | Prior precision | Information in the prior | | \(\kappa_L = n/\sigma^2\) | Likelihood precision | Information in the data | Multiplying the Normal likelihood by the Normal prior kernel and completing the square gives the conjugate posterior: \[ \mu \mid \bar{y} \;\sim\; \mathcal{N}\!\left(\mu_n,\; \frac{1}{\kappa_n}\right), \] where \[ \kappa_n \;=\; \kappa_0 + \kappa_L \qquad\text{(precisions add)} \] \[ \mu_n \;=\; \frac{\kappa_0\,\mu_0 + \kappa_L\,\bar{y}}{\kappa_n} \;=\; \frac{\kappa_0}{\kappa_n}\,\mu_0 \;+\; \frac{\kappa_L}{\kappa_n}\,\bar{y}. \] The posterior mean \(\mu_n\) is a **precision-weighted average** of the prior mean and the data mean. **Intuition.** - Precisions add because information is additive: each independent source of knowledge about \(\mu\) contributes its own precision. - A very **diffuse** prior (\(\tau_0 \to \infty\), \(\kappa_0 \to 0\)) gives \(\mu_n \approx \bar y\) — the data take over entirely. - A very **small** sample (\(n\) small) or **noisy** data (\(\sigma\) large) keeps \(\kappa_L\) small, so the prior retains more influence. - As \(n \to \infty\), \(\mu_n \to \bar y\) and the posterior variance \(1/\kappa_n \to 0\) — the Bayesian estimate converges to the classical MLE. --- ## 2. `bayesrules` illustration The **`bayesrules`** package provides `plot_normal_normal()` and `summarize_normal_normal()` for quickly visualizing and summarizing the Normal–Normal update. **Scenario.** A sleep researcher's prior belief is that undergraduates average about 7 hours of sleep per night. They encode this as \(\mu_0 = 7\), \(\tau_0 = 1.5\) — fairly uncertain, allowing for means anywhere from 4 to 10 hours. The population standard deviation is assumed known at \(\sigma = 1.8\) hours (from large-scale studies). A sample of \(n = 22\) students yields a sample mean of \(\bar y = 6.1\) hours. ```{r nn-bayesrules-data} prior_mean <- 7; prior_sd <- 1.5 ## prior: N(7, 1.5^2) sigma <- 1.8 ## known population SD y_bar <- 6.1; n_sleep <- 22L ## observed sample ``` ```{r nn-bayesrules-plot, eval = requireNamespace("bayesrules", quietly = TRUE)} library(bayesrules) ## Overlay prior, likelihood, and posterior plot_normal_normal( mean = prior_mean, sd = prior_sd, sigma = sigma, y_bar = y_bar, n = n_sleep, prior = TRUE, likelihood = TRUE, posterior = TRUE ) ``` ```{r nn-bayesrules-summary, eval = requireNamespace("bayesrules", quietly = TRUE)} ## Tabular summary of prior, likelihood, and posterior parameters summarize_normal_normal( mean = prior_mean, sd = prior_sd, sigma = sigma, y_bar = y_bar, n = n_sleep ) ``` **Reading the output.** The `summarize_normal_normal()` table reports the prior mean and SD, the likelihood (data) mean and SE, and the resulting posterior mean and SD. The plot overlays all three: the wide prior, the sharper likelihood centered at 6.1, and the posterior — narrower than both and sitting between them, pulled strongly toward the data because 22 observations outweigh the prior. **Analytic posterior.** ```{r nn-analytic} prec_0 <- 1 / prior_sd^2 prec_L <- n_sleep / sigma^2 prec_n <- prec_0 + prec_L post_mean <- (prec_0 * prior_mean + prec_L * y_bar) / prec_n post_sd <- sqrt(1 / prec_n) nn_analytic <- data.frame( Example = "Sleep hours (BayesRules)", n = n_sleep, `y-bar` = y_bar, Posterior = sprintf("N(%.4f, %.4f^2)", post_mean, post_sd), Mean = post_mean, SD = post_sd, check.names = FALSE ) knitr::kable(nn_analytic, digits = 4, caption = "Conjugate Normal--Normal posterior (known sigma)") ``` --- ## 3. Real data: Old Faithful waiting times The `faithful` dataset (built into R) records waiting times between eruptions of the Old Faithful geyser in Yellowstone National Park. We treat the waiting time \(Y_i\) as approximately Normal with unknown mean \(\mu\) and use a Normal–Normal conjugate model. **Prior.** Based on decades of visitor records, a geologist believes the mean waiting time is around 72 minutes, with considerable uncertainty (\(\tau_0 = 15\) minutes). We estimate the population SD from the data and treat it as fixed at \(\hat\sigma\). ```{r faithful-setup} y_faith <- faithful$waiting n_faith <- length(y_faith) ybar_faith <- mean(y_faith) sigma_faith <- sd(y_faith) ## treat as known prior_mean_f <- 72; prior_sd_f <- 15 ## N(72, 15^2) prior cat(sprintf("n = %d, y-bar = %.2f, sigma = %.2f\n", n_faith, ybar_faith, sigma_faith)) ``` **Analytic posterior.** ```{r faithful-posterior} prec_0f <- 1 / prior_sd_f^2 prec_Lf <- n_faith / sigma_faith^2 prec_nf <- prec_0f + prec_Lf post_mean_f <- (prec_0f * prior_mean_f + prec_Lf * ybar_faith) / prec_nf post_sd_f <- sqrt(1 / prec_nf) faith_analytic <- data.frame( Dataset = "Old Faithful waiting", n = n_faith, `y-bar` = ybar_faith, Posterior = sprintf("N(%.4f, %.4f^2)", post_mean_f, post_sd_f), Mean = post_mean_f, SD = post_sd_f, check.names = FALSE ) knitr::kable(faith_analytic, digits = 4, caption = "Conjugate Normal--Normal posterior (known sigma)") ``` **Reading the output.** With \(n = 272\) observations, the likelihood precision dominates the prior, so the posterior mean is almost identical to \(\bar y\). The posterior SD is small — the data overwhelm the prior. ### 3.1 Fitting with **`lmb()`** In **glmbayes**, the intercept-only model uses `lmb()` with `dNormal()` and fixed dispersion \(\sigma^2\): ```{r faithful-lmb} df_faith <- data.frame(y = y_faith) mu_f <- matrix(prior_mean_f, nrow = 1, ncol = 1, dimnames = list(NULL, "(Intercept)")) Sigma_f <- matrix(prior_sd_f^2, nrow = 1, ncol = 1, dimnames = list("(Intercept)", "(Intercept)")) pf_faith <- dNormal(mu = mu_f, Sigma = Sigma_f, dispersion = sigma_faith^2) set.seed(2026) fit_faith <- lmb( n = 20000, y ~ 1, data = df_faith, pfamily = pf_faith ) print(fit_faith) ``` ```{r faithful-compare} faith_compare <- data.frame( Dataset = "Old Faithful waiting", Posterior = faith_analytic$Posterior, `Analytic Mean` = faith_analytic$Mean, `Analytic SD` = faith_analytic$SD, `lmb Post.Mean` = fit_faith$coef.means["(Intercept)"], `lmb Post.Sd` = sd(fit_faith$coefficients[, "(Intercept)", drop = TRUE]), check.names = FALSE ) knitr::kable(faith_compare, digits = 4, caption = "Analytic vs. lmb() posterior mean and SD") ``` The **`lmb Post.Mean`** and **`lmb Post.Sd`** columns should match the analytic **Mean** and **SD** to Monte Carlo error. --- ## 4. Connection to glmbayes This scalar case is the template for **Gaussian linear models with `lmb()`** (Chapter 3). In regression, the single unknown \(\mu\) is replaced by a coefficient vector \(\beta \in \mathbb{R}^p\), the scalar prior variance \(\tau_0^2\) becomes a \(p \times p\) prior covariance matrix \(\Sigma_0\), and the precision-weighted average becomes the matrix formula \(\mu_n = (\Sigma_0^{-1} + X^\top X / \sigma^2)^{-1}(\Sigma_0^{-1}\mu_0 + X^\top y / \sigma^2)\) — but the logic is identical. --- ## See also - **Chapter-02-S01** — introduction to Bayesian inference and conjugacy. - **Chapter-02-S03** — Beta–Binomial conjugacy for one proportion. - **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate. - **Chapter-02-S05** — Gamma–Gamma conjugacy for a Gamma response rate. - **Chapter 03** — **`lmb()`** for multivariate Gaussian linear models.