---
title: "Chapter 02-S02: Normal–Normal Conjugacy for One Mean"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 02-S02: Normal–Normal Conjugacy for One Mean}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(glmbayes)
if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules)
```

---

## 1. The model

### 1.1 Likelihood

Suppose we observe \(n\) measurements \(y_1, \dots, y_n\) drawn independently from a Normal distribution with **unknown mean** \(\mu\) and **known** standard deviation \(\sigma\):

\[
  Y_i \mid \mu \;\sim\; \mathcal{N}(\mu,\, \sigma^2), \qquad i = 1,\dots,n.
\]

The sufficient statistic is the sample mean \(\bar y\), which itself follows

\[
  \bar{y} \mid \mu \;\sim\; \mathcal{N}\!\left(\mu,\, \frac{\sigma^2}{n}\right).
\]

In practice \(\sigma\) is rarely truly known, but treating it as fixed provides the cleanest template for understanding posterior updating.  The full treatment — unknown mean **and** unknown variance — is handled in Chapter 3 with `lmb()`.

### 1.2 Prior

Since \(\mu\) can take any real value, the **Normal distribution** is the natural conjugate prior:

\[
  \mu \;\sim\; \mathcal{N}(\mu_0,\, \tau_0^2).
\]

Here \(\mu_0\) is your prior best guess and \(\tau_0\) encodes your uncertainty about it.

### 1.3 Posterior derivation

The key is to work with **precision** (reciprocal variance): \(\kappa = 1/\sigma^2\).

| Symbol | Formula | Meaning |
|--------|---------|---------|
| \(\kappa_0 = 1/\tau_0^2\) | Prior precision | Information in the prior |
| \(\kappa_L = n/\sigma^2\) | Likelihood precision | Information in the data |

Multiplying the Normal likelihood by the Normal prior kernel and completing the square gives the conjugate posterior:

\[
  \mu \mid \bar{y} \;\sim\; \mathcal{N}\!\left(\mu_n,\; \frac{1}{\kappa_n}\right),
\]

where

\[
  \kappa_n \;=\; \kappa_0 + \kappa_L
  \qquad\text{(precisions add)}
\]

\[
  \mu_n \;=\; \frac{\kappa_0\,\mu_0 + \kappa_L\,\bar{y}}{\kappa_n}
  \;=\;
  \frac{\kappa_0}{\kappa_n}\,\mu_0
  \;+\;
  \frac{\kappa_L}{\kappa_n}\,\bar{y}.
\]

The posterior mean \(\mu_n\) is a **precision-weighted average** of the prior mean and the data mean.

**Intuition.**

- Precisions add because information is additive: each independent source of knowledge about \(\mu\) contributes its own precision.
- A very **diffuse** prior (\(\tau_0 \to \infty\), \(\kappa_0 \to 0\)) gives \(\mu_n \approx \bar y\) — the data take over entirely.
- A very **small** sample (\(n\) small) or **noisy** data (\(\sigma\) large) keeps \(\kappa_L\) small, so the prior retains more influence.
- As \(n \to \infty\), \(\mu_n \to \bar y\) and the posterior variance \(1/\kappa_n \to 0\) — the Bayesian estimate converges to the classical MLE.

---

## 2. `bayesrules` illustration

The **`bayesrules`** package provides `plot_normal_normal()` and `summarize_normal_normal()` for quickly visualizing and summarizing the Normal–Normal update.

**Scenario.**  A sleep researcher's prior belief is that undergraduates average about 7 hours of sleep per night.  They encode this as \(\mu_0 = 7\), \(\tau_0 = 1.5\) — fairly uncertain, allowing for means anywhere from 4 to 10 hours.  The population standard deviation is assumed known at \(\sigma = 1.8\) hours (from large-scale studies).  A sample of \(n = 22\) students yields a sample mean of \(\bar y = 6.1\) hours.

```{r nn-bayesrules-data}
prior_mean <- 7;    prior_sd <- 1.5   ## prior: N(7, 1.5^2)
sigma      <- 1.8                     ## known population SD
y_bar      <- 6.1;  n_sleep  <- 22L   ## observed sample
```

```{r nn-bayesrules-plot, eval = requireNamespace("bayesrules", quietly = TRUE)}
library(bayesrules)
## Overlay prior, likelihood, and posterior
plot_normal_normal(
  mean       = prior_mean,
  sd         = prior_sd,
  sigma      = sigma,
  y_bar      = y_bar,
  n          = n_sleep,
  prior      = TRUE,
  likelihood = TRUE,
  posterior  = TRUE
)
```

```{r nn-bayesrules-summary, eval = requireNamespace("bayesrules", quietly = TRUE)}
## Tabular summary of prior, likelihood, and posterior parameters
summarize_normal_normal(
  mean  = prior_mean,
  sd    = prior_sd,
  sigma = sigma,
  y_bar = y_bar,
  n     = n_sleep
)
```

**Reading the output.**  The `summarize_normal_normal()` table reports the prior mean and SD, the likelihood (data) mean and SE, and the resulting posterior mean and SD.  The plot overlays all three: the wide prior, the sharper likelihood centered at 6.1, and the posterior — narrower than both and sitting between them, pulled strongly toward the data because 22 observations outweigh the prior.

**Analytic posterior.**

```{r nn-analytic}
prec_0 <- 1 / prior_sd^2
prec_L <- n_sleep / sigma^2
prec_n <- prec_0 + prec_L

post_mean <- (prec_0 * prior_mean + prec_L * y_bar) / prec_n
post_sd   <- sqrt(1 / prec_n)

nn_analytic <- data.frame(
  Example = "Sleep hours (BayesRules)",
  n = n_sleep,
  `y-bar` = y_bar,
  Posterior = sprintf("N(%.4f, %.4f^2)", post_mean, post_sd),
  Mean = post_mean,
  SD = post_sd,
  check.names = FALSE
)
knitr::kable(nn_analytic, digits = 4,
  caption = "Conjugate Normal--Normal posterior (known sigma)")
```

---

## 3. Real data: Old Faithful waiting times

The `faithful` dataset (built into R) records waiting times between eruptions of the Old Faithful geyser in Yellowstone National Park.  We treat the waiting time \(Y_i\) as approximately Normal with unknown mean \(\mu\) and use a Normal–Normal conjugate model.

**Prior.**  Based on decades of visitor records, a geologist believes the mean waiting time is around 72 minutes, with considerable uncertainty (\(\tau_0 = 15\) minutes).  We estimate the population SD from the data and treat it as fixed at \(\hat\sigma\).

```{r faithful-setup}
y_faith <- faithful$waiting
n_faith <- length(y_faith)

ybar_faith  <- mean(y_faith)
sigma_faith <- sd(y_faith)           ## treat as known

prior_mean_f <- 72;  prior_sd_f <- 15   ## N(72, 15^2) prior

cat(sprintf("n = %d,  y-bar = %.2f,  sigma = %.2f\n",
            n_faith, ybar_faith, sigma_faith))
```

**Analytic posterior.**

```{r faithful-posterior}
prec_0f <- 1 / prior_sd_f^2
prec_Lf <- n_faith / sigma_faith^2
prec_nf <- prec_0f + prec_Lf

post_mean_f <- (prec_0f * prior_mean_f + prec_Lf * ybar_faith) / prec_nf
post_sd_f   <- sqrt(1 / prec_nf)

faith_analytic <- data.frame(
  Dataset = "Old Faithful waiting",
  n = n_faith,
  `y-bar` = ybar_faith,
  Posterior = sprintf("N(%.4f, %.4f^2)", post_mean_f, post_sd_f),
  Mean = post_mean_f,
  SD = post_sd_f,
  check.names = FALSE
)
knitr::kable(faith_analytic, digits = 4,
  caption = "Conjugate Normal--Normal posterior (known sigma)")
```

**Reading the output.**  With \(n = 272\) observations, the likelihood precision dominates the prior, so the posterior mean is almost identical to \(\bar y\).  The posterior SD is small — the data overwhelm the prior.

### 3.1 Fitting with **`lmb()`**

In **glmbayes**, the intercept-only model uses `lmb()` with `dNormal()` and fixed dispersion \(\sigma^2\):

```{r faithful-lmb}
df_faith <- data.frame(y = y_faith)

mu_f    <- matrix(prior_mean_f, nrow = 1, ncol = 1,
                  dimnames = list(NULL, "(Intercept)"))
Sigma_f <- matrix(prior_sd_f^2, nrow = 1, ncol = 1,
                  dimnames = list("(Intercept)", "(Intercept)"))

pf_faith <- dNormal(mu = mu_f, Sigma = Sigma_f,
                    dispersion = sigma_faith^2)

set.seed(2026)
fit_faith <- lmb(
  n       = 20000,
  y ~ 1,
  data    = df_faith,
  pfamily = pf_faith
)
print(fit_faith)
```

```{r faithful-compare}
faith_compare <- data.frame(
  Dataset = "Old Faithful waiting",
  Posterior = faith_analytic$Posterior,
  `Analytic Mean` = faith_analytic$Mean,
  `Analytic SD`   = faith_analytic$SD,
  `lmb Post.Mean` = fit_faith$coef.means["(Intercept)"],
  `lmb Post.Sd`   = sd(fit_faith$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(faith_compare, digits = 4,
  caption = "Analytic vs. lmb() posterior mean and SD")
```

The **`lmb Post.Mean`** and **`lmb Post.Sd`** columns should match the analytic **Mean** and **SD** to Monte Carlo error.

---

## 4. Connection to glmbayes

This scalar case is the template for **Gaussian linear models with `lmb()`** (Chapter 3).  In regression, the single unknown \(\mu\) is replaced by a coefficient vector \(\beta \in \mathbb{R}^p\), the scalar prior variance \(\tau_0^2\) becomes a \(p \times p\) prior covariance matrix \(\Sigma_0\), and the precision-weighted average becomes the matrix formula \(\mu_n = (\Sigma_0^{-1} + X^\top X / \sigma^2)^{-1}(\Sigma_0^{-1}\mu_0 + X^\top y / \sigma^2)\) — but the logic is identical.

---

## See also

- **Chapter-02-S01** — introduction to Bayesian inference and conjugacy.
- **Chapter-02-S03** — Beta–Binomial conjugacy for one proportion.
- **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate.
- **Chapter-02-S05** — Gamma–Gamma conjugacy for a Gamma response rate.
- **Chapter 03** — **`lmb()`** for multivariate Gaussian linear models.