---
title: "Chapter 02-S01: Conjugate Models — Introduction and Overview"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 02-S01: Conjugate Models — Introduction and Overview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

---

## 1. What is Bayesian inference, and why does it matter?

### 1.1 The big picture

In classical (frequentist) statistics, a parameter like a probability or a mean is treated as a fixed, unknown constant.  You collect data, compute an estimate, and report a confidence interval — an interval that, in repeated sampling, would contain the true value a certain percentage of the time.

**Bayesian inference** takes a different view: the parameter itself is treated as a **random variable** with a **probability distribution**.  Before you see any data, you express your uncertainty about the parameter as a **prior distribution**.  After you observe data, you update that distribution with the information the data provide, arriving at a **posterior distribution**.

The posterior is the complete answer in a Bayesian analysis.  It is a full probability distribution over all plausible values of the parameter, not just a point estimate or an interval.  You can:

- Report its **mean** or **median** as a point estimate.
- Report a **credible interval**: an interval that contains the parameter with a stated posterior probability (for example, 0.95).
- Use it as a new prior if more data arrive later.

### 1.2 Bayes' theorem

Everything follows from **Bayes' theorem**.  Let \(\theta\) denote the unknown parameter and \(y\) the observed data.  Then

\[
  \underbrace{p(\theta \mid y)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}
        \;\times\;
        \overbrace{p(\theta)}^{\text{prior}}}
       {\underbrace{p(y)}_{\text{normalizing constant}}}.
\]

In words:

| Term | Meaning |
|------|---------|
| **Prior** \(p(\theta)\) | What you believe about \(\theta\) *before* seeing data |
| **Likelihood** \(p(y \mid \theta)\) | How probable the observed data \(y\) are *if* the parameter were \(\theta\) |
| **Posterior** \(p(\theta \mid y)\) | Your updated belief about \(\theta\) *after* seeing \(y\) |
| **Normalizing constant** \(p(y)\) | Ensures the posterior integrates to 1; often omitted by writing the posterior only up to proportionality |

Because \(p(y)\) does not depend on \(\theta\), we can drop it and write the proportionality form you will see constantly:

\[
  p(\theta \mid y) \;\propto\; p(y \mid \theta)\; p(\theta).
\]

When the normalizing constant is needed, it is recovered from the requirement that the posterior integrate to 1, that is, \(p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta\).
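
As a minimal sketch of how this normalization works numerically, the chunk below evaluates prior × likelihood on a grid and rescales the result so that it integrates to 1.  The data (7 successes in 20 trials) and the Beta(2, 2) prior are hypothetical values chosen for illustration; they do not come from the package.

```{r}
# Hypothetical data: 7 successes in 20 trials (illustrative numbers only)
y <- 7
n <- 20

# Midpoint grid of candidate values for theta on (0, 1)
width <- 0.001
theta <- (seq_len(1000) - 0.5) * width

# Assumed prior: Beta(2, 2); likelihood: Binomial(n, theta)
prior      <- dbeta(theta, 2, 2)
likelihood <- dbinom(y, size = n, prob = theta)

# Unnormalized posterior: p(y | theta) * p(theta)
unnorm <- likelihood * prior

# Riemann-sum approximation to p(y), the integral of p(y | theta) p(theta)
p_y <- sum(unnorm) * width

# Normalized posterior density on the grid
posterior <- unnorm / p_y

# Check 1: the grid posterior integrates to 1
sum(posterior) * width

# Check 2: largest gap to the exact conjugate answer, Beta(2 + y, 2 + n - y)
max(abs(posterior - dbeta(theta, 2 + y, 2 + n - y)))
```

The second check should be close to zero, which anticipates the conjugacy result in the next subsection: for this prior and likelihood the normalized posterior is again a Beta density.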

### 1.3 Why conjugacy is a gift

For most likelihood–prior combinations, the posterior has no simple closed form — you need numerical methods (like the sampling done by **`glmb()`**).  But for a special set of **conjugate** prior–likelihood pairs, the posterior is in the **same family** as the prior: only the parameters change.  This makes hand calculation possible and builds the geometric intuition you need before tackling regression.

The following sub-vignettes work through four conjugate pairs in detail:

| Sub-vignette | Conjugate pair | Unknown parameter |
|---|---|---|
| Chapter-02-S02 | Normal–Normal | Single mean \(\mu\) with known variance |
| Chapter-02-S03 | Beta–Binomial | Single proportion \(\theta\) |
| Chapter-02-S04 | Gamma–Poisson | Single count rate \(\lambda\) |
| Chapter-02-S05 | Gamma–Gamma | Rate \(\beta\) of a Gamma response with known shape |

Each sub-vignette explains the setup intuitively, derives the posterior update rule, works a numerical example in R, and shows how the idea generalizes to the regression models in **glmbayes**.
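
To make "only the parameters change" concrete, here is a small sketch of the Beta–Binomial update, in which a Beta(a, b) prior combined with y successes in n trials gives a Beta(a + y, b + n - y) posterior.  The helper `update_beta()` and all numbers are hypothetical and written for this illustration; they are not part of **glmbayes**.  The sketch also shows that using a posterior as the prior for a second batch of data gives the same answer as a single update on the pooled data.

```{r}
# Conjugate update for a Beta prior with binomial data:
# Beta(a, b) prior + y successes in n trials -> Beta(a + y, b + n - y)
update_beta <- function(prior, y, n) {
  c(a = prior[["a"]] + y, b = prior[["b"]] + n - y)
}

prior <- c(a = 2, b = 2)   # assumed prior, illustrative only

# Two hypothetical batches of data
post_1 <- update_beta(prior,  y = 7, n = 20)   # after the first batch
post_2 <- update_beta(post_1, y = 4, n = 10)   # first posterior reused as prior

# Updating once with the pooled data
post_pooled <- update_beta(prior, y = 7 + 4, n = 20 + 10)

post_2
post_pooled
```

Both routes end at the same Beta parameters, so the order in which the data arrive does not matter.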

---

## 2. Three principles that run through all conjugate models

Before diving in, here are three ideas that appear in every single conjugate example.  Recognizing them will help you build intuition quickly.

### 2.1 The posterior is always a compromise

In every conjugate model the posterior mean is a **weighted average** of the prior mean and the data-based estimate:

\[
  \mathbb{E}[\theta \mid y]
  \;=\;
  w_{\text{prior}} \cdot \underbrace{\mu_{\text{prior}}}_{\text{prior mean}}
  \;+\;
  w_{\text{data}} \cdot \underbrace{\hat\theta_{\text{data}}}_{\text{data estimate}},
\]

where the weights are non-negative, sum to 1, and depend on the relative amount of information in the prior versus the data.  More data shifts weight toward the data; a stronger (tighter) prior shifts it toward \(\mu_{\text{prior}}\).

This **shrinkage** is one of the most practically important features of Bayesian analysis: it guards against overreaction to small samples and helps reduce overfitting in regression.
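
The weighted-average form can be sketched for the Normal–Normal case, where the weights are proportional to the prior precision and the data precision.  The function `shrinkage()` and the numbers below (prior N(0, 1), known data variance 4, sample mean 2) are hypothetical and serve only to show how the weights move as the sample size grows.

```{r}
# Posterior mean of a normal mean with known data variance sigma2,
# written as a precision-weighted average of prior mean and sample mean
shrinkage <- function(n, ybar, m0, s0_sq, sigma2) {
  prec_prior <- 1 / s0_sq      # information carried by the prior
  prec_data  <- n / sigma2     # information carried by n observations
  w_prior <- prec_prior / (prec_prior + prec_data)
  w_data  <- prec_data  / (prec_prior + prec_data)
  c(n = n, w_prior = w_prior, w_data = w_data,
    post_mean = w_prior * m0 + w_data * ybar)
}

# Same prior and sample mean, increasing sample size
res <- t(sapply(c(1, 10, 100), function(n)
  shrinkage(n, ybar = 2, m0 = 0, s0_sq = 1, sigma2 = 4)))
res
```

With one observation the posterior mean stays close to the prior mean of 0; with 100 observations it sits close to the sample mean of 2.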

### 2.2 Credible intervals mean what you think they mean

A **95% credible interval** \([L, U]\) satisfies \(P(L \leq \theta \leq U \mid y) = 0.95\).  This is the statement most people *think* a frequentist confidence interval makes, although it does not.  A frequentist 95% CI says that in 95% of repeated experiments the computed interval would contain the true \(\theta\); it assigns no probability to this particular interval.  The Bayesian credible interval is a direct probability statement about where \(\theta\) is, given the data and prior you specified.
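
The defining property can be checked directly when the posterior has a closed form.  The sketch below uses a Gamma–Poisson model with an assumed Gamma(2, 1) prior (shape, rate) and five hypothetical counts, computes an equal-tailed 95% credible interval from posterior quantiles, and then confirms the posterior probability of that interval.

```{r}
# Assumed prior: Gamma(shape = 2, rate = 1) on a Poisson rate lambda
a <- 2
b <- 1

# Hypothetical counts (illustrative only)
y <- c(3, 5, 2, 4, 6)

# Conjugate update: Gamma(a + sum(y), b + n)
a_post <- a + sum(y)
b_post <- b + length(y)

# Equal-tailed 95% credible interval from posterior quantiles
ci <- qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)
ci

# Posterior probability that lambda lies inside the interval
prob_in_ci <- pgamma(ci[2], shape = a_post, rate = b_post) -
  pgamma(ci[1], shape = a_post, rate = b_post)
prob_in_ci
```

The equal-tailed interval is one convention among several; a highest-density interval would cover the same posterior probability with a slightly different pair of endpoints.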

### 2.3 The conjugate prototype and the regression generalization

Each scalar conjugate model is the intuition behind a fully general regression model in **glmbayes**.  In the regression version, the single unknown parameter \(\theta\) is replaced by a coefficient vector \(\beta \in \mathbb{R}^p\), and the analytic update formula becomes a simulation-based draw implemented by `glmb()`.  The logic — prior × likelihood → posterior — is identical; only the mechanics differ.
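
For the Gaussian linear model with known error variance, the vector version of the update can still be written in closed form, which makes it a convenient sketch of how "prior × likelihood → posterior" carries over from a scalar \(\theta\) to a coefficient vector \(\beta\).  The chunk below uses only base R with simulated data and an assumed \(N(\mu_0, \Sigma_0)\) prior; it is an illustration of the algebra, not a description of how `glmb()` or `lmb()` are implemented.

```{r}
set.seed(1)

# Simulated data from a hypothetical linear model with known error variance
n      <- 50
sigma2 <- 1
X      <- cbind(1, rnorm(n))               # intercept and one covariate
beta   <- c(1, 2)
y      <- drop(X %*% beta) + rnorm(n, sd = sqrt(sigma2))

# Assumed prior: beta ~ N(mu0, Sigma0)
mu0    <- c(0, 0)
Sigma0 <- diag(10, 2)

# Posterior precision = prior precision + data precision
P0     <- solve(Sigma0)
P_post <- P0 + crossprod(X) / sigma2

# Posterior mean = precision-weighted combination of prior mean and data
Sigma_post <- solve(P_post)
mu_post    <- drop(Sigma_post %*% (P0 %*% mu0 + crossprod(X, y) / sigma2))

# Least-squares estimate, which ignores the prior, for comparison
ols <- coef(lm(y ~ X - 1))
cbind(posterior_mean = mu_post, least_squares = ols)
```

The structure mirrors the scalar Normal–Normal case: precisions add, and the posterior mean is pulled from the least-squares estimate toward the prior mean.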

---

## 3. Putting it all together: from simple models to regression

Once you have worked through the four sub-vignettes, use the following table to see how each conjugate prototype maps to a more general model in the package.

### 3.1 Roadmap

| Conjugate prototype | Where it generalizes in **glmbayes** |
|---------------------|--------------------------------------|
| Beta–Binomial | Binomial **`glmb()`** with normal priors on regression coefficients (Chapters 7–9) |
| Gamma–Poisson | Conjugate \(\Gamma\) rate prior + **`glmb()`** (Chapter-02-S04, Bayes Rules! main example); general Poisson regression (Chapter 10, Appendix A) |
| Gamma–Gamma | **`glmb()`** with `Gamma(link="identity")` and `dGamma(Inv_Dispersion=FALSE, lik_shape=k)` (Chapter-02-S05); non-conjugate Gamma regression with `Gamma(log)` (Chapter 10) |
| Normal–Normal | **`lmb()`** for multivariate Gaussian regression (Chapter 3) |

### 3.2 What conjugacy gives you — and what it does not

Conjugate models are exact when the model assumptions hold.  Their limitations:

1. They require a specific prior family (Beta for proportions, Gamma for rates, Normal for means).
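2. The closed-form updates covered here apply to intercept-only (single-parameter) settings; adding covariates generally breaks conjugacy, the Gaussian linear model with known variance being the main exception.
3. When the shape parameter of the likelihood is unknown (e.g., Gamma regression with unknown \(k\)), the conjugate update no longer applies.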

When conjugacy is lost or when you want **IID draws** with flexible priors, **glmbayes** uses envelope-based accept–reject sampling [@Nygren2006].
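
To give a feel for what accept–reject sampling does, the sketch below implements a generic textbook rejection sampler with a flat envelope.  It is not the envelope construction used by **glmbayes**; the target (the kernel of a Beta(2.5, 4.5) density) and the uniform proposal are assumptions chosen so that the result can be checked against a known mean.

```{r}
set.seed(42)

# Unnormalized target density on (0, 1): kernel of a Beta(2.5, 4.5)
kernel <- function(theta) theta^1.5 * (1 - theta)^3.5

# Flat envelope: the kernel is maximized at its mode,
# (2.5 - 1) / (2.5 + 4.5 - 2) = 0.3
bound <- kernel(0.3)

# Propose from Uniform(0, 1); accept with probability kernel / bound
n_prop    <- 50000
proposals <- runif(n_prop)
accepted  <- proposals[runif(n_prop) < kernel(proposals) / bound]

# Accepted draws are independent samples from the target
c(acceptance_rate = length(accepted) / n_prop,
  sample_mean     = mean(accepted),
  exact_mean      = 2.5 / (2.5 + 4.5))
```

Unlike Markov chain methods, every accepted draw is independent of the others, which is what "IID draws" refers to above.  The cost is the rejected proposals, so the quality of the envelope determines efficiency.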

---

## See also

- **Chapter-02-S02** — Normal–Normal conjugacy for one mean.
- **Chapter-02-S03** — Beta–Binomial conjugacy for one proportion.
- **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate.
- **Chapter-02-S05** — Gamma–Gamma conjugacy for a Gamma response rate.
- **Chapter 01** — first **`glmb()`** fits at regression scale.
- **Chapter 03** — **`lmb()`** for multivariate Gaussian linear models.