Chapter 02-S01: Conjugate Models — Introduction and Overview


1. What is Bayesian inference, and why does it matter?

1.1 The big picture

In classical (frequentist) statistics, a parameter like a probability or a mean is treated as a fixed, unknown constant. You collect data, compute an estimate, and report a confidence interval — an interval that, in repeated sampling, would contain the true value a certain percentage of the time.

Bayesian inference takes a different philosophy: the parameter itself is treated as a random variable that has a probability distribution. Before you see any data, you express your uncertainty about the parameter as a prior distribution. After you observe data, you update that distribution using the information the data provides, arriving at a posterior distribution.

The posterior is the complete answer in a Bayesian analysis. It is a full probability distribution over all plausible values of the parameter, not just a point estimate or an interval. You can:

  • Report its mean or median as a point estimate.
  • Report a credible interval — an interval you are genuinely 95% confident contains the parameter, in the sense that the parameter falls in it with posterior probability 0.95.
  • Use it as a new prior if more data arrive later.
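The three uses above can be sketched in base R. This is a minimal illustration, assuming a hypothetical Beta(8, 4) posterior for a proportion and made-up follow-up data; the Beta update used in the last step is the standard Beta–Binomial rule, and none of the names below come from glmbayes.

```r
# Hypothetical posterior for a proportion: Beta(a_post, b_post).
a_post <- 8
b_post <- 4

# Point estimates from the posterior.
post_mean   <- a_post / (a_post + b_post)
post_median <- qbeta(0.5, a_post, b_post)

# Equal-tailed 95% credible interval.
cred_int <- qbeta(c(0.025, 0.975), a_post, b_post)

# Reuse the posterior as the prior for a new batch of data:
# 10 further trials with 6 successes (made-up numbers).
y_new <- 6
n_new <- 10
a_next <- a_post + y_new
b_next <- b_post + (n_new - y_new)

print(c(mean = post_mean, median = post_median))
print(cred_int)
print(c(a_next = a_next, b_next = b_next))
```

Because the updated distribution is again a Beta, the same three summaries can be recomputed after every new batch.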

1.2 Bayes’ theorem

Everything follows from Bayes’ theorem. Let \(\theta\) denote the unknown parameter and \(y\) the observed data. Then

\[ \underbrace{p(\theta \mid y)}_{\text{posterior}} \;=\; \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}} \;\times\; \overbrace{p(\theta)}^{\text{prior}}} {\underbrace{p(y)}_{\text{normalizing constant}}}. \]

In words:

| Term | Meaning |
| --- | --- |
| Prior \(p(\theta)\) | What you believe about \(\theta\) before seeing data |
| Likelihood \(p(y \mid \theta)\) | How probable the observed data \(y\) are if the parameter were \(\theta\) |
| Posterior \(p(\theta \mid y)\) | Your updated belief about \(\theta\) after seeing \(y\) |
| Normalizing constant \(p(y)\) | Ensures the posterior integrates to 1; often omitted, leaving a proportionality |

Because \(p(y)\) does not depend on \(\theta\), we can drop it and write the proportionality form you will see constantly:

\[ p(\theta \mid y) \;\propto\; p(y \mid \theta)\; p(\theta). \]

The normalizing constant is recovered afterwards by requiring the posterior to integrate to 1, that is, \(p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta\).
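As a rough numerical illustration of that last step, the sketch below normalizes a likelihood times prior by one-dimensional integration. The data (7 successes in 10 trials) and the Beta(2, 2) prior are assumptions chosen only for the example.

```r
# Made-up data: y successes in n Bernoulli trials.
y <- 7
n <- 10

# Unnormalized posterior: binomial likelihood times a Beta(2, 2) prior.
unnorm <- function(theta) dbinom(y, n, theta) * dbeta(theta, 2, 2)

# Normalizing constant p(y): integral of likelihood * prior over theta.
p_y <- integrate(unnorm, lower = 0, upper = 1)$value

# Dividing by p(y) gives a proper posterior density.
posterior <- function(theta) unnorm(theta) / p_y

# The posterior should integrate to 1 up to numerical error.
total <- integrate(posterior, lower = 0, upper = 1)$value
print(c(p_y = p_y, total = total))
```

With a single parameter this integral is easy; with many parameters it generally is not, which is why closed-form and simulation-based approaches matter.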

1.3 Why conjugacy is a gift

For most likelihood–prior combinations, the posterior has no simple closed form — you need numerical methods (like the sampling done by glmb()). But for a special set of conjugate prior–likelihood pairs, the posterior is in the same family as the prior: only the parameters change. This makes hand calculation possible and builds the geometric intuition you need before tackling regression.
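One way to check "same family" numerically is to divide likelihood times prior by the claimed closed-form posterior: if the claim is right, the ratio is the same constant, \(p(y)\), at every parameter value. The sketch below does this for a Gamma prior on a Poisson rate, using made-up counts and a hypothetical Gamma(2, 1) prior.

```r
# Made-up Poisson counts and a hypothetical Gamma(shape = 2, rate = 1) prior.
y <- c(3, 5, 4, 6, 2)
a <- 2
b <- 1

# Likelihood times prior, evaluated at each rate value in `lambda`.
unnorm <- function(lambda) {
  sapply(lambda, function(l) prod(dpois(y, l)) * dgamma(l, shape = a, rate = b))
}

# Standard conjugate update: Gamma(a + sum(y), b + n).
a_post <- a + sum(y)
b_post <- b + length(y)

# Ratio of (likelihood * prior) to the closed-form posterior on a grid.
grid  <- seq(0.5, 8, by = 0.5)
ratio <- unnorm(grid) / dgamma(grid, shape = a_post, rate = b_post)

# Relative spread of the ratio; close to zero when the ratio is constant.
rel_spread <- diff(range(ratio)) / mean(ratio)
print(rel_spread)
```

A spread near machine precision indicates that only the two Gamma parameters changed, which is exactly what conjugacy promises.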

The following sub-vignettes work through four conjugate pairs in detail:

| Sub-vignette | Conjugate pair | Unknown parameter |
| --- | --- | --- |
| Chapter-02-S02 | Normal–Normal | Single mean \(\mu\) with known variance |
| Chapter-02-S03 | Beta–Binomial | Single proportion \(\theta\) |
| Chapter-02-S04 | Gamma–Poisson | Single count rate \(\lambda\) |
| Chapter-02-S05 | Gamma–Gamma | Rate \(\beta\) of a Gamma response with known shape |

Each sub-vignette explains the setup intuitively, derives the posterior update rule, works a numerical example in R, and shows how the idea generalizes to the regression models in glmbayes.
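As a preview, the standard textbook update rules for the four pairs can be written as small R functions. This is a sketch under stated assumptions: the function and argument names are hypothetical (they are not part of glmbayes), Gamma distributions use the shape and rate parameterization, and the sub-vignettes may use different notation.

```r
# Normal–Normal: prior mu ~ N(m0, s0^2), data y_i ~ N(mu, sigma^2), sigma known.
update_normal_normal <- function(m0, s0, y, sigma) {
  prec <- 1 / s0^2 + length(y) / sigma^2          # posterior precision
  list(mean = (m0 / s0^2 + sum(y) / sigma^2) / prec,
       sd   = sqrt(1 / prec))
}

# Beta–Binomial: prior theta ~ Beta(a, b), `successes` out of `trials`.
update_beta_binomial <- function(a, b, successes, trials) {
  list(shape1 = a + successes, shape2 = b + trials - successes)
}

# Gamma–Poisson: prior lambda ~ Gamma(a, b), counts y_i ~ Poisson(lambda).
update_gamma_poisson <- function(a, b, y) {
  list(shape = a + sum(y), rate = b + length(y))
}

# Gamma–Gamma: prior beta ~ Gamma(a, b), data y_i ~ Gamma(shape = k, rate = beta).
update_gamma_gamma <- function(a, b, y, k) {
  list(shape = a + length(y) * k, rate = b + sum(y))
}

# Illustrative calls with made-up inputs.
nn <- update_normal_normal(m0 = 0, s0 = 1, y = c(1, 1, 1, 1), sigma = 1)
bb <- update_beta_binomial(a = 1, b = 1, successes = 7, trials = 10)
gp <- update_gamma_poisson(a = 2, b = 1, y = c(3, 5, 4))
gg <- update_gamma_gamma(a = 2, b = 1, y = c(1, 2), k = 3)
str(list(normal = nn, beta = bb, poisson = gp, gamma = gg))
```

In every case the update is simple arithmetic on the prior parameters and a few data summaries.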


2. Three principles that run through all conjugate models

Before diving in, here are three ideas that appear in every single conjugate example. Recognizing them will help you build intuition quickly.

2.1 The posterior is always a compromise

In each of the conjugate models covered here, the posterior mean is a weighted average of the prior mean and the data-based estimate:

\[ \mathbb{E}[\theta \mid y] \;=\; w_{\text{prior}} \cdot \underbrace{\mu_{\text{prior}}}_{\text{prior mean}} \;+\; w_{\text{data}} \cdot \underbrace{\hat\theta_{\text{data}}}_{\text{data estimate}}, \]

where the weights are nonnegative, sum to 1, and depend on the relative amount of information in the prior versus the data. More data shifts weight toward the data; a stronger (tighter) prior shifts it toward \(\mu_{\text{prior}}\).

This shrinkage is one of Bayesian analysis’s most practically important features — it prevents dramatic overreaction to small samples and reduces overfitting in regression.
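The sketch below makes the compromise concrete for a normal mean with known variance, where the data weight is \((n/\sigma^2) / (n/\sigma^2 + 1/\tau_0^2)\) with \(\tau_0\) the prior standard deviation. All numbers are assumptions for illustration; the sample mean is held fixed while \(n\) grows so that only the weights change.

```r
# Hypothetical prior N(0, 1^2) and known data standard deviation sigma = 2.
prior_mean <- 0
prior_sd   <- 1
sigma      <- 2

# Hypothetical sample mean, held fixed while the sample size grows.
ybar <- 3
n    <- c(1, 5, 25, 100)

# Weights are proportional to precision (inverse variance).
w_data  <- (n / sigma^2) / (n / sigma^2 + 1 / prior_sd^2)
w_prior <- 1 - w_data

# Posterior mean as a weighted average of prior mean and sample mean.
post_mean <- w_prior * prior_mean + w_data * ybar

print(data.frame(n, w_prior, w_data, post_mean))
```

With one observation the posterior mean moves only a fifth of the way from the prior mean to the sample mean; with 100 observations it sits almost on top of the sample mean.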

2.2 Credible intervals mean what you think they mean

A 95% credible interval \([L, U]\) satisfies \(P(L \leq \theta \leq U \mid y) = 0.95\). Many people assume a frequentist confidence interval makes this statement, but it does not. A frequentist 95% CI says that in 95% of repeated experiments the interval we compute would contain the true \(\theta\); it says nothing about this particular interval. The Bayesian credible interval is a direct probability statement about where \(\theta\) is, given the data and prior you specified.
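The defining property can be checked directly. The sketch below assumes a hypothetical Gamma(22, 6) posterior for a rate, builds an equal-tailed 95% interval from its quantiles, and confirms the posterior probability inside the interval both exactly and by simulation.

```r
# Hypothetical Gamma posterior for a rate (shape and rate parameterization).
a_post <- 22
b_post <- 6

# Equal-tailed 95% credible interval from the posterior quantiles.
ci <- qgamma(c(0.025, 0.975), shape = a_post, rate = b_post)

# Exact posterior probability that the parameter lies in [L, U].
prob_inside <- diff(pgamma(ci, shape = a_post, rate = b_post))

# Monte Carlo check using draws from the same posterior.
set.seed(1)
draws   <- rgamma(1e5, shape = a_post, rate = b_post)
mc_prob <- mean(draws >= ci[1] & draws <= ci[2])

print(ci)
print(c(exact = prob_inside, monte_carlo = mc_prob))
```

The equal-tailed interval is only one choice; any interval with posterior probability 0.95 is a valid 95% credible interval.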

2.3 The conjugate prototype and the regression generalization

Each scalar conjugate model is the intuition behind a fully general regression model in glmbayes. In the regression version, the single unknown parameter \(\theta\) is replaced by a coefficient vector \(\beta \in \mathbb{R}^p\), and the analytic update formula becomes a simulation-based draw implemented by glmb(). The logic — prior × likelihood → posterior — is identical; only the mechanics differ.


3. Putting it all together: from simple models to regression

Once you have worked through the four sub-vignettes, the roadmap below shows how each conjugate prototype maps to a more general model in the package.

3.1 Roadmap

| Conjugate prototype | Where it generalizes in glmbayes |
| --- | --- |
| Beta–Binomial | Binomial glmb() with normal priors on regression coefficients (Chapters 7–9) |
| Gamma–Poisson | Conjugate \(\Gamma\) rate prior + glmb() (Chapter-02-S04, Bayes Rules! main example); general Poisson regression (Chapter 10, Appendix A) |
| Gamma–Gamma | glmb() with Gamma(link="identity") and dGamma(Inv_Dispersion=FALSE, lik_shape=k) (Chapter-02-S05); non-conjugate Gamma regression with Gamma(log) (Chapter 10) |
| Normal–Normal | lmb() for multivariate Gaussian regression (Chapter 3) |

3.2 What conjugacy gives you — and what it does not

Conjugate models are exact when the model assumptions hold. Their limitations:

  1. They require a specific prior family (Beta for proportions, Gamma for rates, Normal for means).
  2. They are restricted to intercept-only (single-parameter) settings; adding covariates breaks conjugacy.
  3. When the likelihood shape is unknown (e.g., Gamma regression with unknown \(k\)), the conjugate update no longer applies.

When conjugacy is lost or when you want IID draws with flexible priors, glmbayes uses envelope-based accept–reject sampling (Nygren and Nygren 2006).
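To give a feel for accept–reject sampling in general, here is a minimal base R sketch. It is not the likelihood-subgradient envelope of Nygren and Nygren (2006) and not the glmbayes implementation: it assumes a simple target (the kernel of a Beta(8, 4) density), a Uniform(0, 1) proposal, and a constant envelope equal to the kernel's maximum.

```r
set.seed(42)

# Unnormalized target: kernel of a Beta(8, 4) density on (0, 1).
kernel <- function(theta) theta^7 * (1 - theta)^3

# Constant envelope: the kernel is maximized at its mode, 7 / 10.
M <- kernel(7 / 10)

# Propose from Uniform(0, 1); accept each proposal with probability kernel / M.
n_prop   <- 20000
proposal <- runif(n_prop)
accept   <- runif(n_prop) <= kernel(proposal) / M

# Accepted proposals are independent draws from the normalized target.
draws <- proposal[accept]

print(c(accepted = length(draws), acceptance_rate = mean(accept)))
print(c(sample_mean = mean(draws), exact_mean = 8 / 12))
```

The accepted draws are independent and identically distributed, so no burn-in or thinning is needed; the price is the share of proposals that get rejected, which a tighter envelope reduces.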


See also

  • Chapter-02-S02 — Normal–Normal conjugacy for one mean.
  • Chapter-02-S03 — Beta–Binomial conjugacy for one proportion.
  • Chapter-02-S04 — Gamma–Poisson conjugacy for one count rate.
  • Chapter-02-S05 — Gamma–Gamma conjugacy for a Gamma response rate.
  • Chapter 01 — first glmb() fits at regression scale.
  • Chapter 03 — lmb() for multivariate Gaussian linear models.

References

Nygren, K. N., and L. M. Nygren. 2006. “Likelihood Subgradient Densities.” Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.