In classical (frequentist) statistics, a parameter like a probability or a mean is treated as a fixed, unknown constant. You collect data, compute an estimate, and report a confidence interval — an interval that, in repeated sampling, would contain the true value a certain percentage of the time.
Bayesian inference takes a different view: the parameter itself is treated as a random variable that has a probability distribution. Before you see any data, you express your uncertainty about the parameter as a prior distribution. After you observe data, you update that distribution using the information the data provides, arriving at a posterior distribution.
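The posterior is the complete answer in a Bayesian analysis. It is a full probability distribution over all plausible values of the parameter, not just a point estimate or an interval. From it you can compute point estimates, credible intervals, and the probability of any statement about the parameter.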
Everything follows from Bayes’ theorem. Let \(\theta\) denote the unknown parameter and \(y\) the observed data. Then
\[ \underbrace{p(\theta \mid y)}_{\text{posterior}} \;=\; \frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}} \;\times\; \overbrace{p(\theta)}^{\text{prior}}} {\underbrace{p(y)}_{\text{normalizing constant}}}. \]
In words:
| Term | Meaning |
|---|---|
| Prior \(p(\theta)\) | What you believe about \(\theta\) before seeing data |
| Likelihood \(p(y \mid \theta)\) | How probable the observed data \(y\) are if the parameter were \(\theta\) |
| Posterior \(p(\theta \mid y)\) | Your updated belief about \(\theta\) after seeing \(y\) |
| Normalizing constant \(p(y)\) | Ensures the posterior integrates to 1; often omitted by writing the posterior as a proportionality |
Because \(p(y)\) does not depend on \(\theta\), we can drop it and write the proportionality form you will see constantly:
\[ p(\theta \mid y) \;\propto\; p(y \mid \theta)\; p(\theta). \]
The normalizing constant can be recovered afterward from the requirement that the posterior integrate to 1, that is, \(p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta\).
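To make the role of \(p(y)\) concrete, here is a minimal base-R sketch that normalizes a prior × likelihood product numerically. The data (7 successes in 10 trials) and the Beta(2, 2) prior are illustrative assumptions, not values taken from this chapter.

```r
# Illustrative data (assumed): y successes in n Bernoulli trials
y <- 7
n <- 10

# Prior and likelihood as functions of theta
prior      <- function(theta) dbeta(theta, 2, 2)              # assumed Beta(2, 2) prior
likelihood <- function(theta) dbinom(y, size = n, prob = theta)

# Unnormalized posterior: likelihood x prior
unnorm <- function(theta) likelihood(theta) * prior(theta)

# Normalizing constant p(y): integrate likelihood x prior over theta
p_y <- integrate(unnorm, lower = 0, upper = 1)$value

# Dividing by p(y) gives a proper density
posterior <- function(theta) unnorm(theta) / p_y

# Sanity check: the posterior should integrate to (approximately) 1
total <- integrate(posterior, lower = 0, upper = 1)$value
```

Dividing by `p_y` changes only the height of the curve, not its shape, which is why the proportionality form is usually enough to identify the posterior.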
For most likelihood–prior combinations, the posterior has no simple
closed form — you need numerical methods (like the sampling done by
glmb()). But for a special set of
conjugate prior–likelihood pairs, the posterior is in
the same family as the prior: only the parameters
change. This makes hand calculation possible and builds the geometric
intuition you need before tackling regression.
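As a short illustration of what "same family, only the parameters change" means, consider a sketch with a Beta prior and a Binomial likelihood. The symbols \(a\), \(b\) (prior parameters), \(n\) (number of trials), and \(y\) (number of successes) are introduced here for illustration only. Dropping every factor that does not involve \(\theta\),
\[ p(\theta) \;\propto\; \theta^{a-1}(1-\theta)^{b-1}, \qquad p(y \mid \theta) \;\propto\; \theta^{y}(1-\theta)^{n-y}, \]
so the proportionality form gives
\[ p(\theta \mid y) \;\propto\; \theta^{a+y-1}(1-\theta)^{b+n-y-1}, \]
which is the kernel of a \(\text{Beta}(a+y,\; b+n-y)\) distribution. The update amounts to adding the successes to \(a\) and the failures to \(b\).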
The following sub-vignettes work through four conjugate pairs in detail:
| Sub-vignette | Conjugate pair | Unknown parameter |
|---|---|---|
| Chapter-02-S02 | Normal–Normal | Single mean \(\mu\) with known variance |
| Chapter-02-S03 | Beta–Binomial | Single proportion \(\theta\) |
| Chapter-02-S04 | Gamma–Poisson | Single count rate \(\lambda\) |
| Chapter-02-S05 | Gamma–Gamma | Rate \(\beta\) of a Gamma response with known shape |
Each sub-vignette explains the setup intuitively, derives the posterior update rule, works a numerical example in R, and shows how the idea generalizes to the regression models in glmbayes.
Before diving in, here are three ideas that appear in every single conjugate example. Recognizing them will help you build intuition quickly.
In every conjugate model the posterior mean is a weighted average of the prior mean and the data-based estimate:
\[ \mathbb{E}[\theta \mid y] \;=\; w_{\text{prior}} \cdot \underbrace{\mu_{\text{prior}}}_{\text{prior mean}} \;+\; w_{\text{data}} \cdot \underbrace{\hat\theta_{\text{data}}}_{\text{data estimate}}, \]
where the weights depend on the relative amount of information in the prior versus the data. More data shifts weight toward the data; a stronger (tighter) prior shifts it toward \(\mu_{\text{prior}}\).
This shrinkage is one of Bayesian analysis’s most practically important features — it prevents dramatic overreaction to small samples and reduces overfitting in regression.
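The weighted-average identity can be checked numerically. The sketch below assumes a Beta(a, b) prior with Binomial data, for which the weights are \(w_{\text{prior}} = (a+b)/(a+b+n)\) and \(w_{\text{data}} = n/(a+b+n)\); the specific numbers are illustrative assumptions.

```r
# Assumed Beta(a, b) prior and Binomial data (illustrative values)
a <- 2; b <- 2
y <- 7; n <- 10

prior_mean    <- a / (a + b)   # mean of the Beta(a, b) prior
data_estimate <- y / n         # sample proportion

# Weights: the prior counts as (a + b) pseudo-observations, the data as n
w_prior <- (a + b) / (a + b + n)
w_data  <- n / (a + b + n)

# Posterior mean computed two ways
post_mean_weighted <- w_prior * prior_mean + w_data * data_estimate
post_mean_direct   <- (a + y) / (a + b + n)   # mean of Beta(a + y, b + n - y)
```

Both expressions give 9/14, and the two weights sum to 1. Increasing `n` while holding the sample proportion fixed moves the posterior mean toward `data_estimate`; increasing `a + b` moves it toward `prior_mean`.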
A 95% credible interval \([L, U]\) satisfies \(P(L \leq \theta \leq U \mid y) = 0.95\). Most people think a frequentist confidence interval makes this statement, but it does not. A frequentist 95% CI says that in 95% of repeated experiments, the interval we compute would contain the true \(\theta\); it says nothing about this particular interval. The Bayesian credible interval is a direct probability statement about where \(\theta\) is, given the data and prior you specified.
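When the posterior has a known form, a credible interval comes straight from its quantile function. The sketch below assumes a Beta(9, 5) posterior (an illustrative choice) and computes an equal-tailed 95% interval, which is one common convention among several.

```r
# Assumed posterior: Beta(9, 5) (illustrative, not derived in this section)
a_post <- 9
b_post <- 5

# Equal-tailed 95% credible interval: 2.5% and 97.5% posterior quantiles
ci <- qbeta(c(0.025, 0.975), a_post, b_post)

# Posterior probability that theta lies inside [L, U]
coverage <- pbeta(ci[2], a_post, b_post) - pbeta(ci[1], a_post, b_post)
```

Here `coverage` equals 0.95 by construction, which is exactly the statement \(P(L \leq \theta \leq U \mid y) = 0.95\).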
Each scalar conjugate model is the intuition behind a fully general
regression model in glmbayes. In the regression
version, the single unknown parameter \(\theta\) is replaced by a coefficient
vector \(\beta \in \mathbb{R}^p\), and
the analytic update formula becomes a simulation-based draw implemented
by glmb(). The logic — prior × likelihood → posterior — is
identical; only the mechanics differ.
After working through the four sub-vignettes, the following table summarizes how each conjugate prototype maps to a more general model in the package.
| Conjugate prototype | Where it generalizes in glmbayes |
|---|---|
| Beta–Binomial | Binomial glmb() with normal priors on regression coefficients (Chapters 7–9) |
| Gamma–Poisson | Conjugate \(\Gamma\) rate prior + glmb() (Chapter-02-S04, Bayes Rules! main example); general Poisson regression (Chapter 10, Appendix A) |
| Gamma–Gamma | glmb() with Gamma(link="identity") and dGamma(Inv_Dispersion=FALSE, lik_shape=k) (Chapter-02-S05); non-conjugate Gamma regression with Gamma(log) (Chapter 10) |
| Normal–Normal | lmb() for multivariate Gaussian regression (Chapter 3) |
Conjugate models are exact when the model assumptions hold, but they tie you to a specific prior family and, in the forms covered here, to a single scalar parameter.
When conjugacy is lost or when you want IID draws with flexible priors, glmbayes uses envelope-based accept–reject sampling (Nygren and Nygren 2006).
glmb() fits at regression scale; lmb() does so for multivariate Gaussian linear models.