--- title: "Chapter 02-S03: Beta–Binomial Conjugacy for One Proportion" author: "Kjell Nygren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: REFERENCES.bib reference-section-title: References vignette: > %\VignetteIndexEntry{Chapter 02-S03: Beta–Binomial Conjugacy for One Proportion} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(glmbayes) if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules) ``` --- ## 1. The model ### 1.1 Likelihood Suppose we observe \(n\) independent binary trials — each a success (1) or failure (0) — with a common success probability \(\theta \in (0,1)\). The total number of successes \(y = \sum_i y_i\) follows: \[ y \mid \theta \;\sim\; \mathrm{Binomial}(n,\, \theta). \] The likelihood is \[ p(y \mid \theta) \;=\; \binom{n}{y}\,\theta^y\,(1-\theta)^{n-y}, \] and everything relevant for updating our beliefs about \(\theta\) is captured by the two sufficient statistics \(y\) (successes) and \(n - y\) (failures). ### 1.2 Prior The parameter \(\theta\) is a probability, so its support is \([0,1]\). The **Beta distribution** is the natural conjugate prior: \[ p(\theta) \;=\; \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad \theta \in [0,1], \] with shape parameters \(a > 0\) and \(b > 0\). Its mean is \(a/(a+b)\) and its variance is \(ab/[(a+b)^2(a+b+1)]\). **Interpreting the parameters.** Think of \(a - 1\) as a "prior number of successes" and \(b - 1\) as a "prior number of failures" accumulated before the current study. The total "prior sample size" is \(a + b - 2\) and the prior mean is \(a/(a+b)\). | Prior belief | Parameter choice | |---|---| | No information — completely flat | \(a = b = 1\) (uniform on \([0,1]\)) | | Mild belief near 0.5 | \(a = b = 3\) | | Moderate belief near 0.4 | \(a = 4, b = 6\) (mean = 0.4) | | Strong belief near 0.2 | \(a = 4, b = 16\) (mean = 0.2) | ### 1.3 Posterior derivation Multiply the likelihood kernel \(\theta^y(1-\theta)^{n-y}\) by the prior kernel \(\theta^{a-1}(1-\theta)^{b-1}\): \[ p(\theta \mid y) \;\propto\; \theta^{(a+y)-1}(1-\theta)^{(b+n-y)-1}. \] This is the kernel of a Beta distribution, so the conjugate posterior is: \[ \boxed{\theta \mid y \;\sim\; \mathrm{Beta}(a + y,\; b + n - y).} \] The update rule is simple: - Add the **number of successes** \(y\) to the first shape parameter. - Add the **number of failures** \(n - y\) to the second shape parameter. ### 1.4 The posterior mean as a compromise \[ \mathbb{E}[\theta \mid y] \;=\; \frac{a + y}{a + b + n} \;=\; \frac{a+b}{a+b+n}\cdot\frac{a}{a+b} \;+\; \frac{n}{a+b+n}\cdot\frac{y}{n}. \] This is a weighted average of the prior mean \(a/(a+b)\) and the data frequency \(y/n\), with weights proportional to the "prior sample size" \(a+b\) and the actual sample size \(n\). When \(n\) is much larger than \(a + b\) the data dominate; when \(n\) is small the prior pulls the estimate toward its own mean. --- ## 2. `bayesrules` illustration The **`bayesrules`** package provides `plot_beta_binomial()` and `summarize_beta_binomial()` for quick visualization and tabular summaries of the Beta–Binomial update [@JohnsonLauEtAl2022]. **Scenario.** A clinical researcher believes, based on earlier trials, that about 40% of patients respond to a new analgesic. They encode this as a **Beta(4, 6)** prior (mean = 0.4, effective prior sample size = 10). In a new pilot study they observe \(y = 14\) responders out of \(n = 30\) patients. ```{r bb-data} a0 <- 4; b0 <- 6 ## Beta(4, 6) prior: mean = 0.4 y <- 14; n <- 30 ## data: 14/30 successes ## Analytic posterior post_a <- a0 + y ## 4 + 14 = 18 post_b <- b0 + (n - y) ## 6 + 16 = 22 ``` ```{r bb-bayesrules-plot, eval = requireNamespace("bayesrules", quietly = TRUE)} library(bayesrules) ## Overlay prior, likelihood (scaled), and posterior densities plot_beta_binomial( alpha = a0, beta = b0, y = y, n = n, prior = TRUE, likelihood = TRUE, posterior = TRUE ) ``` ```{r bb-bayesrules-summary, eval = requireNamespace("bayesrules", quietly = TRUE)} ## Tabular summary of prior, likelihood, and posterior summarize_beta_binomial(alpha = a0, beta = b0, y = y, n = n) ``` **Reading the output.** The posterior parameters are Beta(18, 22), giving a posterior mean of \(18/40 = 0.45\) — pulled from the prior mean (0.40) toward the data frequency (\(14/30 \approx 0.47\)) in proportion to their relative weights. The posterior is noticeably narrower than the prior, reflecting the information gained from 30 observations. **Analytic posterior.** ```{r bb-analytic} bb_analytic <- data.frame( Example = "Analgesic trial (BayesRules)", n = n, y = y, Posterior = sprintf("Beta(%d, %d)", post_a, post_b), Mean = post_a / (post_a + post_b), SD = sqrt(post_a * post_b / ((post_a + post_b)^2 * (post_a + post_b + 1))), check.names = FALSE ) knitr::kable(bb_analytic, digits = 4, caption = "Conjugate Beta--Binomial posterior") ``` --- ## 3. Real data: the Bechdel test The **Bechdel test** asks whether a film features at least two women who talk to each other about something other than a man. The `bayesrules::bechdel` dataset records the test result for `r if(requireNamespace("bayesrules", quietly=TRUE)) nrow(bayesrules::bechdel) else "1794"` films [@JohnsonLauEtAl2022]. **Research question.** What fraction of films pass the Bechdel test? **Prior.** Based on earlier analyses suggesting roughly 45% of films pass, we specify a **Beta(9, 11)** prior (mean = 0.45, effective sample size = 20 — moderately informative). ```{r bechdel-data, eval = requireNamespace("bayesrules", quietly = TRUE)} library(bayesrules) ## Binary outcome: 1 = PASS, 0 = FAIL pass <- as.integer(bechdel[["binary"]] == "PASS") n_bech <- length(pass) y_bech <- sum(pass) cat(sprintf("n = %d, y (PASS) = %d, proportion = %.3f\n", n_bech, y_bech, y_bech / n_bech)) ``` ```{r bechdel-posterior, eval = requireNamespace("bayesrules", quietly = TRUE)} a0_b <- 9; b0_b <- 11 ## Beta(9, 11) prior: mean = 0.45 post_a_b <- a0_b + y_bech post_b_b <- b0_b + (n_bech - y_bech) c(prior_mean = a0_b / (a0_b + b0_b), data_freq = round(y_bech / n_bech, 4), post_mean = round(post_a_b / (post_a_b + post_b_b), 4)) ## 95% credible interval round(qbeta(c(0.025, 0.975), post_a_b, post_b_b), 4) ``` With over 1700 films the data dominate the prior; the posterior mean is nearly identical to the observed proportion. ```{r bechdel-analytic, eval = requireNamespace("bayesrules", quietly = TRUE)} analytic_mean_b <- post_a_b / (post_a_b + post_b_b) analytic_sd_b <- sqrt(post_a_b * post_b_b / ((post_a_b + post_b_b)^2 * (post_a_b + post_b_b + 1))) bech_analytic <- data.frame( Dataset = "Bechdel test", n = n_bech, y = y_bech, Posterior = sprintf("Beta(%d, %d)", post_a_b, post_b_b), Mean = analytic_mean_b, SD = analytic_sd_b, check.names = FALSE ) knitr::kable(bech_analytic, digits = 4, caption = "Conjugate Beta--Binomial posterior") ``` ### 3.1 Fitting with **`glmb()`** using **`dBeta()`** The scalar Beta–Binomial model generalizes to **`glmb()`** with `family = binomial(link = "identity")` and the `dBeta()` prior. In this intercept-only setting the single regression coefficient **is** \(\theta\) directly on the probability scale. ```{r bechdel-glmb, eval = requireNamespace("bayesrules", quietly = TRUE)} df_bech <- data.frame(y = pass) bech_beta <- matrix(a0_b / (a0_b + b0_b), nrow = 1, ncol = 1, dimnames = list(NULL, "(Intercept)")) bech_pf <- dBeta(shape1 = a0_b, shape2 = b0_b, beta = bech_beta) set.seed(2026) fit_bech <- glmb( n = 20000, y ~ 1, data = df_bech, weights = rep(1L, n_bech), family = binomial(link = "identity"), pfamily = bech_pf ) print(fit_bech) ``` ```{r bechdel-compare, eval = requireNamespace("bayesrules", quietly = TRUE)} bech_compare <- data.frame( Dataset = "Bechdel test", Posterior = bech_analytic$Posterior, `Analytic Mean` = bech_analytic$Mean, `Analytic SD` = bech_analytic$SD, `glmb Post.Mean` = fit_bech$coef.means["(Intercept)"], `glmb Post.Sd` = sd(fit_bech$coefficients[, "(Intercept)", drop = TRUE]), check.names = FALSE ) knitr::kable(bech_compare, digits = 4, caption = "Analytic vs. glmb() posterior mean and SD") ``` The **`glmb Post.Mean`** and **`glmb Post.Sd`** columns should match the analytic **Mean** and **SD** to Monte Carlo error because `dBeta()` implements the exact conjugate update. --- ## 4. Connection to glmbayes This scalar prototype is the intuition behind **Binomial `glmb()`** (Chapters 7–9), where instead of a single \(\theta\) you model \(\mathrm{logit}(\theta_i) = x_i^\top \beta\) and place a multivariate Normal prior on the coefficient vector \(\beta\). The conjugate `dBeta()` path is specific to intercept-only models with the identity link; once covariates enter, the conjugacy is lost and the general envelope sampler takes over. --- ## See also - **Chapter-02-S01** — introduction to Bayesian inference and conjugacy. - **Chapter-02-S02** — Normal–Normal conjugacy for one mean. - **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate. - **Chapter-02-S05** — Gamma–Gamma conjugacy for a Gamma response rate. - **Chapters 07–09** — Binomial regression with **`glmb()`**.