Priors & Posteriors
What is Bayesian inference, really?
Imagine you’re trying to estimate something — say, the average effect of a tutoring program on test scores. In frequentist statistics, you collect data, compute a point estimate, and that’s your answer.
In Bayesian statistics, you do something different:
- Start with a prior — what you believed before seeing data. Maybe from past studies, expert opinion, or just “I have no idea” (a flat prior).
- Observe data — the likelihood tells you how probable the data is for each possible value of the parameter.
- Combine them — Bayes’ theorem multiplies the prior by the likelihood to give you the posterior: your updated belief after seeing the data.
\[\underbrace{P(\theta \mid \text{data})}_{\text{posterior}} \propto \underbrace{P(\text{data} \mid \theta)}_{\text{likelihood}} \times \underbrace{P(\theta)}_{\text{prior}}\]
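A few lines of R show the mechanics before the interactive version. This is a minimal grid-approximation sketch (the grid, prior, and data values here are invented for illustration): evaluate the prior and the likelihood on a grid of candidate parameter values, multiply pointwise, and normalize.

```r
# Bayes' theorem by grid approximation: estimate a normal mean (known sd = 2)
theta <- seq(-5, 5, length.out = 1001)   # candidate values of the parameter
prior <- dnorm(theta, mean = 0, sd = 3)  # prior: centered at 0, fairly vague
y     <- c(1.8, 2.4, 1.1, 2.9, 2.2)      # observed data (invented for illustration)

# Likelihood of the whole sample at each candidate theta
lik <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = 2)))

# Posterior: prior times likelihood, normalized to sum to 1
post <- prior * lik / sum(prior * lik)

theta[which.max(post)]  # posterior mode sits between the prior mean and the data mean
```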
The simulation below makes this tangible. You’re estimating the true mean of a normal distribution. Set a prior, generate data, and watch the posterior form.
#| standalone: true
#| viewerHeight: 600
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("true_mu", HTML("True μ (unknown to you):"),
min = -5, max = 5, value = 2, step = 0.5),
sliderInput("prior_mu", "Prior mean:",
min = -5, max = 5, value = 0, step = 0.5),
sliderInput("prior_sd", "Prior SD (certainty):",
min = 0.5, max = 10, value = 3, step = 0.5),
sliderInput("n", "Sample size (data):",
min = 1, max = 200, value = 5, step = 1),
sliderInput("sigma", HTML("Data noise (σ):"),
min = 0.5, max = 5, value = 2, step = 0.5),
actionButton("go", "New draw", class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
plotOutput("posterior_plot", height = "450px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
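# Taking a dependency on the button: each click (and any slider change) re-draws the data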
input$go
true_mu <- input$true_mu
prior_mu <- input$prior_mu
prior_sd <- input$prior_sd
n <- input$n
sigma <- input$sigma
# Generate data
y <- rnorm(n, mean = true_mu, sd = sigma)
y_bar <- mean(y)
# Posterior (conjugate normal-normal)
prior_prec <- 1 / prior_sd^2
data_prec <- n / sigma^2
post_prec <- prior_prec + data_prec
post_sd <- 1 / sqrt(post_prec)
post_mu <- (prior_prec * prior_mu + data_prec * y_bar) / post_prec
# Shrinkage weight on prior
w_prior <- prior_prec / post_prec
list(true_mu = true_mu, prior_mu = prior_mu, prior_sd = prior_sd,
y_bar = y_bar, sigma = sigma, n = n,
post_mu = post_mu, post_sd = post_sd, w_prior = w_prior)
})
output$posterior_plot <- renderPlot({
d <- dat()
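# x-axis range wide enough to show the prior, the posterior, and the true mean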
xmin <- min(d$prior_mu - 3.5 * d$prior_sd, d$post_mu - 4 * d$post_sd, d$true_mu - 2)
xmax <- max(d$prior_mu + 3.5 * d$prior_sd, d$post_mu + 4 * d$post_sd, d$true_mu + 2)
x <- seq(xmin, xmax, length.out = 500)
y_prior <- dnorm(x, d$prior_mu, d$prior_sd)
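# Likelihood as a function of mu: centered at y-bar with standard error sigma/sqrt(n)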
y_like <- dnorm(x, d$y_bar, d$sigma / sqrt(d$n))
y_post <- dnorm(x, d$post_mu, d$post_sd)
ylim <- c(0, max(y_prior, y_like, y_post) * 1.15)
par(mar = c(4.5, 4.5, 3, 1))
plot(x, y_prior, type = "l", lwd = 2.5, col = "#e74c3c",
xlab = expression(mu), ylab = "Density",
main = "Prior + Likelihood = Posterior",
ylim = ylim)
lines(x, y_like, lwd = 2.5, col = "#3498db")
lines(x, y_post, lwd = 3, col = "#27ae60")
# Shade posterior
polygon(c(x, rev(x)),
c(y_post, rep(0, length(x))),
col = adjustcolor("#27ae60", 0.2), border = NA)
# True value
abline(v = d$true_mu, lty = 2, lwd = 2, col = "#2c3e50")
legend("topright", bty = "n", cex = 0.9,
legend = c("Prior (your belief before data)",
"Likelihood (what the data says)",
"Posterior (updated belief)",
expression("True " * mu)),
col = c("#e74c3c", "#3498db", "#27ae60", "#2c3e50"),
lwd = c(2.5, 2.5, 3, 2),
lty = c(1, 1, 1, 2))
})
output$results <- renderUI({
d <- dat()
tags$div(class = "stats-box",
HTML(paste0(
"<b>Prior mean:</b> ", d$prior_mu, "<br>",
"<b>Data mean:</b> ", round(d$y_bar, 3), "<br>",
"<b>Posterior mean:</b> ", round(d$post_mu, 3), "<br>",
"<b>Posterior SD:</b> ", round(d$post_sd, 3), "<br>",
"<hr style='margin:8px 0'>",
"<b>Weight on prior:</b> ", round(d$w_prior * 100, 1), "%<br>",
"<b>Weight on data:</b> ", round((1 - d$w_prior) * 100, 1), "%<br>",
"<small>Posterior = weighted average of prior & data</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- n = 1: the posterior is mostly the prior (red). You barely have data.
- Slide n to 100: the posterior (green) collapses onto the data mean (blue). Data overwhelms the prior; with enough data, the prior doesn’t matter (the sketch after this list makes this concrete).
- Set prior SD = 0.5 (strong prior) with n = 5: the posterior is pulled toward the prior. This is shrinkage — the prior is “shrinking” your estimate away from the data and toward your prior belief.
- Set prior SD = 10 (vague prior): the posterior tracks the data almost exactly. A sufficiently vague prior says “I have no opinion” and lets the data speak.
- Watch the weight on prior in the sidebar — it shows exactly how much the posterior is a compromise between prior and data.
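You can compute that weight directly, outside the app, from the same conjugate formulas. A minimal sketch, using the app’s default prior SD of 3 and data noise of 2:

```r
# Weight the posterior puts on the prior as the sample size grows
prior_sd <- 3
sigma    <- 2
n        <- c(1, 5, 25, 100)

prior_prec <- 1 / prior_sd^2      # precision of the prior
data_prec  <- n / sigma^2         # precision contributed by the data
w_prior    <- prior_prec / (prior_prec + data_prec)

round(w_prior, 3)
#> 0.308 0.082 0.017 0.004  -- the prior's influence vanishes as n grows
```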
Shrinkage: the Bayesian superpower
Look at the “weight on prior” number in the sidebar. The posterior mean is literally a weighted average:
\[\mu_{\text{post}} = w \cdot \mu_{\text{prior}} + (1 - w) \cdot \bar{y}, \qquad w = \frac{1/\sigma_{\text{prior}}^2}{1/\sigma_{\text{prior}}^2 + n/\sigma^2}\]
where \(w\) is the prior’s share of the total precision: the more confident your prior is relative to how much data you have, the closer \(w\) is to 1. This is exactly the `w_prior` quantity the app reports.
This is shrinkage: the posterior “shrinks” the data estimate toward the prior. When is this useful?
- Small samples: noisy data gets regularized toward a sensible default.
- Many groups: estimating batting averages for 500 baseball players? Shrink extreme estimates toward the league average. A player who went 3-for-3 on opening day probably isn’t a 1.000 hitter.
- Hierarchical models: borrow strength across groups by shrinking toward a common mean.
Shrinkage isn’t bias — it’s a bias-variance tradeoff. You add a little bias but reduce variance a lot, often improving overall accuracy.
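A quick simulation makes the tradeoff concrete. This is a sketch under invented assumptions (500 groups whose true means come from a common N(0, 1) distribution, each observed five times with noise SD 2): shrinking each group’s raw mean toward the grand mean with the conjugate weight substantially reduces the mean squared error.

```r
set.seed(42)
G <- 500; n <- 5; sigma <- 2            # groups, obs per group, noise SD (invented)
tau <- 1                                # spread of the true group means

true_mu <- rnorm(G, mean = 0, sd = tau)              # each group's true mean
y_bar   <- true_mu + rnorm(G, sd = sigma / sqrt(n))  # raw estimate per group

# Conjugate weight on the grand mean, treating N(0, tau^2) as the prior
w <- (1 / tau^2) / (1 / tau^2 + n / sigma^2)
shrunk <- w * mean(y_bar) + (1 - w) * y_bar

mean((y_bar  - true_mu)^2)  # MSE of raw means (around sigma^2 / n = 0.8)
mean((shrunk - true_mu)^2)  # MSE of shrunken estimates: substantially smaller
```

Each individual shrunken estimate is biased toward the grand mean, yet across the 500 groups the shrunken estimates are closer to the truth on average. That is the tradeoff in action.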