Bayesian Updating: A Gentle Introduction
The Bayesian recipe
Bayesian inference has exactly one formula:
\[\text{Posterior} \propto \text{Prior} \times \text{Likelihood}\]
That’s it. Everything else is details.
- Prior: what you believed before seeing data
- Likelihood: how probable the data is under each possible parameter value
- Posterior: your updated belief after seeing data
The key insight: your belief gets updated as data arrives. Start with a prior (maybe vague, maybe informed), observe data, and the posterior combines both. With enough data, the prior gets overwhelmed — the data speaks for itself.
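Here is what that recipe looks like numerically, as a minimal base R sketch using a grid of candidate parameter values. The Beta(2, 2) prior and the 7-heads-in-10-flips data are illustrative choices, not anything from the simulations below:

```r
theta <- seq(0, 1, length.out = 1000)            # grid of candidate coin biases
dx    <- theta[2] - theta[1]                     # grid spacing
prior <- dbeta(theta, 2, 2)                      # prior belief about theta (illustrative)
lik   <- dbinom(7, size = 10, prob = theta)      # likelihood of 7 heads in 10 flips at each theta
post  <- prior * lik                             # posterior, up to a constant
post  <- post / sum(post * dx)                   # normalise so it integrates to 1
sum(theta * post * dx)                           # posterior mean; the conjugate answer is
                                                 # Beta(2 + 7, 2 + 3), mean 9/14 = 0.643
```

The grid approximation is there only to make "multiply, then normalise" visible; for the Beta–Binomial case the exact posterior is available in closed form, as the simulations below exploit.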
How is this different from frequentist inference?
| | Frequentist | Bayesian |
|---|---|---|
| Parameters | Fixed but unknown | Random variables with distributions |
| Probability | Long-run frequency | Degree of belief |
| Can say | “If H₀ were true, data at least this extreme would occur with probability p = 0.03” | “Given the data, P(parameter > 0) = 0.97” |
| CI / CrI | “95% of intervals from this procedure contain \(\theta\)” | “There’s a 95% probability \(\theta\) is in this interval” |
The Bayesian interpretation is often what people think a confidence interval means — but it requires a prior.
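To make the last two rows of the table concrete, here is a small base R comparison on made-up data (8 heads in 10 flips; the uniform Beta(1, 1) prior is an assumption). The confidence interval comes from `binom.test()`; the credible interval is just the 2.5% and 97.5% quantiles of the Beta posterior. Only the second interval supports a statement like "there is a 95% probability that \(\theta\) lies in this range."

```r
# Frequentist 95% confidence interval for a proportion (8 heads in 10 flips):
binom.test(8, 10)$conf.int

# Bayesian 95% credible interval under a uniform Beta(1, 1) prior:
# the posterior is Beta(1 + 8, 1 + 2); take its 2.5% and 97.5% quantiles
qbeta(c(0.025, 0.975), 1 + 8, 1 + 2)
```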
Simulation 1: Coin flipping — watch the posterior update
Start with a prior belief about a coin’s bias. Flip the coin one at a time and watch the posterior distribution update in real time. With enough flips, the posterior concentrates around the true bias — regardless of the prior.
#| standalone: true
#| viewerHeight: 680
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("true_p", "True coin bias:",
min = 0.1, max = 0.9, value = 0.7, step = 0.05),
selectInput("prior", "Prior belief:",
choices = c("Uniform (a=1, b=1)" = "uniform",
"Skeptical of bias (a=5, b=5)" = "skeptical",
"Believe heads-biased (a=8, b=2)" = "heads",
"Believe tails-biased (a=2, b=8)" = "tails",
"Very strong fair (a=50, b=50)" = "strong_fair")),
sliderInput("n_flips", "Number of flips:",
min = 1, max = 200, value = 10, step = 1),
actionButton("go", "Flip coins",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
fluidRow(
column(12, plotOutput("posterior_plot", height = "400px"))
)
)
)
)
server <- function(input, output, session) {
get_prior <- function(prior) {
switch(prior,
"uniform" = c(a = 1, b = 1),
"skeptical" = c(a = 5, b = 5),
"heads" = c(a = 8, b = 2),
"tails" = c(a = 2, b = 8),
"strong_fair" = c(a = 50, b = 50)
)
}
dat <- reactive({
input$go
true_p <- input$true_p
prior <- input$prior
n_flips <- input$n_flips
ab <- get_prior(prior)
a0 <- ab["a"]; b0 <- ab["b"]
# Generate flips
flips <- rbinom(n_flips, 1, true_p)
heads <- sum(flips)
tails <- n_flips - heads
# Posterior parameters: the Beta prior is conjugate to the binomial
# likelihood, so Beta(a0, b0) updates to Beta(a0 + heads, b0 + tails)
a_post <- a0 + heads
b_post <- b0 + tails
list(a0 = a0, b0 = b0, a_post = a_post, b_post = b_post,
true_p = true_p, n_flips = n_flips, heads = heads,
tails = tails, flips = flips)
})
output$posterior_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
x <- seq(0, 1, length.out = 500)
y_prior <- dbeta(x, d$a0, d$b0)
y_post <- dbeta(x, d$a_post, d$b_post)
ylim <- c(0, max(c(y_prior, y_post)) * 1.1)
plot(x, y_post, type = "l", lwd = 3, col = "#3498db",
xlab = expression("Coin bias (" * theta * ")"),
ylab = "Density",
main = paste0("Bayesian updating after ", d$n_flips,
" flips (", d$heads, "H, ", d$tails, "T)"),
ylim = ylim)
# Prior
lines(x, y_prior, lwd = 2, col = "#95a5a6", lty = 2)
# Shade posterior
polygon(c(0, x, 1), c(0, y_post, 0),
col = adjustcolor("#3498db", 0.2), border = NA)
# True value
abline(v = d$true_p, lwd = 2.5, col = "#e74c3c", lty = 2)
# Posterior mean
post_mean <- d$a_post / (d$a_post + d$b_post)
abline(v = post_mean, lwd = 2, col = "#27ae60", lty = 3)
# 95% credible interval
ci_lo <- qbeta(0.025, d$a_post, d$b_post)
ci_hi <- qbeta(0.975, d$a_post, d$b_post)
x_ci <- seq(ci_lo, ci_hi, length.out = 200)
y_ci <- dbeta(x_ci, d$a_post, d$b_post)
polygon(c(ci_lo, x_ci, ci_hi), c(0, y_ci, 0),
col = adjustcolor("#3498db", 0.3), border = NA)
legend("topright", bty = "n", cex = 0.85,
legend = c("Prior",
"Posterior",
"95% credible interval",
paste0("True \u03b8 = ", d$true_p),
paste0("Posterior mean = ", round(post_mean, 3))),
col = c("#95a5a6", "#3498db",
adjustcolor("#3498db", 0.4),
"#e74c3c", "#27ae60"),
lwd = c(2, 3, 8, 2.5, 2),
lty = c(2, 1, 1, 2, 3))
})
output$results <- renderUI({
d <- dat()
post_mean <- d$a_post / (d$a_post + d$b_post)
ci_lo <- qbeta(0.025, d$a_post, d$b_post)
ci_hi <- qbeta(0.975, d$a_post, d$b_post)
prior_mean <- d$a0 / (d$a0 + d$b0)
tags$div(class = "stats-box",
HTML(paste0(
"<b>Prior:</b> Beta(", d$a0, ", ", d$b0, ")<br>",
"Prior mean: ", round(prior_mean, 3), "<br>",
"<hr style='margin:8px 0'>",
"<b>Data:</b> ", d$heads, "H / ", d$tails, "T<br>",
"MLE: ", round(d$heads / d$n_flips, 3), "<br>",
"<hr style='margin:8px 0'>",
"<b>Posterior:</b> Beta(", d$a_post, ", ", d$b_post, ")<br>",
"Post. mean: ", round(post_mean, 3), "<br>",
"95% CrI: [", round(ci_lo, 3), ", ", round(ci_hi, 3), "]<br>",
"<b>True \u03b8:</b> ", d$true_p
))
)
})
}
shinyApp(ui, server)
Simulation 2: Prior sensitivity — different priors, same data
Same coin, same flips, but different starting beliefs. With little data, the prior matters a lot. With enough data, all priors converge to the same posterior. The data overwhelms the prior.
#| standalone: true
#| viewerHeight: 580
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("true_p2", "True coin bias:",
min = 0.1, max = 0.9, value = 0.7, step = 0.05),
sliderInput("n_flips2", "Number of flips:",
min = 1, max = 500, value = 10, step = 1),
actionButton("go2", "Flip coins",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
plotOutput("sensitivity_plot", height = "450px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go2
true_p <- input$true_p2
n_flips <- input$n_flips2
# Generate one set of flips (shared across priors)
flips <- rbinom(n_flips, 1, true_p)
heads <- sum(flips)
tails <- n_flips - heads
priors <- list(
"Uniform (1,1)" = c(1, 1),
"Fair-biased (10,10)" = c(10, 10),
"Heads-biased (8,2)" = c(8, 2),
"Tails-biased (2,8)" = c(2, 8),
"Strong fair (50,50)" = c(50, 50)
)
list(priors = priors, heads = heads, tails = tails,
true_p = true_p, n_flips = n_flips)
})
output$sensitivity_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
x <- seq(0, 1, length.out = 500)
cols <- c("#3498db", "#e74c3c", "#27ae60", "#9b59b6", "#e67e22")
# Find y range
ymax <- 0
for (i in seq_along(d$priors)) {
ab <- d$priors[[i]]
y <- dbeta(x, ab[1] + d$heads, ab[2] + d$tails)
ymax <- max(ymax, max(y))
}
plot(NULL, xlim = c(0, 1), ylim = c(0, ymax * 1.1),
xlab = expression("Coin bias (" * theta * ")"),
ylab = "Posterior density",
main = paste0("5 priors, same data (",
d$heads, "H / ", d$tails, "T out of ",
d$n_flips, " flips)"))
for (i in seq_along(d$priors)) {
ab <- d$priors[[i]]
y <- dbeta(x, ab[1] + d$heads, ab[2] + d$tails)
lines(x, y, lwd = 2.5, col = cols[i])
}
abline(v = d$true_p, lty = 2, lwd = 2.5, col = "#2c3e50")
legend("topright", bty = "n", cex = 0.8,
legend = c(names(d$priors),
paste0("True \u03b8 = ", d$true_p)),
col = c(cols, "#2c3e50"),
lwd = c(rep(2.5, 5), 2.5),
lty = c(rep(1, 5), 2))
})
output$results2 <- renderUI({
d <- dat()
lines <- sapply(seq_along(d$priors), function(i) {
ab <- d$priors[[i]]
a_post <- ab[1] + d$heads
b_post <- ab[2] + d$tails
post_mean <- a_post / (a_post + b_post)
paste0(names(d$priors)[i], ": ",
round(post_mean, 3))
})
mle <- d$heads / d$n_flips
tags$div(class = "stats-box",
HTML(paste0(
"<b>Posterior means:</b><br>",
paste(lines, collapse = "<br>"), "<br>",
"<hr style='margin:8px 0'>",
"<b>MLE:</b> ", round(mle, 3), "<br>",
"<b>True \u03b8:</b> ", d$true_p, "<br>",
"<small>With more flips, all posteriors<br>",
"converge to the MLE.</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- Sim 1, 5 flips: the posterior is wide and heavily influenced by the prior. The 95% credible interval is broad.
- Sim 1, 100 flips: the posterior is sharp and centered near the true bias. The prior barely matters anymore.
- Sim 1, strong fair prior (a=50, b=50) with true bias = 0.7: the prior pulls the posterior toward 0.5. You need many flips to overcome a strong prior.
- Sim 2, 10 flips: the five posteriors are spread apart — the prior matters a lot.
- Sim 2, 200 flips: all five posteriors are nearly identical and centered on the true value. The data overwhelms the prior.
- Sim 2, slide from 1 to 500 flips: watch the posteriors converge in real time. This is the key Bayesian result: with enough data, the prior washes out (a non-interactive check of this follows the list).
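The last bullet can also be checked without the app. The sketch below (the seed, sample sizes, and true bias of 0.7 are illustrative) reuses the Sim 2 priors and the conjugate Beta update, and prints the five posterior means at n = 10 and n = 500; the spread across priors shrinks sharply at the larger n.

```r
set.seed(42)                                       # illustrative seed, any value works
priors <- list("Uniform (1,1)"        = c(1, 1),
               "Moderate fair (10,10)" = c(10, 10),
               "Heads-biased (8,2)"    = c(8, 2),
               "Tails-biased (2,8)"    = c(2, 8),
               "Strong fair (50,50)"   = c(50, 50))
for (n in c(10, 500)) {
  heads <- rbinom(1, n, 0.7)                       # one shared data set per sample size
  post_means <- sapply(priors, function(ab) (ab[1] + heads) / (ab[1] + ab[2] + n))
  cat("n =", n, "flips,", heads, "heads\n")
  print(round(post_means, 3))                      # posterior mean under each prior
}
```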
The bottom line
- Bayesian inference = prior × likelihood → posterior. That’s the entire framework.
- The prior encodes what you know before seeing data. It can be informative (based on previous studies) or vague (letting the data speak).
- The posterior is your updated belief. It’s a full distribution, not just a point estimate, so you get a complete picture of uncertainty (a short worked example follows this list).
- With enough data, the prior doesn’t matter. All reasonable priors converge to the same posterior. This is reassuring: Bayesian inference isn’t “subjective” in the long run.
- The credible interval has the interpretation people want from a confidence interval: “There’s a 95% probability that \(\theta\) is in this interval.” But this requires accepting the Bayesian framework (and a prior).
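To illustrate the "full distribution" point, here is a short sketch using an assumed posterior of Beta(15, 7), i.e. a uniform prior after 14 heads and 6 tails. Any probability statement about \(\theta\) is a one-line computation:

```r
a <- 15; b <- 7                  # assumed posterior: Beta(1,1) prior + 14 heads, 6 tails
a / (a + b)                      # posterior mean
1 - pbeta(0.5, a, b)             # P(theta > 0.5 | data): how sure are we the coin is heads-biased?
qbeta(c(0.025, 0.975), a, b)     # 95% credible interval for theta
```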
Did you know?
- Thomas Bayes was an English Presbyterian minister who never published his theorem. His friend Richard Price found the manuscript after Bayes’ death and published it in 1763 as “An Essay towards solving a Problem in the Doctrine of Chances.” Bayes’ original example was essentially the coin flipping problem in this simulation.
- Bayesian methods were largely abandoned in the 20th century because they were computationally intractable — computing posteriors for realistic problems required integrals that couldn’t be solved analytically. The revival came with Markov Chain Monte Carlo (MCMC) algorithms in the 1990s, which made it possible to sample from posteriors instead of computing them exactly.
- Pierre-Simon Laplace independently discovered Bayes’ theorem and used it extensively — including to estimate the mass of Saturn, the probability that the sun would rise tomorrow, and the bias of a coin. Laplace was arguably the first true Bayesian; Bayes himself only scratched the surface.
- This page is a teaser. The full Bayesian course covers MCMC, hierarchical models, prior selection, and model comparison in depth.