Bayesian Updating: A Gentle Introduction
The Bayesian recipe
Bayesian inference has exactly one formula:
\[\text{Posterior} \propto \text{Prior} \times \text{Likelihood}\]
That’s it. Everything else is details.
- Prior: what you believed before seeing data
- Likelihood: how probable the data is under each possible parameter value
- Posterior: your updated belief after seeing data
The key insight: your belief gets updated as data arrives. Start with a prior (maybe vague, maybe informed), observe data, and the posterior combines both. With enough data, the prior gets overwhelmed — the data speaks for itself.
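Here is what that recipe looks like numerically, as a minimal base R sketch using a grid of candidate parameter values. The Beta(2, 2) prior and the 7-heads-in-10-flips data are illustrative choices, not anything from the simulations below:

```r
theta <- seq(0, 1, length.out = 1000)            # grid of candidate coin biases
dx    <- theta[2] - theta[1]                     # grid spacing
prior <- dbeta(theta, 2, 2)                      # prior belief about theta (illustrative)
lik   <- dbinom(7, size = 10, prob = theta)      # likelihood of 7 heads in 10 flips at each theta
post  <- prior * lik                             # posterior, up to a constant
post  <- post / sum(post * dx)                   # normalise so it integrates to 1
sum(theta * post * dx)                           # posterior mean; the conjugate answer is
                                                 # Beta(2 + 7, 2 + 3), mean 9/14 = 0.643
```

The grid approximation is there only to make "multiply, then normalise" visible; for the Beta–Binomial case the exact posterior is available in closed form, as the simulations below exploit.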
How is this different from frequentist inference?
| | Frequentist | Bayesian |
|---|---|---|
| Parameters | Fixed but unknown | Random variables with distributions |
| Probability | Long-run frequency | Degree of belief |
| Can say | “If H₀ were true, data at least this extreme would occur with probability p = 0.03” | “Given the data, P(parameter > 0) = 0.97” |
| CI / CrI | “95% of intervals from this procedure contain \(\theta\)” | “There’s a 95% probability \(\theta\) is in this interval” |
The Bayesian interpretation is often what people think a confidence interval means — but it requires a prior.
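To make the last two rows of the table concrete, here is a small base R comparison on made-up data (8 heads in 10 flips; the uniform Beta(1, 1) prior is an assumption). The confidence interval comes from `binom.test()`; the credible interval is just the 2.5% and 97.5% quantiles of the Beta posterior. Only the second interval supports a statement like "there is a 95% probability that \(\theta\) lies in this range."

```r
# Frequentist 95% confidence interval for a proportion (8 heads in 10 flips):
binom.test(8, 10)$conf.int

# Bayesian 95% credible interval under a uniform Beta(1, 1) prior:
# the posterior is Beta(1 + 8, 1 + 2); take its 2.5% and 97.5% quantiles
qbeta(c(0.025, 0.975), 1 + 8, 1 + 2)
```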
Simulation 1: Coin flipping — watch the posterior update
Start with a prior belief about a coin’s bias. Flip the coin one at a time and watch the posterior distribution update in real time. With enough flips, the posterior concentrates around the true bias — regardless of the prior.
#| standalone: true
#| viewerHeight: 680
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("true_p", "True coin bias:",
min = 0.1, max = 0.9, value = 0.7, step = 0.05),
selectInput("prior", "Prior belief:",
choices = c("Uniform (a=1, b=1)" = "uniform",
"Skeptical of bias (a=5, b=5)" = "skeptical",
"Believe heads-biased (a=8, b=2)" = "heads",
"Believe tails-biased (a=2, b=8)" = "tails",
"Very strong fair (a=50, b=50)" = "strong_fair")),
sliderInput("n_flips", "Number of flips:",
min = 1, max = 200, value = 10, step = 1),
actionButton("go", "Flip coins",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
fluidRow(
column(12, plotOutput("posterior_plot", height = "400px"))
)
)
)
)
server <- function(input, output, session) {
get_prior <- function(prior) {
switch(prior,
"uniform" = c(a = 1, b = 1),
"skeptical" = c(a = 5, b = 5),
"heads" = c(a = 8, b = 2),
"tails" = c(a = 2, b = 8),
"strong_fair" = c(a = 50, b = 50)
)
}
dat <- reactive({
input$go
true_p <- input$true_p
prior <- input$prior
n_flips <- input$n_flips
ab <- get_prior(prior)
a0 <- ab["a"]; b0 <- ab["b"]
# Generate flips
flips <- rbinom(n_flips, 1, true_p)
heads <- sum(flips)
tails <- n_flips - heads
# Posterior parameters: the Beta prior is conjugate to the binomial
# likelihood, so Beta(a0, b0) updates to Beta(a0 + heads, b0 + tails)
a_post <- a0 + heads
b_post <- b0 + tails
list(a0 = a0, b0 = b0, a_post = a_post, b_post = b_post,
true_p = true_p, n_flips = n_flips, heads = heads,
tails = tails, flips = flips)
})
output$posterior_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
x <- seq(0, 1, length.out = 500)
y_prior <- dbeta(x, d$a0, d$b0)
y_post <- dbeta(x, d$a_post, d$b_post)
ylim <- c(0, max(c(y_prior, y_post)) * 1.1)
plot(x, y_post, type = "l", lwd = 3, col = "#3498db",
xlab = expression("Coin bias (" * theta * ")"),
ylab = "Density",
main = paste0("Bayesian updating after ", d$n_flips,
" flips (", d$heads, "H, ", d$tails, "T)"),
ylim = ylim)
# Prior
lines(x, y_prior, lwd = 2, col = "#95a5a6", lty = 2)
# Shade posterior
polygon(c(0, x, 1), c(0, y_post, 0),
col = adjustcolor("#3498db", 0.2), border = NA)
# True value
abline(v = d$true_p, lwd = 2.5, col = "#e74c3c", lty = 2)
# Posterior mean
post_mean <- d$a_post / (d$a_post + d$b_post)
abline(v = post_mean, lwd = 2, col = "#27ae60", lty = 3)
# 95% credible interval
ci_lo <- qbeta(0.025, d$a_post, d$b_post)
ci_hi <- qbeta(0.975, d$a_post, d$b_post)
x_ci <- seq(ci_lo, ci_hi, length.out = 200)
y_ci <- dbeta(x_ci, d$a_post, d$b_post)
polygon(c(ci_lo, x_ci, ci_hi), c(0, y_ci, 0),
col = adjustcolor("#3498db", 0.3), border = NA)
legend("topright", bty = "n", cex = 0.85,
legend = c("Prior",
"Posterior",
"95% credible interval",
paste0("True \u03b8 = ", d$true_p),
paste0("Posterior mean = ", round(post_mean, 3))),
col = c("#95a5a6", "#3498db",
adjustcolor("#3498db", 0.4),
"#e74c3c", "#27ae60"),
lwd = c(2, 3, 8, 2.5, 2),
lty = c(2, 1, 1, 2, 3))
})
output$results <- renderUI({
d <- dat()
post_mean <- d$a_post / (d$a_post + d$b_post)
ci_lo <- qbeta(0.025, d$a_post, d$b_post)
ci_hi <- qbeta(0.975, d$a_post, d$b_post)
prior_mean <- d$a0 / (d$a0 + d$b0)
tags$div(class = "stats-box",
HTML(paste0(
"<b>Prior:</b> Beta(", d$a0, ", ", d$b0, ")<br>",
"Prior mean: ", round(prior_mean, 3), "<br>",
"<hr style='margin:8px 0'>",
"<b>Data:</b> ", d$heads, "H / ", d$tails, "T<br>",
"MLE: ", round(d$heads / d$n_flips, 3), "<br>",
"<hr style='margin:8px 0'>",
"<b>Posterior:</b> Beta(", d$a_post, ", ", d$b_post, ")<br>",
"Post. mean: ", round(post_mean, 3), "<br>",
"95% CrI: [", round(ci_lo, 3), ", ", round(ci_hi, 3), "]<br>",
"<b>True \u03b8:</b> ", d$true_p
))
)
})
}
shinyApp(ui, server)
Simulation 2: Prior sensitivity — different priors, same data
Same coin, same flips, but different starting beliefs. With little data, the prior matters a lot. With enough data, all priors converge to the same posterior. The data overwhelms the prior.
#| standalone: true
#| viewerHeight: 580
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("true_p2", "True coin bias:",
min = 0.1, max = 0.9, value = 0.7, step = 0.05),
sliderInput("n_flips2", "Number of flips:",
min = 1, max = 500, value = 10, step = 1),
actionButton("go2", "Flip coins",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
plotOutput("sensitivity_plot", height = "450px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go2
true_p <- input$true_p2
n_flips <- input$n_flips2
# Generate one set of flips (shared across priors)
flips <- rbinom(n_flips, 1, true_p)
heads <- sum(flips)
tails <- n_flips - heads
priors <- list(
"Uniform (1,1)" = c(1, 1),
"Fair-biased (10,10)" = c(10, 10),
"Heads-biased (8,2)" = c(8, 2),
"Tails-biased (2,8)" = c(2, 8),
"Strong fair (50,50)" = c(50, 50)
)
list(priors = priors, heads = heads, tails = tails,
true_p = true_p, n_flips = n_flips)
})
output$sensitivity_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
x <- seq(0, 1, length.out = 500)
cols <- c("#3498db", "#e74c3c", "#27ae60", "#9b59b6", "#e67e22")
# Find y range
ymax <- 0
for (i in seq_along(d$priors)) {
ab <- d$priors[[i]]
y <- dbeta(x, ab[1] + d$heads, ab[2] + d$tails)
ymax <- max(ymax, max(y))
}
plot(NULL, xlim = c(0, 1), ylim = c(0, ymax * 1.1),
xlab = expression("Coin bias (" * theta * ")"),
ylab = "Posterior density",
main = paste0("5 priors, same data (",
d$heads, "H / ", d$tails, "T out of ",
d$n_flips, " flips)"))
for (i in seq_along(d$priors)) {
ab <- d$priors[[i]]
y <- dbeta(x, ab[1] + d$heads, ab[2] + d$tails)
lines(x, y, lwd = 2.5, col = cols[i])
}
abline(v = d$true_p, lty = 2, lwd = 2.5, col = "#2c3e50")
legend("topright", bty = "n", cex = 0.8,
legend = c(names(d$priors),
paste0("True \u03b8 = ", d$true_p)),
col = c(cols, "#2c3e50"),
lwd = c(rep(2.5, 5), 2.5),
lty = c(rep(1, 5), 2))
})
output$results2 <- renderUI({
d <- dat()
lines <- sapply(seq_along(d$priors), function(i) {
ab <- d$priors[[i]]
a_post <- ab[1] + d$heads
b_post <- ab[2] + d$tails
post_mean <- a_post / (a_post + b_post)
paste0(names(d$priors)[i], ": ",
round(post_mean, 3))
})
mle <- d$heads / d$n_flips
tags$div(class = "stats-box",
HTML(paste0(
"<b>Posterior means:</b><br>",
paste(lines, collapse = "<br>"), "<br>",
"<hr style='margin:8px 0'>",
"<b>MLE:</b> ", round(mle, 3), "<br>",
"<b>True \u03b8:</b> ", d$true_p, "<br>",
"<small>With more flips, all posteriors<br>",
"converge to the MLE.</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- Sim 1, 5 flips: the posterior is wide and heavily influenced by the prior. The 95% credible interval is broad.
- Sim 1, 100 flips: the posterior is sharp and centered near the true bias. The prior barely matters anymore.
- Sim 1, strong fair prior (a=50, b=50) with true bias = 0.7: the prior pulls the posterior toward 0.5. You need many flips to overcome a strong prior.
- Sim 2, 10 flips: the five posteriors are spread apart — the prior matters a lot.
- Sim 2, 200 flips: all five posteriors are nearly identical and centered on the true value. The data overwhelms the prior.
- Sim 2, slide from 1 to 500 flips: watch the posteriors converge in real time. This is the key Bayesian result: with enough data, the prior washes out (a non-interactive check of this follows the list).
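The last bullet can also be checked without the app. The sketch below (the seed, sample sizes, and true bias of 0.7 are illustrative) reuses the Sim 2 priors and the conjugate Beta update, and prints the five posterior means at n = 10 and n = 500; the spread across priors shrinks sharply at the larger n.

```r
set.seed(42)                                       # illustrative seed, any value works
priors <- list("Uniform (1,1)"        = c(1, 1),
               "Moderate fair (10,10)" = c(10, 10),
               "Heads-biased (8,2)"    = c(8, 2),
               "Tails-biased (2,8)"    = c(2, 8),
               "Strong fair (50,50)"   = c(50, 50))
for (n in c(10, 500)) {
  heads <- rbinom(1, n, 0.7)                       # one shared data set per sample size
  post_means <- sapply(priors, function(ab) (ab[1] + heads) / (ab[1] + ab[2] + n))
  cat("n =", n, "flips,", heads, "heads\n")
  print(round(post_means, 3))                      # posterior mean under each prior
}
```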
The bottom line
- Bayesian inference = prior × likelihood → posterior. That’s the entire framework.
- The prior encodes what you know before seeing data. It can be informative (based on previous studies) or vague (letting the data speak).
- The posterior is your updated belief. It’s a full distribution, not just a point estimate, so you get a complete picture of uncertainty (a short worked example follows this list).
- With enough data, the prior doesn’t matter. All reasonable priors converge to the same posterior. This is reassuring: Bayesian inference isn’t “subjective” in the long run.
- The credible interval has the interpretation people want from a confidence interval: “There’s a 95% probability that \(\theta\) is in this interval.” But this requires accepting the Bayesian framework (and a prior).
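To illustrate the "full distribution" point, here is a short sketch using an assumed posterior of Beta(15, 7), i.e. a uniform prior after 14 heads and 6 tails. Any probability statement about \(\theta\) is a one-line computation:

```r
a <- 15; b <- 7                  # assumed posterior: Beta(1,1) prior + 14 heads, 6 tails
a / (a + b)                      # posterior mean
1 - pbeta(0.5, a, b)             # P(theta > 0.5 | data): how sure are we the coin is heads-biased?
qbeta(c(0.025, 0.975), a, b)     # 95% credible interval for theta
```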
Did you know?
- Thomas Bayes was an English Presbyterian minister who never published his theorem. His friend Richard Price found the manuscript after Bayes’ death and published it in 1763 as “An Essay towards solving a Problem in the Doctrine of Chances.” Bayes’ original example was essentially the coin flipping problem in this simulation.
- Bayesian methods were largely abandoned in the 20th century because they were computationally intractable — computing posteriors for realistic problems required integrals that couldn’t be solved analytically. The revival came with Markov Chain Monte Carlo (MCMC) algorithms in the 1990s, which made it possible to sample from posteriors instead of computing them exactly.
- Pierre-Simon Laplace independently discovered Bayes’ theorem and used it extensively — including to estimate the mass of Saturn, the probability that the sun would rise tomorrow, and the bias of a coin. Laplace was arguably the first true Bayesian; Bayes himself only scratched the surface.
- This page is a teaser. The full Bayesian course covers MCMC, hierarchical models, prior selection, and model comparison in depth.