Bayesian Causal Inference

Why go Bayesian for causal effects?

Frequentist causal inference gives you a point estimate and a confidence interval. But sometimes you want more:

  • “What’s the probability the program actually works?” A frequentist CI doesn’t answer this. A posterior does: \(P(\tau > 0 \mid \text{data})\).
  • “A previous study found an effect of $1,500. Can I use that?” Bayesian priors let you formally incorporate prior evidence.
  • “My sample is tiny. Can I still say anything?” With a reasonable prior, Bayesian estimates are more stable in small samples.

The cost: you need to specify a prior, and your results depend on it. The payoff: richer inference and better small-sample behavior.

Connection to the courses. This page bridges the Bayesian inference and causal inference courses. We assume you know the basics of priors and posteriors, as well as doubly robust estimation. If not, start there.


Setup: a job training program

A nonprofit runs a job training program for unemployed workers. We observe earnings for participants (treated) and non-participants (control). The question: does the program increase earnings?

We’ll use selection on observables — controlling for baseline covariates like age and prior earnings — and compare three approaches:

  1. Difference in means (naive)
  2. Regression adjustment (frequentist, with controls)
  3. Bayesian regression (with an informative prior from a previous study)
#| standalone: true
#| viewerHeight: 700

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
    .good  { color: #27ae60; font-weight: bold; }
    .bad   { color: #e74c3c; font-weight: bold; }
    .muted { color: #7f8c8d; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      tags$h5("True DGP"),
      sliderInput("true_ate", "True treatment effect ($):",
                  min = -500, max = 3000, value = 1200, step = 100),

      sliderInput("n_obs", "Sample size:",
                  min = 30, max = 500, value = 80, step = 10),

      sliderInput("confounding", "Selection bias strength:",
                  min = 0, max = 1, value = 0.5, step = 0.1),

      tags$hr(),
      tags$h5("Bayesian prior"),
      sliderInput("prior_mean", "Prior mean ($):",
                  min = -1000, max = 3000, value = 1500, step = 100),

      sliderInput("prior_sd", "Prior SD ($):",
                  min = 100, max = 3000, value = 800, step = 100),

      tags$hr(),
      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("estimates_plot", height = "420px")),
        column(6, plotOutput("posterior_plot", height = "420px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n    <- input$n_obs
    ate  <- input$true_ate
    conf <- input$confounding

    # Covariates
    age <- rnorm(n, 35, 8)
    prior_earn <- rnorm(n, 20000, 5000)

    # Selection into treatment depends on covariates
    # Higher prior earnings -> more likely to be treated (positive selection)
    latent <- conf * (prior_earn - 20000) / 5000 + rnorm(n)
    treat  <- as.integer(latent > 0)

    # Outcome: earnings
    noise <- rnorm(n, sd = 3000)
    earnings <- 15000 + 200 * (age - 35) + 0.3 * (prior_earn - 20000) +
                ate * treat + noise

    df <- data.frame(earnings = earnings, treat = treat,
                     age = age, prior_earn = prior_earn)

    # 1. Naive difference in means
    naive_est <- mean(earnings[treat == 1]) - mean(earnings[treat == 0])
    n1 <- sum(treat); n0 <- n - n1
    se_naive <- sqrt(var(earnings[treat == 1]) / max(n1, 1) +
                     var(earnings[treat == 0]) / max(n0, 1))

    # 2. Regression adjustment (frequentist)
    fit_freq <- lm(earnings ~ treat + age + prior_earn, data = df)
    freq_est <- coef(fit_freq)["treat"]
    freq_se  <- summary(fit_freq)$coefficients["treat", "Std. Error"]

    # 3. Bayesian (conjugate normal update)
    # Likelihood: treat coefficient ~ N(freq_est, freq_se^2)
    # Prior: tau ~ N(prior_mean, prior_sd^2)
    prior_mean <- input$prior_mean
    prior_sd   <- input$prior_sd
    prior_prec <- 1 / prior_sd^2
    data_prec  <- 1 / freq_se^2

    post_prec <- prior_prec + data_prec
    post_var  <- 1 / post_prec
    post_mean <- post_var * (prior_prec * prior_mean + data_prec * freq_est)
    post_sd   <- sqrt(post_var)

    # P(tau > 0 | data)
    p_positive <- 1 - pnorm(0, post_mean, post_sd)

    list(df = df, ate = ate,
         naive_est = naive_est, se_naive = se_naive,
         freq_est = freq_est, freq_se = freq_se,
         prior_mean = prior_mean, prior_sd = prior_sd,
         post_mean = post_mean, post_sd = post_sd,
         p_positive = p_positive,
         n = n, n1 = n1, n0 = n0)
  })

  output$estimates_plot <- renderPlot({
    d <- dat()
    par(mar = c(5, 10, 3, 2))

    ests <- c(d$naive_est, d$freq_est, d$post_mean)
    ses  <- c(d$se_naive, d$freq_se, d$post_sd)
    labs <- c("Naive\n(diff in means)", "Regression\nadjustment", "Bayesian\nposterior")
    cols <- c("#e74c3c", "#3498db", "#27ae60")

    xlim <- range(c(ests - 2.5 * ses, ests + 2.5 * ses, d$ate))

    plot(NULL, xlim = xlim, ylim = c(0.5, 3.5),
         xlab = "Treatment effect ($)", ylab = "", yaxt = "n",
         main = "Three estimates of the training effect")
    axis(2, at = 1:3, labels = labs, las = 1, cex.axis = 0.9)

    abline(v = d$ate, col = "gray40", lty = 2, lwd = 2)
    text(d$ate, 3.45, paste0("Truth: $", d$ate), cex = 0.85, col = "gray40")

    for (i in 1:3) {
      ci_lo <- ests[i] - 1.96 * ses[i]
      ci_hi <- ests[i] + 1.96 * ses[i]
      segments(ci_lo, i, ci_hi, i, col = cols[i], lwd = 3)
      points(ests[i], i, pch = 16, col = cols[i], cex = 1.8)
    }

    abline(v = 0, col = "gray80", lty = 3)
    legend("topright", bty = "n", cex = 0.8,
           legend = c("Point estimate", "95% interval", "True effect"),
           pch = c(16, NA, NA), lty = c(NA, 1, 2),
           lwd = c(NA, 3, 2), col = c("gray40", "gray40", "gray40"))
  })

  output$posterior_plot <- renderPlot({
    d <- dat()
    par(mar = c(5, 4.5, 3, 1))

    # Plot range
    all_means <- c(d$prior_mean, d$freq_est, d$post_mean)
    all_sds   <- c(d$prior_sd, d$freq_se, d$post_sd)
    xlo <- min(all_means - 3 * all_sds)
    xhi <- max(all_means + 3 * all_sds)
    x   <- seq(xlo, xhi, length.out = 500)

    # Densities
    d_prior <- dnorm(x, d$prior_mean, d$prior_sd)
    d_like  <- dnorm(x, d$freq_est, d$freq_se)
    d_post  <- dnorm(x, d$post_mean, d$post_sd)

    ymax <- max(c(d_prior, d_like, d_post)) * 1.15

    plot(x, d_like, type = "l", col = "#3498db", lwd = 2.5,
         xlab = "Treatment effect ($)", ylab = "Density",
         main = "Prior, likelihood & posterior",
         ylim = c(0, ymax))
    lines(x, d_prior, col = "#e67e22", lwd = 2.5, lty = 2)
    lines(x, d_post, col = "#27ae60", lwd = 3)

    # Shade posterior
    poly_x <- c(x, rev(x))
    poly_y <- c(d_post, rep(0, length(x)))
    polygon(poly_x, poly_y, col = "#27ae6025", border = NA)

    abline(v = d$ate, col = "gray40", lty = 2, lwd = 1.5)
    abline(v = 0, col = "gray80", lty = 3)

    legend("topright", bty = "n", cex = 0.85,
           legend = c(paste0("Prior: N(", d$prior_mean, ", ", d$prior_sd, "\u00b2)"),
                      paste0("Likelihood (data): N(",
                             round(d$freq_est), ", ", round(d$freq_se), "\u00b2)"),
                      paste0("Posterior: N(",
                             round(d$post_mean), ", ", round(d$post_sd), "\u00b2)")),
           col = c("#e67e22", "#3498db", "#27ae60"),
           lwd = c(2.5, 2.5, 3), lty = c(2, 1, 1))
  })

  output$info <- renderUI({
    d <- dat()

    # Coverage
    freq_covers <- (d$freq_est - 1.96 * d$freq_se <= d$ate) &
                   (d$freq_est + 1.96 * d$freq_se >= d$ate)
    bayes_covers <- (d$post_mean - 1.96 * d$post_sd <= d$ate) &
                    (d$post_mean + 1.96 * d$post_sd >= d$ate)

    freq_tag <- if (freq_covers) "<span class='good'>Yes</span>" else "<span class='bad'>No</span>"
    bayes_tag <- if (bayes_covers) "<span class='good'>Yes</span>" else "<span class='bad'>No</span>"

    tags$div(class = "info-box", HTML(paste0(
      "<b>Naive:</b> $", round(d$naive_est), "<br>",
      "<b>Regression:</b> $", round(d$freq_est),
      " &plusmn; ", round(1.96 * d$freq_se), "<br>",
      "<b>Bayesian:</b> $", round(d$post_mean),
      " &plusmn; ", round(1.96 * d$post_sd), "<br>",
      "<b>P(effect > 0 | data):</b> ", round(d$p_positive, 3), "<br>",
      "<hr style='margin:8px 0'>",
      "<small>Covers truth? Freq: ", freq_tag,
      " | Bayes: ", bayes_tag, "</small>"
    )))
  })
}

shinyApp(ui, server)

How the Bayesian update works

The mechanics are simple. The regression gives us a likelihood for the treatment effect:

\[\hat{\tau}_{\text{data}} \sim N(\hat{\tau}_{\text{freq}}, \; \text{SE}^2)\]

We combine this with a prior from a previous study:

\[\tau \sim N(\mu_0, \; \sigma_0^2)\]

The posterior is another normal (conjugate updating):

\[\tau \mid \text{data} \sim N\!\left(\frac{\sigma_0^{-2} \mu_0 + \text{SE}^{-2} \hat{\tau}_{\text{freq}}}{\sigma_0^{-2} + \text{SE}^{-2}}, \;\; \frac{1}{\sigma_0^{-2} + \text{SE}^{-2}}\right)\]

The posterior mean is a precision-weighted average of the prior mean and the data estimate. Whichever has smaller variance (higher precision) gets more weight.
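As a minimal sketch of this update in R (the numbers are illustrative — a prior of \(N(1500, 800^2)\) standing in for the earlier study, and a regression estimate of $1,200 with SE $600, not values from any real data):

```r
# Conjugate normal update: precision-weighted average of prior and data.
# All numbers here are illustrative, not from a real study.
posterior_update <- function(prior_mean, prior_sd, est, se) {
  prior_prec <- 1 / prior_sd^2          # precision = 1 / variance
  data_prec  <- 1 / se^2
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * prior_mean + data_prec * est)
  list(mean       = post_mean,
       sd         = sqrt(post_var),
       p_positive = 1 - pnorm(0, post_mean, sqrt(post_var)))
}

posterior_update(prior_mean = 1500, prior_sd = 800, est = 1200, se = 600)
# Posterior: N(1308, 480^2), with P(tau > 0 | data) around 0.997
```

Note how the posterior mean (1308) sits between the prior (1500) and the data (1200), closer to the data because the data is more precise here, and the posterior SD (480) is smaller than either input SD.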

Things to try

  • Small sample (\(n = 30\)), high confounding: the naive estimate is way off. Regression adjustment helps but is noisy. The Bayesian estimate, anchored by the prior, is more stable.
  • Large sample (\(n = 500\)): the prior barely matters — the data dominates. All three estimates converge. This is the Bayesian guarantee: with enough data, the prior washes out.
  • Wrong prior (set prior mean = -500, true effect = 1200): watch how the posterior gets pulled toward the wrong prior at small \(n\) but corrects at large \(n\). This is the risk of informative priors.
  • Set confounding = 0: naive and regression estimates agree (no selection bias). The Bayesian estimate is a compromise between data and prior.

Prior sensitivity

The most common objection to Bayesian causal inference: “your results depend on the prior.” This is true — and it’s a feature, not a bug, when used honestly. The key question is: how much do the results depend on the prior?

A robust result is one where the conclusion holds across a range of reasonable priors. If flipping the prior from optimistic to skeptical changes your policy recommendation, that’s important to know.

#| standalone: true
#| viewerHeight: 600

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n_sens", "Sample size:",
                  min = 30, max = 500, value = 80, step = 10),

      tags$hr(),
      tags$h5("Data (fixed once drawn)"),
      sliderInput("true_effect", "True effect ($):",
                  min = 0, max = 3000, value = 1200, step = 100),

      actionButton("draw", "Draw new data", class = "btn-primary", width = "100%"),

      tags$hr(),
      tags$h5("Vary the prior"),
      sliderInput("pm", "Prior mean ($):",
                  min = -2000, max = 5000, value = 1500, step = 100),

      sliderInput("ps", "Prior SD ($):",
                  min = 100, max = 5000, value = 800, step = 100),

      uiOutput("sens_info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("sens_posterior", height = "420px")),
        column(6, plotOutput("sens_ppos", height = "420px"))
      )
    )
  )
)

server <- function(input, output, session) {

  # Fix the data on button press
  fixed_data <- reactiveVal(NULL)

  observeEvent(input$draw, {
    n   <- input$n_sens
    ate <- input$true_effect

    age <- rnorm(n, 35, 8)
    pe  <- rnorm(n, 20000, 5000)
    lat <- 0.5 * (pe - 20000) / 5000 + rnorm(n)
    treat <- as.integer(lat > 0)
    earnings <- 15000 + 200 * (age - 35) + 0.3 * (pe - 20000) +
                ate * treat + rnorm(n, sd = 3000)

    fit <- lm(earnings ~ treat + age + pe)
    freq_est <- coef(fit)["treat"]
    freq_se  <- summary(fit)$coefficients["treat", "Std. Error"]

    fixed_data(list(freq_est = freq_est, freq_se = freq_se, ate = ate, n = n))
  }, ignoreNULL = FALSE)

  # Recompute posterior whenever prior sliders change
  post <- reactive({
    fd <- fixed_data()
    req(fd)

    pm <- input$pm
    ps <- input$ps

    prior_prec <- 1 / ps^2
    data_prec  <- 1 / fd$freq_se^2
    post_prec  <- prior_prec + data_prec
    post_var   <- 1 / post_prec
    post_mean  <- post_var * (prior_prec * pm + data_prec * fd$freq_est)
    post_sd    <- sqrt(post_var)

    p_pos <- 1 - pnorm(0, post_mean, post_sd)

    list(post_mean = post_mean, post_sd = post_sd, p_pos = p_pos,
         pm = pm, ps = ps,
         freq_est = fd$freq_est, freq_se = fd$freq_se, ate = fd$ate)
  })

  output$sens_posterior <- renderPlot({
    d <- post()
    par(mar = c(5, 4.5, 3, 1))

    all_means <- c(d$pm, d$freq_est, d$post_mean)
    all_sds   <- c(d$ps, d$freq_se, d$post_sd)
    xlo <- min(all_means - 3.5 * all_sds, -500)
    xhi <- max(all_means + 3.5 * all_sds, 3000)
    x   <- seq(xlo, xhi, length.out = 500)

    d_prior <- dnorm(x, d$pm, d$ps)
    d_like  <- dnorm(x, d$freq_est, d$freq_se)
    d_post  <- dnorm(x, d$post_mean, d$post_sd)

    ymax <- max(c(d_prior, d_like, d_post)) * 1.15

    plot(x, d_like, type = "l", col = "#3498db", lwd = 2.5,
         xlab = "Treatment effect ($)", ylab = "Density",
         main = "Drag the prior, watch the posterior",
         ylim = c(0, ymax))
    lines(x, d_prior, col = "#e67e22", lwd = 2.5, lty = 2)
    lines(x, d_post, col = "#27ae60", lwd = 3)

    polygon(c(x, rev(x)), c(d_post, rep(0, length(x))),
            col = "#27ae6025", border = NA)

    abline(v = d$ate, col = "gray40", lty = 2, lwd = 1.5)
    abline(v = 0, col = "gray80", lty = 3)

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Prior", "Likelihood (data)", "Posterior", "Truth"),
           col = c("#e67e22", "#3498db", "#27ae60", "gray40"),
           lwd = c(2.5, 2.5, 3, 1.5), lty = c(2, 1, 1, 2))
  })

  output$sens_ppos <- renderPlot({
    fd <- fixed_data()
    req(fd)
    par(mar = c(5, 4.5, 3, 1))

    # Compute P(tau > 0) for a grid of prior means
    prior_means <- seq(-2000, 5000, by = 100)
    ps <- input$ps
    prior_prec <- 1 / ps^2
    data_prec  <- 1 / fd$freq_se^2

    p_pos_grid <- sapply(prior_means, function(pm) {
      post_prec <- prior_prec + data_prec
      post_var  <- 1 / post_prec
      post_mean <- post_var * (prior_prec * pm + data_prec * fd$freq_est)
      post_sd   <- sqrt(post_var)
      1 - pnorm(0, post_mean, post_sd)
    })

    plot(prior_means, p_pos_grid, type = "l", col = "#9b59b6", lwd = 2.5,
         xlab = "Prior mean ($)", ylab = "P(effect > 0 | data)",
         main = "Sensitivity: how prior changes the conclusion",
         ylim = c(0, 1))
    abline(h = 0.5, col = "gray70", lty = 3)
    abline(h = 0.95, col = "#27ae60", lty = 3)
    abline(v = input$pm, col = "#e67e22", lty = 2, lwd = 2)

    points(input$pm, post()$p_pos, pch = 16, col = "#e67e22", cex = 2)

    text(max(prior_means) * 0.7, 0.97, "95% threshold", col = "#27ae60", cex = 0.8)
    text(max(prior_means) * 0.7, 0.52, "50% threshold", col = "gray50", cex = 0.8)
  })

  output$sens_info <- renderUI({
    d <- post()

    robust_msg <- if (d$p_pos > 0.95) {
      "<span style='color:#27ae60'>P > 95% under this prior; check the curve for robustness</span>"
    } else if (d$p_pos > 0.5) {
      "<span style='color:#e67e22'>Likely positive, but sensitive to prior</span>"
    } else {
      "<span style='color:#e74c3c'>Inconclusive or negative</span>"
    }

    prior_wt <- round(100 * (1/d$ps^2) / (1/d$ps^2 + 1/d$freq_se^2), 1)

    tags$div(class = "info-box", HTML(paste0(
      "<b>Posterior mean:</b> $", round(d$post_mean), "<br>",
      "<b>P(effect > 0):</b> ", round(d$p_pos, 3), "<br>",
      "<b>Prior weight:</b> ", prior_wt, "%<br>",
      "<b>Data weight:</b> ", round(100 - prior_wt, 1), "%<br>",
      "<hr style='margin:8px 0'>",
      robust_msg
    )))
  })
}

shinyApp(ui, server)

Reading the sensitivity plot

The right panel shows \(P(\tau > 0 \mid \text{data})\) as a function of the prior mean, holding prior SD fixed. This is a sensitivity analysis:

  • If the curve stays above 0.95 across all reasonable prior means, your conclusion is robust — even a skeptic would agree.
  • If the curve crosses 0.5, a skeptical prior would flip your conclusion. That’s important to report.
  • Tight priors (small SD) make the curve steeper — the prior matters more. Vague priors (large SD) flatten the curve — the data dominates.
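One way to compress this curve into a single number is a tipping point: the most skeptical prior mean that still leaves \(P(\tau > 0 \mid \text{data}) \ge 0.95\). A small sketch using the same conjugate math, with assumed inputs (estimate $1,200, SE $600, prior SD $800 — defaults for illustration, not tied to any particular data draw):

```r
# Tipping point: smallest prior mean that still gives P(tau > 0 | data) >= 0.95.
# Inputs are illustrative defaults, not values from the app.
tipping_point <- function(est = 1200, se = 600, prior_sd = 800) {
  pm_grid    <- seq(-2000, 5000, by = 10)    # candidate prior means
  prior_prec <- 1 / prior_sd^2
  data_prec  <- 1 / se^2
  post_var   <- 1 / (prior_prec + data_prec)
  post_mean  <- post_var * (prior_prec * pm_grid + data_prec * est)
  p_pos      <- 1 - pnorm(0, post_mean, sqrt(post_var))
  min(pm_grid[p_pos >= 0.95])                # most skeptical "still positive" prior
}

tipping_point()
# With these inputs, any prior mean of roughly $60 or above keeps P >= 0.95
```

Reporting "the conclusion survives any prior mean above $60" is a much stronger robustness claim than reporting one posterior probability under one prior.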

Things to try

  • Draw data, then drag prior mean from -2000 to +5000: watch the posterior (green) slide between prior and data. With large \(n\), it barely moves.
  • Set prior SD = 100 (very confident prior): the posterior sticks near the prior mean regardless of data. This is dogmatic — bad practice.
  • Set prior SD = 5000 (very vague): the posterior collapses onto the data. The prior contributes nothing — you’ve recovered the frequentist estimate.
  • Small \(n\) (30) vs large \(n\) (500): at \(n = 30\) the prior matters a lot. At \(n = 500\) even a strong prior gets overwhelmed.

When does Bayesian help most?

#| standalone: true
#| viewerHeight: 600

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("true_eff3", "True effect ($):",
                  min = 0, max = 3000, value = 1200, step = 100),

      sliderInput("prior_mean3", "Prior mean ($):",
                  min = 0, max = 3000, value = 1500, step = 100),

      sliderInput("prior_sd3", "Prior SD ($):",
                  min = 200, max = 2000, value = 800, step = 100),

      sliderInput("n_sims", "Simulations:",
                  min = 100, max = 1000, value = 500, step = 100),

      actionButton("run", "Run simulation", class = "btn-primary", width = "100%"),

      uiOutput("mc_info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("mse_plot", height = "420px")),
        column(6, plotOutput("coverage_plot", height = "420px"))
      )
    )
  )
)

server <- function(input, output, session) {

  results <- reactiveVal(NULL)

  observeEvent(input$run, {
    ate <- input$true_eff3
    pm  <- input$prior_mean3
    ps  <- input$prior_sd3
    B   <- input$n_sims

    sample_sizes <- c(30, 50, 80, 120, 200, 350, 500)

    out <- data.frame()

    for (n in sample_sizes) {
      freq_ests <- numeric(B)
      bayes_ests <- numeric(B)
      freq_covers <- logical(B)
      bayes_covers <- logical(B)

      for (b in 1:B) {
        age <- rnorm(n, 35, 8)
        pe  <- rnorm(n, 20000, 5000)
        lat <- 0.5 * (pe - 20000) / 5000 + rnorm(n)
        treat <- as.integer(lat > 0)
        earnings <- 15000 + 200 * (age - 35) + 0.3 * (pe - 20000) +
                    ate * treat + rnorm(n, sd = 3000)

        fit <- lm(earnings ~ treat + age + pe)
        fe  <- coef(fit)["treat"]
        fse <- summary(fit)$coefficients["treat", "Std. Error"]

        prior_prec <- 1 / ps^2
        data_prec  <- 1 / fse^2
        post_prec  <- prior_prec + data_prec
        post_var   <- 1 / post_prec
        post_mean  <- post_var * (prior_prec * pm + data_prec * fe)
        post_sd    <- sqrt(post_var)

        freq_ests[b]  <- fe
        bayes_ests[b] <- post_mean

        freq_covers[b]  <- (fe - 1.96 * fse <= ate) & (fe + 1.96 * fse >= ate)
        bayes_covers[b] <- (post_mean - 1.96 * post_sd <= ate) &
                           (post_mean + 1.96 * post_sd >= ate)
      }

      out <- rbind(out, data.frame(
        n = n,
        mse_freq  = mean((freq_ests - ate)^2),
        mse_bayes = mean((bayes_ests - ate)^2),
        cov_freq  = mean(freq_covers),
        cov_bayes = mean(bayes_covers)
      ))
    }

    results(out)
  })

  output$mse_plot <- renderPlot({
    res <- results()
    req(res)
    par(mar = c(5, 5, 3, 1))

    ylim <- range(c(res$mse_freq, res$mse_bayes))

    plot(res$n, res$mse_freq / 1e6, type = "b", pch = 16, col = "#3498db",
         lwd = 2.5, cex = 1.3,
         xlab = "Sample size", ylab = "MSE (millions $\u00b2)",
         main = "Mean squared error",
         ylim = ylim / 1e6)
    lines(res$n, res$mse_bayes / 1e6, type = "b", pch = 17, col = "#27ae60",
          lwd = 2.5, cex = 1.3)

    legend("topright", bty = "n", cex = 0.95,
           legend = c("Frequentist", "Bayesian"),
           col = c("#3498db", "#27ae60"), pch = c(16, 17), lwd = 2.5)
  })

  output$coverage_plot <- renderPlot({
    res <- results()
    req(res)
    par(mar = c(5, 5, 3, 1))

    plot(res$n, res$cov_freq, type = "b", pch = 16, col = "#3498db",
         lwd = 2.5, cex = 1.3,
         xlab = "Sample size", ylab = "Coverage (95% interval)",
         main = "Coverage probability",
         ylim = c(0.7, 1))
    lines(res$n, res$cov_bayes, type = "b", pch = 17, col = "#27ae60",
          lwd = 2.5, cex = 1.3)
    abline(h = 0.95, col = "gray50", lty = 2, lwd = 1.5)

    legend("bottomright", bty = "n", cex = 0.95,
           legend = c("Frequentist", "Bayesian", "Nominal 95%"),
           col = c("#3498db", "#27ae60", "gray50"),
           pch = c(16, 17, NA), lty = c(1, 1, 2), lwd = 2.5)
  })

  output$mc_info <- renderUI({
    res <- results()
    req(res)

    small <- res[res$n == min(res$n), ]
    big   <- res[res$n == max(res$n), ]

    tags$div(class = "info-box", HTML(paste0(
      "<b>At n=", small$n, ":</b><br>",
      "Freq MSE: ", round(small$mse_freq / 1e6, 2), "M<br>",
      "Bayes MSE: ", round(small$mse_bayes / 1e6, 2), "M<br>",
      "<hr style='margin:6px 0'>",
      "<b>At n=", big$n, ":</b><br>",
      "Freq MSE: ", round(big$mse_freq / 1e6, 2), "M<br>",
      "Bayes MSE: ", round(big$mse_bayes / 1e6, 2), "M<br>",
      "<hr style='margin:6px 0'>",
      "<small>Bayesian advantage is largest at small n, ",
      "vanishes at large n.</small>"
    )))
  })
}

shinyApp(ui, server)

What the Monte Carlo shows

  • Left panel (MSE): at small \(n\), the Bayesian estimator has lower MSE because the prior shrinks noisy estimates toward a reasonable value. At large \(n\), both converge — the data overwhelms the prior.
  • Right panel (coverage): both methods achieve roughly 95% coverage at large \(n\). At small \(n\), the Bayesian interval can have better coverage if the prior is reasonable, because it’s wider and better centered.
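The left-panel intuition can be reproduced in a few lines. A minimal sketch of the shrinkage effect (assumed numbers: true effect $1,200, a sampling SE of $600 standing in for a noisy small-\(n\) regression, prior \(N(1500, 800^2)\)):

```r
# Shrinkage in miniature: posterior means beat raw estimates on MSE
# when the prior mean is near the truth. All numbers are illustrative.
set.seed(1)
B   <- 5000
tau <- 1200                                   # true effect
se  <- 600                                    # sampling SE of the regression estimate
est <- rnorm(B, mean = tau, sd = se)          # simulated frequentist estimates

w    <- (1 / 800^2) / (1 / 800^2 + 1 / se^2)  # prior weight (here 0.36)
post <- w * 1500 + (1 - w) * est              # precision-weighted posterior means

mean((est  - tau)^2)   # frequentist MSE, around 360,000
mean((post - tau)^2)   # Bayesian MSE, around 160,000: small bias^2 + shrunken variance
```

The Bayesian estimator trades a small bias (the prior mean is $300 off the truth) for a large variance reduction (estimates are scaled by 0.64), which is exactly the small-\(n\) pattern in the MSE panel.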

Things to try

  • Good prior (prior mean near truth): Bayesian MSE is much lower at small \(n\). Free lunch from prior knowledge.
  • Bad prior (prior mean = 0, truth = 2000): Bayesian MSE is higher at small \(n\) — the prior pulls you away from truth. But at large \(n\), the data corrects it.
  • Very tight prior (SD = 200): amplifies both the benefit (when right) and the cost (when wrong). Very vague prior (SD = 2000): negligible difference from frequentist.

The bottom line

                                Frequentist                            Bayesian
  What you get                  Point estimate + CI                    Full posterior distribution
  Prior knowledge               Not formally used                      Incorporated via prior
  “Does the program work?”      “We reject \(H_0: \tau = 0\) at 5%”    \(P(\tau > 0 \mid \text{data}) = 0.97\)
  Small samples                 Noisy, wide CIs                        Stabilized by prior
  Large samples                 Gold standard                          Converges to frequentist
  Risk from the prior           None                                   Bad prior = bad estimate (at small \(n\))

In practice, the Bayesian approach is most valuable when:

  1. You have prior evidence — meta-analyses, pilot studies, or domain expertise that you want to formally incorporate
  2. Your sample is small — the prior provides regularization, reducing MSE
  3. You need probabilistic statements — “90% probability the effect exceeds $500” is more useful for policy than “we fail to reject at 5%”
  4. Sensitivity analysis matters — showing results are robust across priors is more convincing than a single p-value

When your sample is large and you have no strong prior information, the Bayesian and frequentist answers will be nearly identical. Use whichever your audience expects.


Did you know?

  • The LaLonde (1986) critique showed that non-experimental estimates of job training effects could differ wildly from experimental benchmarks. This launched the credibility revolution in economics — and is why we care so much about identification. A Bayesian prior doesn’t fix bad identification; it just stabilizes the estimate conditional on your assumptions being right.
  • The Bayesian approach to clinical trials is gaining ground at the FDA. Berry (2006) argued that Bayesian adaptive designs let you stop trials earlier (saving money and lives) by continuously updating the probability that a drug works. The same logic applies to program evaluation.
  • Rubin (1978) — the father of the potential outcomes framework — was himself a Bayesian. His framework for causal inference was motivated by Bayesian thinking about missing data: the untreated potential outcome for a treated unit is “missing,” and we can reason about it using posterior predictive distributions.