Application: Returns to Education

Putting it all together: returns to education

You’ve learned the pieces — Bayes’ theorem, priors and posteriors, shrinkage, MCMC, Bayesian regression. Now let’s use them on a real question.

What’s the causal return to an extra year of education on wages?

This is one of the most studied questions in labor economics. The standard approach uses the Mincer equation:

\[ \log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + \beta_3 \cdot \text{exper}^2 + \varepsilon \]

We’ll simulate data from this model and walk through the full Bayesian workflow: estimate, interpret, check sensitivity, and compare to OLS.
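
Before the interactive version, here is that data-generating process in a few lines of plain R, with the true return set to 0.10 and lm() as the OLS benchmark. This is the same simulation the app below uses; the seed is arbitrary.

set.seed(42)                                   # arbitrary seed, for reproducibility
n     <- 200
educ  <- pmax(round(rnorm(n, 13, 2.5)), 8)     # years of education, floored at 8
exper <- round(pmax(rnorm(n, 15, 8), 0))       # years of experience
log_wage <- 1.0 + 0.10 * educ + 0.03 * exper - 0.0005 * exper^2 +
            rnorm(n, 0, 0.4)                   # true return: 0.10 log points

fit <- lm(log_wage ~ educ + exper + I(exper^2))   # Mincer regression
summary(fit)$coefficients["educ", ]               # estimate, SE, t, p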

What the literature says: Estimates of the return to education typically range from 5% to 15% per year (β₁ ≈ 0.05–0.15 in log points). Card (1999) found ~10% using instrumental variables. We’ll use this as our prior information.
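
One way to turn that range into a prior is a Normal centered at 0.10 whose central ~90% spans roughly 0.05 to 0.15. A prior SD of 0.03 (the app's default below) does this; a quick check in base R:

# Normal(0.10, 0.03) puts about 90% of its mass on the 5%-15% range
qnorm(c(0.05, 0.95), mean = 0.10, sd = 0.03)        # ~ [0.051, 0.149]
pnorm(0.15, 0.10, 0.03) - pnorm(0.05, 0.10, 0.03)   # ~ 0.90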


Simulation: Full Bayesian workflow

The app below walks through the full workflow. The left panel shows the simulated data with the OLS, Bayesian, and true regression lines. The right panel shows the posterior for the education coefficient alongside the prior and the OLS estimate. The bottom panel shows how the posterior shifts as you change the prior strength.

#| standalone: true
#| viewerHeight: 680

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 13px; line-height: 1.8;
    }
    .stats-box b { color: #2c3e50; }
    .good  { color: #27ae60; font-weight: bold; }
    .bad   { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      tags$h5("Data"),
      sliderInput("n", "Sample size (n):",
                  min = 30, max = 1000, value = 200, step = 10),

      sliderInput("true_beta", HTML("True &beta;<sub>educ</sub>:"),
                  min = 0.01, max = 0.20, value = 0.10, step = 0.01),

      tags$hr(),
      tags$h5("Prior"),

      sliderInput("prior_mean", HTML("Prior mean for &beta;<sub>educ</sub>:"),
                  min = 0, max = 0.20, value = 0.10, step = 0.01),

      sliderInput("prior_sd", HTML("Prior SD for &beta;<sub>educ</sub>:"),
                  min = 0.005, max = 0.10, value = 0.03, step = 0.005),

      actionButton("go", "New data", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("scatter_plot", height = "300px")),
        column(6, plotOutput("posterior_plot", height = "300px"))
      ),
      fluidRow(
        column(12, plotOutput("sensitivity_plot", height = "280px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n          <- input$n
    true_beta  <- input$true_beta
    prior_mean <- input$prior_mean
    prior_sd   <- input$prior_sd

    # Simulate Mincer equation
    educ  <- round(rnorm(n, 13, 2.5))   # years of education ~13 mean
    educ  <- pmax(educ, 8)               # floor at 8 years
    exper <- round(pmax(rnorm(n, 15, 8), 0))  # years of experience
    sigma <- 0.4                          # residual SD in log wages

    log_wage <- 1.0 + true_beta * educ + 0.03 * exper - 0.0005 * exper^2 +
                rnorm(n, 0, sigma)

    # OLS: regress log_wage on educ, exper, exper^2
    X <- cbind(1, educ, exper, exper^2)
    XtX_inv <- solve(t(X) %*% X)
    beta_ols <- as.numeric(XtX_inv %*% t(X) %*% log_wage)
    resid <- log_wage - X %*% beta_ols
    s2 <- sum(resid^2) / (n - 4)
    se_ols <- sqrt(s2 * diag(XtX_inv))

    # Extract education coefficient (index 2)
    b_ols  <- beta_ols[2]
    se_b   <- se_ols[2]

    # Bayesian update for the educ coefficient (conjugate normal):
    # treat the OLS estimate and its SE as a normal likelihood for beta_educ
    prior_prec <- 1 / prior_sd^2
    data_prec  <- 1 / se_b^2
    post_prec  <- prior_prec + data_prec
    post_sd    <- 1 / sqrt(post_prec)
    post_mean  <- (prior_mean * prior_prec + b_ols * data_prec) / post_prec

    w_prior <- prior_prec / post_prec
    w_data  <- data_prec / post_prec

    # CrI
    cri_lo <- qnorm(0.025, post_mean, post_sd)
    cri_hi <- qnorm(0.975, post_mean, post_sd)

    # CI
    ci_lo <- b_ols - 1.96 * se_b
    ci_hi <- b_ols + 1.96 * se_b

    # P(beta > 0 | data)
    p_positive <- 1 - pnorm(0, post_mean, post_sd)

    # P(beta > 0.05 | data)
    p_above_05 <- 1 - pnorm(0.05, post_mean, post_sd)

    # Sensitivity: how posterior mean changes with prior SD
    prior_sd_grid <- seq(0.005, 0.15, length.out = 100)
    sens_means <- numeric(length(prior_sd_grid))
    sens_lo    <- numeric(length(prior_sd_grid))
    sens_hi    <- numeric(length(prior_sd_grid))

    for (i in seq_along(prior_sd_grid)) {
      pp <- 1 / prior_sd_grid[i]^2
      posp <- pp + data_prec
      pm <- (prior_mean * pp + b_ols * data_prec) / posp
      ps <- 1 / sqrt(posp)
      sens_means[i] <- pm
      sens_lo[i] <- qnorm(0.025, pm, ps)
      sens_hi[i] <- qnorm(0.975, pm, ps)
    }

    list(educ = educ, log_wage = log_wage, exper = exper,
         beta_ols_all = beta_ols, se_ols_all = se_ols,
         b_ols = b_ols, se_b = se_b, ci_lo = ci_lo, ci_hi = ci_hi,
         prior_mean = prior_mean, prior_sd = prior_sd,
         post_mean = post_mean, post_sd = post_sd,
         cri_lo = cri_lo, cri_hi = cri_hi,
         w_prior = w_prior, w_data = w_data,
         p_positive = p_positive, p_above_05 = p_above_05,
         true_beta = true_beta, sigma = sigma,
         prior_sd_grid = prior_sd_grid, sens_means = sens_means,
         sens_lo = sens_lo, sens_hi = sens_hi)
  })

  # --- Left: scatterplot with OLS and Bayesian regression lines ---
  output$scatter_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    plot(d$educ, d$log_wage, pch = 16, col = "#2c3e5040", cex = 0.9,
         xlab = "Years of education", ylab = "log(wage)",
         main = "Data + Regression Lines")

    # Predict at education values, holding experience at mean
    educ_seq <- seq(min(d$educ), max(d$educ), length.out = 100)
    mean_exp <- mean(d$exper)

    # OLS fitted line (red)
    ols_pred <- d$beta_ols_all[1] + d$beta_ols_all[2] * educ_seq +
                d$beta_ols_all[3] * mean_exp + d$beta_ols_all[4] * mean_exp^2
    lines(educ_seq, ols_pred, col = "#e74c3c", lwd = 2.5)

    # Bayesian fitted line (blue): OLS intercept/exper coefficients with the
    # Bayesian educ slope swapped in (a simplification for display)
    bayes_pred <- d$beta_ols_all[1] + d$post_mean * educ_seq +
                  d$beta_ols_all[3] * mean_exp + d$beta_ols_all[4] * mean_exp^2
    lines(educ_seq, bayes_pred, col = "#3498db", lwd = 2.5)

    # True line (green dashed)
    true_pred <- 1.0 + d$true_beta * educ_seq +
                 0.03 * mean_exp - 0.0005 * mean_exp^2
    lines(educ_seq, true_pred, col = "#27ae60", lwd = 2, lty = 2)

    legend("topleft", bty = "n", cex = 0.8,
           legend = c(
             paste0("OLS: ", round(d$b_ols, 4)),
             paste0("Bayesian: ", round(d$post_mean, 4)),
             paste0("True: ", d$true_beta)
           ),
           col = c("#e74c3c", "#3498db", "#27ae60"),
           lwd = c(2.5, 2.5, 2), lty = c(1, 1, 2))
  })

  # --- Right: posterior density with OLS comparison ---
  output$posterior_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    # Range for plotting
    all_means <- c(d$prior_mean, d$b_ols, d$post_mean)
    all_sds   <- c(d$prior_sd, d$se_b, d$post_sd)
    xlim <- range(c(all_means - 3.5 * all_sds, all_means + 3.5 * all_sds))
    x_seq <- seq(xlim[1], xlim[2], length.out = 400)

    prior_y <- dnorm(x_seq, d$prior_mean, d$prior_sd)
    lik_y   <- dnorm(x_seq, d$b_ols, d$se_b)
    post_y  <- dnorm(x_seq, d$post_mean, d$post_sd)

    ylim <- c(0, max(c(prior_y, lik_y, post_y)) * 1.25)

    plot(NULL, xlim = xlim, ylim = ylim,
         xlab = expression(beta[educ]),
         ylab = "Density",
         main = expression("OLS vs Bayesian: " * beta[educ]))

    # Prior (red dashed)
    lines(x_seq, prior_y, col = "#e74c3c", lwd = 2, lty = 2)

    # Likelihood / OLS (gray dotted)
    lines(x_seq, lik_y, col = "gray50", lwd = 2, lty = 3)

    # Posterior (blue filled)
    polygon(c(x_seq, rev(x_seq)), c(post_y, rep(0, length(x_seq))),
            col = "#3498db30", border = NA)
    lines(x_seq, post_y, col = "#3498db", lwd = 2.5)

    # True value
    abline(v = d$true_beta, lty = 2, col = "#27ae60", lwd = 1.5)

    # OLS CI bracket (red)
    y_ci <- max(post_y) * 0.05
    arrows(d$ci_lo, y_ci, d$ci_hi, y_ci,
           code = 3, angle = 90, length = 0.04, lwd = 2, col = "#e74c3c")
    text(d$b_ols, y_ci + max(post_y) * 0.07, "95% CI", cex = 0.7,
         col = "#e74c3c", font = 2)

    # Bayesian CrI bracket (blue)
    y_cri <- max(post_y) * 0.15
    arrows(d$cri_lo, y_cri, d$cri_hi, y_cri,
           code = 3, angle = 90, length = 0.04, lwd = 2, col = "#3498db")
    text(d$post_mean, y_cri + max(post_y) * 0.07, "95% CrI", cex = 0.7,
         col = "#3498db", font = 2)

    legend("topright", bty = "n", cex = 0.75,
           legend = c("Prior", "OLS (likelihood)",
                      "Posterior", expression("True " * beta[educ])),
           col = c("#e74c3c", "gray50", "#3498db", "#27ae60"),
           lwd = c(2, 2, 2.5, 1.5),
           lty = c(2, 3, 1, 2))
  })

  # --- Bottom: sensitivity to prior strength ---
  output$sensitivity_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 2.5, 1))

    # Include the prior mean so its reference line is never clipped
    ylim <- range(c(d$sens_lo, d$sens_hi, d$b_ols, d$true_beta, d$prior_mean))

    plot(d$prior_sd_grid, d$sens_means, type = "l",
         lwd = 2.5, col = "#3498db",
         xlab = expression("Prior SD for " * beta[educ]),
         ylab = expression("Posterior mean of " * beta[educ]),
         main = "Sensitivity: How Much Does the Prior Matter?",
         ylim = ylim)

    # CrI band
    polygon(c(d$prior_sd_grid, rev(d$prior_sd_grid)),
            c(d$sens_lo, rev(d$sens_hi)),
            col = "#3498db15", border = NA)
    lines(d$prior_sd_grid, d$sens_lo, col = "#3498db50", lty = 2)
    lines(d$prior_sd_grid, d$sens_hi, col = "#3498db50", lty = 2)

    # OLS reference
    abline(h = d$b_ols, col = "#e74c3c", lwd = 1.5, lty = 3)

    # True value
    abline(h = d$true_beta, col = "#27ae60", lwd = 1.5, lty = 2)

    # Prior mean reference
    abline(h = d$prior_mean, col = "#f39c12", lwd = 1.5, lty = 3)

    # Current prior SD
    abline(v = d$prior_sd, col = "#2c3e50", lwd = 1, lty = 3)
    points(d$prior_sd, d$post_mean, pch = 16, col = "#2c3e50", cex = 1.5)

    legend("bottomright", bty = "n", cex = 0.75,
           legend = c("Posterior mean", "95% CrI",
                      "OLS estimate", expression("True " * beta),
                      "Prior mean", "You are here"),
           col = c("#3498db", "#3498db50", "#e74c3c",
                   "#27ae60", "#f39c12", "#2c3e50"),
           lwd = c(2.5, 1.5, 1.5, 1.5, 1.5, NA),
           lty = c(1, 2, 3, 2, 3, NA),
           pch = c(NA, NA, NA, NA, NA, 16))
  })

  output$results <- renderUI({
    d <- dat()

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>OLS:</b> ", round(d$b_ols, 4),
        " [", round(d$ci_lo, 4), ", ", round(d$ci_hi, 4), "]<br>",
        "<b>Posterior:</b> ", round(d$post_mean, 4),
        " [", round(d$cri_lo, 4), ", ", round(d$cri_hi, 4), "]<br>",
        "<b>True:</b> ", d$true_beta, "<br>",
        "<hr style='margin:8px 0'>",
        "<b>Prior weight:</b> ", round(d$w_prior * 100, 1), "%<br>",
        "<b>Data weight:</b> ", round(d$w_data * 100, 1), "%<br>",
        "<hr style='margin:8px 0'>",
        "<b>P(&beta; > 0 | data):</b> ",
        round(d$p_positive, 4), "<br>",
        "<b>P(&beta; > 0.05 | data):</b> ",
        round(d$p_above_05, 4)
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Strong correct prior (prior mean = 0.10, prior SD = 0.02, true β = 0.10): the CrI is tighter than the OLS CI. Incorporating good prior information genuinely improves precision. This is the Bayesian advantage.
  • Wrong prior + small n (prior mean = 0.02, true β = 0.10, n = 30): the posterior is pulled toward 0.02 — the prior biases the estimate. Now increase n to 500 — the data overwhelm the wrong prior.
  • Large n (n = 1000): the posterior and OLS converge regardless of the prior. The bottom panel shows the sensitivity curve flattening; the prior SD barely matters.
  • P(β > 0 | data): this direct probability statement is impossible with frequentist methods. A 95% CI of [0.03, 0.12] doesn’t tell you the probability that β > 0. The posterior does.
  • Bottom panel insight: as the prior SD increases (the prior gets vaguer), the posterior mean converges to the OLS estimate. Moving left along the x-axis means trusting the prior more, and the curve traces the resulting bias-precision tradeoff; the sketch after this list strips that update to its core.
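
The entire update behind these behaviors is precision weighting: precisions add, and the posterior mean is a precision-weighted average of the prior mean and the OLS estimate. Here it is stripped to its core, with an illustrative OLS estimate of 0.09 and standard error of 0.02 standing in for whatever the app produced:

# Conjugate normal update: precisions add, means are precision-weighted
prior_mean <- 0.10; prior_sd <- 0.03   # literature-based prior
b_ols <- 0.09;      se_b    <- 0.02    # illustrative OLS estimate and SE

prior_prec <- 1 / prior_sd^2
data_prec  <- 1 / se_b^2
post_mean  <- (prior_mean * prior_prec + b_ols * data_prec) /
              (prior_prec + data_prec)           # precision-weighted average
post_sd    <- 1 / sqrt(prior_prec + data_prec)   # tighter than either source

# Direct probability statements, unavailable from a frequentist CI
1 - pnorm(0,    post_mean, post_sd)   # P(beta > 0    | data)
1 - pnorm(0.05, post_mean, post_sd)   # P(beta > 0.05 | data)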

The Stata workflow

Here’s the complete analysis in Stata, mirroring each step above:

* ----- Step 1: Look at the data -----
summarize wage education experience
scatter wage education

* ----- Step 2: OLS -----
gen log_wage = log(wage)
gen exper2 = experience * experience
reg log_wage education experience exper2

* ----- Step 3: Bayesian with default (vague) priors -----
bayes: reg log_wage education experience exper2

* ----- Step 4: Bayesian with informative prior -----
* Center the prior on the ~10% return from the literature
bayes, prior({log_wage:education}, normal(0.10, 0.01)) : ///
    reg log_wage education experience exper2

* Note: Stata's normal() takes the variance as its second argument:
* normal(0.10, 0.01) means prior variance 0.01, so prior SD = 0.1,
* centered at 10% but still fairly diffuse. For a tighter prior
* matching the app's default (SD = 0.03), use normal(0.10, 0.0009)

* ----- Step 5: Check convergence -----
bayesgraph diagnostics {log_wage:education}
* Look for a stationary, well-mixed trace and low autocorrelation;
* check effective sample size with bayesstats ess

* ----- Step 6: Compare models -----
bayesstats ic
* Reports DIC and log marginal likelihood; store each fit with
* estimates store to compare against the default-prior model
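
* ----- Step 7 (optional): Direct posterior probabilities -----
* A sketch, assuming the bayes model above is the active estimation:
* bayestest interval reports the posterior probability that a
* parameter lies in a range; with only lower() given, the upper
* bound defaults to +infinity
bayestest interval {log_wage:education}, lower(0.05)
* P(beta_educ > 0.05 | data): the statement a CI cannot make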

Interpreting Stata output: The key columns are Mean (posterior mean), Std. Dev. (posterior SD), and the Equal-tailed credible interval. Compare these directly to the OLS coefficient, standard error, and confidence interval. With vague priors and moderate n, they’ll be nearly identical.


What did we learn?

  1. With a reasonable prior, Bayesian and OLS agree — especially with large samples. The prior adds almost no bias but can reduce uncertainty.

  2. The prior adds value when samples are small or when you have genuine outside information (literature, expert knowledge, previous studies).

  3. The posterior gives direct probability statements that OLS can’t: \(P(\beta > 0.05 \mid \text{data})\) answers the question “how confident should I be that the return exceeds 5%?” directly.

  4. Sensitivity analysis is easy and important. The app’s bottom panel shows exactly how your conclusions depend on the prior. If the answer doesn’t change much across reasonable priors, your results are robust (see the sketch below).
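
As a closing illustration, here is that sensitivity check as a standalone loop, reusing the conjugate update from earlier with the same illustrative OLS numbers (estimate 0.09, SE 0.02). It is a condensed version of what the app’s bottom panel computes.

# Sweep the prior SD and watch the posterior mean move toward OLS
prior_mean <- 0.10
b_ols <- 0.09; se_b <- 0.02           # illustrative OLS results
data_prec <- 1 / se_b^2

prior_sd_grid <- seq(0.005, 0.15, length.out = 30)
post_means <- sapply(prior_sd_grid, function(s) {
  pp <- 1 / s^2                       # prior precision
  (prior_mean * pp + b_ols * data_prec) / (pp + data_prec)
})

plot(prior_sd_grid, post_means, type = "l",
     xlab = "Prior SD", ylab = "Posterior mean")
abline(h = b_ols, lty = 3)            # vague-prior limit: the OLS estimate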


Did you know?

  • Mincer (1974) introduced the log-linear earnings equation in Schooling, Experience, and Earnings. The specification — log wages regressed on education and a quadratic in experience — became the workhorse of labor economics. Nearly every study of returns to education starts from this framework.

  • Card (1999, 2001) used instrumental variables (geographic proximity to colleges) to address the endogeneity of education, finding returns of about 10% per year. His work highlighted that OLS may underestimate the return for marginal students — those on the fence about attending college.

  • Bayesian approaches in development economics: researchers studying returns to education in developing countries often face small samples and noisy data. Informative priors based on findings from similar countries can stabilize estimates — a natural application of the framework you just practiced.