The Oregon Experiment

A natural randomized experiment

In 2008, Oregon wanted to expand its Medicaid program but had limited funds. The state opened a waiting list and received about 90,000 names for roughly 10,000 slots. Oregon used a lottery to determine who got the chance to enroll. This created a natural randomized experiment: lottery winners had the opportunity to gain Medicaid coverage, lottery losers did not.

Finkelstein et al. (2012) and a series of follow-up papers used this lottery to estimate the causal effect of Medicaid on a range of outcomes.

Key findings

Healthcare utilization. Medicaid increased overall utilization — more doctor visits, more prescriptions, more preventive care. Notably, it increased emergency room visits by about 40%, contrary to the expectation that insurance would reduce ER use by shifting care to primary care settings. This finding (Taubman et al., 2014) was politically inconvenient and widely debated.

Financial strain. Medicaid substantially reduced financial hardship. Catastrophic out-of-pocket spending fell, medical debt declined, and the probability of having a medical bill sent to collections dropped sharply.

Self-reported health. Medicaid improved self-reported health and reduced rates of depression; the depression effect was large enough to be clinically as well as statistically significant.

Measured physical health. The most controversial finding: Medicaid had no statistically significant effect on measured physical health markers — blood pressure, cholesterol, or glycated hemoglobin (diabetes control) — after about two years. This null result was widely debated: does it mean Medicaid does not improve health, or was the study underpowered to detect clinically meaningful effects?

Methodology: ITT vs LATE

The lottery was not a simple randomization of Medicaid coverage. Winning the lottery gave you the opportunity to apply for Medicaid — but not everyone who won actually enrolled (some did not complete the application, some turned out to be ineligible). This is noncompliance.

The relevant estimands are:

  • Intent-to-Treat (ITT): the effect of winning the lottery on the outcome. This is straightforward to estimate and valid regardless of noncompliance.
  • First stage: the effect of winning the lottery on Medicaid enrollment, i.e., the difference in enrollment rates between winners and losers. This equals the share of compliers.
  • Local Average Treatment Effect (LATE): the effect of actually enrolling in Medicaid on the outcome, estimated via instrumental variables with the lottery as the instrument. Under the standard IV assumptions (random assignment, exclusion, monotonicity), LATE = ITT / First Stage (the Wald estimator).

LATE gives us the effect for compliers — people who enrolled when they won but would not have enrolled otherwise. It does not tell us about always-takers (who would have found insurance anyway) or never-takers (who would not enroll even if selected).
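These three quantities can be computed by hand from simulated data. A minimal sketch of the Wald estimator under noncompliance (the 30% complier share, 5% always-taker share, $500 effect, and noise level are illustrative choices, not Oregon's actual numbers):

```r
set.seed(1)
n <- 100000
Z <- rbinom(n, 1, 0.5)  # lottery win (randomized)

# Principal strata: compliers enroll iff they win; always-takers
# enroll regardless; never-takers never enroll
type <- sample(c("complier", "always", "never"), n, replace = TRUE,
               prob = c(0.30, 0.05, 0.65))
D <- as.integer(type == "always" | (type == "complier" & Z == 1))

# Outcome: enrollment raises spending by $500 (the true LATE here)
Y <- 3000 + 500 * D + rnorm(n, 0, 5000)

itt         <- mean(Y[Z == 1]) - mean(Y[Z == 0])  # effect of winning
first_stage <- mean(D[Z == 1]) - mean(D[Z == 0])  # share of compliers
late        <- itt / first_stage                  # Wald estimator

round(c(ITT = itt, first_stage = first_stage, LATE = late), 2)
```

With a first stage of about 0.30, the ITT is roughly 0.30 × $500 = $150, and dividing by the first stage recovers an estimate near the $500 effect for compliers.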

#| standalone: true
#| viewerHeight: 700

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
    .info-box {
      background: #ebf5fb; border-radius: 6px; padding: 12px;
      margin-top: 10px; font-size: 13px;
    }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("comply", "Compliance rate (%):",
                  min = 10, max = 100, value = 30, step = 5),

      sliderInput("true_late", "True LATE of Medicaid ($):",
                  min = -500, max = 2000, value = 500, step = 50),

      sliderInput("n_exp", "Sample size:",
                  min = 500, max = 20000, value = 5000, step = 500),

      actionButton("resim_exp", "New draw", class = "btn-primary", width = "100%"),

      tags$hr(),
      uiOutput("info_box")
    ),

    mainPanel(
      width = 9,
      plotOutput("iv_plot", height = "280px"),
      plotOutput("power_plot", height = "300px")
    )
  )
)

server <- function(input, output, session) {

  sim <- reactive({
    input$resim_exp
    comp   <- input$comply / 100
    late   <- input$true_late
    n      <- input$n_exp

    set.seed(sample(1:10000, 1))

    # Generate data
    # Z = lottery win (randomized)
    Z <- rbinom(n, 1, 0.5)

    # D = Medicaid enrollment
    # Compliers: D = Z; Always-takers: D = 1; Never-takers: D = 0
    # For simplicity: P(D=1|Z=1) = comp, P(D=1|Z=0) = 0.05 (some always-takers)
    at_rate <- 0.05  # always-taker rate
    D <- rep(0, n)
    D[Z == 1] <- rbinom(sum(Z == 1), 1, comp)
    D[Z == 0] <- rbinom(sum(Z == 0), 1, at_rate)

    # Y = outcome (e.g., healthcare spending)
    # Y = baseline + LATE * D + noise
    baseline <- 3000
    noise_sd <- 5000
    Y <- baseline + late * D + rnorm(n, 0, noise_sd)

    # ITT: effect of Z on Y
    itt_est <- mean(Y[Z == 1]) - mean(Y[Z == 0])
    itt_se  <- sqrt(var(Y[Z == 1]) / sum(Z == 1) + var(Y[Z == 0]) / sum(Z == 0))
    itt_t   <- itt_est / itt_se
    itt_p   <- 2 * (1 - pnorm(abs(itt_t)))

    # First stage: effect of Z on D
    fs_est <- mean(D[Z == 1]) - mean(D[Z == 0])
    fs_se  <- sqrt(var(D[Z == 1]) / sum(Z == 1) + var(D[Z == 0]) / sum(Z == 0))
    fs_t   <- fs_est / fs_se
    fs_f   <- fs_t^2  # F-stat

    # LATE = ITT / first stage
    late_est <- itt_est / fs_est
    # Approximate SE via the delta method, ignoring the sampling
    # error in the first stage (a common simplification)
    late_se  <- abs(itt_se / fs_est)
    late_t   <- late_est / late_se
    late_p   <- 2 * (1 - pnorm(abs(late_t)))

    # CI
    itt_ci  <- itt_est + c(-1, 1) * 1.96 * itt_se
    late_ci <- late_est + c(-1, 1) * 1.96 * late_se

    list(itt_est = itt_est, itt_se = itt_se, itt_p = itt_p, itt_ci = itt_ci,
         fs_est = fs_est, fs_se = fs_se, fs_f = fs_f,
         late_est = late_est, late_se = late_se, late_p = late_p, late_ci = late_ci,
         true_late = late, true_itt = late * (comp - at_rate),
         comp = comp, n = n, Z = Z, D = D, Y = Y)
  })

  output$iv_plot <- renderPlot({
    s <- sim()

    par(mfrow = c(1, 3), mar = c(4, 4.5, 3, 1))

    # Panel 1: ITT
    cols_itt <- c("#3498db", "#e74c3c")
    means_itt <- c(mean(s$Y[s$Z == 0]), mean(s$Y[s$Z == 1]))
    bp1 <- barplot(means_itt / 1000, names.arg = c("Lost", "Won"),
            col = cols_itt, main = "ITT: Lottery -> Outcome",
            ylab = "Mean outcome ($1000s)",
            ylim = c(0, max(means_itt) / 1000 * 1.3),
            cex.lab = 1)
    text(bp1, means_itt / 1000 + max(means_itt) / 1000 * 0.05,
         paste0("$", round(means_itt)), cex = 0.8, font = 2)

    # Bracket with ITT
    mid_x <- mean(bp1)
    top_y <- max(means_itt) / 1000 * 1.15
    segments(bp1[1], top_y, bp1[2], top_y, lwd = 2, col = "#2c3e50")
    text(mid_x, top_y + max(means_itt) / 1000 * 0.05,
         paste0("ITT = $", round(s$itt_est)),
         col = "#2c3e50", font = 2, cex = 0.85)

    # Panel 2: First stage
    enroll_rates <- c(mean(s$D[s$Z == 0]), mean(s$D[s$Z == 1]))
    bp2 <- barplot(enroll_rates * 100, names.arg = c("Lost", "Won"),
            col = cols_itt, main = "First Stage: Lottery -> Enrollment",
            ylab = "Enrollment rate (%)",
            ylim = c(0, 100),
            cex.lab = 1)
    text(bp2, enroll_rates * 100 + 5,
         paste0(round(enroll_rates * 100, 1), "%"), cex = 0.8, font = 2)

    # F-stat
    f_col <- ifelse(s$fs_f > 10, "#27ae60", "#e74c3c")
    text(mean(bp2), 90,
         paste0("F = ", round(s$fs_f, 1)),
         col = f_col, font = 2, cex = 1)

    # Panel 3: LATE
    est_vals <- c(s$true_late, s$late_est)
    est_cols <- c("#7f8c8d", "#2c3e50")
    bp3 <- barplot(est_vals, names.arg = c("True LATE", "Estimated\nLATE"),
            col = est_cols, main = "LATE = ITT / First Stage",
            ylab = "Effect on outcome ($)",
            ylim = c(min(0, min(est_vals)) * 1.3,
                     max(0, max(est_vals)) * 1.3),
            cex.lab = 1)
    text(bp3, est_vals + max(abs(est_vals)) * 0.08,
         paste0("$", round(est_vals)), cex = 0.8, font = 2)

    # Error bars on LATE estimate
    arrows(bp3[2], s$late_ci[1], bp3[2], s$late_ci[2],
           code = 3, angle = 90, length = 0.1, lwd = 2, col = "#2c3e50")
  })

  output$power_plot <- renderPlot({
    s <- sim()

    # Show how LATE precision depends on compliance rate
    comp_seq <- seq(0.1, 1, by = 0.05)
    noise_sd <- 5000

    # SE of ITT is roughly constant
    itt_se_approx <- noise_sd * sqrt(4 / s$n)
    late_se_approx <- itt_se_approx / (comp_seq - 0.05)
    late_se_approx[late_se_approx < 0 | late_se_approx > 50000] <- NA

    par(mfrow = c(1, 2), mar = c(5, 5, 3, 2))

    # Panel 1: LATE SE vs compliance
    plot(comp_seq * 100, late_se_approx, type = "l", lwd = 3, col = "#e74c3c",
         xlab = "Compliance rate (%)",
         ylab = "SE of LATE estimate ($)",
         main = "LATE Precision vs Compliance",
         ylim = c(0, min(5000, max(late_se_approx, na.rm = TRUE))),
         cex.lab = 1.1)

    # Mark current compliance
    curr_se <- itt_se_approx / max(0.01, s$comp - 0.05)
    if (curr_se < 5000) {
      points(s$comp * 100, curr_se, pch = 19, cex = 2, col = "#2c3e50")
      text(s$comp * 100 + 3, curr_se,
           paste0("SE = $", round(curr_se)),
           col = "#2c3e50", cex = 0.85, adj = 0)
    }

    abline(h = abs(s$true_late), lty = 2, col = "#27ae60", lwd = 1.5)
    text(80, abs(s$true_late) + 100, "|True LATE|", col = "#27ae60", cex = 0.8)

    # Panel 2: CI width vs compliance
    ci_width <- 2 * 1.96 * late_se_approx
    plot(comp_seq * 100, ci_width, type = "l", lwd = 3, col = "#3498db",
         xlab = "Compliance rate (%)",
         ylab = "95% CI width ($)",
         main = "Confidence Interval Width",
         ylim = c(0, min(15000, max(ci_width, na.rm = TRUE))),
         cex.lab = 1.1)

    curr_ci_w <- 2 * 1.96 * curr_se
    if (curr_ci_w < 15000) {
      points(s$comp * 100, curr_ci_w, pch = 19, cex = 2, col = "#2c3e50")
      text(s$comp * 100 + 3, curr_ci_w,
           paste0("Width = $", round(curr_ci_w)),
           col = "#2c3e50", cex = 0.85, adj = 0)
    }

    # Show that CI must be narrower than 2*|LATE| to be "significant"
    abline(h = 2 * abs(s$true_late), lty = 2, col = "#e74c3c", lwd = 1.5)
    text(80, 2 * abs(s$true_late) + 200,
         "Need CI < this for significance", col = "#e74c3c", cex = 0.75)
  })

  output$info_box <- renderUI({
    s <- sim()

    sig_itt  <- ifelse(s$itt_p < 0.05, "<span class='good'>Yes</span>",
                       "<span class='bad'>No</span>")
    sig_late <- ifelse(s$late_p < 0.05, "<span class='good'>Yes</span>",
                       "<span class='bad'>No</span>")
    strong_iv <- ifelse(s$fs_f > 10, "<span class='good'>Yes (F > 10)</span>",
                        "<span class='bad'>No (F < 10)</span>")

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>ITT:</b> $", round(s$itt_est),
        " (SE ", round(s$itt_se), ")<br>",
        "Significant? ", sig_itt, "<br>",
        "<hr style='margin:8px 0'>",
        "<b>First stage:</b> ", round(s$fs_est, 3), "<br>",
        "Strong instrument? ", strong_iv, "<br>",
        "<hr style='margin:8px 0'>",
        "<b>LATE:</b> $", round(s$late_est),
        " (SE ", round(s$late_se), ")<br>",
        "Significant? ", sig_late, "<br>",
        "True LATE: $", s$true_late, "<br>",
        "<hr style='margin:8px 0'>",
        "95% CI: [$", round(s$late_ci[1]),
        ", $", round(s$late_ci[2]), "]"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Set compliance to 30% (close to Oregon’s actual rate): with the 5% always-taker rate, the first stage is 0.25, so the LATE is about four times the ITT, and the confidence interval is correspondingly wide. This is why the Oregon study struggled with power for some outcomes.
  • Set compliance to 100%: the first stage is nearly 1 (0.95, given the always-takers), so LATE ≈ ITT. Little amplification, tight CI.
  • Set compliance to 10%: the LATE blows up — dividing by a tiny first stage makes the estimate extremely noisy. This is the weak instrument problem applied to noncompliance.
  • Set true LATE to 0 and draw multiple samples: about 5% of draws will look “significant” at the 5% level purely by chance, but the estimates are centered on zero.
  • Increase sample size to 20,000: even with low compliance, precision improves. Oregon had ~75,000 people, which is why it could detect some effects despite low compliance.
  • Notice the bottom panels: they show that LATE precision deteriorates rapidly as compliance falls. This is the fundamental challenge of experiments with noncompliance.
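The precision loss the bottom panels display can be checked directly. Under the Wald formula, the standard error of the LATE is roughly SE(ITT) divided by the first stage, so a first stage of 0.25 quadruples the standard error relative to full compliance. A back-of-the-envelope sketch (the sample size and noise level are illustrative, matching the app's defaults rather than Oregon's data):

```r
# How the LATE standard error scales with the first stage.
# SE(ITT) is pinned down by n and outcome noise; dividing by a
# small first stage inflates it one-for-one.
n        <- 5000
noise_sd <- 5000
se_itt   <- noise_sd * sqrt(4 / n)   # two equal-sized lottery arms

first_stage <- c(0.10, 0.25, 0.50, 1.00)
se_late     <- se_itt / first_stage

data.frame(first_stage, se_late = round(se_late))
```

Cutting the first stage from 1.00 to 0.10 multiplies the standard error by ten, which is exactly the "weak instrument" blow-up the slider demonstrates.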

The debate over null health results

The Oregon experiment’s null finding on physical health measures sparked intense debate:

“Medicaid doesn’t work” interpretation. Some argued that if Medicaid does not improve blood pressure, cholesterol, or diabetes control, it is not worth expanding. This interpretation was used by ACA opponents.

Power critique. Baicker et al. (2013) noted that the study could not rule out clinically meaningful improvements. The confidence intervals for blood pressure included reductions of 2–3 mm Hg — effects that would be important at the population level. The study was underpowered for physical health outcomes, partly because of low compliance (~30%) and partly because the sample was relatively young and healthy.
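The power critique can be made concrete with a stylized minimum-detectable-effect (MDE) calculation. The inputs below (outcome SD, arm size) are assumed for illustration, not the study's actual values; the point is the mechanics, under the usual rule of thumb that MDE ≈ 2.8 × SE for 80% power at a 5% two-sided level:

```r
# Stylized MDE calculation for a blood-pressure outcome.
# All inputs are illustrative assumptions, not Oregon's actual data.
sd_bp       <- 17      # assumed SD of systolic blood pressure (mm Hg)
n_arm       <- 6000    # assumed people per lottery arm
first_stage <- 0.25    # roughly Oregon's effective compliance

se_itt  <- sd_bp * sqrt(2 / n_arm)   # SE of the ITT difference in means
se_late <- se_itt / first_stage      # Wald scaling under noncompliance
mde     <- 2.8 * se_late             # ~80% power, 5% two-sided test

round(c(se_itt = se_itt, se_late = se_late, mde_mmHg = mde), 2)
```

With these assumed numbers the MDE lands in the 3–4 mm Hg range, so effects of 2–3 mm Hg, meaningful at the population level, could not be ruled out.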

Timeframe critique. Two years may not be long enough. Chronic disease management requires sustained access to care, and the benefits of blood pressure control accumulate over decades.

The right outcomes. Mental health improved significantly. Financial strain fell dramatically. If the goal of insurance is to provide financial protection and peace of mind — not just to change clinical biomarkers — then the Oregon results are consistent with insurance “working.”

This debate mirrors the broader question of what insurance is for. Is it a health intervention (measured by clinical outcomes) or a financial product (measured by risk protection)? The answer has major implications for Medicaid expansion policy. See Moral Hazard for the welfare theory behind this distinction.


Did you know?

  • About 90,000 people signed up for Oregon’s Medicaid lottery in a matter of weeks — for roughly 10,000 slots. This massive oversubscription is what made the experiment possible: it created a large randomized control group.

  • Taubman et al. (2014) found that Medicaid increased ER visits by about 40%. This contradicted the common policy argument that expanding insurance would reduce ER use by giving people a “medical home” in primary care. The result suggests that insurance reduces the cost barrier to all types of care, including the ER — consistent with standard price theory and the moral hazard predictions discussed on the Moral Hazard page.

  • The debate about whether the null physical health results mean Medicaid “doesn’t work” is partly a statistical power issue. The study had a compliance rate of only about 25–30% (many lottery winners did not enroll). Using the IV/LATE framework, this means dividing a small ITT by a small first stage, producing wide confidence intervals. A larger sample or higher compliance could have detected smaller effects.

  • Amy Finkelstein, the lead researcher, has emphasized that the Oregon results should not be interpreted as “insurance doesn’t matter.” The financial protection and mental health benefits were large and statistically significant. The null results were specific to a handful of measured physical health biomarkers over a short follow-up period.