Potential Outcomes & ATE

The fundamental problem of causal inference

For every person, there are two potential outcomes:

  • \(Y_i(1)\): what happens if they get the treatment
  • \(Y_i(0)\): what happens if they don’t

The individual treatment effect is \(\tau_i = Y_i(1) - Y_i(0)\). The problem? We only ever observe one of these. A person is either treated or not — never both. The unobserved outcome is the counterfactual.

The Average Treatment Effect (ATE) is:

\[\text{ATE} = E[Y(1) - Y(0)]\]

Since we can’t observe both for anyone, we need assumptions (like random assignment) to estimate it.
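
Here’s a minimal sketch in R of the god’s-eye view (the numbers are arbitrary; the structure mirrors the interactive simulation below). Because we simulate, we get to see both potential outcomes:

set.seed(1)
n   <- 100000
y0  <- rnorm(n, mean = 5, sd = 2)   # Y(0): outcome without treatment
y1  <- y0 + 2                       # Y(1): outcome with treatment
tau <- y1 - y0                      # individual treatment effects
mean(tau)                           # true ATE = 2, computable only because we see both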

Assumptions

  1. SUTVA (Stable Unit Treatment Value Assumption): one person’s treatment doesn’t affect another person’s outcome (no interference between units), and there is only one version of the treatment
  2. Consistency: the observed outcome equals the potential outcome under the treatment actually received, \(Y_i = Y_i(1)\) if treated and \(Y_i = Y_i(0)\) if not (see the sketch after this list)
  3. Random assignment (for unbiased estimation): treatment is independent of potential outcomes — \(Y(0), Y(1) \perp D\)
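
In code, consistency is just a switching equation. A minimal sketch, continuing the example above (the variable names are mine):

d     <- rbinom(n, 1, 0.5)        # assumption 3: random assignment, a coin flip per unit
y_obs <- d * y1 + (1 - d) * y0    # assumption 2: we observe Y(1) if treated, Y(0) if not
# For every unit, the other potential outcome is now missing data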

Why randomization works

Start with the only thing we can compute from data — the difference in observed group means:

\[E[Y \mid D=1] - E[Y \mid D=0]\]

By consistency (\(Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0)\)), this equals:

\[E[Y(1) \mid D=1] - E[Y(0) \mid D=0]\]

Now add and subtract \(E[Y(0) \mid D=1]\):

\[\underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=1]}_{\text{ATT}} + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{Selection bias}}\]

This is the fundamental decomposition:

\[\text{Difference in means} = \text{ATT} + \text{Selection bias}\]

The selection bias term asks: would the treated group have had different outcomes even without treatment? If sicker people seek treatment, then \(E[Y(0) \mid D=1] < E[Y(0) \mid D=0]\) — the treated group would have done worse anyway — and the naive comparison underestimates the effect.
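
You can check the decomposition numerically. A sketch, reusing y0 and y1 from above, where sicker units (low \(Y(0)\)) are more likely to seek treatment:

p_sick <- 1 - pnorm(y0, mean = mean(y0), sd = sd(y0))      # P(treat) falls with Y(0)
d_sel  <- rbinom(n, 1, p_sick)
y_sel  <- d_sel * y1 + (1 - d_sel) * y0

diff_means <- mean(y_sel[d_sel == 1]) - mean(y_sel[d_sel == 0])
att        <- mean(y1[d_sel == 1]) - mean(y0[d_sel == 1])  # needs both outcomes
sel_bias   <- mean(y0[d_sel == 1]) - mean(y0[d_sel == 0])  # needs both outcomes
diff_means       # well below the true effect of 2
att + sel_bias   # identical: the decomposition is an algebraic identity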

Example: the college earnings premium. People who go to college earn more than those who don’t. But is that because college causes higher earnings, or because the same people who go to college — smart, motivated, from wealthier families — would have earned more anyway? That’s selection bias: \(E[Y(0) \mid D=1] > E[Y(0) \mid D=0]\). College-goers would have out-earned non-college-goers even without college. The naive earnings gap overestimates the causal effect of college because it bundles the true effect with the selection bias.

Under random assignment, selection bias = 0

If \(D\) is independent of potential outcomes — \(Y(0), Y(1) \perp D\) — then conditioning on \(D\) doesn’t matter:

\[E[Y(0) \mid D=1] = E[Y(0) \mid D=0] = E[Y(0)]\]

The selection bias term is exactly zero. And the ATT simplifies:

\[\text{ATT} = E[Y(1) \mid D=1] - E[Y(0) \mid D=1] = E[Y(1)] - E[Y(0)] = \text{ATE}\]

So under random assignment:

\[\boxed{\text{Difference in means} = \text{ATT} = \text{ATE}}\]

The naive estimator — just comparing group averages — gives you the causal effect. No modeling, no assumptions about functional form. That’s why randomization is the gold standard.
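
Numerically, with the coin-flip d and y_obs from the consistency sketch above:

mean(y0[d == 1]) - mean(y0[d == 0])          # selection bias term: ~0
mean(y_obs[d == 1]) - mean(y_obs[d == 0])    # naive difference in means: ~2, the true ATE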

Without randomization, the selection bias term is generally nonzero, and the difference in means ≠ ATE. Every method in this course is a strategy for eliminating or working around that selection bias.

The simulation below lets you see both potential outcomes (which you never get in real life), watch selection bias appear, and see how randomization kills it.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n", "Population size:",
                  min = 100, max = 2000, value = 500, step = 100),

      sliderInput("ate", "True ATE:",
                  min = -2, max = 5, value = 2, step = 0.5),

      selectInput("assign", "Treatment assignment:",
                  choices = c("Random (coin flip)",
                              "Self-selection (high Y0 seek treatment)",
                              "Self-selection (low Y0 seek treatment)")),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("po_plot", height = "400px")),
        column(6, plotOutput("obs_plot", height = "400px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n   <- input$n
    ate <- input$ate

    # Potential outcomes
    y0 <- rnorm(n, mean = 5, sd = 2)
    y1 <- y0 + ate + rnorm(n, sd = 0.5)

    # Assignment
    if (input$assign == "Random (coin flip)") {
      treat <- rbinom(n, 1, 0.5)
    } else if (input$assign == "Self-selection (high Y0 seek treatment)") {
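      # P(treatment) rises with Y(0): units who would do well anyway opt in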
      prob <- pnorm(y0, mean = mean(y0), sd = sd(y0))
      treat <- rbinom(n, 1, prob)
    } else {
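      # P(treatment) falls with Y(0): sicker units opt in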
      prob <- 1 - pnorm(y0, mean = mean(y0), sd = sd(y0))
      treat <- rbinom(n, 1, prob)
    }

    # Observed outcome
    y_obs <- ifelse(treat == 1, y1, y0)

    # Estimates
    naive <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])
    true_ate <- mean(y1 - y0)

    list(y0 = y0, y1 = y1, treat = treat, y_obs = y_obs,
         naive = naive, true_ate = true_ate, ate = ate)
  })

  output$po_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$y0, d$y1, pch = 16, cex = 0.6,
         col = ifelse(d$treat == 1, "#3498db80", "#e74c3c80"),
         xlab = "Y(0) — outcome without treatment",
         ylab = "Y(1) — outcome with treatment",
         main = "Both Potential Outcomes (God's view)")
    abline(0, 1, lty = 2, col = "gray40", lwd = 1.5)
    legend("topleft", bty = "n", cex = 0.85,
           legend = c("Treated", "Control", "45° line (no effect)"),
           col = c("#3498db", "#e74c3c", "gray40"),
           pch = c(16, 16, NA), lty = c(NA, NA, 2), lwd = c(NA, NA, 1.5))
  })

  output$obs_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    grp <- factor(d$treat, labels = c("Control", "Treated"))
    boxplot(d$y_obs ~ grp,
            col = c("#e74c3c40", "#3498db40"),
            border = c("#e74c3c", "#3498db"),
            main = "What we actually observe",
            ylab = "Observed Y", xlab = "")

    m0 <- mean(d$y_obs[d$treat == 0])
    m1 <- mean(d$y_obs[d$treat == 1])
    points(1:2, c(m0, m1), pch = 18, cex = 2.5, col = c("#e74c3c", "#3498db"))
  })

  output$results <- renderUI({
    d <- dat()
    bias <- d$naive - d$true_ate
    biased <- abs(bias) > 0.3

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True ATE:</b> ", round(d$true_ate, 3), "<br>",
        "<b>Naive estimate:</b> ", round(d$naive, 3), "<br>",
        "<b>Bias:</b> <span class='", ifelse(biased, "bad", "good"), "'>",
        round(bias, 3), "</span><br>",
        if (biased) "<br><small>Selection bias: treated & control groups aren't comparable.</small>"
        else "<br><small>Random assignment makes groups comparable.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Start with random assignment: the naive estimate is close to the true ATE.
  • Switch to self-selection (high Y0 seek treatment): people who would have done well anyway are the ones getting treated. The naive estimate is too high — that’s positive selection bias.
  • Switch to self-selection (low Y0 seek treatment): now the opposite. Sicker people seek treatment, making it look less effective than it is.
  • The left plot shows both potential outcomes — something you never see in real data. That’s the fundamental problem.

What if you can’t randomize?

In a true experiment (RCT):

  • You randomly assign people to treatment vs control
  • Because it’s random, the two groups are identical on average — so any difference in outcomes must be caused by the treatment

But most questions in economics, policy, and social science can’t be answered with an RCT. You can’t randomly assign poverty, or force some cities to build highways and others not to. So how do you estimate causal effects?

Natural experiments

A natural experiment is when something in the real world — a policy change, a rule, a geographic boundary, a disaster — creates treatment and control groups that are as-if randomly assigned. Nobody designed it as an experiment, but the logic is the same.

Example: Say the government announced “all tracts in counties with food desert score > X get a healthy food program.” Tracts at X+1 vs X−1 didn’t choose to be on different sides of that line — the cutoff did it for them. So comparing those tracts is like comparing treatment and control in an experiment.

  • The word “natural” = it happened in the real world, not in a lab.
  • The word “experiment” = it created as-if random variation in who got treated.

The whole point is to get around the selection bias problem the simulation above shows. If people (or firms, or cities) choose their treatment status, the naive comparison is biased. A natural experiment gives you variation that the units didn’t choose.
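
As a sketch of that logic (every name, number, and the cutoff here are hypothetical; this previews the regression discontinuity design covered later):

set.seed(7)
score   <- runif(5000, 0, 100)                # hypothetical food desert score
treated <- as.integer(score > 60)             # the rule assigns treatment, not the tracts
outcome <- 10 + 0.05 * score + 1.5 * treated + rnorm(5000)   # true effect = 1.5

near <- abs(score - 60) < 5                   # tracts just above vs just below the cutoff
mean(outcome[near & treated == 1]) - mean(outcome[near & treated == 0])
# about 1.75: the true effect (1.5) plus a little drift from the score trend,
# which regression discontinuity methods adjust for explicitly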

The rest of this course

Every method in this course is a strategy for exploiting natural experiments or otherwise correcting for selection bias:

| Method | The idea |
| --- | --- |
| Selection on Observables | Condition on everything that drives both treatment and outcome |
| IPW | Reweight observations so treated and control look similar on observables |
| Entropy Balancing | Directly balance covariates between groups without modeling the propensity score |
| Difference-in-Differences | Compare changes over time between treated and control groups |
| Instrumental Variables | Use exogenous variation to isolate the causal effect |
| Regression Discontinuity | Exploit a cutoff rule as a natural experiment |
| Synthetic Control | Build a weighted counterfactual from donor units |

Each one makes a different assumption about why the comparison is valid. The art of causal inference is choosing the right method for your setting — and being honest about when the assumptions fail.


In Stata

The simplest causal estimate — a difference in means under random assignment:

* Difference in means (t-test)
ttest outcome, by(treatment)

* Same thing as a regression (identical point estimate)
reg outcome treatment

* With controls (gains precision in an RCT, required under selection on observables)
reg outcome treatment x1 x2 x3

With random assignment, reg outcome treatment gives you an unbiased estimate of the ATE. The coefficient on treatment is the causal effect. Adding controls doesn’t change the estimate (in expectation) but can shrink the standard errors.
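
For comparison, the R equivalents, using the d and y_obs names from the sketches above (note the sign conventions):

t.test(y_obs ~ d)             # difference in means; reports control minus treated
coef(lm(y_obs ~ d))["d"]      # treated minus control: same magnitude, the causal effect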