Selection on Observables

The idea

You want the causal effect of a treatment, but people select into treatment based on their characteristics. Sicker patients seek medication, motivated students enroll in programs, richer firms adopt new technology.

The selection on observables strategy says: if you can observe everything that drives both treatment and outcome, you can condition on it and recover the causal effect. Once you hold those variables fixed, treatment is as good as random.

\[Y(0), Y(1) \perp D \mid X\]

This is the conditional independence assumption (CIA), also called unconfoundedness or ignorability. It says: among people with the same \(X\), who gets treated is effectively random.
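
Why does conditioning work? Under the CIA (plus overlap, defined below), treated and untreated units with the same \(X\) are comparable, so their mean difference identifies the effect at that \(X\); averaging over the distribution of \(X\) recovers the ATE:

\[E[Y(1) - Y(0)] = E\big[\, E[Y \mid X, D = 1] - E[Y \mid X, D = 0] \,\big].\]

Every estimator discussed below is a different recipe for computing this same quantity.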

How is this different from “just running a regression”?

It’s the same logic, made precise. When you run \(Y = \alpha + \tau D + \beta X + \varepsilon\) and claim \(\hat{\tau}\) is causal, you’re implicitly assuming selection on observables — that \(X\) contains all the confounders. The difference is that causal inference makes this assumption explicit and offers multiple ways to implement it, each with different strengths:

| Method | How it adjusts for X |
|---|---|
| Regression adjustment | Models the outcome as a function of X and D |
| Matching | Pairs treated and control units with similar X |
| IPW | Reweights units by their probability of treatment given X |
| Entropy balancing | Directly reweights controls to match the treated group's X distribution |
| Doubly robust | Combines regression and weighting; consistent if either model is correct |

All of these rely on the same fundamental assumption. They differ in how they use X to make the comparison fair.
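
To see that concretely, here is a minimal base-R sketch (data-generating process and all numbers made up for illustration) where the CIA holds by construction. Regression adjustment and IPW both land near the true effect of 2:

# Simulated data where x is the only confounder, so the CIA holds
set.seed(1)
n <- 5000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))       # treatment probability rises with x
y <- 1 + 2 * x + 2 * d + rnorm(n)  # true treatment effect = 2

# Regression adjustment: model the outcome as a function of d and x
coef(lm(y ~ d + x))["d"]

# IPW: reweight units by their estimated probability of treatment
ps <- fitted(glm(d ~ x, family = binomial))
w  <- d / ps + (1 - d) / (1 - ps)  # ATE weights
weighted.mean(y[d == 1], w[d == 1]) - weighted.mean(y[d == 0], w[d == 0])

Both estimates should sit close to 2; the naive difference in means, mean(y[d == 1]) - mean(y[d == 0]), will not.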

Assumptions

  1. Conditional independence (CIA): all confounders are observed and included in X. If an unobserved variable affects both treatment and outcome, every method above is biased. This is untestable — you argue it based on institutional knowledge.
  2. Overlap (common support): for every value of X there are both treated and untreated units, i.e. \(0 < P(D = 1 \mid X) < 1\). If some covariate profiles always (or never) get treated, you can't estimate the counterfactual for them. A quick diagnostic is sketched after this list.
  3. SUTVA: one unit’s treatment doesn’t affect another’s outcome.
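
Assumption 2 is the one you can probe directly in the data. A minimal sketch (assuming a data frame df with a 0/1 treatment d and covariates x1, x2; all names made up for illustration):

# Estimate propensity scores and compare their distributions by group
ps <- fitted(glm(d ~ x1 + x2, family = binomial, data = df))
summary(ps[df$d == 1])
summary(ps[df$d == 0])

# Overlaid histograms: thin or empty regions in one group signal
# poor common support
hist(ps[df$d == 0], breaks = seq(0, 1, 0.05), col = rgb(0, 0, 1, 0.4),
     xlim = c(0, 1), main = "Propensity score overlap",
     xlab = "P(D = 1 | X)")
hist(ps[df$d == 1], breaks = seq(0, 1, 0.05), col = rgb(1, 0, 0, 0.4),
     add = TRUE)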

When does selection on observables fail?

When there are unobserved confounders — variables that affect both treatment and outcome but aren’t in your data. No amount of regression, matching, or weighting can fix this.

Examples:

  • Returns to education: ability is unobserved. More able people get more education and earn more. Controlling for test scores helps but doesn’t fully capture ability. → You need IV.
  • Effect of a new policy: states that adopt the policy may differ in unobservable ways (political will, citizen preferences). → You need DID or synthetic control.
  • Effect of a drug: patients who take the drug may be sicker in ways the chart doesn’t capture. → You need an RCT.

The simulation below shows what happens when the CIA holds vs when it doesn’t.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n_so", "Sample size:",
                  min = 200, max = 2000, value = 500, step = 100),

      sliderInput("ate_so", "True ATE:",
                  min = 0, max = 5, value = 2, step = 0.5),

      sliderInput("obs_conf", "Observed confounding (X):",
                  min = 0, max = 3, value = 1.5, step = 0.25),

      sliderInput("unobs_conf", "Unobserved confounding (U):",
                  min = 0, max = 3, value = 0, step = 0.25),

      actionButton("go_so", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results_so")
    ),

    mainPanel(
      width = 9,
      plotOutput("so_plot", height = "450px")
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go_so
    n   <- input$n_so
    ate <- input$ate_so
    gx  <- input$obs_conf
    gu  <- input$unobs_conf

    # Observed confounder
    x <- rnorm(n)

    # Unobserved confounder
    u <- rnorm(n)

    # Treatment depends on both
    p <- pnorm(gx * x + gu * u)
    treat <- rbinom(n, 1, p)

    # Outcome depends on both confounders. U's effect on the outcome is
    # fixed at 1.5; the slider only scales U's effect on treatment, so
    # at gu = 0, U no longer confounds and the CIA holds.
    y <- 1 + 2 * x + 1.5 * u + ate * treat + rnorm(n)

    # Naive (no controls)
    naive <- coef(lm(y ~ treat))[2]

    # Controlling for X only
    ctrl_x <- coef(lm(y ~ treat + x))[2]

    # Oracle: controlling for X and U
    oracle <- coef(lm(y ~ treat + x + u))[2]

    list(x = x, u = u, treat = treat, y = y,
         naive = naive, ctrl_x = ctrl_x, oracle = oracle,
         ate = ate, gx = gx, gu = gu)
  })

  output$so_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    estimates <- c(d$naive, d$ctrl_x, d$oracle)
    biases <- estimates - d$ate
    labels <- c("Naive\n(no controls)", "Control for X\n(observed)", "Control for X + U\n(oracle)")
    cols <- c("#e74c3c", ifelse(abs(biases[2]) < 0.3, "#27ae60", "#f39c12"), "#27ae60")

    bp <- barplot(estimates, col = cols, border = NA,
                  names.arg = labels, cex.names = 0.85,
                  main = "Estimated Treatment Effect by Method",
                  ylab = "Estimate", ylim = c(0, max(estimates) * 1.4))

    # True ATE line
    abline(h = d$ate, lty = 2, col = "gray40", lwd = 2)
    text(0.2, d$ate + 0.15, paste0("True ATE = ", d$ate),
         col = "gray40", cex = 0.85, adj = 0)

    # Bias labels
    text(bp, estimates + 0.15,
         paste0(round(estimates, 2), "\n(bias: ", round(biases, 2), ")"),
         cex = 0.8)
  })

  output$results_so <- renderUI({
    d <- dat()
    bias_naive <- d$naive - d$ate
    bias_x <- d$ctrl_x - d$ate
    bias_oracle <- d$oracle - d$ate
    cia_holds <- d$gu == 0

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True ATE:</b> ", d$ate, "<br>",
        "<hr style='margin:6px 0'>",
        "<b>Naive:</b> ", round(d$naive, 2),
        " <span class='bad'>(bias: ", round(bias_naive, 2), ")</span><br>",
        "<b>Control X:</b> ", round(d$ctrl_x, 2),
        " <span class='", ifelse(cia_holds, "good", "bad"), "'>(bias: ",
        round(bias_x, 2), ")</span><br>",
        "<b>Oracle (X+U):</b> ", round(d$oracle, 2),
        " <span class='good'>(bias: ", round(bias_oracle, 2), ")</span><br>",
        "<hr style='margin:6px 0'>",
        if (cia_holds)
          "<small>CIA holds: controlling for X is enough.</small>"
        else
          "<small>CIA violated: U confounds treatment. Controlling for X alone leaves residual bias.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Unobserved confounding = 0: the CIA holds. Controlling for X removes the bias (up to sampling noise), and the "Control for X" bar lines up with the oracle. This is the world where selection on observables works.
  • Unobserved confounding = 1.5: now there's a confounder you can't see. Controlling for X helps (it reduces bias relative to naive) but doesn't eliminate it; the omitted-variable-bias formula after this list shows why. Only the oracle, who controls for both X and U, gets the right answer.
  • Unobserved confounding = 3: controlling for X barely helps. The bias is large. No amount of regression, matching, or weighting on X can fix this — you need a different identification strategy.
  • Set observed confounding = 0, unobserved = 2: all the confounding is unobserved. Naive and “control for X” give the same (biased) answer because X isn’t a confounder here.
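
Why does controlling for \(X\) alone leave residual bias? The simulation's outcome equation is

\[Y = 1 + 2X + 1.5U + \tau D + \varepsilon,\]

and a standard omitted-variable-bias argument says that regressing \(Y\) on \(D\) and \(X\) while omitting \(U\) gives, in large samples,

\[\hat{\tau} \to \tau + 1.5\,\delta,\]

where \(\delta\) is the coefficient on \(D\) from an auxiliary regression of \(U\) on \(D\) and \(X\). The unobserved-confounding slider raises \(\delta\): the more strongly \(U\) drives treatment, the more of its effect on \(Y\) is misattributed to \(D\). At zero, \(\delta = 0\) and the bias disappears.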

Estimation tools (not just for selection on observables)

Once you have an identification strategy, you need a way to implement it. The tools below are often associated with selection on observables, but they’re general-purpose — they show up in other strategies too.

| Tool | Used in selection on observables | Also used in |
|---|---|---|
| Regression adjustment | Control for X in a regression | DID with covariates, RDD with covariates |
| Matching | Pair treated/control on X | DID matching estimators |
| IPW | Reweight by propensity score | IPW-DID (Abadie 2005; Callaway & Sant'Anna 2021) |
| Entropy balancing | Balance covariates with weights | Weighted DID |
| Doubly robust | Combine regression + weighting | DR-DID (Sant'Anna & Zhao 2020) |

The identification strategy tells you why your comparison is valid. The estimation tool tells you how to make the comparison. Don’t confuse the two — IPW is a tool, not a strategy.
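
Doubly robust is worth a closer look because it combines two of the other tools. A minimal AIPW sketch, reusing the toy data-generating process from the earlier example (true effect = 2; everything here is illustrative, not a production implementation):

# Same simulated data as before: x is the only confounder
set.seed(1)
n <- 5000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(x))
y <- 1 + 2 * x + 2 * d + rnorm(n)

ps <- fitted(glm(d ~ x, family = binomial))                # propensity model
m1 <- predict(lm(y ~ x, subset = d == 1), data.frame(x))   # outcome model, treated
m0 <- predict(lm(y ~ x, subset = d == 0), data.frame(x))   # outcome model, control

# AIPW: outcome-model prediction plus a weighted residual correction;
# consistent if either the outcome model or the propensity model is right
mean(m1 + d * (y - m1) / ps) - mean(m0 + (1 - d) * (y - m0) / (1 - ps))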


In Stata

All selection-on-observables (SOO) estimators live under teffects:

* Regression adjustment
teffects ra (outcome x1 x2) (treatment)

* Inverse probability weighting
teffects ipw (outcome) (treatment x1 x2)

* Nearest-neighbor matching
teffects nnmatch (outcome x1 x2) (treatment), nneighbor(1)

* Doubly robust (AIPW)
teffects aipw (outcome x1 x2) (treatment x1 x2)

* Check overlap after any teffects command
teffects overlap

The same identification assumption (CIA) sits behind all of them; they differ only in how they use \(X\) to make the comparison fair. See each page for details.


Did you know?

  • The conditional independence assumption was formalized by Rosenbaum & Rubin (1983) in their foundational paper on propensity scores. They showed that conditioning on a scalar propensity score is sufficient: you don't need to match on every covariate separately. (The result is written out after this list.)

  • The term “selection on observables” is economics jargon. In statistics it’s called ignorability or no unmeasured confounding. In epidemiology it’s the exchangeability assumption. Same idea, different fields, different names.

  • Altonji, Elder & Taber (2005) proposed a practical check: compare how much the estimate changes when you add observed controls. If adding strong predictors of the outcome barely moves the estimate, it's less likely that unobservables would change it much either. Not a proof, but a useful heuristic; a minimal version is sketched after this list.
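
The Rosenbaum–Rubin result, in the notation above: if unconfoundedness holds given \(X\), it also holds given the scalar propensity score \(e(X) = P(D = 1 \mid X)\):

\[Y(0), Y(1) \perp D \mid X \;\Longrightarrow\; Y(0), Y(1) \perp D \mid e(X).\]

And a minimal sketch of the coefficient-movement check, in its informal version (assuming a data frame df with outcome y, treatment d, and observed controls x1, x2; names made up for illustration, and this is a heuristic, not Altonji, Elder & Taber's formal estimator):

# Compare the treatment coefficient with and without observed controls
b_raw  <- unname(coef(lm(y ~ d, data = df))["d"])
b_ctrl <- unname(coef(lm(y ~ d + x1 + x2, data = df))["d"])

# Small movement despite strong predictors of y is (weak) evidence
# that unobservables wouldn't overturn the estimate either
c(no_controls = b_raw, with_controls = b_ctrl, movement = b_raw - b_ctrl)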