Inverse Probability Weighting

The problem

In observational data, treatment isn’t random. People who get treated differ from those who don’t: they may be older, sicker, richer, etc. When those differences also affect the outcome, they act as confounders, and a naive comparison of outcomes between the two groups is biased.
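
One standard way to make this bias explicit is to split the naive comparison into a causal part and a selection part. The notation here is potential-outcomes notation: \(D\) is the treatment indicator, and \(Y(1)\), \(Y(0)\) are the outcomes a unit would have with and without treatment.

\[
E[Y \mid D = 1] - E[Y \mid D = 0]
= \underbrace{E[Y(1) - Y(0) \mid D = 1]}_{\text{effect on the treated}}
+ \underbrace{E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0]}_{\text{selection bias}}
\]

The second term is zero under random assignment. With confounding, treated units would have had different outcomes from control units even without treatment, so it does not vanish.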

The idea

Inverse Probability Weighting (IPW) reweights observations so that the treated and control groups look alike on observed covariates. The steps:

1. Estimate the propensity score \(e(X) = P(D = 1 \mid X)\): the probability of treatment (\(D = 1\)) given covariates \(X\).
2. Weight each observation by the inverse of the probability of the treatment it actually received:
   - Treated units get weight \(1 / e(X)\)
   - Control units get weight \(1 / (1 - e(X))\)
3. Compute the difference in weighted mean outcomes between the treated and control groups.
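
Written out, the steps above amount to the following normalized (Hájek-style) estimator, which is what a difference of weighted means computes. Here \(D_i\) is the treatment indicator, \(Y_i\) the observed outcome, and \(\hat{e}(X_i)\) the estimated propensity score; the symbol \(\hat{\tau}_{\text{IPW}}\) is introduced only for this sketch.

\[
\hat{\tau}_{\text{IPW}}
= \frac{\sum_i D_i Y_i / \hat{e}(X_i)}{\sum_i D_i / \hat{e}(X_i)}
- \frac{\sum_i (1 - D_i) Y_i / (1 - \hat{e}(X_i))}{\sum_i (1 - D_i) / (1 - \hat{e}(X_i))}
\]

The reason the weights work, under the assumptions listed in the Assumptions section, is that conditioning on \(X\) cancels the propensity score:

\[
E\!\left[\frac{D\,Y}{e(X)}\right]
= E\!\left[\frac{E[D \mid X]\; E[Y(1) \mid X]}{e(X)}\right]
= E[Y(1)]
\]

The same argument with \(1 - D\) and \(1 - e(X)\) gives \(E[Y(0)]\).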

Intuition: if a treated person had only a 20% chance of being treated (based on their X), they’re “surprising”: for every such person we see treated, about four similar people went untreated. Giving them a weight of 1/0.2 = 5 lets them stand in for all five. Doing this for every unit creates a pseudo-population where treatment is independent of X.
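
The pseudo-population idea can be checked on a toy example. This is a minimal sketch with made-up data: a binary covariate `x`, ten units per stratum, and propensity scores of 0.2 and 0.8 that are assumed known rather than estimated.

```r
# Hypothetical toy data: 10 units with x = 0 and 10 units with x = 1
x     <- rep(c(0, 1), each = 10)
# 2 of 10 treated when x = 0, 8 of 10 treated when x = 1
treat <- c(rep(1, 2), rep(0, 8), rep(1, 8), rep(0, 2))

# Propensity scores, assumed known here (0.2 when x = 0, 0.8 when x = 1)
e <- ifelse(x == 1, 0.8, 0.2)

# Inverse probability of the treatment each unit actually received
w <- ifelse(treat == 1, 1 / e, 1 / (1 - e))

# Weighted size of each (x, treat) cell in the pseudo-population
pseudo <- tapply(w, list(x = x, treat = treat), sum)
print(pseudo)
```

Every cell sums to 10: after weighting, each treatment arm contains the full stratum, so the mix of `x` is identical among treated and control units.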

Assumptions

Unconfoundedness (selection on observables): conditional on observed covariates \(X\), treatment \(D\) is independent of the potential outcomes, i.e. \((Y(0), Y(1)) \perp D \mid X\). All confounders are observed and included.
Overlap (positivity): every unit has a non-zero probability of being in either group, i.e. \(0 < e(X) < 1\) for all \(X\). No one is deterministically treated or untreated.
Correct propensity score model: the functional form of \(e(X)\) is correctly specified. If you fit a logit that is linear in \(X\) and the true relationship involves nonlinear terms or interactions, the estimated propensity scores, and therefore the weights, are wrong.
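
To illustrate the third assumption, here is a minimal simulation sketch. The data-generating process is invented for this example: treatment and outcome both depend on \(x^2\), so a logit that is linear in `x` misses the confounding while one that includes `x^2` captures it.

```r
set.seed(7)                               # arbitrary seed for reproducibility
n   <- 20000
ate <- 2
x     <- runif(n, -2, 2)
treat <- rbinom(n, 1, plogis(-1 + x^2))   # treatment depends on x^2, not on x
y     <- 1 + 2 * x^2 + ate * treat + rnorm(n)

# Difference in weighted means for a given vector of propensity scores
ipw <- function(ps) {
  w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))
  weighted.mean(y[treat == 1], w[treat == 1]) -
    weighted.mean(y[treat == 0], w[treat == 0])
}

ps_linear    <- fitted(glm(treat ~ x,      family = binomial))  # misspecified
ps_quadratic <- fitted(glm(treat ~ I(x^2), family = binomial))  # matches the simulation

est <- c(linear = ipw(ps_linear), quadratic = ipw(ps_quadratic))
print(round(est - ate, 3))                # bias of each estimate
```

The linear specification leaves most of the confounding bias in place, because `x` and `x^2` are uncorrelated on a symmetric range and the fitted scores are nearly constant. The quadratic specification should land close to the true effect.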

#| standalone: true
#| viewerHeight: 650

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n", "Sample size:",
                  min = 200, max = 2000, value = 500, step = 100),

      sliderInput("ate", "True ATE:",
                  min = 0, max = 5, value = 2, step = 0.5),

      sliderInput("confounding", "Confounding strength:",
                  min = 0, max = 3, value = 1.5, step = 0.25),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("balance_plot", height = "380px")),
        column(6, plotOutput("ps_plot", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n    <- input$n
    ate  <- input$ate
    conf <- input$confounding

    # Confounder
    x <- rnorm(n)

    # Treatment depends on x (confounding)
    p_true <- pnorm(conf * x)
    treat <- rbinom(n, 1, p_true)

    # Outcome depends on x and treatment
    y <- 1 + 2 * x + ate * treat + rnorm(n)

    # Naive estimate
    naive <- mean(y[treat == 1]) - mean(y[treat == 0])

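    # IPW estimate: fit a logit propensity model, then take the difference
    # in weighted means (weighted.mean normalizes weights within each group).
    # Note: the true treatment model above is a probit, so the logit is a
    # close approximation rather than an exact match.
    ps <- fitted(glm(treat ~ x, family = binomial))
    w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))
    ipw_est <- weighted.mean(y[treat == 1], w[treat == 1]) -
               weighted.mean(y[treat == 0], w[treat == 0])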

    list(x = x, treat = treat, y = y, ps = ps, w = w,
         naive = naive, ipw_est = ipw_est, ate = ate)
  })

  output$balance_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    # Unweighted densities
    x_t <- d$x[d$treat == 1]
    x_c <- d$x[d$treat == 0]

    rng <- range(d$x)
    dens_t <- density(x_t, from = rng[1], to = rng[2])
    dens_c <- density(x_c, from = rng[1], to = rng[2])

    ylim <- c(0, max(dens_t$y, dens_c$y) * 1.2)

    plot(dens_t, col = "#3498db", lwd = 2.5, main = "Covariate Balance (X)",
         xlab = "X (confounder)", ylab = "Density", ylim = ylim)
    lines(dens_c, col = "#e74c3c", lwd = 2.5)

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Treated", "Control"),
           col = c("#3498db", "#e74c3c"), lwd = 2.5)
  })

  output$ps_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    plot(d$x, d$ps, pch = 16, cex = 0.5,
         col = ifelse(d$treat == 1, "#3498db80", "#e74c3c80"),
         xlab = "X (confounder)", ylab = "Propensity score e(X)",
         main = "Propensity Score vs Confounder")
    abline(h = 0.5, lty = 2, col = "gray50")
    legend("topleft", bty = "n", cex = 0.85,
           legend = c("Treated", "Control"),
           col = c("#3498db", "#e74c3c"), pch = 16)
  })

  output$results <- renderUI({
    d <- dat()
    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True ATE:</b> ", d$ate, "<br>",
        "<b>Naive estimate:</b> <span class='bad'>", round(d$naive, 3), "</span><br>",
        "<b>IPW estimate:</b> <span class='good'>", round(d$ipw_est, 3), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<b>Naive bias:</b> ", round(d$naive - d$ate, 3), "<br>",
        "<b>IPW bias:</b> ", round(d$ipw_est - d$ate, 3)
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

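Confounding = 0: treatment is random. Naive and IPW both center on the true ATE and give nearly the same answer.
Confounding = 1.5: the covariate distributions for treated and control diverge (left plot). Naive is biased upward because high-X units are both more likely to be treated and have higher outcomes; IPW corrects it.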
Confounding = 3: extreme selection. The propensity scores are near 0 or 1 (right plot), meaning some units get huge weights. IPW becomes noisy — this is the extreme weights problem.
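
The extreme weights problem can be quantified outside the app. The sketch below reuses the app's treatment model at confounding strength 3 and reports the largest weight and the Kish effective sample size per group. Clipping the propensity scores to [0.05, 0.95] is a common remedy but is an assumption of this sketch, not something the app does; it trades variance for some bias.

```r
set.seed(42)                            # arbitrary seed for reproducibility
n     <- 500
x     <- rnorm(n)
treat <- rbinom(n, 1, pnorm(3 * x))     # confounding strength 3, as in the app
ps    <- fitted(glm(treat ~ x, family = binomial))

ipw_weights <- function(ps, treat) ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Kish effective sample size: equals the group size when weights are equal,
# and shrinks toward 1 when a few units dominate
ess <- function(w) sum(w)^2 / sum(w^2)

w_raw   <- ipw_weights(ps, treat)
ps_clip <- pmin(pmax(ps, 0.05), 0.95)   # assumed remedy: clip scores to [0.05, 0.95]
w_clip  <- ipw_weights(ps_clip, treat)

weight_summary <- data.frame(
  weights     = c("raw", "clipped"),
  max_weight  = c(max(w_raw), max(w_clip)),
  ess_treated = c(ess(w_raw[treat == 1]), ess(w_clip[treat == 1])),
  ess_control = c(ess(w_raw[treat == 0]), ess(w_clip[treat == 0]))
)
print(weight_summary)
```

An effective sample size far below the actual group size signals that a handful of units drive the estimate, which is why the IPW estimate jumps around from draw to draw.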

In Stata

* IPW with Stata's teffects
teffects ipw (outcome) (treatment x1 x2)

* Check propensity score overlap
teffects overlap

* Manual approach: estimate propensity score, then weight
logit treatment x1 x2
predict pscore, pr
gen ipw = treatment/pscore + (1-treatment)/(1-pscore)
reg outcome treatment [pw=ipw]

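teffects ipw handles everything: propensity score estimation, weighting, and standard errors that account for the propensity score being estimated. The manual approach is useful for understanding what’s happening under the hood, but its standard errors treat the weights as known rather than estimated.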
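
The manual approach above regresses the outcome on treatment with IPW weights, while the simulation takes a difference in weighted means. The R sketch below, on simulated data mirroring the app's setup (sample size and seed are arbitrary choices), suggests the two give the same point estimate.

```r
set.seed(1)                               # arbitrary seed for reproducibility
n     <- 1000
x     <- rnorm(n)
treat <- rbinom(n, 1, pnorm(1.5 * x))     # same treatment model as the app
y     <- 1 + 2 * x + 2 * treat + rnorm(n)

ps <- fitted(glm(treat ~ x, family = binomial))
w  <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Difference in weighted means, as in the simulation
ipw_means <- weighted.mean(y[treat == 1], w[treat == 1]) -
             weighted.mean(y[treat == 0], w[treat == 0])

# Weighted regression of the outcome on treatment, as in the manual approach
ipw_reg <- unname(coef(lm(y ~ treat, weights = w))["treat"])

print(c(weighted_means = ipw_means, weighted_regression = ipw_reg))
```

With a single binary regressor, weighted least squares fits each group's weighted mean, so the treatment coefficient is exactly the weighted difference in means. The default `lm` standard errors are not appropriate here, since they ignore both the weighting scheme and the estimated propensity score.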