Inverse Probability Weighting

The problem

In observational data, treatment isn’t random. People who get treated differ from those who don’t — they may be older, sicker, richer, etc. A naive comparison of outcomes is biased by these confounders.

The idea

Inverse Probability Weighting (IPW) reweights observations so that the treated and control groups look alike on observed covariates. The steps:

Estimate the propensity score \(e(X) = P(\text{treated} \mid X)\) — the probability of treatment given covariates.
Weight each observation inversely by its probability of receiving the treatment it actually got:
- Treated units get weight \(1 / e(X)\)
- Control units get weight \(1 / (1 - e(X))\)
Compute the weighted difference in means.

Intuition: if a treated person had only a 20% chance of being treated (based on their X), they’re “surprising” — they represent 5 similar people who weren’t treated. So they get upweighted. This creates a pseudo-population where treatment is independent of X.

#| '!! shinylive warning !!': |
#|   shinylive does not work in self-contained HTML documents.
#|   Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 650

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n", "Sample size:",
                  min = 200, max = 2000, value = 500, step = 100),

      sliderInput("ate", "True ATE:",
                  min = 0, max = 5, value = 2, step = 0.5),

      sliderInput("confounding", "Confounding strength:",
                  min = 0, max = 3, value = 1.5, step = 0.25),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("balance_plot", height = "380px")),
        column(6, plotOutput("ps_plot", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n    <- input$n
    ate  <- input$ate
    conf <- input$confounding

    # Confounder
    x <- rnorm(n)

    # Treatment depends on x (confounding)
    p_true <- pnorm(conf * x)
    treat <- rbinom(n, 1, p_true)

    # Outcome depends on x and treatment
    y <- 1 + 2 * x + ate * treat + rnorm(n)

    # Naive estimate
    naive <- mean(y[treat == 1]) - mean(y[treat == 0])

    # IPW estimate
    ps <- fitted(glm(treat ~ x, family = binomial))
    w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))
    ipw_est <- weighted.mean(y[treat == 1], w[treat == 1]) -
               weighted.mean(y[treat == 0], w[treat == 0])

    list(x = x, treat = treat, y = y, ps = ps, w = w,
         naive = naive, ipw_est = ipw_est, ate = ate)
  })

  output$balance_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    # Unweighted densities
    x_t <- d$x[d$treat == 1]
    x_c <- d$x[d$treat == 0]

    rng <- range(d$x)
    dens_t <- density(x_t, from = rng[1], to = rng[2])
    dens_c <- density(x_c, from = rng[1], to = rng[2])

    ylim <- c(0, max(dens_t$y, dens_c$y) * 1.2)

    plot(dens_t, col = "#3498db", lwd = 2.5, main = "Covariate Balance (X)",
         xlab = "X (confounder)", ylab = "Density", ylim = ylim)
    lines(dens_c, col = "#e74c3c", lwd = 2.5)

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Treated", "Control"),
           col = c("#3498db", "#e74c3c"), lwd = 2.5)
  })

  output$ps_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    plot(d$x, d$ps, pch = 16, cex = 0.5,
         col = ifelse(d$treat == 1, "#3498db80", "#e74c3c80"),
         xlab = "X (confounder)", ylab = "Propensity score e(X)",
         main = "Propensity Score vs Confounder")
    abline(h = 0.5, lty = 2, col = "gray50")
    legend("topleft", bty = "n", cex = 0.85,
           legend = c("Treated", "Control"),
           col = c("#3498db", "#e74c3c"), pch = 16)
  })

  output$results <- renderUI({
    d <- dat()
    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True ATE:</b> ", d$ate, "<br>",
        "<b>Naive estimate:</b> <span class='bad'>", round(d$naive, 3), "</span><br>",
        "<b>IPW estimate:</b> <span class='good'>", round(d$ipw_est, 3), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<b>Naive bias:</b> ", round(d$naive - d$ate, 3), "<br>",
        "<b>IPW bias:</b> ", round(d$ipw_est - d$ate, 3)
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

Confounding = 0: treatment is random. Naive and IPW give the same answer.
Confounding = 1.5: the covariate distributions for treated and control diverge (left plot). Naive is biased, IPW corrects it.
Confounding = 3: extreme selection. The propensity scores are near 0 or 1 (right plot), meaning some units get huge weights. IPW becomes noisy — this is the extreme weights problem.