Difference-in-Differences

The idea

You have two groups: one gets treated at some point, the other never does. You observe both before and after treatment. The key assumption: absent treatment, both groups would have followed parallel trends.

\[\hat{\tau}_{DID} = (\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}) - (\bar{Y}_{ctrl,post} - \bar{Y}_{ctrl,pre})\]

The first difference removes time-invariant group characteristics. The second difference removes common time trends. What’s left is the treatment effect.

Assumptions

Parallel trends: absent treatment, the treated and control groups would have followed the same trajectory over time — the key assumption
No anticipation: treated units don’t change behavior before the treatment date
SUTVA: treatment of one group doesn’t spill over to the control group
Stable composition: the groups don’t change membership over time (no differential attrition)

When does DID fail?

When the parallel trends assumption is violated — if the treated group was already on a different trajectory before treatment. The simulation below lets you break this assumption and see the bias that results.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n_units", "Units per group:",
                  min = 20, max = 200, value = 50, step = 10),

      sliderInput("true_effect", "True treatment effect:",
                  min = 0, max = 5, value = 2, step = 0.5),

      sliderInput("trend_diff", "Differential pre-trend\n(violation of parallel trends):",
                  min = -1, max = 1, value = 0, step = 0.1),

      sliderInput("sigma", "Noise (SD):",
                  min = 0.5, max = 3, value = 1, step = 0.25),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      plotOutput("did_plot", height = "450px")
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n     <- input$n_units
    tau   <- input$true_effect
    delta <- input$trend_diff
    sigma <- input$sigma

    periods <- -4:4
    treat_time <- 1  # treatment at t = 1

    # Group means over time
    ctrl_mean <- 3 + 0.3 * periods
    treat_mean <- 5 + (0.3 + delta) * periods

    # Add treatment effect post
    treat_mean[periods >= treat_time] <- treat_mean[periods >= treat_time] + tau

    # Generate unit-level data
    ctrl_data <- sapply(ctrl_mean, function(m) m + rnorm(n, sd = sigma))
    treat_data <- sapply(treat_mean, function(m) m + rnorm(n, sd = sigma))

    ctrl_means_obs <- colMeans(ctrl_data)
    treat_means_obs <- colMeans(treat_data)

    # DID estimate (using t=0 as pre, t=1 as post)
    pre_idx  <- which(periods == 0)
    post_idx <- which(periods == 1)

    did_est <- (treat_means_obs[post_idx] - treat_means_obs[pre_idx]) -
               (ctrl_means_obs[post_idx] - ctrl_means_obs[pre_idx])

    # Counterfactual for treated (parallel to control from t=0)
    cf <- treat_means_obs[pre_idx] + (ctrl_means_obs - ctrl_means_obs[pre_idx])

    list(periods = periods, ctrl = ctrl_means_obs, treat = treat_means_obs,
         cf = cf, did_est = did_est, tau = tau, delta = delta,
         treat_time = treat_time)
  })

  output$did_plot <- renderPlot({
    d <- dat()

    par(mar = c(4.5, 4.5, 3, 1))
    ylim <- range(c(d$ctrl, d$treat, d$cf)) + c(-0.5, 0.5)

    plot(d$periods, d$treat, type = "b", pch = 19, lwd = 2.5, col = "#3498db",
         xlab = "Time period", ylab = "Mean outcome",
         main = "Difference-in-Differences",
         ylim = ylim, xaxt = "n")
    axis(1, at = d$periods)

    lines(d$periods, d$ctrl, type = "b", pch = 19, lwd = 2.5, col = "#e74c3c")

    # Counterfactual (dashed, post only)
    post <- d$periods >= d$treat_time
    lines(d$periods[post], d$cf[post], type = "b", pch = 1, lwd = 2, lty = 2,
          col = "#3498db80")

    # Treatment onset
    abline(v = d$treat_time - 0.5, lty = 3, col = "gray50", lwd = 1.5)
    text(d$treat_time - 0.5, ylim[2], "Treatment", pos = 4, cex = 0.85, col = "gray40")

    # DID bracket
    pre_idx  <- which(d$periods == 0)
    post_idx <- which(d$periods == 1)
    arrows(max(d$periods) - 0.3, d$cf[post_idx],
           max(d$periods) - 0.3, d$treat[post_idx],
           code = 3, lwd = 2, col = "#27ae60", length = 0.1)
    text(max(d$periods) - 0.1, (d$cf[post_idx] + d$treat[post_idx]) / 2,
         paste0("DID = ", round(d$did_est, 2)),
         col = "#27ae60", cex = 0.9, adj = 0)

    legend("topleft", bty = "n", cex = 0.85,
           legend = c("Treated", "Control", "Counterfactual (parallel trends)"),
           col = c("#3498db", "#e74c3c", "#3498db80"),
           pch = c(19, 19, 1), lty = c(1, 1, 2), lwd = c(2.5, 2.5, 2))
  })

  output$results <- renderUI({
    d <- dat()
    bias <- d$did_est - d$tau
    biased <- abs(d$delta) > 0.05

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True effect:</b> ", d$tau, "<br>",
        "<b>DID estimate:</b> ", round(d$did_est, 3), "<br>",
        "<b>Bias:</b> <span class='", ifelse(biased, "bad", "good"), "'>",
        round(bias, 3), "</span><br>",
        if (biased) "<br><small>Parallel trends violated &mdash; DID is biased.</small>"
        else "<br><small>Parallel trends hold &mdash; DID is unbiased.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

Differential pre-trend = 0: parallel trends hold, DID nails the true effect.
Slide the differential pre-trend to +0.5: the treated group was already rising faster. DID attributes some of that trend to the treatment — the estimate is biased upward.
Set true effect = 0 with a differential trend: DID “finds” an effect that doesn’t exist. That’s how pre-trend violations create false positives.
Look at the pre-treatment periods — if the lines aren’t parallel before treatment, you should worry.

In Stata

* Classic 2x2 DID with interaction
reg outcome treated##post

* With controls
reg outcome treated##post x1 x2, cluster(state)

* Event study (test parallel trends visually)
reg outcome i.treated#i.year i.year i.treated, cluster(state)

* Modern staggered DID (Callaway & Sant'Anna 2021)
* ssc install csdid
csdid outcome x1 x2, ivar(id) time(year) gvar(first_treated)

The coefficient on treated#post (or 1.treated#1.post) is the DID estimate. Clustering standard errors at the group level (e.g., state) is standard practice.