Difference-in-Differences
The idea
You have two groups: one gets treated at some point, the other never does. You observe both before and after treatment. The key assumption: absent treatment, both groups would have followed parallel trends.
\[\hat{\tau}_{DID} = (\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}) - (\bar{Y}_{ctrl,post} - \bar{Y}_{ctrl,pre})\]
The first difference removes time-invariant group characteristics. The second difference removes common time trends. What’s left, under parallel trends, is the average treatment effect on the treated.
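As a quick worked example of the formula above, here is a minimal sketch that applies the two differences to four group-by-period means. The numbers are made up for illustration and do not come from any dataset.

```r
# Hypothetical group-by-period means (illustrative numbers, not real data)
y_treat_pre  <- 5.0
y_treat_post <- 7.3
y_ctrl_pre   <- 3.0
y_ctrl_post  <- 3.3

# First difference: change over time within each group
# (removes time-invariant group characteristics)
d_treat <- y_treat_post - y_treat_pre
d_ctrl  <- y_ctrl_post  - y_ctrl_pre

# Second difference: treated change minus control change
# (removes the common time trend)
tau_hat <- d_treat - d_ctrl
print(tau_hat)
```

With these numbers the treated group rises by about 2.3 and the control group by about 0.3, so the estimate is approximately 2.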
Assumptions
- Parallel trends (the key assumption): absent treatment, the treated and control groups would have followed the same trajectory over time
- No anticipation: treated units don’t change behavior before the treatment date
- SUTVA: treatment of one group doesn’t spill over to the control group
- Stable composition: the groups don’t change membership over time (no differential attrition)
When does DID fail?
DID fails when the parallel trends assumption is violated, that is, when the treated group was already on a different trajectory before treatment. The simulation below lets you break this assumption and see the bias that results.
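To see how large the bias is before any noise enters, here is a minimal sketch with noiseless group means. It assumes linear trends in both groups; the intercepts, the common slope, and the values of `tau` and `delta` are illustrative choices.

```r
# Noiseless group means with a differential trend (illustrative values)
periods <- -4:4
tau     <- 2      # true treatment effect (assumed)
delta   <- 0.5    # extra slope in the treated group: parallel trends violated

ctrl  <- 3 + 0.3 * periods
treat <- 5 + (0.3 + delta) * periods + tau * (periods >= 1)

# DID comparing the last pre period (t = 0) with the first post period (t = 1)
pre  <- which(periods == 0)
post <- which(periods == 1)
did  <- (treat[post] - treat[pre]) - (ctrl[post] - ctrl[pre])

# The bias equals the differential trend accumulated between the two periods
bias <- did - tau
print(c(did = did, bias = bias))
```

In this setup the bias equals `delta` times the gap between the two comparison periods, so comparing periods further apart would make it larger.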
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("n_units", "Units per group:",
min = 20, max = 200, value = 50, step = 10),
sliderInput("true_effect", "True treatment effect:",
min = 0, max = 5, value = 2, step = 0.5),
sliderInput("trend_diff", "Differential pre-trend\n(violation of parallel trends):",
min = -1, max = 1, value = 0, step = 0.1),
sliderInput("sigma", "Noise (SD):",
min = 0.5, max = 3, value = 1, step = 0.25),
actionButton("go", "New draw", class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
plotOutput("did_plot", height = "450px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go
n <- input$n_units
tau <- input$true_effect
delta <- input$trend_diff
sigma <- input$sigma
periods <- -4:4
treat_time <- 1 # treatment at t = 1
# Group means over time
ctrl_mean <- 3 + 0.3 * periods
treat_mean <- 5 + (0.3 + delta) * periods
# Add treatment effect post
treat_mean[periods >= treat_time] <- treat_mean[periods >= treat_time] + tau
# Generate unit-level data
ctrl_data <- sapply(ctrl_mean, function(m) m + rnorm(n, sd = sigma))
treat_data <- sapply(treat_mean, function(m) m + rnorm(n, sd = sigma))
ctrl_means_obs <- colMeans(ctrl_data)
treat_means_obs <- colMeans(treat_data)
# DID estimate (using t=0 as pre, t=1 as post)
pre_idx <- which(periods == 0)
post_idx <- which(periods == 1)
did_est <- (treat_means_obs[post_idx] - treat_means_obs[pre_idx]) -
(ctrl_means_obs[post_idx] - ctrl_means_obs[pre_idx])
# Counterfactual for treated (parallel to control from t=0)
cf <- treat_means_obs[pre_idx] + (ctrl_means_obs - ctrl_means_obs[pre_idx])
list(periods = periods, ctrl = ctrl_means_obs, treat = treat_means_obs,
cf = cf, did_est = did_est, tau = tau, delta = delta,
treat_time = treat_time)
})
output$did_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
ylim <- range(c(d$ctrl, d$treat, d$cf)) + c(-0.5, 0.5)
plot(d$periods, d$treat, type = "b", pch = 19, lwd = 2.5, col = "#3498db",
xlab = "Time period", ylab = "Mean outcome",
main = "Difference-in-Differences",
ylim = ylim, xaxt = "n")
axis(1, at = d$periods)
lines(d$periods, d$ctrl, type = "b", pch = 19, lwd = 2.5, col = "#e74c3c")
# Counterfactual (dashed, post only)
post <- d$periods >= d$treat_time
lines(d$periods[post], d$cf[post], type = "b", pch = 1, lwd = 2, lty = 2,
col = "#3498db80")
# Treatment onset
abline(v = d$treat_time - 0.5, lty = 3, col = "gray50", lwd = 1.5)
text(d$treat_time - 0.5, ylim[2], "Treatment", pos = 4, cex = 0.85, col = "gray40")
# DID bracket: drawn just right of the first post period (t = 1), spanning
# the counterfactual and the observed treated mean at that period
post_idx <- which(d$periods == 1)
x_br <- d$periods[post_idx] + 0.15
arrows(x_br, d$cf[post_idx],
x_br, d$treat[post_idx],
code = 3, lwd = 2, col = "#27ae60", length = 0.1)
text(x_br + 0.1, (d$cf[post_idx] + d$treat[post_idx]) / 2,
paste0("DID = ", round(d$did_est, 2)),
col = "#27ae60", cex = 0.9, adj = 0)
legend("topleft", bty = "n", cex = 0.85,
legend = c("Treated", "Control", "Counterfactual (parallel trends)"),
col = c("#3498db", "#e74c3c", "#3498db80"),
pch = c(19, 19, 1), lty = c(1, 1, 2), lwd = c(2.5, 2.5, 2))
})
output$results <- renderUI({
d <- dat()
bias <- d$did_est - d$tau
biased <- abs(d$delta) > 0.05
tags$div(class = "stats-box",
HTML(paste0(
"<b>True effect:</b> ", d$tau, "<br>",
"<b>DID estimate:</b> ", round(d$did_est, 3), "<br>",
"<b>Bias:</b> <span class='", ifelse(biased, "bad", "good"), "'>",
round(bias, 3), "</span><br>",
if (biased) "<br><small>Parallel trends violated — DID is biased.</small>"
else "<br><small>Parallel trends hold — DID is unbiased.</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- Differential pre-trend = 0: parallel trends hold, and DID recovers the true effect up to sampling noise.
- Slide the differential pre-trend to +0.5: the treated group was already rising faster. DID attributes some of that trend to the treatment, so the estimate is biased upward (by about 0.5 here, because the estimate compares t = 0 with t = 1).
- Set true effect = 0 with a differential trend: DID “finds” an effect that doesn’t exist. That’s how pre-trend violations create false positives.
- Look at the pre-treatment periods — if the lines aren’t parallel before treatment, you should worry.
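One way to put a number on the last point is a placebo DID: apply the DID formula to two pre-treatment periods, where the true effect is zero by construction. The sketch below is one possible version; the helper names `placebo_did` and `sim_means`, the seed, and all parameter values are assumptions made for illustration.

```r
# Placebo DID: apply the DID formula to two pre-treatment periods.
# Under parallel trends the result should be close to zero.
placebo_did <- function(treat_means, ctrl_means, periods, from, to) {
  i <- which(periods == from)
  j <- which(periods == to)
  (treat_means[j] - treat_means[i]) - (ctrl_means[j] - ctrl_means[i])
}

set.seed(42)          # arbitrary seed, for reproducibility only
periods <- -4:0       # pre-treatment periods only
n       <- 100        # units per group (assumed)
delta   <- 0.5        # differential pre-trend (assumed)

# Simulated period means: one column of unit-level draws per period
sim_means <- function(mu) {
  colMeans(sapply(mu, function(m) m + rnorm(n, sd = 1)))
}
ctrl_obs  <- sim_means(3 + 0.3 * periods)
treat_obs <- sim_means(5 + (0.3 + delta) * periods)

# Expected value is delta * 4 = 2 when trends diverge, 0 when they are parallel
placebo <- placebo_did(treat_obs, ctrl_obs, periods, from = -4, to = 0)
print(placebo)
```

A placebo estimate far from zero relative to its sampling noise suggests the groups were already diverging. A placebo near zero is reassuring, but it does not prove that trends would have stayed parallel after treatment.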
In Stata
* Classic 2x2 DID with interaction
reg outcome treated##post
* With controls
reg outcome treated##post x1 x2, cluster(state)
* Event study (test parallel trends visually)
reg outcome i.treated#i.year i.year i.treated, cluster(state)
* Modern staggered DID (Callaway & Sant'Anna 2021)
* ssc install csdid
csdid outcome x1 x2, ivar(id) time(year) gvar(first_treated)
The coefficient on treated#post (or 1.treated#1.post) is the DID estimate. Clustering standard errors at the group level (e.g., state) is standard practice.
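The same equivalence can be checked in R. The sketch below simulates a 2x2 design (all data-generating values are made up) and compares the interaction coefficient from base R's `lm` with the difference of the four cell means. It covers the point estimate only and does not cluster standard errors.

```r
set.seed(123)   # arbitrary seed, for reproducibility only
n <- 50         # observations per group-period cell (assumed)

# Simulated 2x2 data: group effect 2, time effect 0.3, treatment effect 2
df <- expand.grid(rep = 1:n, treated = 0:1, post = 0:1)
df$outcome <- 3 + 2 * df$treated + 0.3 * df$post +
  2 * df$treated * df$post + rnorm(nrow(df))

# Regression version: the interaction coefficient is the DID estimate
fit     <- lm(outcome ~ treated * post, data = df)
did_reg <- unname(coef(fit)["treated:post"])

# Difference-of-means version, computed from the four cell means
m <- tapply(df$outcome, list(df$treated, df$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(regression = did_reg, means = did_means))
```

Because the regression with both dummies and their interaction is saturated, the two numbers agree up to floating-point error. Adding covariates such as x1 and x2 breaks this exact equality.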