Synthetic Control
The idea
You have one treated unit — a state that passed a law, a country hit by a crisis, a company that changed policy. You want to know what would have happened without the treatment. But there’s no single control unit that’s a good comparison.
The synthetic control method builds a weighted combination of untreated units that matches the treated unit’s pre-treatment trajectory. That weighted combination — the “synthetic” version — serves as the counterfactual.
\[\hat{Y}_{1t}^{N} = \sum_{j=2}^{J+1} w_j \, Y_{jt}\]
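where \(Y_{jt}\) is the outcome of unit \(j\) at time \(t\), unit 1 is the treated unit, units \(2, \dots, J+1\) are the untreated donors, \(w_j \geq 0\), and \(\sum_{j=2}^{J+1} w_j = 1\). The weights are chosen so that the synthetic unit tracks the treated unit closely before treatment. After treatment, the gap between the treated unit and its synthetic version, \(Y_{1t} - \hat{Y}_{1t}^{N}\), is the estimated effect.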
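One way to make "chosen so that the synthetic unit tracks the treated unit" concrete is to minimize the pre-treatment squared error subject to the two weight constraints. The sketch below is a minimal base-R illustration, not the algorithm of any particular package: it uses projected gradient descent, and the helper names `project_simplex` and `sc_weights`, the simulated data, and the iteration count are all assumptions made for this example. It matches on pre-treatment outcomes only, with no covariates.

```r
# Euclidean projection of a vector onto the simplex {w : w >= 0, sum(w) = 1}.
project_simplex <- function(v) {
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  k <- seq_along(u)
  rho <- max(which(u - (css - 1) / k > 0))
  theta <- (css[rho] - 1) / rho
  pmax(v - theta, 0)
}

# Hypothetical helper: least squares on the pre-period with weights kept
# on the simplex by projecting after every gradient step.
sc_weights <- function(Y0_pre, Y1_pre, iters = 5000) {
  J <- ncol(Y0_pre)
  XtX <- crossprod(Y0_pre)
  Xty <- as.numeric(crossprod(Y0_pre, Y1_pre))
  # Step size 1/L, where L is the largest eigenvalue of X'X
  step <- 1 / max(eigen(XtX, symmetric = TRUE, only.values = TRUE)$values)
  w <- rep(1 / J, J)  # start from equal weights
  for (i in seq_len(iters)) {
    grad <- as.numeric(XtX %*% w) - Xty
    w <- project_simplex(w - step * grad)
  }
  w
}

# Simulated example (assumed data): 6 donors, 12 pre-treatment periods.
set.seed(1)
T0 <- 12
J <- 6
donors_pre <- sapply(seq_len(J), function(j) {
  10 + j + 0.3 * j * seq_len(T0) + rnorm(T0, sd = 0.3)
})
true_w <- c(0.5, 0, 0.3, 0, 0.2, 0)
treated_pre <- as.numeric(donors_pre %*% true_w) + rnorm(T0, sd = 0.1)

w_hat <- sc_weights(donors_pre, treated_pre)
rmspe <- function(w) sqrt(mean((treated_pre - donors_pre %*% w)^2))

round(w_hat, 3)                                            # estimated weights
c(fitted = rmspe(w_hat), uniform = rmspe(rep(1 / J, J)))   # pre-period fit
```

The estimated weights are non-negative and sum to one by construction. With highly correlated donors they need not coincide with the weights used to simulate the data, so the pre-period fit is the quantity to inspect.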
The classic example
Abadie, Diamond & Hainmueller (2010): California passed Proposition 99 in 1988, a major tobacco control program. No single state is a good comparison — some are too urban, some too rural, some already had anti-smoking laws. The synthetic California is a weighted mix of states (Utah, Nevada, Colorado, Connecticut, Montana…) that together match California’s pre-1988 smoking trend almost exactly. After 1988, actual California diverges sharply below its synthetic version — that gap is the treatment effect.
Assumptions
- No interference / SUTVA: the treatment of the treated unit doesn’t affect the donor units (no spillovers)
- Convex hull: the treated unit’s pre-treatment outcomes can be expressed as a weighted average of the donors — the treated unit isn’t an outlier that no combination of donors can match
- No anticipation: the treated unit doesn’t change behavior before the treatment date
- Common factors: treated and donor units are driven by the same underlying factors, just with different loadings — the weights that work pre-treatment continue to work post-treatment
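The common-factors assumption is usually written as a linear factor model for the untreated outcome, as in Abadie, Diamond & Hainmueller (2010). The notation below follows that paper rather than anything defined earlier on this page, so treat the extra symbols as assumptions:

\[Y_{jt}^{N} = \delta_t + \theta_t Z_j + \lambda_t \mu_j + \varepsilon_{jt}\]

Here \(\delta_t\) is a time effect shared by all units, \(Z_j\) are observed covariates with time-varying coefficients \(\theta_t\), \(\mu_j\) are unobserved unit-specific loadings on the common factors \(\lambda_t\), and \(\varepsilon_{jt}\) is a transitory shock. Suppose the weights reproduce the treated unit's characteristics:

\[\sum_{j=2}^{J+1} w_j Z_j = Z_1, \qquad \sum_{j=2}^{J+1} w_j \mu_j = \mu_1\]

Then, because the weights sum to one,

\[Y_{1t}^{N} - \sum_{j=2}^{J+1} w_j Y_{jt}^{N} = \varepsilon_{1t} - \sum_{j=2}^{J+1} w_j \varepsilon_{jt}\]

in every period, including after treatment. Since \(\mu_j\) is unobserved, a close match over a long pre-treatment window is what makes the second condition plausible.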
When does synthetic control fail?
- Bad pre-treatment fit: if the synthetic unit can’t track the treated unit before treatment, you can’t trust the post-treatment gap. There’s no magic — if no combination of donors resembles the treated unit, the method doesn’t work.
- Spillovers: if the treatment affects the donor units too (e.g., smokers move from California to Nevada), the synthetic control is contaminated.
- Too few donors: with very few comparison units there are few ways to combine them, so the weights have little freedom and the pre-treatment match may be poor.
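A quick necessary check for the convex-hull condition is sketched below. Because the weights are non-negative and sum to one, the synthetic outcome in any period must lie between the smallest and largest donor outcome in that period. The function name `outside_donor_range` and the toy numbers are made up for illustration. Passing this check does not guarantee a good fit; it only rules out the clearest failures.

```r
# Hypothetical helper: which pre-treatment periods have a treated outcome
# outside the donors' range? No convex combination can match those periods.
outside_donor_range <- function(treated_pre, donors_pre) {
  lo <- apply(donors_pre, 1, min)  # smallest donor outcome in each period
  hi <- apply(donors_pre, 1, max)  # largest donor outcome in each period
  which(treated_pre < lo | treated_pre > hi)
}

# Toy data (assumed): 3 donors (columns) over 4 pre-treatment periods (rows).
donors_pre <- cbind(c(10, 11, 12, 13),
                    c(12, 12, 13, 13),
                    c(14, 15, 15, 16))

inside  <- c(11, 12, 13, 14)  # always within the donor range
outlier <- c(11, 12, 13, 18)  # period 4 exceeds every donor

outside_donor_range(inside, donors_pre)   # integer(0)
outside_donor_range(outlier, donors_pre)  # [1] 4
```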
#| standalone: true
#| viewerHeight: 680
library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad { color: #e74c3c; font-weight: bold; }
    .wt-table { font-size: 12px; margin-top: 6px; }
    .wt-table td { padding: 1px 6px; }
  "))),
  sidebarLayout(
    sidebarPanel(
      width = 3,
      sliderInput("tau_sc", "True treatment effect:",
                  min = -5, max = 5, value = -3, step = 0.5),
      sliderInput("n_donors", "Number of donor units:",
                  min = 3, max = 15, value = 8, step = 1),
      sliderInput("sigma_sc", "Noise (SD):",
                  min = 0.1, max = 2, value = 0.5, step = 0.1),
      sliderInput("treat_time", "Treatment period:",
                  min = 8, max = 18, value = 12, step = 1),
      actionButton("go_sc", "New draw", class = "btn-primary", width = "100%"),
      uiOutput("results_sc")
    ),
    mainPanel(
      width = 9,
      plotOutput("synth_plot", height = "500px")
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go_sc  # depend on the button so each click triggers a new draw
    tau <- input$tau_sc
    J <- input$n_donors
    sigma <- input$sigma_sc
    T0 <- input$treat_time
    TT <- 25

    # Generate donor unit trajectories
    # Each donor has its own intercept and slope
    set.seed(NULL)  # re-seed so every draw differs
    intercepts <- rnorm(J, mean = 10, sd = 2)
    slopes <- rnorm(J, mean = 0.3, sd = 0.15)
    donors <- matrix(NA, nrow = TT, ncol = J)
    for (j in 1:J) {
      donors[, j] <- intercepts[j] + slopes[j] * (1:TT) + rnorm(TT, sd = sigma)
    }

    # True weights (sparse: pick 3-4 donors that matter)
    n_active <- min(4, J)
    active <- sample(1:J, n_active)
    true_w <- rep(0, J)
    raw <- runif(n_active, 0.1, 1)
    true_w[active] <- raw / sum(raw)

    # Treated unit = weighted combo of donors + treatment effect after T0
    treated <- donors %*% true_w + rnorm(TT, sd = sigma * 0.5)
    treated[(T0 + 1):TT] <- treated[(T0 + 1):TT] + tau

    # Estimate synthetic control weights on the pre-period.
    # This is a simplified stand-in for the constrained optimization:
    # fit unconstrained least squares, then clip negative weights to zero
    # and renormalize so the weights sum to 1.
    pre <- 1:T0
    Y1_pre <- treated[pre]
    Y0_pre <- donors[pre, ]
    if (J <= T0) {
      # No intercept: the treated unit is regressed on the donors only
      fit <- lm(Y1_pre ~ Y0_pre - 1)
      w_hat <- coef(fit)
    } else {
      # More donors than periods: OLS is not identified,
      # so add a small ridge penalty
      lambda <- 0.01
      w_hat <- solve(t(Y0_pre) %*% Y0_pre + lambda * diag(J),
                     t(Y0_pre) %*% Y1_pre)
    }
    w_hat[w_hat < 0] <- 0
    # Fall back to equal weights if every coefficient was clipped
    if (sum(w_hat) > 0) w_hat <- w_hat / sum(w_hat) else w_hat <- rep(1/J, J)

    # Synthetic control trajectory
    synth <- donors %*% w_hat

    # Estimated effect (post-treatment gap)
    post <- (T0 + 1):TT
    gaps <- treated[post] - synth[post]
    avg_effect <- mean(gaps)

    # Pre-treatment fit (RMSPE)
    pre_rmspe <- sqrt(mean((treated[pre] - synth[pre])^2))

    list(time = 1:TT, treated = as.numeric(treated),
         synth = as.numeric(synth), donors = donors,
         w_hat = w_hat, T0 = T0, tau = tau,
         avg_effect = avg_effect, pre_rmspe = pre_rmspe, J = J)
  })

  output$synth_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    ylim <- range(c(d$treated, d$synth, d$donors)) + c(-1, 1)

    # Donor units (gray background)
    plot(d$time, d$donors[, 1], type = "n",
         xlab = "Time", ylab = "Outcome",
         main = "Synthetic Control Method",
         ylim = ylim)
    for (j in 1:d$J) {
      lines(d$time, d$donors[, j], col = adjustcolor("gray70", 0.4), lwd = 0.8)
    }

    # Synthetic control
    lines(d$time, d$synth, col = "#e74c3c", lwd = 3, lty = 2)

    # Treated unit
    lines(d$time, d$treated, col = "#3498db", lwd = 3)

    # Treatment line
    abline(v = d$T0 + 0.5, lty = 3, col = "gray40", lwd = 1.5)
    text(d$T0 + 0.5, ylim[2], "Treatment", pos = 4, cex = 0.85, col = "gray40")

    # Gap shading in post period
    post <- (d$T0 + 1):length(d$time)
    polygon(c(d$time[post], rev(d$time[post])),
            c(d$treated[post], rev(d$synth[post])),
            col = adjustcolor("#27ae60", 0.15), border = NA)

    # Gap label
    mid_post <- d$time[round(median(post))]
    mid_gap <- (d$treated[round(median(post))] + d$synth[round(median(post))]) / 2
    text(mid_post, mid_gap,
         paste0("Avg gap = ", round(d$avg_effect, 2)),
         col = "#27ae60", font = 2, cex = 0.9)

    legend("topleft", bty = "n", cex = 0.85,
           legend = c("Treated unit", "Synthetic control", "Donor units"),
           col = c("#3498db", "#e74c3c", "gray70"),
           lwd = c(3, 3, 1), lty = c(1, 2, 1))
  })

  output$results_sc <- renderUI({
    d <- dat()

    # Top weights (only donors with more than 1% weight are listed)
    ord <- order(d$w_hat, decreasing = TRUE)
    top <- ord[d$w_hat[ord] > 0.01]
    wt_rows <- paste0(
      sapply(top, function(j) {
        paste0("<tr><td>Donor ", j, "</td><td><b>",
               round(d$w_hat[j] * 100), "%</b></td></tr>")
      }),
      collapse = ""
    )

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True effect:</b> ", d$tau, "<br>",
        "<b>Avg post-treatment gap:</b> ",
        "<span class='", ifelse(abs(d$avg_effect - d$tau) < 1, "good", "bad"), "'>",
        round(d$avg_effect, 2), "</span><br>",
        "<b>Pre-treatment RMSPE:</b> ", round(d$pre_rmspe, 3), "<br>",
        "<hr style='margin:6px 0'>",
        "<b>Weights:</b>",
        "<table class='wt-table'>", wt_rows, "</table>"
      ))
    )
  })
}

shinyApp(ui, server)
Things to try
- Default settings (effect = -3): the synthetic control (red dashed) tracks the treated unit closely before treatment, then diverges. The green-shaded gap is the estimated effect.
- Set true effect = 0: the two lines should stay close after treatment too. If you see a big gap, it’s noise — this is why pre-treatment fit matters.
- Reduce donors to 3: fewer building blocks means a worse pre-treatment fit. The estimate gets noisier.
- Increase donors to 15: more building blocks, better fit. But watch the weights — most donors get zero weight. The method is naturally sparse.
- Move treatment period later (18): short post-period, harder to judge whether the gap is real or just noise.
- Crank up noise: the pre-treatment fit deteriorates and the post-treatment gap becomes unreliable.
Inference: placebo tests
With one treated unit, you can’t do standard inference. Instead, you run placebo tests:
In-space placebos. Apply the synthetic control method to each donor unit — pretend it was treated and build a synthetic version from the remaining donors. If the treated unit’s gap is much larger than the placebo gaps, the effect is likely real.
In-time placebos. Move the treatment date earlier (to a period when no treatment occurred). If you find a gap in the placebo period, your method is picking up something other than the treatment.
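The share of placebo gaps at least as extreme as the treated unit's gap can be reported as a permutation-style p-value. It is not a p-value in the usual sampling sense, but it shows whether the effect is distinguishable from noise.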
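The in-space placebo procedure can be sketched in base R as follows. This is an illustration under stated assumptions rather than what any package does internally: the helper names (`project_simplex`, `fit_weights`, `rmspe_ratio`), the simulated panel, and the choice of projected gradient descent as the solver are all hypothetical. The test statistic is the ratio of post-treatment to pre-treatment RMSPE, one common choice because it discounts units that were poorly matched before treatment.

```r
# Projection onto the simplex {w : w >= 0, sum(w) = 1}
project_simplex <- function(v) {
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  rho <- max(which(u - (css - 1) / seq_along(u) > 0))
  pmax(v - (css[rho] - 1) / rho, 0)
}

# Simplex-constrained least squares via projected gradient descent
fit_weights <- function(X, y, iters = 2000) {
  XtX <- crossprod(X)
  Xty <- as.numeric(crossprod(X, y))
  step <- 1 / max(eigen(XtX, symmetric = TRUE, only.values = TRUE)$values)
  w <- rep(1 / ncol(X), ncol(X))
  for (i in seq_len(iters)) {
    w <- project_simplex(w - step * (as.numeric(XtX %*% w) - Xty))
  }
  w
}

# Post/pre RMSPE ratio when `unit` plays the role of the treated unit.
# The synthetic version is built from the donors only (never from column 1).
rmspe_ratio <- function(Y, unit, T0, donor_ids) {
  pool <- setdiff(donor_ids, unit)
  pre <- seq_len(T0)
  post <- (T0 + 1):nrow(Y)
  w <- fit_weights(Y[pre, pool, drop = FALSE], Y[pre, unit])
  gap <- Y[, unit] - as.numeric(Y[, pool, drop = FALSE] %*% w)
  sqrt(mean(gap[post]^2)) / sqrt(mean(gap[pre]^2))
}

# Simulated panel (assumed): column 1 is treated, columns 2..9 are donors.
set.seed(42)
TT <- 25
T0 <- 12
J <- 8
donors <- sapply(seq_len(J), function(j) {
  rnorm(1, 10, 2) + rnorm(1, 0.3, 0.15) * seq_len(TT) + rnorm(TT, sd = 0.5)
})
treated <- as.numeric(donors %*% rep(1 / J, J)) + rnorm(TT, sd = 0.25)
treated[(T0 + 1):TT] <- treated[(T0 + 1):TT] - 3  # true effect of -3
Y <- cbind(treated, donors)

# Ratio for the treated unit (first entry) and for each placebo donor
ratios <- sapply(seq_len(ncol(Y)), function(u) rmspe_ratio(Y, u, T0, 2:ncol(Y)))

# Share of units whose ratio is at least as large as the treated unit's
p_value <- mean(ratios >= ratios[1])
p_value
```

With \(J\) donors the smallest attainable value is \(1/(J+1)\), which is one reason very small donor pools make inference difficult.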
In Stata
* Install synthetic control
* ssc install synth
* Set up panel data
tsset unit_id year
* Synthetic control
synth outcome x1 x2 outcome(1985) outcome(1988) outcome(1990), ///
trunit(3) trperiod(1994) fig
* Placebo tests (permute treatment across donors)
* ssc install synth_runner
synth_runner outcome x1 x2, trunit(3) trperiod(1994) gen_vars
synth finds weights on donor units that match the treated unit’s pre-treatment trajectory. Always run placebo tests: if the treated unit’s gap isn’t larger than the placebos, the effect isn’t credible.
Did you know?
The synthetic control method was developed by Abadie & Gardeazabal (2003) to study the economic impact of terrorism in the Basque Country, and formalized by Abadie, Diamond & Hainmueller (2010) in the California tobacco study. It has since become one of the most widely used methods in policy evaluation.
Athey & Imbens (2017) called synthetic control “arguably the most important innovation in the policy evaluation literature in the last 15 years.”
The method works best when you have long pre-treatment panels (many time periods before treatment) and a moderate number of donor units. It struggles with short panels or when no combination of donors can approximate the treated unit.