Robust vs Clustered SEs: When Observations Aren’t Independent
The independence assumption you forgot about
OLS assumes that observations are independent. But in many real datasets they aren’t:
- Students within the same school share teachers and resources
- Patients within the same hospital get similar care
- Purchases by the same customer are correlated over time
- Employees within the same firm face the same management
When observations within a group (cluster) are correlated, OLS standard errors are too small — often dramatically so. Your t-statistics are inflated, your p-values are too small, and you reject the null far more than 5% of the time.
Three types of standard errors
| SE type | What it assumes | When to use |
|---|---|---|
| OLS (classical) | Errors are i.i.d. (independent, constant variance) | Almost never in practice |
| Robust (HC) | Errors can have different variances, but are independent | Cross-sectional data with heteroskedasticity |
| Clustered | Errors can be correlated within clusters | Any grouped/panel data |
Robust SEs fix heteroskedasticity but not within-cluster correlation. Clustered SEs fix both. If your data has clusters, you need clustered SEs.
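The contrast is easy to see in code. Here is a minimal base-R sketch (no packages; the data-generating process, with a school-level regressor and a true effect of zero, is made up for illustration):

```r
set.seed(1)
G <- 40; m <- 25; n <- G * m                       # 40 schools, 25 students each
school <- rep(seq_len(G), each = m)
x <- rep(rnorm(G), each = m)                       # regressor varies by school
y <- rep(rnorm(G, sd = 0.5), each = m) + rnorm(n)  # clustered errors, true beta = 0

fit <- lm(y ~ x)
X <- model.matrix(fit); r <- resid(fit)
bread <- solve(crossprod(X))

# 1. Classical OLS SE: assumes i.i.d. errors
ols_se <- summary(fit)$coefficients["x", "Std. Error"]

# 2. Robust (HC0) SE: allows heteroskedasticity, still assumes independence
robust_se <- sqrt((bread %*% crossprod(X * abs(r)) %*% bread)[2, 2])

# 3. Clustered SE: allows arbitrary correlation within schools
meat <- crossprod(rowsum(X * r, school))  # sum over g of (X_g' e_g)(X_g' e_g)'
clustered_se <- sqrt((bread %*% meat %*% bread)[2, 2])

round(c(ols = ols_se, robust = robust_se, clustered = clustered_se), 4)
```

With these settings the clustered SE comes out several times larger than the OLS SE, while the robust SE barely moves — robust SEs do nothing about within-cluster correlation.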
Where does the independence assumption live?
The standard OLS SE formula assumes independent observations. Here is exactly where that assumption appears.
The OLS estimator for \(Y = X\beta + \varepsilon\) is
\[\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon\]
so the variance of the estimator depends on the variance of the errors:
\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \; X' \, \text{Var}(\varepsilon) \, X \; (X'X)^{-1}\]
This formula is always true. The key question is: what is \(\text{Var}(\varepsilon)\)?
Under independence, classical OLS assumes \(\text{Var}(\varepsilon) = \sigma^2 I\), where the identity matrix \(I\) has zeros on the off-diagonal — meaning \(\text{Cov}(\varepsilon_i, \varepsilon_j) = 0\) for all \(i \neq j\). Plugging this in:
\[\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}\]
This is the default SE printed in regression output: \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 [(X'X)^{-1}]_{jj}}\).
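It's worth confirming that this formula is literally the number summary() prints — a quick base-R check on arbitrary simulated data:

```r
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)
X <- model.matrix(fit)

sigma2_hat <- sum(resid(fit)^2) / (n - 2)   # residual variance with n - k df
se_manual  <- sqrt(sigma2_hat * solve(crossprod(X))[2, 2])
se_printed <- summary(fit)$coefficients[2, 2]

all.equal(se_manual, se_printed)   # TRUE: identical formula
```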
When observations are correlated, the true error variance is \(\text{Var}(\varepsilon) = \Omega\), where \(\Omega\) is not diagonal. For example, with three students in the same school and one in a different school, writing \(\rho\) for the within-school error covariance:
\[\Omega = \begin{bmatrix} \sigma^2 & \rho & \rho & 0 \\ \rho & \sigma^2 & \rho & 0 \\ \rho & \rho & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}\]
The correct variance is \((X'X)^{-1} X' \Omega X (X'X)^{-1}\), but the OLS formula pretends \(\Omega = \sigma^2 I\) and ignores those off-diagonal terms — so the estimated SE is too small.
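The understatement can be computed exactly for the four-student example. Below, \(\sigma^2 = 1\) and \(\rho = 0.5\); the x values are made up, with the three same-school students given similar x's (if x were uncorrelated within the school, the off-diagonal terms would roughly cancel and the classical formula would be close to right):

```r
sigma2 <- 1; rho <- 0.5
Omega <- rbind(c(sigma2, rho,    rho,    0),
               c(rho,    sigma2, rho,    0),
               c(rho,    rho,    sigma2, 0),
               c(0,      0,      0,      sigma2))
x <- c(1.0, 1.2, 0.9, -2.0)   # students 1-3 share a school
X <- cbind(1, x)
bread <- solve(crossprod(X))

true_var  <- bread %*% t(X) %*% Omega %*% X %*% bread  # correct sandwich
naive_var <- sigma2 * bread                            # what classical OLS uses

c(true = true_var[2, 2], naive = naive_var[2, 2])      # true exceeds naive
```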
Cluster-robust SEs fix this by replacing \(\sigma^2 I\) with an estimate of \(\Omega\):
\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left( \sum_g X_g' \hat{\varepsilon}_g \hat{\varepsilon}_g' X_g \right) (X'X)^{-1}\]
where \(g\) indexes clusters and \(\hat{\varepsilon}_g\) stacks the residuals for cluster \(g\). This allows \(\text{Cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0\) within the same cluster — exactly the sandwich formula implemented by Stata's cluster() option and R's sandwich::vcovCL().
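Written out directly, the formula is only a few lines of base R. This sketch omits the finite-sample correction \(G/(G-1) \times (n-1)/(n-k)\) that Stata and sandwich apply by default, so it will be slightly smaller than their output:

```r
# Cluster-robust (Liang-Zeger) SEs for all coefficients of an lm fit
cluster_se <- function(fit, cluster) {
  X <- model.matrix(fit)
  r <- resid(fit)
  bread <- solve(crossprod(X))
  scores <- rowsum(X * r, group = cluster)  # row g holds (X_g' e_g)'
  meat <- crossprod(scores)                 # sum_g (X_g' e_g)(X_g' e_g)'
  sqrt(diag(bread %*% meat %*% bread))
}

# Example on simulated school data (true effect zero):
set.seed(2)
school <- rep(1:30, each = 20)
x <- rep(rnorm(30), each = 20)
y <- rep(rnorm(30, sd = 0.6), each = 20) + rnorm(600)
fit <- lm(y ~ x)
cluster_se(fit, school)
```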
Simulation 1: Clustered data and wrong SEs
Generate data with students nested in schools. Within each school, outcomes are correlated (shared school effect). Compare the three types of standard errors.
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("n_clusters", "Number of schools:",
min = 10, max = 100, value = 30, step = 5),
sliderInput("cluster_size", "Students per school:",
min = 5, max = 50, value = 20, step = 5),
sliderInput("icc", "ICC (intra-cluster correlation):",
min = 0, max = 0.8, value = 0.3, step = 0.05),
helpText("ICC = share of total variance due to the
school-level component. Higher = more clustering."),
actionButton("go", "New draw",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
fluidRow(
column(6, plotOutput("scatter", height = "400px")),
column(6, plotOutput("se_compare", height = "400px"))
)
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go
G <- input$n_clusters
m <- input$cluster_size
icc <- input$icc
# Variance decomposition: total var = 1
sigma_b <- sqrt(icc) # between-cluster SD
sigma_w <- sqrt(1 - icc) # within-cluster SD
# Generate clustered data (no true effect: beta = 0).
# x varies at the SCHOOL level: OLS SEs are only badly wrong
# when both x and the errors are correlated within clusters
# (with purely individual-level x, clustering barely matters).
cluster_id <- rep(seq_len(G), each = m)
school_effect <- rep(rnorm(G, sd = sigma_b), each = m)
x <- rep(rnorm(G), each = m)
y <- 0 * x + school_effect + rnorm(G * m, sd = sigma_w)
fit <- lm(y ~ x)
b_hat <- coef(fit)[2]
n <- G * m
# OLS SE
ols_se <- summary(fit)$coefficients[2, 2]
# Robust SE (HC1)
r <- resid(fit)
X <- cbind(1, x)
bread <- solve(t(X) %*% X)
meat_hc <- crossprod(X * abs(r)) * n / (n - 2)  # X' diag(r^2) X without an n x n matrix
robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])
# Clustered SE (Liang-Zeger)
meat_cl <- matrix(0, 2, 2)
for (g in seq_len(G)) {
idx <- which(cluster_id == g)
Xg <- X[idx, , drop = FALSE]
rg <- r[idx]
meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
}
adj <- G / (G - 1) * (n - 1) / (n - 2)
cl_vcov <- adj * bread %*% meat_cl %*% bread
clustered_se <- sqrt(cl_vcov[2, 2])
list(x = x, y = y, cluster_id = cluster_id, fit = fit,
b_hat = b_hat, ols_se = ols_se, robust_se = robust_se,
clustered_se = clustered_se, G = G, m = m, icc = icc)
})
output$scatter <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
# Color by cluster (cycle through colors)
palette <- rep(c("#e74c3c", "#3498db", "#27ae60", "#9b59b6",
"#e67e22", "#1abc9c", "#34495e", "#f39c12"),
length.out = d$G)
cols <- palette[d$cluster_id]
plot(d$x, d$y, pch = 16, col = adjustcolor(cols, 0.5), cex = 0.6,
xlab = "X", ylab = "Y",
main = paste0(d$G, " schools, ", d$m, " students each"))
abline(d$fit, col = "#2c3e50", lwd = 2.5)
abline(h = 0, lty = 3, col = "gray60")
})
output$se_compare <- renderPlot({
d <- dat()
par(mar = c(4.5, 8, 3, 1))
ses <- c(d$ols_se, d$robust_se, d$clustered_se)
names_se <- c("OLS SE", "Robust SE", "Clustered SE")
cols <- c("#e74c3c", "#e67e22", "#27ae60")
# CIs
lo <- d$b_hat - 1.96 * ses
hi <- d$b_hat + 1.96 * ses
xlim <- range(c(lo, hi, 0)) + c(-0.3, 0.3)
plot(NULL, xlim = xlim, ylim = c(0.5, 3.5),
yaxt = "n", ylab = "", xlab = expression(hat(beta)),
main = "95% CIs with different SEs")
axis(2, at = 1:3, labels = names_se, las = 1, cex.axis = 0.85)
for (i in 1:3) {
segments(lo[i], i, hi[i], i, lwd = 4, col = cols[i])
points(d$b_hat, i, pch = 19, cex = 1.5, col = cols[i])
}
abline(v = 0, lty = 2, lwd = 2, col = "#2c3e50")
text(0, 3.4, expression("True " * beta * " = 0"), cex = 0.9)
})
output$results <- renderUI({
d <- dat()
ratio <- d$clustered_se / d$ols_se
tags$div(class = "stats-box",
HTML(paste0(
"<b>\u03b2\u0302:</b> ",  # expression() doesn't render inside HTML; use Unicode
round(d$b_hat, 4), "<br>",
"<b>OLS SE:</b> <span class='bad'>",
round(d$ols_se, 4), "</span><br>",
"<b>Robust SE:</b> ",
round(d$robust_se, 4), "<br>",
"<b>Clustered SE:</b> <span class='good'>",
round(d$clustered_se, 4), "</span><br>",
"<hr style='margin:8px 0'>",
"<b>Cluster/OLS ratio:</b> ", round(ratio, 2), "x<br>",
"<small>With ICC = ", d$icc, ", clustered SE is<br>",
round(ratio, 1), "x larger than OLS SE.</small>"
))
)
})
}
shinyApp(ui, server)
Simulation 2: Rejection rates
The real test: run 500 experiments where the true effect is zero. How often does each type of SE incorrectly reject H₀ at the 5% level? OLS SEs reject far too often. Clustered SEs get it right.
#| standalone: true
#| viewerHeight: 520
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("G2", "Number of clusters:",
min = 10, max = 100, value = 30, step = 5),
sliderInput("m2", "Obs per cluster:",
min = 5, max = 50, value = 20, step = 5),
sliderInput("icc2", "ICC:",
min = 0, max = 0.8, value = 0.3, step = 0.05),
sliderInput("sims", "Number of simulations:",
min = 200, max = 1000, value = 500, step = 100),
actionButton("go2", "Run simulations",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
plotOutput("rejection_plot", height = "420px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go2
G <- input$G2
m <- input$m2
icc <- input$icc2
sims <- input$sims
sigma_b <- sqrt(icc)
sigma_w <- sqrt(1 - icc)
n <- G * m
reject_ols <- 0
reject_robust <- 0
reject_cluster <- 0
for (s in seq_len(sims)) {
cluster_id <- rep(seq_len(G), each = m)
school_eff <- rep(rnorm(G, sd = sigma_b), each = m)
x <- rep(rnorm(G), each = m)  # school-level x: needed for the clustering problem to bite
y <- 0 * x + school_eff + rnorm(n, sd = sigma_w)
fit <- lm(y ~ x)
b_hat <- coef(fit)[2]
r <- resid(fit)
X <- cbind(1, x)
# OLS
ols_se <- summary(fit)$coefficients[2, 2]
# Robust
bread <- solve(t(X) %*% X)
meat_hc <- crossprod(X * abs(r)) * n / (n - 2)  # X' diag(r^2) X without an n x n matrix
robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])
# Clustered
meat_cl <- matrix(0, 2, 2)
for (g in seq_len(G)) {
idx <- which(cluster_id == g)
Xg <- X[idx, , drop = FALSE]
rg <- r[idx]
meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
}
adj <- G / (G - 1) * (n - 1) / (n - 2)
cl_se <- sqrt((adj * bread %*% meat_cl %*% bread)[2, 2])
crit_cl <- qt(0.975, df = G - 1)  # t critical value with G-1 df (Stata's convention)
if (abs(b_hat / ols_se) > 1.96) reject_ols <- reject_ols + 1
if (abs(b_hat / robust_se) > 1.96) reject_robust <- reject_robust + 1
if (abs(b_hat / cl_se) > crit_cl) reject_cluster <- reject_cluster + 1
}
list(rej_ols = reject_ols / sims,
rej_robust = reject_robust / sims,
rej_cluster = reject_cluster / sims,
sims = sims, G = G, m = m, icc = icc)
})
output$rejection_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 8, 3, 1))
rates <- c(d$rej_ols, d$rej_robust, d$rej_cluster) * 100
names_se <- c("OLS SE", "Robust SE", "Clustered SE")
cols <- c("#e74c3c", "#e67e22", "#27ae60")
bp <- barplot(rates, names.arg = names_se, col = cols,
horiz = TRUE, xlim = c(0, max(rates, 10) * 1.3),
xlab = "Rejection rate (%)",
main = paste0("False rejection rate at 5% level\n(",
d$sims, " simulations, ICC = ",
d$icc, ")"),
las = 1, border = NA)
abline(v = 5, lty = 2, lwd = 2.5, col = "#2c3e50")
text(5, max(bp) + 0.6, "Target: 5%", cex = 0.85)
text(rates + 1, bp, paste0(round(rates, 1), "%"),
cex = 0.9, adj = 0)
})
output$results2 <- renderUI({
d <- dat()
tags$div(class = "stats-box",
HTML(paste0(
"<b>OLS rejection rate:</b> <span class='",
ifelse(d$rej_ols > 0.08, "bad", "good"), "'>",
round(d$rej_ols * 100, 1), "%</span><br>",
"<b>Robust rejection rate:</b> <span class='",
ifelse(d$rej_robust > 0.08, "bad", "good"), "'>",
round(d$rej_robust * 100, 1), "%</span><br>",
"<b>Clustered rejection rate:</b> <span class='good'>",
round(d$rej_cluster * 100, 1), "%</span><br>",
"<hr style='margin:8px 0'>",
"<small>True \u03b2 = 0 in all simulations.<br>",
"Correct rejection rate is 5%.</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- ICC = 0 (no clustering): all three SEs give ~5% rejection. When there’s no within-cluster correlation, OLS SEs are fine.
- ICC = 0.3: OLS rejection shoots up to ~15–25%. Robust SEs help a little but not enough. Clustered SEs stay near 5%.
- ICC = 0.5: OLS can reject 30–40% of the time — catastrophic.
- Increase cluster size, holding the number of schools fixed: the problem gets worse with more observations per cluster. More correlated data doesn’t help — it just gives you a false sense of precision.
- Increase number of clusters: this does help. The effective sample size for clustered data is closer to the number of clusters than the number of observations.
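That last point is the "design effect" in disguise. Under equicorrelation, \(n\) observations in clusters of size \(m\) carry roughly \(n / (1 + (m-1) \times ICC)\) independent observations' worth of information. Checking it against the default slider settings (a rule of thumb, not an exact result):

```r
n_eff <- function(G, m, icc) G * m / (1 + (m - 1) * icc)

n_eff(30, 20, 0.3)   # 600 raw obs -> about 90 effective
n_eff(30, 40, 0.3)   # double the students: about 94 -- barely moves
n_eff(60, 20, 0.3)   # double the schools: about 179 -- doubles
```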
Simulation 3: Choosing the cluster level
In hierarchical data — students in classrooms in schools in districts — you have to decide which level to cluster at. The same \(\hat{\beta}\) gets different standard errors depending on your choice:
- Too fine (e.g., classroom when correlation exists at the school level): SEs are too small, you over-reject.
- Too coarse (e.g., state when you only have 6 states): too few clusters for the asymptotic approximation, SEs may be unreliable.
- The right level: cluster where treatment is assigned, or at the coarsest level where errors are correlated.
This simulation generates a four-level hierarchy with a variance share at each level that you control. Watch the SE staircase: SEs (weakly) increase as you cluster more coarsely, and the jumps tell you where the correlation lives. If SEs barely change going from school to district, there’s little district-level correlation — clustering at school is fine.
#| standalone: true
#| viewerHeight: 650
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("n_dist3", "Number of districts:",
min = 4, max = 12, value = 8, step = 1),
helpText("4 schools/district, 4 classrooms/school,
8 students/classroom"),
tags$hr(),
sliderInput("var_dist", "District variance share:",
min = 0, max = 0.5, value = 0.2, step = 0.05),
sliderInput("var_school", "School variance share:",
min = 0, max = 0.5, value = 0.15, step = 0.05),
sliderInput("var_class", "Classroom variance share:",
min = 0, max = 0.5, value = 0.1, step = 0.05),
uiOutput("var_indiv_text"),
actionButton("go3", "New draw",
class = "btn-primary", width = "100%"),
uiOutput("results3")
),
mainPanel(
width = 9,
fluidRow(
column(6, plotOutput("se_staircase", height = "450px")),
column(6, plotOutput("ci_levels", height = "450px"))
)
)
)
)
server <- function(input, output, session) {
# Helper: clustered SE for coefficient on x (index 2)
calc_cluster_se <- function(Xmat, r, cluster_id) {
n <- nrow(Xmat)
k <- ncol(Xmat)
bread <- solve(crossprod(Xmat))
clusters <- unique(cluster_id)
G <- length(clusters)
meat <- matrix(0, k, k)
for (g in clusters) {
idx <- which(cluster_id == g)
Xg <- Xmat[idx, , drop = FALSE]
rg <- r[idx]
score_g <- crossprod(Xg, rg)
meat <- meat + tcrossprod(score_g)
}
adj <- G / (G - 1) * (n - 1) / (n - k)
vcov <- adj * bread %*% meat %*% bread
sqrt(vcov[2, 2])
}
dat3 <- reactive({
input$go3
n_dist <- input$n_dist3
v_d <- input$var_dist
v_s <- input$var_school
v_c <- input$var_class
v_i <- max(0.01, 1 - v_d - v_s - v_c)
# Fixed hierarchy
sch_per_dist <- 4
cls_per_sch <- 4
stu_per_cls <- 8
n_sch <- n_dist * sch_per_dist
n_cls <- n_sch * cls_per_sch
n_stu <- n_cls * stu_per_cls
# IDs
dist_id <- rep(1:n_dist,
each = sch_per_dist * cls_per_sch * stu_per_cls)
school_id <- rep(1:n_sch,
each = cls_per_sch * stu_per_cls)
class_id <- rep(1:n_cls,
each = stu_per_cls)
# Random effects
d_eff <- rep(rnorm(n_dist, sd = sqrt(v_d)),
each = sch_per_dist * cls_per_sch * stu_per_cls)
s_eff <- rep(rnorm(n_sch, sd = sqrt(v_s)),
each = cls_per_sch * stu_per_cls)
c_eff <- rep(rnorm(n_cls, sd = sqrt(v_c)),
each = stu_per_cls)
i_err <- rnorm(n_stu, sd = sqrt(v_i))
# x varies at the DISTRICT level (the coarsest), so error
# correlation at every level feeds into the SE for x
x <- rep(rnorm(n_dist),
each = sch_per_dist * cls_per_sch * stu_per_cls)
y <- 0 * x + d_eff + s_eff + c_eff + i_err
fit <- lm(y ~ x)
b_hat <- coef(fit)[2]
r <- resid(fit)
Xmat <- cbind(1, x)
# Robust SE (HC1) — efficient version
e2 <- r^2 * n_stu / (n_stu - 2)
bread <- solve(crossprod(Xmat))
meat_hc <- crossprod(Xmat * sqrt(e2))
robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])
# Clustered SEs at each level
class_se <- calc_cluster_se(Xmat, r, class_id)
school_se <- calc_cluster_se(Xmat, r, school_id)
dist_se <- calc_cluster_se(Xmat, r, dist_id)
ses <- c(robust_se, class_se, school_se, dist_se)
n_clusters <- c(n_stu, n_cls, n_sch, n_dist)
labels <- c("Robust\n(individual)",
paste0("Classroom\n(G=", n_cls, ")"),
paste0("School\n(G=", n_sch, ")"),
paste0("District\n(G=", n_dist, ")"))
list(b_hat = b_hat, ses = ses,
n_clusters = n_clusters, labels = labels)
})
output$var_indiv_text <- renderUI({
v_i <- max(0, 1 - input$var_dist -
input$var_school - input$var_class)
col <- if (v_i < 0.05) "color: #e74c3c;" else ""
tags$p(style = paste0("font-size: 13px; ", col),
HTML(paste0("<b>Individual share:</b> ",
round(v_i, 2))))
})
output$se_staircase <- renderPlot({
d <- dat3()
par(mar = c(5, 9, 3, 2))
cols <- c("#e74c3c", "#e67e22", "#3498db", "#27ae60")
ses <- d$ses
bp <- barplot(ses, names.arg = d$labels,
horiz = TRUE, col = cols, border = NA,
las = 1, cex.names = 0.8,
xlab = expression("SE(" * hat(beta) * ")"),
main = "SE at each clustering level",
xlim = c(0, max(ses) * 1.5))
# Label: value and ratio vs robust
text(ses + max(ses) * 0.03, bp,
paste0(format(round(ses, 4), nsmall = 4),
" (", round(ses / ses[1], 1), "x)"),
adj = 0, cex = 0.85)
})
output$ci_levels <- renderPlot({
d <- dat3()
par(mar = c(5, 9, 3, 2))
ses <- d$ses
b <- d$b_hat
lo <- b - 1.96 * ses
hi <- b + 1.96 * ses
cols <- c("#e74c3c", "#e67e22", "#3498db", "#27ae60")
xlim <- c(min(lo, 0) - abs(min(lo, 0)) * 0.3,
max(hi, 0) + abs(max(hi, 0)) * 0.3)
plot(NULL, xlim = xlim, ylim = c(0.5, 4.5),
yaxt = "n", ylab = "",
xlab = expression(hat(beta)),
main = expression(
"95% CI at each level (true " * beta * " = 0)"))
axis(2, at = 1:4, labels = d$labels,
las = 1, cex.axis = 0.8)
for (i in 1:4) {
segments(lo[i], i, hi[i], i,
lwd = 5, col = cols[i])
points(b, i, pch = 19, cex = 1.5, col = cols[i])
}
abline(v = 0, lty = 2, lwd = 2, col = "#2c3e50")
})
output$results3 <- renderUI({
d <- dat3()
ses <- d$ses
tags$div(class = "stats-box",
HTML(paste0(
"<b>SE staircase:</b><br>",
"Robust: ", round(ses[1], 4), "<br>",
"Classroom: ", round(ses[2], 4),
" (<b>", round(ses[2]/ses[1], 1), "x</b>)<br>",
"School: ", round(ses[3], 4),
" (<b>", round(ses[3]/ses[1], 1), "x</b>)<br>",
"District: ", round(ses[4], 4),
" (<b>", round(ses[4]/ses[1], 1), "x</b>)<br>",
"<hr style='margin:8px 0'>",
"<small>Ratios are vs robust SE.<br>",
"Big jump = correlation at that level.</small>"
))
)
})
}
shinyApp(ui, server)
Things to try
- All variance at district level (0.5, 0, 0): huge jump from robust to district. Clustering at classroom or school barely helps — the correlation lives at the top.
- All variance at classroom level (0, 0, 0.5): the jump happens from robust to classroom. School and district clustering add almost nothing beyond classroom, because there’s no correlation between classrooms.
- Equal shares (0.15, 0.15, 0.15): a gradual staircase — each level adds some.
- Zero variance at all levels (0, 0, 0): no hierarchy at all. All four SEs are essentially the same. Robust is fine.
- Few districts (4): even with the “right” SE, only 4 clusters is dangerously few for asymptotic clustered SEs. Consider the wild cluster bootstrap.
Choosing the level in practice
The staircase gives you a diagnostic: where the SE jumps is where the correlation lives. Apply this to your research design:
| If your design looks like… | Cluster at… | Why |
|---|---|---|
| Students randomized within schools | School | Treatment assigned at school level |
| Policy varies by state, data is individual | State | Treatment assigned at state level |
| Tracts in counties, treatment is tract-level exposure | County | Spatial correlation in outcomes across tracts in same county |
| Panel data, units observed over time | Unit | Serial correlation within unit |
The key rule: clustering coarser is usually the safe direction — SEs typically get bigger, not smaller — while clustering too fine silently understates your uncertainty. The main risk of coarse clustering is ending up with too few clusters; below roughly 30, use the wild cluster bootstrap.
Code reference
| Software | Clustered SEs |
|---|---|
| R | lmtest::coeftest(fit, vcov = sandwich::vcovCL, cluster = ~school) |
| Stata | reg y x, cluster(school) |
| Python | sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': school}) |
The practical rule
- Always cluster at the level of treatment assignment. If you randomized schools, cluster by school. If you randomized classrooms, cluster by classroom.
- When in doubt, cluster at the highest level that makes sense. Clustering too aggressively (too few clusters) can be a problem, but not clustering when you should is always worse.
- The minimum number of clusters for reliable clustered SEs is roughly 30–50. With fewer clusters, consider the wild cluster bootstrap.
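The wild cluster bootstrap mentioned above can be sketched in base R. This is a minimal Rademacher-weights version with the null imposed, in the spirit of Cameron-Gelbach-Miller; for real work use a vetted implementation such as R's fwildclusterboot package, and treat the data-generating choices here as illustrative:

```r
set.seed(7)
G <- 12; m <- 30; n <- G * m                 # deliberately few clusters
school <- rep(seq_len(G), each = m)
x <- rep(rnorm(G), each = m)                 # school-level regressor
y <- rep(rnorm(G, sd = 0.7), each = m) + rnorm(n)   # true beta = 0

cl_se <- function(X, r, g) {                 # basic Liang-Zeger SE for the slope
  bread <- solve(crossprod(X))
  meat <- crossprod(rowsum(X * r, g))
  sqrt((bread %*% meat %*% bread)[2, 2])
}

X <- cbind(1, x)
fit <- lm(y ~ x)
t_obs <- coef(fit)[2] / cl_se(X, resid(fit), school)

# Impose H0 (beta = 0): the restricted model is y ~ 1
r0 <- y - mean(y)
t_boot <- replicate(999, {
  w <- sample(c(-1, 1), G, replace = TRUE)[school]  # one Rademacher sign per cluster
  y_star <- mean(y) + w * r0                        # flip whole clusters' residuals
  f <- lm(y_star ~ x)
  coef(f)[2] / cl_se(X, resid(f), school)
})

p_val <- mean(abs(t_boot) >= abs(t_obs))   # symmetric bootstrap p-value
```

Instead of trusting the normal approximation to the t-statistic, the bootstrap builds its reference distribution from the data's own cluster structure — that is what makes it usable with a handful of clusters.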
Did you know?
- Brent Moulton (1990) published a now-classic paper showing that regressions of individual-level data on aggregate variables (like state-level policies) produce wildly inflated t-statistics if you ignore clustering. The issue is now known as the “Moulton problem,” and many earlier published results in economics likely overstated their significance because of it.
- The Moulton factor — the ratio of the true sampling variance to the (too-small) OLS variance for a regressor that is constant within clusters — is approximately \(1 + (m-1) \times ICC\), where \(m\) is the average cluster size. With 50 students per school and ICC = 0.2, the Moulton factor is \(1 + 49 \times 0.2 = 10.8\), meaning your OLS SE is \(\sqrt{10.8} \approx 3.3\) times too small — and your t-statistic about 3.3 times too large.
- Cameron, Gelbach, and Miller (2008) showed that even clustered SEs can be unreliable when the number of clusters is small (< 30–50), leading to the development of the wild cluster bootstrap as a more robust alternative.
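The Moulton arithmetic above takes one line to verify:

```r
moulton <- function(m, icc) 1 + (m - 1) * icc
moulton(50, 0.2)          # 10.8: the variance is understated ~11x
sqrt(moulton(50, 0.2))    # ~3.29: the SE (and t-stat) is off by ~3.3x
```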