Robust vs Clustered SEs: When Observations Aren’t Independent
The independence assumption you forgot about
OLS assumes that observations are independent. But in many real datasets they aren’t:
- Students within the same school share teachers and resources
- Patients within the same hospital get similar care
- Purchases by the same customer are correlated over time
- Employees within the same firm face the same management
When observations within a group (cluster) are correlated, OLS standard errors are too small, often dramatically so. Your t-statistics are inflated, your p-values are too small, and you reject a true null far more often than the nominal 5% rate.
Three types of standard errors
| SE type | What it assumes | When to use |
|---|---|---|
| OLS (classical) | Errors are i.i.d. (independent, constant variance) | Almost never in practice |
| Robust (HC) | Errors can have different variances, but are independent | Cross-sectional data with heteroskedasticity |
| Clustered | Errors can be correlated within clusters | Any grouped/panel data |
Robust SEs fix heteroskedasticity but not within-cluster correlation. Clustered SEs fix both. If your data has clusters, you need clustered SEs.
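Here is the comparison in a few lines of plain R, using the sandwich and lmtest packages. This is a minimal sketch with made-up simulated data; in real work, swap in your own data frame and cluster variable.

```r
# One regression, three standard errors. True beta = 0; both x and the
# error share a school-level component, so clustering matters for the slope.
library(sandwich)
library(lmtest)

set.seed(1)
G <- 30; m <- 20                         # 30 schools, 20 students each
school <- rep(seq_len(G), each = m)
x <- rnorm(G)[school] + rnorm(G * m)     # x is partly school-level
y <- rnorm(G)[school] + rnorm(G * m)     # clustered errors, no x effect

fit <- lm(y ~ x)
coeftest(fit)                                        # classical OLS SEs
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))      # robust: heteroskedasticity only
coeftest(fit, vcov = vcovCL(fit, cluster = school))  # clustered by school
```

With data like this, the clustered SEs in the last line come out noticeably larger than the first two.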
Simulation 1: Clustered data and wrong SEs
Generate data with students nested in schools. Within each school, outcomes are correlated (shared school effect). Compare the three types of standard errors.
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("n_clusters", "Number of schools:",
min = 10, max = 100, value = 30, step = 5),
sliderInput("cluster_size", "Students per school:",
min = 5, max = 50, value = 20, step = 5),
sliderInput("icc", "ICC (intra-cluster correlation):",
min = 0, max = 0.8, value = 0.3, step = 0.05),
helpText("ICC = share of total variance due to the
school-level component. Higher = more clustering."),
actionButton("go", "New draw",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
fluidRow(
column(6, plotOutput("scatter", height = "400px")),
column(6, plotOutput("se_compare", height = "400px"))
)
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go
G <- input$n_clusters
m <- input$cluster_size
icc <- input$icc
# Variance decomposition: total var = 1
sigma_b <- sqrt(icc) # between-cluster SD
sigma_w <- sqrt(1 - icc) # within-cluster SD
# Generate clustered data (no true effect: beta = 0)
cluster_id <- rep(seq_len(G), each = m)
school_effect <- rep(rnorm(G, sd = sigma_b), each = m)
# x is also correlated within schools (same ICC); with purely
# idiosyncratic x, clustered errors would barely widen the slope's SE
x <- sqrt(icc) * rep(rnorm(G), each = m) + sqrt(1 - icc) * rnorm(G * m)
y <- 0 * x + school_effect + rnorm(G * m, sd = sigma_w)
fit <- lm(y ~ x)
b_hat <- coef(fit)[2]
n <- G * m
# OLS SE
ols_se <- summary(fit)$coefficients[2, 2]
# Robust SE (HC1)
r <- resid(fit)
X <- cbind(1, x)
bread <- solve(t(X) %*% X)
meat_hc <- crossprod(X, X * (r^2 * n / (n - 2)))  # row-scaled; avoids an n x n diag()
robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])
# Clustered SE (Liang-Zeger)
meat_cl <- matrix(0, 2, 2)
for (g in seq_len(G)) {
idx <- which(cluster_id == g)
Xg <- X[idx, , drop = FALSE]
rg <- r[idx]
meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
}
adj <- G / (G - 1) * (n - 1) / (n - 2)
cl_vcov <- adj * bread %*% meat_cl %*% bread
clustered_se <- sqrt(cl_vcov[2, 2])
list(x = x, y = y, cluster_id = cluster_id, fit = fit,
b_hat = b_hat, ols_se = ols_se, robust_se = robust_se,
clustered_se = clustered_se, G = G, m = m, icc = icc)
})
output$scatter <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
# Color by cluster (cycle through colors)
palette <- rep(c("#e74c3c", "#3498db", "#27ae60", "#9b59b6",
"#e67e22", "#1abc9c", "#34495e", "#f39c12"),
length.out = d$G)
cols <- palette[d$cluster_id]
plot(d$x, d$y, pch = 16, col = adjustcolor(cols, 0.5), cex = 0.6,
xlab = "X", ylab = "Y",
main = paste0(d$G, " schools, ", d$m, " students each"))
abline(d$fit, col = "#2c3e50", lwd = 2.5)
abline(h = 0, lty = 3, col = "gray60")
})
output$se_compare <- renderPlot({
d <- dat()
par(mar = c(4.5, 8, 3, 1))
ses <- c(d$ols_se, d$robust_se, d$clustered_se)
names_se <- c("OLS SE", "Robust SE", "Clustered SE")
cols <- c("#e74c3c", "#e67e22", "#27ae60")
# CIs
lo <- d$b_hat - 1.96 * ses
hi <- d$b_hat + 1.96 * ses
xlim <- range(c(lo, hi, 0)) + c(-0.3, 0.3)
plot(NULL, xlim = xlim, ylim = c(0.5, 3.5),
yaxt = "n", ylab = "", xlab = expression(hat(beta)),
main = "95% CIs with different SEs")
axis(2, at = 1:3, labels = names_se, las = 1, cex.axis = 0.85)
for (i in 1:3) {
segments(lo[i], i, hi[i], i, lwd = 4, col = cols[i])
points(d$b_hat, i, pch = 19, cex = 1.5, col = cols[i])
}
abline(v = 0, lty = 2, lwd = 2, col = "#2c3e50")
text(0, 3.4, expression("True " * beta * " = 0"), cex = 0.9)
})
output$results <- renderUI({
d <- dat()
ratio <- d$clustered_se / d$ols_se
tags$div(class = "stats-box",
HTML(paste0(
"<b>", expression("hat(beta)"), ":</b> ",
round(d$b_hat, 4), "<br>",
"<b>OLS SE:</b> <span class='bad'>",
round(d$ols_se, 4), "</span><br>",
"<b>Robust SE:</b> ",
round(d$robust_se, 4), "<br>",
"<b>Clustered SE:</b> <span class='good'>",
round(d$clustered_se, 4), "</span><br>",
"<hr style='margin:8px 0'>",
"<b>Cluster/OLS ratio:</b> ", round(ratio, 2), "x<br>",
"<small>With ICC = ", d$icc, ", clustered SE is<br>",
round(ratio, 1), "x larger than OLS SE.</small>"
))
)
})
}
shinyApp(ui, server)
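The loop above builds the Liang-Zeger "meat" by hand so you can see the formula. Outside a teaching app you would call sandwich::vcovCL instead. As a sanity check (a sketch reusing y, x, and cluster_id from the app's data step), it should reproduce the manual number when asked for the same Stata-style small-sample adjustment:

```r
library(sandwich)
fit <- lm(y ~ x)
# type = "HC1" should supply the (n - 1) / (n - k) factor and
# cadjust = TRUE the G / (G - 1) factor: together, the `adj` term above
sqrt(vcovCL(fit, cluster = cluster_id, type = "HC1", cadjust = TRUE)[2, 2])
```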
Simulation 2: Rejection rates
The real test: run 500 experiments where the true effect is zero. How often does each type of SE incorrectly reject H₀ at the 5% level? OLS SEs reject far too often. Clustered SEs get it right.
#| standalone: true
#| viewerHeight: 520
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
.good { color: #27ae60; font-weight: bold; }
.bad { color: #e74c3c; font-weight: bold; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
sliderInput("G2", "Number of clusters:",
min = 10, max = 100, value = 30, step = 5),
sliderInput("m2", "Obs per cluster:",
min = 5, max = 50, value = 20, step = 5),
sliderInput("icc2", "ICC:",
min = 0, max = 0.8, value = 0.3, step = 0.05),
sliderInput("sims", "Number of simulations:",
min = 200, max = 1000, value = 500, step = 100),
actionButton("go2", "Run simulations",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
plotOutput("rejection_plot", height = "420px")
)
)
)
server <- function(input, output, session) {
dat <- reactive({
input$go2
G <- input$G2
m <- input$m2
icc <- input$icc2
sims <- input$sims
sigma_b <- sqrt(icc)
sigma_w <- sqrt(1 - icc)
n <- G * m
reject_ols <- 0
reject_robust <- 0
reject_cluster <- 0
for (s in seq_len(sims)) {
cluster_id <- rep(seq_len(G), each = m)
school_eff <- rep(rnorm(G, sd = sigma_b), each = m)
# x shares the cluster structure, as in Simulation 1
x <- sqrt(icc) * rep(rnorm(G), each = m) + sqrt(1 - icc) * rnorm(n)
y <- 0 * x + school_eff + rnorm(n, sd = sigma_w)
fit <- lm(y ~ x)
b_hat <- coef(fit)[2]
r <- resid(fit)
X <- cbind(1, x)
# OLS
ols_se <- summary(fit)$coefficients[2, 2]
# Robust
bread <- solve(t(X) %*% X)
meat_hc <- crossprod(X, X * (r^2 * n / (n - 2)))
robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])
# Clustered
meat_cl <- matrix(0, 2, 2)
for (g in seq_len(G)) {
idx <- which(cluster_id == g)
Xg <- X[idx, , drop = FALSE]
rg <- r[idx]
meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
}
adj <- G / (G - 1) * (n - 1) / (n - 2)
cl_se <- sqrt((adj * bread %*% meat_cl %*% bread)[2, 2])
if (abs(b_hat / ols_se) > 1.96) reject_ols <- reject_ols + 1
if (abs(b_hat / robust_se) > 1.96) reject_robust <- reject_robust + 1
if (abs(b_hat / cl_se) > 1.96) reject_cluster <- reject_cluster + 1
}
list(rej_ols = reject_ols / sims,
rej_robust = reject_robust / sims,
rej_cluster = reject_cluster / sims,
sims = sims, G = G, m = m, icc = icc)
})
output$rejection_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 8, 3, 1))
rates <- c(d$rej_ols, d$rej_robust, d$rej_cluster) * 100
names_se <- c("OLS SE", "Robust SE", "Clustered SE")
cols <- c("#e74c3c", "#e67e22", "#27ae60")
bp <- barplot(rates, names.arg = names_se, col = cols,
horiz = TRUE, xlim = c(0, max(rates, 10) * 1.3),
xlab = "Rejection rate (%)",
main = paste0("False rejection rate at 5% level\n(",
d$sims, " simulations, ICC = ",
d$icc, ")"),
las = 1, border = NA)
abline(v = 5, lty = 2, lwd = 2.5, col = "#2c3e50")
text(5, max(bp) + 0.6, "Target: 5%", cex = 0.85)
text(rates + 1, bp, paste0(round(rates, 1), "%"),
cex = 0.9, adj = 0)
})
output$results2 <- renderUI({
d <- dat()
tags$div(class = "stats-box",
HTML(paste0(
"<b>OLS rejection rate:</b> <span class='",
ifelse(d$rej_ols > 0.08, "bad", "good"), "'>",
round(d$rej_ols * 100, 1), "%</span><br>",
"<b>Robust rejection rate:</b> <span class='",
ifelse(d$rej_robust > 0.08, "bad", "good"), "'>",
round(d$rej_robust * 100, 1), "%</span><br>",
"<b>Clustered rejection rate:</b> <span class='good'>",
round(d$rej_cluster * 100, 1), "%</span><br>",
"<hr style='margin:8px 0'>",
"<small>True \u03b2 = 0 in all simulations.<br>",
"Correct rejection rate is 5%.</small>"
))
)
})
}
shinyApp(ui, server)
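If you just want the headline OLS number without the UI, the experiment condenses to a few lines. A sketch mirroring the app's data-generating process:

```r
# Share of replications with |t| > 1.96 when the true beta is 0
set.seed(1)
G <- 30; m <- 20; icc <- 0.3
school <- rep(seq_len(G), each = m)
tstats <- replicate(500, {
  x <- sqrt(icc) * rnorm(G)[school] + sqrt(1 - icc) * rnorm(G * m)
  y <- sqrt(icc) * rnorm(G)[school] + sqrt(1 - icc) * rnorm(G * m)
  ct <- summary(lm(y ~ x))$coefficients
  ct[2, 1] / ct[2, 2]              # OLS t-statistic on x
})
mean(abs(tstats) > 1.96)           # typically ~0.15-0.25, not 0.05
```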
Things to try
- ICC = 0 (no clustering): all three SEs give ~5% rejection. When there’s no within-cluster correlation, OLS SEs are fine.
- ICC = 0.3: OLS rejection shoots up to ~15–25%. Robust SEs help a little but not enough. Clustered SEs stay near 5%.
- ICC = 0.5: OLS can reject 30–40% of the time — catastrophic.
- Increase cluster size while holding the number of schools fixed: the problem gets worse with more observations per cluster. More correlated data doesn't help; it just gives you a false sense of precision.
- Increase number of clusters: this does help. The effective sample size for clustered data is closer to the number of clusters than the number of observations.
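There is a handy formula behind that last point: for a regressor that is constant within clusters, the design effect is \(1 + (m-1) \times ICC\), and the effective sample size is n divided by that. A quick sketch:

```r
# Effective sample size under the design effect (cluster-level regressor)
G <- 30; m <- 20; icc <- 0.3
n <- G * m                  # 600 observations...
n / (1 + (m - 1) * icc)     # ...act like ~90 independent ones
```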
Code reference
| Software | Clustered SEs |
|---|---|
| R | lmtest::coeftest(fit, vcov = sandwich::vcovCL, cluster = ~school) |
| Stata | reg y x, cluster(school) |
| Python | sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': school}) |
The practical rule
- Always cluster at the level of treatment assignment. If you randomized schools, cluster by school. If you randomized classrooms, cluster by classroom.
- When in doubt, cluster at the highest level that makes sense. Clustering with too few clusters creates its own problems, but failing to cluster when you should is almost always worse.
- The minimum number of clusters for reliable clustered SEs is roughly 30–50. With fewer clusters, consider the wild cluster bootstrap.
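With few clusters, one readily available option in R is a bootstrap covariance matrix: recent versions of the sandwich package expose a wild cluster bootstrap through vcovBS. A sketch, reusing fit and school from the first code example (check your installed version's documentation for the supported type values):

```r
library(sandwich)
library(lmtest)
set.seed(1)
# Wild cluster bootstrap vcov; R = number of bootstrap replications
coeftest(fit, vcov = vcovBS(fit, cluster = school, R = 999, type = "wild"))
```

For bootstrap-t p-values, the procedure Cameron, Gelbach, and Miller actually recommend, the dedicated fwildclusterboot package is the more standard tool.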
Did you know?
- Brent Moulton (1990) published a now-classic paper showing that regressions of individual-level outcomes on aggregate variables (like state-level policies) produce wildly inflated t-statistics if you ignore clustering. The phenomenon became known as the "Moulton problem." Many published results in economics were likely spurious because of it.
- The Moulton factor — the ratio of the true variance to the (too-small) OLS variance — is approximately \(1 + (m-1) \times ICC\) for a cluster-level regressor, where \(m\) is the average cluster size. With 50 students per school and ICC = 0.2, the Moulton factor is about 11, meaning your OLS SE is \(\sqrt{11} \approx 3.3\) times too small and your t-statistic about 3.3 times too large (a quick numeric check follows this list).
- Cameron, Gelbach, and Miller (2008) showed that even clustered SEs can be unreliable when the number of clusters is small (< 30–50), leading to the development of the wild cluster bootstrap as a more robust alternative.
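The promised check of the Moulton factor from the second bullet:

```r
m <- 50; icc <- 0.2
moulton <- 1 + (m - 1) * icc   # ratio of true to naive OLS variance
moulton                        # 10.8, i.e. "about 11"
sqrt(moulton)                  # ~3.3: the factor by which OLS SEs are too small
```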