Robust vs Clustered SEs: When Observations Aren’t Independent

The independence assumption you forgot about

OLS assumes that observations are independent. But in many real datasets they aren’t:

  • Students within the same school share teachers and resources
  • Patients within the same hospital get similar care
  • Purchases by the same customer are correlated over time
  • Employees within the same firm face the same management

When observations within a group (cluster) are correlated, OLS standard errors are too small — often dramatically so. Your t-statistics are inflated, your p-values are too small, and you reject the null far more than 5% of the time.

Three types of standard errors

| SE type | What it assumes | When to use |
|---|---|---|
| OLS (classical) | Errors are i.i.d. (independent, constant variance) | Almost never in practice |
| Robust (HC) | Errors can have different variances, but are independent | Cross-sectional data with heteroskedasticity |
| Clustered | Errors can be correlated within clusters | Any grouped/panel data |

Robust SEs fix heteroskedasticity but not within-cluster correlation. Clustered SEs fix both. If your data has clusters, you need clustered SEs.

The Oracle View. In these simulations, we set the cluster structure and the intra-cluster correlation (ICC) — we know exactly how much correlation exists within groups. In practice, you identify clusters from your study design (schools, hospitals, firms) and let the clustered SE estimator handle the rest without knowing the true ICC.

Where does the independence assumption live?

The standard OLS SE formula assumes independent observations. Here is exactly where that assumption appears.

The OLS estimator for \(Y = X\beta + \varepsilon\) is

\[\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon\]

so the variance of the estimator depends on the variance of the errors:

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \; X' \, \text{Var}(\varepsilon) \, X \; (X'X)^{-1}\]

This formula is always true. The key question is: what is \(\text{Var}(\varepsilon)\)?

Under independence, classical OLS assumes \(\text{Var}(\varepsilon) = \sigma^2 I\), where the identity matrix \(I\) has zeros on the off-diagonal — meaning \(\text{Cov}(\varepsilon_i, \varepsilon_j) = 0\) for all \(i \neq j\). Plugging this in:

\[\text{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}\]

This is the default SE printed in regression output: \(\text{SE}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 [(X'X)^{-1}]_{jj}}\).
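The classical formula is easy to sanity-check numerically. A minimal sketch (simulated data; the variable names are mine) that computes \(\hat{\sigma}^2 [(X'X)^{-1}]_{jj}\) by hand and compares it to the SE that lm() prints:

```r
# Verify the classical OLS SE formula against lm()'s output.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

fit <- lm(y ~ x)
X <- cbind(1, x)                                  # design matrix with intercept
sigma2_hat <- sum(resid(fit)^2) / (n - 2)         # residual variance, k = 2 params
se_by_hand <- sqrt(sigma2_hat * solve(t(X) %*% X)[2, 2])

se_lm <- summary(fit)$coefficients[2, 2]
all.equal(se_by_hand, se_lm)                      # the two agree
```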

When observations are correlated, the true error variance is \(\text{Var}(\varepsilon) = \Omega\), where \(\Omega\) is not diagonal. For example, with three students in the same school and one in a different school, writing \(\rho\) for the common within-school error covariance:

\[\Omega = \begin{bmatrix} \sigma^2 & \rho & \rho & 0 \\ \rho & \sigma^2 & \rho & 0 \\ \rho & \rho & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}\]

The correct variance is \((X'X)^{-1} X' \Omega X (X'X)^{-1}\), but the OLS formula pretends \(\Omega = \sigma^2 I\) and ignores those off-diagonal terms — so the estimated SE is too small.

Cluster-robust SEs fix this by replacing \(\sigma^2 I\) with an estimate of \(\Omega\):

\[\text{Var}(\hat{\beta}) = (X'X)^{-1} \left( \sum_g X_g' \hat{\varepsilon}_g \hat{\varepsilon}_g' X_g \right) (X'X)^{-1}\]

where \(g\) indexes clusters and \(\hat{\varepsilon}_g\) stacks the residuals for cluster \(g\). This allows \(\text{Cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0\) within the same cluster; it is exactly the sandwich formula behind Stata's vce(cluster) option and R's sandwich::vcovCL().
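The sandwich is simple enough to compute by hand. A self-contained sketch on simulated grouped data (the setup and names are mine; note that x gets a cluster-level component, because the SE inflation only appears when the regressor is also correlated within clusters):

```r
# Hand-rolled Liang-Zeger clustered SE vs. the classical OLS SE.
set.seed(42)
G <- 40; m <- 25; n <- G * m
cluster_id <- rep(seq_len(G), each = m)
x <- sqrt(0.5) * rep(rnorm(G), each = m) + sqrt(0.5) * rnorm(n)
y <- rep(rnorm(G, sd = 0.6), each = m) + rnorm(n)   # true beta = 0

fit <- lm(y ~ x)
X <- cbind(1, x)
r <- resid(fit)
bread <- solve(crossprod(X))

# Meat: sum over clusters of X_g' e_g e_g' X_g
meat <- matrix(0, 2, 2)
for (g in seq_len(G)) {
  idx <- cluster_id == g
  score_g <- crossprod(X[idx, , drop = FALSE], r[idx])
  meat <- meat + tcrossprod(score_g)
}
adj <- G / (G - 1) * (n - 1) / (n - 2)    # CR1 small-sample correction
cl_se  <- sqrt((adj * bread %*% meat %*% bread)[2, 2])
ols_se <- summary(fit)$coefficients[2, 2]
c(ols = ols_se, clustered = cl_se)        # clustered SE is markedly larger
```

With a purely i.i.d. regressor the two SEs would be close; the gap opens up only when both the errors and the regressor are correlated within clusters.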


Simulation 1: Clustered data and wrong SEs

Generate data with students nested in schools. Within each school, outcomes are correlated (shared school effect). Compare the three types of standard errors.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n_clusters", "Number of schools:",
                  min = 10, max = 100, value = 30, step = 5),

      sliderInput("cluster_size", "Students per school:",
                  min = 5, max = 50, value = 20, step = 5),

      sliderInput("icc", "ICC (intra-cluster correlation):",
                  min = 0, max = 0.8, value = 0.3, step = 0.05),

      helpText("ICC = share of total variance due to the
               school-level component. Higher = more clustering."),

      actionButton("go", "New draw",
                   class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("scatter", height = "400px")),
        column(6, plotOutput("se_compare", height = "400px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    G  <- input$n_clusters
    m  <- input$cluster_size
    icc <- input$icc

    # Variance decomposition: total var = 1
    sigma_b <- sqrt(icc)        # between-cluster SD
    sigma_w <- sqrt(1 - icc)    # within-cluster SD

    # Generate clustered data (no true effect: beta = 0).
    # The regressor also gets a cluster-level component: with a purely
    # i.i.d. x, clustered errors would barely inflate Var(beta-hat).
    cluster_id <- rep(seq_len(G), each = m)
    school_effect <- rep(rnorm(G, sd = sigma_b), each = m)
    x <- sqrt(0.5) * rep(rnorm(G), each = m) + sqrt(0.5) * rnorm(G * m)
    y <- 0 * x + school_effect + rnorm(G * m, sd = sigma_w)

    fit <- lm(y ~ x)
    b_hat <- coef(fit)[2]
    n <- G * m

    # OLS SE
    ols_se <- summary(fit)$coefficients[2, 2]

    # Robust SE (HC1)
    r <- resid(fit)
    X <- cbind(1, x)
    bread <- solve(t(X) %*% X)
    meat_hc <- t(X) %*% diag(r^2 * n / (n - 2)) %*% X
    robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])

    # Clustered SE (Liang-Zeger)
    meat_cl <- matrix(0, 2, 2)
    for (g in seq_len(G)) {
      idx <- which(cluster_id == g)
      Xg <- X[idx, , drop = FALSE]
      rg <- r[idx]
      meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
    }
    adj <- G / (G - 1) * (n - 1) / (n - 2)
    cl_vcov <- adj * bread %*% meat_cl %*% bread
    clustered_se <- sqrt(cl_vcov[2, 2])

    list(x = x, y = y, cluster_id = cluster_id, fit = fit,
         b_hat = b_hat, ols_se = ols_se, robust_se = robust_se,
         clustered_se = clustered_se, G = G, m = m, icc = icc)
  })

  output$scatter <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    # Color by cluster (cycle through colors)
    palette <- rep(c("#e74c3c", "#3498db", "#27ae60", "#9b59b6",
                     "#e67e22", "#1abc9c", "#34495e", "#f39c12"),
                   length.out = d$G)
    cols <- palette[d$cluster_id]

    plot(d$x, d$y, pch = 16, col = adjustcolor(cols, 0.5), cex = 0.6,
         xlab = "X", ylab = "Y",
         main = paste0(d$G, " schools, ", d$m, " students each"))
    abline(d$fit, col = "#2c3e50", lwd = 2.5)
    abline(h = 0, lty = 3, col = "gray60")
  })

  output$se_compare <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 8, 3, 1))

    ses <- c(d$ols_se, d$robust_se, d$clustered_se)
    names_se <- c("OLS SE", "Robust SE", "Clustered SE")
    cols <- c("#e74c3c", "#e67e22", "#27ae60")

    # CIs
    lo <- d$b_hat - 1.96 * ses
    hi <- d$b_hat + 1.96 * ses

    xlim <- range(c(lo, hi, 0)) + c(-0.3, 0.3)

    plot(NULL, xlim = xlim, ylim = c(0.5, 3.5),
         yaxt = "n", ylab = "", xlab = expression(hat(beta)),
         main = "95% CIs with different SEs")
    axis(2, at = 1:3, labels = names_se, las = 1, cex.axis = 0.85)

    for (i in 1:3) {
      segments(lo[i], i, hi[i], i, lwd = 4, col = cols[i])
      points(d$b_hat, i, pch = 19, cex = 1.5, col = cols[i])
    }

    abline(v = 0, lty = 2, lwd = 2, col = "#2c3e50")
    text(0, 3.4, expression("True " * beta * " = 0"), cex = 0.9)
  })

  output$results <- renderUI({
    d <- dat()
    ratio <- d$clustered_se / d$ols_se

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>\u03b2\u0302:</b> ",
        round(d$b_hat, 4), "<br>",
        "<b>OLS SE:</b> <span class='bad'>",
        round(d$ols_se, 4), "</span><br>",
        "<b>Robust SE:</b> ",
        round(d$robust_se, 4), "<br>",
        "<b>Clustered SE:</b> <span class='good'>",
        round(d$clustered_se, 4), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<b>Cluster/OLS ratio:</b> ", round(ratio, 2), "x<br>",
        "<small>With ICC = ", d$icc, ", clustered SE is<br>",
        round(ratio, 1), "x larger than OLS SE.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Simulation 2: Rejection rates

The real test: run 500 experiments where the true effect is zero. How often does each type of SE incorrectly reject H₀ at the 5% level? OLS SEs reject far too often. Clustered SEs get it right.

#| standalone: true
#| viewerHeight: 520

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("G2", "Number of clusters:",
                  min = 10, max = 100, value = 30, step = 5),

      sliderInput("m2", "Obs per cluster:",
                  min = 5, max = 50, value = 20, step = 5),

      sliderInput("icc2", "ICC:",
                  min = 0, max = 0.8, value = 0.3, step = 0.05),

      sliderInput("sims", "Number of simulations:",
                  min = 200, max = 1000, value = 500, step = 100),

      actionButton("go2", "Run simulations",
                   class = "btn-primary", width = "100%"),

      uiOutput("results2")
    ),

    mainPanel(
      width = 9,
      plotOutput("rejection_plot", height = "420px")
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go2
    G    <- input$G2
    m    <- input$m2
    icc  <- input$icc2
    sims <- input$sims

    sigma_b <- sqrt(icc)
    sigma_w <- sqrt(1 - icc)
    n <- G * m

    reject_ols <- 0
    reject_robust <- 0
    reject_cluster <- 0

    for (s in seq_len(sims)) {
      cluster_id <- rep(seq_len(G), each = m)
      school_eff <- rep(rnorm(G, sd = sigma_b), each = m)
      # Give x a shared cluster component; with a purely i.i.d. x,
      # OLS SEs would be approximately valid despite clustered errors.
      x <- sqrt(0.5) * rep(rnorm(G), each = m) + sqrt(0.5) * rnorm(n)
      y <- 0 * x + school_eff + rnorm(n, sd = sigma_w)

      fit <- lm(y ~ x)
      b_hat <- coef(fit)[2]
      r <- resid(fit)
      X <- cbind(1, x)

      # OLS
      ols_se <- summary(fit)$coefficients[2, 2]

      # Robust
      bread <- solve(t(X) %*% X)
      meat_hc <- t(X) %*% diag(r^2 * n / (n - 2)) %*% X
      robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])

      # Clustered
      meat_cl <- matrix(0, 2, 2)
      for (g in seq_len(G)) {
        idx <- which(cluster_id == g)
        Xg <- X[idx, , drop = FALSE]
        rg <- r[idx]
        meat_cl <- meat_cl + t(Xg) %*% (rg %*% t(rg)) %*% Xg
      }
      adj <- G / (G - 1) * (n - 1) / (n - 2)
      cl_se <- sqrt((adj * bread %*% meat_cl %*% bread)[2, 2])

      if (abs(b_hat / ols_se) > 1.96)    reject_ols <- reject_ols + 1
      if (abs(b_hat / robust_se) > 1.96) reject_robust <- reject_robust + 1
      if (abs(b_hat / cl_se) > 1.96)     reject_cluster <- reject_cluster + 1
    }

    list(rej_ols = reject_ols / sims,
         rej_robust = reject_robust / sims,
         rej_cluster = reject_cluster / sims,
         sims = sims, G = G, m = m, icc = icc)
  })

  output$rejection_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 8, 3, 1))

    rates <- c(d$rej_ols, d$rej_robust, d$rej_cluster) * 100
    names_se <- c("OLS SE", "Robust SE", "Clustered SE")
    cols <- c("#e74c3c", "#e67e22", "#27ae60")

    bp <- barplot(rates, names.arg = names_se, col = cols,
                  horiz = TRUE, xlim = c(0, max(rates, 10) * 1.3),
                  xlab = "Rejection rate (%)",
                  main = paste0("False rejection rate at 5% level\n(",
                               d$sims, " simulations, ICC = ",
                               d$icc, ")"),
                  las = 1, border = NA)

    abline(v = 5, lty = 2, lwd = 2.5, col = "#2c3e50")
    text(5, max(bp) + 0.6, "Target: 5%", cex = 0.85)

    text(rates + 1, bp, paste0(round(rates, 1), "%"),
         cex = 0.9, adj = 0)
  })

  output$results2 <- renderUI({
    d <- dat()
    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>OLS rejection rate:</b> <span class='",
        ifelse(d$rej_ols > 0.08, "bad", "good"), "'>",
        round(d$rej_ols * 100, 1), "%</span><br>",
        "<b>Robust rejection rate:</b> <span class='",
        ifelse(d$rej_robust > 0.08, "bad", "good"), "'>",
        round(d$rej_robust * 100, 1), "%</span><br>",
        "<b>Clustered rejection rate:</b> <span class='good'>",
        round(d$rej_cluster * 100, 1), "%</span><br>",
        "<hr style='margin:8px 0'>",
        "<small>True \u03b2 = 0 in all simulations.<br>",
        "Correct rejection rate is 5%.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • ICC = 0 (no clustering): all three SEs give ~5% rejection. When there’s no within-cluster correlation, OLS SEs are fine.
  • ICC = 0.3: OLS rejection shoots up to ~15–25%. Robust SEs help a little but not enough. Clustered SEs stay near 5%.
  • ICC = 0.5: OLS can reject 30–40% of the time — catastrophic.
  • Increase cluster size, hold clusters fixed: the problem gets worse with more observations per cluster. More correlated data doesn’t help — it just gives you a false sense of precision.
  • Increase number of clusters: this does help. The effective sample size for clustered data is closer to the number of clusters than the number of observations.

Simulation 3: Choosing the cluster level

In hierarchical data — students in classrooms in schools in districts — you have to decide which level to cluster at. The same \(\hat{\beta}\) gets different standard errors depending on your choice:

  • Too fine (e.g., classroom when correlation exists at the school level): SEs are too small, you over-reject.
  • Too coarse (e.g., state when you only have 6 states): too few clusters for the asymptotic approximation, SEs may be unreliable.
  • The right level: cluster where treatment is assigned, or at the coarsest level where errors are correlated.

This simulation generates a four-level hierarchy with variance at each level you control. Watch the SE staircase: SEs weakly increase as you cluster coarser, and the jumps tell you where the correlation lives. If SEs barely change going from school to district, there’s little district-level correlation — clustering at school is fine.

#| standalone: true
#| viewerHeight: 650

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n_dist3", "Number of districts:",
                  min = 4, max = 12, value = 8, step = 1),

      helpText("4 schools/district, 4 classrooms/school,
               8 students/classroom"),

      tags$hr(),

      sliderInput("var_dist", "District variance share:",
                  min = 0, max = 0.5, value = 0.2, step = 0.05),

      sliderInput("var_school", "School variance share:",
                  min = 0, max = 0.5, value = 0.15, step = 0.05),

      sliderInput("var_class", "Classroom variance share:",
                  min = 0, max = 0.5, value = 0.1, step = 0.05),

      uiOutput("var_indiv_text"),

      actionButton("go3", "New draw",
                   class = "btn-primary", width = "100%"),

      uiOutput("results3")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("se_staircase", height = "450px")),
        column(6, plotOutput("ci_levels", height = "450px"))
      )
    )
  )
)

server <- function(input, output, session) {

  # Helper: clustered SE for coefficient on x (index 2)
  calc_cluster_se <- function(Xmat, r, cluster_id) {
    n <- nrow(Xmat)
    k <- ncol(Xmat)
    bread <- solve(crossprod(Xmat))
    clusters <- unique(cluster_id)
    G <- length(clusters)
    meat <- matrix(0, k, k)
    for (g in clusters) {
      idx <- which(cluster_id == g)
      Xg <- Xmat[idx, , drop = FALSE]
      rg <- r[idx]
      score_g <- crossprod(Xg, rg)
      meat <- meat + tcrossprod(score_g)
    }
    adj <- G / (G - 1) * (n - 1) / (n - k)
    vcov <- adj * bread %*% meat %*% bread
    sqrt(vcov[2, 2])
  }

  dat3 <- reactive({
    input$go3
    n_dist <- input$n_dist3
    v_d    <- input$var_dist
    v_s    <- input$var_school
    v_c    <- input$var_class
    v_i    <- max(0.01, 1 - v_d - v_s - v_c)

    # Fixed hierarchy
    sch_per_dist <- 4
    cls_per_sch  <- 4
    stu_per_cls  <- 8

    n_sch <- n_dist * sch_per_dist
    n_cls <- n_sch * cls_per_sch
    n_stu <- n_cls * stu_per_cls

    # IDs
    dist_id  <- rep(1:n_dist,
                    each = sch_per_dist * cls_per_sch * stu_per_cls)
    school_id <- rep(1:n_sch,
                     each = cls_per_sch * stu_per_cls)
    class_id  <- rep(1:n_cls,
                     each = stu_per_cls)

    # Random effects
    d_eff <- rep(rnorm(n_dist, sd = sqrt(v_d)),
                 each = sch_per_dist * cls_per_sch * stu_per_cls)
    s_eff <- rep(rnorm(n_sch, sd = sqrt(v_s)),
                 each = cls_per_sch * stu_per_cls)
    c_eff <- rep(rnorm(n_cls, sd = sqrt(v_c)),
                 each = stu_per_cls)
    i_err <- rnorm(n_stu, sd = sqrt(v_i))

    # x gets correlation at every level of the hierarchy (equal shares);
    # error correlation at a level only inflates SEs if x is also
    # correlated at that level.
    x <- sqrt(0.25) * rep(rnorm(n_dist),
                          each = sch_per_dist * cls_per_sch * stu_per_cls) +
         sqrt(0.25) * rep(rnorm(n_sch), each = cls_per_sch * stu_per_cls) +
         sqrt(0.25) * rep(rnorm(n_cls), each = stu_per_cls) +
         sqrt(0.25) * rnorm(n_stu)
    y <- 0 * x + d_eff + s_eff + c_eff + i_err

    fit <- lm(y ~ x)
    b_hat <- coef(fit)[2]
    r <- resid(fit)
    Xmat <- cbind(1, x)

    # Robust SE (HC1) — efficient version
    e2 <- r^2 * n_stu / (n_stu - 2)
    bread <- solve(crossprod(Xmat))
    meat_hc <- crossprod(Xmat * sqrt(e2))
    robust_se <- sqrt((bread %*% meat_hc %*% bread)[2, 2])

    # Clustered SEs at each level
    class_se  <- calc_cluster_se(Xmat, r, class_id)
    school_se <- calc_cluster_se(Xmat, r, school_id)
    dist_se   <- calc_cluster_se(Xmat, r, dist_id)

    ses <- c(robust_se, class_se, school_se, dist_se)
    n_clusters <- c(n_stu, n_cls, n_sch, n_dist)
    labels <- c("Robust\n(individual)",
                paste0("Classroom\n(G=", n_cls, ")"),
                paste0("School\n(G=", n_sch, ")"),
                paste0("District\n(G=", n_dist, ")"))

    list(b_hat = b_hat, ses = ses,
         n_clusters = n_clusters, labels = labels)
  })

  output$var_indiv_text <- renderUI({
    v_i <- max(0, 1 - input$var_dist -
               input$var_school - input$var_class)
    col <- if (v_i < 0.05) "color: #e74c3c;" else ""
    tags$p(style = paste0("font-size: 13px; ", col),
      HTML(paste0("<b>Individual share:</b> ",
                  round(v_i, 2))))
  })

  output$se_staircase <- renderPlot({
    d <- dat3()
    par(mar = c(5, 9, 3, 2))

    cols <- c("#e74c3c", "#e67e22", "#3498db", "#27ae60")
    ses <- d$ses

    bp <- barplot(ses, names.arg = d$labels,
                  horiz = TRUE, col = cols, border = NA,
                  las = 1, cex.names = 0.8,
                  xlab = expression("SE(" * hat(beta) * ")"),
                  main = "SE at each clustering level",
                  xlim = c(0, max(ses) * 1.5))

    # Label: value and ratio vs robust
    text(ses + max(ses) * 0.03, bp,
         paste0(format(round(ses, 4), nsmall = 4),
                "  (", round(ses / ses[1], 1), "x)"),
         adj = 0, cex = 0.85)
  })

  output$ci_levels <- renderPlot({
    d <- dat3()
    par(mar = c(5, 9, 3, 2))

    ses <- d$ses
    b   <- d$b_hat
    lo  <- b - 1.96 * ses
    hi  <- b + 1.96 * ses
    cols <- c("#e74c3c", "#e67e22", "#3498db", "#27ae60")

    xlim <- c(min(lo, 0) - abs(min(lo, 0)) * 0.3,
              max(hi, 0) + abs(max(hi, 0)) * 0.3)

    plot(NULL, xlim = xlim, ylim = c(0.5, 4.5),
         yaxt = "n", ylab = "",
         xlab = expression(hat(beta)),
         main = expression(
           "95% CI at each level (true " * beta * " = 0)"))
    axis(2, at = 1:4, labels = d$labels,
         las = 1, cex.axis = 0.8)

    for (i in 1:4) {
      segments(lo[i], i, hi[i], i,
               lwd = 5, col = cols[i])
      points(b, i, pch = 19, cex = 1.5, col = cols[i])
    }

    abline(v = 0, lty = 2, lwd = 2, col = "#2c3e50")
  })

  output$results3 <- renderUI({
    d <- dat3()
    ses <- d$ses

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>SE staircase:</b><br>",
        "Robust: ", round(ses[1], 4), "<br>",
        "Classroom: ", round(ses[2], 4),
        " (<b>", round(ses[2]/ses[1], 1), "x</b>)<br>",
        "School: ", round(ses[3], 4),
        " (<b>", round(ses[3]/ses[1], 1), "x</b>)<br>",
        "District: ", round(ses[4], 4),
        " (<b>", round(ses[4]/ses[1], 1), "x</b>)<br>",
        "<hr style='margin:8px 0'>",
        "<small>Ratios are vs robust SE.<br>",
        "Big jump = correlation at that level.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • All variance at district level (0.5, 0, 0): huge jump from robust to district. Clustering at classroom or school barely helps — the correlation lives at the top.
  • All variance at classroom level (0, 0, 0.5): the jump happens from robust to classroom. School and district clustering add almost nothing beyond classroom, because there’s no correlation between classrooms.
  • Equal shares (0.15, 0.15, 0.15): a gradual staircase — each level adds some.
  • Zero variance at all levels (0, 0, 0): no hierarchy at all. All four SEs are essentially the same. Robust is fine.
  • Few districts (4): even with the “right” SE, only 4 clusters is dangerously few for asymptotic clustered SEs. Consider the wild cluster bootstrap.

Choosing the level in practice

The staircase gives you a diagnostic: where the SE jumps is where the correlation lives. Apply this to your research design:

| If your design looks like… | Cluster at… | Why |
|---|---|---|
| Treatment assigned to whole schools, students measured | School | Treatment assigned at school level |
| Policy varies by state, data is individual | State | Treatment assigned at state level |
| Tracts in counties, treatment is tract-level exposure | County | Spatial correlation in outcomes across tracts in the same county |
| Panel data, units observed over time | Unit | Serial correlation within unit |

The key rule: you can never go wrong clustering coarser (SEs only get bigger), but you can go wrong clustering too fine. The risk of coarse clustering is having too few clusters — below ~30, use the wild cluster bootstrap.

Code reference

| Software | Clustered SEs |
|---|---|
| R | `lmtest::coeftest(fit, vcov = sandwich::vcovCL, cluster = ~school)` |
| Stata | `reg y x, cluster(school)` |
| Python | `sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': school})` |

The practical rule

  • Always cluster at the level of treatment assignment. If you randomized schools, cluster by school. If you randomized classrooms, cluster by classroom.
  • When in doubt, cluster at the highest level that makes sense. Clustering too aggressively (too few clusters) can be a problem, but not clustering when you should is always worse.
  • The minimum number of clusters for reliable clustered SEs is roughly 30–50. With fewer clusters, consider the wild cluster bootstrap.
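With few clusters, the wild cluster bootstrap is also simple to sketch by hand. A toy version with Rademacher weights and the null imposed (my own minimal implementation; in real work reach for a dedicated package such as fwildclusterboot):

```r
# Wild cluster bootstrap p-value for H0: beta = 0, Rademacher weights.
set.seed(7)
G <- 12; m <- 30; n <- G * m                        # deliberately few clusters
cluster_id <- rep(seq_len(G), each = m)
x <- sqrt(0.5) * rep(rnorm(G), each = m) + sqrt(0.5) * rnorm(n)
y <- rep(rnorm(G, sd = 0.6), each = m) + rnorm(n)   # true beta = 0

# Cluster-robust t-statistic for the slope
cl_t <- function(y, x, cluster_id) {
  fit <- lm(y ~ x)
  X <- cbind(1, x); r <- resid(fit)
  bread <- solve(crossprod(X))
  meat <- matrix(0, 2, 2)
  for (g in unique(cluster_id)) {
    s <- crossprod(X[cluster_id == g, , drop = FALSE], r[cluster_id == g])
    meat <- meat + tcrossprod(s)
  }
  G <- length(unique(cluster_id)); n <- length(y)
  se <- sqrt((G / (G - 1) * (n - 1) / (n - 2) *
                bread %*% meat %*% bread)[2, 2])
  unname(coef(fit)[2] / se)
}

t_obs <- cl_t(y, x, cluster_id)

# Impose H0 (beta = 0): the restricted model is intercept-only.
fit0 <- lm(y ~ 1)
r0 <- resid(fit0); y0_hat <- fitted(fit0)

B <- 399
t_boot <- replicate(B, {
  w <- sample(c(-1, 1), G, replace = TRUE)[cluster_id]  # one +/-1 per cluster
  cl_t(y0_hat + w * r0, x, cluster_id)
})
p_val <- mean(abs(t_boot) >= abs(t_obs))
p_val
```

Flipping residuals cluster-by-cluster preserves the within-cluster correlation in every bootstrap draw, which is what keeps the test honest with a handful of clusters.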

Did you know?

  • Brent Moulton (1990) published a now-classic paper showing that regressions using aggregate variables (like state-level policies) with individual-level data produce t-statistics that are wildly inflated if you ignore clustering. He called it the “Moulton problem.” Many published results in economics were likely spurious because of this.
  • The Moulton factor — the ratio of the true variance to the (too-small) OLS variance — is approximately \(1 + (m-1) \times ICC\) for a regressor that is constant within clusters, where \(m\) is the average cluster size. With 50 students per school and ICC = 0.2, the Moulton factor is about 11, meaning your OLS SE is \(\sqrt{11} \approx 3.3\) times too small and your t-statistic about 3.3x too large.
  • Cameron, Gelbach, and Miller (2008) showed that even clustered SEs can be unreliable when the number of clusters is small (< 30–50), leading to the development of the wild cluster bootstrap as a more robust alternative.
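The Moulton-factor arithmetic above can be checked in a couple of lines:

```r
# Moulton variance inflation: m = 50 students per school, ICC = 0.2
m <- 50; icc <- 0.2
moulton <- 1 + (m - 1) * icc   # = 10.8, i.e. about 11
sqrt(moulton)                  # about 3.29: OLS SE is ~3.3x too small
```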