LLN vs CLT: Why Averages Stabilize Before They Become Normal
Two theorems, two different promises
The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are the two pillars of statistics, but they say different things:
| | LLN | CLT |
|---|---|---|
| What it says | Sample mean converges to the true mean | Distribution of the (scaled) sample mean is approximately normal |
| Object | \(\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots\) — one sequence, growing \(n\) | \(\bar{X}^{(1)}, \bar{X}^{(2)}, \bar{X}^{(3)}, \dots\) — many samples, fixed \(n\) |
| Key result | \(\bar{X}_n \xrightarrow{p} \mu\) | \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\) |
| Interpretation | Consistency | Asymptotic normality |
| Promise | You get the right answer | You know how uncertain your estimate is |
| Requires | Finite mean | Finite variance |
LLN says the sample mean settles down on \(\mu\). CLT says how it fluctuates around \(\mu\) along the way: scaled up by \(\sqrt{n}\), those fluctuations follow a normal distribution.
Think of it this way: LLN tells you that if you flip a fair coin enough times, the fraction of heads approaches 0.5. CLT tells you that if you repeat the entire experiment many times, the distribution of those fractions is bell-shaped.
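Both pictures are easy to reproduce in a few lines of plain R (no Shiny needed). The first half tracks one running fraction of heads (LLN: one sequence, growing \(n\)); the second repeats a fixed-size experiment many times and summarises the spread of the fractions (CLT: many samples, fixed \(n\)). The sample sizes here are arbitrary choices for illustration.

```r
set.seed(1)

# LLN view: one sequence, growing n -- the running fraction of heads
flips   <- rbinom(10000, 1, 0.5)            # one long run of fair-coin flips
running <- cumsum(flips) / seq_along(flips)
running[c(10, 100, 10000)]                  # drifts toward 0.5 as n grows

# CLT view: many experiments, fixed n -- the distribution of the fractions
fractions <- replicate(5000, mean(rbinom(100, 1, 0.5)))
mean(fractions)    # centred near 0.5
sd(fractions)      # near sqrt(0.5 * 0.5 / 100) = 0.05
```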
Simulation 1: The running average
Watch one running average converge to the true mean as you add observations one at a time. This is LLN in action — the cumulative mean stabilizes. Try different distributions and notice that convergence happens regardless of the population shape.
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
selectInput("pop", "Population:",
choices = c("Uniform(0, 1)",
"Exponential(1)",
"Bimodal",
"Bernoulli(0.3)",
"Heavy-tailed (t, df=2)")),
sliderInput("n", "Number of observations:",
min = 50, max = 5000, value = 500, step = 50),
sliderInput("paths", "Number of paths:",
min = 1, max = 10, value = 3, step = 1),
actionButton("go", "Draw new paths",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
plotOutput("running_avg", height = "500px")
)
)
)
server <- function(input, output, session) {
draw_pop <- function(n, pop) {
switch(pop,
"Uniform(0, 1)" = runif(n),
"Exponential(1)" = rexp(n, rate = 1),
"Bimodal" = {
k <- rbinom(n, 1, 0.5)
k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
},
"Bernoulli(0.3)" = rbinom(n, 1, 0.3),
"Heavy-tailed (t, df=2)" = rt(n, df = 2)
)
}
pop_mu <- function(pop) {
switch(pop,
"Uniform(0, 1)" = 0.5,
"Exponential(1)" = 1,
"Bimodal" = 0,
"Bernoulli(0.3)" = 0.3,
"Heavy-tailed (t, df=2)" = 0
)
}
dat <- reactive({
input$go
n <- input$n
paths <- input$paths
pop <- input$pop
mu <- pop_mu(pop)
all_paths <- lapply(seq_len(paths), function(i) {
x <- draw_pop(n, pop)
cumsum(x) / seq_along(x)
})
list(all_paths = all_paths, n = n, mu = mu, pop = pop, paths = paths)
})
output$running_avg <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
ylim <- range(unlist(d$all_paths))
ylim <- ylim + c(-1, 1) * 0.1 * diff(ylim)
cols <- c("#e74c3c", "#3498db", "#27ae60", "#9b59b6", "#e67e22",
"#1abc9c", "#34495e", "#f39c12", "#2ecc71", "#c0392b")
plot(NULL, xlim = c(1, d$n), ylim = ylim,
xlab = "Number of observations",
ylab = "Cumulative mean",
main = paste0("LLN: Running average converges to \u03bc = ", d$mu))
abline(h = d$mu, lty = 2, lwd = 2.5, col = "#2c3e50")
for (i in seq_along(d$all_paths)) {
lines(seq_along(d$all_paths[[i]]), d$all_paths[[i]],
col = adjustcolor(cols[i], 0.7), lwd = 1.5)
}
legend("topright", bty = "n", cex = 0.9,
legend = c(paste0("True mean (\u03bc = ", d$mu, ")"),
paste0(d$paths, " sample path(s)")),
col = c("#2c3e50", cols[1]),
lwd = c(2.5, 1.5), lty = c(2, 1))
})
output$results <- renderUI({
d <- dat()
final_means <- sapply(d$all_paths, function(p) p[length(p)])
tags$div(class = "stats-box",
HTML(paste0(
"<b>True mean:</b> ", d$mu, "<br>",
"<b>Final running avg(s):</b><br>",
paste(round(final_means, 4), collapse = ", "), "<br>",
"<b>Max deviation:</b> ",
round(max(abs(final_means - d$mu)), 4)
))
)
})
}
shinyApp(ui, server)
Simulation 2: LLN vs CLT side by side
Now see both theorems at once. The left panel shows LLN — one running average converging to \(\mu\). The right panel shows CLT — the histogram of many sample means forming a normal distribution. Same data, different questions.
#| standalone: true
#| viewerHeight: 580
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
selectInput("pop2", "Population:",
choices = c("Uniform(0, 1)",
"Exponential(1)",
"Bimodal",
"Bernoulli(0.3)")),
sliderInput("n2", "Sample size per experiment:",
min = 5, max = 200, value = 30, step = 5),
sliderInput("reps2", "Number of experiments:",
min = 200, max = 3000, value = 1000, step = 100),
actionButton("go2", "Run experiments",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
fluidRow(
column(6, plotOutput("lln_plot", height = "430px")),
column(6, plotOutput("clt_plot", height = "430px"))
)
)
)
)
server <- function(input, output, session) {
draw_pop <- function(n, pop) {
switch(pop,
"Uniform(0, 1)" = runif(n),
"Exponential(1)" = rexp(n, rate = 1),
"Bimodal" = {
k <- rbinom(n, 1, 0.5)
k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
},
"Bernoulli(0.3)" = rbinom(n, 1, 0.3)
)
}
pop_mu <- function(pop) {
switch(pop,
"Uniform(0, 1)" = 0.5,
"Exponential(1)" = 1,
"Bimodal" = 0,
"Bernoulli(0.3)" = 0.3
)
}
pop_sigma <- function(pop) {
switch(pop,
"Uniform(0, 1)" = sqrt(1 / 12),
"Exponential(1)" = 1,
"Bimodal" = sqrt(0.6^2 + 4),
"Bernoulli(0.3)" = sqrt(0.3 * 0.7)
)
}
dat <- reactive({
input$go2
n <- input$n2
reps <- input$reps2
pop <- input$pop2
mu <- pop_mu(pop)
sig <- pop_sigma(pop)
# One long draw for LLN
long_draw <- draw_pop(n * 5, pop)
running <- cumsum(long_draw) / seq_along(long_draw)
# Many experiments for CLT
means <- replicate(reps, mean(draw_pop(n, pop)))
list(running = running, means = means, mu = mu, sig = sig,
n = n, reps = reps, pop = pop)
})
output$lln_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
plot(seq_along(d$running), d$running, type = "l",
col = "#3498db", lwd = 1.5,
xlab = "Number of observations",
ylab = "Cumulative mean",
main = "LLN: One running average")
abline(h = d$mu, lty = 2, lwd = 2.5, col = "#e74c3c")
legend("topright", bty = "n", cex = 0.85,
legend = c("Running average",
paste0("True \u03bc = ", d$mu)),
col = c("#3498db", "#e74c3c"),
lwd = c(1.5, 2.5), lty = c(1, 2))
})
output$clt_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
hist(d$means, breaks = 40, probability = TRUE,
col = "#3498db30", border = "#3498db80",
main = paste0("CLT: ", d$reps, " sample means (n=", d$n, ")"),
xlab = "Sample mean", ylab = "Density")
x_seq <- seq(min(d$means), max(d$means), length.out = 300)
se <- d$sig / sqrt(d$n)
lines(x_seq, dnorm(x_seq, mean = d$mu, sd = se),
col = "#e74c3c", lwd = 2.5)
abline(v = d$mu, lty = 2, lwd = 2, col = "#2c3e50")
legend("topright", bty = "n", cex = 0.85,
legend = c("Sampling distribution",
paste0("N(\u03bc, \u03c3/\u221an) overlay"),
paste0("True \u03bc = ", d$mu)),
col = c("#3498db80", "#e74c3c", "#2c3e50"),
lwd = c(8, 2.5, 2), lty = c(1, 1, 2))
})
output$results2 <- renderUI({
d <- dat()
se <- d$sig / sqrt(d$n)
tags$div(class = "stats-box",
HTML(paste0(
"<b>LLN (left):</b><br>",
"Final avg: ", round(d$running[length(d$running)], 4), "<br>",
"True \u03bc: ", d$mu, "<br>",
"<hr style='margin:8px 0'>",
"<b>CLT (right):</b><br>",
"Mean of means: ", round(mean(d$means), 4), "<br>",
"SD of means: ", round(sd(d$means), 4), "<br>",
"Theoretical SE: ", round(se, 4)
))
)
})
}
shinyApp(ui, server)
Things to try
- Exponential, n = 5: the LLN panel shows the running average still wobbling, while the CLT histogram is visibly skewed. Neither theorem has fully “kicked in” yet — but they will at different rates.
- Exponential, n = 100: the running average has settled (LLN), and the histogram is now bell-shaped (CLT). Both theorems have done their job.
- Bimodal: the running average converges to 0 (the midpoint), even though no individual observation is near 0. LLN doesn’t care about the shape.
- Multiple paths (Sim 1): all paths converge to the same \(\mu\), but they take different routes. The early wobble is sampling variability; the eventual convergence is LLN.
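The right-hand panel's key numeric check — empirical SD of the sample means versus the theoretical standard error \(\sigma/\sqrt{n}\) — can also be reproduced without Shiny. A minimal sketch, with Exponential(1), \(n = 30\), and 2,000 replications chosen arbitrarily:

```r
set.seed(42)
n    <- 30      # sample size per experiment
reps <- 2000    # number of experiments

# Each replication draws n Exponential(1) values and records the sample mean
means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(means)    # close to mu = 1
sd(means)      # close to the theoretical SE below
1 / sqrt(n)    # sigma / sqrt(n) = 1 / sqrt(30), about 0.183
```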
The bottom line
- LLN = your estimate gets closer to the truth as \(n\) grows. It’s about accuracy of a single estimate.
- CLT = the shape of the sampling distribution is approximately normal. It’s about the distribution of the estimator across repeated experiments.
- You need LLN before CLT matters. If the mean hasn’t stabilized, knowing its distribution is normal doesn’t help much.
When does each theorem kick in?
LLN and CLT are both asymptotic results — they describe what happens as \(n \to \infty\). But in practice, they kick in at different speeds, and the speed depends on the population you’re sampling from.
LLN kicks in early. With even a few dozen observations, the sample mean is usually close to \(\mu\). The only requirement is a finite first moment (\(E[|X|] < \infty\)). Even heavily skewed distributions settle down fairly quickly — the running average doesn’t care about the shape of the population, only that the mean exists — though very heavy tails do slow the settling.
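A quick check of that claim: how close is the mean of just 50 skewed observations to \(\mu\)? This sketch draws many Exponential(1) samples of size 50 and reports how often the sample mean lands within 0.3 of the true mean (the 0.3 tolerance is an arbitrary illustration, roughly two standard errors):

```r
set.seed(7)
# Fraction of n = 50 exponential samples whose mean is within 0.3 of mu = 1
hits <- replicate(4000, abs(mean(rexp(50, rate = 1)) - 1) < 0.3)
mean(hits)   # typically above 0.95: SE = 1/sqrt(50) ~ 0.14, so 0.3 is ~2 SEs
```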
CLT kicks in later, and the speed depends on the population shape. The CLT is about the distribution of \(\bar{X}_n\) converging to a bell curve — a stronger claim that requires more data. How much more depends on how non-normal the population is:
| Population | CLT kicks in at roughly… | Why |
|---|---|---|
| Symmetric (Normal, Uniform) | \(n \approx 5\text{--}10\) | Already close to normal; little work for CLT to do |
| Moderately skewed (Exponential, \(\chi^2\)) | \(n \approx 30\text{--}50\) | Skewness needs to be averaged away |
| Heavily skewed (Bernoulli(0.05), Log-normal) | \(n \approx 100\text{--}500\) | Extreme skewness or heavy tails slow convergence |
| No finite variance (Cauchy) | Never | CLT requires \(\text{Var}(X) < \infty\); without it, the theorem doesn’t apply |
The folklore rule “\(n \geq 30\) is enough” comes from CLT, not LLN. It’s roughly when the normal approximation becomes decent for moderately skewed populations. But it’s a rule of thumb, not a theorem — symmetric populations need far less, and heavily skewed populations need far more.
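The “skewness gets averaged away” claim is quantifiable: for i.i.d. draws, the skewness of \(\bar{X}_n\) equals the population skewness divided by \(\sqrt{n}\). This sketch estimates the skewness of exponential sample means by simulation, using a hand-rolled moment-based skewness estimator (base R has none):

```r
set.seed(3)
# Moment estimator of skewness: third central moment / sd^3
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Exponential(1) has population skewness 2; for the mean of n draws
# it should shrink like 2 / sqrt(n)
for (n in c(5, 30, 100)) {
  means <- replicate(5000, mean(rexp(n, rate = 1)))
  cat(sprintf("n = %3d  skewness = %.3f  (theory 2/sqrt(n) = %.3f)\n",
              n, skewness(means), 2 / sqrt(n)))
}
```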
Why does skewness slow things down?
The CLT convergence rate is governed by the Berry-Esseen bound:
\[ \sup_z \left| P\!\left(\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \leq z\right) - \Phi(z) \right| \;\leq\; \frac{C \cdot \rho}{\sigma^3 \sqrt{n}} \]
where \(\rho = E[|X - \mu|^3]\) is the third absolute moment. The ratio \(\rho / \sigma^3\) measures how “non-normal” the population is — distributions with heavy tails or strong skewness have a large \(\rho / \sigma^3\), which means the bound shrinks more slowly with \(n\). That’s the mathematical reason skewed distributions need more data before CLT kicks in.
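The ratio \(\rho/\sigma^3\) can be estimated by plain Monte Carlo for any population you can sample from. A sketch comparing a few of the populations used above (the quoted value \(12/e - 2 \approx 2.41\) for Exponential(1) is from a hand calculation; treat the others as rough simulation estimates):

```r
set.seed(11)
N <- 1e6   # Monte Carlo draws per population

# rho / sigma^3 with rho = E|X - mu|^3: larger means slower CLT convergence
be_ratio <- function(x) mean(abs(x - mean(x))^3) / sd(x)^3

be_ratio(runif(N))            # Uniform: small ratio, CLT kicks in fast
be_ratio(rexp(N))             # Exponential: about 12/e - 2 ~ 2.41
be_ratio(rbinom(N, 1, 0.05))  # Bernoulli(0.05): large ratio, slow convergence
be_ratio(rlnorm(N))           # Log-normal: larger still
```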
Simulation 3: watch CLT kick in at different speeds
Pick a population and see how the sampling distribution of \(\bar{X}_n\) (standardised) compares to the \(N(0, 1)\) curve at four sample sizes. The KS distance measures how far the sampling distribution is from normal — smaller = closer.
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
selectInput("pop3", "Population:",
choices = c(
"Normal(0, 1)" = "normal",
"Uniform(0, 1)" = "uniform",
"Exponential(1)" = "exp",
"Chi-squared(3)" = "chisq",
"Bernoulli(0.1)" = "bern",
"Log-normal(0,1)" = "lognorm"
),
selected = "exp"
),
actionButton("run3", "Run 2,000 experiments",
style = "width:100%; margin-top:10px;
background:#3498db; color:white;
border:none; padding:10px; font-weight:bold;"),
uiOutput("stats3")
),
mainPanel(
width = 9,
plotOutput("grid_plot", height = "520px")
)
)
)
server <- function(input, output, session) {
sim_data <- reactiveVal(NULL)
observeEvent(input$run3, {
pop <- input$pop3
B <- 2000
ns <- c(5, 30, 100, 500)
draw <- function(n) {
switch(pop,
normal = rnorm(n),
uniform = runif(n),
exp = rexp(n, 1),
chisq = rchisq(n, df = 3),
bern = rbinom(n, 1, 0.1),
lognorm = rlnorm(n, 0, 1))
}
mu <- switch(pop,
normal = 0, uniform = 0.5, exp = 1,
chisq = 3, bern = 0.1, lognorm = exp(0.5))
sig <- switch(pop,
normal = 1, uniform = sqrt(1/12), exp = 1,
chisq = sqrt(6), bern = sqrt(0.1 * 0.9),
lognorm = sqrt((exp(1) - 1) * exp(1)))
results <- lapply(ns, function(n) {
means <- replicate(B, mean(draw(n)))
list(means = means, n = n)
})
sim_data(list(results = results, mu = mu, sig = sig,
pop = pop, ns = ns))
})
output$grid_plot <- renderPlot({
d <- sim_data()
if (is.null(d)) {
plot.new()
text(0.5, 0.5, "Press the button to run",
cex = 1.4, col = "#7f8c8d")
return()
}
par(mfrow = c(2, 2), mar = c(4, 4, 3.5, 1), oma = c(0, 0, 2, 0))
pop_label <- switch(d$pop,
normal = "Normal(0, 1)", uniform = "Uniform(0, 1)",
exp = "Exponential(1)", chisq = "\u03c7\u00b2(3)",
bern = "Bernoulli(0.1)", lognorm = "Log-normal(0, 1)")
for (i in seq_along(d$results)) {
r <- d$results[[i]]
se <- d$sig / sqrt(r$n)
z <- (r$means - d$mu) / se
hist(z, breaks = 40, freq = FALSE, col = "#dfe6e9",
border = "#b2bec3", xlim = c(-4, 4),
main = paste0("n = ", r$n),
xlab = "Standardised mean", ylab = "Density",
cex.main = 1.3)
xseq <- seq(-4, 4, length.out = 300)
lines(xseq, dnorm(xseq), lwd = 2.5, col = "#e74c3c", lty = 2)
ks <- suppressWarnings(ks.test(z, "pnorm")$statistic)
legend("topright", bty = "n", cex = 0.85,
legend = paste0("KS = ", sprintf("%.3f", ks)),
text.col = if (ks < 0.03) "#27ae60" else "#e74c3c")
}
mtext(paste0("CLT convergence: ", pop_label),
side = 3, outer = TRUE, cex = 1.2, font = 2)
})
output$stats3 <- renderUI({
d <- sim_data()
if (is.null(d)) return(NULL)
ks_vals <- sapply(d$results, function(r) {
se <- d$sig / sqrt(r$n)
z <- (r$means - d$mu) / se
suppressWarnings(ks.test(z, "pnorm")$statistic)
})
tags$div(class = "stats-box",
HTML(paste0(
"<b>KS distance from N(0,1):</b><br>",
paste(paste0("n = ", d$ns, ": <b>",
sprintf("%.3f", ks_vals), "</b>"),
collapse = "<br>"),
"<br><hr style='margin:8px 0'>",
"<small>KS → 0 means the<br>",
"sampling distribution is<br>",
"converging to normal.</small>"
))
)
})
}
shinyApp(ui, server)
Did you know?
- Jacob Bernoulli proved the first version of the Law of Large Numbers in his book Ars Conjectandi, published posthumously in 1713. He called it his “golden theorem” and spent over 20 years working on the proof. The result seems obvious in hindsight — of course averages converge — but rigorously proving why required entirely new mathematical machinery.
- There are actually two versions: the Weak LLN (convergence in probability, proved by Bernoulli and later Chebyshev) and the Strong LLN (almost sure convergence, proved by Kolmogorov in 1933). The strong version says that convergence happens with probability 1, not just “usually.”
- The heavy-tailed \(t\)-distribution with \(\text{df} = 1\) (the Cauchy distribution) has no finite mean, so LLN doesn’t apply: the running average never settles down, no matter how much data you collect. The \(\text{df} = 2\) option in Simulation 1 sits one step away — its mean exists (so LLN holds) but its variance is infinite, which is why its running average wanders far more than the others before converging.
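You can see the Cauchy failure directly in a few lines of plain R: the running average of Cauchy draws is still lurching after a million observations, while an Exponential(1) running average pins itself to 1.

```r
set.seed(99)
n <- 1e6

# Cauchy: no finite mean, so the running average never stabilises
cauchy_avg <- cumsum(rcauchy(n)) / seq_len(n)
cauchy_avg[c(1e3, 1e4, 1e5, 1e6)]   # still wandering at every scale

# Exponential(1): finite mean, so LLN pins the running average to mu = 1
exp_avg <- cumsum(rexp(n)) / seq_len(n)
exp_avg[c(1e3, 1e4, 1e5, 1e6)]      # marching steadily toward 1
```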