LLN vs CLT: Why Averages Stabilize Before They Become Normal
Two theorems, two different promises
The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are the two pillars of statistics, but they say different things:
| | LLN | CLT |
|---|---|---|
| What it says | The sample mean converges to \(\mu\) | The distribution of sample means is approximately normal |
| About | One running average | Many sample averages |
| Promise | Your estimate gets close to the truth | You know the shape of the uncertainty |
| Requires | Finite mean | Finite variance |
LLN says the sample mean settles down at \(\mu\). CLT describes the shape of its fluctuations around \(\mu\) along the way: approximately normal.
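In symbols, for i.i.d. observations \(X_1, X_2, \ldots\) with mean \(\mu\) and (for the CLT) finite variance \(\sigma^2\):

\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\;p\;} \mu \quad \text{(LLN)}, \qquad \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{\;d\;} N(0, 1) \quad \text{(CLT)}.
\]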
Think of it this way: LLN tells you that if you flip a fair coin enough times, the fraction of heads approaches 0.5. CLT tells you that if you repeat the entire experiment many times, the distribution of those fractions is bell-shaped.
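Both promises fit in a few lines of base R. A minimal sketch of the coin example (the values of `n_flips`, `n_reps`, and the 100-flip experiment size are illustrative choices, not values used in the apps below):

```r
set.seed(42)

# LLN: one long run of flips; the running fraction of heads drifts toward 0.5
n_flips <- 10000
flips <- rbinom(n_flips, size = 1, prob = 0.5)
running_frac <- cumsum(flips) / seq_along(flips)
running_frac[c(10, 100, 1000, 10000)]

# CLT: repeat a 100-flip experiment many times; the fractions are bell-shaped
n_reps <- 5000
fracs <- replicate(n_reps, mean(rbinom(100, size = 1, prob = 0.5)))
hist(fracs, breaks = 30, main = "Fraction of heads across 5000 experiments")
```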
Simulation 1: The running average
Watch one running average converge to the true mean as you add observations one at a time. This is LLN in action: the cumulative mean stabilizes. Try different distributions and notice that convergence happens regardless of the population shape, though the heavy-tailed option converges noticeably more slowly and erratically than the rest.
```{shinylive-r}
#| standalone: true
#| viewerHeight: 620
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
selectInput("pop", "Population:",
choices = c("Uniform(0, 1)",
"Exponential(1)",
"Bimodal",
"Bernoulli(0.3)",
"Heavy-tailed (t, df=2)")),
sliderInput("n", "Number of observations:",
min = 50, max = 5000, value = 500, step = 50),
sliderInput("paths", "Number of paths:",
min = 1, max = 10, value = 3, step = 1),
actionButton("go", "Draw new paths",
class = "btn-primary", width = "100%"),
uiOutput("results")
),
mainPanel(
width = 9,
plotOutput("running_avg", height = "500px")
)
)
)
server <- function(input, output, session) {
  # Draw n observations from the selected population
  draw_pop <- function(n, pop) {
switch(pop,
"Uniform(0, 1)" = runif(n),
"Exponential(1)" = rexp(n, rate = 1),
"Bimodal" = {
k <- rbinom(n, 1, 0.5)
k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
},
"Bernoulli(0.3)" = rbinom(n, 1, 0.3),
"Heavy-tailed (t, df=2)" = rt(n, df = 2)
)
}
  # True mean of each population
  pop_mu <- function(pop) {
switch(pop,
"Uniform(0, 1)" = 0.5,
"Exponential(1)" = 1,
"Bimodal" = 0,
"Bernoulli(0.3)" = 0.3,
"Heavy-tailed (t, df=2)" = 0
)
}
  dat <- reactive({
    input$go  # take a dependency on the button so each click redraws
n <- input$n
paths <- input$paths
pop <- input$pop
mu <- pop_mu(pop)
all_paths <- lapply(seq_len(paths), function(i) {
x <- draw_pop(n, pop)
cumsum(x) / seq_along(x)
})
list(all_paths = all_paths, n = n, mu = mu, pop = pop, paths = paths)
})
output$running_avg <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
ylim <- range(unlist(d$all_paths))
ylim <- ylim + c(-1, 1) * 0.1 * diff(ylim)
cols <- c("#e74c3c", "#3498db", "#27ae60", "#9b59b6", "#e67e22",
"#1abc9c", "#34495e", "#f39c12", "#2ecc71", "#c0392b")
plot(NULL, xlim = c(1, d$n), ylim = ylim,
xlab = "Number of observations",
ylab = "Cumulative mean",
main = paste0("LLN: Running average converges to \u03bc = ", d$mu))
abline(h = d$mu, lty = 2, lwd = 2.5, col = "#2c3e50")
for (i in seq_along(d$all_paths)) {
lines(seq_along(d$all_paths[[i]]), d$all_paths[[i]],
col = adjustcolor(cols[i], 0.7), lwd = 1.5)
}
legend("topright", bty = "n", cex = 0.9,
legend = c(paste0("True mean (\u03bc = ", d$mu, ")"),
paste0(d$paths, " sample path(s)")),
col = c("#2c3e50", cols[1]),
lwd = c(2.5, 1.5), lty = c(2, 1))
})
output$results <- renderUI({
d <- dat()
final_means <- sapply(d$all_paths, function(p) p[length(p)])
tags$div(class = "stats-box",
HTML(paste0(
"<b>True mean:</b> ", d$mu, "<br>",
"<b>Final running avg(s):</b><br>",
paste(round(final_means, 4), collapse = ", "), "<br>",
"<b>Max deviation:</b> ",
round(max(abs(final_means - d$mu)), 4)
))
)
})
}
shinyApp(ui, server)
```
Simulation 2: LLN vs CLT side by side
Now see both theorems at once. The left panel shows LLN: one running average converging to \(\mu\). The right panel shows CLT: the histogram of many sample means forming a normal distribution. Same population, two different questions.
```{shinylive-r}
#| standalone: true
#| viewerHeight: 580
library(shiny)
ui <- fluidPage(
tags$head(tags$style(HTML("
.stats-box {
background: #f0f4f8; border-radius: 6px; padding: 14px;
margin-top: 12px; font-size: 14px; line-height: 1.9;
}
.stats-box b { color: #2c3e50; }
"))),
sidebarLayout(
sidebarPanel(
width = 3,
selectInput("pop2", "Population:",
choices = c("Uniform(0, 1)",
"Exponential(1)",
"Bimodal",
"Bernoulli(0.3)")),
sliderInput("n2", "Sample size per experiment:",
min = 5, max = 200, value = 30, step = 5),
sliderInput("reps2", "Number of experiments:",
min = 200, max = 3000, value = 1000, step = 100),
actionButton("go2", "Run experiments",
class = "btn-primary", width = "100%"),
uiOutput("results2")
),
mainPanel(
width = 9,
fluidRow(
column(6, plotOutput("lln_plot", height = "430px")),
column(6, plotOutput("clt_plot", height = "430px"))
)
)
)
)
server <- function(input, output, session) {
draw_pop <- function(n, pop) {
switch(pop,
"Uniform(0, 1)" = runif(n),
"Exponential(1)" = rexp(n, rate = 1),
"Bimodal" = {
k <- rbinom(n, 1, 0.5)
k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
},
"Bernoulli(0.3)" = rbinom(n, 1, 0.3)
)
}
pop_mu <- function(pop) {
switch(pop,
"Uniform(0, 1)" = 0.5,
"Exponential(1)" = 1,
"Bimodal" = 0,
"Bernoulli(0.3)" = 0.3
)
}
  # True SD of each population; the bimodal mixture has mean 0, so its
  # variance is E[X^2] = 0.6^2 + 2^2 = 4.36
  pop_sigma <- function(pop) {
    switch(pop,
      "Uniform(0, 1)" = sqrt(1 / 12),
      "Exponential(1)" = 1,
      "Bimodal" = sqrt(0.6^2 + 2^2),
      "Bernoulli(0.3)" = sqrt(0.3 * 0.7)
    )
  }
dat <- reactive({
input$go2
n <- input$n2
reps <- input$reps2
pop <- input$pop2
mu <- pop_mu(pop)
sig <- pop_sigma(pop)
# One long draw for LLN
long_draw <- draw_pop(n * 5, pop)
running <- cumsum(long_draw) / seq_along(long_draw)
# Many experiments for CLT
means <- replicate(reps, mean(draw_pop(n, pop)))
list(running = running, means = means, mu = mu, sig = sig,
n = n, reps = reps, pop = pop)
})
output$lln_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
plot(seq_along(d$running), d$running, type = "l",
col = "#3498db", lwd = 1.5,
xlab = "Number of observations",
ylab = "Cumulative mean",
main = "LLN: One running average")
abline(h = d$mu, lty = 2, lwd = 2.5, col = "#e74c3c")
legend("topright", bty = "n", cex = 0.85,
legend = c("Running average",
paste0("True \u03bc = ", d$mu)),
col = c("#3498db", "#e74c3c"),
lwd = c(1.5, 2.5), lty = c(1, 2))
})
output$clt_plot <- renderPlot({
d <- dat()
par(mar = c(4.5, 4.5, 3, 1))
hist(d$means, breaks = 40, probability = TRUE,
col = "#3498db30", border = "#3498db80",
main = paste0("CLT: ", d$reps, " sample means (n=", d$n, ")"),
xlab = "Sample mean", ylab = "Density")
x_seq <- seq(min(d$means), max(d$means), length.out = 300)
se <- d$sig / sqrt(d$n)
lines(x_seq, dnorm(x_seq, mean = d$mu, sd = se),
col = "#e74c3c", lwd = 2.5)
abline(v = d$mu, lty = 2, lwd = 2, col = "#2c3e50")
legend("topright", bty = "n", cex = 0.85,
legend = c("Sampling distribution",
paste0("N(\u03bc, \u03c3/\u221an) overlay"),
paste0("True \u03bc = ", d$mu)),
col = c("#3498db80", "#e74c3c", "#2c3e50"),
lwd = c(8, 2.5, 2), lty = c(1, 1, 2))
})
output$results2 <- renderUI({
d <- dat()
se <- d$sig / sqrt(d$n)
tags$div(class = "stats-box",
HTML(paste0(
"<b>LLN (left):</b><br>",
"Final avg: ", round(d$running[length(d$running)], 4), "<br>",
"True \u03bc: ", d$mu, "<br>",
"<hr style='margin:8px 0'>",
"<b>CLT (right):</b><br>",
"Mean of means: ", round(mean(d$means), 4), "<br>",
"SD of means: ", round(sd(d$means), 4), "<br>",
"Theoretical SE: ", round(se, 4)
))
)
})
}
shinyApp(ui, server)
```
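Stripped of the Shiny scaffolding, the right-hand panel reduces to a single `replicate()` call. A minimal sketch checking that the spread of the sample means matches \(\sigma/\sqrt{n}\), using Exponential(1) (where \(\sigma = 1\)); the values of `n` and `reps` are illustrative:

```r
set.seed(7)
n <- 30
reps <- 5000
means <- replicate(reps, mean(rexp(n, rate = 1)))

sd(means)    # empirical SD of the sample means
1 / sqrt(n)  # theoretical SE: sigma / sqrt(n) = 1 / sqrt(30), about 0.1826
```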
Things to try
- Exponential, n = 5: the LLN panel shows the running average still wobbling, while the CLT histogram is visibly skewed. Neither theorem has fully “kicked in” yet — but they will at different rates.
- Exponential, n = 100: the running average has settled (LLN), and the histogram is now bell-shaped (CLT). Both theorems have done their job.
- Bimodal: the running average converges to 0 (the midpoint), even though no individual observation is near 0. LLN doesn’t care about the shape; see the sketch after this list.
- Multiple paths (Sim 1): all paths converge to the same \(\mu\), but they take different routes. The early wobble is sampling variability; the eventual convergence is LLN.
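A quick standalone check of the bimodal point, reusing the mixture from the apps (the sample size is an illustrative choice):

```r
set.seed(1)
n <- 5000
k <- rbinom(n, 1, 0.5)
x <- k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)

mean(abs(x) < 0.5)                 # roughly 0.6% of observations fall near 0...
tail(cumsum(x) / seq_along(x), 1)  # ...yet the running average sits close to 0
```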
The bottom line
- LLN = your estimate gets closer to the truth as \(n\) grows. It’s about the accuracy of a single estimate.
- CLT = the shape of the sampling distribution is approximately normal. It’s about the distribution of the estimator across repeated experiments.
- You need LLN before CLT matters. If the mean hasn’t stabilized, knowing its distribution is normal doesn’t help much.
Did you know?
- Jacob Bernoulli proved the first version of the Law of Large Numbers in his book Ars Conjectandi, published posthumously in 1713. He called it his “golden theorem” and spent over 20 years working on the proof. The result seems obvious in hindsight — of course averages converge — but rigorously proving why required entirely new mathematical machinery.
- There are actually two versions: the Weak LLN (convergence in probability, proved by Bernoulli and later Chebyshev) and the Strong LLN (almost sure convergence, proved by Kolmogorov in 1933). The strong version says that convergence happens with probability 1, not just “usually.”
- The heavy-tailed \(t\)-distribution with \(\text{df} = 1\) (the Cauchy distribution) has no finite mean, so LLN doesn’t apply: the running average never settles down, no matter how much data you collect. The heavy-tailed option in Simulation 1 uses \(\text{df} = 2\), which still has a finite mean (so LLN holds) but infinite variance; watch how much more erratically its paths converge than the others. A standalone Cauchy sketch follows below.
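To see the Cauchy case without modifying the app, a minimal base-R sketch (the sample size is an arbitrary illustrative choice):

```r
set.seed(3)
x <- rcauchy(100000)                # t-distribution with df = 1
running <- cumsum(x) / seq_along(x)

# Unlike every population in the apps above, this never converges:
# occasional enormous observations keep yanking the average around.
plot(running, type = "l",
     xlab = "Number of observations", ylab = "Cumulative mean",
     main = "Cauchy: the running average never settles")
```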