LLN vs CLT: Why Averages Stabilize Before They Become Normal

Two theorems, two different promises

The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) are the two pillars of statistics, but they say different things:

|  | LLN | CLT |
| --- | --- | --- |
| What it says | Sample mean converges to the true mean | Distribution of the (scaled) sample mean is approximately normal |
| Object | \(\bar{X}_1, \bar{X}_2, \bar{X}_3, \dots\) — one sequence, growing \(n\) | \(\bar{X}^{(1)}, \bar{X}^{(2)}, \bar{X}^{(3)}, \dots\) — many samples, fixed \(n\) |
| Key result | \(\bar{X}_n \xrightarrow{p} \mu\) | \(\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)\) |
| Interpretation | Consistency | Asymptotic normality |
| Promise | You get the right answer | You know how uncertain your estimate is |
| Requires | Finite mean | Finite variance |

LLN says the sample mean settles down. CLT describes the shape of the scatter around the point where it settles.

Think of it this way: LLN tells you that if you flip a fair coin enough times, the fraction of heads approaches 0.5. CLT tells you that if you repeat the entire experiment many times, the distribution of those fractions is bell-shaped.
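The coin-flip story can be checked directly in base R (a minimal sketch; the seed, the 100 flips per experiment, and the 5,000 repetitions are arbitrary choices):

```r
set.seed(42)

# LLN: one long sequence of fair-coin flips.
# The running fraction of heads approaches 0.5 as n grows.
flips        <- rbinom(1e5, 1, 0.5)
running_frac <- cumsum(flips) / seq_along(flips)
tail(running_frac, 1)   # close to 0.5

# CLT: repeat a fixed-size experiment (n = 100 flips) many times.
# The per-experiment fractions are approximately N(0.5, 0.05^2),
# since sd = sqrt(0.5 * 0.5 / 100) = 0.05.
fracs <- replicate(5000, mean(rbinom(100, 1, 0.5)))
mean(fracs)   # close to 0.5
sd(fracs)     # close to 0.05
```

Same coin, two questions: the first block asks where one average ends up; the second asks how the averages are spread across repeated experiments.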

The Oracle View. In these simulations, we know the true mean \(\mu\) — the horizontal line the running average converges to. In practice, you don’t know \(\mu\). You watch your estimate stabilize but never know exactly what it’s converging toward. That’s the gap between theory and practice: LLN guarantees convergence, but the target is invisible.

Simulation 1: The running average

Watch one running average converge to the true mean as you add observations one at a time. This is LLN in action — the cumulative mean stabilizes. Try different distributions and notice that convergence happens regardless of the population shape.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("pop", "Population:",
                  choices = c("Uniform(0, 1)",
                              "Exponential(1)",
                              "Bimodal",
                              "Bernoulli(0.3)",
                              "Heavy-tailed (t, df=2)")),

      sliderInput("n", "Number of observations:",
                  min = 50, max = 5000, value = 500, step = 50),

      sliderInput("paths", "Number of paths:",
                  min = 1, max = 10, value = 3, step = 1),

      actionButton("go", "Draw new paths",
                   class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      plotOutput("running_avg", height = "500px")
    )
  )
)

server <- function(input, output, session) {

  draw_pop <- function(n, pop) {
    switch(pop,
      "Uniform(0, 1)"          = runif(n),
      "Exponential(1)"         = rexp(n, rate = 1),
      "Bimodal"                = {
        k <- rbinom(n, 1, 0.5)
        k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
      },
      "Bernoulli(0.3)"         = rbinom(n, 1, 0.3),
      "Heavy-tailed (t, df=2)" = rt(n, df = 2)
    )
  }

  pop_mu <- function(pop) {
    switch(pop,
      "Uniform(0, 1)"          = 0.5,
      "Exponential(1)"         = 1,
      "Bimodal"                = 0,
      "Bernoulli(0.3)"         = 0.3,
      "Heavy-tailed (t, df=2)" = 0
    )
  }

  dat <- reactive({
    input$go
    n     <- input$n
    paths <- input$paths
    pop   <- input$pop
    mu    <- pop_mu(pop)

    all_paths <- lapply(seq_len(paths), function(i) {
      x <- draw_pop(n, pop)
      cumsum(x) / seq_along(x)
    })

    list(all_paths = all_paths, n = n, mu = mu, pop = pop, paths = paths)
  })

  output$running_avg <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    ylim <- range(unlist(d$all_paths))
    ylim <- ylim + c(-1, 1) * 0.1 * diff(ylim)

    cols <- c("#e74c3c", "#3498db", "#27ae60", "#9b59b6", "#e67e22",
              "#1abc9c", "#34495e", "#f39c12", "#2ecc71", "#c0392b")

    plot(NULL, xlim = c(1, d$n), ylim = ylim,
         xlab = "Number of observations",
         ylab = "Cumulative mean",
         main = paste0("LLN: Running average converges to \u03bc = ", d$mu))

    abline(h = d$mu, lty = 2, lwd = 2.5, col = "#2c3e50")

    for (i in seq_along(d$all_paths)) {
      lines(seq_along(d$all_paths[[i]]), d$all_paths[[i]],
            col = adjustcolor(cols[i], 0.7), lwd = 1.5)
    }

    legend("topright", bty = "n", cex = 0.9,
           legend = c(paste0("True mean (\u03bc = ", d$mu, ")"),
                      paste0(d$paths, " sample path(s)")),
           col = c("#2c3e50", cols[1]),
           lwd = c(2.5, 1.5), lty = c(2, 1))
  })

  output$results <- renderUI({
    d <- dat()
    final_means <- sapply(d$all_paths, function(p) p[length(p)])

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True mean:</b> ", d$mu, "<br>",
        "<b>Final running avg(s):</b><br>",
        paste(round(final_means, 4), collapse = ", "), "<br>",
        "<b>Max deviation:</b> ",
        round(max(abs(final_means - d$mu)), 4)
      ))
    )
  })
}

shinyApp(ui, server)

Simulation 2: LLN vs CLT side by side

Now see both theorems at once. The left panel shows LLN — one running average converging to \(\mu\). The right panel shows CLT — the histogram of many sample means forming a normal distribution. Same data, different questions.

#| standalone: true
#| viewerHeight: 580

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("pop2", "Population:",
                  choices = c("Uniform(0, 1)",
                              "Exponential(1)",
                              "Bimodal",
                              "Bernoulli(0.3)")),

      sliderInput("n2", "Sample size per experiment:",
                  min = 5, max = 200, value = 30, step = 5),

      sliderInput("reps2", "Number of experiments:",
                  min = 200, max = 3000, value = 1000, step = 100),

      actionButton("go2", "Run experiments",
                   class = "btn-primary", width = "100%"),

      uiOutput("results2")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(6, plotOutput("lln_plot", height = "430px")),
        column(6, plotOutput("clt_plot", height = "430px"))
      )
    )
  )
)

server <- function(input, output, session) {

  draw_pop <- function(n, pop) {
    switch(pop,
      "Uniform(0, 1)"  = runif(n),
      "Exponential(1)" = rexp(n, rate = 1),
      "Bimodal"        = {
        k <- rbinom(n, 1, 0.5)
        k * rnorm(n, -2, 0.6) + (1 - k) * rnorm(n, 2, 0.6)
      },
      "Bernoulli(0.3)" = rbinom(n, 1, 0.3)
    )
  }

  pop_mu <- function(pop) {
    switch(pop,
      "Uniform(0, 1)"  = 0.5,
      "Exponential(1)" = 1,
      "Bimodal"        = 0,
      "Bernoulli(0.3)" = 0.3
    )
  }

  pop_sigma <- function(pop) {
    switch(pop,
      "Uniform(0, 1)"  = sqrt(1 / 12),
      "Exponential(1)" = 1,
      "Bimodal"        = sqrt(0.6^2 + 4),
      "Bernoulli(0.3)" = sqrt(0.3 * 0.7)
    )
  }

  dat <- reactive({
    input$go2
    n    <- input$n2
    reps <- input$reps2
    pop  <- input$pop2
    mu   <- pop_mu(pop)
    sig  <- pop_sigma(pop)

    # One long draw for LLN
    long_draw <- draw_pop(n * 5, pop)
    running   <- cumsum(long_draw) / seq_along(long_draw)

    # Many experiments for CLT
    means <- replicate(reps, mean(draw_pop(n, pop)))

    list(running = running, means = means, mu = mu, sig = sig,
         n = n, reps = reps, pop = pop)
  })

  output$lln_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    plot(seq_along(d$running), d$running, type = "l",
         col = "#3498db", lwd = 1.5,
         xlab = "Number of observations",
         ylab = "Cumulative mean",
         main = "LLN: One running average")
    abline(h = d$mu, lty = 2, lwd = 2.5, col = "#e74c3c")

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Running average",
                      paste0("True \u03bc = ", d$mu)),
           col = c("#3498db", "#e74c3c"),
           lwd = c(1.5, 2.5), lty = c(1, 2))
  })

  output$clt_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    hist(d$means, breaks = 40, probability = TRUE,
         col = "#3498db30", border = "#3498db80",
         main = paste0("CLT: ", d$reps, " sample means (n=", d$n, ")"),
         xlab = "Sample mean", ylab = "Density")

    x_seq <- seq(min(d$means), max(d$means), length.out = 300)
    se <- d$sig / sqrt(d$n)
    lines(x_seq, dnorm(x_seq, mean = d$mu, sd = se),
          col = "#e74c3c", lwd = 2.5)

    abline(v = d$mu, lty = 2, lwd = 2, col = "#2c3e50")

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Sampling distribution",
                      paste0("N(\u03bc, \u03c3/\u221an) overlay"),
                      paste0("True \u03bc = ", d$mu)),
           col = c("#3498db80", "#e74c3c", "#2c3e50"),
           lwd = c(8, 2.5, 2), lty = c(1, 1, 2))
  })

  output$results2 <- renderUI({
    d <- dat()
    se <- d$sig / sqrt(d$n)

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>LLN (left):</b><br>",
        "Final avg: ", round(d$running[length(d$running)], 4), "<br>",
        "True \u03bc: ", d$mu, "<br>",
        "<hr style='margin:8px 0'>",
        "<b>CLT (right):</b><br>",
        "Mean of means: ", round(mean(d$means), 4), "<br>",
        "SD of means: ", round(sd(d$means), 4), "<br>",
        "Theoretical SE: ", round(se, 4)
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • Exponential, n = 5: the LLN panel shows the running average still wobbling, while the CLT histogram is visibly skewed. Neither theorem has fully “kicked in” yet — but they will at different rates.
  • Exponential, n = 100: the running average has settled (LLN), and the histogram is now bell-shaped (CLT). Both theorems have done their job.
  • Bimodal: the running average converges to 0 (the midpoint), even though no individual observation is near 0. LLN doesn’t care about the shape.
  • Multiple paths (Sim 1): all paths converge to the same \(\mu\), but they take different routes. The early wobble is sampling variability; the eventual convergence is LLN.

The bottom line

  • LLN = your estimate gets closer to the truth as \(n\) grows. It’s about accuracy of a single estimate.
  • CLT = the shape of the sampling distribution is approximately normal. It’s about the distribution of the estimator across repeated experiments.
  • You need LLN before CLT matters. If the mean hasn’t stabilized, knowing its distribution is normal doesn’t help much.

When does each theorem kick in?

LLN and CLT are both asymptotic results — they describe what happens as \(n \to \infty\). But in practice, they kick in at different speeds, and the speed depends on the population you’re sampling from.

LLN kicks in early. As soon as you have a few dozen observations, the sample mean is usually close to \(\mu\). The only requirement is a finite first moment (\(E[|X|] < \infty\)). Even heavily skewed distributions settle down quickly — the running average doesn’t care about the shape of the population, only that the mean exists.
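A rough check of the "few dozen observations" claim (a sketch; the cutoff 0.3 and the choice of n = 50 are arbitrary):

```r
set.seed(1)

# Exponential(1) is skewed, with mu = 1 and sigma = 1.
# With n = 50 observations, the sample mean is usually within
# sigma / sqrt(n) ~ 0.14 of mu, skewness notwithstanding.
err <- replicate(2000, abs(mean(rexp(50)) - 1))
mean(err < 0.3)   # proportion of experiments landing within 0.3 of mu
median(err)       # typical error, on the order of 0.1
```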

CLT kicks in later, and the speed depends on the population shape. The CLT is about the distribution of \(\bar{X}_n\) converging to a bell curve — a stronger claim that requires more data. How much more depends on how non-normal the population is:

| Population | CLT kicks in at roughly… | Why |
| --- | --- | --- |
| Symmetric (Normal, Uniform) | \(n \approx 5\text{--}10\) | Already close to normal; little work for CLT to do |
| Moderately skewed (Exponential, \(\chi^2\)) | \(n \approx 30\text{--}50\) | Skewness needs to be averaged away |
| Heavily skewed (Bernoulli(0.05), Log-normal) | \(n \approx 100\text{--}500\) | Extreme skewness or heavy tails slow convergence |
| No finite variance (Cauchy) | Never | CLT requires \(\text{Var}(X) < \infty\); without it, the theorem doesn’t apply |

The folklore rule “\(n \geq 30\) is enough” comes from CLT, not LLN. It’s roughly when the normal approximation becomes decent for moderately skewed populations. But it’s a rule of thumb, not a theorem — symmetric populations need far less, and heavily skewed populations need far more.
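One way to see why the rule of thumb scales with skewness: for i.i.d. data, the skewness of \(\bar{X}_n\) is the population skewness divided by \(\sqrt{n}\). A quick Monte Carlo check (a sketch; the `skew` helper and the repetition counts are ad hoc):

```r
set.seed(7)

# Sample skewness (no packages needed)
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Exponential(1) has skewness 2, so means of n draws should have
# skewness near 2 / sqrt(n).
means30  <- replicate(20000, mean(rexp(30)))
means500 <- replicate(20000, mean(rexp(500)))
skew(means30)    # near 2 / sqrt(30)  ~ 0.37 -- still visibly skewed
skew(means500)   # near 2 / sqrt(500) ~ 0.09 -- nearly normal
```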

Why does skewness slow things down?

The CLT convergence rate is governed by the Berry-Esseen bound:

\[ \sup_z \left| P\!\left(\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \leq z\right) - \Phi(z) \right| \;\leq\; \frac{C \cdot \rho}{\sigma^3 \sqrt{n}} \]

where \(\rho = E[|X - \mu|^3]\) is the third absolute moment. The ratio \(\rho / \sigma^3\) measures how “non-normal” the population is: distributions with heavy tails or strong skewness have a large \(\rho / \sigma^3\), so the bound is larger at every \(n\) and more data is needed before the normal approximation becomes accurate. That’s the mathematical reason skewed distributions need more data before the CLT kicks in.
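The ratio \(\rho / \sigma^3\) can be estimated by simulation (a sketch; the draw size of 1e6 is arbitrary, and the log-normal estimate is noisy because of its heavy tail):

```r
set.seed(3)

# Monte Carlo estimate of rho / sigma^3 = E|X - mu|^3 / sigma^3,
# the quantity that scales the Berry-Esseen bound.
be_ratio <- function(x) mean(abs(x - mean(x))^3) / sd(x)^3

be_ratio(runif(1e6))          # Uniform: small ratio, fast CLT
be_ratio(rexp(1e6))           # Exponential: larger
be_ratio(rlnorm(1e6, 0, 1))   # Log-normal: much larger, slow CLT
```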

Simulation 3: watch CLT kick in at different speeds

Pick a population and see how the sampling distribution of \(\bar{X}_n\) (standardised) compares to the \(N(0, 1)\) curve at four sample sizes. The KS distance measures how far the sampling distribution is from normal — smaller = closer.

#| standalone: true
#| viewerHeight: 620

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("pop3", "Population:",
        choices = c(
          "Normal(0, 1)"   = "normal",
          "Uniform(0, 1)"  = "uniform",
          "Exponential(1)"  = "exp",
          "Chi-squared(3)"  = "chisq",
          "Bernoulli(0.1)"  = "bern",
          "Log-normal(0,1)" = "lognorm"
        ),
        selected = "exp"
      ),

      actionButton("run3", "Run 2,000 experiments",
                   style = "width:100%; margin-top:10px;
                            background:#3498db; color:white;
                            border:none; padding:10px; font-weight:bold;"),

      uiOutput("stats3")
    ),

    mainPanel(
      width = 9,
      plotOutput("grid_plot", height = "520px")
    )
  )
)

server <- function(input, output, session) {

  sim_data <- reactiveVal(NULL)

  observeEvent(input$run3, {
    pop <- input$pop3
    B   <- 2000
    ns  <- c(5, 30, 100, 500)

    draw <- function(n) {
      switch(pop,
        normal  = rnorm(n),
        uniform = runif(n),
        exp     = rexp(n, 1),
        chisq   = rchisq(n, df = 3),
        bern    = rbinom(n, 1, 0.1),
        lognorm = rlnorm(n, 0, 1))
    }

    mu <- switch(pop,
      normal = 0, uniform = 0.5, exp = 1,
      chisq = 3, bern = 0.1, lognorm = exp(0.5))

    sig <- switch(pop,
      normal = 1, uniform = sqrt(1/12), exp = 1,
      chisq = sqrt(6), bern = sqrt(0.1 * 0.9),
      lognorm = sqrt((exp(1) - 1) * exp(1)))

    results <- lapply(ns, function(n) {
      means <- replicate(B, mean(draw(n)))
      list(means = means, n = n)
    })

    sim_data(list(results = results, mu = mu, sig = sig,
                  pop = pop, ns = ns))
  })

  output$grid_plot <- renderPlot({
    d <- sim_data()
    if (is.null(d)) {
      plot.new()
      text(0.5, 0.5, "Press the button to run",
           cex = 1.4, col = "#7f8c8d")
      return()
    }

    par(mfrow = c(2, 2), mar = c(4, 4, 3.5, 1), oma = c(0, 0, 2, 0))

    pop_label <- switch(d$pop,
      normal = "Normal(0, 1)", uniform = "Uniform(0, 1)",
      exp = "Exponential(1)", chisq = "\u03c7\u00b2(3)",
      bern = "Bernoulli(0.1)", lognorm = "Log-normal(0, 1)")

    for (i in seq_along(d$results)) {
      r  <- d$results[[i]]
      se <- d$sig / sqrt(r$n)
      z  <- (r$means - d$mu) / se

      hist(z, breaks = 40, freq = FALSE, col = "#dfe6e9",
           border = "#b2bec3", xlim = c(-4, 4),
           main = paste0("n = ", r$n),
           xlab = "Standardised mean", ylab = "Density",
           cex.main = 1.3)

      xseq <- seq(-4, 4, length.out = 300)
      lines(xseq, dnorm(xseq), lwd = 2.5, col = "#e74c3c", lty = 2)

      ks <- suppressWarnings(ks.test(z, "pnorm")$statistic)
      legend("topright", bty = "n", cex = 0.85,
             legend = paste0("KS = ", sprintf("%.3f", ks)),
             text.col = if (ks < 0.03) "#27ae60" else "#e74c3c")
    }

    mtext(paste0("CLT convergence: ", pop_label),
          side = 3, outer = TRUE, cex = 1.2, font = 2)
  })

  output$stats3 <- renderUI({
    d <- sim_data()
    if (is.null(d)) return(NULL)

    ks_vals <- sapply(d$results, function(r) {
      se <- d$sig / sqrt(r$n)
      z  <- (r$means - d$mu) / se
      suppressWarnings(ks.test(z, "pnorm")$statistic)
    })

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>KS distance from N(0,1):</b><br>",
        paste(paste0("n = ", d$ns, ": <b>",
              sprintf("%.3f", ks_vals), "</b>"),
              collapse = "<br>"),
        "<br><hr style='margin:8px 0'>",
        "<small>KS &rarr; 0 means the<br>",
        "sampling distribution is<br>",
        "converging to normal.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

What to try:

  • Normal: the approximation is exact at every \(n\): averages of normal draws are themselves exactly normal, so even at \(n = 5\) any nonzero KS value is just simulation noise.
  • Exponential: at \(n = 5\) the histogram is visibly right-skewed. By \(n = 30\) it’s close. By \(n = 100\) it’s indistinguishable from the bell curve.
  • Bernoulli(0.1): at \(n = 5\) the sample mean can only be 0, 0.2, 0.4, … — you can see the discreteness. The CLT needs \(n = 100\)+ to smooth this out.
  • Log-normal: the slowest convergence of the bunch. Heavy right tail means the CLT needs \(n = 300\)+ before the KS distance gets small. This is why the “\(n \geq 30\) rule” can be dangerously optimistic.

Did you know?

  • Jacob Bernoulli proved the first version of the Law of Large Numbers in his book Ars Conjectandi, published posthumously in 1713. He called it his “golden theorem” and spent over 20 years working on the proof. The result seems obvious in hindsight — of course averages converge — but rigorously proving why required entirely new mathematical machinery.
  • There are actually two versions: the Weak LLN (convergence in probability, proved by Bernoulli and later Chebyshev) and the Strong LLN (almost sure convergence, proved by Kolmogorov in 1933). The strong version says that convergence happens with probability 1, not just “usually.”
  • The \(t\)-distribution with \(\text{df} = 1\) is the Cauchy distribution: it has no finite mean, so LLN doesn’t apply and the running average never settles down, no matter how much data you collect. The heavy-tailed option in Simulation 1 uses \(\text{df} = 2\), which does have a mean (so LLN holds) but has infinite variance (so CLT fails); try it and watch the running average converge, punctuated by occasional violent jumps.
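To see the Cauchy failure without Shiny (a sketch; the seed and sequence length are arbitrary):

```r
set.seed(11)

# Cauchy = t with df = 1: no finite mean, so no LLN.
# The running average keeps making large jumps forever.
x       <- rcauchy(1e5)
running <- cumsum(x) / seq_along(x)
running[c(100, 1000, 10000, 100000)]   # no sign of settling
```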