Residuals & Controls

What is a residual?

A residual is what’s left over after your model has done its best:

\[e_i = Y_i - \hat{Y}_i\]

It’s the vertical distance between the actual data point and the regression line. If your model is good, residuals should look like random noise — no patterns, no structure.

If the residuals do have structure, your model is missing something. This is the single most important diagnostic in regression.
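This is easy to verify directly in R (a minimal sketch; the DGP values here are illustrative): `resid()` returns exactly the actual values minus the fitted values, and with an intercept the residuals sum to zero by construction.

```r
set.seed(42)
x <- runif(100, -3, 5)
y <- 2 + 1.5 * x + rnorm(100)

fit <- lm(y ~ x)

# A residual is the actual value minus the fitted value
e_manual <- unname(y - fitted(fit))
all.equal(unname(resid(fit)), e_manual)   # TRUE

# With an intercept, OLS residuals sum to zero (up to floating point)
sum(resid(fit))
```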

Residuals and the CEF

Recall from the CEF page: OLS is the best linear approximation to \(E[Y \mid X]\). When the CEF is nonlinear, the residuals absorb the nonlinearity. That’s why a curved pattern in the residual plot signals model misspecification — the residuals are doing the work the model should be doing.
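A quick way to see this absorption (a sketch; the coefficients are illustrative): fit a straight line to a quadratic CEF, then regress the residuals on \(X\) and \(X^2\). The curvature the model ignored reappears in the residuals, with roughly the true quadratic coefficient.

```r
set.seed(1)
x <- runif(500, -3, 5)
y <- 1 + 0.5 * x - 0.3 * x^2 + rnorm(500)   # quadratic CEF

fit <- lm(y ~ x)                            # misspecified: linear only

# The residuals contain the curvature the line could not fit.
# Regressing them on x and x^2 recovers the x^2 coefficient of the
# full quadratic model (roughly -0.3 here).
absorbed <- lm(resid(fit) ~ x + I(x^2))
coef(absorbed)["I(x^2)"]
```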

The Oracle View. In these simulations, we set the true DGP — including any omitted variables and the true functional form. We can see exactly what the residuals are “absorbing.” In practice, you don’t know what’s missing from your model. Residual diagnostics are your flashlight in the dark.

#| standalone: true
#| viewerHeight: 600

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("dgp", "True relationship:",
                  choices = c("Linear (correct spec)",
                              "Quadratic (misspecified)",
                              "Heteroskedastic",
                              "Outliers")),

      sliderInput("n", "Sample size:",
                  min = 50, max = 500, value = 200, step = 50),

      sliderInput("sigma", "Noise (SD):",
                  min = 0.5, max = 4, value = 1.5, step = 0.5),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(4, plotOutput("scatter", height = "380px")),
        column(4, plotOutput("resid_fitted", height = "380px")),
        column(4, plotOutput("resid_hist", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n     <- input$n
    sigma <- input$sigma
    dgp   <- input$dgp

    x <- runif(n, -3, 5)

    if (dgp == "Linear (correct spec)") {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma)
    } else if (dgp == "Quadratic (misspecified)") {
      y <- 1 + 0.5 * x - 0.3 * x^2 + rnorm(n, sd = sigma)
    } else if (dgp == "Heteroskedastic") {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma * (0.3 + 0.4 * abs(x)))
    } else {
      y <- 2 + 1.5 * x + rnorm(n, sd = sigma)
      # Add outliers
      outlier_idx <- sample(n, 5)
      y[outlier_idx] <- y[outlier_idx] + sample(c(-1, 1), 5, replace = TRUE) * 8
    }

    fit <- lm(y ~ x)
    list(x = x, y = y, fit = fit, dgp = dgp)
  })

  output$scatter <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x, d$y, pch = 16, col = "#3498db80", cex = 0.7,
         xlab = "X", ylab = "Y", main = "Data + OLS fit")
    abline(d$fit, col = "#e74c3c", lwd = 2.5)
  })

  output$resid_fitted <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    r  <- resid(d$fit)
    fv <- fitted(d$fit)

    plot(fv, r, pch = 16, col = "#9b59b680", cex = 0.7,
         xlab = "Fitted values", ylab = "Residuals",
         main = "Residuals vs Fitted")
    abline(h = 0, lty = 2, col = "gray40", lwd = 1.5)

    lo <- loess(r ~ fv)
    ox <- order(fv)
    lines(fv[ox], predict(lo)[ox], col = "#e74c3c", lwd = 2)
  })

  output$resid_hist <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))

    r <- resid(d$fit)
    hist(r, breaks = 30, probability = TRUE,
         col = "#dae8fc", border = "#6c8ebf",
         main = "Residual Distribution",
         xlab = "Residuals", ylab = "Density")
    x_seq <- seq(min(r), max(r), length.out = 200)
    lines(x_seq, dnorm(x_seq, mean = 0, sd = sd(r)),
          col = "#e74c3c", lwd = 2)
  })

  output$info <- renderUI({
    d <- dat()
    r <- resid(d$fit)

    msg <- switch(d$dgp,
      "Linear (correct spec)" =
        "Residuals look like random noise. No patterns in the residual plot. The model is correctly specified.",
      "Quadratic (misspecified)" =
        "U-shaped pattern in residuals! The LOESS curve bends, revealing the quadratic structure OLS is missing.",
      "Heteroskedastic" =
        "Fan shape: residuals spread out as fitted values increase. The variance isn't constant (heteroskedasticity).",
      "Outliers" =
        "A few points have huge residuals. Check the histogram for heavy tails. These points have outsized influence on the fit."
    )

    tags$div(class = "info-box", HTML(paste0("<b>Diagnosis:</b><br>", msg)))
  })
}

shinyApp(ui, server)

Reading the three panels

  1. Left — Scatter + fit: does the line follow the data?
  2. Middle — Residuals vs fitted: the key diagnostic. If you see a pattern (curve, fan, clusters), your model is missing something.
  3. Right — Residual histogram: should look roughly normal and centered at 0. Heavy tails or skew signal problems.

Switch between the four DGPs above and learn to recognize each pattern — you’ll see these in every applied paper you read.


Residuals vs fitted versus residuals vs X

The app above plots residuals against \(\hat{Y}\) (fitted values). But you should also plot residuals against each explanatory variable \(X_k\). These two plots answer different questions:

| Plot | X-axis | What it diagnoses |
|------|--------|-------------------|
| Residuals vs \(\hat{Y}\) | Predicted value | Overall model misspecification |
| Residuals vs \(X_k\) | Explanatory variable | Whether the \(X_k\) relationship is correctly specified |

Why the distinction matters. With one regressor, fitted values are just a linear rescaling of \(X\), so the two plots look identical. With multiple regressors, \(\hat{Y}\) blends all the \(X\)’s together. A problem with one variable’s functional form may be hard to spot in the residuals-vs-fitted plot but jumps out when you plot residuals against that specific \(X\).

The residuals-vs-\(X\) plot directly tests the key OLS assumption:

\[E[\varepsilon \mid X] = 0\]

If residuals trend or curve against \(X_k\), this assumption is violated for that variable.
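A numeric version of this diagnostic (a sketch mirroring the quadratic scenario in the app below; the DGP values are illustrative): with two regressors and a missing \(X_2^2\) term, the residuals are mechanically uncorrelated with both included regressors, but strongly correlated with the omitted curvature.

```r
set.seed(7)
n  <- 500
x1 <- rnorm(n, 0, 2)
x2 <- rnorm(n, 0, 2)
y  <- 1 + 0.8 * x1 + 0.5 * x2 + 0.4 * x2^2 + rnorm(n, sd = 1.5)

fit <- lm(y ~ x1 + x2)   # misses the x2^2 term
r   <- resid(fit)

# Orthogonality is mechanical for included regressors...
cor(r, x1)     # exactly 0 (up to floating point)
cor(r, x2)     # exactly 0 as well
# ...but the omitted curvature is sitting in the residuals:
cor(r, x2^2)   # clearly positive
```

This is why a clean residuals-vs-\(X_2\) scatter can coexist with a badly misspecified model: raw correlation with \(X_2\) is zero by construction, so you have to look for *nonlinear* structure, which the LOESS curve in the app makes visible.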

#| standalone: true
#| viewerHeight: 600

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .info-box2 {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.8;
    }
    .info-box2 b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      selectInput("scenario", "Scenario:",
                  choices = c("Both linear (correct)",
                              "X2 quadratic (misspecified)",
                              "Heteroskedastic in X1",
                              "Omitted variable (corr with X1)")),

      sliderInput("n2", "Sample size:",
                  min = 100, max = 500, value = 300, step = 50),

      actionButton("go2", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("info2")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(4, plotOutput("resid_fitted2", height = "380px")),
        column(4, plotOutput("resid_x1", height = "380px")),
        column(4, plotOutput("resid_x2", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat2 <- reactive({
    input$go2
    n <- input$n2
    scenario <- input$scenario

    x1 <- rnorm(n, 0, 2)
    x2 <- rnorm(n, 0, 2)

    if (scenario == "Both linear (correct)") {
      y <- 1 + 0.8 * x1 + 1.2 * x2 + rnorm(n, sd = 1.5)
    } else if (scenario == "X2 quadratic (misspecified)") {
      y <- 1 + 0.8 * x1 + 0.5 * x2 + 0.4 * x2^2 + rnorm(n, sd = 1.5)
    } else if (scenario == "Heteroskedastic in X1") {
      y <- 1 + 0.8 * x1 + 1.2 * x2 + rnorm(n, sd = 0.5 + 0.5 * abs(x1))
    } else {
      z <- 0.7 * x1 + rnorm(n, sd = 1)
      y <- 1 + 0.3 * x1 + 1.2 * x2 + 2 * z + rnorm(n, sd = 1.5)
    }

    fit <- lm(y ~ x1 + x2)
    list(x1 = x1, x2 = x2, y = y, fit = fit, scenario = scenario)
  })

  make_resid_plot <- function(xvar, r, xlab, title, col) {
    plot(xvar, r, pch = 16, col = col, cex = 0.6,
         xlab = xlab, ylab = "Residuals", main = title)
    abline(h = 0, lty = 2, col = "gray40", lwd = 1.5)
    lo <- loess(r ~ xvar)
    ox <- order(xvar)
    lines(xvar[ox], predict(lo)[ox], col = "#e74c3c", lwd = 2)
  }

  output$resid_fitted2 <- renderPlot({
    d <- dat2()
    par(mar = c(4.5, 4.5, 3, 1))
    r <- resid(d$fit); fv <- fitted(d$fit)
    make_resid_plot(fv, r, "Fitted values",
                    expression("Residuals vs " * hat(Y)), "#9b59b680")
  })

  output$resid_x1 <- renderPlot({
    d <- dat2()
    par(mar = c(4.5, 4.5, 3, 1))
    make_resid_plot(d$x1, resid(d$fit), expression(X[1]),
                    expression("Residuals vs " * X[1]), "#3498db80")
  })

  output$resid_x2 <- renderPlot({
    d <- dat2()
    par(mar = c(4.5, 4.5, 3, 1))
    make_resid_plot(d$x2, resid(d$fit), expression(X[2]),
                    expression("Residuals vs " * X[2]), "#e67e2280")
  })

  output$info2 <- renderUI({
    d <- dat2()

    msg <- switch(d$scenario,
      "Both linear (correct)" =
        "All three plots: <span class='good'>random scatter</span>. The LOESS curves stay near zero. Model is correctly specified.",
      "X2 quadratic (misspecified)" =
        "Residuals vs X<sub>2</sub>: <span class='bad'>clear U-shape</span> — the missing X<sub>2</sub><sup>2</sup> term. Residuals vs X<sub>1</sub>: clean. Residuals vs fitted: pattern present but harder to attribute to a specific variable.",
      "Heteroskedastic in X1" =
        "Residuals vs X<sub>1</sub>: <span class='bad'>fan shape</span> — variance grows with |X<sub>1</sub>|. Residuals vs X<sub>2</sub>: no fan. The fitted-values plot shows spreading too, but doesn't tell you <em>which</em> variable causes it.",
      "Omitted variable (corr with X1)" =
        "Residuals vs X<sub>1</sub>: <span class='bad'>upward trend</span> — the omitted Z is correlated with X<sub>1</sub>, so E[&epsilon;|X<sub>1</sub>] &ne; 0. Residuals vs X<sub>2</sub>: clean (Z is uncorrelated with X<sub>2</sub>)."
    )

    tags$div(class = "info-box2", HTML(msg))
  })
}

shinyApp(ui, server)

What the patterns mean

| Pattern in residuals | Diagnosis | Example |
|----------------------|-----------|---------|
| Random scatter around zero | Correctly specified | |
| U-shape / curve | Missing nonlinear term | True model has \(X^2\) but you fit only \(X\) |
| Fan / funnel | Heteroskedasticity | Variance of \(\varepsilon\) depends on \(X\) |
| Linear trend | Omitted variable correlated with \(X\) | \(E[\varepsilon \mid X] \neq 0\) |

Things to try

  • “X2 quadratic”: the U-shape shows up clearly in Residuals vs \(X_2\) but is muddled in Residuals vs \(\hat{Y}\). Residuals vs \(X_1\) stays clean — the problem is with \(X_2\)’s functional form, not \(X_1\)’s.
  • “Heteroskedastic in X1”: the fan opens up against \(X_1\) but not \(X_2\). The fitted-values plot shows a fan too but can’t pinpoint the source.
  • “Omitted variable”: residuals trend upward against \(X_1\) because the omitted \(Z\) is correlated with \(X_1\). \(X_2\) is unaffected.

In practice

Most applied researchers follow this workflow:

  1. Residuals vs fitted — general health check for the overall model
  2. Residuals vs each \(X\) — pinpoint which variable’s relationship is wrong

Controls and residuals

When you “control for” a variable in a regression, what you’re really doing is removing its influence via residuals. This is the Frisch-Waugh-Lovell theorem in action (see the FWL page).

Adding \(X_2\) as a control means:

  1. Regress \(Y\) on \(X_2\) → residuals \(\tilde{Y}\) (variation in \(Y\) not explained by \(X_2\))
  2. Regress \(X_1\) on \(X_2\) → residuals \(\tilde{X}_1\) (variation in \(X_1\) not explained by \(X_2\))
  3. Regress \(\tilde{Y}\) on \(\tilde{X}_1\) → the coefficient is \(\hat{\beta}_1\)

“Controlling for \(X_2\)” means looking at the relationship between \(Y\) and \(X_1\) after removing what \(X_2\) explains about each.
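The three steps translate directly into three `lm()` calls (a sketch with illustrative numbers). The residual-on-residual slope matches the multiple-regression coefficient exactly; that equality is the FWL theorem.

```r
set.seed(3)
n  <- 300
x2 <- rnorm(n)
x1 <- 0.7 * x2 + rnorm(n)              # X1 correlated with the control
y  <- 1 + 0.5 * x1 + 2 * x2 + rnorm(n)

y_tilde  <- resid(lm(y ~ x2))                 # Step 1: purge X2 from Y
x1_tilde <- resid(lm(x1 ~ x2))                # Step 2: purge X2 from X1
fwl_b1   <- coef(lm(y_tilde ~ x1_tilde))[2]   # Step 3: residuals on residuals

full_b1 <- coef(lm(y ~ x1 + x2))["x1"]
all.equal(unname(fwl_b1), unname(full_b1))    # TRUE: identical coefficients
```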

#| standalone: true
#| viewerHeight: 580

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; }
    .bad  { color: #e74c3c; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("n", "Sample size:", min = 100, max = 500, value = 300, step = 50),

      sliderInput("b1", HTML("True &beta;<sub>1</sub>:"),
                  min = -2, max = 3, value = 0.5, step = 0.25),

      sliderInput("b2", HTML("True &beta;<sub>2</sub> (confounder effect):"),
                  min = -3, max = 3, value = 2, step = 0.25),

      sliderInput("rho", HTML("Corr(X<sub>1</sub>, X<sub>2</sub>):"),
                  min = -0.9, max = 0.9, value = 0.7, step = 0.1),

      actionButton("go", "New draw", class = "btn-primary", width = "100%"),

      uiOutput("results")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(4, plotOutput("raw_plot",  height = "380px")),
        column(4, plotOutput("ctrl_y",    height = "380px")),
        column(4, plotOutput("ctrl_both", height = "380px"))
      )
    )
  )
)

server <- function(input, output, session) {

  dat <- reactive({
    input$go
    n   <- input$n
    b1  <- input$b1
    b2  <- input$b2
    rho <- input$rho

    z1 <- rnorm(n)
    z2 <- rnorm(n)
    x1 <- z1
    x2 <- rho * z1 + sqrt(1 - rho^2) * z2
    y  <- b1 * x1 + b2 * x2 + rnorm(n)

    # Without control
    naive_fit <- lm(y ~ x1)
    naive_b1 <- coef(naive_fit)[2]

    # With control
    full_fit <- lm(y ~ x1 + x2)
    full_b1 <- coef(full_fit)[2]

    # FWL residuals
    ey <- resid(lm(y ~ x2))
    ex <- resid(lm(x1 ~ x2))

    list(x1 = x1, x2 = x2, y = y, ey = ey, ex = ex,
         naive_b1 = naive_b1, full_b1 = full_b1,
         b1 = b1, b2 = b2, rho = rho)
  })

  output$raw_plot <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x1, d$y, pch = 16, col = "#3498db60", cex = 0.6,
         xlab = expression(X[1]), ylab = "Y",
         main = "No control (omit X2)")
    abline(lm(d$y ~ d$x1), col = "#e74c3c", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.85,
           legend = paste0("Slope = ", round(d$naive_b1, 3)))
  })

  output$ctrl_y <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$x1, d$ey, pch = 16, col = "#9b59b660", cex = 0.6,
         xlab = expression(X[1]),
         ylab = expression("Residualized Y  (" * tilde(Y) * ")"),
         main = expression("Remove " * X[2] * " from Y"))
    abline(h = 0, lty = 2, col = "gray60")
  })

  output$ctrl_both <- renderPlot({
    d <- dat()
    par(mar = c(4.5, 4.5, 3, 1))
    plot(d$ex, d$ey, pch = 16, col = "#27ae6080", cex = 0.6,
         xlab = expression("Residualized " * X[1] * "  (" * tilde(X)[1] * ")"),
         ylab = expression("Residualized Y  (" * tilde(Y) * ")"),
         main = "After controlling for X2")
    abline(lm(d$ey ~ d$ex), col = "#e74c3c", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.85,
           legend = paste0("Slope = ", round(d$full_b1, 3)))
  })

  output$results <- renderUI({
    d <- dat()
    bias <- d$naive_b1 - d$b1

    tags$div(class = "stats-box",
      HTML(paste0(
        "<b>True &beta;<sub>1</sub>:</b> ", d$b1, "<br>",
        "<b>Without control:</b> <span class='bad'>",
        round(d$naive_b1, 3), "</span>",
        " (bias: ", round(bias, 3), ")<br>",
        "<b>With control:</b> <span class='good'>",
        round(d$full_b1, 3), "</span><br>",
        "<hr style='margin:8px 0'>",
        "<small>Omitted variable bias = &beta;<sub>2</sub> &times; ",
        "Cov(X<sub>1</sub>,X<sub>2</sub>) / Var(X<sub>1</sub>).<br>",
        "Higher confounding (&beta;<sub>2</sub>) + higher correlation ",
        "= more bias when you don't control.</small>"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • True \(\beta_1\) = 0.5, Corr = 0.7, \(\beta_2\) = 2: the naive slope (left panel) is way off because \(X_2\) confounds the relationship. The controlled slope (right panel) recovers the truth.
  • Set Corr = 0: no confounding. The two estimates coincide up to sampling noise; controls don’t help when there’s nothing to control for.
  • Set \(\beta_2\) = 0: same thing. Even if \(X_1\) and \(X_2\) are correlated, \(X_2\) doesn’t affect \(Y\), so omitting it doesn’t bias \(\beta_1\).
  • Middle panel: shows what \(Y\) looks like after removing \(X_2\)’s contribution. The right panel adds the second step — removing \(X_2\) from \(X_1\) too — which isolates the independent variation in \(X_1\).
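The omitted-variable-bias formula in the app’s info box can be checked numerically (a sketch; with a large sample, the naive slope’s bias matches \(\beta_2 \cdot \mathrm{Cov}(X_1, X_2) / \mathrm{Var}(X_1)\) closely):

```r
set.seed(9)
n   <- 1e5          # large n so sample moments are close to their limits
rho <- 0.7
b1  <- 0.5; b2 <- 2
x1  <- rnorm(n)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)
y   <- b1 * x1 + b2 * x2 + rnorm(n)

naive_b1 <- coef(lm(y ~ x1))["x1"]    # omits x2

observed_bias <- unname(naive_b1) - b1
formula_bias  <- b2 * cov(x1, x2) / var(x1)   # OVB formula

# Both should be close to 2 * 0.7 = 1.4 in this setup
c(observed = observed_bias, formula = formula_bias)
```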

Did you know?

  • Carl Friedrich Gauss claimed to have devised the method of least squares in 1795, at age 18. He famously used it in 1801 to predict where the newly discovered asteroid Ceres would reappear after it was lost in the Sun’s glare. When Ceres showed up almost exactly where he predicted, Gauss became an overnight celebrity. The entire method is built on minimizing the sum of squared residuals.
  • Adrien-Marie Legendre was the first to publish the method, in 1805, four years before Gauss did. Gauss, however, claimed he had been using it since 1795. The priority dispute was never fully resolved, yet the method remains popularly associated with Gauss.
  • The idea of “controlling for” a variable sounds scientific, but it’s been criticized. As Edward Leamer wrote in his famous 1983 paper “Let’s Take the Con Out of Econometrics”: adding controls is not the same as running an experiment. The choice of what to control for is often arbitrary and can introduce more bias than it removes (see: bad controls, collider bias).