Limited Dependent Variables
The problem with OLS for binary outcomes
When \(Y \in \{0, 1\}\), the linear probability model (LPM) simply runs OLS on \(Y = X'\beta + \varepsilon\). It works: the coefficients estimate the change in \(P(Y=1)\) per unit change in \(X\). But there are two problems:
- Predictions outside \([0,1]\). A linear function eventually predicts negative probabilities or probabilities above 1.
- Heteroskedastic errors. Since \(Y\) is binary, \(\text{Var}(\varepsilon | X) = P(1-P)\), which depends on \(X\). OLS standard errors are wrong (use robust SEs).
Near the center of the data, the LPM is often fine. But when the true probability is strongly nonlinear — near 0 or 1 — the LPM misses the shape.
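Both problems are easy to produce in a few lines of base R. The sketch below (simulated data; all names are illustrative) fits an LPM, computes HC1 robust standard errors by hand, and counts fitted "probabilities" that escape \([0,1]\):

```r
# Sketch: LPM pathologies on simulated binary data (base R only;
# the data-generating process and variable names are illustrative)
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(2 * x))   # true P(Y=1) is logistic in x
lpm <- lm(y ~ x)

# HC1 sandwich standard errors, computed by hand:
# (X'X)^{-1} [X' diag(u^2) X] (X'X)^{-1}, with a df adjustment
X <- model.matrix(lpm)
u <- residuals(lpm)
bread <- solve(crossprod(X))
meat <- crossprod(X * u)           # X' diag(u^2) X
vc <- (n / (n - ncol(X))) * bread %*% meat %*% bread
robust_se <- sqrt(diag(vc))

# The straight line happily predicts outside [0, 1]
p_hat <- fitted(lpm)
range(p_hat)
mean(p_hat < 0 | p_hat > 1)        # share of out-of-range predictions
```

With a strongly nonlinear true probability, a nontrivial share of fitted values lands below 0 or above 1, and the robust standard errors differ from the classical ones reported by `summary()`.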
Logit and probit
Both models pass \(X'\beta\) through a nonlinear link function to keep predictions in \([0, 1]\):
\[P(Y = 1 \mid X) = F(X'\beta)\]
| Model | Link function \(F\) | Formula |
|---|---|---|
| Logit | Logistic CDF | \(\Lambda(z) = \dfrac{e^z}{1 + e^z}\) |
| Probit | Normal CDF | \(\Phi(z)\) |
Both are estimated by maximum likelihood. In practice, logit and probit give nearly identical fitted probabilities: the logistic and normal CDFs have almost the same shape once rescaled, which is why logit coefficients typically come out around 1.6 times the probit ones.
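That near-equivalence can be checked directly. In the sketch below (illustrative simulation, base R only), rescaling the logistic CDF's argument by roughly 1.6 makes it almost coincide with the normal CDF, and the fitted logit coefficient comes out around 1.6 times the probit one:

```r
# The two links, superimposed: plogis(z) vs pnorm(z / 1.6)
z <- seq(-6, 6, by = 0.01)
max(abs(plogis(z) - pnorm(z / 1.6)))   # worst-case gap is small (< 0.02)

# Coefficient ratio on fitted models (simulated, illustrative data)
set.seed(2)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(1.5 * x))
b_logit  <- unname(coef(glm(y ~ x, family = binomial("logit")))[2])
b_probit <- unname(coef(glm(y ~ x, family = binomial("probit")))[2])
b_logit / b_probit                     # typically between 1.6 and 1.8
```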
Simulation
Binary data with a nonlinear true probability. The LPM fits a straight line; logit and probit fit S-curves. Drag the coefficient slider to make the nonlinearity more extreme.
#| standalone: true
#| viewerHeight: 700
library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .eq-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-bottom: 14px; font-size: 14px; line-height: 1.9;
    }
    .eq-box b { color: #2c3e50; }
  "))),
  sidebarLayout(
    sidebarPanel(
      width = 4,
      sliderInput("n", "Sample size (n):",
                  min = 100, max = 1000, value = 300, step = 50),
      sliderInput("beta", HTML("Coefficient (β):"),
                  min = 0.5, max = 5, value = 2, step = 0.25),
      sliderInput("intercept", HTML("Intercept (α):"),
                  min = -3, max = 3, value = 0, step = 0.5),
      actionButton("resim", "New draw", class = "btn-primary", width = "100%"),
      uiOutput("results_box")
    ),
    mainPanel(
      width = 8,
      plotOutput("main_plot", height = "500px"),
      uiOutput("note_box")
    )
  )
)

server <- function(input, output, session) {
  dat <- reactive({
    input$resim
    n <- input$n
    beta <- input$beta
    a <- input$intercept
    x <- rnorm(n)
    # True probability via logistic function
    p_true <- plogis(a + beta * x)
    y <- rbinom(n, 1, p_true)
    # Fit models
    lpm_fit <- lm(y ~ x)
    logit_fit <- glm(y ~ x, family = binomial(link = "logit"))
    probit_fit <- glm(y ~ x, family = binomial(link = "probit"))
    x_grid <- seq(min(x) - 0.5, max(x) + 0.5, length.out = 300)
    lpm_pred <- coef(lpm_fit)[1] + coef(lpm_fit)[2] * x_grid
    logit_pred <- plogis(coef(logit_fit)[1] + coef(logit_fit)[2] * x_grid)
    probit_pred <- pnorm(coef(probit_fit)[1] + coef(probit_fit)[2] * x_grid)
    true_pred <- plogis(a + beta * x_grid)
    list(x = x, y = y, x_grid = x_grid,
         lpm_pred = lpm_pred, logit_pred = logit_pred,
         probit_pred = probit_pred, true_pred = true_pred,
         lpm_fit = lpm_fit, logit_fit = logit_fit,
         probit_fit = probit_fit,
         beta = beta, a = a)
  })

  output$main_plot <- renderPlot({
    d <- dat()
    par(mar = c(5, 5, 4, 2))
    # Jitter y slightly for visibility
    y_jit <- d$y + runif(length(d$y), -0.03, 0.03)
    plot(d$x, y_jit, pch = 16, col = adjustcolor("#95a5a6", 0.3),
         cex = 0.6, xlab = "X", ylab = "P(Y = 1)",
         main = "LPM vs Logit vs Probit",
         ylim = c(-0.15, 1.15))
    abline(h = c(0, 1), lty = 3, col = "#bdc3c7")
    # True probability
    lines(d$x_grid, d$true_pred, col = "#2c3e50", lwd = 2, lty = 2)
    # LPM
    lines(d$x_grid, d$lpm_pred, col = "#e74c3c", lwd = 2.5)
    # Logit
    lines(d$x_grid, d$logit_pred, col = "#3498db", lwd = 2.5)
    # Probit
    lines(d$x_grid, d$probit_pred, col = "#27ae60", lwd = 2.5)
    legend("topleft", bty = "n", cex = 0.9,
           legend = c("True P(Y=1)", "LPM (OLS)", "Logit", "Probit"),
           col = c("#2c3e50", "#e74c3c", "#3498db", "#27ae60"),
           lwd = c(2, 2.5, 2.5, 2.5),
           lty = c(2, 1, 1, 1))
  })

  output$results_box <- renderUI({
    d <- dat()
    lpm_b <- round(coef(d$lpm_fit)[2], 3)
    logit_b <- round(coef(d$logit_fit)[2], 3)
    probit_b <- round(coef(d$probit_fit)[2], 3)
    ame_logit <- round(mean(dlogis(predict(d$logit_fit, type = "link"))) *
                         coef(d$logit_fit)[2], 3)
    tags$div(class = "eq-box", style = "margin-top: 16px;",
      HTML(paste0(
        "<b>Estimated coefficients:</b><br>",
        "LPM: <b>", lpm_b, "</b> (= marginal effect)<br>",
        "Logit: <b>", logit_b, "</b> (log-odds)<br>",
        "Probit: <b>", probit_b, "</b> (latent index)<br>",
        "<hr style='margin:8px 0'>",
        "<b>Logit AME:</b> ", ame_logit, "<br>",
        "<b>Logit β/4:</b> ", round(logit_b / 4, 3), "<br>",
        "<small>(β/4 approximates the AME)</small>"
      ))
    )
  })

  output$note_box <- renderUI({
    tags$div(class = "eq-box", style = "margin-top: 8px;",
      HTML(paste0(
        "<b>Notice:</b> The logit coefficient is <i>not</i> the marginal effect. ",
        "The LPM slope (red) directly reads as ΔP per unit ΔX, ",
        "but the logit/probit slopes refer to the latent index. ",
        "Use the AME or the β/4 rule to translate."
      ))
    )
  })
}

shinyApp(ui, server)
Marginal effects
In the LPM, \(\beta\) is the marginal effect: a one-unit increase in \(X\) changes \(P(Y=1)\) by \(\beta\), everywhere.
In logit/probit, \(\beta\) enters through the nonlinear link, so the marginal effect depends on where you are:
\[\frac{\partial P(Y=1)}{\partial X_j} = f(X'\beta) \cdot \beta_j\]
where \(f = F'\) is the density corresponding to the link CDF (the logistic density for logit, the standard normal density for probit). Two common summaries:
| Summary | Definition | Use |
|---|---|---|
| Average Marginal Effect (AME) | \(\frac{1}{n}\sum_i f(X_i'\hat{\beta}) \cdot \hat{\beta}_j\) | Average effect across the sample |
| Marginal Effect at the Mean (MEM) | \(f(\bar{X}'\hat{\beta}) \cdot \hat{\beta}_j\) | Effect for a “typical” observation |
Quick approximation: for logit, the logistic density peaks at \(1/4\) (at \(X'\beta = 0\), where \(P = 1/2\)), so \(\hat{\beta}_j/4\) bounds the magnitude of the AME from above. As an estimate it is rough, but it is often surprisingly close, especially when most predicted probabilities sit near 0.5.
Standard errors for marginal effects require the delta method since they are nonlinear functions of \(\hat{\beta}\).
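The pieces above fit together in a few lines. This base-R sketch (simulated data; names are illustrative) computes the logit AME and MEM by hand, compares them to the \(\hat{\beta}/4\) bound, and builds a delta-method standard error for the AME:

```r
# Sketch: logit marginal effects by hand (simulated, illustrative data)
set.seed(3)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))
b <- unname(coef(fit))

eta <- b[1] + b[2] * x                        # linear index X'beta
ame <- mean(dlogis(eta)) * b[2]               # average marginal effect
mem <- dlogis(b[1] + b[2] * mean(x)) * b[2]   # marginal effect at the mean
c(ame = ame, mem = mem, rule = b[2] / 4)      # beta/4 bounds the AME

# Delta method for the AME's standard error, using the identity
# d/dz dlogis(z) = dlogis(z) * (1 - 2 * plogis(z))
dprime <- dlogis(eta) * (1 - 2 * plogis(eta))
grad <- c(mean(dprime) * b[2],                          # d AME / d b1
          mean(dprime * x) * b[2] + mean(dlogis(eta)))  # d AME / d b2
se_ame <- drop(sqrt(t(grad) %*% vcov(fit) %*% grad))
```

Because the logistic density never exceeds \(1/4\), the computed AME always sits below \(\hat{\beta}/4\) in magnitude; how close it gets depends on how many observations have predicted probabilities near 0.5.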
Tobit: censored outcomes
When the dependent variable is continuous but censored — observed only above (or below) a threshold — OLS on the observed data is biased. The Tobit model handles this:
\[Y_i^* = X_i'\beta + \varepsilon_i, \qquad Y_i = \max(0, Y_i^*)\]
where \(Y^*\) is the latent (uncensored) variable. This is estimated by MLE, combining a probit-like component (probability of being censored) with a truncated normal component (density for uncensored observations).
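Writing the likelihood out makes the two components explicit. The sketch below (simulated data; the data-generating process is illustrative) maximizes the Tobit log-likelihood with `optim()`; in applied work you would typically use a packaged implementation such as `AER::tobit()` or `survival::survreg()`:

```r
# Sketch: Tobit MLE by hand for censoring at zero (base R only)
set.seed(4)
n <- 800
x <- rnorm(n)
ystar <- 1 + 2 * x + rnorm(n, sd = 1.5)   # latent outcome Y*
y <- pmax(0, ystar)                        # we only observe max(0, Y*)

negll <- function(theta) {
  b0 <- theta[1]; b1 <- theta[2]
  sigma <- exp(theta[3])                   # parameterize log(sigma) so sigma > 0
  mu <- b0 + b1 * x
  cens <- y == 0
  # Censored obs contribute log P(Y* <= 0) = log Phi(-mu/sigma);
  # uncensored obs contribute the log normal density
  -sum(pnorm(-mu[cens] / sigma, log.p = TRUE)) -
    sum(dnorm(y[!cens], mean = mu[!cens], sd = sigma, log = TRUE))
}
fit <- optim(c(0, 0, 0), negll, method = "BFGS")
est <- c(fit$par[1:2], sigma = exp(fit$par[3]))
est                  # near the true (1, 2, 1.5)
coef(lm(y ~ x))[2]   # naive OLS on the censored y: slope attenuated toward 0
```

Naive OLS on the censored \(y\) attenuates the slope, which the comparison at the end makes visible; the MLE recovers the latent-variable parameters.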
The same logic extends to Heckman selection models, where the censoring isn’t mechanical but results from a separate decision — see Sample Selection & Heckman.
Connections
- Maximum Likelihood — Logit and probit are MLE estimators
- Training as MLE — Cross-entropy loss in neural nets is exactly logistic regression’s log-likelihood
- The Delta Method — Required for marginal effect standard errors
- Sample Selection & Heckman — Extends the Tobit idea to non-mechanical selection
Did you know?
- The logistic function was introduced by Pierre-François Verhulst in 1838 to model population growth, not binary outcomes. It took over a century before statisticians adopted it for regression.
- Joseph Berkson coined the term “logit” in 1944 as a portmanteau of “logistic unit,” by analogy with “probit” (probability unit).
- James Tobin won the 1981 Nobel Prize in Economics. His 1958 Tobit paper (the name is a blend of “Tobin” and “probit”) introduced the censored regression model. He was reportedly amused by the name, which he didn’t coin himself.