Agglomeration Economies

Why do firms cluster?

Cities exist because proximity is productive: when firms and workers locate close together, each becomes more productive. These gains, called agglomeration economies, are the fundamental reason cities persist despite their costs (congestion, high rents, pollution).

The three Marshallian channels

Alfred Marshall (1890) identified three mechanisms through which density raises productivity:

  1. Labor market pooling. Dense labor markets reduce mismatch. Firms are more likely to find workers with the right skills; workers are more likely to find jobs that match their talents. If one firm lays off workers, another nearby firm can hire them. The thick market reduces risk for both sides.

  2. Input sharing. Clusters attract specialized suppliers. A single firm might not generate enough demand for a niche input supplier, but a cluster of firms can. Silicon Valley has specialized law firms, venture capitalists, and chip fabricators — none of which could survive serving a single firm.

  3. Knowledge spillovers. Ideas spread faster face-to-face. Casual conversations, job-hopping, and observation transmit tacit knowledge that can’t be easily codified. This is why knowledge-intensive industries cluster so tightly: biotech in Boston, tech in Silicon Valley, finance in London.
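
The labor market pooling channel can be illustrated with a toy matching exercise. This is a minimal sketch under stated assumptions, not Marshall's own model: skills and job requirements are hypothetical points drawn uniformly on [0, 1], and each worker takes the job closest to their skill. The function name `mean_mismatch` and all parameter values are made up for illustration.

```r
# Toy model (an assumption for illustration): skills and job requirements
# are points on [0, 1]; each worker takes the job nearest to their skill.
set.seed(42)

mean_mismatch <- function(n_firms, n_workers = 2000) {
  jobs    <- runif(n_firms)    # skill requirement of each firm's job
  workers <- runif(n_workers)  # skill of each worker
  # Gap between each worker's skill and the closest available job
  gaps <- vapply(workers, function(s) min(abs(jobs - s)), numeric(1))
  mean(gaps)
}

# Average skill mismatch in thin versus thick labor markets
market_sizes <- c(2, 5, 20, 100)
mismatch <- vapply(market_sizes, mean_mismatch, numeric(1))
print(data.frame(firms = market_sizes, mean_mismatch = round(mismatch, 3)))
```

In this sketch the average gap between a worker's skill and the best available job is much smaller in the 100-firm market than in the 2-firm market, which is the sense in which a thick market reduces mismatch.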

The urban wage premium

Productivity increases with city size. Combes, Duranton, and Gobillon (2012) estimate that doubling city employment raises individual wages by 3–8%, even after controlling for worker sorting (the possibility that more productive workers simply choose bigger cities).

The relationship is approximately:

\[\ln w = \alpha + \gamma \ln N + \varepsilon\]

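where \(w\) is the wage, \(N\) is city size (population or employment), and \(\gamma\) is the agglomeration elasticity, typically estimated at 0.03 to 0.08. For elasticities this small, the proportional wage gain from doubling \(N\), \(2^{\gamma} - 1 \approx 0.69\gamma\), is of the same order as \(\gamma\) itself, which is why the two are often quoted interchangeably.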
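
To make the elasticity concrete, the sketch below converts \(\gamma\) into the wage gain from doubling city size and then recovers \(\gamma\) from a log-log regression. The city sizes and wages are simulated, hypothetical data with no worker sorting, so the regression only illustrates the mechanics of the equation above, not the empirical estimates.

```r
# Elasticity range quoted in the text
gamma <- c(0.03, 0.05, 0.08)

# From ln w = alpha + gamma * ln N, doubling N multiplies w by 2^gamma
doubling_gain <- 2^gamma - 1      # exact proportional wage gain
approx_gain   <- gamma * log(2)   # first-order approximation

# All three columns are shown in percent
print(round(100 * cbind(gamma, doubling_gain, approx_gain), 2))

# Recover gamma from simulated data with a log-log regression.
# The data are made up for illustration; there is no sorting of workers here.
set.seed(1)
n_cities   <- 5000
gamma_true <- 0.05
N <- exp(runif(n_cities, log(1e4), log(1e7)))                # hypothetical city sizes
w <- exp(3 + gamma_true * log(N) + rnorm(n_cities, 0, 0.1))  # hypothetical wages

fit <- lm(log(w) ~ log(N))
gamma_hat <- unname(coef(fit)[2])  # slope = estimated agglomeration elasticity
print(gamma_hat)
```

With real data the slope would also pick up sorting of more productive workers into bigger cities, which is why the estimates cited above control for it.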

The tradeoff

If bigger is always better, why aren’t all cities infinitely large? Because agglomeration benefits are offset by congestion costs: longer commutes, higher rents, pollution, crime. The optimal city size balances the marginal agglomeration benefit against the marginal congestion cost.
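
A minimal sketch of this balance, assuming the illustrative functional forms used in this page's simulation code: an agglomeration benefit of 100(N^γ − 1) and a congestion cost of c·N^1.2. These forms and the parameter values are assumptions chosen for illustration, not estimates. Setting marginal benefit equal to marginal cost gives a closed-form optimum, which the code checks against a numerical optimizer.

```r
# Illustrative functional forms (assumptions borrowed from the simulation code):
#   benefit(N) = 100 * (N^gamma - 1),   cost(N) = cong * N^1.2
gamma <- 0.05
cong  <- 0.03

net_benefit <- function(N) 100 * (N^gamma - 1) - cong * N^1.2

# First-order condition: 100 * gamma * N^(gamma - 1) = 1.2 * cong * N^0.2
# => N* = (100 * gamma / (1.2 * cong))^(1 / (1.2 - gamma))
n_star_closed <- (100 * gamma / (1.2 * cong))^(1 / (1.2 - gamma))

# Numerical check with a one-dimensional optimizer
n_star_numeric <- optimize(net_benefit, interval = c(1, 5000),
                           maximum = TRUE)$maximum

print(c(closed_form = n_star_closed, numeric = n_star_numeric))
```

Because the benefit is concave in N and the cost is convex, net benefit has a single interior peak, so the two approaches should agree closely.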

The Model View. The simulation below lets you adjust the agglomeration elasticity and congestion costs. Firms choose locations on a spatial grid. With agglomeration turned on, they cluster. The right panel shows the productivity-congestion tradeoff and the implied optimal city size.

#| standalone: true
#| viewerHeight: 700

library(shiny)

ui <- fluidPage(
  tags$head(tags$style(HTML("
    .stats-box {
      background: #f0f4f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 14px; line-height: 1.9;
    }
    .stats-box b { color: #2c3e50; }
    .good { color: #27ae60; font-weight: bold; }
    .bad  { color: #e74c3c; font-weight: bold; }
    .info-box {
      background: #eaf2f8; border-radius: 6px; padding: 14px;
      margin-top: 12px; font-size: 13px; line-height: 1.8;
    }
    .info-box b { color: #2c3e50; }
  "))),

  sidebarLayout(
    sidebarPanel(
      width = 3,

      sliderInput("agglom", "Agglomeration elasticity:",
                  min = 0.02, max = 0.10, value = 0.05, step = 0.01),

      sliderInput("congestion", "Congestion cost parameter:",
                  min = 0.01, max = 0.08, value = 0.03, step = 0.005),

      sliderInput("n_firms", "Number of firms:",
                  min = 20, max = 200, value = 80, step = 10),

      checkboxInput("spillovers", "Knowledge spillovers active", value = TRUE),

      actionButton("go", "Run simulation", class = "btn-primary", width = "100%"),

      uiOutput("info")
    ),

    mainPanel(
      width = 9,
      fluidRow(
        column(5, plotOutput("spatial_plot", height = "450px")),
        column(7, plotOutput("tradeoff_plot", height = "450px"))
      )
    )
  )
)

server <- function(input, output, session) {

  sim <- reactive({
    input$go  # depend on the button so each click reruns the simulation with new random draws
    gamma <- input$agglom
    cong <- input$congestion
    n_firms <- input$n_firms
    spill <- input$spillovers

    # If no spillovers, reduce effective agglomeration
    eff_gamma <- if (spill) gamma else gamma * 0.4

    # Simulate firm locations: start random, then iterate toward cluster
    set.seed(NULL)  # re-initialize the RNG so every run starts from fresh random locations
    x <- runif(n_firms, 0, 10)
    y <- runif(n_firms, 0, 10)

    # Simple agglomeration: firms move toward denser areas
    for (iter in 1:30) {
      for (i in 1:n_firms) {
        # Count nearby firms (within radius 3)
        dists <- sqrt((x[-i] - x[i])^2 + (y[-i] - y[i])^2)
        nearby <- sum(dists < 3)

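        # Firms relocate only when they have neighbors and the effective
        # elasticity exceeds 0.02; otherwise they stay where they are
        if (nearby > 0 && eff_gamma > 0.02) {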
          # Move slightly toward center of nearby firms
          close_idx <- which(dists < 3)
          cx <- mean(x[-i][close_idx])
          cy <- mean(y[-i][close_idx])

          pull <- eff_gamma * 3  # attraction strength
          push <- cong * 0.5     # repulsion from congestion

          x[i] <- x[i] + (cx - x[i]) * pull - (cx - x[i]) * push * (nearby / n_firms)
          y[i] <- y[i] + (cy - y[i]) * pull - (cy - y[i]) * push * (nearby / n_firms)

          # Add noise
          x[i] <- x[i] + rnorm(1, 0, 0.1)
          y[i] <- y[i] + rnorm(1, 0, 0.1)
        }
      }
      # Keep in bounds
      x <- pmax(pmin(x, 10), 0)
      y <- pmax(pmin(y, 10), 0)
    }

    # Calculate productivity for each firm based on local density
    productivity <- numeric(n_firms)
    for (i in 1:n_firms) {
      dists <- sqrt((x[-i] - x[i])^2 + (y[-i] - y[i])^2)
      local_density <- sum(dists < 2) + 1
      productivity[i] <- 100 * local_density^eff_gamma
    }

    # Identify clusters using a grid
    grid_size <- 2
    grid_x <- floor(x / grid_size)
    grid_y <- floor(y / grid_size)
    grid_id <- paste(grid_x, grid_y)
    cluster_counts <- table(grid_id)
    n_clusters <- sum(cluster_counts >= 3)

    # Tradeoff curve: productivity and congestion vs city size
    city_sizes <- seq(10, 500, by = 5)
    base_prod <- 100
    agglom_benefit <- base_prod * city_sizes^eff_gamma - base_prod
    congestion_cost <- cong * city_sizes^1.2
    net_benefit <- agglom_benefit - congestion_cost

    optimal_size <- city_sizes[which.max(net_benefit)]

    list(x = x, y = y, productivity = productivity,
         n_clusters = n_clusters, avg_prod = mean(productivity),
         city_sizes = city_sizes, agglom_benefit = agglom_benefit,
         congestion_cost = congestion_cost, net_benefit = net_benefit,
         optimal_size = optimal_size, eff_gamma = eff_gamma)
  })

  output$spatial_plot <- renderPlot({
    s <- sim()
    par(mar = c(4, 4, 3, 1))

    # Color by productivity
    prod_scaled <- (s$productivity - min(s$productivity)) /
                   (max(s$productivity) - min(s$productivity) + 0.01)
    cols <- rgb(0.2, 0.4 + 0.5 * prod_scaled, 0.8 - 0.3 * prod_scaled, 0.7)

    plot(s$x, s$y, pch = 19, cex = 1.5 + prod_scaled * 1.5,
         col = cols, xlim = c(-0.5, 10.5), ylim = c(-0.5, 10.5),
         xlab = "Location (x)", ylab = "Location (y)",
         main = "Firm Locations",
         asp = 1)

    # Grid lines
    abline(h = seq(0, 10, 2), v = seq(0, 10, 2),
           col = adjustcolor("gray80", 0.5), lty = 3)

    legend("bottomright", bty = "n", cex = 0.8,
           legend = c("Low productivity", "High productivity"),
           pch = 19, col = c(rgb(0.2, 0.4, 0.8, 0.7),
                              rgb(0.2, 0.9, 0.5, 0.7)),
           pt.cex = c(1.5, 3))
  })

  output$tradeoff_plot <- renderPlot({
    s <- sim()
    par(mar = c(5, 5, 4, 5))

    ylim <- range(c(s$agglom_benefit, -s$congestion_cost, s$net_benefit)) * 1.1

    plot(s$city_sizes, s$agglom_benefit, type = "l", lwd = 3, col = "#27ae60",
         xlab = "City size (thousands of workers)",
         ylab = "Benefit / Cost",
         main = "Agglomeration vs Congestion",
         ylim = ylim)

    lines(s$city_sizes, -s$congestion_cost, lwd = 3, col = "#e74c3c")
    lines(s$city_sizes, s$net_benefit, lwd = 3, col = "#2c3e50", lty = 2)

    abline(h = 0, col = "gray70", lty = 3)
    abline(v = s$optimal_size, col = "#f39c12", lwd = 2, lty = 2)
    text(s$optimal_size, max(s$net_benefit) * 0.8,
         paste0("Optimal\nN* = ", s$optimal_size, "k"),
         pos = 4, cex = 0.9, col = "#f39c12", font = 2)

    # Mark the max of net benefit
    max_idx <- which.max(s$net_benefit)
    points(s$city_sizes[max_idx], s$net_benefit[max_idx],
           pch = 19, cex = 2, col = "#f39c12")

    legend("topright", bty = "n", cex = 0.85,
           legend = c("Agglomeration benefit",
                      "Congestion cost",
                      "Net benefit",
                      "Optimal size"),
           col = c("#27ae60", "#e74c3c", "#2c3e50", "#f39c12"),
           lwd = c(3, 3, 3, 2), lty = c(1, 1, 2, 2))
  })

  output$info <- renderUI({
    s <- sim()
    tags$div(class = "info-box",
      HTML(paste0(
        "<b>Agglom. elasticity:</b> ", round(s$eff_gamma, 3), "<br>",
        "<b>Avg productivity:</b> ", round(s$avg_prod, 1), "<br>",
        "<b>Clusters (3+ firms):</b> ", s$n_clusters, "<br>",
        "<b>Optimal city size:</b> ", s$optimal_size, "k workers"
      ))
    )
  })
}

shinyApp(ui, server)

Things to try

  • High agglomeration, low congestion: firms cluster tightly, optimal city size is large. This is the Silicon Valley scenario — huge benefits from proximity, moderate congestion costs.
  • Low agglomeration, high congestion: firms spread out, optimal city size is small. Proximity doesn’t add much, and crowding is costly.
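  • Turn off knowledge spillovers: the effective agglomeration elasticity drops by 60%. Firms cluster less, and optimal city size shrinks. The 60% cut is an assumption hard-coded in the simulation, not an estimate, so this illustrates what losing the spillover channel would mean rather than measuring its importance relative to labor pooling and input sharing.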
  • Add more firms: watch clusters form and grow. With 200 firms and high agglomeration, you’ll see 2–3 large clusters emerge from initially random locations.
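
The scenarios above can also be checked without the interactive app. The sketch below mirrors the grid search in the app's tradeoff calculation (same functional forms, same grid of city sizes, same 0.4 multiplier when spillovers are off); the helper name `optimal_size` and the scenario table are introduced here for illustration only.

```r
# Grid search mirroring the app's tradeoff calculation
optimal_size <- function(gamma, cong, spillovers = TRUE) {
  eff_gamma  <- if (spillovers) gamma else gamma * 0.4  # app's spillover penalty
  city_sizes <- seq(10, 500, by = 5)                    # thousands of workers
  net <- 100 * city_sizes^eff_gamma - 100 - cong * city_sizes^1.2
  city_sizes[which.max(net)]
}

# Slider settings corresponding to the scenarios listed above
scenarios <- data.frame(
  label      = c("High agglomeration, low congestion",
                 "Low agglomeration, high congestion",
                 "Defaults",
                 "Defaults, spillovers off"),
  gamma      = c(0.10, 0.02, 0.05, 0.05),
  cong       = c(0.01, 0.08, 0.03, 0.03),
  spillovers = c(TRUE, TRUE, TRUE, FALSE)
)

scenarios$optimal_size <- mapply(optimal_size,
                                 scenarios$gamma,
                                 scenarios$cong,
                                 scenarios$spillovers)
print(scenarios)
```

The optimal size is largest in the high-agglomeration, low-congestion case and shrinks when spillovers are switched off at the default settings. The spatial clustering in the left panel involves random draws, so it is not reproduced here.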

Connections

  • Spatial Equilibrium: agglomeration economies make big cities productive and raise wages there; in the Rosen-Roback framework, those high wages are offset by high rents.
  • The Monocentric City: the CBD in the monocentric model is essentially an agglomeration, with all firms clustered at one point because proximity to other firms and workers is valuable.
  • Zoning & Housing Supply: if agglomeration makes big cities productive, then zoning that prevents growth in these cities lowers aggregate productivity (the Hsieh-Moretti argument).

Did you know?

  • Alfred Marshall identified the three agglomeration channels in his 1890 Principles of Economics, making them among the oldest ideas in urban economics that remain central today. He wrote that in industrial districts, “the mysteries of the trade become no mysteries; but are as it were in the air.”
  • Silicon Valley is the canonical agglomeration story. The cluster started with Stanford University and a few electronics firms in the 1950s. Knowledge spillovers and labor market pooling created a self-reinforcing cycle: more firms attracted more talent, which attracted more firms. Today, Santa Clara County ranks among the most patent-intensive places in the world on a per capita basis.
  • Duranton and Puga (2001) introduced the “nursery cities” concept: young firms experiment in large, diverse cities (where they can try different production processes) and then relocate to smaller, specialized cities once they’ve found their niche. This explains why startups cluster in New York and San Francisco but mature manufacturing plants locate in smaller metros.