# Why Standard Errors Change

Conventional OLS standard errors assume:

$$E[u_i \, u_j] = 0 \quad \text{for } i \neq j$$

But in panel or spatial data, **that is false.** Observations are correlated within:

- The same property
- The same census tract
- The same county
- The same time period
- The same firm
- The same school

If you ignore that, your SEs are typically **too small** → false significance.

**Clustering corrects this.**

---

## Intuition

If shocks are correlated within group \\(g\\):

$$u_{ig} = \underbrace{\text{common component}}_{\text{shared within cluster}} + \underbrace{\text{idiosyncratic}}_{\text{independent across obs.}}$$

Then treating observations as independent **overcounts information.**

Clustering says: *treat each cluster as the unit of independent variation.*

---

## What Happens When You Change Cluster Level?

Suppose you estimate:

$$\ln P_{it} = \beta \, \text{Shock}_{it} + \text{FE} + u_{it}$$

Now consider clustering at:

1. **Property (PIN).** Allows arbitrary serial correlation within a property over time. SE usually ↑ relative to naive.
2. **Census tract.** Allows correlation across different properties in the same tract. If crime shocks spill over spatially, this matters. SE likely ↑ further.
3. **County.** Allows even broader spatial correlation. SE ↑ more.
4. **No clustering.** Assumes independence. SE smallest. Usually wrong.

---

## Why Larger Clusters Often Increase SE

Because, to a first approximation:

$$\text{Var}(\hat{\beta}) \;\propto\; \frac{1}{G}$$

where \\(G\\) = number of independent clusters (not \\(N\\)).

| Cluster Level | # Clusters | SE |
|---|---|---|
| Property | Many | Smaller |
| Tract | Fewer | Larger |
| County | Few | Largest |

The effective sample size is closer to \\(G\\) than to \\(N\\).

---

## Key Principle

**You should cluster at the level where the identifying variation is correlated.**
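
One way to make the "effective sample size" idea concrete is the Moulton factor, a standard textbook approximation that is not part of the discussion above. As a sketch, assume equal-sized clusters of \\(n\\) observations, an intraclass correlation \\(\rho_u\\) for the errors, and an intraclass correlation \\(\rho_x\\) for the regressor (all three symbols are introduced here for illustration):

$$\frac{\text{Var}_{\text{clustered}}(\hat{\beta})}{\text{Var}_{\text{naive}}(\hat{\beta})} \;\approx\; 1 + (n - 1)\,\rho_x\,\rho_u$$

With hypothetical values \\(n = 50\\) properties per tract, \\(\rho_x = 1\\) (every property in a tract receives the same shock), and \\(\rho_u = 0.1\\), the factor is \\(1 + 49 \times 0.1 = 5.9\\), so the SE is about \\(\sqrt{5.9} \approx 2.4\\) times the naive one. Larger clusters raise \\(n\\), so even a small \\(\rho_u\\) can inflate the variance substantially.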

---

## In the Crime–Housing Context

The setting uses:

- Spatial kernel exposure
- Properties within tracts
- Crime shocks at neighborhood level

Errors are likely correlated within:

- **Census tract**
- Possibly community area
- Possibly time × area

So **clustering at tract** is reasonable.
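
To see how the choice of cluster level plays out, here is a minimal simulation sketch in plain Python. It is an illustration under stated assumptions, not the estimation used in this setting: the data are synthetic, the shock is constant within a tract, there are no fixed effects, and the names `simulate`, `naive_se`, and `cluster_se` are hypothetical helpers written for this example. The cluster-robust SE is the usual sandwich formula for a single slope, with a \\(G/(G-1)\\) small-sample correction.

```python
import math
import random
from collections import defaultdict


def ols_slope(x, y):
    """Simple regression of y on x with an intercept (via demeaning)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in x_dm)
    beta = sum(v * (yi - y_bar) for v, yi in zip(x_dm, y)) / sxx
    resid = [(yi - y_bar) - beta * v for v, yi in zip(x_dm, y)]
    return beta, x_dm, resid, sxx


def naive_se(x, y):
    """Classical SE of the slope: assumes independent, homoskedastic errors."""
    _, _, resid, sxx = ols_slope(x, y)
    s2 = sum(e * e for e in resid) / (len(x) - 2)
    return math.sqrt(s2 / sxx)


def cluster_se(x, y, groups):
    """Cluster-robust SE of the slope: sum the scores x*e within each cluster."""
    _, x_dm, resid, sxx = ols_slope(x, y)
    score = defaultdict(float)
    for v, e, g in zip(x_dm, resid, groups):
        score[g] += v * e
    n_clusters = len(score)
    meat = sum(s * s for s in score.values())
    return math.sqrt(n_clusters / (n_clusters - 1) * meat) / sxx


def simulate(n_tracts=100, props_per_tract=10, n_periods=4, beta=0.5, seed=0):
    """Synthetic panel: errors share a tract component and a property component."""
    rng = random.Random(seed)
    x, y, prop_id, tract_id = [], [], [], []
    for t in range(n_tracts):
        shock = rng.gauss(0, 1)      # assumed: one shock shared by the whole tract
        tract_err = rng.gauss(0, 1)  # error common to all properties in the tract
        for p in range(props_per_tract):
            prop_err = rng.gauss(0, 1)  # error common to one property over time
            for _ in range(n_periods):
                x.append(shock)
                y.append(beta * shock + tract_err + prop_err + rng.gauss(0, 1))
                prop_id.append((t, p))
                tract_id.append(t)
    return x, y, prop_id, tract_id


x, y, prop_id, tract_id = simulate()
se_naive = naive_se(x, y)
se_property = cluster_se(x, y, prop_id)
se_tract = cluster_se(x, y, tract_id)

print(f"naive SE:             {se_naive:.4f}")
print(f"clustered by property: {se_property:.4f}")
print(f"clustered by tract:    {se_tract:.4f}")
```

In this design the tract-clustered SE should come out largest, because only tract clustering accounts for the error component shared across properties in the same tract. Clustering by property picks up the serial correlation within a property but still treats neighboring properties as independent.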