Using Geographic Variation in College Proximity to Estimate the Return to Schooling

Card (1993) estimates the causal effect of years of schooling on hourly wages, using proximity to a college while growing up as an instrument for education.

We start by loading the data.

[1]:
from io import BytesIO
from zipfile import ZipFile

import numpy as np
import pandas as pd
import requests

url = "https://davidcard.berkeley.edu/data_sets/proximity.zip"
content = requests.get(url).content

# From code_bk.txt in the zip file
colspec = {
    "id": (1, 5),  # sequential id runs from 1 to 5225
    "nearc2": (7, 7),  # grew up near 2-yr college
    "nearc4": (10, 10),  # grew up near 4-yr college
    "nearc4a": (12, 13),  # grew up near 4-yr public college
    "nearc4b": (15, 16),  # grew up near 4-yr priv college
    "ed76": (18, 19),  # educ in 1976
    "ed66": (21, 22),  # educ in 1966
    "age76": (24, 25),  # age in 1976
    "daded": (27, 31),  # dads education missing=avg
    "nodaded": (33, 33),  # 1 if dad ed imputed
    "momed": (35, 39),  # moms education
    "nomomed": (41, 41),  # 1 if mom ed imputed
    "weight": (43, 54),  # nls weight for 1976 cross-section
    "momdad14": (56, 56),  # 1 if live with mom and dad age 14
    "sinmom14": (58, 58),  # lived with single mom age 14
    "step14": (60, 60),  # lived step parent age 14
    "reg661": (62, 62),  # dummy for region=1 in 1966
    "reg662": (64, 64),  # dummy for region=2 in 1966
    "reg663": (66, 66),  # dummy for region=3 in 1966
    "reg664": (68, 68),
    "reg665": (70, 70),
    "reg666": (72, 72),
    "reg667": (74, 74),
    "reg668": (76, 76),
    "reg669": (78, 78),  # dummy for region=9 in 1966
    "south66": (80, 80),  # lived in south in 1966
    "work76": (82, 82),  # worked in 1976
    "work78": (84, 84),  # worked in 1978
    "lwage76": (86, 97),  # log wage (outliers trimmed) 1976
    "lwage78": (99, 110),  # log wage in 1978 outliers trimmed
    "famed": (112, 112),  # mom-dad education class 1-9
    "black": (114, 114),  # 1 if black
    "smsa76r": (116, 116),  # in smsa in 1976
    "smsa78r": (118, 118),  # in smsa in 1978
    "reg76r": (120, 120),  # in south in 1976
    "reg78r": (122, 122),  # in south in 1978
    "reg80r": (124, 124),  # in south in 1980
    "smsa66r": (126, 126),  # in smsa in 1966
    "wage76": (128, 132),  # raw wage cents per hour 1976
    "wage78": (134, 138),
    "wage80": (140, 144),
    "noint78": (146, 146),  # 1 if noninterview in 78
    "noint80": (148, 148),
    "enroll76": (150, 150),  # 1 if enrolled in 76
    "enroll78": (152, 152),
    "enroll80": (154, 154),
    "kww": (156, 157),  # the kww score
    "iq": (159, 161),  # a normed iq score
    "marsta76": (163, 163),  # mar status in 1976 1=married, sp. present
    "marsta78": (165, 165),
    "marsta80": (167, 167),
    "libcrd14": (169, 169),  # 1 if lib card in home age 14
}

with ZipFile(BytesIO(content)).open("nls.dat") as file:
    df = pd.read_fwf(
        file,
        names=colspec.keys(),
        # pandas expects half-open [from, to) intervals, starting at 0
        colspecs=[(f - 1, t) for (f, t) in colspec.values()],
        na_values=".",
    )

df = df[lambda x: x["lwage76"].notna()].set_index("id")

# construct potential experience, its square, and age squared
df["exp76"] = df["age76"] - df["ed76"] - 6
df["exp762"] = df["exp76"] ** 2
df["age762"] = df["age76"] ** 2

df["f1"] = df["famed"].eq(1).astype("float")  # mom and dad both > 12 yrs ed
df["f2"] = df["famed"].eq(2).astype("float")  # mom&dad >=12 and not both exactly 12
df["f3"] = df["famed"].eq(3).astype("float")  # mom=dad=12
df["f4"] = df["famed"].eq(4).astype("float")  # mom >=12 and dad missing
df["f5"] = df["famed"].eq(5).astype("float")  # father >=12 and mom not in f1-f4
df["f6"] = df["famed"].eq(6).astype("float")  # mom>=12 and dad nonmissing
df["f7"] = df["famed"].eq(7).astype("float")  # mom and dad both >=9
df["f8"] = df["famed"].eq(8).astype("float")  # mom and dad both nonmissing

indicators = ["black", "smsa66r", "smsa76r", "reg76r"]
# exclude reg669, as sum(reg661, ..., reg669) = 1
indicators += [f"reg66{i}" for i in range(1, 9)]

family = ["daded", "momed", "nodaded", "nomomed", "famed", "momdad14", "sinmom14"]
fs = [f"f{i}" for i in range(1, 9)]  # exclude f9 as sum(f1, ..., f9) = 1
family += fs

X = df[["ed76", "exp76", "exp762"]]  # endogenous
y = df["lwage76"]  # outcome
C = df[family + indicators]  # included exogenous variables
Z = df[["nearc4a", "nearc4b", "nearc2", "age76", "age762"]]  # instruments
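
As a sanity check, we can verify the collinearity that motivates excluding reg669 (and f9) and confirm that dropping observations with missing wages leaves the 3,010 men of Card's sample. A minimal sketch, not part of the original analysis:

regs = [f"reg66{i}" for i in range(1, 10)]
assert df[regs].sum(axis=1).eq(1).all()  # exactly one region dummy per person
assert df.shape[0] == 3010  # men with a non-missing log wage in 1976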

We then estimate the causal effect of education on wages using the Two-Stage Least-Squares (TSLS) and Limited Information Maximum Likelihood (LIML) estimators.

[2]:
from ivmodels.models import KClass

tsls = KClass(kappa="tsls").fit(Z=Z, X=X, y=y, C=C)
print(tsls.named_coefs_[:4])

liml = KClass(kappa="liml").fit(Z=Z, X=X, y=y, C=C)
print(liml.named_coefs_[:4])
intercept    4.680112
ed76         0.144954
exp76        0.061604
exp762      -0.001196
Name: coefficients, dtype: float64
intercept    3.060812
ed76         0.172352
exp76        0.051571
exp762      -0.000713
Name: coefficients, dtype: float64
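
Both estimators belong to the k-class: TSLS corresponds to kappa = 1 and LIML to a data-dependent kappa >= 1, while kappa = 0 recovers OLS. A quick sketch, assuming KClass also accepts a numeric kappa and stores the fitted value as kappa_:

ols = KClass(kappa=0).fit(Z=Z, X=X, y=y, C=C)  # kappa = 0: plain OLS
print(ols.named_coefs_[:4])
print(f"TSLS: kappa = {tsls.kappa_}, LIML: kappa = {liml.kappa_:.4f}")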

For causal effect estimates to inform policy, reliable inference is crucial. Test statistics, p-values, and confidence sets can be computed using the tests in ivmodels.tests.

[3]:
from ivmodels.tests import wald_test

statistic, p_value = wald_test(Z=Z, X=X, C=C, y=y, beta=np.zeros(X.shape[1]))
print(f"{statistic=:.3f}, {p_value=:.3g}")
statistic=284.905, p_value=0

That is, according to the Wald test, the joint causal effect of education, experience, and experience squared is highly significant. Subvector inference on the individual components of the causal effect is easier to interpret.

[4]:
from ivmodels.confidence_set import ConfidenceSet
from ivmodels.tests import inverse_wald_test

statistic, p_value = wald_test(
    Z=Z, X=df[["ed76"]], W=df[["exp76", "exp762"]], C=C, y=y, beta=np.zeros(1)
)
print(f"{statistic=:.3f}, {p_value=:.3g}")

for alpha in [0.05, 0.01, 0.001]:
    quadric = inverse_wald_test(
        Z=Z, X=df[["ed76"]], W=df[["exp76", "exp762"]], C=C, y=y, alpha=alpha
    )
    confidence_set = ConfidenceSet.from_quadric(quadric)
    print(f"{100 * (1 - alpha)}% confidence set: {confidence_set:.3f}")
statistic=10.529, p_value=0.00118
95.0% confidence set: [0.057, 0.233]
99.0% confidence set: [0.030, 0.260]
99.9% confidence set: [-0.002, 0.292]

Test statistics, p-values, and confidence sets can be computed for multiple features at once using the summary method of KClass. The summary also displays the result of Anderson’s (1951) test of reduced rank (a multivariate extension of the first-stage F-test) and the LIML (weak-instrument robust) variant of the J-statistic.

[5]:
features = ["ed76", "exp76", "exp762"]
print(tsls.summary(X=X, Z=Z, C=C, y=y, feature_names=features))
Summary based on the wald test.

         estimate  statistic   p-value              conf. set
ed76        0.145      10.53  0.001175       [0.0574, 0.2325]
exp76      0.0616      6.635  0.009997      [0.01473, 0.1085]
exp762  -0.001196      1.029    0.3104  [-0.003506, 0.001115]

Endogenous model statistic: 284.9, p-value: <1e-16
(Multivariate) F-statistic: 15.47, p-value: 0.001454
J-statistic (LIML): 4.246, p-value: 0.1197

Anderson’s test statistic of reduced rank is of the order of 10, suggesting that the instruments might be weak. The ivmodels package implements three weak-instrument robust tests: the (subvector) Anderson-Rubin test (Anderson and Rubin, 1949; Guggenberger et al., 2012), the (subvector) conditional likelihood-ratio test (Moreira, 2003; Kleibergen, 2021), and the (subvector) Lagrange multiplier test (Kleibergen, 2002; Londschien et al., 2024).
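
The rank statistic reported in the summary can also be computed on its own. A sketch, assuming ivmodels exposes Anderson’s (1951) test as rank_test in ivmodels.tests, with a signature analogous to the other tests:

from ivmodels.tests import rank_test

statistic, p_value = rank_test(Z=Z, X=X, C=C)  # tests rank of the first stage
print(f"{statistic=:.3f}, {p_value=:.3g}")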

[6]:
for test in ["anderson-rubin", "conditional likelihood-ratio", "lagrange multiplier"]:
    print(liml.summary(X=X, Z=Z, C=C, y=y, feature_names=features, test=test))
    print("")
Summary based on the anderson-rubin test.

          estimate  statistic   p-value              conf. set
ed76        0.1724      5.027  0.001748      [0.08205, 0.3557]
exp76      0.05157      2.093   0.09876    [-0.02839, 0.09739]
exp762  -0.0007127      1.495    0.2138  [-0.002968, 0.003186]

Endogenous model statistic: 66.33, p-value: <1e-16
(Multivariate) F-statistic: 15.47, p-value: 0.001454
J-statistic (LIML): 4.246, p-value: 0.1197

Summary based on the conditional likelihood-ratio test.

          estimate  statistic   p-value              conf. set
ed76        0.1724      10.84  0.002516      [0.07273, 0.3987]
exp76      0.05157      2.034    0.1785     [-0.04522, 0.1018]
exp762  -0.0007127     0.2379    0.6444  [-0.003198, 0.004002]

Endogenous model statistic: 327.4, p-value: <1e-16
(Multivariate) F-statistic: 15.47, p-value: 0.001454
J-statistic (LIML): 4.246, p-value: 0.1197

Summary based on the lagrange multiplier test.

          estimate  statistic  p-value                                      conf. set
ed76        0.1724      5.739  0.01659         [-0.6013, -0.05795] U [0.06078, 0.472]
exp76      0.05157      1.648   0.1992           [-0.06781, 0.1025] U [0.116, 0.3515]
exp762  -0.0007127     0.2043   0.6513  [-0.01522, -0.003838] U [-0.003228, 0.005095]

Endogenous model statistic: 324.9, p-value: <1e-16
(Multivariate) F-statistic: 15.47, p-value: 0.001454
J-statistic (LIML): 4.246, p-value: 0.1197

The causal effect of education on wages remains significant at level 0.01 under the weak-instrument robust Anderson-Rubin and conditional likelihood-ratio tests; under the Lagrange multiplier test it is significant at level 0.05 (p = 0.017).
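
The corresponding weak-instrument robust confidence sets can also be computed directly, as in the Wald-based cell above. A sketch, assuming the inverse Anderson-Rubin test mirrors the signature of inverse_wald_test:

from ivmodels.tests import inverse_anderson_rubin_test

quadric = inverse_anderson_rubin_test(
    Z=Z, X=df[["ed76"]], W=df[["exp76", "exp762"]], C=C, y=y, alpha=0.05
)
print(f"95% AR confidence set: {ConfidenceSet.from_quadric(quadric):.3f}")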