Risk and Time Preferences: Linking Experimental and Household Survey Data from Vietnam

Tomomi Tanaka, Colin F. Camerer, and Quang Nguyen (2010) investigate the causes of risk preferences in Vietnam. Individuals from 25 households were interviewed in each of 289 villages. The authors work with a subsample of 181 households from 9 villages. From the interviews, they estimate measures of risk preferences, including the curvature of the utility function. We investigate how this curvature is affected by income and gender.

The data used by Tanaka et al. (2010) can be downloaded from https://www.openicpsr.org/openicpsr/project/112336, but the download requires an institutional login. We assume that the archive 112336-V1.zip has been downloaded into the working directory.

[1]:
from zipfile import ZipFile

from ivmodels import KClass
import pandas as pd

# Read the Stata file directly from the downloaded replication archive.
with ZipFile("112336-V1.zip").open("20060431_data/20060431_risk.dta") as file:
    df = pd.read_stata(file)

y = df["vfctnc"]  # outcome
C = df[["chinese", "edu", "market", "south", "gender", "age"]]  # exogenous controls
X = df[["nmlrlincome", "mnincome"]]  # endogenous regressors (income)
Z = df[["rainfall", "headnowork"]]  # instruments

tsls = KClass(kappa="tsls").fit(Z=Z, X=X, y=y, C=C)

# Coefficients of interest, reused in the cells below.
features = ["nmlrlincome", "mnincome", "gender"]
print(tsls.named_coefs_[features])
nmlrlincome    0.049098
mnincome       0.010253
gender        -0.006189
Name: coefficients, dtype: float64

In this application, the number of instruments (2) is equal to the number of endogenous regressors (2). Thus, the LIML estimator is equal to the TSLS estimator. Also, the Anderson-Rubin, conditional likelihood-ratio, and Lagrange multiplier tests are equivalent.
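The equality of LIML and TSLS in the just-identified case can be checked on synthetic data. The following is a minimal sketch using only NumPy, not the ivmodels implementation: the data-generating process, the helper names `project`, `k_class`, and `liml_kappa`, and the omission of exogenous controls are all assumptions made for illustration. It uses the textbook k-class formula and computes the LIML kappa as the smallest eigenvalue of (W' M_Z W)^{-1} W' W with W = [y, X].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic just-identified design: 2 instruments, 2 endogenous regressors.
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)  # unobserved confounder, makes X endogenous
X = Z @ np.array([[1.0, 0.5], [0.3, 1.0]]) + np.outer(U, [0.5, -0.5])
X = X + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + U + rng.normal(size=n)


def project(Z, A):
    """Project the columns of the 2D array A onto the column space of Z."""
    return Z @ np.linalg.lstsq(Z, A, rcond=None)[0]


def k_class(Z, X, y, kappa):
    """k-class estimator (X' (I - kappa M_Z) X)^{-1} X' (I - kappa M_Z) y."""
    # (I - kappa M_Z) X = (1 - kappa) X + kappa P_Z X
    X_kappa = (1 - kappa) * X + kappa * project(Z, X)
    return np.linalg.solve(X_kappa.T @ X, X_kappa.T @ y)


def liml_kappa(Z, X, y):
    """Smallest eigenvalue of (W' M_Z W)^{-1} W' W for W = [y, X]."""
    W = np.column_stack([y, X])
    MW = W - project(Z, W)  # M_Z W
    eigenvalues = np.linalg.eigvals(np.linalg.solve(MW.T @ MW, W.T @ W))
    return eigenvalues.real.min()


kappa = liml_kappa(Z, X, y)
beta_tsls = k_class(Z, X, y, kappa=1.0)    # TSLS is the k-class estimator with kappa = 1
beta_liml = k_class(Z, X, y, kappa=kappa)  # LIML uses the data-dependent kappa
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)  # simple IV estimator (Z'X)^{-1} Z'y

print(f"kappa = {kappa}")
print(beta_tsls, beta_liml, beta_iv)
```

With as many instruments as endogenous regressors, P_Z W has rank at most 2 while W has 3 columns, so some direction of W is annihilated by P_Z and the smallest eigenvalue is 1 up to floating-point error. All three estimates then coincide.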

[2]:
liml = KClass(kappa="liml").fit(Z=Z, X=X, y=y, C=C)
print(f"{liml.kappa_=:}")
print(liml.named_coefs_[features])
liml.kappa_=1.0
nmlrlincome    0.049098
mnincome       0.010253
gender        -0.006189
Name: coefficients, dtype: float64

[3]:
print(liml.summary(Z=Z, X=X, y=y, C=C, test="wald", feature_names=features))
print("")
print(liml.summary(Z=Z, X=X, y=y, C=C, test="anderson-rubin", feature_names=features))
Summary based on the wald test.

              estimate  statistic  p-value              conf. set
nmlrlincome     0.0491     0.1106   0.7394      [-0.2402, 0.3384]
mnincome       0.01025      3.411  0.06475  [-0.0006271, 0.02113]
gender       -0.006189    0.01087   0.9169      [-0.1225, 0.1101]

Endogenous model statistic: 3.525, p-value: 0.1716
(Multivariate) F-statistic: 6.07, p-value: 0.01375
J-statistic (LIML): 1.331e-12, p-value: nan

Summary based on the anderson-rubin test.

              estimate  statistic  p-value              conf. set
nmlrlincome     0.0491     0.1135   0.7362      [-0.3391, 0.6383]
mnincome       0.01025      3.507  0.06109  [-0.0005294, 0.02222]
gender       -0.006189    0.01085    0.917      [-0.1212, 0.1187]

Endogenous model statistic: 1.952, p-value: 0.1419
(Multivariate) F-statistic: 6.07, p-value: 0.01375
J-statistic (LIML): 1.331e-12, p-value: nan

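The instruments are weak (the multivariate F-statistic is only 6.07), but this does not prohibit inference with weak-instrument-robust tests such as the Anderson-Rubin test. The causal effect of mean village income (mnincome) is significant at alpha=0.1 for both tests, with p-values of 0.065 (Wald) and 0.061 (Anderson-Rubin).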
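To illustrate why the Anderson-Rubin test remains valid regardless of instrument strength, here is a minimal sketch of its textbook form for the full parameter vector in a model without exogenous controls. It is not the ivmodels implementation: the function name `anderson_rubin`, the synthetic data, and the use of the F(k, n - k) reference distribution are assumptions, and the per-coefficient rows in the summaries above treat the remaining coefficients as nuisance parameters, which this sketch does not do.

```python
import numpy as np
from scipy.stats import f


def anderson_rubin(Z, X, y, beta):
    """Anderson-Rubin statistic and p-value for H0: the causal parameter equals beta.

    Under H0 the residual y - X beta is unrelated to the instruments Z, so the
    share of its variation explained by Z should be small. The first stage
    (how strongly Z predicts X) does not enter the null distribution.
    """
    n, k = Z.shape
    residual = y - X @ beta
    fitted = Z @ np.linalg.lstsq(Z, residual, rcond=None)[0]  # P_Z residual
    explained = fitted @ fitted
    unexplained = residual @ residual - explained  # residual' M_Z residual
    statistic = (n - k) / k * explained / unexplained
    return statistic, f.sf(statistic, k, n - k)


# Synthetic data (assumed for illustration): 2 instruments, 2 endogenous regressors.
rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 2))
U = rng.normal(size=n)  # unobserved confounder
X = 0.5 * Z + np.outer(U, [1.0, 1.0]) + rng.normal(size=(n, 2))
beta_true = np.array([1.0, -1.0])
y = X @ beta_true + U + rng.normal(size=n)

stat_null, p_null = anderson_rubin(Z, X, y, beta_true)
stat_alt, p_alt = anderson_rubin(Z, X, y, np.array([5.0, -1.0]))
print(f"true parameter:  statistic={stat_null:.3f}, p-value={p_null:.3f}")
print(f"wrong parameter: statistic={stat_alt:.3f}, p-value={p_alt:.3g}")
```

A confidence set is obtained by collecting all values of beta that the test does not reject. With weak instruments this set can be wide or even unbounded, which is the price paid for robustness.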

In Londschien and Bühlmann (2024), we suggest building interactions of instruments to improve identification. We thus add the interaction of rainfall and headnowork to the instruments and repeat the analysis above. Because no individual causal effect was significant at level 0.05 in the previous specification, we present 80% confidence sets (alpha=0.2) below.

[4]:
df["rainfallxheadnowork"] = df["rainfall"] * df["headnowork"]

Z = df[["rainfall", "headnowork", "rainfallxheadnowork"]]

tsls = KClass(kappa="tsls").fit(Z=Z, X=X, y=y, C=C)
print(tsls.named_coefs_[features])
print("")

liml = KClass(kappa="liml").fit(Z=Z, X=X, y=y, C=C)
print(liml.named_coefs_[features])
print("")

for test in [
    "wald",
    "anderson-rubin",
    "conditional likelihood-ratio",
    "lagrange multiplier"
]:
    print(liml.summary(Z=Z, X=X, y=y, C=C, test=test, feature_names=features, alpha=0.2))
    print("")
nmlrlincome    0.046815
mnincome       0.010361
gender        -0.006066
Name: coefficients, dtype: float64

nmlrlincome    0.048929
mnincome       0.010396
gender        -0.005918
Name: coefficients, dtype: float64

Summary based on the wald test.

              estimate  statistic  p-value            conf. set
nmlrlincome    0.04893     0.1058   0.7449    [-0.1438, 0.2417]
mnincome        0.0104      3.501  0.06135  [0.003275, 0.01752]
gender       -0.005918   0.009927   0.9206   [-0.08204, 0.0702]

Endogenous model statistic: 3.609, p-value: 0.1646
(Multivariate) F-statistic: 6.041, p-value: 0.04877
J-statistic (LIML): 0.2191, p-value: 0.6397

Summary based on the anderson-rubin test.

              estimate  statistic  p-value             conf. set
nmlrlincome    0.04893     0.1636   0.8491     [-0.2698, 0.4911]
mnincome        0.0104      1.894   0.1504  [0.0009274, 0.02083]
gender       -0.005918     0.1145   0.8918     [-0.1078, 0.1025]

Endogenous model statistic: 1.402, p-value: 0.2401
(Multivariate) F-statistic: 6.041, p-value: 0.04877
J-statistic (LIML): 0.2191, p-value: 0.6397

Summary based on the conditional likelihood-ratio test.

              estimate  statistic  p-value            conf. set
nmlrlincome    0.04893     0.1081   0.7667    [-0.1954, 0.3599]
mnincome        0.0104       3.57   0.1052  [0.002431, 0.01907]
gender       -0.005918        nan        1          [-inf, inf]

Endogenous model statistic: 3.987, p-value: 0.1743
(Multivariate) F-statistic: 6.041, p-value: 0.04877
J-statistic (LIML): 0.2191, p-value: 0.6397

Summary based on the lagrange multiplier test.

              estimate  statistic  p-value                                 conf. set
nmlrlincome    0.04893     0.1042   0.7469      [-10.39, -1.246] U [-0.1642, 0.3111]
mnincome        0.0104      3.542  0.05983                       [0.003364, 0.01799]
gender       -0.005918   0.009855   0.9209  [-0.8529, -0.2543] U [-0.08144, 0.07243]

Endogenous model statistic: 3.975, p-value: 0.1371
(Multivariate) F-statistic: 6.041, p-value: 0.04877
J-statistic (LIML): 0.2191, p-value: 0.6397

The additional instrument did not improve identification: Anderson’s (1951) test statistic of reduced rank, reported above as the (Multivariate) F-statistic, decreased from 6.07 to 6.04, and its p-value increased from 0.014 to 0.049. For the Wald test (which is not robust to weak instruments), the additional instrument decreased the p-value for the conditional causal effect of village mean income on risk preferences from 0.065 to 0.061. For the (weak-instrument-robust) Anderson-Rubin and conditional likelihood-ratio tests, the p-values increased from 0.061 to 0.150 and 0.105, respectively. For the (weak-instrument-robust) Lagrange multiplier test, the p-value decreased slightly from 0.061 to 0.060.
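The p-values for mnincome can be collected side by side to make the comparison easier to read. The snippet below is only a bookkeeping sketch: the numbers are copied by hand from the summaries printed above, and the just-identified entries for the conditional likelihood-ratio and Lagrange multiplier tests are set equal to the Anderson-Rubin p-value, relying on the equivalence of the three tests stated earlier. The column names are chosen for illustration.

```python
import pandas as pd

# p-values for the coefficient of mnincome, copied from the printed summaries.
pvalues = pd.DataFrame(
    {
        "two instruments": [0.06475, 0.06109, 0.06109, 0.06109],
        "with interaction": [0.06135, 0.1504, 0.1052, 0.05983],
    },
    index=[
        "wald",
        "anderson-rubin",
        "conditional likelihood-ratio",
        "lagrange multiplier",
    ],
)

# Positive values mean the p-value increased after adding the interaction.
pvalues["change"] = pvalues["with interaction"] - pvalues["two instruments"]
print(pvalues.round(3))
```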