{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Risk and Time Preferences: Linking Experimental and Household Survey Data from Vietnam\n",
    "\n",
    "Tomomi Tanaka, Colin F. Camerer, and Quang Nguyen (2010) investigate causes for risk preferences in Vietnam. Individuals from 25 households were interviewed for each of 289 villages.\n",
    "The authors work with a subsample of in total 181 households from 9 villages.\n",
    "From the interviews, they estimate measures of risk preferences, including the curvature of the utility function. We investigate how this is affected by income and gender.\n",
    "\n",
    "The data used by Tanaka et al. (2010) can be downloaded from https://www.openicpsr.org/openicpsr/project/112336, but this requires an institutional login. We assume this has been downloaded into the working directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nmlrlincome    0.049098\n",
      "mnincome       0.010253\n",
      "gender        -0.006189\n",
      "Name: coefficients, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "from zipfile import ZipFile\n",
    "\n",
    "from ivmodels import KClass\n",
    "import pandas as pd\n",
    "\n",
    "with ZipFile(\"112336-V1.zip\").open(\"20060431_data/20060431_risk.dta\") as file:\n",
    "    df = pd.read_stata(file)\n",
    "\n",
    "y = df[\"vfctnc\"]  # measure of risk preference\n",
    "C = df[[\"chinese\", \"edu\", \"market\", \"south\", \"gender\", \"age\"]]\n",
    "X = df[[\"mnincome\", \"nmlrlincome\"]]  # mean village income and income relative to the village mean\n",
    "Z = df[[\"rainfall\", \"headnowork\"]]\n",
    "\n",
    "tsls = KClass(kappa=\"tsls\").fit(Z=Z, X=X, y=y, C=C)\n",
    "\n",
    "features = [\"nmlrlincome\", \"mnincome\", \"gender\"]\n",
    "print(tsls.named_coef_[[\"nmlrlincome\", \"mnincome\", \"gender\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this application, the number of instruments (2) is equal to the number of endogenous regressors (2).\n",
    "Thus, the LIML estimator is equal to the TSLS estimator.\n",
    "Also, the Anderson-Rubin, conditional likelihood-ratio, and Lagrange multiplier tests are equivalent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "liml.kappa_=1.0\n",
      "nmlrlincome    0.049098\n",
      "mnincome       0.010253\n",
      "gender        -0.006189\n",
      "Name: coefficients, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "liml = KClass(kappa=\"liml\").fit(Z=Z, X=X, y=y, C=C)\n",
    "print(f\"{liml.kappa_=:}\")\n",
    "print(liml.named_coef_[features])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Summary based on the wald test.\n",
      "\n",
      "              estimate  statistic  p-value              conf. set\n",
      "nmlrlincome     0.0491     0.1106   0.7394      [-0.2402, 0.3384]\n",
      "mnincome       0.01025      3.411  0.06475  [-0.0006271, 0.02113]\n",
      "gender       -0.006189    0.01087   0.9169      [-0.1225, 0.1101]\n",
      "\n",
      "Endogenous model statistic: 3.525, p-value: 0.1716\n",
      "(Multivariate) F-statistic: 6.07, p-value: 0.01375\n",
      "\n",
      "Summary based on the anderson-rubin test.\n",
      "\n",
      "              estimate  statistic  p-value              conf. set\n",
      "nmlrlincome     0.0491     0.1135   0.7362      [-0.3391, 0.6383]\n",
      "mnincome       0.01025      3.507  0.06109  [-0.0005294, 0.02222]\n",
      "gender       -0.006189    0.01085    0.917      [-0.1212, 0.1187]\n",
      "\n",
      "Endogenous model statistic: 1.952, p-value: 0.1419\n",
      "(Multivariate) F-statistic: 6.07, p-value: 0.01375\n"
     ]
    }
   ],
   "source": [
    "print(liml.summary(Z=Z, X=X, y=y, C=C, test=\"wald\", feature_names=features))\n",
    "print(\"\")\n",
    "print(liml.summary(Z=Z, X=X, y=y, C=C, test=\"anderson-rubin\", feature_names=features))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The instruments are weak, but don't prohibit inference with weak-instrument-robust tests such as the Anderson-Rubin test.\n",
    "The causal effect of mean village income (`mnincome`) is significant at alpha=0.1 for both tests.\n",
    "\n",
    "In Londschien and Bühlmann (2024), we suggest building interactions of instruments to improve identification.\n",
    "We thus add the interaction of `rainfall` and `headnowork` to the instruments and repeat the analysis above.\n",
    "As in the previous specification no individual causal effects were significant at the level 0.05, we present 80% confidence sets below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "nmlrlincome    0.046815\n",
      "mnincome       0.010361\n",
      "gender        -0.006066\n",
      "Name: coefficients, dtype: float64\n",
      "\n",
      "nmlrlincome    0.048929\n",
      "mnincome       0.010396\n",
      "gender        -0.005918\n",
      "Name: coefficients, dtype: float64\n",
      "\n",
      "Summary based on the wald test.\n",
      "\n",
      "              estimate  statistic  p-value            conf. set\n",
      "nmlrlincome    0.04893     0.1058   0.7449    [-0.1438, 0.2417]\n",
      "mnincome        0.0104      3.501  0.06135  [0.003275, 0.01752]\n",
      "gender       -0.005918   0.009927   0.9206   [-0.08204, 0.0702]\n",
      "\n",
      "Endogenous model statistic: 3.609, p-value: 0.1646\n",
      "(Multivariate) F-statistic: 6.041, p-value: 0.04877\n",
      "J-statistic (LIML): 0.2191, p-value: 0.6397\n",
      "\n",
      "Summary based on the anderson-rubin test.\n",
      "\n",
      "              estimate  statistic  p-value             conf. set\n",
      "nmlrlincome    0.04893     0.1636   0.8491     [-0.2698, 0.4911]\n",
      "mnincome        0.0104      1.894   0.1504  [0.0009274, 0.02083]\n",
      "gender       -0.005918     0.1145   0.8918     [-0.1078, 0.1025]\n",
      "\n",
      "Endogenous model statistic: 1.402, p-value: 0.2401\n",
      "(Multivariate) F-statistic: 6.041, p-value: 0.04877\n",
      "J-statistic (LIML): 0.2191, p-value: 0.6397\n",
      "\n",
      "Summary based on the conditional likelihood-ratio test.\n",
      "\n",
      "              estimate  statistic  p-value            conf. set\n",
      "nmlrlincome    0.04893     0.1081   0.7667    [-0.1954, 0.3599]\n",
      "mnincome        0.0104       3.57   0.1052  [0.002431, 0.01907]\n",
      "gender       -0.005918        nan        1          [-inf, inf]\n",
      "\n",
      "Endogenous model statistic: 3.987, p-value: 0.1743\n",
      "(Multivariate) F-statistic: 6.041, p-value: 0.04877\n",
      "J-statistic (LIML): 0.2191, p-value: 0.6397\n",
      "\n",
      "Summary based on the lagrange multiplier test.\n",
      "\n",
      "              estimate  statistic  p-value                                 conf. set\n",
      "nmlrlincome    0.04893     0.1042   0.7469      [-10.39, -1.246] U [-0.1642, 0.3111]\n",
      "mnincome        0.0104      3.542  0.05983                       [0.003364, 0.01799]\n",
      "gender       -0.005918   0.009855   0.9209  [-0.8529, -0.2544] U [-0.08144, 0.07243]\n",
      "\n",
      "Endogenous model statistic: 3.975, p-value: 0.1371\n",
      "(Multivariate) F-statistic: 6.041, p-value: 0.04877\n",
      "J-statistic (LIML): 0.2191, p-value: 0.6397\n",
      "\n"
     ]
    }
   ],
   "source": [
    "df[\"rainfallxheadnowork\"] = df[\"rainfall\"] * df[\"headnowork\"]\n",
    "\n",
    "Z = df[[\"rainfall\", \"headnowork\", \"rainfallxheadnowork\"]]\n",
    "\n",
    "tsls = KClass(kappa=\"tsls\").fit(Z=Z, X=X, y=y, C=C)\n",
    "print(tsls.named_coef_[features])\n",
    "print(\"\")\n",
    "\n",
    "liml = KClass(kappa=\"liml\").fit(Z=Z, X=X, y=y, C=C)\n",
    "print(liml.named_coef_[features])\n",
    "print(\"\")\n",
    "\n",
    "for test in [\n",
    "    \"wald\",\n",
    "    \"anderson-rubin\",\n",
    "    \"conditional likelihood-ratio\",\n",
    "    \"lagrange multiplier\"\n",
    "]:\n",
    "    print(liml.summary(Z=Z, X=X, y=y, C=C, test=test, feature_names=features, alpha=0.2))\n",
    "    print(\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The additional instrument did not increase identification, with Anderson's (1951) test statistic of reduced rank decreasing from 6.07 to 6.04.\n",
    "For the Wald test (that is not robust to weak instruments), the additional instrument decreased the p-value for the conditional causal effect of village mean income on risk preferences from 0.065 to 0.61.\n",
    "For the (weak instrument robust) Anderson-Rubin and conditional likelihood-ratio tests, the p-values increased from 0.061 to 0.150 and 0.105 respectively.\n",
    "For the (weak instrument robust) Lagrange multiplier test, the p-value slightly decreased from 0.061 to 0.060."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ivmodels",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}