{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Colonial Origins of Comparative Development: An Empirical Investigation\n",
    "Daron Acemoglu, Simon Johnson, and James A. Robinson (2001)\n",
    "\n",
    "We replicate Tables 4 and 5.\n",
    "The data are available at https://economics.mit.edu/people/faculty/daron-acemoglu/data-archive.\n",
    "We download them directly from the Dropbox folders supplied.\n",
    "We start with Table 4.\n",
    "In all specifications, a single endogenous variable is instrumented by a single instrument, so the model is just identified. The LIML and TSLS estimators therefore coincide, as do the Anderson-Rubin, (conditional) likelihood-ratio, and Lagrange multiplier tests."
   ]
  },
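  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why just identification makes LIML and TSLS coincide, it can help to write both as k-class estimators. The following is a sketch in notation that is not part of the original notebook: $M_Z = I - Z (Z^\\top Z)^{-1} Z^\\top$, $P_Z = I - M_Z$, and $X$, $y$, $Z$ are assumed to have the exogenous controls $C$ (and an intercept, if one is fitted) already partialled out.\n",
    "\n",
    "$$\n",
    "\\hat\\beta(\\kappa) = \\left( X^\\top (I - \\kappa M_Z) X \\right)^{-1} X^\\top (I - \\kappa M_Z) y,\n",
    "\\qquad\n",
    "\\kappa_{\\mathrm{TSLS}} = 1,\n",
    "\\qquad\n",
    "\\kappa_{\\mathrm{LIML}} = 1 + \\min_{\\beta} \\frac{(y - X\\beta)^\\top P_Z (y - X\\beta)}{(y - X\\beta)^\\top M_Z (y - X\\beta)}.\n",
    "$$\n",
    "\n",
    "With one instrument and one endogenous variable, $\\beta = (Z^\\top X)^{-1} Z^\\top y$ solves $Z^\\top (y - X\\beta) = 0$, so the minimum is zero, $\\kappa_{\\mathrm{LIML}} = 1$, and the two estimators are identical."
   ]
  },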
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>shortnam</th>\n",
       "      <th>africa</th>\n",
       "      <th>lat_abst</th>\n",
       "      <th>rich4</th>\n",
       "      <th>avexpr</th>\n",
       "      <th>logpgp95</th>\n",
       "      <th>logem4</th>\n",
       "      <th>asia</th>\n",
       "      <th>loghjypl</th>\n",
       "      <th>baseco</th>\n",
       "      <th>other_continent</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>AGO</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.136667</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5.363636</td>\n",
       "      <td>7.770645</td>\n",
       "      <td>5.634789</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-3.411248</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ARG</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.377778</td>\n",
       "      <td>0.0</td>\n",
       "      <td>6.386364</td>\n",
       "      <td>9.133459</td>\n",
       "      <td>4.232656</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.872274</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>AUS</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.300000</td>\n",
       "      <td>1.0</td>\n",
       "      <td>9.318182</td>\n",
       "      <td>9.897972</td>\n",
       "      <td>2.145931</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-0.170788</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>BFA</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.144444</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4.454545</td>\n",
       "      <td>6.845880</td>\n",
       "      <td>5.634789</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-3.540459</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>BGD</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.266667</td>\n",
       "      <td>0.0</td>\n",
       "      <td>5.136364</td>\n",
       "      <td>6.877296</td>\n",
       "      <td>4.268438</td>\n",
       "      <td>1.0</td>\n",
       "      <td>-2.063568</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   shortnam  africa  lat_abst  rich4    avexpr  logpgp95    logem4  asia  \\\n",
       "1       AGO     1.0  0.136667    0.0  5.363636  7.770645  5.634789   0.0   \n",
       "3       ARG     0.0  0.377778    0.0  6.386364  9.133459  4.232656   0.0   \n",
       "5       AUS     0.0  0.300000    1.0  9.318182  9.897972  2.145931   0.0   \n",
       "11      BFA     1.0  0.144444    0.0  4.454545  6.845880  5.634789   0.0   \n",
       "12      BGD     0.0  0.266667    0.0  5.136364  6.877296  4.268438   1.0   \n",
       "\n",
       "    loghjypl  baseco  other_continent  \n",
       "1  -3.411248     1.0                0  \n",
       "3  -0.872274     1.0                0  \n",
       "5  -0.170788     1.0                1  \n",
       "11 -3.540459     1.0                0  \n",
       "12 -2.063568     1.0                0  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from io import BytesIO\n",
    "from zipfile import ZipFile\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import requests\n",
    "\n",
    "url4 = \"https://www.dropbox.com/scl/fi/3yuv9j514zuajzjfluoc1/maketable4.zip?rlkey=pq9l7bxktw1iqxe6fmoh26g79&e=1&dl=1\"\n",
    "content4 = requests.get(url4).content\n",
    "\n",
    "with ZipFile(BytesIO(content4)).open(\"maketable4.dta\") as file:\n",
    "    df4 = pd.read_stata(file)\n",
    "\n",
    "df4 = df4[lambda x: x[\"baseco\"] == 1]\n",
    "df4[\"other_continent\"] = df4[\"shortnam\"].isin([\"AUS\", \"MLT\", \"NZL\"]).astype(int)\n",
    "\n",
    "df4.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Column (1): 0.9443 (0.1565)\n",
      "wald: 36.3941 (1.6e-09), ar: 56.6029 (5.32907e-14), f: 23 (1.7e-06)\n",
      "Column (2): 0.9957 (0.2217)\n",
      "wald: 20.1744 (7.1e-06), ar: 36.2442 (1.74077e-09), f: 13 (0.0003)\n",
      "Column (3): 1.2813 (0.3585)\n",
      "wald: 12.7735 (0.00035), ar: 34.3118 (4.69521e-09), f: 8.6 (0.0033)\n",
      "Column (4): 1.2117 (0.3543)\n",
      "wald: 11.6997 (0.00063), ar: 28.1328 (1.13272e-07), f: 7.8 (0.0051)\n",
      "Column (5): 0.5780 (0.0981)\n",
      "wald: 34.6971 (3.9e-09), ar: 24.2217 (8.5858e-07), f: 31 (3.3e-08)\n",
      "Column (6): 0.5757 (0.1173)\n",
      "wald: 24.0864 (9.2e-07), ar: 16.9777 (3.7822e-05), f: 22 (3.3e-06)\n",
      "Column (7): 0.9822 (0.2995)\n",
      "wald: 10.7573 (0.001), ar: 19.3383 (1.09486e-05), f: 6.2 (0.013)\n",
      "Column (8): 1.1071 (0.4636)\n",
      "wald: 5.7032 (0.017), ar: 13.5579 (0.000231311), f: 3.5 (0.063)\n",
      "Column (9): 0.9808 (0.1709)\n",
      "wald: 32.9327 (9.5e-09), ar: 106.2383 ( 0), f: 24 (1.1e-06)\n"
     ]
    }
   ],
   "source": [
    "from ivmodels import KClass\n",
    "from ivmodels.tests import wald_test, anderson_rubin_test, rank_test\n",
    "\n",
    "endogenous = [\"avexpr\"]  # average protection against expropriation risk\n",
    "instruments = [\"logem4\"]  # log european settler mortality\n",
    "\n",
    "for column, (outcome, exogenous, filter) in enumerate([\n",
    "    # *Columns 1 - 2 (Base Sample)\n",
    "    # ivreg logpgp95 (avexpr=logem4), first\n",
    "    (\"logpgp95\", [], None),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4), first\n",
    "    (\"logpgp95\", [\"lat_abst\"], None),\n",
    "    # *Columns 3 - 4 (Base Sample w/o Neo-Europes)\n",
    "    # ivreg logpgp95 (avexpr=logem4) if rich4!=1, first\n",
    "    (\"logpgp95\", [], \"rich4!=1\"),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) if rich4!=1, first\n",
    "    (\"logpgp95\", [\"lat_abst\"], \"rich4!=1\"),\n",
    "    # *Columns 5 - 6 (Base Sample w/o Africa)\n",
    "    # ivreg logpgp95 (avexpr=logem4) if africa!=1, first\n",
    "    (\"logpgp95\", [], \"africa!=1\"),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) if africa!=1, first\n",
    "    (\"logpgp95\", [\"lat_abst\"], \"africa!=1\"),\n",
    "    # *Columns 7 - 8 (Base Sample with continent dummies)\n",
    "    # ivreg logpgp95 (avexpr=logem4) africa asia other_cont, first\n",
    "    (\"logpgp95\", [\"africa\", \"asia\", \"other_continent\"], None),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) africa asia other_cont, first\n",
    "    (\"logpgp95\", [\"lat_abst\", \"africa\", \"asia\", \"other_continent\"], None),\n",
    "    # *Column 9 (Base Sample, log GDP per worker)\n",
    "    # ivreg loghjypl (avexpr=logem4), first\n",
    "    (\"loghjypl\", [], None),\n",
    "]):\n",
    "    data = df4 if filter is None else df4.copy().query(filter)\n",
    "    data = data[lambda x: x[outcome].notna()]\n",
    "\n",
    "    X = data[endogenous]\n",
    "    Z = data[instruments]\n",
    "    C = data[exogenous]\n",
    "    y = data[outcome]\n",
    "\n",
    "    estimator = KClass(kappa=\"tsls\").fit(Z=Z, X=X, C=C, y=y)\n",
    "    \n",
    "    wald, wald_p = wald_test(X=X, y=y, Z=Z, C=C, beta=np.zeros(1))\n",
    "    std_error = np.abs(estimator.coef_[0]) / np.sqrt(wald)\n",
    "    ar, ar_p = anderson_rubin_test(X=X, y=y, Z=Z, C=C, beta=np.zeros(1))\n",
    "    f, f_p = rank_test(X=X, Z=Z, C=C)\n",
    "\n",
    "    print(f\"Column ({column + 1}): {estimator.coef_[0]:.4f} ({std_error:.4f})\")\n",
    "    print(f\"wald: {wald:.4f} ({wald_p:.2g}), ar: {ar:.4f} ({ar_p:.2g}), f: {f:.2g} ({f_p:.2g})\")\n"
   ]
  },
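  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The standard errors in parentheses are backed out from the Wald statistic via `np.abs(estimator.coef_[0]) / np.sqrt(wald)` rather than computed directly. A short derivation, assuming the Wald statistic for the scalar hypothesis $\\beta = \\beta_0$ has the usual form (the symbols $W$ and $\\widehat{\\mathrm{se}}$ are introduced here for illustration):\n",
    "\n",
    "$$\n",
    "W(\\beta_0) = \\frac{(\\hat\\beta - \\beta_0)^2}{\\widehat{\\mathrm{se}}(\\hat\\beta)^2}\n",
    "\\quad\\Longrightarrow\\quad\n",
    "\\widehat{\\mathrm{se}}(\\hat\\beta) = \\frac{|\\hat\\beta|}{\\sqrt{W(0)}}.\n",
    "$$\n",
    "\n",
    "For column (1), this gives $0.9443 / \\sqrt{36.3941} \\approx 0.1565$, the value printed above."
   ]
  },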
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Identification is weak in several specifications: the F-statistics drop below the heuristic weak-instrument threshold of 10 in columns 3, 4, 7, and 8. Nevertheless, the causal effect of average protection against expropriation risk on log GDP remains significant at the 0.001 level according to the weak-instrument-robust Anderson-Rubin test.\n",
    "\n",
    "We continue with table 5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Column (1): 1.0778 (0.2176)\n",
      "wald: 24.5314 (7.3e-07), ar: 44.8089 (2.17235e-11), f: 15 (0.00013)\n",
      "Column (2): 1.1553 (0.3372)\n",
      "wald: 11.7400 (0.00061), ar: 26.1369 (3.18051e-07), f: 7.3 (0.0069)\n",
      "Column (3): 1.0662 (0.2443)\n",
      "wald: 19.0473 (1.3e-05), ar: 34.7046 (3.83717e-09), f: 9.8 (0.0017)\n",
      "Column (4): 1.3391 (0.5164)\n",
      "wald: 6.7227 (0.0095), ar: 20.4897 (5.99543e-06), f: 3.9 (0.049)\n",
      "Column (5): 1.0800 (0.1912)\n",
      "wald: 31.9153 (1.6e-08), ar: 55.7875 (8.07132e-14), f: 18 (1.9e-05)\n",
      "Column (6): 1.1811 (0.2910)\n",
      "wald: 16.4715 (4.9e-05), ar: 36.1450 (1.83171e-09), f: 9.9 (0.0017)\n",
      "Column (7): 0.9174 (0.1467)\n",
      "wald: 39.1013 (4e-10), ar: 51.2727 (8.03801e-13), f: 20 (8.4e-06)\n",
      "Column (8): 1.0062 (0.2517)\n",
      "wald: 15.9767 (6.4e-05), ar: 27.0786 (1.95353e-07), f: 8.6 (0.0033)\n",
      "Column (9): 1.2122 (0.3949)\n",
      "wald: 9.4226 (0.0021), ar: 23.7357 (1.10512e-06), f: 5.3 (0.022)\n"
     ]
    }
   ],
   "source": [
    "\n",
    "url5 = \"https://www.dropbox.com/scl/fi/qiqyoc34vtr5tvgo4ew7n/maketable5.zip?rlkey=7xn2e59lqn8skf1psv0f7lkr0&e=1&dl=1\"\n",
    "content5 = requests.get(url5).content\n",
    "\n",
    "with ZipFile(BytesIO(content5)).open(\"maketable5.dta\") as file:\n",
    "    df5 = pd.read_stata(file)\n",
    "\n",
    "df5 = df5[lambda x: x[\"baseco\"] == 1]\n",
    "\n",
    "outcome = 'logpgp95'\n",
    "endogenous = [\"avexpr\"]\n",
    "instruments = [\"logem4\"]\n",
    "\n",
    "for column, (exogenous, filter) in enumerate([\n",
    "    # *--Columns 1 and 2 (British and French colony dummies)\n",
    "    # ivreg logpgp95 (avexpr=logem4) f_brit f_french, first\n",
    "    ([\"f_brit\", \"f_french\"], None),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) f_brit f_french, first\n",
    "    ([\"lat_abst\", \"f_brit\", \"f_french\"], None),\n",
    "    # *--Columns 3 and 4 (British colonies only)\n",
    "    # ivreg logpgp95 (avexpr=logem4) if f_brit==1, first\n",
    "    ([], \"f_brit==1\"),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) if f_brit==1, first\n",
    "    ([\"lat_abst\"], \"f_brit==1\"),\n",
    "    # *--Columns 5 and 6 (Control for French legal origin)\n",
    "    # ivreg logpgp95 (avexpr=logem4) sjlofr, first\n",
    "    ([\"sjlofr\"], None),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) sjlofr, first\n",
    "    ([\"lat_abst\", \"sjlofr\"], None),\n",
    "    # *--Columns 7 and 8 (Religion dummies)\n",
    "    # ivreg logpgp95 (avexpr=logem4) catho80 muslim80 no_cpm80, first\n",
    "    ([\"catho80\", \"muslim80\", \"no_cpm80\"], None),\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) catho80 muslim80 no_cpm80, first\n",
    "    ([\"lat_abst\", \"catho80\", \"muslim80\", \"no_cpm80\"], None), \n",
    "    # *--Columns 9 (Multiple controls)\n",
    "    # ivreg logpgp95 lat_abst (avexpr=logem4) f_french sjlofr catho80 muslim80 no_cpm80, first\n",
    "    ([\"lat_abst\", \"f_french\", \"sjlofr\", \"catho80\", \"muslim80\", \"no_cpm80\"], None)\n",
    "]):\n",
    "    data = df5 if filter is None else df5.copy().query(filter)\n",
    "    data = data[lambda x: x[outcome].notna()]\n",
    "\n",
    "    X = data[endogenous]\n",
    "    Z = data[instruments]\n",
    "    C = data[exogenous]\n",
    "    y = data[outcome]\n",
    "\n",
    "    estimator = KClass(kappa=\"tsls\").fit(Z=Z, X=X, C=C, y=y)\n",
    "    \n",
    "    wald, wald_p = wald_test(X=X, y=y, Z=Z, C=C, beta=np.zeros(1))\n",
    "    std_error = np.abs(estimator.coef_[0]) / np.sqrt(wald)\n",
    "    ar, ar_p = anderson_rubin_test(X=X, y=y, Z=Z, C=C, beta=np.zeros(1))\n",
    "    f, f_p = rank_test(X=X, Z=Z, C=C)\n",
    "\n",
    "    print(f\"Column ({column + 1}): {estimator.coef_[0]:.4f} ({std_error:.4f})\")\n",
    "    print(f\"wald: {wald:.4f} ({wald_p:.2g}), ar: {ar:.4f} ({ar_p:.2g}), f: {f:.2g} ({f_p:.2g})\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is not a perfect replication of the results presented by Acemoglu et al. (2001) in Table 5.\n",
    "However, it does match the results when running their Stata code (`maketable5.do` in the archive).\n",
    "\n",
    "As for Table 4, identification is weak in several specifications, but the causal effect of average protection against expropriation risk on log GDP remains significant at the 0.001 level according to the weak-instrument-robust Anderson-Rubin test."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ivmodels",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}