ivmodels package
Subpackages
- ivmodels.models package
- ivmodels.tests package
- Submodules
- ivmodels.tests.anderson_rubin module
- ivmodels.tests.conditional_likelihood_ratio module
- ivmodels.tests.j module
- ivmodels.tests.lagrange_multiplier module
- ivmodels.tests.likelihood_ratio module
- ivmodels.tests.pulse module
- ivmodels.tests.rank module
- ivmodels.tests.residual_prediction module
- ivmodels.tests.wald module
- Module contents
- anderson_rubin_test()
- conditional_likelihood_ratio_test()
- inverse_anderson_rubin_test()
- inverse_conditional_likelihood_ratio_test()
- inverse_lagrange_multiplier_test()
- inverse_likelihood_ratio_test()
- inverse_pulse_test()
- inverse_wald_test()
- inverse_weak_residual_prediction_test()
- j_test()
- lagrange_multiplier_test()
- likelihood_ratio_test()
- pulse_test()
- rank_test()
- residual_prediction_test()
- wald_test()
- weak_residual_prediction_test()
Submodules
ivmodels.confidence_set module
- class ivmodels.confidence_set.ConfidenceSet(boundaries)
Bases: object
A class to represent a 1D confidence set.
- Parameters:
boundaries (list of 2-tuples of floats) – The boundaries of the confidence set. The confidence set is the union of the intervals defined by the boundaries.
- static from_quadric(quadric)
Create a 1D confidence set from a quadric.
- is_empty()
Return True if the confidence set is empty.
- is_finite()
Return True if the confidence set is finite.
- length()
Return the length of the confidence set.
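The interface above can be illustrated with a small stand-alone sketch. The names `IntervalUnion` and `from_quadratic` are hypothetical and not part of ivmodels; the sketch assumes that boundaries are disjoint (lower, upper) pairs and that a 1D quadric is written as a x^2 + b x + c <= 0 with a != 0. How the library itself treats edge cases (for example, whether an empty set counts as finite) is not assumed here.

```python
import math


class IntervalUnion:
    """Hypothetical stand-in for a 1D confidence set (not the ivmodels class)."""

    def __init__(self, boundaries):
        # boundaries: list of (lower, upper) tuples, assumed to be disjoint.
        self.boundaries = list(boundaries)

    def is_empty(self):
        return len(self.boundaries) == 0

    def is_finite(self):
        # Finite means that no interval extends to +/- infinity.
        return all(
            math.isfinite(lo) and math.isfinite(hi) for lo, hi in self.boundaries
        )

    def length(self):
        # Total length of the union; infinite if any interval is unbounded.
        return sum(hi - lo for lo, hi in self.boundaries)


def from_quadratic(a, b, c):
    """Solve a * x**2 + b * x + c <= 0 for x (sketch, assumes a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        # No real roots: the parabola never changes sign.
        return IntervalUnion([] if a > 0 else [(-math.inf, math.inf)])
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    lo, hi = min(r1, r2), max(r1, r2)
    if a > 0:
        # Upward parabola: the solution is the bounded interval between the roots.
        return IntervalUnion([(lo, hi)])
    # Downward parabola: the solution is the union of two unbounded rays.
    return IntervalUnion([(-math.inf, lo), (hi, math.inf)])


bounded = from_quadratic(1, 0, -4)    # x^2 - 4 <= 0
unbounded = from_quadratic(-1, 0, 4)  # -x^2 + 4 <= 0
empty = from_quadratic(1, 0, 4)       # x^2 + 4 <= 0
```

The three cases show why a 1D confidence set is modelled as a union of intervals rather than a single interval: inverting a quadratic inequality can give a bounded interval, two unbounded rays, or nothing at all.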
ivmodels.quadric module
- class ivmodels.quadric.Quadric(A, b, c)
Bases: object
A class to represent a quadric \(x^T A x + b^T x + c \leq 0\).
Internally, works with a standardized form of the quadric. If \(V^T D V = A\) with \(D\) diagonal and \(V\) orthonormal, define \(x_\mathrm{center} := -A^{-1} b / 2\), \(\tilde x = V^T (x - x_\mathrm{center})\) and \(\tilde c = c - x_\mathrm{center}^T A x_\mathrm{center}\). Then, the standardized form is given by \(\tilde x^T D \tilde x + \tilde c \leq 0\).
- Parameters:
A (np.ndarray of dimension (n, n)) – The matrix A of the quadratic form.
b (np.ndarray of dimension (n,)) – The vector b of the quadratic form.
c (float) – The constant c of the quadratic form.
- center
The center of the quadric. Equal to \(-A^{-1} b / 2\).
- Type:
np.ndarray of dimension (n,)
- c_standardized
The constant c of the standardized quadric. Equal to \(c - x_\mathrm{center}^T A x_\mathrm{center}\).
- Type:
float
- D
The diagonal of the matrix \(D\) in the eigenvalue decomposition \(V^T D V = A\).
- Type:
np.ndarray of dimension (n,)
- V
The matrix \(V\) in the eigenvalue decomposition \(V^T D V = A\).
- Type:
np.ndarray of dimension (n, n)
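The standardization can be checked numerically. The following is a minimal sketch using only NumPy, not the library's implementation. It uses the eigenvectors returned by `np.linalg.eigh` as columns of a matrix `U`, so that A = U diag(D) U^T; whether ivmodels stores this matrix or its transpose in `V` is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric positive definite A, plus an arbitrary b and c.
M = rng.normal(size=(3, 3))
A = M @ M.T + 3 * np.eye(3)
b = rng.normal(size=3)
c = -2.0

# Eigenvalues D and orthonormal eigenvectors (columns of U): A = U diag(D) U^T.
D, U = np.linalg.eigh(A)

# Center and standardized constant, as defined in the class description.
center = -np.linalg.solve(A, b) / 2
c_standardized = c - center @ A @ center


def quadric(x):
    """Left-hand side of the original quadric x^T A x + b^T x + c."""
    return x @ A @ x + b @ x + c


def standardized(x):
    """Left-hand side of the standardized form, evaluated at the same point."""
    x_tilde = U.T @ (x - center)
    return x_tilde @ (D * x_tilde) + c_standardized


x = rng.normal(size=3)
print(quadric(x), standardized(x))  # the two values agree up to rounding
```

Because the standardized matrix is diagonal, properties such as boundedness can be read off the signs of the entries of `D` and of `c_standardized`.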
- dim()
Return the dimension of the quadric.
- forward_map(x_tilde)
Map from the standardized space to the original space.
- inverse_map(x)
Map from the original space to the standardized space.
- is_bounded()
Return True if the quadric is bounded.
- is_empty()
Return True if the quadric is empty.
- project(coordinates)
Return the projection of the quadric onto coordinates.
For a quadric \((x - x_\mathrm{center})^T A (x - x_\mathrm{center}) + c \leq 0\) and any matrix \(B \in \mathbb{R}^{q \times p}\) of rank \(q\), the projection of the quadric onto the coordinates given by the rows of \(B\) is given by
\[(Bx - Bx_\mathrm{center})^T (B A^{-1} B^T)^{-1} (Bx - Bx_\mathrm{center}) + c \leq 0.\]
Here, \(B\) is given by coordinates, with \(B_{i, j} = 1\) if coordinates[i-1] == j - 1 and \(B_{i, j} = 0\) otherwise for \(i = 1, \ldots, q\) and \(j = 1, \ldots, p\).
- Parameters:
coordinates (list of int) – The coordinates onto which to project the quadric. Entries must be unique and be between 0 and p - 1.
- Returns:
The projection of the quadric onto the coordinates.
- Return type:
Quadric
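The projection formula can be checked numerically. The sketch below uses only NumPy and is not the library's code. It assumes a positive definite A, builds the selection matrix B of shape (q, p) from `coordinates`, and uses the q x q matrix (B A^{-1} B^T)^{-1}, which is the dimensionally consistent reading for such a B. For a fixed projected point y, measured relative to the center, the smallest value of z^T A z over all z with B z = y should equal y^T (B A^{-1} B^T)^{-1} y.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
coordinates = [0, 2]
q = len(coordinates)

# Symmetric positive definite A (assumption of this sketch).
M = rng.normal(size=(p, p))
A = M @ M.T + np.eye(p)

# Selection matrix B: row i picks coordinate coordinates[i] (0-based).
B = np.zeros((q, p))
B[np.arange(q), coordinates] = 1.0

# Matrix of the projected quadratic form.
A_proj = np.linalg.inv(B @ np.linalg.inv(A) @ B.T)

# A point y in the projected space, relative to the projected center.
y = rng.normal(size=q)

# Minimizer of z^T A z subject to B z = y (from the Lagrangian conditions).
z_star = np.linalg.solve(A, B.T @ A_proj @ y)

# Any other z with the same projection has a larger value of the quadratic form.
free_direction = np.zeros(p)
free_direction[1] = 1.0  # coordinate 1 is not in `coordinates`
z_other = z_star + 0.5 * free_direction

value_min = z_star @ A @ z_star
value_proj = y @ A_proj @ y
value_other = z_other @ A @ z_other
```

A point y therefore lies in the projection exactly when its projected quadratic form plus c is non-positive, since some preimage z then satisfies the original inequality.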
- volume()
Return the volume of the quadric.
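For a bounded quadric, the standardized form makes the volume easy to compute. The following is a sketch under stated assumptions, not the library's implementation: the function name `ellipsoid_volume` is hypothetical, and only the case where every entry of D is positive is covered. In that case the set is an ellipsoid with semi-axes sqrt(-c_standardized / D_i), or empty if c_standardized is positive.

```python
import math


def ellipsoid_volume(D, c_standardized):
    """Volume of {x : sum_i D_i * x_i**2 + c_standardized <= 0} (sketch).

    Hypothetical helper; only handles the bounded case where all D_i > 0.
    """
    if any(d <= 0 for d in D):
        raise ValueError("this sketch only covers positive definite quadrics")
    if c_standardized > 0:
        # A sum of non-negative terms plus a positive constant is never <= 0.
        return 0.0
    n = len(D)
    # Volume of the n-dimensional unit ball.
    unit_ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    # Each axis is scaled by its semi-axis length sqrt(-c / D_i).
    return unit_ball * math.prod(math.sqrt(-c_standardized / d) for d in D)


area = ellipsoid_volume([1.0, 4.0], -4.0)        # ellipse with semi-axes 2 and 1
ball = ellipsoid_volume([1.0, 1.0, 1.0], -1.0)   # unit ball in 3D
nothing = ellipsoid_volume([1.0, 4.0], 1.0)      # empty set
```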
ivmodels.simulate module
- ivmodels.simulate.simulate_gaussian_iv(n, *, mx, k, u=None, mw=0, mc=0, md=0, seed=0, include_intercept=True, return_beta=False, return_gamma=False)
Simulate a Gaussian IV dataset.
- Parameters:
n (int) – Number of observations.
mx (int) – Number of endogenous variables.
k (int) – Number of instruments.
u (int, optional) – Number of unobserved variables. If None, defaults to mx.
mw (int, optional) – Number of endogenous variables not of interest.
mc (int, optional) – Number of exogenous included variables.
seed (int, optional) – Random seed.
include_intercept (bool, optional) – Whether to include an intercept.
return_beta (bool, optional) – Whether to return the true beta.
return_gamma (bool, optional) – Whether to return the true gamma.
- Returns:
Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Endogenous variables.
y (np.ndarray of dimension (n,)) – Outcomes.
C (np.ndarray of dimension (n, mc)) – Exogenous included variables.
W (np.ndarray of dimension (n, mw)) – Endogenous variables not of interest.
beta (np.ndarray of dimension (mx,)) – True beta. Only returned if return_beta is True.
gamma (np.ndarray of dimension (mw,)) – True gamma. Only returned if return_gamma is True.
- ivmodels.simulate.simulate_guggenberger12(n, *, k, seed=0, h11=100, h12=1, rho=0.95, cov=None, return_beta=False, md=0)
Generate data following the process proposed by Guggenberger et al. [2012].
Will generate data
\[\begin{split}X &= Z \Pi_X + V_X \\ W &= Z \Pi_W + V_W \\ y &= X \beta + W \gamma + \epsilon\end{split}\]
where \(\epsilon, V_X, V_W\) are jointly Gaussian with covariance matrix cov and Z is a matrix of independent centered Gaussian instruments.
- Parameters:
n (int) – Number of observations.
k (int) – Number of instruments.
seed (int, optional, default 0) – Random seed.
h11 (float, optional, default 100) – Equal to \(\sqrt{n} || \Pi_X ||\).
h12 (float, optional, default 1) – Equal to \(\sqrt{n} || \Pi_W ||\).
rho (float, optional, default 0.95) – Equal to \(\langle \Pi_X, \Pi_W \rangle / (|| \Pi_X || || \Pi_W ||)\).
cov (np.ndarray, optional, default None) – Covariance matrix of the noise. If None, defaults to [[1, 0, 0.95], [0, 1, 0.3], [0.95, 0.3, 1]].
return_beta (bool, optional, default False) – Whether to return the true beta.
- Returns:
Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, 1)) – Endogenous variables.
y (np.ndarray of dimension (n,)) – Outcomes.
C (None) – Empty
W (np.ndarray of dimension (n, 1)) – Endogenous variables not of interest.
beta (np.ndarray of dimension (1,)) – True beta. Only returned if return_beta is True.
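The data-generating process can be sketched in plain NumPy. This is a hypothetical re-creation, not the library's code: the function name `simulate_sketch`, the way \(\Pi_X\) and \(\Pi_W\) are built from h11, h12 and rho, the column order (epsilon, V_X, V_W) of the noise, and the values beta = gamma = 1 are all assumptions made for illustration. The construction requires k >= 2.

```python
import numpy as np


def simulate_sketch(n, k, h11=100.0, h12=1.0, rho=0.95, cov=None, seed=0):
    """Hypothetical sketch of the data-generating process (not ivmodels code)."""
    rng = np.random.default_rng(seed)
    if cov is None:
        cov = np.array([[1, 0, 0.95], [0, 1, 0.3], [0.95, 0.3, 1]])

    # First-stage coefficients with sqrt(n) * ||Pi_X|| = h11,
    # sqrt(n) * ||Pi_W|| = h12 and cosine of the angle between them = rho.
    Pi_X = np.zeros(k)
    Pi_X[0] = h11 / np.sqrt(n)
    Pi_W = np.zeros(k)
    Pi_W[0] = rho * h12 / np.sqrt(n)
    Pi_W[1] = np.sqrt(1 - rho**2) * h12 / np.sqrt(n)

    # Independent centered Gaussian instruments.
    Z = rng.normal(size=(n, k))

    # Jointly Gaussian noise; assumed column order: (epsilon, V_X, V_W).
    noise = rng.multivariate_normal(np.zeros(3), cov, size=n)
    eps, V_X, V_W = noise[:, 0], noise[:, 1], noise[:, 2]

    beta, gamma = 1.0, 1.0  # assumed true coefficients, for illustration only
    X = Z @ Pi_X + V_X
    W = Z @ Pi_W + V_W
    y = X * beta + W * gamma + eps
    return Z, X.reshape(-1, 1), y, W.reshape(-1, 1), Pi_X, Pi_W


Z, X, y, W, Pi_X, Pi_W = simulate_sketch(n=500, k=5)
```

With the defaults, X is strongly identified (h11 = 100) while W is only weakly identified (h12 = 1), and the two first-stage directions are nearly collinear (rho = 0.95).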
ivmodels.summary module
- class ivmodels.summary.CoefficientTable(feature_names, estimates, statistics, p_values, confidence_sets)
Bases: object
Table with estimates, statistics, p-values, and confidence sets for each feature.
- Parameters:
feature_names (list of str) – Names of the features.
estimates (list of float) – Estimates of the coefficients.
statistics (list of float) – Test statistics.
p_values (list of float) – P-values of the test statistics.
confidence_sets (list of ivmodels.confidence_set.ConfidenceSet) – Confidence sets for the coefficients.
- class ivmodels.summary.Summary(kclass, test, alpha, feature_names=None)
Bases: object
Class containing summary statistics for a fitted model.
- Parameters:
kclass (ivmodels.KClass or child class of ivmodels.models.kclass.KClassMixin) – Fitted model.
test (str) – Name of the test to be used. One of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".
alpha (float) – Significance level \(\alpha\) for the confidence sets, e.g., 0.05. The confidence level of the confidence set will be \(1 - \alpha\).
feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.
- coefficient_table_
Table containing the estimates, test statistics, p-values, and confidence sets for each feature.
- Type:
CoefficientTable
- statistic_
Test statistic with null hypothesis that coefficients corresponding to the endogenous regressors are jointly zero.
- Type:
float
- p_value_
P-value of the test statistic.
- Type:
float
- f_statistic_
F-statistic (or multivariate extension, see rank_test()) with null hypothesis that the first-stage coefficient is of reduced rank.
- Type:
float
- f_p_value_
P-value of the F-statistic (or multivariate extension).
- Type:
float
- fit(X, y, Z=None, C=None, *args, **kwargs)
Fit a summary.
If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.
ivmodels.utils module
- ivmodels.utils.oproj(Z, *args)
Project f onto the subspace orthogonal to Z, where f denotes each array passed in args.
- Parameters:
Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns np.zeros_like(f).
*args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.
- Returns:
Projection of args onto the subspace orthogonal to Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.
- Return type:
np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)
- ivmodels.utils.proj(Z, *args)
Project f onto the subspace spanned by Z, where f denotes each array passed in args.
- Parameters:
Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns np.zeros_like(f).
*args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.
- Returns:
Projection of args onto the subspace spanned by Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.
- Return type:
np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)
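The two projections are complementary, which can be illustrated with a NumPy-only sketch. The helpers `proj_sketch` and `oproj_sketch` are hypothetical and take a single array instead of `*args`; they assume Z has full column rank and do not handle pandas inputs or Z being None.

```python
import numpy as np


def proj_sketch(Z, f):
    """Project f onto the column span of Z via least squares (sketch)."""
    # Equivalent to Z (Z^T Z)^{-1} Z^T f, without forming the inverse explicitly.
    coef, *_ = np.linalg.lstsq(Z, f, rcond=None)
    return Z @ coef


def oproj_sketch(Z, f):
    """Project f onto the orthogonal complement of the column span of Z (sketch)."""
    return f - proj_sketch(Z, f)


rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
f = rng.normal(size=(50, 2))

f_proj = proj_sketch(Z, f)
f_oproj = oproj_sketch(Z, f)
```

The two parts add up to f, the orthogonal part has zero inner product with every column of Z, and projecting twice changes nothing.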
- ivmodels.utils.to_numpy(*args)
Convert input args to a numpy array.
Module contents
- class ivmodels.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)
Bases: KClassMixin, GeneralizedLinearRegressor
K-class estimator for instrumental variable regression.
The k-class estimator with parameter \(\kappa\) is defined as
\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]
where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the projection matrix onto the subspace spanned by \(Z\) and \(\mathrm{Id}\) is the identity matrix. This includes the ordinary least-squares (OLS) estimator (\(\kappa = 0\)), the two-stage least-squares (2SLS) estimator (\(\kappa = 1\)), the limited information maximum likelihood (LIML) estimator (\(\kappa = \hat\kappa_\mathrm{LIML}\)), and the Fuller estimators (\(\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k)\)) as special cases.
Specifying exogenous included regressors \(C\) is equivalent to including them into both \(Z\) and \(X\).
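The closed form \((X^T K X)^{-1} X^T K y\) with \(K = \kappa P_Z + (1 - \kappa) \mathrm{Id}\) can be sketched directly in NumPy. The function `k_class_sketch` is a hypothetical illustration, not the library's implementation: it ignores the intercept, exogenous regressors and regularization, and it forms the n x n matrix K explicitly, which is only sensible for small n. The simulated data are likewise an assumption made for the example.

```python
import numpy as np


def k_class_sketch(X, y, Z, kappa):
    """Closed-form k-class estimate without intercept (sketch)."""
    n = X.shape[0]
    # Projection onto the column span of Z.
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    K = kappa * P_Z + (1 - kappa) * np.eye(n)
    return np.linalg.solve(X.T @ K @ X, X.T @ K @ y)


rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
u = rng.normal(size=n)  # unobserved confounder affecting both X and y
X = Z @ rng.normal(size=(3, 2)) + u[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + u + rng.normal(size=n)

beta_ols = k_class_sketch(X, y, Z, kappa=0)   # ordinary least squares
beta_tsls = k_class_sketch(X, y, Z, kappa=1)  # two-stage least squares

# Reference values: OLS of y on X, and OLS of y on the first-stage fit of X.
ols_reference = np.linalg.lstsq(X, y, rcond=None)[0]
X_fitted = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
tsls_reference = np.linalg.lstsq(X_fitted, y, rcond=None)[0]
```

Setting kappa to 0 and 1 reproduces the OLS and 2SLS reference estimates, matching the special cases listed above.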
- Parameters:
kappa (float or {"ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If string, then must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then \(\kappa = \hat\kappa_\mathrm{LIML}\) is used, where \(\hat\kappa_\mathrm{LIML} \geq 1\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)\), where \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\) and \(M_Z = \mathrm{Id} - P_Z\). If exogenous included regressors \(C\) are specified, then \(\hat\kappa_\mathrm{LIML}\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)\). If kappa="fuller(a)", then \(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - mc)\), where \(n\) is the number of observations, \(k = \mathrm{dim}(Z)\) is the number of instruments, and \(mc\) is the number of exogenous included regressors. The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to \(O(1/n)\) [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for \(\kappa \leq 1\).
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for \(\kappa \leq 1\).
- coef_
The estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept for the linear regression problem.
- Type:
float
- kappa_
The numerical kappa parameter of the k-class estimator.
- Type:
float
- fuller_alpha_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.
- Type:
float
- ar_min_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.
- Type:
float
- kappa_liml_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.
- Type:
float
- named_coef_
If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.
- Type:
array-like, shape (n_features,)
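The relation between the LIML kappa and the Anderson-Rubin minimum can be checked with a NumPy-only sketch. It is not the library's implementation and makes several assumptions: there is no intercept and no exogenous regressor (so mc = 0), the simulated data are invented for the example, and "unnormalized Anderson-Rubin statistic" is read as the ratio \((y - X\beta)^T P_Z (y - X\beta) / (y - X\beta)^T M_Z (y - X\beta)\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)  # unobserved confounder
X = Z @ rng.normal(size=(k, 1)) + u[:, None] + rng.normal(size=(n, 1))
y = 0.5 * X[:, 0] + u + rng.normal(size=n)

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
M_Z = np.eye(n) - P_Z
Xy = np.column_stack([X, y])

# Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y).
eigenvalues = np.linalg.eigvals(np.linalg.solve(Xy.T @ M_Z @ Xy, Xy.T @ Xy))
kappa_liml = float(np.min(eigenvalues.real))

# Fuller(a) shifts kappa downwards; mc = 0 in this sketch.
a = 1.0
kappa_fuller = kappa_liml - a / (n - k)

# k-class estimate at kappa_liml, i.e. the LIML estimate.
K = kappa_liml * P_Z + (1 - kappa_liml) * np.eye(n)
beta_liml = np.linalg.solve(X.T @ K @ X, X.T @ K @ y)

# Ratio of projected to residual sum of squares at the LIML estimate.
residual = y - X @ beta_liml
ar_ratio = (residual @ P_Z @ residual) / (residual @ M_Z @ residual)
```

Under these assumptions `ar_ratio` equals `kappa_liml - 1`, which mirrors the documented relation between kappa_liml_ and ar_min_.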
References
[Ful77] Wayne A. Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object