ivmodels package

Subpackages

Submodules

ivmodels.confidence_set module

class ivmodels.confidence_set.ConfidenceSet(boundaries)

Bases: object

A class to represent a 1D confidence set.

Parameters:

boundaries (list of 2-tuples of floats) – The boundaries of the confidence set. The confidence set is the union of the intervals defined by the boundaries.

static from_quadric(quadric)

Create a 1D confidence set from a quadric.

is_empty()

Return True if the confidence set is empty.

is_finite()

Return True if the confidence set is finite.

length()

Return the length of the confidence set.
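
As a rough illustration of the union-of-intervals representation described above, the following is a minimal sketch, not the library's implementation. The class name IntervalUnion is hypothetical, "finite" is assumed to mean that every boundary is finite, and the intervals are assumed to be disjoint.

```python
import math


class IntervalUnion:
    """Hypothetical stand-in for a 1D confidence set (not the library class)."""

    def __init__(self, boundaries):
        # boundaries: list of (left, right) tuples; the set is their union.
        self.boundaries = list(boundaries)

    def is_empty(self):
        # No intervals means the set contains no points.
        return len(self.boundaries) == 0

    def is_finite(self):
        # Assumption: "finite" means all interval endpoints are finite.
        return all(
            math.isfinite(left) and math.isfinite(right)
            for left, right in self.boundaries
        )

    def length(self):
        # Assumption: intervals are disjoint, so their lengths simply add up.
        return sum(right - left for left, right in self.boundaries)


bounded = IntervalUnion([(-1.0, 0.5), (2.0, 3.0)])
unbounded = IntervalUnion([(-math.inf, -1.0), (2.0, math.inf)])
empty = IntervalUnion([])
```

Under these assumptions, an unbounded set such as the complement of an interval has infinite length, and an empty list of boundaries yields an empty set of length zero.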

ivmodels.quadric module

class ivmodels.quadric.Quadric(A, b, c)

Bases: object

A class to represent a quadric \(x^T A x + b^T x + c \leq 0\).

Internally, works with a standardized form of the quadric. If \(V^T D V = A\) with \(D\) diagonal and \(V\) orthogonal, define \(x_\mathrm{center} := -A^{-1} b / 2\), \(\tilde x := V (x - x_\mathrm{center})\), and \(\tilde c := c - x_\mathrm{center}^T A x_\mathrm{center}\). Then the standardized form is \(\tilde x^T D \tilde x + \tilde c \leq 0\).

Parameters:
  • A (np.ndarray of dimension (n, n)) – The matrix A of the quadratic form.

  • b (np.ndarray of dimension (n,)) – The vector b of the quadratic form.

  • c (float) – The constant c of the quadratic form.

center

The center of the quadric. Equal to \(-A^{-1} b / 2\).

Type:

np.ndarray of dimension (n,)

c_standardized

The constant c of the standardized quadric. Equal to \(c - x_\mathrm{center}^T A x_\mathrm{center}\).

Type:

float

D

The diagonal of the matrix \(D\) in the eigenvalue decomposition \(V^T D V = A\).

Type:

np.ndarray of dimension (n,)

V

The matrix \(V\) in the eigenvalue decomposition \(V^T D V = A\).

Type:

np.ndarray of dimension (n, n)
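
The attributes above can be reproduced with plain NumPy. The sketch below is an illustration rather than the library's code: it assumes the convention \(V^T D V = A\) stated above (so V holds the eigenvectors as rows), and the values of A, b, c, and x are made up for the example.

```python
import numpy as np

# Example quadric x^T A x + b^T x + c <= 0 (an ellipse; values are illustrative).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([-2.0, 1.0])
c = -3.0

# np.linalg.eigh returns A = Q diag(D) Q^T; setting V = Q^T gives V^T D V = A.
D, Q = np.linalg.eigh(A)
V = Q.T

# Center and standardized constant as defined above.
center = -np.linalg.solve(A, b) / 2
c_standardized = c - center @ A @ center

# Evaluate both forms of the quadric at an arbitrary point.
x = np.array([0.3, -1.2])
x_tilde = V @ (x - center)
original = x @ A @ x + b @ x + c
standardized = x_tilde @ (D * x_tilde) + c_standardized
```

Both expressions evaluate to the same number, which is the sense in which the standardized form is equivalent to the original quadric.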

dim()

Return the dimension of the quadric.

forward_map(x_tilde)

Map from the standardized space to the original space.

inverse_map(x)

Map from the original space to the standardized space.
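
The two maps are inverses of each other. A minimal sketch for single vectors, assuming the convention \(V^T D V = A\) used in the attribute descriptions; the function bodies are an illustration and may differ from the library's implementation, for example in how batches of points are handled.

```python
import numpy as np

# Illustrative quadric coefficients (the constant c does not affect the maps).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([-2.0, 1.0])

_, Q = np.linalg.eigh(A)
V = Q.T  # rows are eigenvectors, so that V^T D V = A
center = -np.linalg.solve(A, b) / 2


def inverse_map(x):
    """Original space -> standardized space (sketch)."""
    return V @ (x - center)


def forward_map(x_tilde):
    """Standardized space -> original space (sketch)."""
    return V.T @ x_tilde + center


x = np.array([0.3, -1.2])
round_trip = forward_map(inverse_map(x))
```

Because V is orthogonal, applying inverse_map and then forward_map recovers the original point, and the center is mapped to the origin of the standardized space.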

is_bounded()

Return True if the quadric is bounded.

is_empty()

Return True if the quadric is empty.

project(coordinates)

Return the projection of the quadric onto coordinates.

For a quadric \((x - x_\mathrm{center})^T A (x - x_\mathrm{center}) + c \leq 0\) and any matrix \(B \in \mathbb{R}^{q \times p}\) of rank \(q\), the projection of the quadric onto the coordinates given by the rows of \(B\) is given by

\[(Bx - Bx_\mathrm{center})^T (B A^{-1} B^T)^{-1} (Bx - Bx_\mathrm{center}) + c \leq 0.\]

Here, \(B\) is given by coordinates, with \(B_{i, j} = 1\) if coordinates[i-1] == j - 1 and \(B_{i, j} = 0\) otherwise for \(i = 1, \ldots, q\) and \(j = 1, \ldots, p\).

Parameters:

coordinates (list of int) – The coordinates onto which to project the quadric. Entries must be unique and be between 0 and p - 1.

Returns:

The projection of the quadric onto the coordinates.

Return type:

Quadric
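
The projection can be sketched numerically as follows. This is an illustration with made-up values, not the library's code. With \(B\) of shape (q, p), the q-by-q matrix of the projected quadric is \((B A^{-1} B^T)^{-1}\), which for a selector matrix \(B\) is the inverse of the corresponding submatrix of \(A^{-1}\).

```python
import numpy as np

# Centered form (x - center)^T A (x - center) + c <= 0 with p = 3.
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 3.0]])
center = np.array([1.0, -1.0, 0.5])
c = -4.0

coordinates = [0, 2]  # zero-based indices of the coordinates to keep
p = A.shape[0]

# Selector matrix B of shape (q, p): row i picks coordinates[i].
B = np.zeros((len(coordinates), p))
B[np.arange(len(coordinates)), coordinates] = 1.0

# Projected quadric: (y - center_proj)^T A_proj (y - center_proj) + c <= 0.
A_proj = np.linalg.inv(B @ np.linalg.inv(A) @ B.T)
center_proj = B @ center
```

The constant c is unchanged by the projection; only the matrix and the center are transformed.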

volume()

Return the volume of the quadric.

ivmodels.simulate module

ivmodels.simulate.simulate_gaussian_iv(n, *, mx, k, u=None, mw=0, mc=0, md=0, seed=0, include_intercept=True, return_beta=False, return_gamma=False)

Simulate a Gaussian IV dataset.

Parameters:
  • n (int) – Number of observations.

  • mx (int) – Number of endogenous variables.

  • k (int) – Number of instruments.

  • u (int, optional) – Number of unobserved variables. If None, defaults to mx.

  • mw (int, optional) – Number of endogenous variables not of interest.

  • mc (int, optional) – Number of exogenous included variables.

  • seed (int, optional) – Random seed.

  • include_intercept (bool, optional) – Whether to include an intercept.

  • return_beta (bool, optional) – Whether to return the true beta.

  • return_gamma (bool, optional) – Whether to return the true gamma.

Returns:

  • Z (np.ndarray of dimension (n, k)) – Instruments.

  • X (np.ndarray of dimension (n, mx)) – Endogenous variables.

  • y (np.ndarray of dimension (n,)) – Outcomes.

  • C (np.ndarray of dimension (n, mc)) – Exogenous included variables.

  • W (np.ndarray of dimension (n, mw)) – Endogenous variables not of interest.

  • beta (np.ndarray of dimension (mx,)) – True beta. Only returned if return_beta is True.

  • gamma (np.ndarray of dimension (mw,)) – True gamma. Only returned if return_gamma is True.

ivmodels.simulate.simulate_guggenberger12(n, *, k, seed=0, h11=100, h12=1, rho=0.95, cov=None, return_beta=False, md=0)

Generate data using the process proposed by Guggenberger et al. [2012].

The data are generated according to

\[\begin{split}X &= Z \Pi_X + V_X \\ W &= Z \Pi_W + V_W \\ y &= X \beta + W \gamma + \epsilon\end{split}\]

where \(\epsilon, V_X, V_W\) are jointly Gaussian with covariance matrix cov and Z is a matrix of independent centered Gaussian instruments.

Parameters:
  • n (int) – Number of observations.

  • k (int) – Number of instruments.

  • seed (int, optional, default 0) – Random seed.

  • h11 (float, optional, default 100) – Equal to \(\sqrt{n} \| \Pi_X \|\).

  • h12 (float, optional, default 1) – Equal to \(\sqrt{n} \| \Pi_W \|\).

  • rho (float, optional, default 0.95) – Equal to \(\langle \Pi_X, \Pi_W \rangle / (\| \Pi_X \| \| \Pi_W \|)\).

  • cov (np.ndarray, optional, default None) – Covariance matrix of the noise. If None, defaults to [[1, 0, 0.95], [0, 1, 0.3], [0.95, 0.3, 1]].

  • return_beta (bool, optional, default False) – Whether to return the true beta.

Returns:

  • Z (np.ndarray of dimension (n, k)) – Instruments.

  • X (np.ndarray of dimension (n, 1)) – Endogenous variables.

  • y (np.ndarray of dimension (n,)) – Outcomes.

  • C (None) – Always None; this process generates no exogenous included variables.

  • W (np.ndarray of dimension (n, 1)) – Endogenous variables not of interest.

  • beta (np.ndarray of dimension (1,)) – True beta. Only returned if return_beta is True.
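
To make the roles of h11, h12, and rho concrete, the sketch below builds first-stage coefficients with the prescribed norms and angle and then draws data from the three equations above. It is an illustration only. The particular construction of \(\Pi_X\) and \(\Pi_W\), the values of beta and gamma, and the ordering \((\epsilon, V_X, V_W)\) of the covariance matrix are assumptions, not taken from the library. For brevity, X and W are one-dimensional arrays here rather than (n, 1) matrices.

```python
import numpy as np

n, k = 1000, 10
h11, h12, rho = 100.0, 1.0, 0.95
cov = np.array([[1.0, 0.0, 0.95], [0.0, 1.0, 0.3], [0.95, 0.3, 1.0]])
beta, gamma = 1.0, 1.0  # hypothetical true coefficients

rng = np.random.default_rng(0)

# One way to obtain first-stage coefficients with the prescribed norms and angle:
# sqrt(n) * ||Pi_X|| = h11, sqrt(n) * ||Pi_W|| = h12, cos(angle) = rho.
Pi_X = np.zeros(k)
Pi_X[0] = h11 / np.sqrt(n)
Pi_W = np.zeros(k)
Pi_W[0] = rho * h12 / np.sqrt(n)
Pi_W[1] = np.sqrt(1 - rho**2) * h12 / np.sqrt(n)

# Independent centered Gaussian instruments.
Z = rng.normal(size=(n, k))

# Jointly Gaussian noise; assumed column order is (epsilon, V_X, V_W).
noise = rng.multivariate_normal(np.zeros(3), cov, size=n)
eps, V_X, V_W = noise.T

X = Z @ Pi_X + V_X
W = Z @ Pi_W + V_W
y = beta * X + gamma * W + eps
```

With the defaults, \(\Pi_X\) is much larger than \(\Pi_W\), so X is strongly identified by the instruments while W is only weakly identified.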

ivmodels.summary module

class ivmodels.summary.CoefficientTable(feature_names, estimates, statistics, p_values, confidence_sets)

Bases: object

Table with estimates, statistics, p-values, and confidence sets for each feature.

Parameters:
  • feature_names (list of str) – Names of the features.

  • estimates (list of float) – Estimates of the coefficients.

  • statistics (list of float) – Test statistics.

  • p_values (list of float) – P-values of the test statistics.

  • confidence_sets (list of ivmodels.confidence_set.ConfidenceSet) – Confidence sets for the coefficients.

class ivmodels.summary.Summary(kclass, test, alpha, feature_names=None)

Bases: object

Class containing summary statistics for a fitted model.

Parameters:
  • kclass (ivmodels.KClass or child class of ivmodels.models.kclass.KClassMixin) – Fitted model.

  • test (str) – Name of the test to be used. One of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".

  • alpha (float) – Significance level \(\alpha\) for the confidence sets, e.g., 0.05. The confidence level of the confidence sets is \(1 - \alpha\).

  • feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.

coefficient_table_

Table containing the estimates, test statistics, p-values, and confidence sets for each feature.

Type:

CoefficientTable

statistic_

Test statistic with null hypothesis that coefficients corresponding to the endogenous regressors are jointly zero.

Type:

float

p_value_

P-value of the test statistic.

Type:

float

f_statistic_

F-statistic (or multivariate extension, see rank_test()) with null hypothesis that the first-stage coefficient is of reduced rank.

Type:

float

f_p_value_

P-value of the F-statistic (or multivariate extension).

Type:

float

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a summary.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.

  • y (array-like, shape (n_samples,)) – The target values.

  • Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.

  • C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

ivmodels.utils module

ivmodels.utils.oproj(Z, *args)

Project each argument f in args onto the subspace orthogonal to the column space of Z.

Parameters:
  • Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns np.zeros_like(f).

  • *args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.

Returns:

Projection of args onto the subspace orthogonal to Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.

Return type:

np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)

ivmodels.utils.proj(Z, *args)

Project each argument f in args onto the subspace spanned by the columns of Z.

Parameters:
  • Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns np.zeros_like(f).

  • *args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.

Returns:

Projection of args onto the subspace spanned by Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.

Return type:

np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)
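
The two projections described for oproj and proj can be sketched with a least-squares solve. The helper names proj_sketch and oproj_sketch are hypothetical, handle a single NumPy array each, and do not cover the Z=None or pandas cases mentioned above.

```python
import numpy as np


def proj_sketch(Z, f):
    """Project f onto the column space of Z via least squares."""
    coef, *_ = np.linalg.lstsq(Z, f, rcond=None)
    return Z @ coef


def oproj_sketch(Z, f):
    """Project f onto the orthogonal complement of the column space of Z."""
    return f - proj_sketch(Z, f)


rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
f = rng.normal(size=(50, 2))

f_proj = proj_sketch(Z, f)
f_oproj = oproj_sketch(Z, f)
```

The two parts add up to f, and the orthogonal part has zero inner product with every column of Z.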

ivmodels.utils.to_numpy(*args)

Convert input args to a numpy array.

Module contents

class ivmodels.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: KClassMixin, GeneralizedLinearRegressor

K-class estimator for instrumental variable regression.

The k-class estimator with parameter \(\kappa\) is defined as

\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]

where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the projection matrix onto the subspace spanned by \(Z\) and \(\mathrm{Id}\) is the identity matrix. This includes the ordinary least-squares (OLS) estimator (\(\kappa = 0\)), the two-stage least-squares (2SLS) estimator (\(\kappa = 1\)), the limited information maximum likelihood (LIML) estimator (\(\kappa = \hat\kappa_\mathrm{LIML}\)), and the Fuller estimators (\(\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k)\)) as special cases.

Specifying exogenous included regressors \(C\) is equivalent to including them into both \(Z\) and \(X\).
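
The closed-form expression above can be checked numerically. The following sketch is not the library's implementation: k_class_sketch is a hypothetical helper without intercept, exogenous regressors, or regularization, and the simulated data are made up for the example. It evaluates the estimator for \(\kappa = 0\) and \(\kappa = 1\).

```python
import numpy as np


def k_class_sketch(X, y, Z, kappa):
    """Closed-form k-class estimate (no intercept, no exogenous regressors)."""
    n = X.shape[0]
    # Projection matrix onto the column space of Z.
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    # Weight matrix kappa * P_Z + (1 - kappa) * Id.
    K = kappa * P_Z + (1 - kappa) * np.eye(n)
    return np.linalg.solve(X.T @ K @ X, X.T @ K @ y)


rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
u = rng.normal(size=n)  # unobserved confounder (hypothetical setup)
X = Z @ np.array([[1.0], [0.5], [-0.5]]) + u[:, None] + rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + u + rng.normal(size=n)

beta_ols = k_class_sketch(X, y, Z, kappa=0.0)
beta_tsls = k_class_sketch(X, y, Z, kappa=1.0)
```

With \(\kappa = 0\) the weight matrix is the identity and the result coincides with least squares of y on X; with \(\kappa = 1\) it coincides with least squares of y on the fitted values \(P_Z X\).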

Parameters:
  • kappa (float or {"ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If a string, it must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then \(\kappa = 0\) and the k-class estimator is the ordinary least-squares estimator. If kappa="tsls" or kappa="2sls", then \(\kappa = 1\) and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then \(\kappa = \hat\kappa_\mathrm{LIML}\) is used, where \(\hat\kappa_\mathrm{LIML} \geq 1\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)\), \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\), and \(M_Z = \mathrm{Id} - P_Z\). If exogenous included regressors \(C\) are specified, then \(\hat\kappa_\mathrm{LIML}\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)\). If kappa="fuller(a)", then \(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - mc)\), where \(n\) is the number of observations, \(k = \mathrm{dim}(Z)\) is the number of instruments, and \(mc\) is the number of exogenous included regressors. The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to \(O(1/n)\) [Fuller, 1977].

  • instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.

  • instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.

  • exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.

  • exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.

  • alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for \(\kappa \leq 1\).

  • l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for \(\kappa \leq 1\).
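
The LIML and Fuller values of kappa described for the kappa parameter can be computed directly from the eigenvalue characterization. The sketch below covers only the case without exogenous included regressors (mc = 0) and uses simulated data made up for the example. It is an illustration, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)  # unobserved confounder (hypothetical setup)
X = Z @ np.array([[1.0], [0.5], [-0.5]]) + u[:, None] + rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + u + rng.normal(size=n)

Xy = np.hstack([X, y[:, None]])  # the matrix (X y)

# M_Z (X y): residuals of (X y) after projecting onto the columns of Z.
Xy_resid = Xy - Z @ np.linalg.lstsq(Z, Xy, rcond=None)[0]

# Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y).
matrix = np.linalg.solve(Xy_resid.T @ Xy_resid, Xy.T @ Xy)
kappa_liml = np.linalg.eigvals(matrix).real.min()

a = 1.0  # Fuller parameter; the string "fuller" corresponds to a = 1
kappa_fuller = kappa_liml - a / (n - k)  # mc = 0: no exogenous regressors
ar_min = kappa_liml - 1  # relation kappa_liml_ = 1 + ar_min_
```

Since \((X \ \ y)^T (X \ \ y) = (X \ \ y)^T M_Z (X \ \ y) + (X \ \ y)^T P_Z (X \ \ y)\), the matrix equals the identity plus a product of two positive semidefinite matrices, which is why its smallest eigenvalue is at least 1.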

coef_

The estimated coefficients for the linear regression problem.

Type:

array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type:

float

kappa_

The numerical kappa parameter of the k-class estimator.

Type:

float

fuller_alpha_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.

Type:

float

ar_min_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.

Type:

float

kappa_liml_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.

Type:

float

named_coef_

If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.

Type:

array-like, shape (n_features,)

References

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') KClass

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.

  • Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') KClass

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') KClass

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
  • context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.

  • offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object