ivmodels package

Subpackages

Submodules

ivmodels.confidence_set module

class ivmodels.confidence_set.ConfidenceSet(boundaries)

Bases: object

A class to represent a 1D confidence set.

Parameters:: boundaries (list of 2-tuples of floats) – The boundaries of the confidence set. The confidence set is the union of the intervals defined by the boundaries.

static from_quadric(quadric): Create a 1D confidence set from a quadric.

is_empty(): Return True if the confidence set is empty.

is_finite(): Return True if the confidence set is finite.

length(): Return the length of the confidence set.

ivmodels.quadric module

class ivmodels.quadric.Quadric(A, b, c)

Bases: object

A class to represent a quadric $x^T A x + b^T x + c \leq 0$.

Internally, works with a standardized form of the quadric. If $A = V D V^T$ with $D$ diagonal and $V$ orthonormal, define $x_\mathrm{center} := -A^{-1} b / 2$, $\tilde x = V^T (x - x_\mathrm{center})$ and $\tilde c = c - x_\mathrm{center}^T A x_\mathrm{center}$. Then, the standardized form is given by $\tilde x^T D \tilde x + \tilde c \leq 0$.

Parameters:

A (np.ndarray of dimension (n, n)) – The matrix A of the quadratic form.
b (np.ndarray of dimension (n,)) – The vector b of the quadratic form.
c (float) – The constant c of the quadratic form.
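
The standardization described above can be checked numerically. The following is a minimal NumPy sketch, not the library's code: the values of A, b, c and x are made up for illustration, and the two helper functions only mirror the documented direction of inverse_map and forward_map.

```python
import numpy as np

# Hypothetical example values; A must be symmetric and invertible.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, -2.0])
c = -4.0

# Eigendecomposition A = V diag(D) V^T (eigh assumes a symmetric matrix).
D, V = np.linalg.eigh(A)

# Center and standardized constant as defined in the text.
center = -np.linalg.solve(A, b) / 2
c_standardized = c - center @ A @ center


def inverse_map(x):
    # Original space -> standardized space: x_tilde = V^T (x - center).
    return V.T @ (x - center)


def forward_map(x_tilde):
    # Standardized space -> original space: x = V x_tilde + center.
    return V @ x_tilde + center


x = np.array([0.3, -1.2])
x_tilde = inverse_map(x)

# Both expressions evaluate the same quadratic function at the same point.
original_value = x @ A @ x + b @ x + c
standardized_value = x_tilde @ (D * x_tilde) + c_standardized
```

The two values agree up to floating-point error, and forward_map(inverse_map(x)) recovers x, because $V$ is orthonormal.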

center

The center of the quadric. Equal to $-A^{-1} b / 2$.

Type:: np.ndarray of dimension (n,)

c_standardized

The constant c of the standardized quadric. Equal to $c - x_\mathrm{center}^T A x_\mathrm{center}$.

Type:: float

D

The diagonal of the matrix $D$ in the eigenvalue decomposition $A = V D V^T$.

Type:: np.ndarray of dimension (n,)

V

The matrix $V$ in the eigenvalue decomposition $A = V D V^T$.

Type:: np.ndarray of dimension (n, n)

dim(): Return the dimension of the quadric.

forward_map(x_tilde): Map from the standardized space to the original space.

inverse_map(x): Map from the original space to the standardized space.

is_bounded(): Return True if the quadric is bounded.

is_empty(): Return True if the quadric is empty.

project(coordinates)

Return the projection of the quadric onto coordinates.

For a quadric $(x - x_\mathrm{center})^T A (x - x_\mathrm{center}) + c \leq 0$ and any matrix $B \in \mathbb{R}^{q \times p}$ of rank $q$, the projection of the quadric onto the coordinates given by the columns of $B$ is given by

\[(Bx - Bx_\mathrm{center})^T (B A^{-1} B^T)^{-1} (Bx - Bx_\mathrm{center}) + c \leq 0.\]

Here, $B$ is given by coordinates, with $B_{i, j} = 1$ if coordinates[i] == j and $B_{i, j} = 0$ otherwise for 0-indexed $i = 0, \ldots, q - 1$ and $j = 0, \ldots, p - 1$.

Parameters:: coordinates (list of int) – The coordinates onto which to project the quadric. Entries must be unique and be between 0 and p - 1.
Returns:: The projection of the quadric onto the coordinates.
Return type:: Quadric
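
As a rough illustration of the projection formula, the sketch below builds the selection matrix $B$ from coordinates and computes $(B A^{-1} B^T)^{-1}$ with NumPy. The function name project_quadric_matrix and the example values are assumptions made for this example; the library's own implementation may differ.

```python
import numpy as np


def project_quadric_matrix(A, coordinates):
    """Return (B A^{-1} B^T)^{-1} for the selection matrix B built from coordinates."""
    p = A.shape[0]
    q = len(coordinates)
    # B[i, j] = 1 if coordinates[i] == j, else 0.
    B = np.zeros((q, p))
    B[np.arange(q), coordinates] = 1.0
    return np.linalg.inv(B @ np.linalg.inv(A) @ B.T)


# Hypothetical centered quadric x^T A x + c <= 0 (an ellipse in 2D).
A = np.array([[2.0, 1.0], [1.0, 2.0]])
c = -1.0

A_projected = project_quadric_matrix(A, [0])

# The projection onto coordinate 0 is the interval |x_0| <= half_width.
half_width = np.sqrt(-c / A_projected[0, 0])
```

For this example the projected matrix is 1.5, which matches minimizing $2 x_0^2 + 2 x_0 x_1 + 2 x_1^2$ over $x_1$ (attained at $x_1 = -x_0 / 2$).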

volume(): Return the volume of the quadric.

ivmodels.simulate module

ivmodels.simulate.simulate_gaussian_iv(n, *, mx, k, u=None, mw=0, mc=0, md=0, seed=0, include_intercept=True, return_beta=False, return_gamma=False)

Simulate a Gaussian IV dataset.

Parameters:

n (int) – Number of observations.
mx (int) – Number of endogenous variables.
k (int) – Number of instruments.
u (int, optional) – Number of unobserved variables. If None, defaults to mx + mw.
mw (int, optional) – Number of endogenous variables not of interest.
mc (int, optional) – Number of exogenous included variables not of interest.
md (int, optional) – Number of exogenous included variables of interest.
seed (int, optional) – Random seed.
include_intercept (bool, optional) – Whether to include an intercept.
return_beta (bool, optional) – Whether to return the true beta.
return_gamma (bool, optional) – Whether to return the true gamma.

Returns:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Endogenous variables.
y (np.ndarray of dimension (n,)) – Outcomes.
C (np.ndarray of dimension (n, mc)) – Exogenous included variables not of interest.
W (np.ndarray of dimension (n, mw)) – Endogenous variables not of interest.
D (np.ndarray of dimension (n, md)) – Exogenous included variables of interest.
beta (np.ndarray of dimension (mx + md, 1)) – True coefficients of (X, D). Only returned if return_beta is True.
gamma (np.ndarray of dimension (mw, 1)) – True gamma. Only returned if return_gamma is True.

ivmodels.simulate.simulate_guggenberger12(n, *, k, seed=0, h11=100, h12=1, rho=0.95, cov=None, return_beta=False, md=0)

Generate data by process as proposed by Guggenberger et al. [2012].

Will generate data

\[\begin{split}X &= Z \Pi_X + V_X \\ W &= Z \Pi_W + V_W \\ y &= X \beta + W \gamma + \epsilon\end{split}\]

where $\epsilon, V_X, V_W$ are jointly Gaussian with covariance matrix cov and Z is a matrix of independent centered Gaussian instruments.

Parameters:

n (int) – Number of observations.
k (int) – Number of instruments.
seed (int, optional, default 0) – Random seed.
h11 (float, optional, default 100) – Equal to $\sqrt{n} || \Pi_X ||$.
h12 (float, optional, default 1) – Equal to $\sqrt{n} || \Pi_W ||$.
rho (float, optional, default 0.95) – Equal to $< \Pi_X, \Pi_W > / (|| \Pi_X || || \Pi_W ||)$.
cov (np.ndarray, optional, default None) – Covariance matrix of the noise. If None, defaults to [[1, 0, 0.95], [0, 1, 0.3], [0.95, 0.3, 1]].
return_beta (bool, optional, default False) – Whether to return the true beta.
md (int, optional, default 0) – Number of exogenous regressors of interest.

Returns:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, 1)) – Endogenous variables.
y (np.ndarray of dimension (n,)) – Outcomes.
C (None) – Empty
W (np.ndarray of dimension (n, 1)) – Endogenous variables not of interest.
D (np.ndarray of dimension (n, md)) – Exogenous regressors of interest.
beta (np.ndarray of dimension (1 + md,)) – True (beta, delta). Only returned if return_beta is True.
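
To make the roles of h11, h12 and rho concrete, here is a hedged NumPy sketch of one data-generating process that satisfies the stated constraints. It is not the library's construction: the choice of $\Pi_X$ and $\Pi_W$ within the first two coordinates, the ordering $(\epsilon, V_X, V_W)$ of the noise covariance, and the values of beta and gamma are all assumptions made for illustration.

```python
import numpy as np

n, k = 1000, 5
h11, h12, rho = 100.0, 1.0, 0.95
rng = np.random.default_rng(0)

# Assumption: Pi_X points along the first coordinate axis.
Pi_X = np.zeros((k, 1))
Pi_X[0, 0] = h11 / np.sqrt(n)

# Assumption: Pi_W lies in the span of the first two axes, with cosine rho to Pi_X.
Pi_W = np.zeros((k, 1))
Pi_W[0, 0] = rho * h12 / np.sqrt(n)
Pi_W[1, 0] = np.sqrt(1 - rho**2) * h12 / np.sqrt(n)

# Default covariance from the text; the ordering (eps, V_X, V_W) is assumed.
cov = np.array([[1, 0, 0.95], [0, 1, 0.3], [0.95, 0.3, 1]])
noise = rng.multivariate_normal(np.zeros(3), cov, size=n)
eps, V_X, V_W = noise[:, [0]], noise[:, [1]], noise[:, [2]]

# Independent centered Gaussian instruments.
Z = rng.normal(size=(n, k))
X = Z @ Pi_X + V_X
W = Z @ Pi_W + V_W

# Hypothetical true coefficients.
beta, gamma = 1.0, 1.0
y = (X * beta + W * gamma + eps).ravel()

# Quantities that h11, h12 and rho are defined to equal.
norm_X = np.sqrt(n) * np.linalg.norm(Pi_X)
norm_W = np.sqrt(n) * np.linalg.norm(Pi_W)
cosine = (Pi_X.ravel() @ Pi_W.ravel()) / (
    np.linalg.norm(Pi_X) * np.linalg.norm(Pi_W)
)
```

With h12 small relative to h11, the instruments are strong for X but weak for W, which is the regime the parameters are meant to control.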

ivmodels.summary module

class ivmodels.summary.CoefficientTable(feature_names, estimates, statistics, p_values, confidence_sets)

Bases: object

Table with estimates, statistics, p-values, and confidence sets for each feature.

Parameters:

feature_names (list of str) – Names of the features.
estimates (list of float) – Estimates of the coefficients.
statistics (list of float) – Test statistics.
p_values (list of float) – P-values of the test statistics.
confidence_sets (list of ivmodels.confidence_set.ConfidenceSet) – Confidence sets for the coefficients.

class ivmodels.summary.Summary(kclass, test, alpha, feature_names=None)

Bases: object

Class containing summary statistics for a fitted model.

Parameters:

kclass (ivmodels.KClass or child class of ivmodels.models.kclass.KClassMixin) – Fitted model.
test (str) – Name of the test to be used. One of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".
alpha (float) – Significance level $\alpha$ for the confidence sets, e.g., 0.05. The confidence level of the confidence sets will be $1 - \alpha$.
feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.

coefficient_table_

Table containing the estimates, test statistics, p-values, and confidence sets for each feature.

Type:: CoefficientTable

statistic_

Test statistic with null hypothesis that coefficients corresponding to the endogenous regressors are jointly zero.

Type:: float

p_value_

P-value of the test statistic.

Type:: float

f_statistic_

F-statistic (or multivariate extension, see rank_test()) with null hypothesis that the first-stage coefficient is of reduced rank.

Type:: float

f_p_value_

P-value of the F-statistic (or multivariate extension).

Type:: float

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a summary.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

ivmodels.utils module

ivmodels.utils.oproj(Z, *args)

Project each of args onto the subspace orthogonal to Z.

Parameters:

Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns args unchanged.
*args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.

Returns:

Projection of args onto the subspace orthogonal to Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.

Return type:

np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)

ivmodels.utils.proj(Z, *args)

Project each of args onto the subspace spanned by Z.

Parameters:

Z (np.ndarray or pd.DataFrame of dimension (n, d_Z)) – The Z matrix. If None, returns an array of zeros of the same shape for each of args.
*args (np.ndarrays or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)) – Vector or matrices to project.

Returns:

Projection of args onto the subspace spanned by Z. Same number of outputs as args. Same dimension as args. If args were pandas objects, the output will also be pandas objects with the same index and columns.

Return type:

np.ndarray or pd.DataFrames or pd.Series of dimension (n, d_f) or (n,)
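
The two projections are complementary: for any f, the projection onto the span of Z and the projection onto its orthogonal complement add up to f. A minimal NumPy sketch of the underlying linear algebra follows. The helpers proj_sketch and oproj_sketch are hypothetical, handle a single NumPy array only, and are not the library's implementation.

```python
import numpy as np


def proj_sketch(Z, f):
    # Least-squares fit of f on Z, mapped back: P_Z f = Z (Z^T Z)^{-1} Z^T f.
    coef, *_ = np.linalg.lstsq(Z, f, rcond=None)
    return Z @ coef


def oproj_sketch(Z, f):
    # Residual of the projection: M_Z f = f - P_Z f.
    return f - proj_sketch(Z, f)


rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 3))
f = rng.normal(size=(50, 2))

f_proj = proj_sketch(Z, f)
f_oproj = oproj_sketch(Z, f)
```

Using a least-squares solve avoids forming the n-by-n projection matrix explicitly, which matters for large n.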

ivmodels.utils.to_numpy(*args): Convert input args to a numpy array.

Module contents

class ivmodels.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: KClassMixin, GeneralizedLinearRegressor

K-class estimator for instrumental variable regression.

The k-class estimator with parameter $\kappa$ is defined as

\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]

where $P_Z = Z (Z^T Z)^{-1} Z^T$ is the projection matrix onto the subspace spanned by $Z$ and $\mathrm{Id}$ is the identity matrix. This includes the ordinary least-squares (OLS) estimator ($\kappa = 0$), the two-stage least-squares (2SLS) estimator ($\kappa = 1$), the limited information maximum likelihood (LIML) estimator ($\kappa = \hat\kappa_\mathrm{LIML}$), and the Fuller estimators ($\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k - m_C)$) as special cases.

Specifying exogenous included regressors $C$ is equivalent to including them into both $Z$ and $X$.
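
The closed-form expression above can be sketched directly in NumPy. The function k_class_sketch and the simulated data are assumptions made for illustration (no intercept, no exogenous regressors $C$, no regularization); this is not the KClass implementation.

```python
import numpy as np


def k_class_sketch(X, y, Z, kappa):
    """Closed-form k-class estimate without intercept or exogenous regressors."""
    # P_Z X via least squares, avoiding the n x n projection matrix.
    PX = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # (kappa P_Z + (1 - kappa) Id) X; P_Z is symmetric, so the transpose of
    # this matrix equals X^T (kappa P_Z + (1 - kappa) Id).
    XK = kappa * PX + (1 - kappa) * X
    return np.linalg.solve(XK.T @ X, XK.T @ y)


# Hypothetical data with one endogenous regressor and three instruments.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))
V = rng.normal(size=(n, 1))
X = Z @ np.array([[1.0], [0.5], [-0.5]]) + V
# The noise in y is correlated with X through V, so OLS is biased.
y = X[:, 0] * 2.0 + V[:, 0] + rng.normal(size=n)

beta_ols = k_class_sketch(X, y, Z, kappa=0.0)
beta_tsls = k_class_sketch(X, y, Z, kappa=1.0)
```

Setting kappa=0 reproduces the OLS solution and kappa=1 reproduces 2SLS, that is, the regression of y on the first-stage fitted values $P_Z X$.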

Parameters:

kappa (float or { "ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If string, then must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then $\kappa = \hat\kappa_\mathrm{LIML}$ is used, where $\kappa_\mathrm{LIML} \geq 1$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ y)$, where $P_Z$ is the projection matrix onto the subspace spanned by $Z$ and $M_Z = Id - P_Z$. If exogenous included regressors $C$ are specified, then $\kappa_\mathrm{LIML}$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ y)$. If kappa="fuller(a)", then $\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_C)$, where $n$ is the number of observations, $k = \mathrm{dim}(Z)$ is the number of instruments, and $m_C = \mathrm{dim}(C)$ is the number of exogenous included regressors (plus one if fit_intercept=True). The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to $O(1/n)$ [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for $\kappa \leq 1$.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for $\kappa \leq 1$.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.
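
The definition of $\hat\kappa_\mathrm{LIML}$ in the description of kappa can be sketched as follows, assuming no exogenous regressors $C$ and no intercept, so that $m_C = 0$. The function kappa_liml_sketch and the simulated data are hypothetical and written for this example only.

```python
import numpy as np


def kappa_liml_sketch(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y)."""
    Xy = np.column_stack([X, y])
    # M_Z (X y): residuals from regressing the columns of (X y) on Z.
    resid = Xy - Z @ np.linalg.lstsq(Z, Xy, rcond=None)[0]
    # M_Z is symmetric and idempotent, so (X y)^T M_Z (X y) = resid^T resid.
    matrix = np.linalg.solve(resid.T @ resid, Xy.T @ Xy)
    # The eigenvalues are real in exact arithmetic; drop rounding noise.
    return float(np.min(np.linalg.eigvals(matrix).real))


# Hypothetical data with one endogenous regressor and three instruments.
rng = np.random.default_rng(0)
n, k = 500, 3
Z = rng.normal(size=(n, k))
V = rng.normal(size=(n, 1))
X = Z @ np.array([[1.0], [0.5], [-0.5]]) + V
y = X[:, 0] * 2.0 + V[:, 0] + rng.normal(size=n)

kappa_liml = kappa_liml_sketch(X, y, Z)

# Fuller correction with a = 1 and m_C = 0 (assumption: no C, no intercept).
kappa_fuller = kappa_liml - 1.0 / (n - k)
```

Since $(X \ y)^T (X \ y) = (X \ y)^T M_Z (X \ y) + (X \ y)^T P_Z (X \ y)$ and the last term is positive semidefinite, the smallest eigenvalue is at least 1, consistent with $\kappa_\mathrm{LIML} \geq 1$.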

coef_

The estimated coefficients for the linear regression problem.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type:: float

kappa_

The numerical kappa parameter of the k-class estimator.

Type:: float

fuller_alpha_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.

Type:: float

ar_min_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.

Type:: float

kappa_liml_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.

Type:: float

named_coef_

If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.

Type:: array-like, shape (n_features,)

References

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
Returns:: self – The updated object.
Return type:: object

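set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the score method.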

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object