ivmodels.models package

Submodules

ivmodels.models.anchor_regression module

class ivmodels.models.anchor_regression.AnchorMixin(gamma=1, instrument_names=None, instrument_regex=None, *args, **kwargs)

Bases: KClassMixin

Mixin class for anchor regression.

property gamma

class ivmodels.models.anchor_regression.AnchorRegression(gamma=1, instrument_names=None, instrument_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: AnchorMixin, GeneralizedLinearRegressor

Linear regression with anchor regularization [Rothenhäusler et al., 2021].

The anchor regression estimator with parameter $\gamma$ is defined as

\[\hat\beta_\mathrm{anchor}(\gamma) := \arg\min_\beta \ \| y - X \beta \|_2^2 + (\gamma - 1) \|P_Z (y - X \beta) \|_2^2.\]

If $\gamma > 0$, then $\hat\beta_\mathrm{anchor}(\gamma) = \hat\beta_\mathrm{k-class}((\gamma - 1) / \gamma)$.

The optimization is based on OLS after a data transformation. X and y are first centered by subtracting the column means, as proposed by Rothenhäusler et al. [2021]. Consequently, no anchor regularization is applied to the intercept.
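The equivalence with the k-class estimator and the "OLS after a data transformation" remark can be checked numerically. The following is a minimal numpy sketch, not the library's implementation: the helper names (`projection`, `anchor_via_transform`, `kclass_closed_form`) and the synthetic data are assumptions made for illustration. It uses the transformation $W = \mathrm{Id} + (\sqrt{\gamma} - 1) P_Z$, for which $\| W (y - X \beta) \|_2^2$ equals the anchor objective.

```python
import numpy as np


def projection(Z):
    """Projection matrix P_Z = Z (Z^T Z)^{-1} Z^T onto the column space of Z."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def anchor_via_transform(X, y, Z, gamma):
    """Anchor regression as OLS on data transformed by W = Id + (sqrt(gamma) - 1) P_Z."""
    W = np.eye(X.shape[0]) + (np.sqrt(gamma) - 1.0) * projection(Z)
    beta, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return beta


def kclass_closed_form(X, y, Z, kappa):
    """k-class estimator from its closed-form expression."""
    A = kappa * projection(Z) + (1.0 - kappa) * np.eye(X.shape[0])
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


# Synthetic data (illustrative only): anchors Z and a hidden confounder H.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 2))
H = rng.normal(size=n)
X = Z @ np.array([[1.0], [-0.5]]) + H[:, None] + rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + H + rng.normal(size=n)

# Center all variables so that the intercept is not regularized.
X, y, Z = X - X.mean(axis=0), y - y.mean(), Z - Z.mean(axis=0)

gamma = 5.0
beta_anchor = anchor_via_transform(X, y, Z, gamma)
beta_kclass = kclass_closed_form(X, y, Z, (gamma - 1.0) / gamma)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With these definitions, `beta_anchor` and `beta_kclass` should agree up to floating-point error, and `gamma=1` should reproduce `beta_ols`.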

Parameters:

gamma (float) – The anchor regularization parameter. gamma=1 corresponds to OLS.
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.

coef_

The estimated coefficients for the linear regression problem.

Type: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type: float

kappa_

The kappa parameter of the corresponding k-class estimator.

Type: float

References

[RMBP21]

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: heterogeneous data meet causality. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

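set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the score method.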

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

ivmodels.models.kclass module

class ivmodels.models.kclass.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: KClassMixin, GeneralizedLinearRegressor

K-class estimator for instrumental variable regression.

The k-class estimator with parameter $\kappa$ is defined as

\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]

where $P_Z = Z (Z^T Z)^{-1} Z^T$ is the projection matrix onto the subspace spanned by $Z$ and $\mathrm{Id}$ is the identity matrix. This includes the ordinary least-squares (OLS) estimator ($\kappa = 0$), the two-stage least-squares (2SLS) estimator ($\kappa = 1$), the limited information maximum likelihood (LIML) estimator ($\kappa = \hat\kappa_\mathrm{LIML}$), and the Fuller estimators ($\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k - m_C)$) as special cases.

Specifying exogenous included regressors $C$ is equivalent to including them into both $Z$ and $X$.
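The closed-form expression can be evaluated directly to confirm the two fixed special cases. This is a hedged numpy sketch on synthetic data, not the library's code path; the function name `kclass` and the data-generating process are assumptions for illustration only.

```python
import numpy as np


def kclass(X, y, Z, kappa):
    """Closed-form k-class estimator for a numeric kappa."""
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = kappa * P_Z + (1.0 - kappa) * np.eye(X.shape[0])
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


# Synthetic data (illustrative only): three instruments, two endogenous regressors.
rng = np.random.default_rng(1)
n = 300
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)  # hidden confounder
X = Z @ rng.normal(size=(3, 2)) + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + H + rng.normal(size=n)

# Reference OLS estimate.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Reference 2SLS estimate: regress y on the first-stage fitted values P_Z X.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_tsls, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

beta_k0 = kclass(X, y, Z, 0.0)
beta_k1 = kclass(X, y, Z, 1.0)
```

Here `beta_k0` should match `beta_ols` and `beta_k1` should match `beta_tsls` up to floating-point error.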

Parameters:

kappa (float or {"ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If a string, it must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least-squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then $\kappa = \hat\kappa_\mathrm{LIML}$ is used, where $\hat\kappa_\mathrm{LIML} \geq 1$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)$, where $P_Z$ is the projection matrix onto the subspace spanned by $Z$ and $M_Z = \mathrm{Id} - P_Z$. If exogenous included regressors $C$ are specified, then $\hat\kappa_\mathrm{LIML}$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)$. If kappa="fuller(a)", then $\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_C)$, where $n$ is the number of observations, $k = \mathrm{dim}(Z)$ is the number of instruments, and $m_C = \mathrm{dim}(C)$ is the number of exogenous included regressors (plus one if fit_intercept=True). The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to $O(1/n)$ [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for $\kappa \leq 1$.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for $\kappa \leq 1$.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.
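The string options for kappa map to numeric values through the eigenvalue formula in the kappa entry above. The following numpy sketch illustrates that mapping under simplifying assumptions: no exogenous regressors and no intercept, so $m_C = 0$. The function names `kappa_liml` and `kappa_fuller` are hypothetical and not part of the library.

```python
import numpy as np


def kappa_liml(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y)."""
    W = np.column_stack([X, y])
    W_orth = W - Z @ np.linalg.lstsq(Z, W, rcond=None)[0]  # M_Z (X y)
    # M_Z is symmetric and idempotent, so W^T M_Z W = W_orth^T W_orth.
    eigvals = np.linalg.eigvals(np.linalg.solve(W_orth.T @ W_orth, W.T @ W))
    return float(eigvals.real.min())


def kappa_fuller(X, y, Z, a=1.0, m_C=0):
    """Fuller correction kappa_LIML - a / (n - k - m_C)."""
    n, k = Z.shape
    return kappa_liml(X, y, Z) - a / (n - k - m_C)


# Synthetic over-identified data (illustrative only): 3 instruments, 1 regressor.
rng = np.random.default_rng(2)
n = 300
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)  # hidden confounder
X = Z @ np.array([[1.0], [0.5], [-0.5]]) + H[:, None] + rng.normal(size=(n, 1))
y = X[:, 0] + H + rng.normal(size=n)

k_liml = kappa_liml(X, y, Z)
k_fuller = kappa_fuller(X, y, Z, a=1.0)
```

In this sketch `k_liml` is at least 1, and `k_fuller` lies below it by exactly `1 / (n - k)`.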

coef_

The estimated coefficients for the linear regression problem.

Type: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type: float

kappa_

The numerical kappa parameter of the k-class estimator.

Type: float

fuller_alpha_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.

Type: float

ar_min_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.

Type: float

kappa_liml_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.

Type: float

named_coefs_

If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.

Type: array-like, shape (n_features,)

References

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

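set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the score method.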

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class ivmodels.models.kclass.KClassMixin(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, *args, **kwargs)

Bases: object

Mixin class for k-class estimators.

static ar_min(X, y, Z=None, X_proj=None, y_proj=None)

Compute the minimum of the unnormalized Anderson-Rubin statistic.

Computes

\[\begin{split}&\min_{\beta} \frac{(y - X \beta)^T P_Z (y - X \beta)}{(y - X \beta)^T M_Z (y - X \beta)} \\ &=\lambda_\mathrm{min}(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T P_Z (X \ \ y)),\end{split}\]

where $P_Z$ is the projection matrix onto the subspace spanned by $Z$ and $M_Z = \mathrm{Id} - P_Z$.

Either Z or both X_proj and y_proj must be specified.

Parameters:

X (np.ndarray of dimension (n, mx)) – Possibly endogenous regressors.
y (np.ndarray of dimension (n,)) – Outcome.
Z (np.ndarray of dimension (n, k), optional, default=None) – Instruments.
X_proj (np.ndarray of dimension (n, mx), optional, default=None) – Projection of X onto the subspace spanned by Z.
y_proj (np.ndarray of dimension (n,), optional, default=None) – Projection of y onto the subspace spanned by Z.

Returns:

ar_min – The smallest eigenvalue of $((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T P_Z (X \ \ y)$, where $P_Z$ is the projection matrix onto the subspace spanned by Z and $M_Z = \mathrm{Id} - P_Z$.

Return type:

float
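The identity between the minimized ratio and the smallest eigenvalue can be checked numerically. The sketch below is an independent numpy illustration, not the library's `ar_min`: it takes Z directly, does not use X_proj or y_proj, and the helper `ar_ratio` and the synthetic data are assumptions. It relies on the standard fact that the k-class estimator with $\kappa = 1 + \mathrm{ar\_min}$ (the LIML estimator) attains the minimum.

```python
import numpy as np


def ar_min(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T P_Z (X y)."""
    W = np.column_stack([X, y])
    W_proj = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]  # P_Z (X y)
    W_orth = W - W_proj                                # M_Z (X y)
    eigvals = np.linalg.eigvals(
        np.linalg.solve(W_orth.T @ W_orth, W_proj.T @ W_proj)
    )
    return float(eigvals.real.min())


def ar_ratio(beta, X, y, Z):
    """Unnormalized Anderson-Rubin ratio r^T P_Z r / r^T M_Z r at a given beta."""
    r = y - X @ beta
    r_proj = Z @ np.linalg.lstsq(Z, r, rcond=None)[0]
    r_orth = r - r_proj
    return float(r_proj @ r_proj) / float(r_orth @ r_orth)


# Synthetic data (illustrative only): 4 instruments, 2 endogenous regressors.
rng = np.random.default_rng(3)
n = 300
Z = rng.normal(size=(n, 4))
H = rng.normal(size=n)  # hidden confounder
X = Z @ rng.normal(size=(4, 2)) + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, 2.0]) + H + rng.normal(size=n)

lam = ar_min(X, y, Z)

# k-class estimate at kappa = 1 + ar_min, using
# X^T (Id - kappa M_Z) X and X^T (Id - kappa M_Z) y.
kappa = 1.0 + lam
X_orth = X - Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
lhs = X.T @ X - kappa * X_orth.T @ X_orth
rhs = X.T @ y - kappa * X_orth.T @ y
beta_liml = np.linalg.solve(lhs, rhs)
```

Evaluating `ar_ratio` at `beta_liml` should reproduce `lam`, and any other coefficient vector should give a ratio at least as large.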

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a k-class or anchor regression estimator.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

property named_coefs_

predict(X, C=None, *args, **kwargs)

summary(X, y, Z=None, C=None, test='wald', alpha=0.05, feature_names=None, **kwargs)

Create Summary object for the fitted model.

This contains the fitted values (estimates), subvector test statistics for each parameter, corresponding p-values, and confidence sets.

Parameters:

X (array-like, shape (n_samples, n_features)) – The input data.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.
test (str, optional, default="wald") – The test to use. Must be one of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".
alpha (float, optional, default=0.05) – The significance level.
feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.
**kwargs – Additional keyword arguments to pass to the test and its inversion.

ivmodels.models.pulse module

class ivmodels.models.pulse.PULSE(instrument_names=None, instrument_regex=None, p_min=0.05, rtol=0.01, kappa_max=1, alpha=0, l1_ratio=0)

Bases: PULSEMixin, KClass

p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].

Perform k-class estimation with k-class parameter $\kappa \in [0, \kappa_\mathrm{max}]$ chosen minimally such that the PULSE test of correlation between the instruments and the residuals is not significant at level p_min.

Parameters:

instrument_names (str or list of str, optional) – The names of the columns in X that should be used as anchors. Requires X to be a pandas DataFrame.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as anchors. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
p_min (float, optional, default = 0.05) – The p-value of the PULSE test that is used to determine the k-class parameter $\kappa$. The PULSE uses binary search to find the smallest $\kappa$ for which the test is not significant at level p_min.
rtol (float, optional, default = 0.01) – The relative tolerance of the binary search. The search stops at a $\kappa$ for which the PULSE test is not significant at level p_min but is significant at level p_min * (1 + rtol).
kappa_max (float, optional, default = 1) – The maximum value of kappa to consider. The PULSE uses binary search to find the smallest kappa for which the test is not significant at level p_min. If kappa_max = 1, the PULSE will run a regression equivalent to two-stage least-squares. If alpha = 0 and Z.shape[1] < X.shape[1], this is not well-defined and the PULSE will raise an exception.
alpha (float, optional, default = 0) – The regularization parameter for elastic net. If alpha is 0, the estimator is unregularized.
l1_ratio (float, optional, default = 0) – The ratio of L1 to L2 regularization for elastic net. If l1_ratio is 1, the estimator is Lasso. If l1_ratio is 0, the estimator is Ridge.
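The interplay of p_min, rtol, and kappa_max can be sketched as a bisection over $\kappa$. This is an illustrative numpy sketch, not the library's implementation. It assumes the test statistic $T = n \|P_Z r\|_2^2 / \|r\|_2^2$ with a chi-squared reference distribution whose degrees of freedom equal the number of instruments, and it is restricted to exactly two instruments so that the p-value has the closed form $\exp(-T/2)$. The function names, the error raised when the test still rejects at kappa_max, and the synthetic data are all assumptions.

```python
import math

import numpy as np


def kclass(X, y, Z, kappa):
    """Closed-form k-class estimator for a numeric kappa."""
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = kappa * P_Z + (1.0 - kappa) * np.eye(X.shape[0])
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


def pulse_p_value(beta, X, y, Z):
    """p-value of T = n ||P_Z r||^2 / ||r||^2 under chi-squared with 2 dof."""
    if Z.shape[1] != 2:
        raise ValueError("closed form exp(-T/2) requires exactly two instruments")
    r = y - X @ beta
    r_proj = Z @ np.linalg.lstsq(Z, r, rcond=None)[0]
    T = len(r) * float(r_proj @ r_proj) / float(r @ r)
    return math.exp(-T / 2.0)


def pulse_kappa(X, y, Z, p_min=0.05, rtol=0.01, kappa_max=1.0, max_iter=100):
    """Bisection for a kappa with p_min <= p(kappa) < p_min * (1 + rtol)."""

    def p_of(kappa):
        return pulse_p_value(kclass(X, y, Z, kappa), X, y, Z)

    if p_of(0.0) >= p_min:
        return 0.0  # OLS already passes the test
    if p_of(kappa_max) < p_min:
        # Assumption: treat this case as an error in this sketch.
        raise ValueError("test is significant even at kappa_max")
    lo, hi = 0.0, kappa_max
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        p_mid = p_of(mid)
        if p_mid < p_min:
            lo = mid  # still significant: move towards kappa_max
        elif p_mid >= p_min * (1.0 + rtol):
            hi = mid  # comfortably not significant: try a smaller kappa
        else:
            return mid
    return hi


# Synthetic, exactly identified data (illustrative only) with strong confounding.
rng = np.random.default_rng(4)
n = 500
Z = rng.normal(size=(n, 2))
H = rng.normal(size=n)  # hidden confounder
X = Z + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -1.0]) + 3.0 * H + rng.normal(size=n)

kappa_hat = pulse_kappa(X, y, Z)
p_hat = pulse_p_value(kclass(X, y, Z, kappa_hat), X, y, Z)
```

Because the statistic decreases as $\kappa$ grows towards 1, the bisection keeps a bracket with a significant test on the left and a non-significant test on the right, which is why a smaller `rtol` only tightens the final p-value window.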

coef_

The estimated coefficients.

Type: array-like, shape (n_features,)

intercept_

The estimated intercept.

Type: float

kappa_

The estimated kappa.

Type: float

References

[JP22]

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of k-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

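set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the score method.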

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class ivmodels.models.pulse.PULSEMixin(p_min=0.05, rtol=0.01, kappa_max=1, **kwargs)

Bases: object

Mixin class for PULSE estimators.

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a p-uncorrelated least squares estimator (PULSE).

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values.
Z (array-like, shape (n_samples, n_anchors), optional) – The instrument (anchor) values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – Exogenous included variables. Must be None or have zero columns, as PULSE does not support exogenous included variables.

Module contents

class ivmodels.models.AnchorRegression(gamma=1, instrument_names=None, instrument_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: AnchorMixin, GeneralizedLinearRegressor

Linear regression with anchor regularization [Rothenhäusler et al., 2021].

The anchor regression estimator with parameter $\gamma$ is defined as

\[\hat\beta_\mathrm{anchor}(\gamma) := \arg\min_\beta \ \| y - X \beta \|_2^2 + (\gamma - 1) \|P_Z (y - X \beta) \|_2^2.\]

If $\gamma > 0$, then $\hat\beta_\mathrm{anchor}(\gamma) = \hat\beta_\mathrm{k-class}((\gamma - 1) / \gamma)$.

The optimization is based on OLS after a data transformation. X and y are first centered by subtracting the column means, as proposed by Rothenhäusler et al. [2021]. Consequently, no anchor regularization is applied to the intercept.

Parameters:

gamma (float) – The anchor regularization parameter. gamma=1 corresponds to OLS.
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.

coef_

The estimated coefficients for the linear regression problem.

Type: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type: float

kappa_

The kappa parameter of the corresponding k-class estimator.

Type: float

References

[RMBP21]

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: heterogeneous data meet causality. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

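set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → AnchorRegression

Configure whether metadata should be requested to be passed to the score method.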

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class ivmodels.models.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Bases: KClassMixin, GeneralizedLinearRegressor

K-class estimator for instrumental variable regression.

The k-class estimator with parameter $\kappa$ is defined as

\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]

where $P_Z = Z (Z^T Z)^{-1} Z^T$ is the projection matrix onto the subspace spanned by $Z$ and $\mathrm{Id}$ is the identity matrix. This includes the ordinary least-squares (OLS) estimator ($\kappa = 0$), the two-stage least-squares (2SLS) estimator ($\kappa = 1$), the limited information maximum likelihood (LIML) estimator ($\kappa = \hat\kappa_\mathrm{LIML}$), and the Fuller estimators ($\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k - m_C)$) as special cases.

Specifying exogenous included regressors $C$ is equivalent to including them into both $Z$ and $X$.
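
The closed form above can be checked numerically. The following is a minimal NumPy sketch, not the ivmodels implementation: it assumes no intercept and no exogenous included regressors $C$, and the function name k_class and the simulated data are illustrative only. It confirms that $\kappa = 0$ reproduces OLS and $\kappa = 1$ reproduces 2SLS.

```python
import numpy as np


def k_class(X, y, Z, kappa):
    """Closed-form k-class estimate (no intercept, no exogenous regressors C)."""
    # P_Z X, obtained by regressing X on Z instead of forming the n x n projection.
    PZ_X = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # (kappa * P_Z + (1 - kappa) * Id) X; the weighting matrix is symmetric.
    X_kappa = kappa * PZ_X + (1 - kappa) * X
    return np.linalg.solve(X_kappa.T @ X, X_kappa.T @ y)


rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 3))  # instruments
U = rng.normal(size=n)  # unobserved confounder
X = Z @ rng.normal(size=(3, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -2.0]) + 2 * U + rng.normal(size=n)

# Reference estimates computed directly.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage fitted values
beta_tsls = np.linalg.lstsq(X_hat, y, rcond=None)[0]  # second stage

beta_k0 = k_class(X, y, Z, kappa=0.0)
beta_k1 = k_class(X, y, Z, kappa=1.0)
```

Values of kappa between 0 and 1 interpolate between the two estimators.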

Parameters:

kappa (float or { "ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If a string, it must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least-squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then $\kappa = \hat\kappa_\mathrm{LIML}$ is used, where $\hat\kappa_\mathrm{LIML} \geq 1$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)$, where $P_Z$ is the projection matrix onto the subspace spanned by $Z$ and $M_Z = \mathrm{Id} - P_Z$. If exogenous included regressors $C$ are specified, then $\hat\kappa_\mathrm{LIML}$ is the smallest eigenvalue of the matrix $((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)$. If kappa="fuller(a)", then $\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_C)$, where $n$ is the number of observations, $k = \mathrm{dim}(Z)$ is the number of instruments, and $m_C = \mathrm{dim}(C)$ is the number of exogenous included regressors (plus one if fit_intercept=True). The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to $O(1/n)$ [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for $\kappa \leq 1$.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for $\kappa \leq 1$.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.
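
The eigenvalue characterization of $\hat\kappa_\mathrm{LIML}$ and the Fuller correction from the kappa entry above can be sketched in NumPy. This is an illustration under simplifying assumptions, not the library's code: there is no intercept and no $C$, so $m_C = 0$, and the helper names residualize and kappa_liml are hypothetical.

```python
import numpy as np


def residualize(A, Z):
    """Return M_Z A, the part of A orthogonal to the columns of Z."""
    return A - Z @ np.linalg.lstsq(Z, A, rcond=None)[0]


def kappa_liml(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y)."""
    W = np.column_stack([X, y])
    MZ_W = residualize(W, Z)
    # M_Z is symmetric and idempotent, so W^T M_Z W = (M_Z W)^T (M_Z W).
    eigenvalues = np.linalg.eigvals(np.linalg.solve(MZ_W.T @ MZ_W, W.T @ W))
    # The eigenvalues are real in theory; drop any numerical imaginary part.
    return float(np.min(eigenvalues.real))


rng = np.random.default_rng(0)
n, k = 400, 4
Z = rng.normal(size=(n, k))  # instruments
U = rng.normal(size=n)  # unobserved confounder
X = Z @ rng.normal(size=(k, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([0.5, 1.5]) + U + rng.normal(size=n)

kappa = kappa_liml(X, y, Z)
# Fuller correction with a = 1 and m_C = 0 (assumed: no C, no intercept).
kappa_fuller = kappa - 1.0 / (n - k)
```

Both values can then be plugged into the k-class formula in place of a fixed $\kappa$.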

coef_

The estimated coefficients for the linear regression problem.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type:: float

kappa_

The numerical kappa parameter of the k-class estimator.

Type:: float

fuller_alpha_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.

Type:: float

ar_min_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.

Type:: float

kappa_liml_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.

Type:: float
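
The relation kappa_liml_ = 1 + ar_min_ can be checked numerically. The sketch below assumes that the unnormalized Anderson-Rubin statistic at $\beta$ is $\|P_Z (y - X \beta)\|_2^2 / \|M_Z (y - X \beta)\|_2^2$ and that its minimum is attained at the LIML estimate; it uses no intercept and no $C$, and all variable names are illustrative rather than taken from ivmodels.

```python
import numpy as np


def project(A, Z):
    """Return P_Z A, the projection of A onto the column space of Z."""
    return Z @ np.linalg.lstsq(Z, A, rcond=None)[0]


rng = np.random.default_rng(1)
n, k = 400, 4
Z = rng.normal(size=(n, k))  # instruments
U = rng.normal(size=n)  # unobserved confounder
X = Z @ rng.normal(size=(k, 2)) + U[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([0.5, 1.5]) + U + rng.normal(size=n)

# kappa_liml: smallest eigenvalue of (W^T M_Z W)^{-1} W^T W with W = (X y).
W = np.column_stack([X, y])
PZ_W = project(W, Z)
MZ_W = W - PZ_W
eigenvalues = np.linalg.eigvals(np.linalg.solve(MZ_W.T @ MZ_W, W.T @ W))
kappa_liml = float(np.min(eigenvalues.real))

# LIML estimate: the k-class closed form evaluated at kappa_liml.
X_kappa = kappa_liml * PZ_W[:, :-1] + (1 - kappa_liml) * X
beta_liml = np.linalg.solve(X_kappa.T @ X, X_kappa.T @ y)

# Assumed unnormalized Anderson-Rubin statistic at the LIML estimate.
residual = y - X @ beta_liml
PZ_residual = project(residual, Z)
MZ_residual = residual - PZ_residual
ar_min = (PZ_residual @ PZ_residual) / (MZ_residual @ MZ_residual)
```

Under these assumptions kappa_liml and 1 + ar_min agree up to floating-point error.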

named_coef_

If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.

Type:: array-like, shape (n_features,)

References

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

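set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → KClass

Configure whether metadata should be requested to be passed to the score method.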

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class ivmodels.models.PULSE(instrument_names=None, instrument_regex=None, p_min=0.05, rtol=0.01, kappa_max=1, alpha=0, l1_ratio=0)

Bases: PULSEMixin, KClass

p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].

Perform k-class estimation with k-class parameter $\kappa \in [0, \kappa_\mathrm{max}]$ chosen minimally such that the PULSE test of correlation between the instruments and the residuals is not significant at level p_min.

Parameters:

instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
p_min (float, optional, default = 0.05) – The p-value threshold of the PULSE test that is used to determine the k-class parameter $\kappa$. The PULSE uses binary search to find the smallest $\kappa$ for which the test is not significant at level p_min.
rtol (float, optional, default = 0.01) – The relative tolerance of the binary search. The search stops at a $\kappa$ for which the PULSE test is not significant at level p_min but is significant at level p_min * (1 + rtol).
kappa_max (float, optional, default = 1) – The maximum value of kappa to consider in the binary search. If kappa_max = 1, the PULSE will run a regression equivalent to two-stage least-squares. If alpha = 0 and Z.shape[1] < X.shape[1], this is not well-defined and the PULSE will raise an exception.
alpha (float, optional, default = 0) – The regularization parameter for elastic net. If alpha is 0, the estimator is unregularized.
l1_ratio (float, optional, default = 0) – The ratio of L1 to L2 regularization for elastic net. If l1_ratio is 1, the estimator is Lasso. If l1_ratio is 0, the estimator is Ridge.
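
The roles of p_min, rtol and kappa_max can be illustrated with a small binary search. This is a sketch under stated assumptions, not the ivmodels implementation: the test statistic is assumed to be $n \|P_Z r\|_2^2 / \|r\|_2^2$ for residuals $r = y - X \beta$, compared against a $\chi^2$ distribution with $\mathrm{dim}(Z)$ degrees of freedom, the p-value is assumed to increase with $\kappa$, and there is no intercept or regularization. The names project, k_class, pulse_p_value, pulse_kappa and max_iter are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def project(A, Z):
    """Return P_Z A, the projection of A onto the column space of Z."""
    return Z @ np.linalg.lstsq(Z, A, rcond=None)[0]


def k_class(X, y, Z, kappa):
    """Closed-form k-class estimate without intercept."""
    X_kappa = kappa * project(X, Z) + (1 - kappa) * X
    return np.linalg.solve(X_kappa.T @ X, X_kappa.T @ y)


def pulse_p_value(X, y, Z, beta):
    """Assumed PULSE test: n * ||P_Z r||^2 / ||r||^2 against chi2(dim(Z))."""
    r = y - X @ beta
    PZ_r = project(r, Z)
    statistic = len(y) * (PZ_r @ PZ_r) / (r @ r)
    return chi2.sf(statistic, df=Z.shape[1])


def pulse_kappa(X, y, Z, p_min=0.05, rtol=0.01, kappa_max=1.0, max_iter=100):
    """Smallest kappa (up to rtol) whose p-value is at least p_min."""

    def p_value(kappa):
        return pulse_p_value(X, y, Z, k_class(X, y, Z, kappa))

    if p_value(0.0) >= p_min:
        return 0.0  # OLS already passes the test
    if p_value(kappa_max) < p_min:
        raise ValueError("The test is significant even at kappa_max.")
    # Invariant: p_value(low) < p_min <= p_value(high).
    low, high = 0.0, kappa_max
    for _ in range(max_iter):
        if p_value(high) < p_min * (1 + rtol):
            break
        mid = (low + high) / 2
        if p_value(mid) >= p_min:
            high = mid
        else:
            low = mid
    return high


rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 1))  # a single instrument
U = rng.normal(size=n)  # unobserved confounder
X = Z + U[:, None]
y = X[:, 0] + 2 * U + 0.5 * rng.normal(size=n)

kappa = pulse_kappa(X, y, Z)
p_at_kappa = pulse_p_value(X, y, Z, k_class(X, y, Z, kappa))
```

In this simulated example OLS is strongly rejected while the just-identified 2SLS fit is not, so the search returns a kappa strictly between 0 and kappa_max.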

coef_

The estimated coefficients.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept.

Type:: float

kappa_

The estimated kappa.

Type:: float

References

[JP22]

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of k-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022.

set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the predict method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.

Returns:

self – The updated object.

Return type:

object

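set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → PULSE

Configure whether metadata should be requested to be passed to the score method.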

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object