ivmodels.models package
Submodules
ivmodels.models.anchor_regression module
- class ivmodels.models.anchor_regression.AnchorMixin(gamma=1, instrument_names=None, instrument_regex=None, *args, **kwargs)
Bases: KClassMixin
Mixin class for anchor regression.
- property gamma
- class ivmodels.models.anchor_regression.AnchorRegression(gamma=1, instrument_names=None, instrument_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)
Bases: AnchorMixin, GeneralizedLinearRegressor
Linear regression with anchor regularization [Rothenhäusler et al., 2021].
The anchor regression estimator with parameter \(\gamma\) is defined as
\[\hat\beta_\mathrm{anchor}(\gamma) := \arg\min_\beta \ \| y - X \beta \|_2^2 + (\gamma - 1) \|P_Z (y - X \beta) \|_2^2.\]
If \(\gamma > 0\), then \(\hat\beta_\mathrm{anchor}(\gamma) = \hat\beta_\mathrm{k-class}((\gamma - 1) / \gamma)\).
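The equivalence with the k-class estimator can be checked numerically. The NumPy sketch below is an illustration under stated assumptions, not the ivmodels implementation: it fits no intercept, forms dense n x n projection matrices (fine only for small n), and computes the anchor estimator as OLS on data premultiplied by \(\mathrm{Id} + (\sqrt{\gamma} - 1) P_Z\), a transformation chosen here because it turns the anchor objective into an ordinary least-squares objective for any positive \(\gamma\). The helper names projection, anchor_via_ols and kclass_closed_form are hypothetical.

```python
import numpy as np


def projection(Z):
    """Projection matrix onto the column span of Z (dense n x n, small n only)."""
    return Z @ np.linalg.solve(Z.T @ Z, Z.T)


def anchor_via_ols(X, y, Z, gamma):
    """Anchor regression without intercept, computed as OLS on transformed data."""
    P_Z = projection(Z)
    # W = Id + (sqrt(gamma) - 1) P_Z satisfies
    # ||W r||^2 = ||r||^2 + (gamma - 1) ||P_Z r||^2 for every residual vector r.
    W = np.eye(len(y)) + (np.sqrt(gamma) - 1.0) * P_Z
    coef, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)
    return coef


def kclass_closed_form(X, y, Z, kappa):
    """(X^T A X)^{-1} X^T A y with A = kappa P_Z + (1 - kappa) Id."""
    A = kappa * projection(Z) + (1.0 - kappa) * np.eye(len(y))
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


# Simulated data with a hidden confounder H (made up for illustration).
rng = np.random.default_rng(0)
n = 300
Z = rng.normal(size=(n, 2))
H = rng.normal(size=n)
X = Z @ np.array([[1.0, 0.5], [-0.5, 1.0]]) + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([1.0, -2.0]) + H + rng.normal(size=n)

gamma = 5.0
beta_anchor = anchor_via_ols(X, y, Z, gamma)
beta_kclass = kclass_closed_form(X, y, Z, (gamma - 1.0) / gamma)
# beta_anchor and beta_kclass agree up to floating-point rounding.
```

With \(\gamma = 1\) the transformation is the identity and the sketch reduces to OLS.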
The optimization is based on OLS after a data transformation. It first centers X and y by subtracting their column means, as proposed by Rothenhäusler et al. [2021]. Consequently, no anchor regularization is applied to the intercept.
- Parameters:
gamma (float) – The anchor regularization parameter. gamma=1 corresponds to OLS.
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty.
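The union rule for instrument_names and instrument_regex can be illustrated with a small column-selection helper. This is a hypothetical sketch, not part of ivmodels: it assumes the regex is applied with re.search (the same semantics as pandas DataFrame.filter(regex=...)), that unknown names raise an error, and that the remaining columns act as regressors; the library may differ on any of these points.

```python
import re


def select_columns(columns, names=None, regex=None):
    """Return the columns listed in names or matched by regex, in column order.

    Hypothetical helper illustrating the union rule; not part of ivmodels.
    """
    if isinstance(names, str):
        names = [names]
    names = set(names or [])
    missing = names.difference(columns)
    if missing:
        raise ValueError(f"Columns not found: {sorted(missing)}")
    pattern = re.compile(regex) if regex is not None else None
    return [
        col
        for col in columns
        if col in names or (pattern is not None and pattern.search(col) is not None)
    ]


columns = ["x1", "x2", "z1", "z2", "w"]
instruments = select_columns(columns, names="w", regex=r"^z")   # ["z1", "z2", "w"]
regressors = [col for col in columns if col not in instruments]  # ["x1", "x2"]
```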
- coef_
The estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept for the linear regression problem.
- Type:
float
- kappa_
The kappa parameter of the corresponding k-class estimator.
- Type:
float
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → AnchorRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → AnchorRegression
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → AnchorRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
ivmodels.models.kclass module
- class ivmodels.models.kclass.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)
Bases: KClassMixin, GeneralizedLinearRegressor
K-class estimator for instrumental variable regression.
The k-class estimator with parameter \(\kappa\) is defined as
\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]
where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the projection matrix onto the subspace spanned by \(Z\) and \(\mathrm{Id}\) is the identity matrix. This includes the ordinary least-squares (OLS) estimator (\(\kappa = 0\)), the two-stage least-squares (2SLS) estimator (\(\kappa = 1\)), the limited information maximum likelihood (LIML) estimator (\(\kappa = \hat\kappa_\mathrm{LIML}\)), and the Fuller estimators (\(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k)\), where \(k\) is the number of instruments) as special cases.
Specifying exogenous included regressors \(C\) is equivalent to including them in both \(Z\) and \(X\).
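The closed form above can be sketched in a few lines of NumPy. This is an illustration, not the ivmodels implementation: it assumes a numeric kappa, fits no intercept, builds a dense n x n projection matrix, and handles exogenous regressors by appending C to both X and Z, as the equivalence stated above suggests. The function name kclass is hypothetical.

```python
import numpy as np


def kclass(X, y, Z, kappa, C=None):
    """Sketch of the k-class estimator for a numeric kappa (no intercept).

    Exogenous regressors C, if given, are appended to both X and Z, so the
    returned vector holds the coefficients of the columns of [X, C].
    """
    if C is not None:
        X = np.hstack([X, C])
        Z = np.hstack([Z, C])
    n = len(y)
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)          # projection onto span(Z)
    A = kappa * P_Z + (1.0 - kappa) * np.eye(n)
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


rng = np.random.default_rng(1)
n = 200
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)                               # unobserved confounder
X = Z @ rng.normal(size=(3, 1)) + H[:, None] + rng.normal(size=(n, 1))
y = 1.5 * X[:, 0] + H + rng.normal(size=n)

beta_ols = kclass(X, y, Z, kappa=0.0)                # kappa = 0: OLS
beta_2sls = kclass(X, y, Z, kappa=1.0)               # kappa = 1: 2SLS
```

For \(\kappa = 1\) the matrix A equals \(P_Z\), so the estimate coincides with regressing y on the first-stage fitted values \(P_Z X\).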
- Parameters:
kappa (float or {"ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If a string, it must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least-squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then \(\kappa = \hat\kappa_\mathrm{LIML}\) is used, where \(\hat\kappa_\mathrm{LIML} \geq 1\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)\), where \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\) and \(M_Z = \mathrm{Id} - P_Z\). If exogenous included regressors \(C\) are specified, then \(\hat\kappa_\mathrm{LIML}\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)\). If kappa="fuller(a)", then \(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_c)\), where \(n\) is the number of observations, \(k = \mathrm{dim}(Z)\) is the number of instruments, and \(m_c\) is the number of exogenous included regressors \(C\). The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to \(O(1/n)\) [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires the X argument of the fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires the X argument of the fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires the X argument of the fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires the X argument of the fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for \(\kappa \leq 1\).
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for \(\kappa \leq 1\).
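The string options for kappa can be mapped to numbers as in the following sketch. It follows the rules in the kappa description but is not the library's parser: resolve_kappa and kappa_liml are hypothetical names, matching is case-sensitive, and only the case without exogenous regressors C is covered, so the Fuller denominator reduces to \(n - k\).

```python
import re

import numpy as np


def kappa_liml(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y), no C."""
    W = np.column_stack([X, y])
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    W_perp = W - P_Z @ W                  # M_Z (X y); W_perp^T W = W^T M_Z W
    eigenvalues = np.linalg.eigvals(np.linalg.solve(W_perp.T @ W, W.T @ W))
    return float(np.min(np.real(eigenvalues)))


def resolve_kappa(kappa, X, y, Z):
    """Map the kappa argument to a number (sketch, no exogenous regressors)."""
    if not isinstance(kappa, str):
        return float(kappa)
    if kappa == "ols":
        return 0.0
    if kappa in ("tsls", "2sls"):
        return 1.0
    if kappa == "liml":
        return kappa_liml(X, y, Z)
    match = re.fullmatch(r"fuller(?:\((.+)\))?", kappa)
    if match:
        # "fuller" without an argument means a = 1.
        a = float(match.group(1)) if match.group(1) is not None else 1.0
        n, k = Z.shape
        return kappa_liml(X, y, Z) - a / (n - k)
    raise ValueError(f"Unknown kappa: {kappa!r}")
```

For example, resolve_kappa("fuller(4)", X, y, Z) returns \(\hat\kappa_\mathrm{LIML} - 4 / (n - k)\).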
- coef_
The estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept for the linear regression problem.
- Type:
float
- kappa_
The numerical kappa parameter of the k-class estimator.
- Type:
float
- fuller_alpha_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.
- Type:
float
- ar_min_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.
- Type:
float
- kappa_liml_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.
- Type:
float
- named_coef_
If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.
- Type:
array-like, shape (n_features,)
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → KClass
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → KClass
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → KClass
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class ivmodels.models.kclass.KClassMixin(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, *args, **kwargs)
Bases: object
Mixin class for k-class estimators.
- static ar_min(X, y, Z=None, X_proj=None, y_proj=None)
Compute the minimum of the unnormalized Anderson Rubin statistic.
Computes
\[\begin{split}&\min_{\beta} \frac{(y - X \beta)^T P_Z (y - X \beta)}{(y - X \beta)^T M_Z (y - X \beta)} \\ &= \lambda_\mathrm{min}\left(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T P_Z (X \ \ y)\right),\end{split}\]
where \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\) and \(M_Z = \mathrm{Id} - P_Z\).
Either Z or both X_proj and y_proj must be specified.
- Parameters:
X (np.ndarray of dimension (n, mx)) – Possibly endogenous regressors.
y (np.ndarray of dimension (n,)) – Outcome.
Z (np.ndarray of dimension (n, k), optional, default=None.) – Instruments.
X_proj (np.ndarray of dimension (n, mx), optional, default=None.) – Projection of X onto the subspace spanned by Z, i.e., \(P_Z X\).
y_proj (np.ndarray of dimension (n,), optional, default=None.) – Projection of y onto the subspace spanned by Z, i.e., \(P_Z y\).
- Returns:
ar_min – The smallest eigenvalue of \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T P_Z (X \ \ y)\), where \(P_Z\) is the projection matrix onto the subspace spanned by Z.
- Return type:
float
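The eigenvalue characterization can be checked numerically. The sketch below (not the ivmodels code; ar_min and ar_ratio are hypothetical stand-ins) computes the smallest eigenvalue from projections onto span(Z), then verifies that the k-class estimator with \(\kappa = 1 + \) ar_min, i.e. LIML, attains that minimum of the ratio. It assumes no intercept and no exogenous regressors.

```python
import numpy as np


def ar_min(X, y, Z):
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T P_Z (X y)."""
    W = np.column_stack([X, y])                        # the matrix (X y)
    W_proj = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]  # P_Z (X y)
    W_perp = W - W_proj                                # M_Z (X y)
    eigenvalues = np.linalg.eigvals(
        np.linalg.solve(W_perp.T @ W_perp, W_proj.T @ W_proj)
    )
    return float(np.min(np.real(eigenvalues)))


def ar_ratio(beta, X, y, Z):
    """The ratio being minimized, evaluated at a given beta."""
    r = y - X @ beta
    r_proj = Z @ np.linalg.lstsq(Z, r, rcond=None)[0]  # P_Z r
    r_perp = r - r_proj                                # M_Z r
    return float(r_proj @ r_proj) / float(r_perp @ r_perp)


rng = np.random.default_rng(3)
n = 150
Z = rng.normal(size=(n, 4))
H = rng.normal(size=n)
X = Z @ rng.normal(size=(4, 2)) + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([0.5, -1.0]) + H + rng.normal(size=n)

lam = ar_min(X, y, Z)
# The minimizer is the k-class estimator with kappa = 1 + lam (LIML).
kappa = 1.0 + lam
P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
A = kappa * P_Z + (1.0 - kappa) * np.eye(n)
beta_liml = np.linalg.solve(X.T @ A @ X, X.T @ A @ y)
```

This mirrors the relation kappa_liml_ = 1 + ar_min_ listed among the KClass attributes: the first-order condition of the ratio, \(X^T (P_Z - \lambda M_Z)(y - X\beta) = 0\), is exactly the k-class normal equation with \(\kappa = 1 + \lambda\).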
- fit(X, y, Z=None, C=None, *args, **kwargs)
Fit a k-class or anchor regression estimator.
If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.
- property named_coefs_
- predict(X, C=None, *args, **kwargs)
- summary(X, y, Z=None, C=None, test='wald', alpha=0.05, feature_names=None, **kwargs)
Create a Summary object for the fitted model.
This contains the fitted values (estimates), subvector test statistics for each parameter, corresponding p-values, and confidence sets.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The input data.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.
test (str, optional, default="wald") – The test to use. Must be one of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".
alpha (float, optional, default=0.05) – The significance level.
feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.
**kwargs – Additional keyword arguments to pass to the test and its inversion.
ivmodels.models.pulse module
- class ivmodels.models.pulse.PULSE(instrument_names=None, instrument_regex=None, p_min=0.05, rtol=0.01, kappa_max=1, alpha=0, l1_ratio=0)
Bases: PULSEMixin, KClass
p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].
Perform k-class estimation with k-class parameter \(\kappa \in [0, \kappa_\mathrm{max}]\) chosen minimally such that the PULSE test of correlation between the instruments and the residuals is not significant at level p_min.
- Parameters:
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as anchors. Requires X to be a pandas DataFrame.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as anchors. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
p_min (float, optional, default=0.05) – The p-value of the PULSE test that is used to determine the k-class parameter \(\kappa\). The PULSE uses binary search to find the smallest \(\kappa\) that makes the test not significant at level p_min.
rtol (float, optional, default=0.01) – The relative tolerance of the binary search. The PULSE searches for a \(\kappa\) such that the PULSE test is not significant at level p_min but is significant at level p_min * (1 + rtol).
kappa_max (float, optional, default=1) – The maximum value of kappa to consider. The PULSE uses binary search to find the smallest kappa that makes the test not significant at level p_min. If kappa_max = 1, the PULSE will run a regression equivalent to two-stage least-squares. If alpha = 0 and Z.shape[1] < X.shape[1], this is not well-defined and the PULSE will raise an exception.
alpha (float, optional, default=0) – The regularization parameter for elastic net. If alpha is 0, the estimator is unregularized.
l1_ratio (float, optional, default=0) – The ratio of L1 to L2 regularization for elastic net. If l1_ratio is 1, the estimator is Lasso. If l1_ratio is 0, the estimator is Ridge.
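The binary search over \(\kappa\) can be sketched as follows. This is a simplified illustration, not the PULSE implementation, and several points are assumptions: the test statistic is taken to be \(n \|P_Z r\|^2 / \|r\|^2\) compared with a chi-squared quantile, the statistic is assumed to decrease in \(\kappa\), the critical value is passed in by the caller (to avoid a SciPy dependency), and the hypothetical tol parameter is a relative tolerance on \(\kappa\) rather than on the p-value as rtol is described above. All function names are hypothetical.

```python
import numpy as np


def kclass(X, y, Z, kappa):
    """k-class estimator (no intercept) for a numeric kappa."""
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    A = kappa * P_Z + (1.0 - kappa) * np.eye(len(y))
    return np.linalg.solve(X.T @ A @ X, X.T @ A @ y)


def pulse_statistic(beta, X, y, Z):
    """n * ||P_Z r||^2 / ||r||^2 for the residuals r = y - X beta."""
    r = y - X @ beta
    r_proj = Z @ np.linalg.lstsq(Z, r, rcond=None)[0]
    return len(y) * float(r_proj @ r_proj) / float(r @ r)


def pulse_kappa(X, y, Z, critical_value, kappa_max=1.0, tol=0.01, max_iter=100):
    """Bisection for the smallest accepted kappa in [0, kappa_max].

    Assumes the statistic decreases in kappa. critical_value is supplied by the
    caller, e.g. the 1 - p_min quantile of a chi-squared distribution with
    Z.shape[1] degrees of freedom. tol is a relative tolerance on kappa.
    """
    def accepted(kappa):
        return pulse_statistic(kclass(X, y, Z, kappa), X, y, Z) <= critical_value

    if accepted(0.0):
        return 0.0  # OLS already passes the test
    if not accepted(kappa_max):
        raise ValueError("The test rejects even at kappa_max.")
    lo, hi = 0.0, kappa_max  # invariant: lo is rejected, hi is accepted
    for _ in range(max_iter):
        if hi - lo <= tol * hi:
            break
        mid = 0.5 * (lo + hi)
        if accepted(mid):
            hi = mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(4)
n = 500
Z = rng.normal(size=(n, 1))
H = rng.normal(size=n)                      # hidden confounder
X = Z + H[:, None] + rng.normal(size=(n, 1))
y = X[:, 0] + 2.0 * H + rng.normal(size=n)
# 3.841 is approximately the 0.95 quantile of chi-squared with 1 degree of freedom.
kappa = pulse_kappa(X, y, Z, critical_value=3.841)
```

In this just-identified example the 2SLS residuals are orthogonal to Z, so the upper end of the search interval is always accepted.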
- coef_
The estimated coefficients.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept.
- Type:
float
- kappa_
The estimated kappa.
- Type:
float
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → PULSE
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → PULSE
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → PULSE
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class ivmodels.models.pulse.PULSEMixin(p_min=0.05, rtol=0.01, kappa_max=1, **kwargs)
Bases: object
Mixin class for PULSE estimators.
- fit(X, y, Z=None, C=None, *args, **kwargs)
Fit a p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].
If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values.
Z (array-like, shape (n_samples, n_anchors), optional) – The instrument (anchor) values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
ivmodels.models.space_iv module
- class ivmodels.models.space_iv.SpaceIV(s_max=None, p_min=0.05)
Bases: object
Run the space IV algorithm from Pfister and Peters [2022].
Returns \(\arg\min \| \beta \|_0\) subject to \(\mathrm{AR}(\beta) \leq q_{1 - \alpha}\), where \(q_{1 - \alpha}\) is the \(1 - \alpha\) quantile of the F distribution with \(q\) and \(n - q\) degrees of freedom, \(q\) is the number of instruments, and \(n\) is the number of observations.
- Parameters:
s_max (int, optional, default=None) – Maximum number of variables to consider. If None, set to X.shape[1].
p_min (float, optional, default=0.05) – Significance level (\(\alpha\) above).
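The selection rule above can be sketched as an exhaustive search over supports. This is an illustration under stated assumptions, not the SpaceIV implementation: it fits no intercept, uses the F-scaled Anderson-Rubin statistic \(\frac{n - q}{q} \cdot \frac{r^T P_Z r}{r^T M_Z r}\), takes the F quantile as an argument instead of computing it, keeps the best subset of size s_max if none is accepted, and uses the hypothetical function name space_iv. For each subset it minimizes the AR statistic via LIML, consistent with the kappa_ attribute being \(\hat\kappa_\mathrm{LIML}\) for the selected model.

```python
from itertools import combinations

import numpy as np


def space_iv(X, y, Z, critical_value, s_max=None):
    """Exhaustive-search sketch of the SpaceIV selection rule (no intercept).

    critical_value is the 1 - alpha quantile of F(q, n - q), supplied by the caller.
    Returns (coef, S, kappa) with S the selected column indices.
    """
    n, p = X.shape
    q = Z.shape[1]
    s_max = p if s_max is None else s_max
    P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    M_Z = np.eye(n) - P_Z
    for s in range(1, s_max + 1):
        best = None
        for S in combinations(range(p), s):
            W = np.column_stack([X[:, list(S)], y])
            # Minimum of the unnormalized AR ratio over beta supported on S.
            lam = float(np.min(np.real(np.linalg.eigvals(
                np.linalg.solve(W.T @ M_Z @ W, W.T @ P_Z @ W)))))
            ar = (n - q) / q * lam                    # F-scaled AR statistic
            if best is None or ar < best[0]:
                best = (ar, S, lam)
        if best[0] <= critical_value:
            break                                     # sparsest accepted support
    _, S, lam = best
    kappa = 1.0 + lam                                 # LIML kappa on the support S
    X_S = X[:, list(S)]
    A = np.eye(n) - kappa * M_Z                       # = kappa P_Z + (1 - kappa) Id
    coef = np.zeros(p)
    coef[list(S)] = np.linalg.solve(X_S.T @ A @ X_S, X_S.T @ A @ y)
    return coef, S, kappa
```

The number of subsets grows combinatorially with X.shape[1], so this brute-force form is only practical for small s_max.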
- coef_
Estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
Independent term in the linear model.
- Type:
float
- S_
Indices of the selected variables.
- Type:
array-like, shape (s,)
- s_
Number of selected variables.
- Type:
int
- kappa_
Equal to \(\hat\kappa_\mathrm{LIML}\) for the selected model.
- Type:
float
References
- fit(X, y, Z=None)
Fit a SpaceIV model.
If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
Module contents
- class ivmodels.models.AnchorRegression(gamma=1, instrument_names=None, instrument_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)
Bases: AnchorMixin, GeneralizedLinearRegressor
Linear regression with anchor regularization [Rothenhäusler et al., 2021].
The anchor regression estimator with parameter \(\gamma\) is defined as
\[\hat\beta_\mathrm{anchor}(\gamma) := \arg\min_\beta \ \| y - X \beta \|_2^2 + (\gamma - 1) \|P_Z (y - X \beta) \|_2^2.\]
If \(\gamma > 0\), then \(\hat\beta_\mathrm{anchor}(\gamma) = \hat\beta_\mathrm{k-class}((\gamma - 1) / \gamma)\).
The optimization is based on OLS after a data transformation. It first centers X and y by subtracting their column means, as proposed by Rothenhäusler et al. [2021]. Consequently, no anchor regularization is applied to the intercept.
- Parameters:
gamma (float) – The anchor regularization parameter. gamma=1 corresponds to OLS.
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty.
- coef_
The estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept for the linear regression problem.
- Type:
float
- kappa_
The kappa parameter of the corresponding k-class estimator.
- Type:
float
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') → AnchorRegression
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') → AnchorRegression
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') AnchorRegression
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class ivmodels.models.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)
Bases:
KClassMixin, GeneralizedLinearRegressor
K-class estimator for instrumental variable regression.
The k-class estimator with parameter \(\kappa\) is defined as
\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\] where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the projection matrix onto the subspace spanned by \(Z\) and \(\mathrm{Id}\) is the identity matrix. This includes the ordinary least-squares (OLS) estimator (\(\kappa = 0\)), the two-stage least-squares (2SLS) estimator (\(\kappa = 1\)), the limited information maximum likelihood (LIML) estimator (\(\kappa = \hat\kappa_\mathrm{LIML}\)), and the Fuller estimators (\(\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k)\)) as special cases.
Specifying exogenous included regressors \(C\) is equivalent to including them into both \(Z\) and \(X\).
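The closed form above can be evaluated without forming the \(n \times n\) matrix \(P_Z\), because \(X^T P_Z X = (P_Z X)^T (P_Z X)\) and \(X^T P_Z y = (P_Z X)^T y\). The NumPy sketch below does this and checks the special cases \(\kappa = 0\) (OLS) and \(\kappa = 1\) (2SLS, i.e. OLS of \(y\) on \(P_Z X\)). It assumes no intercept and no exogenous regressors; following the remark on \(C\) above, these could be handled by appending \(C\) to both \(X\) and \(Z\). The function k_class is hypothetical, not the ivmodels API.

```python
import numpy as np


def k_class(X: np.ndarray, y: np.ndarray, Z: np.ndarray, kappa: float) -> np.ndarray:
    """Return (X^T W X)^{-1} X^T W y with W = kappa P_Z + (1 - kappa) Id (no intercept)."""
    # P_Z X via a least-squares first stage instead of the n x n matrix P_Z.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    PX = Z @ first_stage
    # P_Z is symmetric and idempotent: X^T P_Z X = (P_Z X)^T (P_Z X), X^T P_Z y = (P_Z X)^T y.
    lhs = kappa * PX.T @ PX + (1 - kappa) * X.T @ X
    rhs = kappa * PX.T @ y + (1 - kappa) * X.T @ y
    return np.linalg.solve(lhs, rhs)


# Synthetic data with a hidden confounder H (illustration only).
rng = np.random.default_rng(1)
n = 300
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)
X = Z @ rng.normal(size=(3, 2)) + H[:, None] + rng.normal(size=(n, 2))
y = X @ np.array([0.5, 1.5]) + H + rng.normal(size=n)

beta_ols = k_class(X, y, Z, kappa=0.0)
beta_tsls = k_class(X, y, Z, kappa=1.0)
```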
- Parameters:
kappa (float or {"ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If string, then must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least-squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then \(\kappa = \hat\kappa_\mathrm{LIML}\) is used, where \(\hat\kappa_\mathrm{LIML} \geq 1\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ \ y)\), \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\), and \(M_Z = \mathrm{Id} - P_Z\). If exogenous included regressors \(C\) are specified, then \(\hat\kappa_\mathrm{LIML}\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ \ y)\). If kappa="fuller(a)", then \(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_c)\), where \(n\) is the number of observations, \(k = \mathrm{dim}(Z)\) is the number of instruments, and \(m_c = \mathrm{dim}(C)\) is the number of exogenous included regressors. The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to \(O(1/n)\) [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires the X argument of the fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires the X argument of the fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires the X argument of the fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires the X argument of the fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for \(\kappa \leq 1\).
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for \(\kappa \leq 1\).
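The rules for resolving a string kappa can be sketched as follows. The sketch assumes NumPy, no exogenous regressors \(C\) (so the Fuller denominator reduces to \(n - k\)), and a regular-expression parse of "fuller(a)". The helpers liml_kappa and resolve_kappa are hypothetical, and ivmodels may parse the string and compute the eigenvalue differently.

```python
import re

import numpy as np


def liml_kappa(X: np.ndarray, y: np.ndarray, Z: np.ndarray) -> float:
    """Smallest eigenvalue of ((X y)^T M_Z (X y))^{-1} (X y)^T (X y), without C."""
    W = np.column_stack([X, y])
    coef, *_ = np.linalg.lstsq(Z, W, rcond=None)
    MW = W - Z @ coef  # M_Z (X y), so MW^T MW = (X y)^T M_Z (X y)
    # A^{-1} B with A, B symmetric positive definite has real eigenvalues;
    # .real drops round-off imaginary parts if any appear.
    eigvals = np.linalg.eigvals(np.linalg.solve(MW.T @ MW, W.T @ W))
    return float(np.min(eigvals.real))


def resolve_kappa(kappa, X, y, Z) -> float:
    """Map a numeric or string kappa to a number (hypothetical helper, no C)."""
    if not isinstance(kappa, str):
        return float(kappa)
    if kappa == "ols":
        return 0.0
    if kappa in ("tsls", "2sls"):
        return 1.0
    if kappa == "liml":
        return liml_kappa(X, y, Z)
    match = re.fullmatch(r"fuller(?:\((.+)\))?", kappa)
    if match is None:
        raise ValueError(f"Unknown kappa: {kappa!r}")
    # "fuller" is shorthand for "fuller(1.0)".
    a = 1.0 if match.group(1) is None else float(match.group(1))
    n, k = Z.shape
    # With exogenous regressors C the denominator would be n - k - m_c.
    return liml_kappa(X, y, Z) - a / (n - k)


# Synthetic data with a hidden confounder H (illustration only).
rng = np.random.default_rng(2)
n = 400
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)
X = Z @ np.array([[1.0], [0.5], [0.0]]) + H[:, None] + rng.normal(size=(n, 1))
y = 2.0 * X[:, 0] + H + rng.normal(size=n)
```

Because \((X \ \ y)^T (X \ \ y) - (X \ \ y)^T M_Z (X \ \ y) = (X \ \ y)^T P_Z (X \ \ y)\) is positive semi-definite, every eigenvalue computed by liml_kappa is at least 1, matching the bound \(\hat\kappa_\mathrm{LIML} \geq 1\).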
- coef_
The estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept for the linear regression problem.
- Type:
float
- kappa_
The numerical kappa parameter of the k-class estimator.
- Type:
float
- fuller_alpha_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.
- Type:
float
- ar_min_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson-Rubin statistic.
- Type:
float
- kappa_liml_
If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.
- Type:
float
- named_coef_
If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.
- Type:
array-like, shape (n_features,)
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') KClass
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class ivmodels.models.PULSE(instrument_names=None, instrument_regex=None, p_min=0.05, rtol=0.01, kappa_max=1, alpha=0, l1_ratio=0)
Bases:
PULSEMixin, KClass
p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].
Perform k-class estimation with k-class parameter \(\kappa \in [0, \kappa_\mathrm{max}]\) chosen minimally such that the PULSE test of correlation between the instruments and the residuals is not significant at level p_min.
- Parameters:
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as anchors. Requires X to be a pandas DataFrame.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as anchors. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
p_min (float, optional, default = 0.05) – The significance level of the PULSE test that is used to determine the k-class parameter \(\kappa\). The PULSE uses binary search to find the smallest \(\kappa\) that makes the test not significant at level p_min.
rtol (float, optional, default = 0.01) – The relative tolerance of the binary search. The PULSE searches for a \(\kappa\) such that the PULSE test is not significant at level p_min but is significant at level p_min * (1 + rtol).
kappa_max (float, optional, default = 1) – The maximum value of kappa to consider; the binary search runs over [0, kappa_max]. If kappa_max = 1, the PULSE will run a regression equivalent to two-stage least-squares. If alpha = 0 and Z.shape[1] < X.shape[1], this is not well-defined and the PULSE will raise an exception.
alpha (float, optional, default = 0) – The regularization parameter for elastic net. If alpha is 0, the estimator is unregularized.
l1_ratio (float, optional, default = 0) – The ratio of L1 to L2 regularization for elastic net. If l1_ratio is 1, the estimator is Lasso. If l1_ratio is 0, the estimator is Ridge.
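The binary search over \(\kappa\) described above can be sketched in plain Python. The sketch assumes that the PULSE p-value is non-decreasing in \(\kappa\) and takes it as a caller-supplied function p_value, a hypothetical stand-in for the actual PULSE test. It keeps an interval whose lower end is significant at level p_min and whose upper end is not, and stops once the p-value at the upper end lies in [p_min, p_min * (1 + rtol)). Raising an error when even kappa_max is significant is a choice of this sketch, not documented ivmodels behavior.

```python
from typing import Callable


def pulse_kappa_search(
    p_value: Callable[[float], float],
    p_min: float = 0.05,
    rtol: float = 0.01,
    kappa_max: float = 1.0,
    max_iter: int = 100,
) -> float:
    """Bisection for the smallest kappa in [0, kappa_max] with p_value(kappa) >= p_min.

    Assumes p_value is non-decreasing in kappa.
    """
    if p_value(0.0) >= p_min:
        return 0.0  # OLS already passes the test
    if p_value(kappa_max) < p_min:
        raise ValueError("The test is significant at level p_min even at kappa_max.")
    lo, hi = 0.0, kappa_max  # invariant: p_value(lo) < p_min <= p_value(hi)
    for _ in range(max_iter):
        # Stop once hi is not significant at p_min but is at p_min * (1 + rtol).
        if p_value(hi) < p_min * (1 + rtol):
            break
        mid = (lo + hi) / 2
        if p_value(mid) < p_min:
            lo = mid
        else:
            hi = mid
    return hi


# Toy p-value curve standing in for the PULSE test.
print(pulse_kappa_search(lambda kappa: kappa))  # 0.05029296875
```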
- coef_
The estimated coefficients.
- Type:
array-like, shape (n_features,)
- intercept_
The estimated intercept.
- Type:
float
- kappa_
The estimated kappa.
- Type:
float
References
- set_fit_request(*, C: bool | None | str = '$UNCHANGED$', Z: bool | None | str = '$UNCHANGED$') PULSE
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in fit.
Z (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for Z parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, C: bool | None | str = '$UNCHANGED$') PULSE
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
C (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for C parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, context: bool | None | str = '$UNCHANGED$', offset: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') PULSE
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
context (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for context parameter in score.
offset (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for offset parameter in score.
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
- class ivmodels.models.SpaceIV(s_max=None, p_min=0.05)
Bases:
object
Run the space IV algorithm from Pfister and Peters [2022].
Returns \(\arg\min \| \beta \|_0\) subject to \(\mathrm{AR}(\beta) \leq q_{1 - \alpha}\), where \(q_{1 - \alpha}\) is the \(1 - \alpha\) quantile of the F distribution with \(q\) and \(n - q\) degrees of freedom, \(n\) is the number of observations, and \(q = \mathrm{dim}(Z)\) is the number of instruments.
- Parameters:
s_max (int, optional, default = None) – Maximum number of variables to consider. If None, set to X.shape[1].
p_min (float, optional, default = 0.05) – Significance level (\(\alpha\) above).
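The definition above suggests a direct, exponential-time search: for \(s = 0, 1, \ldots\), fit LIML on every support of size \(s\) (LIML minimizes the AR statistic for a fixed support) and stop at the first size for which some support passes. The NumPy sketch below follows this reading under several assumptions: centered data without intercept, a critical value passed in by the caller (for example scipy.stats.f.ppf(1 - p_min, q, n - q)), ties within a size broken by the smaller AR statistic, and (None, None) returned if no support passes. The names project, ar_statistic, liml and space_iv are hypothetical, and the ivmodels implementation may differ.

```python
from itertools import combinations

import numpy as np


def project(Z, A):
    """P_Z A via least squares."""
    coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
    return Z @ coef


def ar_statistic(r, Z):
    """Anderson-Rubin statistic (n - q) / q * ||P_Z r||^2 / ||M_Z r||^2."""
    n, q = Z.shape
    Pr = project(Z, r)
    Mr = r - Pr
    return (n - q) / q * (Pr @ Pr) / (Mr @ Mr)


def liml(X, y, Z):
    """LIML: k-class with kappa = smallest eigenvalue of (W^T M_Z W)^{-1} W^T W, W = (X y)."""
    W = np.column_stack([X, y])
    MW = W - project(Z, W)
    kappa = np.min(np.linalg.eigvals(np.linalg.solve(MW.T @ MW, W.T @ W)).real)
    PX = project(Z, X)
    lhs = kappa * PX.T @ PX + (1 - kappa) * X.T @ X
    rhs = kappa * PX.T @ y + (1 - kappa) * X.T @ y
    return np.linalg.solve(lhs, rhs)


def space_iv(X, y, Z, critical_value, s_max=None):
    """Sparsest support whose LIML fit has AR statistic <= critical_value."""
    _, p = X.shape
    s_max = p if s_max is None else s_max
    if ar_statistic(y, Z) <= critical_value:  # s = 0: beta = 0 already passes
        return np.zeros(p), []
    for s in range(1, s_max + 1):
        best = None
        for support in combinations(range(p), s):
            cols = list(support)
            beta_s = liml(X[:, cols], y, Z)
            # LIML minimizes the AR statistic over beta for this fixed support.
            ar = ar_statistic(y - X[:, cols] @ beta_s, Z)
            if ar <= critical_value and (best is None or ar < best[0]):
                best = (ar, cols, beta_s)
        if best is not None:
            coef = np.zeros(p)
            coef[best[1]] = best[2]
            return coef, best[1]
    return None, None  # no support of size <= s_max passes


# Synthetic data: only the first regressor affects y; H is a hidden confounder.
rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 3))
H = rng.normal(size=n)
X = Z + H[:, None] + rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + H + rng.normal(size=n)
coef, support = space_iv(X, y, Z, critical_value=10.0)
```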
- coef_
Estimated coefficients for the linear regression problem.
- Type:
array-like, shape (n_features,)
- intercept_
Independent term in the linear model.
- Type:
float
- S_
Indices of the selected variables.
- Type:
array-like, shape (s,)
- s_
Number of selected variables.
- Type:
int
- kappa_
Equal to \(\hat\kappa_\mathrm{LIML}\) for the selected model.
- Type:
float
References
- fit(X, y, Z=None)
Fit a SpaceIV model.
If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified.
- Parameters:
X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
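The rule that instrument_names and instrument_regex select the union of their matches can be sketched in plain Python on a list of column names, such as X.columns of a pandas DataFrame. Several details are assumptions: that the regex is matched with re.search rather than an anchored match, and that unknown names raise an error. The helper select_columns is hypothetical. Presumably the remaining columns then form the regressors.

```python
import re
from typing import Iterable, List, Optional, Union


def select_columns(
    columns: Iterable[str],
    names: Optional[Union[str, List[str]]] = None,
    regex: Optional[str] = None,
) -> List[str]:
    """Return the union of columns listed in names and columns matching regex."""
    if isinstance(names, str):
        names = [names]  # a single name is allowed, as for instrument_names
    wanted = set(names or [])
    columns = list(columns)
    missing = wanted.difference(columns)
    if missing:
        raise ValueError(f"Columns not found: {sorted(missing)}")
    # Keep the original column order; re.search is an assumed matching rule.
    return [
        col for col in columns
        if col in wanted or (regex is not None and re.search(regex, col))
    ]


columns = ["x1", "x2", "z_a", "z_b", "w"]
instruments = select_columns(columns, names="w", regex=r"^z_")
regressors = [col for col in columns if col not in instruments]
print(instruments)  # ['z_a', 'z_b', 'w']
print(regressors)  # ['x1', 'x2']
```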