API

Estimators

class ivmodels.KClass(kappa=1, instrument_names=None, instrument_regex=None, exogenous_names=None, exogenous_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

K-class estimator for instrumental variable regression.

The k-class estimator with parameter \(\kappa\) is defined as

\[\begin{split}\hat\beta_\mathrm{k-class}(\kappa) &:= \arg\min_\beta \ (1 - \kappa) \| y - X \beta \|_2^2 + \kappa \|P_Z (y - X \beta) \|_2^2 \\ &= (X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X)^{-1} X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) y,\end{split}\]

where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the projection matrix onto the subspace spanned by \(Z\) and \(\mathrm{Id}\) is the identity matrix. This includes the the ordinary least-squares (OLS) estimator (\(\kappa = 0\)), the two-stage least-squares (2SLS) estimator (\(\kappa = 1\)), the limited information maximum likelihood (LIML) estimator (\(\kappa = \hat\kappa_\mathrm{LIML}\)), and the Fuller estimators (\(\kappa = \hat\kappa_\mathrm{LIML} - \alpha / (n - k - m_C)\)) as special cases.

Specifying exogenous included regressors \(C\) is equivalent to including them into both \(Z\) and \(X\).

Parameters:

kappa (float or { "ols", "tsls", "2sls", "liml", "fuller", "fuller(a)"}) – The kappa parameter of the k-class estimator. If string, then must be one of "ols", "2sls", "tsls", "liml", "fuller", or "fuller(a)", where a is numeric. If kappa="ols", then kappa=0 and the k-class estimator is the ordinary least squares estimator. If kappa="tsls" or kappa="2sls", then kappa=1 and the k-class estimator is the two-stage least-squares estimator. If kappa="liml", then \(\kappa = \hat\kappa_\mathrm{LIML}\) is used, where \(\kappa_\mathrm{LIML} \geq 1\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_Z (X \ \ y))^{-1} (X \ \ y)^T (X \ y)\), where \(P_Z\) is the projection matrix onto the subspace spanned by \(Z\) and \(M_Z = Id - P_Z\). If exogenous included regressors \(C\) are specified, then \(\kappa_\mathrm{LIML}\) is the smallest eigenvalue of the matrix \(((X \ \ y)^T M_{[Z, C]} (X \ \ y))^{-1} (X \ \ y)^T M_C (X \ y)\). If kappa="fuller(a)", then \(\kappa = \hat\kappa_\mathrm{LIML} - a / (n - k - m_C)\), where \(n\) is the number of observations, \(k = \mathrm{dim}(Z)\) is the number of instruments, and \(m_C = \mathrm{dim}(C)\) is the number of exogenous included regressors (plus one if fit_intercept=True). The string "fuller" is interpreted as "fuller(1.0)", yielding an estimator that is unbiased up to \(O(1/n)\) [Fuller, 1977].
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments. Requires X argument of fit method to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
exogenous_names (str or list of str, optional) – The names of the columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
exogenous_regex (str, optional) – A regex that is used to select columns in X that should be used as exogenous regressors. Requires X argument of fit method to be a pandas DataFrame. If both exogenous_names and exogenous_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization. Only implemented for \(\kappa \leq 1\).
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty. Only implemented for \(\kappa \leq 1\).
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.

coef_

The estimated coefficients for the linear regression problem.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type:: float

kappa_

The numerical kappa parameter of the k-class estimator.

Type:: float

fuller_alpha_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the alpha parameter of the Fuller estimator.

Type:: float

ar_min_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the minimum of the unnormalized Anderson Rubin statistic.

Type:: float

kappa_liml_

If kappa is one of {"fuller", "fuller(a)", "liml"} for some numeric value a, the kappa parameter of the LIML estimator, equal to 1 + ar_min_.

Type:: float

named_coef_

If X was a pandas DataFrame, the estimated coefficients for the linear regression problem with the variable names as index.

Type:: array-like, shape (n_features,)

References

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a k-class or anchor regression estimator.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

class ivmodels.models.AnchorRegression(gamma=1, instrument_names=None, instrument_regex=None, alpha=0, l1_ratio=0, fit_intercept=True)

Linear regression with anchor regularization [Rothenhäusler et al., 2021].

The anchor regression estimator with parameter \(\gamma\) is defined as

\[\hat\beta_\mathrm{anchor}(\gamma) := \arg\min_\beta \ \| y - X \beta \|_2^2 + (\gamma - 1) \|P_Z (y - X \beta) \|_2^2.\]

If \(\gamma > 0\), then \(\hat\beta_\mathrm{anchor}(\gamma) = \hat\beta_\mathrm{k-class}((\gamma - 1) / \gamma)\).

The optimization is based on OLS after a data transformation. First standardizes X and y by subtracting the column means as proposed by Rothenhäusler et al. [2021]. Consequently, no anchor regularization is applied to the intercept.

Parameters:

gamma (float) – The anchor regularization parameter. gamma=1 corresponds to OLS.
instrument_names (str or list of str, optional) – The names of the columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as instruments (anchors). Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
alpha (float, optional, default=0) – Regularization parameter for elastic net regularization.
l1_ratio (float, optional, default=0) – Ratio of L1 to L2 regularization for elastic net regularization. For l1_ratio=0 the penalty is an L2 penalty. For l1_ratio=1 it is an L1 penalty.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept.

coef_

The estimated coefficients for the linear regression problem.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept for the linear regression problem.

Type:: float

kappa_

The kappa parameter of the corresponding k-class estimator.

Type:: float

References

[RMBP21] (1,2)

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: heterogeneous data meet causality. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021.

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a k-class or anchor regression estimator.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

class ivmodels.models.PULSE(instrument_names=None, instrument_regex=None, p_min=0.05, rtol=0.01, kappa_max=1, alpha=0, l1_ratio=0)

p-uncorrelated least squares estimator (PULSE) [Jakobsen and Peters, 2022].

Perform k-class estimation with k-class parameter \(\kappa \in [0, \kappa_\mathrm{max}]\) chosen minimally such that the PULSE test of correlation between the instruments and the residuals is not significant at level p_min.

Parameters:

instrument_names (str or list of str, optional) – The names of the columns in X that should be used as anchors. Requires X to be a pandas DataFrame.
instrument_regex (str, optional) – A regex that is used to select columns in X that should be used as anchors. Requires X to be a pandas DataFrame. If both instrument_names and instrument_regex are specified, the union of the two is used.
p_min (float, optional, default = 0.05) – The p-value of the PULSE test that is used to determine the k-class parameter \(\kappa\). The PULSE will search for the smallest \(\kappa\) that makes the test not significant at level p_min with binary search.
rtol (float, optional, default = 0.01) – The relative tolerance of the binary search. The PULSE will search for a \(\kappa\) such that the PULSE test is not significant at level p_min with binary search but is significant at level p_min * (1 + rtol).
kappa_max (float, optional, default = 1) – The maximum value of kappa to consider. The PULSE will search for the smallest kappa that makes the test not significant at level p_min with binary search. If kappa_max = 1, the PULSE will run a regression equivalent to two-stage-least-squares. If alpha = 0 and Z.shape[1] < X.shape[1], this is not well-defined and the PULSE will raise an exception.
alpha (float, optional, default = 0) – The regularization parameter for elastic net. If alpha is 0, the estimator is unregularized.
l1_ratio (float, optional, default = 0) – The ratio of L1 to L2 regularization for elastic net. If l1_ratio is 1, the estimator is Lasso. If l1_ratio is 0, the estimator is Ridge.

coef_

The estimated coefficients.

Type:: array-like, shape (n_features,)

intercept_

The estimated intercept.

Type:: float

kappa_

The estimated kappa.

Type:: float

References

[JP22] (1,2,3)

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of k-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022.

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a p-uncorrelated least squares estimator (PULSE).

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one of Z, instrument_names, and instrument_regex must be specified.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – The target values.
Z (array-like, shape (n_samples, n_anchors), optional) – The instrument (anchor) values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – Exogenous included variables. Must be None or have zero columns, as PULSE does not support exogenous included variables.

Tests

ivmodels.tests.anderson_rubin_test(Z, X, y, beta, W=None, C=None, D=None, critical_values='chi2', fit_intercept=True)

Perform the Anderson Rubin test [Anderson and Rubin, 1949].

Test the null hypothesis that the residuals are uncorrelated with the instruments. If W is None, the test statistic is defined as

\[\mathrm{AR}(\beta) := \frac{n - k}{k} \frac{\| P_Z (y - X \beta) \|_2^2}{\| M_Z (y - X \beta) \|_2^2},\]

where \(P_Z\) is the projection matrix onto the column space of \(Z\) and \(M_Z = \mathrm{Id} - P_Z\).

Under the null and normally distributed errors, this test statistic is distributed as \(F_{k, n - k}\), where \(k\) is the number of instruments and \(n\) is the number of observations. The statistic is asymptotically distributed as \(\chi^2(k) / k\) under the null and non-normally distributed errors, even for weak instruments.

If W is not None, the test statistic is

\[\begin{split}\mathrm{AR}(\beta) &:= \min_\gamma \frac{n - k}{k - m_W} \frac{\| P_Z (y - X \beta - W \gamma) \|_2^2}{\| M_Z (y - X \beta - W \gamma) \|_2^2} \\ &= \frac{n - k}{k - m_W} \frac{\| P_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|_2^2}{\| M_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|_2^2},\end{split}\]

where \(\hat\gamma_\mathrm{LIML}\) is the LIML estimate using instruments \(Z\), covariates \(W\) and outcomes \(y - X \beta\). Under the null, this test statistic is asymptotically bounded from above by a random variable that is distributed as \(\frac{1}{k - m_W} \chi^2(k - m_W)\), where \(m_W = \mathrm{dim}(W)\). See [Guggenberger et al., 2012].

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
beta (np.ndarray of dimension (mx + md,)) – Coefficients to test.
W (np.ndarray of dimension (n, mw) or None, optional, default = None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default = None) – Exogenous regressors of interest.
critical_values (str, optional, default = "chi2") – If "chi2", use the \(\chi^2(k - m_W)\) distribution to compute the p-value. If "f", use the \(F_{k - m_W, n - k}\) distribution to compute the p-value. If "guggenberger" or "GKM", use the critical value function proposed by Guggenberger et al. [2019] to compute the p-value.
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.

Returns:

statistic (float) – The test statistic \(\mathrm{AR}(\beta)\).
p_value (float) – The p-value of the test.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

References

[AR49]

Theodore W Anderson and Herman Rubin. Estimation of the parameters of a single equation in a complete system of stochastic equations. The Annals of mathematical statistics, 20(1):46–63, 1949.

[GKM19] (1,2)

Patrik Guggenberger, Frank Kleibergen, and Sophocles Mavroeidis. A more powerful subvector Anderson Rubin test in linear instrumental variables regression. Quantitative Economics, 10(2):487–526, 2019.

[GKMC12]

Patrik Guggenberger, Frank Kleibergen, Sophocles Mavroeidis, and Linchun Chen. On the asymptotic sizes of subset Anderson–Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica, 80(6):2649–2666, 2012.

ivmodels.tests.conditional_likelihood_ratio_test(Z, X, y, beta, W=None, C=None, D=None, fit_intercept=True, critical_values='londschien2025exact', tol=1e-06, num_samples=100000)

Perform the conditional likelihood ratio test for beta.

If W is None, the test statistic is defined as

\[\begin{split}\mathrm{CLR}(\beta) &:= (n - k) \frac{ \| P_Z (y - X \beta) \|_2^2}{ \| M_Z (y - X \beta) \|_2^2} - (n - k) \frac{ \| P_Z (y - X \hat\beta_\mathrm{LIML}) \|_2^2 }{ \| M_Z (y - X \hat\beta_\mathrm{LIML}) \|_2^2 } \\ &= k \cdot \mathrm{AR}(\beta) - k \cdot \min_\beta \mathrm{AR}(\beta),\end{split}\]

where \(P_Z\) is the projection matrix onto the column space of \(Z\), \(M_Z = \mathrm{Id} - P_Z\), and \(\hat\beta_\mathrm{LIML}\) is the LIML estimator of \(\beta\) (see KClass), minimizing the Anderson-Rubin test statistic \(\mathrm{AR}(\beta)\) (see anderson_rubin_test()) at

\[\mathrm{AR}(\hat\beta_\mathrm{LIML}) = \frac{n - k}{k} \lambda_\mathrm{min}\left( \left(\begin{pmatrix} X & y \end{pmatrix}^T M_Z \begin{pmatrix} X & y \end{pmatrix}\right)^{-1} \begin{pmatrix} X & y \end{pmatrix}^T P_Z \begin{pmatrix} X & y \end{pmatrix} \right).\]

Let

\[\tilde X(\beta) := X - (y - X \beta) \cdot \frac{(y - X \beta)^T M_Z X}{(y - X \beta)^T M_Z (y - X \beta)}\]

and let \(\lambda_1, \ldots, \lambda_m\) be the eigenvalues of

\[(n - k) \cdot \left[\tilde X(\beta)^T M_Z \tilde X(\beta)\right]^{-1} \tilde X(\beta)^T P_Z \tilde X(\beta).\]

If critical_values="londschien2025exact", the exact asymptotic distribution from Theorem 1 of [Londschien, 2025] is used:

\[\mathrm{CLR}(\beta_0) \overset{d}{\to} \sum_{i=0}^m q_i - \mu_\mathrm{min},\]

where \(q_0 \sim \chi^2(k-m)\), \(q_1, \ldots, q_m \sim \chi^2(1)\), and \(\mu_\mathrm{min}\) is the smallest root of the polynomial

\[p(\mu) := \left(\mu - \sum_{i=0}^m q_i\right) \prod_{i=1}^m (\mu - \lambda_i) - \sum_{i=1}^m \lambda_i q_i \prod_{j \geq 1, j \neq i} (\mu - \lambda_j).\]

This distribution is conditional on all eigenvalues \(\lambda_1, \ldots, \lambda_m\) and provides substantially more power when eigenvalues differ.

If critical_values is "kleibergen2007generalizing" or "moreira2003conditional", uses the upper bound conditional on only the smallest eigenvalue \(\lambda_1\):

\[\mathrm{CLR}(\beta_0) \leq \frac{1}{2} \left( Q_{m_X} + Q_{k - m_X} - \lambda_1 + \sqrt{ (Q_{m_X} + Q_{k - m_X} + \lambda_1)^2 - 4 Q_{k - m_X} \lambda_1 } \right),\]

where \(Q_{m_X} \sim \chi^2(m_X)\) and \(Q_{k - m_X} \sim \chi^2(k - m_X)\) are independent. This bound is sharp when all eigenvalues are equal.

This test is robust to weak instruments. If identification is strong (\(\lambda_i \to \infty\)), the test is equivalent to the likelihood ratio test with \(\chi^2(m_X)\) distribution. If identification is weak (\(\lambda_i \to 0\)), the test is equivalent to the Anderson-Rubin test with \(\chi^2(k)\) distribution.

If W is not None, the test statistic is defined as

\[\begin{split}\mathrm{CLR}(\beta) &:= (n - k) \min_\gamma \frac{ \| P_Z (y - X \beta - W \gamma) \|_2^2}{ \| M_Z (y - X \beta - W \gamma) \|_2^2} - (n - k) \min_{\beta, \gamma} \frac{ \| P_Z (y - X \beta - W \gamma) \|_2^2 }{ \| M_Z (y - X \beta - W \gamma) \|_2^2 } \\ &= (n - k) \frac{ \| P_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|_2^2}{ \| M_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|_2^2} - (n - k) \frac{ \| P_Z (y - \begin{pmatrix} X & W \end{pmatrix} \hat\delta_\mathrm{LIML}) \|_2^2 }{ \| M_Z (y - \begin{pmatrix} X & W \end{pmatrix} \hat\delta_\mathrm{LIML}) \|_2^2 },\end{split}\]

where \(\hat\gamma_\mathrm{LIML}\) and \(\hat\delta_\mathrm{LIML}\) are LIML estimators. In this case, only the upper bound method is available (critical_values must be "kleibergen2007generalizing" or "moreira2003conditional").

Parameters:

Z (array_like of shape (n, k)) – Instruments.
X (array_like of shape (n, mx)) – Endogenous regressors of interest.
y (array_like of shape (n,)) – Outcomes.
beta (array_like of shape (mx + md,)) – Coefficients to test.
W (array_like of shape (n, mw) or None, default=None) – Endogenous regressors not of interest.
C (array_like of shape (n, mc) or None, default=None) – Exogenous regressors not of interest.
D (array_like of shape (n, md) or None, default=None) – Exogenous regressors of interest. Will be included into both X and Z if supplied.
fit_intercept (bool, default=True) – Whether to include an intercept. This is equivalent to centering the inputs.
critical_values ({"londschien2025exact", "kleibergen2007generalizing", "moreira2003conditional"}, default="londschien2025exact") – Which critical values to use. If "londschien2025exact", uses the exact distribution conditional on all eigenvalues via Monte Carlo simulation of the polynomial root \(\mu_\mathrm{min}\). Only available when W is None. If "kleibergen2007generalizing" or "moreira2003conditional", uses the upper bound conditional on the smallest eigenvalue via numerical integration.
tol (float, default=1e-6) – Tolerance for the approximation of the CDF and thus the p-value.
num_samples (int, default=100_000) – Number of Monte Carlo samples when using "londschien2025exact".

Returns:

statistic (float) – The test statistic \(\mathrm{CLR}(\beta)\).
p_value (float) – The p-value of the test, correct up to tolerance tol.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

References

[Kle07]

Frank Kleibergen. Generalizing weak instrument robust IV statistics towards multiple parameters, unrestricted covariance matrices and identification statistics. Journal of Econometrics, 139(1):181–216, 2007.

[Kle21]

Frank Kleibergen. Efficient size correct subset inference in homoskedastic linear instrumental variables regression. Journal of Econometrics, 221(1):78–96, 2021.

[Lon25]

Malte Londschien. The exact distribution of the conditional likelihood-ratio test in instrumental variables regression. arXiv preprint, 2025.

[Mor03]

Marcelo J Moreira. A conditional likelihood ratio test for structural models. Econometrica, 71(4):1027–1048, 2003.

ivmodels.tests.inverse_anderson_rubin_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, critical_values='chi2', fit_intercept=True, tol=1e-06, max_eval=100)

Return the quadric for the inverse Anderson-Rubin test’s acceptance region.

The returned quadric satisfies quadric(x) <= 0 if and only if anderson_rubin_test(Z, X, y, beta=x, W=W)[1] > alpha. It is thus a confidence region for the causal parameter corresponding to the endogenous regressors of interest X.

If W is None, let \(q := \frac{k}{n-k}F_{F(k, n-k}(1 - \alpha)\), where \(F_{F(k, n-k)}\) is the cumulative distribution function of the \(F(k, n-k)\) distribution. The quadric is defined as

\[\begin{split}\mathrm{AR}(\beta) = \frac{n - k}{k} \frac{\| P_Z (y - X \beta) \|_2^2}{\| M_Z (y - X \beta) \|_2^2} \leq F_{F(k, n-k)}(1 - \alpha) \\ \Leftrightarrow \beta^T X^T (P_Z - q M_Z) X \beta - 2 y^T (P_Z - q M_Z) X \beta + y^T (P_Z - q M_Z) y \leq 0.\end{split}\]

If W is not None, let \(q := \frac{k - m_W}{n-k}F_{F(k - m_W, n-k)}(1 - \alpha)\). The quadric is defined as

\[\mathrm{AR}(\beta) = \min_\gamma \frac{n - k}{k - m_W} \frac{\| P_Z (y - X \beta - W \gamma) \|_2^2}{\| M_Z (y - X \beta - W \gamma) \|_2^2} \leq F_{F(k - m_W, n-k)}(1 - \alpha).\]

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float) – Significance level.
W (np.ndarray of dimension (n, mw) or None, optional, default = None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default = None) – Exogenous regressors of interest.
critical_values (str, optional, default = "chi2") – If "chi2", use the \(\chi^2(k - m_W)\) distribution to compute the p-value. If "f", use the \(F_{k - m_W, n - k}\) distribution to compute the p-value. If "gkm", use the critical value function proposed by Guggenberger et al. [2019].
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.
tol (float, optional, default = 1e-6) – Tolerance for the root finding algorithm when critical_values is "gkm".
max_eval (int, optional, default = 100) – Maximum number of evaluations for the root finding algorithm when critical_values is "gkm".

Returns:

The quadric for the acceptance region.

Return type:

Quadric

ivmodels.tests.inverse_conditional_likelihood_ratio_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, fit_intercept=True, tol=1e-06, max_value=1000000.0, max_eval=100)

Return an approximation of the confidence set by inversion of the CLR test.

This is only implemented if mx + md = 1. The confidence set is computed by a root finding algorithm, see the docs of _find_roots() for more details.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float, optional, default = 0.05) – Significance level of the test.
W (np.ndarray of dimension (n, mw) or None, optional, default = None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default = None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default: True) – Whether to include an intercept. This is equivalent to centering the inputs.
tol (float, optional, default: 1e-6) – The boundaries of the confidence set are computed up to this tolerance.
max_value (float, optional, default: 1e6) – The maximum value to consider when searching for the boundaries of the confidence set. That is, if the true confidence set is of the form [0, max_value + 1], the confidence returned set will be [0, np.inf].
max_eval (int, optional, default: 100) – The maximum number of evaluations of the critical value function to find the boundaries of the confidence set.

ivmodels.tests.inverse_lagrange_multiplier_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, fit_intercept=True, tol=1e-06, max_value=1000000.0, max_eval=100)

Return an approximation of the confidence set by inversion of the LM test.

This is only implemented if mx + md = 1. The confidence set is computed by a root finding algorithm, see the docs of _find_roots() for more details.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors of interest.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float, optional, default=0.05) – Significance level. The confidence level is 1 - alpha.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.
tol (float, optional, default=1e-6) – Tolerance for the root finding algorithm.
max_value (float, optional, default=1e6) – Maximum value for the root finding algorithm. Returns a confidence set with infinite bounds if the algorithm reaches this value.
max_eval (int, optional, default=100) – Maximum number of evaluations of the statistic for the root finding algorithm.

ivmodels.tests.inverse_likelihood_ratio_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, fit_intercept=True)

Return the quadric for the inverse likelihood ratio test’s acceptance region.

The quadric is defined as

\[\mathrm{LR}(\beta) \leq F_{\chi^2(m_X)}(1 - \alpha).\]

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float, optional, default=0.05) – Significance level.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.

ivmodels.tests.inverse_pulse_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, fit_intercept=True)

Return the quadric for the inverse pulse test’s acceptance region.

The quadric satisfies quadric(x) <= 0 if and only if pulse_test(Z, X, y, beta=x, C=C, D=D)[1] > alpha. It is thus a confidence region for the causal parameter corresponding to the regressors X and D.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float, optional, default=0.05) – Significance level.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Must be None or mw must be 0. No subvector variant of the test is implemented.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.

Returns:

The quadric for the acceptance region.

Return type:

Quadric

ivmodels.tests.inverse_wald_test(Z, X, y, alpha=0.05, W=None, C=None, D=None, estimator='tsls', fit_intercept=True)

Return the quadric for the acceptance region based on asymptotic normality.

If W is None, the quadric is defined as

\[(\beta - \hat{\beta})^T X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X (\beta - \hat{\beta}) \leq \hat{\sigma}^2 F_{\chi^2(m_X)}(1 - \alpha),\]

where \(\hat \beta\) is an estimate of the causal parameter \(\beta_0\) (controlled by the parameter estimator), \(\hat \sigma^2 = \frac{1}{n - m_X} \| y - X \hat \beta \|^2_2\), \(P_Z\) is the projection matrix onto the column space of \(Z\), \(M_Z\) is the projection matrix onto the orthogonal complement of the column space of \(Z\), and \(F_{\chi^2(m_X)}\) is the cumulative distribution function of the \(\chi^2(m_X)\) distribution.

If W is not None, the quadric is defined as

\[(\beta - B \hat{\beta})^T (B ((X W)^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) (X W))^{-1} B^T)^{-1} (\beta - B \hat{\beta}) \leq \hat{\sigma}^2 F_{\chi^2(m_X)}(1 - \alpha),\]

where \(B \in \mathbb{R}^{m_X \times (m_X + m_W)}\) is a diagonal matrix with 1 on the diagonal and 0 elsewhere.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
alpha (float) – Significance level.
W (np.ndarray of dimension (n, mw) or None, optional, default = None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default = None) – Exogenous regressors of interest.
estimator (float or str, optional, default = "tsls") – Estimator to use. Passed as kappa parameter to KClass.
fit_intercept (bool, optional, default = True) – Whether to include an intercept. The intercept will be included both in the complete and the (restricted) model. Including an intercept is equivalent to centering the columns of all design matrices.

ivmodels.tests.inverse_weak_residual_prediction_test(Z, X, y, C=None, alpha=0.05, robust=False, nonlinear_model=None, fit_intercept=True, train_fraction=None, upper_clipping_quantile=0.9, gamma=0.05, seed=0, tol=1e-06, max_value=1000000.0, max_eval=100)

Compute confidence set for the weak-IV residual prediction test [Scheidegger et al., 2025].

This implements weak-IV robust confidence sets for the causal parameter based on the weak_residual_prediction_test [Scheidegger et al., 2025]. For each candidate \(\beta_0\), it tests

\[H_0(\beta_0): \exists \theta \in \mathbb{R}^q \text{ s.t. } \mathbb{E}[y - X^T \beta_0 - C^T \theta \mid Z, C] = 0,\]

and returns the confidence set

\[C_\alpha = \{\beta_0 \mid \mathrm{pval}(\beta_0) \geq \alpha\}.\]

Unlike residual_prediction_test(), this test does not require estimating \(\beta\) via TSLS and therefore remains valid under weak or many instruments. If \(C_\alpha\) is empty, the model is misspecified for all candidate values.

The function is only implemented for scalar \(X\) (\(p = 1\)). The confidence set boundaries are located by root-finding starting from the TSLS estimate of \(\beta\) on the full sample.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, 1)) – Endogenous regressor (must be scalar).
y (np.ndarray of dimension (n,)) – Outcomes.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Included exogenous regressors.
alpha (float, optional, default = 0.05) – Significance level.
robust (bool, optional, default = False) – Whether to use the heteroskedasticity-robust variance estimator.
nonlinear_model (object, optional, default = None) – Object with a fit and predict method. If None, uses an sklearn.ensemble.RandomForestRegressor().
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.
train_fraction (float, optional, default = None) – Fraction of data used to train the nonlinear model. If None, uses \(\min(0.5, e / \log(n))\).
upper_clipping_quantile (float, optional, default = 0.9) – Asymptotic normality requires the nonlinear model’s prediction not to put too much weight in the tails. To avoid this, we clip its “test set” predictions by a certain threshold in absolute value. The threshold is the upper_clipping_quantile of predictions on the “train” data. Must be between 0 and 1.
gamma (float, optional, default = 0.05) – A non-negative scalar. Limits the minimum variance of the test statistic to gamma times the noise level.
seed (int, optional, default = 0) – Seed for the random train / test split.
tol (float, optional, default = 1e-6) – Tolerance for the root-finding algorithm used to locate the confidence set boundaries.
max_value (float, optional, default = 1e6) – Maximum absolute value of \(\beta_0\) to consider. If the confidence set boundary lies beyond this value, the boundary is reported as \(\pm \infty\).
max_eval (int, optional, default = 100) – Maximum number of evaluations of the test statistic for the root-finding algorithm.

Returns:

The confidence set \(C_\alpha\).

Return type:

ConfidenceSet

Raises:

ValueError: – If the dimensions of the inputs are incorrect.
ValueError: – If X has more than one column.
ValueError: – If train_fraction is not in (0, 1).
ValueError: – If nonlinear_model does not have a fit and predict method.

References

[SLBuhlmann25] (1,2,3,4,5)

Cyrill Scheidegger, Malte Londschien, and Peter Bühlmann. A residual prediction test for the well-specification of linear instrumental variable models. arXiv preprint arXiv:2506.12771, 2025.

ivmodels.tests.j_test(Z, X, y, C=None, fit_intercept=True, estimator='liml')

Perform the J-test for overidentifying restrictions.

This is also called Sargan–Hansen test or Sargan’s J test.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Endogenous regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Exogenous regressors.
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.
estimator (str, optional, default = 'liml') – Estimator to use. Passed to KClass.

Returns:

statistic (float) – The test statistic \(J\).
p_value (float) – The p-value of the test.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

ivmodels.tests.lagrange_multiplier_test(Z, X, y, beta, W=None, C=None, D=None, fit_intercept=True, **kwargs)

Perform the Lagrange multiplier test for beta by Kleibergen [2002].

Test the null hypothesis that the residuals are uncorrelated with the instruments. If W is None, let

\[\tilde X(\beta) := X - (y - X \beta) \frac{(y - X \beta) M_Z X}{(y - X \beta) M_Z (y - X \beta)}.\]

The test statistic is

\[\mathrm{LM}(\beta) := (n - k) \frac{\| P_{P_Z \tilde X(\beta)} (y - X \beta) \|_2^2}{\| M_Z (y - X \beta) \|_2^2}.\]

If W is not None, let

\[\tilde S(\beta, \gamma) := (X \ W) - (y - X \beta - W \gamma) \frac{(y - X \beta - W \gamma) M_Z (X \ W)}{(y - X \beta - W \gamma) M_Z (y - X \beta - W \gamma)}.\]

The test statistic is

\[\mathrm{LM}(\beta) := (n - k) \min_{\gamma} \frac{\| P_{P_Z \tilde S(\beta, \gamma)} (y - X \beta - W \gamma) \|_2^2}{\| M_Z (y - X \beta - W \gamma) \|_2^2}.\]

This test statistic is asymptotically distributed as \(\chi^2(m_X)\) under the null, even if the instruments are weak.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors of interest.
y (np.ndarray of dimension (n,)) – Outcomes.
beta (np.ndarray of dimension (mx + md,)) – Coefficients to test.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.

Returns:

statistic (float) – The test statistic \(\mathrm{LM}(\beta)\).
p_value (float) – The p-value of the test. Equal to \(1 - F_{\chi^2(m_X)}(\mathrm{LM}(\beta)\), where \(F_{\chi^2(m_X)}\) is the cumulative distribution function of the \(\chi^2(m_X)\) distribution.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

ivmodels.tests.likelihood_ratio_test(Z, X, y, beta, W=None, C=None, D=None, fit_intercept=True)

Perform the likelihood ratio test for beta.

If W is None, the test statistic is defined as

\[\begin{split}\mathrm{LR}(\beta) &:= (n - k) \frac{ \| P_Z (y - X \beta) \|_2^2}{ \| M_Z (y - X \beta) \|_2^2} - (n - k) \frac{ \| P_Z (y - X \hat\beta_\mathrm{LIML}) \|_2^2 }{ \| M_Z (y - X \hat\beta_\mathrm{LIML}) \|_2^2 } \\ &= k \ \mathrm{AR}(\beta) - k \ \mathrm{AR}(\hat\beta_\mathrm{LIML}),\end{split}\]

where \(P_Z\) is the projection matrix onto the column space of \(Z\), \(M_Z = \mathrm{Id} - P_Z\), and \(\hat\beta_\mathrm{LIML}\) is the LIML estimator of \(\beta\), minimizing the Anderson-Rubin test statistic \(\mathrm{AR}(\beta)\) (see anderson_rubin_test()) at \(\mathrm{AR}(\hat\beta_\mathrm{LIML}) = \frac{n - k}{k} (\hat\kappa_\mathrm{LIML} - 1)\).

If W is not None, the test statistic is defined as

\[\mathrm{LR(\beta)} := (n - k) \frac{ \|P_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|^2_2 }{\| M_Z (y - X \beta - W \hat\gamma_\mathrm{LIML}) \|_2^2 } - (n - k) \frac{\| P_Z (y - (X \ W) \hat\delta_\mathrm{LIML}) \|_2^2 }{ \| M_Z (y - (X \ W) \hat\delta_\mathrm{LIML}) \|_2^2}\]

where \(\gamma_\mathrm{LIML}\) is the LIML estimator (see KClass) using instruments \(Z\), endogenous covariates \(W\), and outcomes \(y - X \beta\) and \(\hat\delta_\mathrm{LIML}\) is the LIML estimator using instruments \(Z\), endogenous covariates \(X \ W\), and outcomes \(y\).

Under the null and given strong instruments, the test statistic is asymptotically distributed as \(\chi^2(m_X)\), where \(m_X\) is the number of regressors.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
beta (np.ndarray of dimension (mx + md,)) – Coefficients to test.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.

Returns:

statistic (float) – The test statistic \(\mathrm{LR}(\beta)\).
p_value (float) – The p-value of the test. Equal to \(1 - F_{\chi^2(m_X)}(\mathrm{LR}(\beta)\), where \(F_{\chi^2(m_X)}\) is the cumulative distribution function of the \(\chi^2(m_X)\) distribution.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

ivmodels.tests.pulse_test(Z, X, y, beta, W=None, C=None, D=None, fit_intercept=True)

Test proposed by Jakobsen and Peters [2022] with null hypothesis: \(Z\) and \(y - X \beta\) are uncorrelated.

The test statistic is defined as

\[T := n \frac{\| P_Z (y - X \beta) \|_2^2}{\| (y - X \beta) \|_2^2}.\]

Under the null, \(T\) is asymptotically distributed as \(\chi^2(k)\). See Section 3.2 of [Jakobsen and Peters, 2022] for details.

If D is not None, it is added to both the instruments \(Z\) and the regressors \(X\), such that \(T\) is asymptotically distributed as \(\chi^2(k + m_D)\) under the null.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
beta (np.ndarray of dimension (mx + md,)) – Coefficients to test.
W (np.ndarray of dimension (n, mw) or None, optional, default=None) – Must be None or mw must be 0. No subvector variant of the test is implemented.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None, optional, default=None) – Exogenous regressors of interest.
fit_intercept (bool, optional, default=True) – Whether to fit an intercept. This is equivalent to centering the inputs.

Returns:

statistic (float) – The test statistic \(T\).
p_value (float) – The p-value of the test. Equal to \(1 - F_{\chi^2(k + m_D)}(T)\), where \(F_{\chi^2(k + m_D)}\) is the cumulative distribution function of the \(\chi^2(k + m_D)\) distribution.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

References

[JP22] (1,2,3)

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of k-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022.

ivmodels.tests.rank_test(Z, X, C=None, fit_intercept=True)

Perform the Cragg-Donald test for reduced rank [Cragg and Donald, 1997].

Let \(X = Z \Pi + V\) with \(\Pi \in \mathbb{R}^{k \times m_X}\). The Cragg-Donald test is the Wald test with the null hypothesis

\[H_0 := \mathrm{rank}(\Pi) < m_X,\]

The test statistic is

\[\mathrm{CD} := \lambda := (n-k) \lambda_\mathrm{min}((X^T M_Z X)^{-1} X^T P_Z X)\]

where \(P_Z = Z (Z^T Z)^{-1} Z^T\) is the orthogonal projection onto the column space of \(Z\), \(M_Z = I - P_Z\) is the orthogonal projection onto the orthogonal complement of the column space of \(Z\), and \(\lambda_\mathrm{min}\) is the smallest eigenvalue. Under the null hypothesis, \(\mathrm{CD}\) is asymptotically distributed as \(\chi^2(k - m_X + 1)\).

The Cragg-Donald test is asymptotically equivalent to Anderson [1951]’s likelihood ratio test for reduced rank of \(\Pi\).

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
C (np.ndarray of dimension (n, mc) or None, optional, default=None) – Exogenous regressors not of interest.
fit_intercept (bool) – Whether to fit an intercept.

Returns:

statistic (float) – The test statistic \(\mathrm{CD}\).
p_value (float) – The p-value of the test. Equal to \(1 - F_{\chi^2(k - m_X + 1)}(\mathrm{CD})\), where \(F_{\chi^2(k - m_X + 1)}\) is the cumulative distribution function of the \(\chi^2(k - m_X + 1)\) distribution.

References

[And51]

Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions. The Annals of Mathematical Statistics, pages 327–351, 1951.

[CD97]

John G Cragg and Stephen G Donald. Inferring the rank of a matrix. Journal of Econometrics, 76(1-2):223–250, 1997.

ivmodels.tests.residual_prediction_test(Z, X, y, C=None, robust=False, nonlinear_model=None, fit_intercept=True, train_fraction=None, upper_clipping_quantile=0.9, gamma=0.05, seed=0)

Perform the residual prediction test [Scheidegger et al., 2025] for model specification.

This uses a nonlinear model to test the well specification of an IV model: “Is the linear IV model appropriate for the data”? Formally, the null hypothesis is:

\[H_0: \exists \beta_0 \in \mathbb{R}^p \mathrm{\ such \ that \ } \mathbb{E}[y - X \beta | Z] = 0.\]

The tests splits the data according to train_fraction into \(y_a, X_a, Z_a\) and \(y_b, X_b, Z_b\) and fits a nonlinear model regressing \(\hat{\varepsilon}_a \sim Z_a\) on the residuals \(\hat{\varepsilon}_a := y_a - X_a \hat \beta_a\) of a two-stage least-squares (TSLS) estimator \(\hat \beta_a: y_a \sim X_a | Z_a\) on the “train” data \(y_a, X_a, Z_a\). The fitted nonlinear model is then used to predict the residuals on the “test” data \(y_b, X_b, Z_b\), yielding weights \(\hat w_b := \mathrm{nonlinear\_model}(Z_b)\). Let \(\hat \varepsilon_b := y_b - X_b \hat \beta_b\) be the residuals of a TSLS estimator \(\hat \beta_b: y_b \sim X_b | Z_b\) on the “test” data and \(\hat \sigma^2\) be an estimate of the variance of \(\hat w_b \cdot \hat \varepsilon_b\) under the null hypothesis. The test statistic is

\[T = \frac{1}{\sqrt{n_b}} \frac{w_b^T \hat \varepsilon_b}{\sqrt{\hat \sigma^2}}.\]

This is asymptotically standard Gaussian distributed under the null.

See also the test’s R implementation by Cyrill Scheidegger.

To avoid the \(p\)-value lottery due to the random train / test split used in the residual prediction test, Scheidegger et al. [2025] suggest aggregating the \(p\)-values from multiple random splits by taking 2 times the median. This results in a conservative \(p\)-value Meinshausen et al. [2009].

Example

>>> import numpy as np
>>> from ivmodels.tests import residual_prediction_test
>>> from ivmodels.simulate import simulate_gaussian_iv
>>>
>>> Z, X, y = ...
>>>
>>> ps = np.empty(50)
>>> for i in range(50):
...     _, ps[i] = residual_prediction_test(Z, X, y, seed=i)
>>>
>>> print(f"Residual prediction test p-value: {2 * np.median(ps):.3f}")

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Included exogenous regressors.
robust (bool or string, optional, default = False) – Whether to use a heteroskedasticity-robust method to estimate the noise level \(\hat \sigma^2\).
nonlinear_model (object, optional, default = None) – Object with a fit and predict method. If None, uses an sklearn.ensemble.RandomForestRegressor().
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.
train_fraction (float, optional, default = None) – Fraction of data to use to train the nonlinear model. Must be between 0 and 1. The remaining data is used to compute the test statistic. If None, 0.5 or \(e / \log(n)\) is used, whichever is smaller.
upper_clipping_quantile (float, optional, default = 0.9) – Asymptotic normality requires the nonlinear model’s prediction not to put too much weight in the tails. To avoid this, we clip its “test set” predictions by a certail threshold in absolute value. The threshold is the upper_clipping_quantile of predictions on the “train” data. Must be between 0 and 1.
gamma (float, optional, default = 0.05) – A non-negative scalar. Limits the minimum variance of the test statistic to gamma times the noise level.
seed (int, optional, default = 0) – Seed used to generate the random train / test split.

Returns:

statistic (float) – The test statistic \(\mathrm{T}\).
p_value (float) – The p-value of the test.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.
ValueError: – If train_fraction is not in (0, 1).
ValueError: – If nonlinear_model does not have a fit and predict method.

References

[MMBuhlmann09]

Nicolai Meinshausen, Lukas Meier, and Peter Bühlmann. P-values for high-dimensional regression. Journal of the American Statistical Association, 104(488):1671–1681, 2009.

[SLBuhlmann25] (1,2,3,4,5)

Cyrill Scheidegger, Malte Londschien, and Peter Bühlmann. A residual prediction test for the well-specification of linear instrumental variable models. arXiv preprint arXiv:2506.12771, 2025.

ivmodels.tests.wald_test(Z, X, y, beta, W=None, C=None, D=None, estimator='tsls', fit_intercept=True, robust=False)

Test based on asymptotic normality of the TSLS (or LIML) estimator.

If W is None, the test statistic is defined as

\[\mathrm{Wald}(\beta) := (\beta - \hat{\beta})^T X^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) X (\beta - \hat{\beta}) / \hat{\sigma}^2,\]

where \(\hat \beta = \hat \beta(\kappa)\) is a k-class estimator with \(\sqrt{n} (1 - \kappa) \to 0\), \(\hat \sigma^2 = \frac{1}{n - m_X} \| y - X \hat \beta \|^2_2\) is an estimate of the variance of the errors, \(P_Z\) is the projection matrix onto the column space of \(Z\), and \(M_Z = \mathrm{Id} - P_Z\). Under strong instruments, the test statistic is asymptotically distributed as \(\chi^2(m_X)\) under the null.

If W is not None, the test statistic is defined as

\[\mathrm{Wald}(\beta) := (\beta - B \hat{\beta})^T (B ( (X W)^T (\kappa P_Z + (1 - \kappa) \mathrm{Id}) (X W) )^{-1} B^T)^{-1} (\beta - B \hat{\beta}) / \hat{\sigma}^2,\]

where \(B \in \mathbb{R}^{m_X \times (m_X + m_W)}\) is a diagonal matrix with 1 on the diagonal and 0 elsewhere.

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
W (np.ndarray of dimension (n, mw) or None) – Endogenous regressors not of interest.
C (np.ndarray of dimension (n, mc) or None) – Exogenous regressors not of interest.
D (np.ndarray of dimension (n, md) or None) – Exogenous regressors of interest.
beta (np.ndarray of dimension (mx + md,)) – Coefficients to test.
estimator (str or float, optional, default = "tsls") – Estimator to use. Passed to kappa argument of KClass.
fit_intercept (bool, optional, default = True) – Whether to include an intercept. The intercept will be included both in the complete and the (restricted) model. Including an intercept is equivalent to centering the columns of all design matrices.
robust (bool, optional, default = False) – Whether to use a robust variance estimator. If True, the sandwich estimator is used.

Returns:

statistic (float) – The test statistic \(\mathrm{Wald}\).
p_value (float) – The p-value of the test. Equal to \(1 - F_{\chi^2(m_X)}(Wald)\), where \(F_{\chi^2(m_X)}\) is the cumulative distribution function of the \(\chi^2(m_X)\) distribution.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.

ivmodels.tests.weak_residual_prediction_test(Z, X, y, beta, C=None, robust=False, nonlinear_model=None, fit_intercept=True, train_fraction=None, upper_clipping_quantile=0.9, gamma=0.05, seed=0)

Perform the weak-IV-robust residual prediction test at a fixed beta [Scheidegger et al., 2025].

Unlike residual_prediction_test(), this test does not estimate \(\beta\) via TSLS. Instead, it tests the null hypothesis

\[H_0(\beta_0): \exists \theta \in \mathbb{R}^q \text{ s.t. } \mathbb{E}[y - X^T \beta_0 - C^T \theta \mid Z, C] = 0\]

at the supplied value \(\beta_0\). The test statistic is asymptotically standard Gaussian under the null and remains valid under weak or many instruments.

See also the test’s R implementation by Cyrill Scheidegger (weak_RPIV_test).

Parameters:

Z (np.ndarray of dimension (n, k)) – Instruments.
X (np.ndarray of dimension (n, mx)) – Regressors.
y (np.ndarray of dimension (n,)) – Outcomes.
beta (np.ndarray of dimension (mx,)) – Coefficients to test.
C (np.ndarray of dimension (n, mc) or None, optional, default = None) – Included exogenous regressors.
robust (bool, optional, default = False) – Whether to use the heteroskedasticity-robust variance estimator.
nonlinear_model (object, optional, default = None) – Object with a fit and predict method. If None, uses an sklearn.ensemble.RandomForestRegressor().
fit_intercept (bool, optional, default = True) – Whether to include an intercept. This is equivalent to centering the inputs.
train_fraction (float, optional, default = None) – Fraction of data used to train the nonlinear model. If None, uses \(\min(0.5, e / \log(n))\).
upper_clipping_quantile (float, optional, default = 0.9) – Asymptotic normality requires the nonlinear model’s prediction not to put too much weight in the tails. To avoid this, we clip its “test set” predictions by a certain threshold in absolute value. The threshold is the upper_clipping_quantile of predictions on the “train” data. Must be between 0 and 1.
gamma (float, optional, default = 0.05) – A non-negative scalar. Limits the minimum variance of the test statistic to gamma times the noise level.
seed (int, optional, default = 0) – Seed for the random train / test split.

Returns:

statistic (float) – The test statistic.
p_value (float) – The p-value of the test.

Raises:

ValueError: – If the dimensions of the inputs are incorrect.
ValueError: – If train_fraction is not in (0, 1).
ValueError: – If nonlinear_model does not have a fit and predict method.

References

[SLBuhlmann25] (1,2,3,4,5)

Cyrill Scheidegger, Malte Londschien, and Peter Bühlmann. A residual prediction test for the well-specification of linear instrumental variable models. arXiv preprint arXiv:2506.12771, 2025.

Summary

class ivmodels.summary.Summary(kclass, test, alpha, feature_names=None)

Class containing summary statistics for a fitted model.

Parameters:

kclass (ivmodels.KClass or child class of ivmodels.models.kclass.KClassMixin) – Fitted model.
test (str) – Name of the test to be used. One of "wald", "anderson-rubin", "lagrange multiplier", "likelihood-ratio", or "conditional likelihood-ratio".
alpha (float) – Significance level \(\alpha\) for the confidence sets, e.g., 0.05. The confidence of the confidence set will be \(1 - \alpha\)
feature_names (list of str, optional) – Names of the features to be included in the summary. If not specified, all features will be included.

coefficient_table_

Table containing the estimates, test statistics, p-values, and confidence sets for each feature.

Type:: CoefficientTable

statistic_

Test statistic with null hypothesis that coefficients corresponding to the endogenous regressors are jointly zero.

Type:: float

p_value_

P-value of the test statistic.

Type:: float

f_statistic_

F-statistic (or multivariate extension, see rank_test()) with null hypothesis that the first-stage coefficient is of reduced rank.

Type:: float

f_p_value_

P-value of the F-statistic (or multivariate extension).

Type:: float

fit(X, y, Z=None, C=None, *args, **kwargs)

Fit a summary.

If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names and Z must be None. At least one one of Z, instrument_names, and instrument_regex must be specified. If exogenous_names or exogenous_regex are specified, X must be a pandas DataFrame containing columns exogenous_names and C must be None.

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples. If instrument_names or instrument_regex are specified, X must be a pandas DataFrame containing columns instrument_names.
y (array-like, shape (n_samples,)) – The target values.
Z (array-like, shape (n_samples, n_instruments), optional) – The instrument values. If instrument_names or instrument_regex are specified, Z must be None. If Z is specified, instrument_names and instrument_regex must be None.
C (array-like, shape (n_samples, n_exogenous), optional) – The exogenous regressors. If exogenous_names or exogenous_regex are specified, C must be None. If C is specified, exogenous_names and exogenous_regex must be None.

class ivmodels.summary.CoefficientTable(feature_names, estimates, statistics, p_values, confidence_sets)

Table with estimates, statistics, p-values, and confidence sets for each feature.

Parameters:

feature_names (list of str) – Names of the features.
estimates (list of float) – Estimates of the coefficients.
statistics (list of float) – Test statistics.
p_values (list of float) – P-values of the test statistics.
confidence_sets (list of ivmodels.confidence_set.ConfidenceSet) – Confidence sets for the coefficients.

ConfidenceSet

class ivmodels.confidence_set.ConfidenceSet(boundaries)

A class to represent a 1D confidence set.

Parameters:: boundaries (list of 2-tuples of floats.) – The boundaries of the confidence set. The confidence set is the union of the intervals defined by the boundaries.

static from_quadric(quadric): Create a 1D confidence set from a quadric.

is_empty(): Return True if the confidence set is empty.

is_finite(): Return True if the confidence set is finite.

length(): Return the length of the confidence set.

Quadric

class ivmodels.quadric.Quadric(A, b, c)

A class to represent a quadric \(x^T A x + b^T x + c \leq 0\).

Internally, works with a standardized form of the quadric. If \(A = V D V^T\) with \(D\) diagonal and \(V\) orthonormal, define \(x_\mathrm{center} := -A^{-1} b / 2\), \(\tilde x = V^T (x - x_\mathrm{center})\) and \(\tilde c = c - x_\mathrm{center}^T A x_\mathrm{center}\). Then, the standardized form is given by \(\tilde x^T D \tilde x + \tilde c <= 0\).

Parameters:

A (np.ndarray of dimension (n, n)) – The matrix A of the quadratic form.
b (np.ndarray of dimension (n,)) – The vector b of the quadratic form.
c (float) – The constant c of the quadratic form.

center

The center of the quadric. Equal to \(-A^{-1} b / 2\).

Type:: np.ndarray of dimension (n,)

c_standardized

The constant c of the standardized quadric. Equal to \(c - x_\mathrm{center}^T A x_\mathrm{center}\).

Type:: float

D

The diagonal of the matrix \(D\) in the eigenvalue decomposition \(A = V D V^T\).

Type:: np.ndarray of dimension (n,)

V

The matrix \(V\) in the eigenvalue decomposition \(A = V D V^T\).

Type:: np.ndarray of dimension (n, n)

dim(): Return the dimension of the quadric.

forward_map(x_tilde): Map from the standardized space to the original space.

inverse_map(x): Map from the original space to the standardized space.

is_bounded(): Return True if the quadric is bounded.

is_empty(): Return True if the quadric is empty.

project(coordinates)

Return the projection of the quadric onto coordinates.

For a quadric \((x - x_\mathrm{center})^T A (x - x_\mathrm{center}) + c \leq 0\) and any matrix \(B \in \mathbb{R}^{q \times p}\) of rank \(q\), the projection of the quadric onto the coordinates given by the columns of \(B\) is given by

\[(Bx - Bx_\mathrm{center})^T (B A^{-1} B^T)^{-1} (Bx - Bx_\mathrm{center}) + c \leq 0.\]

Here, \(B\) is given by coordinates, with \(B_{i, j} = 1\) if coordinates[i] == j and \(B_{i, j} = 0\) otherwise for 0-indexed \(i = 0, \ldots, q - 1\) and \(j = 0, \ldots, p - 1\).

Parameters:: coordinates (list of int) – The coordinates onto which to project the quadric. Entries must be unique and be between 0 and p - 1.
Returns:: The projection of the quadric onto the coordinates.
Return type:: Quadric

volume(): Return the volume of the quadric.

Bibliography

[AR49]

Theodore W Anderson and Herman Rubin. Estimation of the parameters of a single equation in a complete system of stochastic equations. The Annals of mathematical statistics, 20(1):46–63, 1949.

[And51]

Theodore Wilbur Anderson. Estimating linear restrictions on regression coefficients for multivariate normal distributions. The Annals of Mathematical Statistics, pages 327–351, 1951.

[CD97]

John G Cragg and Stephen G Donald. Inferring the rank of a matrix. Journal of Econometrics, 76(1-2):223–250, 1997.

[Ful77]

Wayne A Fuller. Some properties of a modification of the limited information estimator. Econometrica: Journal of the Econometric Society, pages 939–953, 1977.

[GKM19] (1,2)

Patrik Guggenberger, Frank Kleibergen, and Sophocles Mavroeidis. A more powerful subvector Anderson Rubin test in linear instrumental variables regression. Quantitative Economics, 10(2):487–526, 2019.

[GKMC12]

Patrik Guggenberger, Frank Kleibergen, Sophocles Mavroeidis, and Linchun Chen. On the asymptotic sizes of subset Anderson–Rubin and Lagrange multiplier tests in linear instrumental variables regression. Econometrica, 80(6):2649–2666, 2012.

[JP22] (1,2,3)

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of k-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022.

[Kle02]

Frank Kleibergen. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica, 70(5):1781–1803, 2002.

[Lon25]

Malte Londschien. The exact distribution of the conditional likelihood-ratio test in instrumental variables regression. arXiv preprint, 2025.

[MMBuhlmann09]

Nicolai Meinshausen, Lukas Meier, and Peter Bühlmann. P-values for high-dimensional regression. Journal of the American Statistical Association, 104(488):1671–1681, 2009.

[RMBP21] (1,2)

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: heterogeneous data meet causality. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021.

[SLBuhlmann25] (1,2,3,4,5)

Cyrill Scheidegger, Malte Londschien, and Peter Bühlmann. A residual prediction test for the well-specification of linear instrumental variable models. arXiv preprint arXiv:2506.12771, 2025.