Bahar–Hausmann Regressions: Analysis & Errors
- Bahar–Hausmann regressions are an empirical approach aimed at linking Venezuelan oil revenues to migration flows using cointegration techniques.
- The method incorrectly applies the Engle–Granger test to first-differenced, rather than level, data, resulting in a 100% false positive rate in simulations.
- Monte Carlo evidence and theoretical analysis underscore the importance of proper integration pre-testing and specification to ensure valid long-run inference.
Bahar–Hausmann regressions refer to an empirical strategy employed by Bahar and Hausmann to investigate long-run relationships between Venezuelan oil revenues and U.S. border encounters of Venezuelan nationals. The approach applies cointegration techniques, specifically the Engle–Granger two-step method, to assess whether oil income and migration are linked in the long run. However, the defining feature of the Bahar–Hausmann implementation, and its principal flaw, is that the Engle–Granger test is applied to first differences of the variables rather than to their levels, an error with significant implications for statistical inference and interpretation (Rodríguez et al., 24 Dec 2025).
1. Bahar–Hausmann Regression Framework
The empirical approach in question is motivated by a narrative that oil sanctions, by reducing Venezuelan oil income, could be expected to impact migration outflows, measurable via U.S. border encounters. The appropriate econometric specification for capturing a long-run equilibrium relationship in this context would involve a cointegrating regression in the levels (typically log-transformed) of the two series. For example,
$$\ln y_t = \alpha + \beta \ln x_t + u_t,$$
where $y_t$ denotes monthly U.S. border encounters of Venezuelan nationals and $x_t$ denotes Venezuelan oil revenues. The presence of cointegration is associated with a stationary (I(0)) residual $u_t$. An alternative but complementary approach considers year-over-year changes (first differences) within an ARDL or error-correction structure, retaining the error-correction term (ECT) from the levels regression. However, Bahar and Hausmann diverged from these conventions by conducting cointegration testing on first differences of unlogged series, i.e., on $\Delta y_t$ and $\Delta x_t$, rather than on their levels (Rodríguez et al., 24 Dec 2025).
2. The Engle–Granger Cointegration Test: Foundations and Correct Specification
The Engle–Granger (1987) method is the standard residual-based approach for testing cointegration between two potentially nonstationary (I(1)) time series. Its canonical implementation consists of two steps:
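- Regression in Levels: Estimate $y_t = \alpha + \beta x_t + u_t$ by OLS. Under cointegration, $u_t$ is stationary (I(0)); otherwise, it remains I(1).
- ADF Test on Residuals: Conduct a Dickey–Fuller-type test on the residuals $\hat{u}_t$:
$$\Delta \hat{u}_t = \rho\,\hat{u}_{t-1} + \sum_{i=1}^{p} \gamma_i\,\Delta \hat{u}_{t-i} + \varepsilon_t$$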
A test statistic more negative than the MacKinnon critical value provides evidence against the null hypothesis of a unit root in the residuals, indicating cointegration in the levels of $y_t$ and $x_t$.
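The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited study: it uses only the standard library, omits the lagged-difference terms of the ADF regression (a plain Dickey–Fuller statistic), and compares against an assumed asymptotic 5% MacKinnon critical value of about -3.34 for two variables with a constant. The function names and the simulated cointegrated pair are hypothetical.

```python
import math
import random

# Assumed asymptotic 5% MacKinnon critical value for a residual-based test
# with two variables and a constant in the levels regression.
EG_CRIT_5PCT = -3.34


def ols_residuals(y, x):
    """Step 1: residuals from an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def df_tstat(u):
    """Step 2: t-statistic on rho in du_t = rho * u_{t-1} + error (no lags)."""
    du = [b - a for a, b in zip(u, u[1:])]
    lag = u[:-1]
    sll = sum(l * l for l in lag)
    rho = sum(l * d for l, d in zip(lag, du)) / sll
    ssr = sum((d - rho * l) ** 2 for l, d in zip(lag, du))
    return rho / math.sqrt(ssr / (len(du) - 1) / sll)


def engle_granger(y, x):
    """Engle-Granger statistic computed on the LEVELS of y and x."""
    return df_tstat(ols_residuals(y, x))


def simulate_cointegrated_pair(T, seed):
    """Hypothetical data: x is a random walk, y = 1 + 2x + stationary noise."""
    rng = random.Random(seed)
    x, y, level = [], [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        x.append(level)
        y.append(1.0 + 2.0 * level + rng.gauss(0.0, 1.0))
    return y, x


y, x = simulate_cointegrated_pair(T=200, seed=0)
stat = engle_granger(y, x)
print(f"EG statistic: {stat:.2f}; reject no-cointegration: {stat < EG_CRIT_5PCT}")
```

In applied work a library routine such as `statsmodels.tsa.stattools.coint` would normally replace the hand-rolled statistic, since it handles lag selection and finite-sample critical values.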
3. Misspecification in Bahar–Hausmann Procedure
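Bahar and Hausmann applied the Engle–Granger procedure not to the levels but to the first differences of the series. Explicitly, they estimated
$$\Delta y_t = a + b\,\Delta x_t + e_t$$
and then performed an ADF test on $\hat{e}_t$ in
$$\Delta \hat{e}_t = \rho\,\hat{e}_{t-1} + \sum_{i=1}^{p} \gamma_i\,\Delta \hat{e}_{t-i} + \nu_t.$$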
By construction, if $y_t$ and $x_t$ are I(1), then $\Delta y_t$ and $\Delta x_t$ are I(0), so the residual from this regression is I(0) under very general conditions. The ADF test will therefore almost always reject the null of a unit root, wrongly indicating cointegration. This error virtually ensures a spurious "finding" of cointegration: the rejection reflects the stationarity of the differenced data, not a long-run relationship in the original nonstationary series (Rodríguez et al., 24 Dec 2025).
4. Monte Carlo Evidence on Misspecification
Monte Carlo simulations conducted by Rodríguez and Bravo provide quantitative evidence of the flaw inherent in the Bahar–Hausmann approach. The simulations generate 1,000 replications using independent random walks $y_t$ and $x_t$, each I(1), with no cointegration by construction:
- The correct Engle–Granger test, applied to levels, rejects the null of no cointegration in approximately 5.3% of cases, in line with the test's nominal size (its intended Type I error rate).
- The Bahar–Hausmann misspecified test, applied to first differences, rejects in 100% of cases, demonstrating a complete lack of size control and a 100% false positive rate.
This empirically confirms that the misspecified approach will systematically result in spurious inference (Rodríguez et al., 24 Dec 2025).
| Test | Empirical rejection rate | Interpretation |
|---|---|---|
| Engle–Granger (levels, correct) | 5.3% | Appropriate size |
| Bahar–Hausmann (differences, misspecified) | 100% | Systematic false positive |
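A simulation in the spirit of the one summarized above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the sample length `T=100`, the seed, the Gaussian shocks, the omission of lag augmentation, and the asymptotic 5% critical value of about -3.34 are all choices made here for the sketch.

```python
import math
import random

# Assumed asymptotic 5% MacKinnon critical value (two variables, constant).
EG_CRIT_5PCT = -3.34


def ols_residuals(y, x):
    """Residuals from an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    return [yi - alpha - beta * xi for xi, yi in zip(x, y)]


def df_tstat(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + error (no lags)."""
    du = [b - a for a, b in zip(u, u[1:])]
    lag = u[:-1]
    sll = sum(l * l for l in lag)
    rho = sum(l * d for l, d in zip(lag, du)) / sll
    ssr = sum((d - rho * l) ** 2 for l, d in zip(lag, du))
    return rho / math.sqrt(ssr / (len(du) - 1) / sll)


def random_walk(T, rng):
    """A driftless random walk with standard normal shocks."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def rejection_rates(reps=1000, T=100, seed=12345):
    """Share of replications in which each test rejects, for independent walks."""
    rng = random.Random(seed)
    reject_levels = reject_diffs = 0
    for _ in range(reps):
        y, x = random_walk(T, rng), random_walk(T, rng)
        # Correct procedure: residual-based test on the levels.
        if df_tstat(ols_residuals(y, x)) < EG_CRIT_5PCT:
            reject_levels += 1
        # Misspecified procedure: the same test on first differences.
        if df_tstat(ols_residuals(diff(y), diff(x))) < EG_CRIT_5PCT:
            reject_diffs += 1
    return reject_levels / reps, reject_diffs / reps


levels_rate, diffs_rate = rejection_rates()
print(f"levels: {levels_rate:.1%}  differences: {diffs_rate:.1%}")
```

Under these assumptions the levels test should reject at a rate near its nominal size, while the differenced test should reject in essentially every replication, which is the pattern the table reports.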
5. Theoretical Rationale for the Flaw
If $y_t$ and $x_t$ are I(1), their first differences $\Delta y_t$ and $\Delta x_t$ are I(0). OLS regression of $\Delta y_t$ on $\Delta x_t$ yields residuals $\hat{e}_t$ that are also I(0), because a linear combination of stationary series is itself stationary. As a result, ADF tests on $\hat{e}_t$ will virtually always reject the null of a unit root, not because of any long-run cointegrating relationship, but simply as an artifact of testing on stationary (I(0)) data. Rejecting the unit root null in this context therefore conveys nothing about the cointegration properties of the levels, resulting in entirely spurious inference (Rodríguez et al., 24 Dec 2025).
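A short worked case makes the size of the distortion concrete. The notation is assumed here for illustration: take two independent random walks $y_t = y_{t-1} + \varepsilon_t$ and $x_t = x_{t-1} + \eta_t$ with white-noise shocks, and consider a Dickey–Fuller regression without lag augmentation on the residual $e_t$ of the differenced regression.

$$
\begin{aligned}
\Delta y_t &= \varepsilon_t, \quad \Delta x_t = \eta_t
  &&\Rightarrow\quad e_t \approx \varepsilon_t - \bar{\varepsilon} \ \text{(white noise, variance } \sigma^2\text{)} \\
\Delta e_t &= \rho\, e_{t-1} + \nu_t
  &&\Rightarrow\quad \hat{\rho} \to \frac{\operatorname{Cov}(e_t - e_{t-1},\, e_{t-1})}{\operatorname{Var}(e_{t-1})} = \frac{-\sigma^2}{\sigma^2} = -1 \\
\operatorname{se}(\hat{\rho}) &\approx \sqrt{\frac{\sigma^2}{T\sigma^2}} = \frac{1}{\sqrt{T}}
  &&\Rightarrow\quad t_{\hat{\rho}} \approx -\sqrt{T}
\end{aligned}
$$

With $T = 100$, for instance, the statistic is roughly $-10$, far below a 5% critical value of about $-3.34$, so rejection is all but guaranteed even though the two series are unrelated by construction.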
6. Best-Practice Guidelines for Cointegration Analysis
To ensure valid inference in Bahar–Hausmann-type applications or similar cointegration studies, the following procedural guidelines are foundational:
- Integration Pre-testing: Establish the order of integration for each series using (A)DF tests on both levels and differences. Only proceed to cointegration testing if both series are convincingly I(1).
- Cointegration in Levels: Apply Engle–Granger or Johansen tests to the levels of the series (logged or unlogged, as dictated by the theory).
- Critical Value Selection: Use MacKinnon critical values for the Engle–Granger residual-based test; do not substitute standard t-tables.
- Long-run Estimation: If cointegration is found, estimate long-run coefficients using dynamic OLS, FM-OLS, or comparable methods to mitigate endogeneity and serial correlation.
- Error-correction Models: Build ARDL or error-correction models in differences, always incorporating the lagged cointegrating residual as the error-correction term (ECT).
- Transformation Consistency: Align the transformation (log/level/differences) between cointegration tests and subsequent regressions; mismatch leads to inconsistent inference.
- Seasonal and Structural Breaks: Include dummies for structural or seasonal effects directly in the cointegrating regression when warranted by data characteristics (Rodríguez et al., 24 Dec 2025).
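The integration pre-testing step at the top of this list can be sketched as a simple gate that runs before any cointegration test. This is a minimal sketch under assumptions: it uses a Dickey–Fuller regression with a constant and no lagged differences, an assumed asymptotic 5% critical value of about -2.86, and hypothetical function names; a real application would add lag selection and finite-sample critical values.

```python
import math
import random

# Assumed asymptotic 5% Dickey-Fuller critical value (regression with constant).
DF_CRIT_5PCT = -2.86


def df_tstat_const(series):
    """t-statistic on rho in: d(series)_t = c + rho * series_{t-1} + error."""
    d = [b - a for a, b in zip(series, series[1:])]
    lag = series[:-1]
    n = len(d)
    ml, md = sum(lag) / n, sum(d) / n
    sll = sum((l - ml) ** 2 for l in lag)
    rho = sum((l - ml) * (di - md) for l, di in zip(lag, d)) / sll
    c = md - rho * ml
    ssr = sum((di - c - rho * l) ** 2 for l, di in zip(lag, d))
    return rho / math.sqrt(ssr / (n - 2) / sll)


def integration_order(series, max_d=2):
    """Smallest number of differences after which the unit-root null is rejected.

    Returns None if the null is never rejected up to max_d differences.
    """
    current = list(series)
    for d in range(max_d + 1):
        if df_tstat_const(current) < DF_CRIT_5PCT:
            return d
        current = [b - a for a, b in zip(current, current[1:])]
    return None


def ready_for_cointegration(y, x):
    """Proceed to a levels cointegration test only if both series look I(1)."""
    return integration_order(y) == 1 and integration_order(x) == 1


# Hypothetical data: a white-noise series and the random walk built from it.
rng = random.Random(7)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
walk, level = [], 0.0
for shock in white_noise:
    level += shock
    walk.append(level)

print("order of white noise:", integration_order(white_noise))
print("order of random walk:", integration_order(walk))
```

The gate makes the guideline operational: a series classified as I(0), such as a first-differenced I(1) series, never reaches the cointegration step, which is the check the misspecified procedure skips.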
7. Implications and Controversies
The central controversy arising from the Bahar–Hausmann regressions pertains to the misapplication of cointegration tests, resulting in invalid inferences regarding the long-run relationship between migration flows and oil revenues. The 100% spurious rejection rate demonstrated both theoretically and empirically implies that any findings derived from their procedure offer no valid basis for inference. The episode underscores the necessity of rigorous adherence to the statistical theory underlying cointegration and the dangers of specification error, particularly regarding the proper dimension (levels versus differences) upon which such relationships are to be tested (Rodríguez et al., 24 Dec 2025).