---
title: "Using ivcheck with fixest"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using ivcheck with fixest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/with-fixest-"
)
set.seed(1)
```

## Why this vignette matters

`fixest` is the dominant R package for applied IV estimation. This vignette shows the drop-in integration: fit an IV model with `feols()`, pass it to `iv_check()`, and get every applicable IV-validity test in one call.

If you are already using `fixest` for your paper, nothing about your workflow changes. Add one line and your IV estimate now comes with a published falsification test.

## Setup

```{r, message = FALSE}
library(ivcheck)
library(fixest)
```

## The Card (1995) proximity-to-college IV

Card's (1995) classic IV for the return to schooling uses proximity to a four-year college as an instrument for completed schooling. The bundled `card1995` dataset is a cleaned extract from the National Longitudinal Survey of Young Men.

```{r}
data(card1995)
head(card1995[, c("lwage", "educ", "college", "near_college",
                  "age", "black", "south")])
```

Two variants are included: the continuous `educ` (years of schooling) and a binary `college` indicator (`educ >= 16`) for use with tests that require a binary treatment.

## The unconditional case: no exogenous controls

We start with the simplest specification: no exogenous controls in the structural equation. This is the case `iv_kitagawa()` is designed for.

```{r}
m_uncond <- feols(
  lwage ~ 1 | college ~ near_college,
  data = card1995
)
summary(m_uncond)
```

```{r}
chk <- iv_check(m_uncond, n_boot = 500, parallel = FALSE)
print(chk)
```

`iv_check()` inspects the model, detects that `college` is binary and `near_college` is a discrete instrument, and runs Kitagawa (2015) and Mourifie-Wan (2017).
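To make the target of these tests concrete, here is a minimal base-R sketch on simulated data (not `card1995`). It assumes a binary treatment `d` and a binary instrument `z`, where `z = 1` encourages treatment, and checks whether the joint distribution of outcome and treatment moves in the direction that instrument validity requires. The helper `max_violation()` is hypothetical and not part of `ivcheck`. It omits the variance weighting and the bootstrap critical values, so it illustrates the idea and does not reproduce `iv_kitagawa()`.

```r
# Hypothetical helper (NOT part of ivcheck): largest positive-part violation
# of the testable inequalities over outcome intervals B = [lo, hi].
# Assumes d and z are coded 0/1 and that z = 1 encourages treatment.
max_violation <- function(y, d, z, n_grid = 20) {
  # Candidate interval endpoints: empirical quantiles of the outcome.
  cuts <- unique(quantile(y, probs = seq(0, 1, length.out = n_grid)))
  z1 <- z == 1
  worst <- 0
  for (lo in cuts) {
    for (hi in cuts[cuts >= lo]) {
      in_b <- y >= lo & y <= hi
      # Sample analogues of Pr(Y in B, D = d | Z = z).
      treated_z1   <- mean(in_b[z1] & d[z1] == 1)
      treated_z0   <- mean(in_b[!z1] & d[!z1] == 1)
      untreated_z1 <- mean(in_b[z1] & d[z1] == 0)
      untreated_z0 <- mean(in_b[!z1] & d[!z1] == 0)
      # Validity implies treated_z1 >= treated_z0 and
      # untreated_z0 >= untreated_z1 for every interval B.
      worst <- max(worst,
                   treated_z0 - treated_z1,
                   untreated_z1 - untreated_z0)
    }
  }
  worst
}

# Simulated data with always-takers, compliers, and never-takers.
set.seed(42)
n <- 2000
z <- rbinom(n, 1, 0.5)
type <- sample(c("always", "complier", "never"), n, replace = TRUE)
d <- as.integer(type == "always" | (type == "complier" & z == 1))

# Valid instrument: z affects y only through d.
y_valid <- rnorm(n) + 0.5 * d
# Invalid instrument: z also shifts y directly (exclusion fails).
y_invalid <- y_valid + 2 * z

v_valid <- max_violation(y_valid, d, z)
v_invalid <- max_violation(y_invalid, d, z)
c(valid = v_valid, invalid = v_invalid)
```

In this simulation `v_valid` should sit near zero, up to sampling noise, while `v_invalid` should be clearly positive because the instrument shifts the outcome directly. Kitagawa (2015) and Mourifie-Wan (2017) both test this family of inequalities.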
With no covariates the two are numerically identical: Mourifie-Wan reduces exactly to the variance-weighted Kitagawa test, and the package's unit tests check this equivalence.

### Bootstrap distribution

```{r, fig.width = 6, fig.height = 4}
k <- iv_kitagawa(m_uncond, n_boot = 500, parallel = FALSE)
hist(k$boot_stats, breaks = 40,
     main = "Kitagawa bootstrap distribution (Card 1995, no controls)",
     xlab = "sqrt(n) * positive-part KS")
abline(v = k$statistic, col = "red", lwd = 2)
```

## The conditional case: one exogenous control

Card's identification strategy is more naturally read as "valid *conditional on* demographic and regional controls". In that setting the right test is Mourifie and Wan's (2017) conditional version: it tests the same family of inequalities as Kitagawa, but the conditional CDFs are estimated by series regression on `X` rather than treated as unconditional. In `ivcheck` v0.1.2, the conditional path supports a single covariate (multivariate conditioning via a tensor-product basis is planned for v0.2.0).

```{r}
m_cond <- feols(
  lwage ~ age | college ~ near_college,
  data = card1995
)
```

`iv_kitagawa()` is strictly the unconditional test and refuses fitted models that carry controls:

```{r, error = TRUE}
iv_kitagawa(m_cond, n_boot = 100, parallel = FALSE)
```

The conditional test is `iv_mw()`. Dispatched on the same fitted model, it picks up the single covariate automatically and runs the Chernozhukov-Lee-Rosen series-regression test with Andrews-Soares adaptive moment selection:

```{r}
mw <- iv_mw(m_cond, n_boot = 200, parallel = FALSE)
print(mw)
```

`iv_check()` handles this case automatically: it detects that the model has controls, drops Kitagawa from the applicable list with an informational message, and reports MW alone:

```{r}
iv_check(m_cond, n_boot = 200, parallel = FALSE)
```

## Multivariate controls

When the structural equation carries more than one exogenous control, `iv_mw()` in v0.1.2 does not yet support the multivariate conditioning required for a valid conditional test.
Both Kitagawa and MW are skipped, and `iv_check()` reports back with informational messages:

```{r, error = TRUE}
m_multi <- feols(
  lwage ~ age + black + south | college ~ near_college,
  data = card1995
)
iv_check(m_multi, n_boot = 100, parallel = FALSE)
```

Multivariate conditioning via a tensor-product series basis is planned for v0.2.0. Until then, there are two workarounds:

1. **Reduce `X` to a single index.** Fit a propensity score `Pr(D = 1 | X)` or another scalar summary of the controls, then refit the IV model with that index as the single exogenous control and call `iv_mw()` on the result.
2. **Stratify and run unconditional tests within strata.** Coarsely bin `X` and call `iv_kitagawa()` on raw vectors within each cell, applying a Bonferroni adjustment across cells.

## Combining with `modelsummary`

If you have `modelsummary` installed, `iv_check` results are picked up automatically through `broom::glance` registered on package load. This lets you put a validity p-value directly in a regression table footer:

```{r, eval = FALSE}
library(modelsummary)

# One extra row: a label column plus one column per model in the table.
mw_row <- data.frame(
  term = "Mourifie-Wan 2017 p-value",
  `IV estimate` = sprintf("%.3f", mw$p_value),
  check.names = FALSE
)

# add_rows appends the custom row to the bottom of the table.
modelsummary(
  list("IV estimate" = m_cond),
  add_rows = mw_row
)
```

## The full workflow

In your paper's replication code:

```{r, eval = FALSE}
library(fixest)
library(ivcheck)

# ... data loading ...

# IV estimate (conditional on a single control)
m <- feols(y ~ x | d ~ z, data = df)

# IV validity diagnostic
chk <- iv_check(m)

# Report both in the paper
knitr::kable(chk$table)
```

Three lines of code, a falsification test the referee is almost guaranteed to ask about, and a citation-ready result. That is the whole point of `ivcheck`.

## References

Card, D. (1995). Using Geographic Variation in College Proximity to Estimate the Return to Schooling.

Kitagawa, T. (2015). A Test for Instrument Validity. *Econometrica* 83(5): 2043-2063.

Mourifie, I. and Wan, Y. (2017). Testing Local Average Treatment Effect Assumptions. *Review of Economics and Statistics* 99(2): 305-313.