---
title: "Using ivcheck with fixest"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using ivcheck with fixest}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/with-fixest-"
)
set.seed(1)
```

## Why this vignette matters

`fixest` is one of the most widely used R packages for applied IV estimation. This vignette shows the drop-in integration: fit an IV model with `feols()`, pass it to `iv_check()`, and get every applicable IV-validity test in one call.

If you are already using `fixest` for your paper, nothing about your workflow changes. Add one line and your IV estimate now comes with a published falsification test.

## Setup

```{r, message = FALSE}
library(ivcheck)
library(fixest)
```

## The Card (1995) proximity-to-college IV

Card's (1995) classic IV for the return to schooling uses proximity to a four-year college as an instrument for completed schooling. The bundled `card1995` dataset is a cleaned extract from the National Longitudinal Survey of Young Men.

```{r}
data(card1995)
head(card1995[, c("lwage", "educ", "college", "near_college",
                  "age", "black", "south")])
```

Two variants are included: the continuous `educ` (years of schooling) and a binary `college` indicator (`educ >= 16`) for use with tests that require a binary treatment.

## The unconditional case: no exogenous controls

We start with the simplest specification: no exogenous controls in the structural equation. This is the case `iv_kitagawa()` is designed for.

```{r}
m_uncond <- feols(
  lwage ~ 1 | college ~ near_college,
  data = card1995
)
summary(m_uncond)
```

```{r}
chk <- iv_check(m_uncond, n_boot = 500, parallel = FALSE)
print(chk)
```

`iv_check()` inspects the model, detects that `college` is binary and `near_college` is a discrete instrument, and runs Kitagawa (2015) and Mourifie-Wan (2017). With no covariates the two are numerically identical: Mourifie-Wan reduces exactly to the variance-weighted Kitagawa test, and the package's unit tests check this equivalence.
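
To get a feel for what "binary treatment" and "discrete instrument" mean for this dataset, the columns can be inspected directly. This is only a rough illustration using base R; the object names are made up here, and `iv_check()` may apply different rules internally.

```{r}
# Illustrative checks only; not the logic iv_check() uses internally.
treat <- card1995$college
instr <- card1995$near_college

# TRUE if every non-missing treatment value is 0 or 1
is_binary_treatment <- all(treat[!is.na(treat)] %in% c(0, 1))
is_binary_treatment

# Number of distinct non-missing instrument values
n_instrument_values <- length(unique(instr[!is.na(instr)]))
n_instrument_values
```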

### Bootstrap distribution

```{r, fig.width = 6, fig.height = 4}
k <- iv_kitagawa(m_uncond, n_boot = 500, parallel = FALSE)
hist(k$boot_stats, breaks = 40,
     main = "Kitagawa bootstrap distribution (Card 1995, no controls)",
     xlab = "sqrt(n) * positive-part KS")
abline(v = k$statistic, col = "red", lwd = 2)
```
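
The "positive-part KS" label on the x-axis refers to the largest sample violation of the testable inequalities, set to zero when none is violated. The sketch below computes a simplified version of that quantity in base R on simulated data. It is unscaled and unweighted, it searches only a coarse grid of intervals, and every object name in it is illustrative, so it should be read as an aid to intuition rather than as the computation `iv_kitagawa()` performs.

```{r}
# Simulated data with a valid instrument by construction.
# All object names here are illustrative and not part of ivcheck.
set.seed(42)
n_sim <- 2000
z_sim <- rbinom(n_sim, 1, 0.5)                  # binary instrument
u_sim <- rnorm(n_sim)                           # unobserved heterogeneity
d_sim <- as.integer(0.8 * z_sim + u_sim > 0.4)  # monotone first stage, no defiers
y_sim <- 1 + 0.5 * d_sim + 0.7 * u_sim + rnorm(n_sim)  # z_sim acts only through d_sim

# Sample analogue of P(a <= Y <= b, D = d_val | Z = z_val)
sub_prob <- function(a, b, d_val, z_val) {
  keep <- z_sim == z_val
  mean(y_sim[keep] >= a & y_sim[keep] <= b & d_sim[keep] == d_val)
}

# Candidate intervals [a, b] with endpoints on a coarse quantile grid
grid <- unname(quantile(y_sim, probs = seq(0, 1, by = 0.05)))
ivals <- expand.grid(a = grid, b = grid)
ivals <- ivals[ivals$a <= ivals$b, ]

# Validity implies P(B, D = 1 | Z = 1) >= P(B, D = 1 | Z = 0)
viol_d1 <- mapply(function(a, b) sub_prob(a, b, 1, 0) - sub_prob(a, b, 1, 1),
                  ivals$a, ivals$b)
# ... and P(B, D = 0 | Z = 0) >= P(B, D = 0 | Z = 1)
viol_d0 <- mapply(function(a, b) sub_prob(a, b, 0, 1) - sub_prob(a, b, 0, 0),
                  ivals$a, ivals$b)

# Positive part: zero when no inequality is violated in the sample
ks_pos <- max(0, viol_d1, viol_d0)
ks_pos
```

Because the instrument is valid in this simulation, any positive value reflects sampling noise, which is what the bootstrap distribution above is meant to calibrate.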

## The conditional case: one exogenous control

Card's identification strategy is more naturally read as "valid *conditional on* demographic and regional controls". In that setting the appropriate test is Mourifie and Wan's (2017) conditional version. It checks the same family of testable inequalities as Kitagawa, but conditional on `X`: the conditional CDFs are estimated by series regression on `X` instead of by unconditional empirical CDFs.
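
For reference, the family of inequalities can be written out as follows. The notation is introduced here for illustration: $D$ is the binary treatment, $Z$ a binary instrument labeled so that $Z = 1$ weakly encourages treatment, and $B$ any Borel set of outcome values. This is the standard statement of the testable implication, not a description of how `ivcheck` evaluates it.

$$
\begin{aligned}
P(Y \in B,\, D = 1 \mid Z = 1,\, X = x) &\ge P(Y \in B,\, D = 1 \mid Z = 0,\, X = x), \\
P(Y \in B,\, D = 0 \mid Z = 0,\, X = x) &\ge P(Y \in B,\, D = 0 \mid Z = 1,\, X = x),
\end{aligned}
$$

for every $B$ and every $x$ in the support of $X$. Dropping the conditioning on $X = x$ gives the unconditional inequalities tested by Kitagawa (2015).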

In `ivcheck` v0.1.2, the conditional path supports a single covariate (multivariate via tensor-product basis is planned for v0.2.0).

```{r}
m_cond <- feols(
  lwage ~ age | college ~ near_college,
  data = card1995
)
```

`iv_kitagawa()` is strictly the unconditional test and refuses fitted models that carry controls:

```{r, error = TRUE}
iv_kitagawa(m_cond, n_boot = 100, parallel = FALSE)
```

The conditional test is `iv_mw()`. Dispatched on the same fitted model, it picks up the single covariate automatically and runs the Chernozhukov-Lee-Rosen series-regression test with Andrews-Soares adaptive moment selection:

```{r}
mw <- iv_mw(m_cond, n_boot = 200, parallel = FALSE)
print(mw)
```

`iv_check()` does the right thing automatically: it detects that the model has controls, drops Kitagawa from the applicable list with an informational message, and reports MW alone:

```{r}
iv_check(m_cond, n_boot = 200, parallel = FALSE)
```

## Multivariate controls

When the structural equation carries more than one exogenous control, `iv_mw()` in v0.1.2 does not yet support the multivariate conditioning required for a valid conditional test. Both Kitagawa and MW are skipped, and `iv_check()` reports back with informational messages:

```{r, error = TRUE}
m_multi <- feols(
  lwage ~ age + black + south | college ~ near_college,
  data = card1995
)
iv_check(m_multi, n_boot = 100, parallel = FALSE)
```

Multivariate conditioning via tensor-product series basis is planned for v0.2.0. Until then, two workarounds:

1. **Reduce `X` to a single index.** Fit a propensity score `Pr(D = 1 | X)` or another scalar summary of the controls, then refit the IV model with that index as the single exogenous control and call `iv_mw()` on the result.
2. **Stratify and run unconditional tests within strata.** Coarsely bin `X` and call `iv_kitagawa()` on raw vectors within each cell, applying a Bonferroni adjustment across cells.
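
Both workarounds can be sketched with the functions already used in this vignette. Several choices below are assumptions made for illustration: the logit propensity score, the strata formed from `black` and `south` only, and the `p_value` field on the Kitagawa result (mirroring `mw$p_value`). The second part also fits a model within each cell instead of passing raw vectors, because the vector interface is not shown here.

```{r, eval = FALSE}
library(ivcheck)
library(fixest)
data(card1995)

# Workaround 1: collapse the controls into a scalar index.
# Logit propensity score for the binary treatment (an illustrative choice).
ps_fit <- glm(college ~ age + black + south,
              family = binomial(), data = card1995)
card1995$pscore <- predict(ps_fit, newdata = card1995, type = "response")

m_index <- feols(lwage ~ pscore | college ~ near_college, data = card1995)
iv_mw(m_index, n_boot = 200, parallel = FALSE)

# Workaround 2: unconditional tests within strata, Bonferroni-adjusted.
# Strata come from the two binary controls; age is left out for simplicity.
strata <- interaction(card1995$black, card1995$south, drop = TRUE)
p_cell <- vapply(split(card1995, strata), function(cell) {
  m_cell <- feols(lwage ~ 1 | college ~ near_college, data = cell)
  k_cell <- iv_kitagawa(m_cell, n_boot = 500, parallel = FALSE)
  k_cell$p_value  # assumed field name, mirroring mw$p_value
}, numeric(1))

# Bonferroni: multiply each p-value by the number of cells, capped at 1
p.adjust(p_cell, method = "bonferroni")
```

Whether a scalar index or a coarse stratification is an adequate summary of `X` is a modeling judgment that the test itself cannot settle.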

## Combining with `modelsummary`

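If you have `modelsummary` installed, `iv_check` results are picked up automatically through the `broom::glance` method registered on package load. To put the p-value of a specific validity test at the bottom of a regression table, append it explicitly as an extra row with `add_rows`:

```{r, eval = FALSE}
library(modelsummary)

# One extra row: a label column plus one column per model in the table
validity_row <- data.frame(
  term = "Mourifie-Wan 2017 p-value",
  `IV estimate` = sprintf("%.3f", mw$p_value),
  check.names = FALSE
)

modelsummary(
  list("IV estimate" = m_cond),
  add_rows = validity_row
)
```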

## The full workflow

In your paper's replication code:

```{r, eval = FALSE}
library(fixest)
library(ivcheck)

# ... data loading ...

# IV estimate (conditional on a single control)
m <- feols(y ~ x | d ~ z, data = df)

# IV validity diagnostic
chk <- iv_check(m)

# Report both in the paper
knitr::kable(chk$table)
```

Three lines of code, a falsification test the referee is almost guaranteed to ask about, and a citation-ready result. That is the whole point of `ivcheck`.

## References

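Card, D. (1995). Using Geographic Variation in College Proximity to Estimate the Return to Schooling. In L. N. Christofides, E. K. Grant, and R. Swidinsky (eds.), *Aspects of Labour Market Behaviour: Essays in Honour of John Vanderkamp*. Toronto: University of Toronto Press, 201-222.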

Kitagawa, T. (2015). A Test for Instrument Validity. *Econometrica* 83(5): 2043-2063.

Mourifie, I. and Wan, Y. (2017). Testing Local Average Treatment Effect Assumptions. *Review of Economics and Statistics* 99(2): 305-313.