
<!-- README.md is generated from README.Rmd. Please edit that file -->

# nbpInference

<!-- badges: start -->

[![R-CMD-check](https://github.com/AnthonyFrazierCSU/nbpInference/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AnthonyFrazierCSU/nbpInference/actions/workflows/R-CMD-check.yaml)
[![CRAN
status](https://www.r-pkg.org/badges/version/nbpInference)](https://CRAN.R-project.org/package=nbpInference)
<!-- badges: end -->

A common approach to causal inference on observational data with
continuous treatments is matching, where observations are paired based
on covariate similarity while ensuring treatment values are distinct
within pairs. Matching allows for estimation of an average treatment
effect while adjusting for observed confounding. Inexact covariate
matching is common in practice, but may induce severe bias in downstream
causal inference. We propose matching, estimation, and inference methods
for paired observational studies with continuous treatments that
addresses bias caused by inexact covariate matching. The nbpInference
package includes methods for calipered non-bipartite matching,
bias-corrected sample average treatment effect estimation, and
covariate-adjusted conservative variance estimation based on the methods
in (Frazier, Heng, and Zhou 2024).

## Installation

You can install the development version of nbpInference from
[Github](https://github.com/) like so:

``` r
# install.packages("devtools")
# devtools::install_github("AnthonyFrazierCSU/nbpInference")
```

## Example

Below is an example of generating data, forming matched pairs,
estimating treatment effects and creating Wald-type confidence intervals
for the sample average treatment effect using different functionality in
nbpInference. First, we simulate some covariates $X=(X_1,...,X_5)$, a
continuous treatment $Z$ and outcome $Y$ based on the data generation
models described in the main simulation of (Frazier, Heng, and Zhou
2024).

``` r
library(nbpInference)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.3
set.seed(12345)

# generating X, Y, Z with a sample size of 100
example_data <- generate.data.dose(100)

# several rows as an example of data generated
head(example_data)
#>             Z         Y         X1         X2         X3          X4
#> 1  1.45275168 -3.302435  0.5855288  0.2239254 -1.4361457  0.52228217
#> 2  0.58768423  4.525210  0.7094660 -1.1562233 -0.6292596  0.00979376
#> 3  0.91140199  1.579223 -0.1093033  0.4224185  0.2435218 -0.44052620
#> 4  3.32703775 15.000065 -0.4534972 -1.3247553  1.0583622  1.19948953
#> 5  4.53489265  7.355280  0.6058875  0.1410843  0.8313488 -0.11746849
#> 6 -0.08620513 -2.965215 -1.8179560 -0.5360480  0.1052118  0.03820979
#>             X5
#> 1  0.627965113
#> 2  0.002143951
#> 3  0.284377723
#> 4 -1.001779086
#> 5 -0.617221929
#> 6  0.828194239
```

To generate matched pairs using the dose assignment discrepancy caliper
defined in (Frazier, Heng, and Zhou 2024) we first need to form an
$N \times N$ matrix, which we will denote the “p-matrix”, where each
cell in the p-matrix contains the probability of observed treatment
assignment within each pair of observations. Specifically, denote
$(Z_{i}, X_{i})$ as the observed treatment and covariates for
observation $i$. Let $f(Z|X)$ denote the conditional density of
treatment given covariates, and let $\widehat{f}(Z|X)$ denote some
estimate of that conditional density. the diagonal entries of the
p-matrix are set to $0$, while the $(i, j)$ entries of the p-matrix are
set to

$$(\widehat{f}(Z_{i}|X_{i})\widehat{f}(Z_{j}|X_{j})+\widehat{f}(Z_{i}|X_{j})\widehat{f}(Z_{j}|X_{i}))^{(-1)}(\widehat{f}(Z_{i}|X_{i})\widehat{f}(Z_{j}|X_{j}))$$

``` r

# setting "treatment" to Z, "covariates" to X1,...,X5
treatment <- example_data[, 1]
covariates <- example_data[, 3:ncol(example_data)]

# creates pmatrix using the "Model" based method
pmat <- make.pmatrix(treatment, covariates)

# preview what entries in pmatrix look like. All entries are values between
# 0 and 1, as each entry represents a probability.
pmat[1:6, 1:6]
#>           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
#> [1,] 0.0000000 0.5560414 0.5359757 0.3909543 0.4494939 0.7435420
#> [2,] 0.5560414 0.0000000 0.4995097 0.5162289 0.6829898 0.5721800
#> [3,] 0.5359757 0.4995097 0.0000000 0.5179673 0.6740481 0.6045299
#> [4,] 0.3909543 0.5162289 0.5179673 0.0000000 0.5513774 0.8254281
#> [5,] 0.4494939 0.6829898 0.6740481 0.5513774 0.0000000 0.9474680
#> [6,] 0.7435420 0.5721800 0.6045299 0.8254281 0.9474680 0.0000000
```

Using covariates $\mathbf{X}$, we form two sets of $50$ matched pairs,
one set formed without using the dose assignment discrepancy caliper,
and the other set formed using the caliper, setting the associated
hyperparameters $\xi = 0.1$ and $M = 10000$

``` r
# forming a set of matched pairs using conventional non-bipartite matching
pairs_vanilla <- nbp.caliper(treatment, covariates)

# forming a set of matched pairs using the dose assignment discrepancy caliper
pairs_caliper <- nbp.caliper(treatment, covariates, pmat, xi = 0.1, M = 10000)

# preview of what the pairs dataframe looks like. Contains indeces of observations, 
# each row representing a matched pair
head(pairs_vanilla)
#>   High Low
#> 3    3  17
#> 4    4  65
#> 5    5  76
#> 6    6  42
#> 7    7  79
#> 8    8  28
```

Matched pairs formed without using the dose assignment discrepancy
caliper may have probabilities of observed treatment assignment heavily
deviate from $0.5$, as shown in the plot below.

<img src="man/figures/README-treatment.assignment.probabilities.vanilla-1.png" width="100%" />
By implementing the dose assignment discrepancy caliper and setting $M$
to be a large number, pairs of observations with probability of observed
treatment assignment highly deviate from $0.5$ are discouraged from
being formed as matched pairs.

<img src="man/figures/README-treatment.assignment.probabilities.calipered-1.png" width="100%" />
After forming matched pairs, we can estimate the sample average
treatment effect, either using the conventional Neyman estimator
(Baiocchi et al. 2010; Zhang et al. 2022; Heng et al. 2023) or the
bias-corrected neyman estimator (Frazier, Heng, and Zhou 2024).

``` r
# setting "outcome" to Y
outcome <- example_data[, 2]

# Conventional Neyman estimation of sample average treatment effect
classic_estimation <- classic.neyman(outcome, treatment, pairs_vanilla)

# bias-corrected Neyman estimation of sample average treatment effect
bc_estimation <- bias.corrected.neyman(outcome, treatment, pairs_caliper, pmat, 0.1)

# displaying estimated sample average treatment effects
c(classic_estimation, bc_estimation)
#> [1] 1.969861 1.340177
```

Finally, a Wald-type confidence interval can be created for containing
the sample average treatment effect.

``` r

# conservative variance estimation with covariate adjustment
var_est <- covAdj.variance(outcome, treatment, covariates, pairs_caliper, pmat, 0.1)

# creating a 95 percent confidence interval using the bias-corrected estimator
CI <- c(bc_estimation - qnorm(0.975)*sqrt(var_est), bc_estimation + qnorm(0.975)*sqrt(var_est))

# displaying confidence interval
CI
#> [1] -0.5952437  3.2755984
```

<div id="refs" class="references csl-bib-body hanging-indent"
entry-spacing="0">

<div id="ref-baiocchi2010building" class="csl-entry">

Baiocchi, Mike, Dylan S Small, Scott Lorch, and Paul R Rosenbaum. 2010.
“Building a Stronger Instrument in an Observational Study of Perinatal
Care for Premature Infants.” *Journal of the American Statistical
Association* 105 (492): 1285–96.

</div>

<div id="ref-frazier2024bias" class="csl-entry">

Frazier, Anthony, Siyu Heng, and Wen Zhou. 2024. “Bias Reduction in
Matched Observational Studies with Continuous Treatments: Calipered
Non-Bipartite Matching and Bias-Corrected Estimation and Inference.”
*arXiv e-Prints*, arXiv–2409.

</div>

<div id="ref-heng2023instrumental" class="csl-entry">

Heng, Siyu, Bo Zhang, Xu Han, Scott A Lorch, and Dylan S Small. 2023.
“Instrumental Variables: To Strengthen or Not to Strengthen?” *Journal
of the Royal Statistical Society: Series A (Statistics in Society)* 186
(4): 852–73.

</div>

<div id="ref-zhang2022bridging" class="csl-entry">

Zhang, Bo, Siyu Heng, Emily J MacKay, and Ting Ye. 2022. “Bridging
Preference-Based Instrumental Variable Studies and Cluster-Randomized
Encouragement Experiments: Study Design, Noncompliance, and Average
Cluster Effect Ratio.” *Biometrics* 78 (4): 1639–50.

</div>

</div>
