---
title: "Unstratified and Stratified Miettinen and Nurminen Test"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Unstratified and Stratified Miettinen and Nurminen Test}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: rate-compare.bib
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  comment = "#>",
  collapse = TRUE
)
```

## Overview

Binary outcome is a commonly used endpoint in clinical trials.
This page illustrates how to conduct the unstratified or stratified analysis
with the Miettinen and Nurminen (M&N) method [@miettinen1985comparative]
for risk difference analysis in R.
The following statistics can be calculated with the function `rate_compare()`:

- Estimated risk difference.
- Test statistic.
- Confidence interval for the risk difference.
- p-value for risk difference.

## Statistical methods

### Unstratified analysis of M&N method

Assume the data includes two independent binomial samples with binary response variables to be
analyzed/summarized and the data collected in a clinical design without stratification.
Also this approach is applicable to the case when the data are collected
using a stratified clinical design and
the statistician would like to ignore stratification
by pooling the data over strata assuming two independent binomial samples.
Assume $P_i$ is the proportion of success responses in the test ($i=1$) or control ($i=0$) group.

#### Confidence interval

The confidence interval is based on the M&N method and given by the roots for $PD=P_1-P_0$ of the equation:

$$\chi_\alpha^2 = \frac{(\hat{p}_1-\hat{p}_0-PD)^2}{\tilde{V}}$$,

where $\hat{p}_1$ and $\hat{p}_0$ are the observed values of $P_1$ and $P_0$, respectively;

- $\chi_\alpha^2$ = the upper cut point of size $\alpha$ from
the central chi-square distribution
with 1 degree of freedom ($\chi_\alpha^2 = 3.84$ for $95$% confidence interval);

- $PD$ = the difference between two population proportions ($PD=P_1-P_0$);

$$\tilde{V}=\bigg[\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1}+
\frac{\tilde{p}_0(1-\tilde{p}_0)}{n_0}\bigg]\frac{n_1+n_0}{n_1+n_0-1}$$;

- $n_1$ and $n_0$ are the sample sizes for the test and control group, respectively;

- $\tilde{p}_1$ = maximum likelihood estimate of proportion on test computed as $\tilde{p}_0+PD$;

- $\tilde{p}_0$ = maximum likelihood estimate of
proportion on control under the constraint $\tilde{p}_1-\tilde{p}_0=PD$.

As stated above the 2-sided $100(1-\alpha)$% CI is given by the roots for $PD=P_1-P_0$.
The bisection algorithm is used in the function to obtain the two roots (confidence interval) for $PD$.

#### p-value and Z-statistic

The Z-statistic is computed as:

$$Z_\text{diff}=\frac{\hat{p}_1-\hat{p}_0+S_0}{\sqrt{\tilde{V}}}$$
where $\hat{p}_1$ and $\hat{p}_0$ are the observed values
for $P_1$ and $P_0$ respectively, $S_0$ is pre-specified proportion difference under the null;

- $\tilde{p}_1$ = maximum likelihood estimate of proportion on test computed as $\tilde{p}_0+S_0$;

- $\tilde{p}_0$ = maximum likelihood estimate of proportion on control under the constraint $\tilde{p}_1-\tilde{p}_0=S_0$.

- For non-inferiority or one-sided equivalence hypothesis with $S_0>0$,
the p-value, $\Pr(Z \geq Z_\text{diff} \, | \, H_0)$, is computed based on $Z_\text{diff}$ using the standard normal distribution.
- For non-inferiority or one-sided equivalence hypothesis with $S_0<0$,
the p-value, $\Pr(Z \leq Z_\text{diff} \, | \, H_0)$, is computed based on $Z_\text{diff}$ using the standard normal distribution.
- For two-sided superiority test, the p-value $\Pr(\chi_\text{diff}^2 \leq \chi_1^2 \, | \, H_0)$,
is computed based on $\chi_\text{diff}^2$ using the chi-square distribution with 1 degree of freedom,
where $\chi_\text{diff}^2=Z_\text{diff}^2$.

### Stratified analysis of M&N method

Assume the data includes two treatment groups, test and control, and collected based on a stratified design.
Within each stratum there are two independent binomial samples
with binary response variables to be analyzed/summarized.
The parameter of interest is the difference between
the population proportions of the test and the control groups.
The analysis and summaries need to be performed while adjusting for the stratifying variables.

#### Confidence interval

The confidence interval is based on the M&N method and given by the roots for $PD=P_1-P_0$ of the equation:

$$\chi_\alpha^2 = \frac{(\hat{p}_1^*-\hat{p}_0^*-PD)^2}{\sum_{i=1}^I(W_i/\sum_{k=1}^{K}W_k)^2\tilde{V}_i}$$,

where $\hat{p}_s^* = \sum_{i=1}^I(W_i/\sum_{k=1}^KW_k)\hat{p}_{s i}$ for $s = 0, 1$;

$$\tilde{V}_i=\bigg[\frac{\tilde{p}_{1i}(1-\tilde{p}_{1i})}{n_{1i}}+\frac{\tilde{p}_{0i}(1-\tilde{p}_{0i})}{n_{0i}}\bigg]\frac{n_{1i}+n_{0i}}{n_{1i}+n_{0i}-1}$$;

- $W_i$ is the weight for the $i$-th strata;
- $I = K = $ number of strata, $i=k=$ strata;
- $n_{1i}$ and $n_{0i}$ are the sample sizes in $i$-th strata for the test and control group, respectively;
- $\hat{p}_{1i}$ and $\hat{p}_{0i}$ = observed proportion in $i$-th strata for the test and control, respectively;
- $\tilde{p}_{0i}$ and $\tilde{p}_{1i}$ are MLE for $P_{0i}$ and $P_{1i}$, respectively,
computed under the constraint $\tilde{p}_{1i}=\tilde{p}_{0i}+PD$.

Similarly as for unstratified analysis,the 2-sided $100(1 - \alpha)$% CI is given by the roots for $PD = P_1 - P_0$,
and the bisection algorithm is used in the function to obtain the two roots (confidence interval) for $PD$.

#### p-value and Z-statistic

The Z-statistic is computed as:

$$Z_\text{diff}=\frac{\hat{p}_1^*-\hat{p}^*_0+S_0}{\sqrt{\sum_{i=1}^I(W_i/\sum_{k=1}^{K}W_k)^2\tilde{V}_i}}$$
where $S_0$ is pre-specified proportion difference under the null;

- $\tilde{p}_{0i}$ and $\tilde{p}_{1i}$ are MLE for $P_{0i}$ and $P_{1i}$, respectively,
computed under the constraint $\tilde{p}_{1i} = \tilde{p}_{0i} + S_0$.

The p-value can be calculated as stated above.

## Example

### Load package

```{r, message=FALSE, warning=FALSE}
library(metalite.ae)
```

### Data simulation

We simulated a dataset with 2 treatment group for binary output.
If stratum is used, we considered 4 stratum.

```{r}
ana <- data.frame(
  treatment = c(rep(0, 100), rep(1, 100)),
  response  = c(rep(0, 80), rep(1, 20), rep(0, 40), rep(1, 60)),
  stratum   = c(rep(1:4, 12), 1, 3, 3, 1, rep(1:4, 12), rep(1:4, 25))
)

head(ana)
```

### Unstratified analysis

The function computes the risk difference, Z-statistic, p-value given the type of test,
and two-sided $100(1 - \alpha)$% confidence interval of difference between two rates.

```{r}
rate_compare(response ~ treatment, data = ana)
```

### Stratified analysis

The sample size weighting is often used in the clinical trial.
Below is the function to conduct stratified MN analysis with sample size weights.

We also support weight in `"equal"` and `"cmh"`.
More details can be found in the `rate_compare()` documentation.

```{r}
rate_compare(
  formula = response ~ treatment, strata = stratum, data = ana,
  weight = "ss"
)
```

## References