---
title: "Using the CGMissingDataR Shiny App"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using the CGMissingDataR Shiny App}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

# Overview

CGMissingDataR includes an optional Shiny app for interactive imputation of
missing glucose values. The app is a point-and-click interface around the main
package function:

```r
run_missing_glucose_imputation()
```

The app is useful when users want to:

- upload a CSV file without writing R code;
- choose the target glucose, subject ID, timestamp, and feature columns from a
  user interface;
- load built-in example data sets for demonstration;
- inspect observed missingness before running imputation;
- run the imputation workflow;
- preview rows where glucose was missing and then imputed;
- download the completed data as a CSV file.

The Shiny app does not implement a separate imputation algorithm. It calls
`run_missing_glucose_imputation()` internally and returns the same type of
completed data frame as the command-line workflow.

The imputation workflow handles both explicit missing glucose values coded as
`NA` and missing readings implied by timestamp gaps. During imputation, each
subject is regularized to the expected `interval_minutes` timestamp grid, so the
returned data can contain more rows than the uploaded data when timestamps are
missing.

# Installation

Install CGMissingDataR from CRAN with:

```r
install.packages("CGMissingDataR")
```

The app requires the optional R package `shiny`. If Shiny is not already
installed, install it with:

```r
install.packages("shiny")
```

Then load the package:

```{r setup}
library(CGMissingDataR)
```

# Launching the app

Launch the app with:

```{r launch-app, eval = FALSE}
run_app()
```

During package development, after running `devtools::load_all()`, the same
launcher can be used:

```{r launch-app-development, eval = FALSE}
devtools::load_all()
run_app()
```

The app is bundled inside the installed package. Its directory can be located
with:

```r
system.file(
  "shiny",
  "cgm_imputation_app",
  package = "CGMissingDataR"
)
```

Users normally do not need to access this directory directly. The `run_app()`
launcher finds it automatically.

# Input options

The app provides two ways to load data.

## Upload a CSV file

Use the **Browse** button to upload a CSV file containing CGM data. The file
should contain, at minimum, columns corresponding to:

| Role | Example column | App selector |
|---|---|---|
| Subject identifier | `USUBJID` | Subject ID column |
| Glucose value | `LBORRES` | Target glucose column |
| Timestamp | `Time` | Timestamp column |
| Additional predictors | `AGE`, `hba1c` | Feature columns |

After the file is uploaded, the app displays a preview of the uploaded data and
populates the column-selection controls.
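
Before uploading, it can help to confirm in R that the file reads cleanly and
contains the expected columns. The following is a minimal sketch, not part of
the app: the temporary file and its values are made up for illustration, and the
column names are taken from the example table in this section.

```r
# Hypothetical example file written to a temporary path for illustration
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    USUBJID = c("S1", "S1", "S1"),
    Time = c("2020-01-16 00:00:00", "2020-01-16 00:05:00", "2020-01-16 00:10:00"),
    LBORRES = c(110, NA, 118),
    AGE = c(45, 45, 45)
  ),
  csv_path,
  row.names = FALSE
)

cgm <- read.csv(csv_path, stringsAsFactors = FALSE)

# Replace these names with the columns used in your own file
required_cols <- c("USUBJID", "Time", "LBORRES")
missing_cols <- setdiff(required_cols, names(cgm))
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}

# Inspect column types before uploading
str(cgm)
```

With your own data, replace `csv_path` with the path to the file you plan to
upload.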

## Load built-in example data

The app can also load built-in example data sets for demonstration. These are
useful for quickly showing how the workflow behaves without requiring users to
upload their own data.

The example data sets are intended to include:

| Example data | Description |
|---|---|
| `CGMExmplDat5Pct` | Example CGM data with about 5% explicit missing glucose values. |
| `CGMExmplDat10Pct` | Example CGM data with about 10% explicit missing glucose values. |

After selecting an example data set and clicking **Load example data**, the app
uses that data set exactly as if it had been uploaded by the user.

# Selecting columns

Once data are loaded, select the columns that map to the imputation function.

## Target glucose column

Choose the glucose column with missing values to impute. In the included example
data, this is usually:

```r
LBORRES
```

The original target column is preserved in the returned data. Values that were
originally missing remain `NA` in this column, and rows inserted for timestamp
gaps during regularization also have `NA` in it. Completed glucose values are
written to a new column named:

```r
imputed_glucose_value
```

## Subject ID column

Choose the column identifying each subject or participant. In the example data,
this is usually:

```r
USUBJID
```

The subject ID is used for sorting, timestamp regularization, lag feature
creation, rolling-mean feature creation, and subject-level time handling.

## Timestamp column

Choose the raw timestamp column. In the example data, this is usually:

```r
Time
```

The imputation function uses this timestamp column to regularize each subject to
an equal `interval_minutes` CGM grid before imputation. Common timestamp formats
are supported, including colon-separated, hyphen-separated, slash-separated,
ISO-style, and `POSIXct` values.

## Feature columns

Choose additional predictor columns. In the example data, these commonly include:

```r
SEX
AGE
hba1c
```

Feature columns should be numeric or coercible to numeric. If a `SEX` column is
present, the underlying function can encode it internally.
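
One way to check whether candidate feature columns are coercible to numeric is
sketched below. The helper `is_numeric_like()` is a hypothetical name used only
for this illustration and is not part of the package, and the example values are
made up.

```r
# Made-up feature values stored as text, as they might arrive from a CSV
features <- data.frame(
  AGE = c("45", "52", "61"),
  hba1c = c("6.1", "7.4", "n/a"),
  stringsAsFactors = FALSE
)

# TRUE when every non-missing value converts to a number
is_numeric_like <- function(x) {
  converted <- suppressWarnings(as.numeric(as.character(x)))
  !any(is.na(converted) & !is.na(x))
}

coercible <- vapply(features, is_numeric_like, logical(1))
coercible
```

Here `AGE` passes and `hba1c` does not, because `"n/a"` is not a number. A
text-coded `SEX` column would also fail this check, which is fine if you rely on
the internal encoding described above.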

# Missingness summary card

The app includes a missingness summary card beside the uploaded data preview.
After a target glucose column is selected, this card shows the observed
missingness in the loaded data before imputation:

- the percentage of explicit missing values in the selected target column;
- the number of explicit missing rows;
- the total number of uploaded rows;
- a warning style when missingness exceeds the chosen threshold (for example,
  20%).

This card is intended as a quick data-quality check before running the
imputation workflow. Timestamp gaps are handled during imputation, so the final
number of rows imputed can be larger than the explicit missing count shown in
this pre-imputation summary.
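
The quantities on the card can be reproduced in a few lines of base R. This is a
sketch of the idea rather than the app's own code: `summarize_missingness()` and
`warn_above_pct` are hypothetical names, and the example values are made up.

```r
summarize_missingness <- function(data, target_col, warn_above_pct = 20) {
  n_total <- nrow(data)
  # Only explicit NA values are counted; timestamp gaps are not visible here
  n_missing <- sum(is.na(data[[target_col]]))
  pct_missing <- if (n_total > 0) 100 * n_missing / n_total else NA_real_
  list(
    pct_missing = pct_missing,
    n_missing = n_missing,
    n_total = n_total,
    warn = !is.na(pct_missing) && pct_missing > warn_above_pct
  )
}

example <- data.frame(
  LBORRES = c(110, NA, 118, NA, 121, 125, 130, 128, 119, 115)
)
missingness <- summarize_missingness(example, "LBORRES")
missingness$pct_missing
```

In this example two of ten values are missing, so the percentage is 20 and the
warning flag stays `FALSE` because the rate does not exceed the threshold.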

# Timestamp-gap handling

When imputation runs, the underlying function regularizes each subject to the
expected `interval_minutes` grid. For example, if readings jump from `00:05` to
`00:30`, the function internally creates the missing rows at `00:10`, `00:15`,
`00:20`, and `00:25`, sets the target glucose value to `NA`, and then imputes
those values.

This means the downloaded data can contain more rows than the uploaded data when
there are timestamp gaps.
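
The idea can be illustrated for a single subject with base R. This is a
simplified sketch, not the package's implementation: `regularize_subject()` is a
hypothetical helper, and it assumes that timestamps are already `POSIXct` and
that every observed reading falls exactly on the grid.

```r
regularize_subject <- function(data, time_col, interval_minutes = 5) {
  times <- data[[time_col]]
  # Expected grid from the first to the last reading
  grid <- seq(min(times), max(times), by = interval_minutes * 60)
  # Grid points without a reading get a row of NA values
  out <- data[match(as.numeric(grid), as.numeric(times)), , drop = FALSE]
  out[[time_col]] <- grid
  rownames(out) <- NULL
  out
}

one_subject <- data.frame(
  Time = as.POSIXct(
    c("2020-01-16 00:00:00", "2020-01-16 00:05:00", "2020-01-16 00:30:00"),
    tz = "UTC"
  ),
  LBORRES = c(110, 112, 120)
)

regular <- regularize_subject(one_subject, "Time")

# Times of the rows that were inserted for the gap
inserted_times <- format(regular$Time[is.na(regular$LBORRES)], "%H:%M")
inserted_times
```

The three uploaded rows become seven, with `NA` glucose at `00:10`, `00:15`,
`00:20`, and `00:25`. In a sketch like this, other columns in the inserted rows
are also `NA`; how the package fills subject-level columns is not shown here.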

# Backend selection

The app supports the same backends as `run_missing_glucose_imputation()`.

| Backend | Description | Recommended use |
|---|---|---|
| `mice` | R-native backend using the R package `mice`. | Default, CRAN-safe workflow. |
| `sklearn` | Optional Python-compatible backend through `reticulate`. | Closest agreement with the Python reference workflow. |

## MICE backend

The default backend is:

```r
imputer_backend = "mice"
```

This backend does not require Python and is the safest choice for most users. It
is also the backend used in CRAN-safe examples and tests.

# Method selection

The **Final imputation method** control mirrors the `models` argument in
`run_missing_glucose_imputation()`. The default **Automatic by missing rate**
option uses `MICE+ARIMA` when missingness is at or below the selected threshold
and `MICE+XGBoost` otherwise.
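
The automatic rule can be sketched in a few lines. This is an illustration only,
not the package's internal code: `choose_final_method()` and `threshold_pct` are
hypothetical names, the threshold is assumed to be a percentage, and the missing
rate here counts only explicit `NA` values.

```r
choose_final_method <- function(target, threshold_pct = 20) {
  stopifnot(length(target) > 0)
  pct_missing <- 100 * mean(is.na(target))
  # At or below the threshold: ARIMA; above it: XGBoost
  if (pct_missing <= threshold_pct) "MICE+ARIMA" else "MICE+XGBoost"
}

low_missing <- c(110, NA, 118, 121, 125, 130, 128, 119, 115, 117)
high_missing <- c(NA, NA, NA, 121)

choose_final_method(low_missing)
choose_final_method(high_missing)
```

The first vector has 10% missing values and selects `MICE+ARIMA`; the second has
75% and selects `MICE+XGBoost`.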

Users can also force exactly one final method:

- `MICE+ARIMA`;
- `MICE+XGBoost`;
- `MICE+Random Forest`;
- `MICE+kNN`;
- `MICE+LightGBM`.

The app shows only the tuning controls relevant to the selected method. For
example, Random Forest shows the tree count, kNN shows the neighbor count, and
LightGBM shows boosting rounds.

The **Model threads** control maps to `n_threads`. It defaults to `1`, which
keeps CPU use conservative on CRAN check machines and shared systems. Increase
it for faster local XGBoost, Random Forest, or LightGBM runs.

# Optional sklearn backend

The optional Python-compatible backend is:

```r
imputer_backend = "sklearn"
```

This path sends the data frame to Python through `reticulate`. Python then uses:

- `pandas` for data-frame operations;
- `scikit-learn` for `IterativeImputer`;
- `statsmodels` for ARIMA;
- Python `xgboost` for XGBoost regression;
- Python `lightgbm` when forcing LightGBM.

To use the Python backend, install `reticulate` and declare the Python
requirements before launching or running the app:

```{r python-requirements, eval = FALSE}
install.packages("reticulate")

reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))

# Optional, only needed for models = "lightgbm"
reticulate::py_install("lightgbm", pip = TRUE)
```

The Python backend is optional. It is not required for installing or loading the
package.

# Running imputation

After loading data and selecting columns, click **Run imputation**.

Internally, the app calls code equivalent to:

```{r app-equivalent-call, eval = FALSE}
out <- run_missing_glucose_imputation(
  data = uploaded_data,
  target_col = selected_target_col,
  feature_cols = selected_feature_cols,
  id_col = selected_id_col,
  time_col = selected_time_col,
  imputer_backend = selected_backend,
  models = selected_method,
  use_arima_if_missing_leq = selected_threshold,
  xgb_nrounds = selected_xgb_rounds,
  rf_n_estimators = selected_rf_trees,
  knn_k = selected_knn_neighbors,
  lgb_nrounds = selected_lightgbm_rounds,
  n_threads = selected_threads,
  seed = selected_seed,
  export = FALSE
)
```

The returned object is a data frame containing the original input columns plus:

| Column | Meaning |
|---|---|
| `imputed_glucose_value` | Completed glucose values after imputation. |

The original target glucose column is left unchanged. Internal time features,
lag features, rolling means, model labels, and missingness-tracking flags are
used during imputation but are not included in the returned or downloaded data.

# Previewing results

After imputation, the app displays a preview of rows where the target glucose
value is missing in the returned data. This includes explicit missing glucose
values and, when timestamp gaps exist, rows inserted during timestamp
regularization.

For example, the preview is based on logic like:

```{r preview-logic, eval = FALSE}
imputed_rows <- out[is.na(out[[target_col]]), , drop = FALSE]
head(imputed_rows, 15)
```

The full completed data frame remains available for download.

# Downloading results

Use the **Download imputed CSV** button to save the completed data set. The CSV
is intentionally minimal and contains:

- the original uploaded columns;
- any rows inserted from timestamp gaps;
- `imputed_glucose_value`.

`imputed_glucose_value` is returned as a continuous numeric model estimate. If
whole-number glucose values are needed for reporting, users can round the column
after download.
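
A minimal sketch of rounding after download follows. The data frame stands in
for a downloaded file and its values are illustrative only; the new column name
`imputed_glucose_rounded` is a hypothetical choice.

```r
# Stand-in for the result of reading the downloaded CSV with read.csv()
completed <- data.frame(
  LBORRES = c(110, NA, 118),
  imputed_glucose_value = c(110, 113.62, 118)
)

# Keep the continuous estimate and add a rounded copy for reporting
completed$imputed_glucose_rounded <- round(completed$imputed_glucose_value)
completed
```

Writing the rounded values to a separate column keeps the continuous estimates
available for later analysis.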

# Troubleshooting

## The app does not launch

If you see an error saying that Shiny is not installed, run:

```r
install.packages("shiny")
```

Then restart R and try:

```r
run_app()
```

## No column choices appear

Column choices appear only after data are loaded. Upload a CSV file or load one
of the built-in example data sets.

## Imputation fails because a timestamp cannot be parsed

Check the timestamp column selected in the app. The values should be parseable
dates or datetimes, for example:

```r
"2020:01:16:00:00"
"2020-01-16 00:00:00"
"2020/01/16 00:00:00"
"2020-01-16T00:00:00"
```

If the wrong column was selected as the timestamp column, select the correct
column and rerun imputation.
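
To find problem values before uploading, you can try parsing the column
yourself. The sketch below is not the package's parser and the formats the
package accepts may differ: `parse_timestamp()` is a hypothetical helper that
tries the four example formats listed in this section, in order, using UTC.

```r
parse_timestamp <- function(x) {
  formats <- c(
    "%Y:%m:%d:%H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S"
  )
  # Start with all values unparsed
  parsed <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = "UTC")
  for (fmt in formats) {
    # Only retry values that no earlier format could parse
    todo <- is.na(parsed)
    parsed[todo] <- as.POSIXct(x[todo], format = fmt, tz = "UTC")
  }
  parsed
}

timestamps <- c(
  "2020:01:16:00:00",
  "2020-01-16 00:00:00",
  "2020/01/16 00:00:00",
  "2020-01-16T00:00:00",
  "not a time"
)
parsed <- parse_timestamp(timestamps)

# Values that none of the formats could parse
unparsed <- timestamps[is.na(parsed)]
unparsed
```

Any values listed in `unparsed` are worth correcting in the source file before
uploading.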

## Downloaded data have more rows than the uploaded file

This is expected when the uploaded timestamps contain gaps. The imputation
workflow inserts the CGM rows that are missing from the expected grid before
imputing glucose values.

## Python backend fails because a Python module is missing

If `imputer_backend = "sklearn"` fails because Python packages are missing, run:

```r
reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))

# Optional, only needed for models = "lightgbm"
reticulate::py_install("lightgbm", pip = TRUE)
```

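Run these calls in a fresh R session, then launch the app from that same
session. Requirements declared with `reticulate::py_require()` apply only to the
current R session.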

## Downloaded data contain `NA` in the original glucose column

This is expected. The original target column is intentionally preserved. The
completed values are stored in:

```r
imputed_glucose_value
```

# Developer notes

The recommended package structure for the app is:

```text
inst/
└── shiny/
    └── cgm_imputation_app/
        └── app.R
```

The launcher should live in an exported R function, for example:

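```r
run_app <- function() {
  # shiny is optional, so check for it at run time and fail with a clear message
  if (!requireNamespace("shiny", quietly = TRUE)) {
    stop("Package 'shiny' is required to run the app. ",
         "Install it with install.packages(\"shiny\").", call. = FALSE)
  }
  app_dir <- system.file(
    "shiny",
    "cgm_imputation_app",
    package = "CGMissingDataR"
  )
  shiny::runApp(app_dir, display.mode = "normal")
}
```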

Because the app is optional, `shiny` should usually be listed in `Suggests`, not
`Imports`, unless the package requires Shiny for normal operation.