| Type: | Package |
| Title: | Automated Statistical Analysis and Table Generation for Biomedical Research |
| Version: | 1.7.2 |
| Description: | Generates publication-ready summary tables for clinical research, supporting descriptive summaries and comparisons across two or three groups. The package streamlines the analytical workflow by detecting variable types and applying appropriate statistical tests (Welch t-test, Wilcoxon rank-sum, Welch ANOVA, Kruskal-Wallis, Chi-squared, or Fisher's exact test). Results are formatted as 'tibble' objects and can be exported to 'Word' or 'Excel' using the 'officer', 'flextable', and 'writexl' packages. Optional pairwise post-hoc testing for three-group comparisons (Games-Howell and Dunn's test) is available via the 'rstatix' package. Example data are derived from the landmark adjuvant colon cancer trial described in Moertel et al. (1990) <doi:10.1056/NEJM199002083220602>. |
| License: | MIT + file LICENSE |
| URL: | https://cran.r-project.org/package=TernTables, https://github.com/jdpreston30/TernTables, https://tern-tables.com/ |
| BugReports: | https://github.com/jdpreston30/TernTables/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | cli, dplyr (≥ 1.0.0), epitools, flextable (≥ 0.9.0), magrittr, multcompView, officer (≥ 0.4.6), rlang, rstatix, stats, stringr, tibble, withr, writexl |
| Suggests: | knitr, rmarkdown, survival, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Depends: | R (≥ 4.1.0) |
| LazyData: | true |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | no |
| Packaged: | 2026-06-04 18:00:18 UTC; jdp2019 |
| Author: | Joshua D. Preston |
| Maintainer: | Joshua D. Preston <joshua.preston@emory.edu> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-04 18:30:02 UTC |
TernTables: Automated Statistics and Table Generation for Clinical Research
Description
TernTables generates publication-ready summary tables for descriptive
statistics and group comparisons. It automatically detects variable types
(continuous, binary, or categorical), selects appropriate
statistical tests, and formats results for direct export to Word or Excel.
Numeric variables can be designated as ordinal via force_ordinal, or
forced to parametric treatment via force_normal.
Main functions
- ternG
Grouped comparison table for 2- or 3-level group variables.
- ternD
Descriptive-only summary table (no grouping).
- word_export
Export a TernTables tibble to a formatted Word document.
- write_methods_doc
Generate a methods Word document describing tests used.
- val_p_format
Format a P value for publication.
- val_format
Format a numeric value with rounding rules.
Statistical tests applied
- Binary / Categorical
Chi-squared or Fisher's exact, based on expected cell counts (Cochran criterion).
- Numeric, normal (2 groups)
Welch's t-test, routed by ROBUST logic.
- Numeric, normal (3+ groups)
Welch ANOVA, routed by ROBUST logic per group.
- Numeric, non-normal (2 groups)
Wilcoxon rank-sum, routed by ROBUST logic or forced via
force_ordinal; override to parametric via force_normal.
- Numeric, non-normal (3+ groups)
Kruskal-Wallis, routed by ROBUST logic or forced via
force_ordinal; override to parametric via force_normal.
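The chi-squared versus Fisher's exact choice in the list above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name choose_categorical_test_sketch is hypothetical, and the thresholds used (no expected count below 1, at most 20% of expected counts below 5) are the textbook form of the Cochran criterion, which may differ from the rule TernTables applies.

```r
# Hypothetical helper; not part of the TernTables API.
# `tab` is a contingency table (matrix of counts).
choose_categorical_test_sketch <- function(tab) {
  # Expected counts under independence: row total * column total / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

  # Textbook Cochran rule (an assumption about the exact thresholds)
  cochran_ok <- all(expected >= 1) && mean(expected < 5) <= 0.20

  if (cochran_ok) {
    list(test = "Chi-squared", p = chisq.test(tab)$p.value)
  } else {
    list(test = "Fisher's exact", p = fisher.test(tab)$p.value)
  }
}

large_counts <- matrix(c(30, 20, 25, 25), nrow = 2)
small_counts <- matrix(c(3, 1, 1, 3), nrow = 2)
choose_categorical_test_sketch(large_counts)$test
choose_categorical_test_sketch(small_counts)$test
```

In the second table every expected count is 2, so the sketch falls back to Fisher's exact test.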
ROBUST routing uses four gates: (1) n < 3 ⇒ non-parametric (fail-safe);
(2) |skewness| > 2 or |excess kurtosis| > 7 in any group ⇒ non-parametric;
(3) all groups n ≥ 30 ⇒ parametric (CLT);
(4) otherwise Shapiro-Wilk p > 0.05 in all groups ⇒ parametric.
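The four gates can be written out as a short base-R sketch. The function and helper names below are hypothetical and this is not the package's source code; it assumes Gate 1 fires when any group has n < 3, that skewness and kurtosis are computed from population moments, and that groups are non-constant with at most 5000 values when Gate 4 is reached.

```r
# Hypothetical helpers; not part of the TernTables API.
pop_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
pop_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# `groups` is a list of numeric vectors, one per group.
# Returns the gate (1-4) that decided and whether the route is parametric.
robust_route_sketch <- function(groups) {
  groups <- lapply(groups, function(g) g[!is.na(g)])
  n <- lengths(groups)

  # Gate 1: a group is too small -> non-parametric (fail-safe)
  if (any(n < 3)) return(list(gate = 1L, parametric = FALSE))

  # Gate 2: extreme skewness or excess kurtosis in any group -> non-parametric
  skew <- vapply(groups, pop_skewness, numeric(1))
  kurt <- vapply(groups, pop_excess_kurtosis, numeric(1))
  if (any(abs(skew) > 2, na.rm = TRUE) || any(abs(kurt) > 7, na.rm = TRUE)) {
    return(list(gate = 2L, parametric = FALSE))
  }

  # Gate 3: every group is large -> parametric (central limit theorem)
  if (all(n >= 30)) return(list(gate = 3L, parametric = TRUE))

  # Gate 4: Shapiro-Wilk must pass in every group
  sw_p <- vapply(groups, function(g) shapiro.test(g)$p.value, numeric(1))
  list(gate = 4L, parametric = all(sw_p > 0.05))
}

robust_route_sketch(list(c(rep(0, 20), 100), 1:10))  # decided at Gate 2
```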
Scope and limitations
All statistical tests applied by ternG assume
independent observations — that is, each row of the data frame
represents a distinct, unrelated subject with no dependencies between rows.
TernTables is not designed for repeated-measures, longitudinal, or
clustered data where the same subject contributes multiple rows (e.g.
pre/post measurements, matched pairs, or patients nested within clinical
sites). Applying it to such data would violate the independence assumption
shared by all tests in the package (Welch's t-test, Wilcoxon
rank-sum, Welch ANOVA, Kruskal-Wallis, chi-squared, and Fisher's exact)
and would produce invalid p-values.
Getting started
See vignette("getting-started", package = "TernTables") for a
walkthrough using the bundled tern_colon dataset.
Web application
TernTables is available as a free point-and-click web application at
https://tern-tables.com/ — no R installation required. Upload a
CSV or XLSX file, configure the analysis through a simple interface, and
download a publication-ready Word table. The web application is powered
by this R package; all statistical methods and outputs are identical to
calling ternG(), ternD(), and ternP() directly.
Author(s)
Maintainer: Joshua D. Preston joshua.preston@emory.edu (ORCID)
Authors:
Helen Abadiotakis (ORCID)
Ailin Tang (ORCID)
Clayton J. Rust (ORCID)
Michael E. Halkos (ORCID)
Mani A. Daneshmand (ORCID)
Joshua L. Chan (ORCID)
See Also
Useful links:
Report bugs at https://github.com/jdpreston30/TernTables/issues
Classify variables by normality and routing decision
Description
Applies the same normality assessment logic used internally by ternG()
and ternD() and returns a tidy tibble showing per-variable (and
per-group) statistics, the gate that triggered the routing decision, and the
final parametric / non-parametric routing outcome.
Usage
classify_normality(
  data,
  vars = NULL,
  exclude_vars = NULL,
  group_var = NULL,
  consider_normality = "ROBUST"
)
Arguments
data |
A data frame or tibble. |
vars |
Optional character vector of variable names to assess. If
|
exclude_vars |
Optional character vector of variable names to exclude. |
group_var |
Optional name of the grouping variable (as used in
|
consider_normality |
Normality assessment mode — must match what was
(or will be) passed to |
Details
Useful for:
- Answering reviewer questions about normality testing ("was Age normally distributed?").
- Verifying that a given variable's routing matches your expectation before running ternG() or ternD().
- Generating a supplemental normality audit table for a manuscript.
Value
A tibble with one row per variable × group (or one row per
variable when group_var = NULL), containing:
- variable
Variable name.
- group
Group level, or "[all]" when no group_var is supplied.
- n
Non-missing sample size in this group.
- skewness
Sample skewness (population moments).
- kurtosis
Excess kurtosis (population moments; 0 for a normal distribution).
- sw_p
Shapiro-Wilk p-value for this group. NA when the routing decision was made at Gates 1–3 under "ROBUST", when n is outside the valid range (3–5000), or when consider_normality = FALSE.
- gate
Integer 1–4 indicating which gate made the routing decision under consider_normality = "ROBUST", or NA for TRUE/FALSE modes.
- gate_reason
Plain-language explanation of the gate decision, naming which group(s) triggered the rule where relevant.
- is_normal
Logical; TRUE = routed to parametric (mean ± SD, t-test / ANOVA); FALSE = non-parametric (median [IQR], Wilcoxon / Kruskal-Wallis).
- routing
Human-readable routing summary: "Parametric (mean ± SD)" or "Non-parametric (median [IQR])".
Examples
data(tern_colon)
# Single-group audit (ternD-style)
classify_normality(tern_colon, exclude_vars = "ID")
# Grouped audit matching a ternG call
classify_normality(tern_colon, exclude_vars = "ID", group_var = "Recurrence")
# Specific variables only
classify_normality(tern_colon,
                   vars = c("Age", "Positive_Lymph_Nodes_n"),
                   group_var = "Recurrence")
# Using Shapiro-Wilk only (matches consider_normality = TRUE in ternG/ternD)
classify_normality(tern_colon, exclude_vars = "ID",
                   group_var = "Recurrence",
                   consider_normality = TRUE)
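The sw_p column is documented as NA when n falls outside the valid Shapiro-Wilk range (3–5000). A minimal sketch of that guard in base R follows; the helper name safe_sw_p is hypothetical, and returning NA for a constant vector is an added assumption (stats::shapiro.test() raises an error in that case).

```r
# Hypothetical helper; not part of the TernTables API.
# Returns the Shapiro-Wilk p-value, or NA when the test cannot be run.
safe_sw_p <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)

  # stats::shapiro.test() only accepts 3 <= n <= 5000
  if (n < 3 || n > 5000) return(NA_real_)

  # Assumption: treat a constant vector as untestable rather than erroring
  if (length(unique(x)) == 1) return(NA_real_)

  shapiro.test(x)$p.value
}

safe_sw_p(c(1, 2))             # too few values, so NA
safe_sw_p(qnorm(ppoints(20)))  # a p-value between 0 and 1
```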
Print method for ternP_result objects
Description
Re-displays the preprocessing summary for a ternP_result object.
Note that ternP already emits this summary automatically at
the time it is called, so this method is most useful for reviewing the
summary after the fact (e.g. typing result at the console later
in a session).
Usage
## S3 method for class 'ternP_result'
print(x, ...)
Arguments
x |
A |
... |
Currently unused; included for S3-method compatibility. |
Value
Invisibly returns x.
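The pattern described here, a print method that shows a summary and returns its argument invisibly, looks roughly like the following in base R. The class demo_result and its n_rows field are hypothetical stand-ins; the real ternP_result structure and printed summary are not reproduced.

```r
# Hypothetical class "demo_result", standing in for ternP_result.
new_demo_result <- function(n_rows) {
  structure(list(n_rows = n_rows), class = "demo_result")
}

# S3 print method: show a one-line summary, then return x invisibly
print.demo_result <- function(x, ...) {
  cat("Rows after cleaning:", x$n_rows, "\n")
  invisible(x)
}

res <- new_demo_result(929L)
out <- print(res)   # prints the summary; `out` is the same object as `res`
```

Returning x invisibly lets the object pass through a pipeline or assignment without being printed a second time.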
Combine multiple ternD/ternG tables into a single Word document
Description
Takes a list of tibbles previously created by ternD() or ternG()
and writes them all into one .docx file, one table per page, preserving
the exact formatting settings that were used when each table was built.
Usage
ternB(
  tables,
  output_docx,
  page_break = TRUE,
  methods_doc = FALSE,
  methods_filename = "TernTables_methods.docx",
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial")
)
Arguments
tables |
A list of tibbles created by |
output_docx |
Output file path ending in |
page_break |
Logical; if |
methods_doc |
Logical; if |
methods_filename |
Output file path for the methods document. Defaults
to |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family for all Word output. Any font name accepted by
the rendering system is valid. Can also be set via
|
Details
ternB() works by replaying the exact word_export() call that
ternD() / ternG() would have made – using stored metadata
attached as an attribute to each returned tibble – but directing all output
into a single combined document instead of separate files.
Table captions (table_caption) and footnotes (table_footnote) specified in the original
ternD() / ternG() call are reproduced automatically. You can
override them by modifying the "ternB_meta" attribute before calling
ternB(), though in practice it is easier to set captions and footnotes when you
first build each table.
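Overriding a stored caption amounts to reading, editing, and re-attaching an attribute. The sketch below uses a mock data frame rather than a real ternD() or ternG() result, and the field name table_caption inside the metadata list is an assumption; inspect str(attr(tbl, "ternB_meta")) on a real table to see the actual layout.

```r
# Mock table standing in for a ternD()/ternG() result (hypothetical content).
tbl <- data.frame(Variable = "Age (yr)", Total = "placeholder")

# Assumed metadata layout: a list with a `table_caption` element.
attr(tbl, "ternB_meta") <- list(table_caption = "Table 1. Draft caption.")

# Read the metadata, change the caption, and attach it again
meta <- attr(tbl, "ternB_meta")
meta$table_caption <- "Table 1. Overall patient characteristics."
attr(tbl, "ternB_meta") <- meta

attr(tbl, "ternB_meta")$table_caption
```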
Value
Invisibly returns the path to the written Word file.
Examples
data(tern_colon)
T1 <- ternD(tern_colon,
            exclude_vars = "ID",
            table_caption = "Table 1. Overall patient characteristics.",
            methods_doc = FALSE,
            open_doc = FALSE)
T2 <- ternG(tern_colon,
            group_var = "Recurrence",
            exclude_vars = "ID",
            table_caption = "Table 2. Characteristics by recurrence status.",
            methods_doc = FALSE,
            open_doc = FALSE)
ternB(list(T1, T2),
      output_docx = file.path(tempdir(), "combined_tables.docx"),
      open_doc = FALSE)
Generate descriptive summary table (optionally normality-aware)
Description
Creates a descriptive summary table with a single "Total" column format.
By default (consider_normality = "ROBUST"), continuous variables are shown
as mean +/- SD or median [IQR] based on a four-gate decision (n < 3 fail-safe, skewness/kurtosis, CLT, and Shapiro-Wilk).
This can be overridden via consider_normality and force_ordinal.
Usage
ternD(
  data,
  vars = NULL,
  exclude_vars = NULL,
  force_ordinal = NULL,
  force_normal = NULL,
  force_continuous = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  consider_normality = "ROBUST",
  print_normality = FALSE,
  round_intg = FALSE,
  round_decimal = NULL,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  plain_header = NULL,
  table_font_size = 9,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial"),
  show_missing = FALSE,
  zero_to_dash = FALSE,
  show_missingness = FALSE,
  missing_indicators = NULL
)
Arguments
data |
Tibble with variables. |
vars |
Character vector of variables to summarize. Defaults to all except |
exclude_vars |
Character vector to exclude from the summary. |
force_ordinal |
Character vector of variables to treat as ordinal (i.e., use median [IQR])
regardless of the |
force_normal |
Character vector of variable names to treat as normally distributed, bypassing all
normality assessment. Listed variables are summarized as mean |
force_continuous |
Character vector of variables to force treatment as continuous (mean |
output_xlsx |
Optional Excel filename to export the table. |
output_docx |
Optional Word filename to export the table. |
consider_normality |
Character or logical; controls routing of continuous variables to
mean |
print_normality |
Logical; if |
round_intg |
Logical; if |
round_decimal |
Integer or |
smart_rename |
Logical; if |
insert_subheads |
Logical; if |
factor_order |
Character; controls the ordering of factor levels in the output.
|
methods_doc |
Logical; if |
methods_filename |
Character; filename for the methods document.
Default is |
category_start |
Named character vector specifying where to insert category headers.
Names are the header label text to display; values are the anchor variable – either the
original column name (e.g. |
plain_header |
Named character vector, same interface as |
table_font_size |
Numeric; font size for Word document output tables. Default is 9. |
manual_italic_indent |
Character vector of display variable names (post-cleaning) that should be
formatted as italicized and indented in Word output – matching the appearance of factor sub-category
rows. Has no effect on the returned tibble; only applies when |
manual_underline |
Character vector of display variable names (post-cleaning) that should be
formatted as underlined in Word output – matching the appearance of multi-category variable headers.
Has no effect on the returned tibble; only applies when |
table_caption |
Optional character string for a table caption to display above the table in
the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is |
table_footnote |
Optional character string for a footnote to display below the table in the
Word document. Rendered as size 6 Arial italic with a double-bar border above and below.
Default is |
abbreviation_footnote |
Optional character string listing abbreviations. Always printed
first in the footnote block. Default |
variable_footnote |
Optional named character vector. Names are display variable names
(case-insensitive); values are the footnote definition text. Each variable gets the next
symbol appended to its name in the table, and the footnote block lists each definition
below the abbreviation line. To share one footnote between multiple variables, separate
their names with a pipe: |
index_style |
Character; |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output (table,
captions, footnotes, methods document). Any font installed on the system that
renders the document may be used. Popular options include |
show_missing |
Logical; if |
zero_to_dash |
Logical; if |
show_missingness |
Controls whether a |
missing_indicators |
Optional character vector of string values to treat as missing
in addition to (or instead of) the built-in ternP defaults. When |
Details
The function always returns a tibble with a single Total (N = n) column format, regardless of the
consider_normality setting. The behavior for numeric variables follows this priority:
1. Variables in force_ordinal: Always use median [IQR]
2. When consider_normality = "ROBUST": Four-gate decision (n < 3 fail-safe, skewness/kurtosis, CLT, Shapiro-Wilk)
3. When consider_normality = TRUE: Use Shapiro-Wilk test to choose format
4. When consider_normality = FALSE: Default to mean +/- SD
For categorical variables, the function shows frequencies and percentages. When
insert_subheads = TRUE, categorical variables with 3 or more levels are displayed with
hierarchical formatting (main variable as header, levels as indented sub-rows). Binary variables
(Y/N, YES/NO, or numeric 1/0 auto-detected as Y/N) always use a single-row format showing
only the positive/yes count, regardless of this setting. Two-level categorical variables whose
values are not Y/N, YES/NO, or 1/0 (e.g. Male/Female) also use the hierarchical sub-row format.
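For numeric variables, the priority rule described above (force_ordinal first, then the normality decision) can be sketched as a small formatter. The function name choose_summary_sketch is hypothetical, and the exact output strings (one decimal place, quartiles separated by a comma inside brackets) are assumptions; the package's own formatting may differ.

```r
# Hypothetical helper; not part of the TernTables API.
# `is_normal` is the outcome of whatever normality routing was applied.
choose_summary_sketch <- function(x, is_normal, force_ordinal = FALSE,
                                  digits = 1) {
  x <- x[!is.na(x)]
  fmt <- paste0("%.", digits, "f")

  if (is_normal && !force_ordinal) {
    # Parametric display: mean ± SD
    sprintf(paste(fmt, "\u00b1", fmt), mean(x), sd(x))
  } else {
    # Ordinal or non-normal display: median [Q1, Q3]
    q <- quantile(x, c(0.25, 0.75), names = FALSE)
    sprintf(paste0(fmt, " [", fmt, ", ", fmt, "]"), median(x), q[1], q[2])
  }
}

choose_summary_sketch(1:5, is_normal = TRUE)
choose_summary_sketch(1:5, is_normal = TRUE, force_ordinal = TRUE)
```

The second call shows force_ordinal taking precedence over the normality decision.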
Value
A tibble with one row per variable (multi-row for factors), containing:
- Variable
Variable names with appropriate indentation
- Total (N = n)
Summary statistics (mean +/- SD, median [IQR], or n (%) as appropriate)
- SW_p
Shapiro-Wilk P values (only if print_normality = TRUE)
Examples
data(tern_colon)
# Basic descriptive summary
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE)
# With normality-aware formatting and category section headers
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))
# Force specific variables to ordinal (median [IQR]) display
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      force_ordinal = c("Positive_Lymph_Nodes_n"))
# Export to Word (writes a file to tempdir)
ternD(tern_colon,
      exclude_vars = c("ID"),
      methods_doc = FALSE,
      open_doc = FALSE,
      output_docx = file.path(tempdir(), "descriptive.docx"),
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Surgical Findings" = "Colonic Obstruction",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)",
                         "Outcomes" = "Recurrence"))
Generate grouped summary table with appropriate statistical tests
Description
Creates a grouped summary table with optional statistical testing for group
comparisons. Supports numeric and categorical variables; numeric variables
can be treated as ordinal via force_ordinal. Includes options to
calculate P values and odds ratios. For descriptive
(ungrouped) tables, use ternD.
Usage
ternG(
  data,
  vars = NULL,
  exclude_vars = NULL,
  group_var,
  force_ordinal = NULL,
  force_normal = NULL,
  force_continuous = NULL,
  group_order = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  OR_col = FALSE,
  OR_method = "dynamic",
  consider_normality = "ROBUST",
  print_normality = FALSE,
  show_test = FALSE,
  p_digits = 3,
  round_intg = FALSE,
  round_decimal = NULL,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  table_font_size = 9,
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  plain_header = NULL,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  indent_info_column = FALSE,
  show_total = TRUE,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  post_hoc = FALSE,
  p_adjust = FALSE,
  p_adjust_display = "fdr_only",
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial"),
  show_missing = FALSE,
  show_p = TRUE,
  zero_to_dash = FALSE,
  percentage_compute = "column",
  categorical_posthoc = FALSE,
  show_missingness = FALSE,
  missing_indicators = NULL
)
Arguments
data |
Tibble containing all variables. |
vars |
Character vector of variables to summarize. Defaults to all except |
exclude_vars |
Character vector of variable(s) to exclude. |
group_var |
Character, the grouping variable (factor or character with >=2 levels). |
force_ordinal |
Character vector of variables to treat as ordinal (i.e., use medians/IQR and nonparametric tests). |
force_normal |
Character vector of variable names to treat as normally distributed, bypassing all
normality assessment (Gates 1–4 under |
force_continuous |
Character vector of variables to force treatment as continuous (mean |
group_order |
Optional character vector to specify a custom group level order. |
output_xlsx |
Optional filename to export the table as an Excel file. |
output_docx |
Optional filename to export the table as a Word document. |
OR_col |
Logical; if |
OR_method |
Character; controls how odds ratios are calculated when |
consider_normality |
Character or logical; controls how continuous variables are routed to
parametric vs. non-parametric tests.
|
print_normality |
Logical; if |
show_test |
Logical; if |
p_digits |
Integer; number of decimal places for P values (default 3). |
round_intg |
Logical; if |
round_decimal |
Integer or |
smart_rename |
Logical; if |
insert_subheads |
Logical; if |
factor_order |
Character; controls the ordering of factor levels in the output.
|
table_font_size |
Numeric; font size for Word document output tables. Default is 9. |
methods_doc |
Logical; if |
methods_filename |
Character; filename for the methods document. Default is |
category_start |
Named character vector specifying where to insert category headers.
Names are the header label text to display; values are the anchor variable – either the
original column name (e.g. |
plain_header |
Named character vector, same interface as |
manual_italic_indent |
Character vector of display variable names (post-cleaning) that should be
formatted as italicized and indented in Word output – matching the appearance of factor sub-category
rows. Has no effect on the returned tibble; only applies when |
manual_underline |
Character vector of display variable names (post-cleaning) that should be
formatted as underlined in Word output – matching the appearance of multi-category variable headers.
Has no effect on the returned tibble; only applies when |
indent_info_column |
Logical; if |
show_total |
Logical; if |
table_caption |
Optional character string for a table caption to display above the table in
the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is |
table_footnote |
Optional character string for a footnote to display below the table in the
Word document. Rendered as size 6 Arial italic with a double-bar border above and below.
Default is |
abbreviation_footnote |
Optional character string listing abbreviations. Always printed
first in the footnote block. Default |
variable_footnote |
Optional named character vector. Names are display variable names
(case-insensitive); values are the footnote definition text. Each variable gets the next
symbol appended to its name in the table, and the footnote block lists each definition
below the abbreviation line. To share one footnote between multiple variables, separate
their names with a pipe: |
index_style |
Character; |
line_break_header |
Logical; if |
post_hoc |
Logical; if |
p_adjust |
Logical; if |
p_adjust_display |
Character; controls how BH-corrected P values appear in the output
when |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output (table,
captions, footnotes, methods document). Any font installed on the system that
renders the document may be used. Popular options include |
show_missing |
Logical; if |
show_p |
Logical; if |
zero_to_dash |
Logical; if |
percentage_compute |
Character; controls the denominator used when computing percentages
for categorical variables. |
categorical_posthoc |
Logical; if |
show_missingness |
Controls whether a column of missing-value percentages is appended
to the table. Options: |
missing_indicators |
Optional character vector of string values to treat as missing
in addition to (or instead of) the built-in ternP defaults. When |
Details
Independence assumption: all statistical tests applied by this
function (Welch's t-test, Wilcoxon rank-sum, Welch ANOVA,
Kruskal-Wallis, chi-squared, and Fisher's exact) assume that observations
are independent — each row must represent a distinct, unrelated subject.
ternG is not appropriate for repeated-measures, longitudinal, or
clustered data (e.g. pre/post measurements, matched pairs, or patients
nested within sites).
Value
A tibble with one row per variable (multi-row for multi-level factors), showing summary statistics by group, P values, test type, and optionally odds ratios and total summary column.
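When p_adjust = TRUE, the reported P values are Benjamini-Hochberg (BH) corrected. The correction itself can be reproduced with stats::p.adjust(). In this sketch the variable names and raw P values are made-up illustrations, not results from tern_colon, and the column names of the small data frame are hypothetical.

```r
# Illustrative raw P values (invented for this sketch)
raw_p <- c(Age = 0.01, Sex = 0.04, Nodes = 0.03, Obstruction = 0.20)

# Benjamini-Hochberg false discovery rate correction
bh_p <- p.adjust(raw_p, method = "BH")

# Raw and corrected values side by side
data.frame(variable = names(raw_p),
           p_raw = unname(raw_p),
           p_bh = unname(bh_p))
```

The BH procedure scales each P value by (number of tests / rank) and then enforces monotonicity, so corrected values are never smaller than the raw ones.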
Examples
data(tern_colon)
# 2-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      methods_doc = FALSE)
# 2-group comparison with odds ratios
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      OR_col = TRUE, methods_doc = FALSE)
# 3-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Treatment_Arm",
      group_order = c("Observation", "Levamisole", "Levamisole + 5FU"),
      methods_doc = FALSE)
# 2-group comparison with BH FDR correction (fdr_only is the default display)
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      p_adjust = TRUE, methods_doc = FALSE)
# 2-group comparison with BH FDR correction (show raw + corrected side by side)
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      p_adjust = TRUE, p_adjust_display = "both", methods_doc = FALSE)
# Export to Word (writes a file to tempdir)
ternG(tern_colon,
      exclude_vars = c("ID"),
      group_var = "Recurrence",
      OR_col = TRUE,
      methods_doc = FALSE,
      open_doc = FALSE,
      output_docx = file.path(tempdir(), "comparison.docx"),
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))
Preprocess a raw data frame for use with ternG or ternD
Description
ternP() cleans a raw data frame loaded from a CSV or XLSX file,
applying a standardized set of transformations and performing validation
checks before the data is passed to ternG or
ternD.
Usage
ternP(data, mode = "auto", extra_na = NULL, drop_cols = NULL)
Arguments
data |
A data frame or tibble as loaded from a CSV or XLSX file (e.g.
via |
mode |
Preprocessing mode. One of
|
extra_na |
Optional character vector of additional string values to
treat as missing (converted to |
drop_cols |
Optional character vector of column names to drop from
the data before cleaning begins. Intended for use in |
Value
A named list with three elements:
- clean_data
A tibble containing the fully cleaned dataset, ready to pass to ternG() or ternD().
- sparse_rows
A tibble of rows from clean_data where more than 50% of values are NA. These rows are retained in clean_data but extracted here for optional review or download. An empty tibble if no sparse rows exist.
- feedback
A named list of feedback items. Each element is NULL if the corresponding transformation was not triggered, or a value describing what changed:
  - string_na_converted
  A named list with elements total (integer count of values converted) and cols (character vector of affected column names), or NULL if no string NA values were found.
  - blank_rows_removed
  A named list with elements count (integer) and row_indices (integer vector of original row positions removed), or NULL if none.
  - sparse_rows_flagged
  A named list with elements count (integer) and row_indices (integer vector of row positions in clean_data with >50% missingness), or NULL if none.
  - case_normalized_vars
  A named list with elements cols (character vector of affected column names) and detail (a named list per column, each with changed_from and changed_to character vectors showing the exact value changes), or NULL if none.
  - dropped_user_cols
  Character vector of column names explicitly dropped via the drop_cols parameter, or NULL if drop_cols was not used.
  - manual_mode
  Logical. TRUE when mode = "manual" was used (PHI check skipped), FALSE otherwise.
  - dropped_empty_cols
  Character vector of column names (or "" for unnamed columns) that were dropped because they were 100% empty, or NULL if none.
  - date_cols_detected
  Character vector of column names that appear to contain date values: either R Date/POSIXct types (from Excel) or character columns where ≥80% of non-NA values match a common date pattern (from CSV). These columns are not dropped automatically; the caller should decide whether to exclude them or keep them as categorical variables.
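The sparse-row rule (more than 50% of values NA) is easy to reproduce in base R. This is a sketch on a made-up data frame, not the package's code; the variable names are hypothetical.

```r
# Small illustrative data frame (invented values)
df <- data.frame(
  a = c(1, NA, 3),
  b = c("x", NA, NA),
  c = c(TRUE, NA, FALSE),
  d = c(NA, 2, 1)
)

# Fraction of missing cells in each row
na_frac <- rowMeans(is.na(df))

# Rows with more than 50% missing values are flagged but not removed
sparse_idx <- which(na_frac > 0.5)
sparse_rows <- df[sparse_idx, , drop = FALSE]

sparse_idx
```

Here only the second row (3 of 4 cells missing) exceeds the 50% threshold; df itself is left unchanged.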
Cleaning pipeline (in order)
1. Date columns are detected (R Date/POSIXct types, or character columns where ≥80% of values match a common date pattern) and reported in feedback$date_cols_detected. They are not dropped automatically; the caller decides whether to exclude or keep them.
2. String NA values ("NA", "na", "N/A", "NaN", "missing", "unknown", "unk", "not available", "not applicable", "none", "null", "nil", "-", ".", "?") are converted to NA (matching is case-insensitive).
3. Leading and trailing whitespace is trimmed from all character columns.
4. Columns that are 100% empty (all NA) are silently dropped.
5. Rows where every cell is NA are removed.
6. Character columns where values differ only by capitalization (e.g. "Male" vs "MAle") are standardized to title case.
Validation hard stops
ternP() stops with a descriptive error if:
- Any column name matches a protected health information (PHI) pattern (e.g. MRN, DOB, FirstName). De-identified research identifiers such as patient_id, subject_id, and participant_id are explicitly excluded, as are clinical-event dates (admission date, discharge date, visit date, etc.). Only personal-identity dates such as DOB and DOD are flagged.
- Any column with a blank or whitespace-only header contains data. Completely empty unnamed columns are silently dropped and do not trigger this error.
See Also
ternG for grouped comparisons, ternD for descriptive statistics.
Examples
# Load a messy CSV and preprocess it
path <- system.file("extdata/csv", "tern_colon_messy.csv",
                    package = "TernTables")
raw <- read.csv(path, stringsAsFactors = FALSE)
result <- ternP(raw)
# Access cleaned data
result$clean_data
# Review preprocessing feedback
result$feedback
# Sparse rows flagged (>50% missing), retained but not removed
result$sparse_rows
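Part of the cleaning pipeline documented for ternP() (string-NA conversion, whitespace trimming, removal of empty columns and blank rows) can be approximated in base R. The function clean_sketch below is a hypothetical illustration, not the package's implementation; it omits date detection, case normalization, PHI checks, and the feedback list. The NA strings are taken from the list given in the pipeline description.

```r
# Hypothetical helper; not part of the TernTables API.
clean_sketch <- function(df) {
  na_strings <- c("na", "n/a", "nan", "missing", "unknown", "unk",
                  "not available", "not applicable", "none", "null",
                  "nil", "-", ".", "?")
  chr <- vapply(df, is.character, logical(1))

  df[chr] <- lapply(df[chr], function(col) {
    # Case-insensitive string-NA conversion
    col[tolower(col) %in% na_strings] <- NA
    # Trim leading and trailing whitespace
    trimws(col)
  })

  # Drop columns that are entirely NA
  df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
  # Drop rows where every cell is NA
  df[rowSums(!is.na(df)) > 0, , drop = FALSE]
}

raw <- data.frame(sex = c(" Male", "N/A", NA),
                  age = c(61, 70, NA),
                  empty = c(NA, NA, NA),
                  stringsAsFactors = FALSE)
cleaned <- clean_sketch(raw)
cleaned
```

The result keeps two rows and two columns: the all-NA column and the all-NA row are gone, and "N/A" has become a true NA.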
Export a custom tibble to Word with TernTables formatting
Description
ternStyle() renders any user-built tibble into a Word document with
the exact same visual style as tables produced by ternG(),
ternD(), and word_export() – Arial font, grey header,
double-bar footer, caption/footnote block, and citation footer.
Usage
ternStyle(
  tbl,
  filename = NULL,
  col1_name = NULL,
  subheader_rows = NULL,
  bold_rows = NULL,
  bold_sig = NULL,
  italic_rows = NULL,
  bold_cols = NULL,
  italic_cols = NULL,
  header_format_follow = FALSE,
  round_intg = FALSE,
  round_decimal = NULL,
  font_size = 9,
  category_start = NULL,
  plain_header = NULL,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  col1_header = NULL,
  line_break_header = FALSE,
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial")
)
Arguments
tbl |
A data frame or tibble. The first column is used as the row-label
column (rendered as "Variable" unless renamed via |
filename |
Output file path ending in |
col1_name |
Optional character string. If supplied, the first column is
renamed to this label in the rendered table. The column need not be named
|
subheader_rows |
Character vector of labels that already exist as rows
in |
bold_rows |
Integer vector of body row indices (1-based, final rendered
table) to bold across every column. Applied after all structural formatting
so it always wins. Default |
bold_sig |
Optional named list for cell-level p-value-based bolding.
Use this when your tibble has pre-formatted p-value strings in columns that
are not named
The Variable column is never modified by
bold_sig = list(
  p_cols = c("Uni p", "Multi p"),
  hr_cols = c("Uni HR (95% CI)", "Multi HR (95% CI)"),
  threshold = 0.05
)
Default |
italic_rows |
Integer vector of body row indices to italicize across
every column. Default |
bold_cols |
Integer vector of column indices (1-based) to bold across
all body rows. Default |
italic_cols |
Integer vector of column indices to italicize across all
body rows. Default |
header_format_follow |
Logical; if |
round_intg |
Logical; passed to |
round_decimal |
Integer or NULL; if provided, rounds all numeric values in the
table to this many decimal places before rendering. Passed to |
font_size |
Numeric; font size for table body. Default |
category_start |
Named character vector; same as in |
plain_header |
Named character vector; same as in |
manual_italic_indent |
Character vector of row labels to italicize and
indent (sub-item appearance). Default |
manual_underline |
Character vector of row labels to underline (multi-
category header appearance without the full subheader treatment). Default
|
table_caption |
Optional character string for the caption above the
table. Default |
table_footnote |
Optional character string for a footnote below the
table. Default |
abbreviation_footnote |
Optional character string (or character vector)
of abbreviations. Always printed first in the footnote block. Default
|
variable_footnote |
Optional named character vector of per-variable
footnote definitions (case-insensitive name match). To share one footnote
symbol between multiple variables, separate their names with a pipe:
|
index_style |
Character; |
col1_header |
Optional character string. Overrides the top-left header
cell. When |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output.
Defaults to |
Details
Use this function when you have pre-computed summary statistics in a tibble
(e.g. a custom cross-tab or manually assembled output table) and want it to
match the rest of your TernTables document without running it through the full
ternG/ternD pipeline.
Value
Invisibly returns the input tibble (after renaming and coercion)
with a "ternB_meta" attribute attached. This makes the result
directly passable to ternB for bundling with other tables
into a combined Word document.
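The variable_footnote argument above matches row labels case-insensitively and lets several variables share one symbol through a pipe-separated name. The following is a minimal base-R sketch of that matching idea only, not the package's implementation. The symbol order (*, dagger, double-dagger) follows the sequence described for word_export, and the labels, the footnote text, and the variable names `row_labels`, `symbols`, and `labelled` are illustrative assumptions.

```r
# Illustrative row labels, as they might appear in column 1 of a table.
row_labels <- c("Age (yr)", "Sex", "Tumor Adherence", "Bowel Perforation")

# Names are matched case-insensitively; a pipe shares one symbol across names.
variable_footnote <- c(
  "age (yr)" = "Age at study entry.",
  "Tumor Adherence|Bowel Perforation" = "Assessed at surgery."
)

# Assumed symbol sequence: *, dagger, double-dagger.
symbols <- c("*", "\u2020", "\u2021")

labelled <- row_labels
for (i in seq_along(variable_footnote)) {
  # Split a pipe-separated name into its individual target labels.
  targets <- strsplit(names(variable_footnote)[i], "|", fixed = TRUE)[[1]]
  hit <- tolower(row_labels) %in% tolower(trimws(targets))
  # Append the i-th symbol to every matching label.
  labelled[hit] <- paste0(labelled[hit], symbols[i])
}

# One footnote line per entry: symbol followed by its definition.
footnote_lines <- paste(symbols[seq_along(variable_footnote)], variable_footnote)

labelled
footnote_lines
```

In this sketch the two pipe-joined variables receive the same dagger symbol, while "Sex" is left untouched because no footnote names it.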
Examples
library(tibble)
my_tbl <- tibble(
  Variable = c("Section A", "Row 1", "Row 2", "Section B", "Row 3"),
  `Group 1` = c("", "12 (40%)", "18 (60%)", "", "9 (30%)"),
  `Group 2` = c("", "15 (50%)", "15 (50%)", "", "21 (70%)")
)
ternStyle(
  tbl = my_tbl,
  filename = file.path(tempdir(), "custom_table.docx"),
  subheader_rows = c("Section A", "Section B"),
  open_doc = FALSE,
  citation = FALSE
)
Colon Cancer Recurrence Data (Example Dataset)
Description
A processed subset of the colon dataset (from the survival package), restricted to the
recurrence endpoint (etype == 1), providing one row per patient.
Variables have been relabelled with clinically descriptive names and
factor levels suitable for direct use in TernTables functions. This dataset
is provided as a ready-to-use example for demonstrating ternD() and
ternG() functionality.
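As a rough sketch of the restriction described above, assuming the source data is the colon dataset shipped with the survival package: filtering to etype == 1 leaves one row per patient. The Recurrence relabelling shown here is an assumed mapping for illustration only. The package's own pre-processing script may differ.

```r
library(survival)  # provides the colon dataset

# Keep only the recurrence endpoint: one row per patient.
recurrence <- subset(survival::colon, etype == 1)
nrow(recurrence)

# Illustrative relabelling (assumed mapping, not taken from the package script):
# status 0/1 on the recurrence endpoint becomes a descriptive factor.
recurrence$Recurrence <- factor(
  recurrence$status,
  levels = c(0, 1),
  labels = c("No Recurrence", "Recurrence")
)
table(recurrence$Recurrence)
```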
Usage
tern_colon
Format
A tibble with 929 rows and 12 variables:
- ID
Integer patient identifier.
- Age_Years
Age at study entry (years).
- Sex
Patient sex: "Female" or "Male".
- Colonic_Obstruction
Colonic obstruction present: "N" or "Y".
- Bowel_Perforation
Bowel perforation present: "N" or "Y".
- Positive_Lymph_Nodes_n
Number of positive lymph nodes detected.
- Over_4_Positive_Nodes
More than 4 positive lymph nodes: "N" or "Y".
- Tumor_Adherence
Tumour adherence to surrounding organs: "N" or "Y".
- Tumor_Differentiation
Tumour differentiation grade: "Well", "Moderate", or "Poor".
- Extent_of_Local_Spread
Depth of tumour penetration: "Submucosa", "Muscle", "Serosa", or "Contiguous Structures".
- Recurrence
Recurrence status: "No Recurrence" or "Recurrence".
- Treatment_Arm
Randomised treatment: "Levamisole + 5FU", "Levamisole", or "Observation".
Source
Derived from colon (Laurie et al., 1989).
See colon for full provenance.
Pre-processing script: data-raw/tern_colon.R.
Examples
data(tern_colon)
head(tern_colon)
Format a mean +/- SD string
Description
Format a mean +/- SD string
Usage
val_format(mean, sd)
Arguments
mean |
Numeric mean value. Formatted to 1 decimal place. |
sd |
Numeric standard deviation. Formatted to 1 decimal place. |
Value
A character string of the form "X.X ± Y.Y" where both values are
rendered to 1 decimal place using fixed-point notation.
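The output format described above can be approximated in base R with sprintf(). This is a hypothetical stand-in written for illustration (the name `sketch_mean_sd` is an assumption), not the source of val_format() itself.

```r
# Hypothetical stand-in for val_format(): both values in fixed-point
# notation with 1 decimal place, joined by a plus-minus sign.
sketch_mean_sd <- function(mean, sd) {
  sprintf("%.1f \u00b1 %.1f", mean, sd)
}

example_string <- sketch_mean_sd(59.84, 11.9)
example_string
```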
Format a P value for reporting
Description
Format a P value for reporting
Usage
val_p_format(p, digits = 3)
Arguments
p |
Numeric P value in the range [0, 1]. |
digits |
Integer; number of decimal places for reported P values. Default is 3.
Note: for p < 0.001, the value is reported in scientific notation with 1 significant figure
regardless of digits. |
Value
A character string. Values < 0.001 are formatted in scientific notation with 1 significant
figure (e.g., "8E-4"). All other values use fixed-point notation rounded to digits
decimal places.
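The two formatting branches described above can be sketched in base R as follows. This is an illustrative approximation under the stated rules, not the package's code. The function name `sketch_p_format` is an assumption, and it handles a single value at a time.

```r
# Hypothetical sketch of the documented rules for a single P value:
#   p < 0.001 -> scientific notation, 1 significant figure (e.g. "8E-4")
#   otherwise -> fixed-point notation rounded to `digits` decimal places
sketch_p_format <- function(p, digits = 3) {
  if (p < 0.001) {
    sci <- formatC(p, format = "E", digits = 0)   # e.g. "8E-04"
    # Drop the zero padding in the exponent: "8E-04" becomes "8E-4".
    sub("E([+-])0([0-9])", "E\\1\\2", sci)
  } else {
    formatC(p, format = "f", digits = digits)
  }
}

small_p   <- sketch_p_format(0.0008)
regular_p <- sketch_p_format(0.0432)
two_dp    <- sketch_p_format(0.5, digits = 2)
c(small_p, regular_p, two_dp)
```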
Export TernTables output to a formatted Word document
Description
Export TernTables output to a formatted Word document
Usage
word_export(
  tbl,
  filename,
  round_intg = FALSE,
  round_decimal = NULL,
  font_size = 9,
  category_start = NULL,
  plain_header = NULL,
  subheader_rows = NULL,
  bold_rows = NULL,
  bold_sig = NULL,
  italic_rows = NULL,
  bold_cols = NULL,
  italic_cols = NULL,
  header_format_follow = FALSE,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  posthoc_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  page_break_after = FALSE,
  col1_header = NULL,
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial")
)
Arguments
tbl |
A tibble created by ternG or ternD |
filename |
Output file path ending in .docx |
round_intg |
Logical; if TRUE, adds a note about integer rounding. Default is FALSE. |
round_decimal |
Integer or NULL; if provided, rounds all numeric values in the table
to this many decimal places before rendering. Default is NULL. |
font_size |
Numeric; font size for table body. Default is 9. |
category_start |
Named character vector specifying category headers. Names are header label text; values are anchor variable names – either the original column name or the cleaned display name (both forms accepted). |
plain_header |
Named character vector, same interface as |
subheader_rows |
Character vector of labels that already exist as rows in the table and
should be formatted as full category section headers (merged across all columns, bold, with a
bottom border line). Unlike |
bold_rows |
Integer vector of body row indices (1-based, in the final rendered table) to
bold across every column. Applied as the last formatting pass so it overrides structural
formatting. Default NULL. |
bold_sig |
Optional named list for cell-level conditional bolding based on parsed p-values.
Intended for use with
For each p-value cell where the parsed numeric value is below |
italic_rows |
Integer vector of body row indices to italicize across every column.
Default NULL. |
bold_cols |
Integer vector of column indices (1-based) to bold across all body rows.
Default NULL. |
italic_cols |
Integer vector of column indices to italicize across all body rows.
Default NULL. |
header_format_follow |
Logical; if |
manual_italic_indent |
Character vector of display variable names (post-cleaning) to force into italicized and indented formatting, matching the appearance of factor sub-category rows (e.g., levels of a multi-category variable). Use this for rows that should visually appear as sub-items but are not automatically detected as such. |
manual_underline |
Character vector of display variable names (post-cleaning) to force into underlined formatting, matching the appearance of multi-category variable header rows. Use this for rows that should visually appear as section headers but are not automatically detected as such. |
table_caption |
Optional character string to display as a caption above the table in the Word
document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is NULL. |
table_footnote |
Optional character string to display as a footnote below the table in the Word
document. Rendered as size 6 Arial italic. A double-bar border is applied above and below the
footnote row. Default is NULL. |
abbreviation_footnote |
Optional character string (or character vector, which will be
collapsed with spaces) listing abbreviations to display at the top of the footnote block.
Always printed first, before any variable-specific footnote lines. Default NULL. |
posthoc_footnote |
Optional character string describing post-hoc CLD superscript
conventions. When supplied by |
variable_footnote |
Optional named character vector. Names are display variable names as
they appear in the table (case-insensitive match); values are the footnote definition text
for that variable. Each entry is assigned the next symbol in the sequence (*, dagger,
double-dagger, ...) and the symbol is appended to the variable name in column 1.
The footnote block lists each as |
index_style |
Character; controls the footnote symbol sequence. |
page_break_after |
Logical; if |
col1_header |
Optional character string. Overrides the top-left header cell text.
When |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family used for the entire Word table and its caption,
footnote, and citation. Any font name accepted by the rendering system is valid (Word
will fall back to its default if the font is not installed). Can also be set package-wide
via |
Value
Invisibly returns the path to the written Word file.
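The bold_sig argument above acts on p-values parsed from pre-formatted strings and compared against a threshold. The following base-R sketch shows one plausible way such parsing and flagging could work. The helper `parse_p` and the sample strings are hypothetical, and the package's actual parsing rules may differ.

```r
# Hypothetical helper (not part of TernTables): turn a formatted p-value
# string such as "0.032", "<0.001", or "8E-4" into a number.
parse_p <- function(x) {
  cleaned <- gsub("[<>=[:space:]]", "", x)   # drop comparison signs and spaces
  suppressWarnings(as.numeric(cleaned))       # non-numeric cells become NA
}

p_strings <- c("0.032", "<0.001", "8E-4", "0.210", "")
p_values  <- parse_p(p_strings)

# Cells that a threshold of 0.05 would flag for bolding; empty cells are skipped.
threshold <- 0.05
flagged <- !is.na(p_values) & p_values < threshold
flagged
```

Treating unparseable cells as NA keeps empty or non-numeric cells (for example section header rows) from being flagged by accident.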
Examples
data(tern_colon)
tbl <- ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE, open_doc = FALSE)
word_export(
  tbl = tbl,
  filename = file.path(tempdir(), "descriptive.docx"),
  open_doc = FALSE,
  category_start = c(
    "Patient Demographics" = "Age (yr)",
    "Tumor Characteristics" = "Positive Lymph Nodes (n)"
  )
)
Write a cleaning summary document for ternP output
Description
Generates a Word document summarising the preprocessing transformations
applied by ternP. Only sections for triggered transformations
are written; if the data required no preprocessing, a single sentence
stating that is produced instead. The document can be attached to a
data-management log or supplemental materials.
Usage
write_cleaning_doc(
  result,
  filename = "cleaning_summary.docx",
  font_family = getOption("TernTables.font_family", "Arial"),
  open_doc = TRUE,
  citation = TRUE
)
Arguments
result |
A |
filename |
Output file path ending in .docx |
font_family |
Character; font family for the Word document. Default |
open_doc |
Logical; if |
citation |
Logical; if |
Value
Invisibly returns the path to the written Word file.
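The font_family default in the usage signature is read with getOption("TernTables.font_family", "Arial"), so it falls back to "Arial" when the option is unset. The base-R sketch below only demonstrates that lookup. "Times New Roman" is an arbitrary example value, and the variable names are illustrative.

```r
# Temporarily clear the option so the fallback can be observed.
old <- options(TernTables.font_family = NULL)

# With the option unset, the lookup falls back to "Arial".
font_before <- getOption("TernTables.font_family", "Arial")

# Setting the option changes what the same lookup returns.
options(TernTables.font_family = "Times New Roman")
font_after <- getOption("TernTables.font_family", "Arial")

# Restore whatever value was in place before this sketch ran.
options(old)

c(font_before, font_after)
```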
See Also
Examples
path <- system.file("extdata/csv", "tern_colon_messy.csv",
                    package = "TernTables")
raw <- read.csv(path, stringsAsFactors = FALSE)
result <- ternP(raw)
write_cleaning_doc(result, filename = file.path(tempdir(), "cleaning_summary.docx"),
                   open_doc = FALSE)
Write a methods section Word document for TernTables output
Description
Generates a Word document containing a methods paragraph describing the
statistical approach used in a specific ternG or ternD run.
The paragraph is fully dynamic: it reflects the tests that were actually used,
the number of comparison groups, whether odds ratios were calculated, and
whether post-hoc testing was performed. It is headed by a bold
Statistical Methods label and followed by a brief attribution footer.
Usage
write_methods_doc(
  tbl,
  filename,
  n_levels = 2,
  OR_col = FALSE,
  OR_method = "dynamic",
  source = "ternG",
  post_hoc = FALSE,
  categorical_posthoc = FALSE,
  cat_posthoc_fisher_vars = character(0),
  show_missingness = FALSE,
  missing_indicators = NULL,
  boilerplate = FALSE,
  p_adjust = FALSE,
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial")
)
Arguments
tbl |
A tibble created by |
filename |
Output file path ending in .docx |
n_levels |
Number of group levels used in |
OR_col |
Logical; whether odds ratios were calculated. Default FALSE. |
OR_method |
Character; the OR calculation method used in |
source |
Character; |
post_hoc |
Logical; whether pairwise post-hoc testing was requested
( |
categorical_posthoc |
Logical; whether adjusted standardized residuals
were requested ( |
cat_posthoc_fisher_vars |
Character vector of variable names for which
Fisher's exact test was the omnibus test while |
show_missingness |
Logical or character; whether missingness columns were added
to the table ( |
missing_indicators |
Character vector of string values treated as missing in
addition to R |
boilerplate |
Logical; if |
p_adjust |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family for the Word document. Default |
Details
When boilerplate = TRUE, all run-specific arguments are ignored and a
comprehensive reference document is written instead, covering all five standard
TernTables configurations with package-default phrasing. See the
boilerplate parameter for details.
Value
Invisibly returns the methods paragraph text as a character string
(or, when boilerplate = TRUE, invisibly returns the output file path).
Useful for programmatic inspection or testing without opening the Word file.
Examples
data(tern_colon)
tbl <- ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
             methods_doc = FALSE, open_doc = FALSE)
write_methods_doc(tbl, filename = file.path(tempdir(), "methods.docx"),
                  open_doc = FALSE)
# Write a comprehensive reference document covering all configurations.
write_methods_doc(tbl = NULL,
                  filename = file.path(tempdir(), "boilerplate_methods.docx"),
                  boilerplate = TRUE, open_doc = FALSE)