% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cb_get_col_attributes.R
\name{cb_get_col_attributes}
\alias{cb_get_col_attributes}
\title{Get Column Attributes}
\usage{
cb_get_col_attributes(df, .x, keep_blank_attributes = keep_blank_attributes)
}
\arguments{
\item{df}{Data frame of interest}

\item{.x}{Column of interest in \code{df}}

\item{keep_blank_attributes}{By default, the column attributes table in the
codebook document omits the Column description, Source information,
Column type, and value labels rows for any column on which those
attributes have not been set; in other words, it does not show blank rows
for them. Passing \code{TRUE} to \code{keep_blank_attributes} reverses this
behavior: the column attributes table includes Column description, Source
information, Column type, and value labels rows for every column in the
data frame, even when those attributes have no value.}
}
\value{
A tibble of column attributes
}
\description{
Used internally by \code{codebook()} to create the top half of the column
attributes table.
}
\details{
Typically, though not necessarily, the first step in creating your
codebook will be to add column attributes to your data. The
\code{cb_add_col_attributes()} function is a convenience function that allows you
to add arbitrary attributes to columns (e.g., description, source, column type).
These attributes can later be accessed to fill in the column attributes table
of the codebook document. Column attributes \emph{can} serve a similar function
to variable labels in SAS or Stata; however, you can assign many different
attributes to a column and they can contain any kind of information you want.
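
The snippet below is a minimal sketch of what attaching these attributes
amounts to. It assumes that \code{cb_get_col_attributes()} reads ordinary R
attributes stored on the column vector itself, and it uses base R's
\code{attr()} instead of \code{cb_add_col_attributes()}. The \code{study}
data frame and its columns are hypothetical; the four attribute names are
the special ones described below.

\preformatted{
# Hypothetical example data with an id column and a 0/1 dummy column
study <- data.frame(
  id     = c(101, 102, 103),
  smoker = c(0, 1, 0)
)

# Attach the four attributes that codebook() recognizes to the smoker column.
# Assumption: they are stored as plain attributes on the column vector.
attr(study$smoker, "description")  <- "Does the participant currently smoke?"
attr(study$smoker, "source")       <- "Baseline survey"
attr(study$smoker, "col_type")     <- "Categorical"
attr(study$smoker, "value_labels") <- c("No" = 0, "Yes" = 1)

# List everything now attached to the column
attributes(study$smoker)
}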

Although the \code{cb_add_col_attributes()} function will allow you to add any
attributes you want, there are currently \strong{only four} special attributes
that the \code{codebook()} function (via \code{cb_get_col_attributes()}) will recognize
and add to the column attributes table of the codebook document. They are:
\itemize{
\item \strong{description}: Although you may add any text you desire to the \code{description}
attribute, it is intended to be used to describe the question/process that
generated the data contained in the column. Many statistical software packages
refer to this as a variable label.
\item \strong{source}: Although you may add any text you desire to the \code{source}
attribute, it is intended to be used to describe where the data contained in
the column originally came from. For example, if the current data frame was
created by merging multiple data sets together, you may want to use the
source attribute to identify the data set it originates from. As another
example, if the current data frame contains longitudinal data, you may want
to use the source attribute to identify the wave(s) in which data for this
column was collected.
\item \strong{col_type}: The \code{col_type} attribute is intended to provide additional
information above and beyond the \verb{Data type} (i.e., column class) about
the values in the column. For example, you may have a column of 0's and 1's,
which will have a \emph{numeric} data type. However, you may want to inform data
users that this is really a dummy variable where the 0's and 1's represent
discrete categories (No and Yes). Another way to think about it is that the
\verb{Data type} attribute is how \emph{R} understands the column and the
\verb{Column type} attribute is how \emph{humans} should understand the column.
Currently accepted values are: \code{Numeric}, \code{Categorical}, or \code{Time}.
\itemize{
\item Perhaps even more importantly, setting the \code{col_type} attribute helps
\code{codebook()} determine which descriptive statistics to calculate for the
bottom half of the column attributes table. Inside \code{codebook()}, the
\code{cb_add_summary_stats()} function will attempt to figure out whether the
column is \strong{numeric}, \strong{categorical with many categories} (e.g.,
participant ID), \strong{categorical with few categories} (e.g., sex), or
\strong{time} (including dates). This matters because the table of summary
statistics shown in the codebook document depends on the category
\code{cb_add_summary_stats()} chooses. However, the user can tell
\code{cb_add_summary_stats()} directly which summary statistics to calculate
by adding a \code{col_type} attribute to the column with one of the
following values: \code{Numeric}, \code{Categorical}, or \code{Time}.
}
\item \strong{value_labels}: Although you may pass any named vector you desire to the \code{value_labels}
attribute, it is intended to inform your data users about how to correctly
interpret numerically coded categorical variables. For example, you may have
a column of 0's and 1's that represent discrete categories (i.e., "No" and
"Yes") instead of numerical quantities. In many other software packages
(e.g., SAS, Stata, and SPSS), you can layer "No" and "Yes" labels on top of
the 0's and 1's to improve the readability of your analysis output. These
are commonly referred to as \emph{value labels}. The R programming language does
not really have value labels in the same way that other popular statistical
software applications do. R users can (and typically should) coerce
numerically coded categorical variables into factors;
however, coercing a numeric vector to a factor is not the same as adding value labels to a
numeric vector because the underlying numeric values can change in the
process of creating the factor. For this and other reasons, many R
programmers choose to create a \emph{new} factor version of a numerically encoded
variable as opposed to overwriting/transforming the numerically encoded
variable. In those cases, you may want to inform your data users about how
to correctly interpret numerically coded categorical variables. Adding value
labels to your codebook is one way of doing so.
}
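
To illustrate why coercing to a factor is not the same as adding value
labels, consider the hypothetical 0/1 vector \code{smoker} below. This is a
base R sketch; the \code{value_labels} attribute is attached by hand with
\code{attr()}, under the assumption that it is stored as a plain attribute
on the column.

\preformatted{
# Hypothetical 0/1 dummy variable
smoker <- c(0, 1, 0, 1)

# Coercing to a factor gives readable labels...
smoker_f <- factor(smoker, levels = c(0, 1), labels = c("No", "Yes"))
levels(smoker_f)      # "No" "Yes"

# ...but the underlying integer codes are now 1 and 2, not 0 and 1
as.integer(smoker_f)  # 1 2 1 2

# Value labels leave the original numeric values unchanged
attr(smoker, "value_labels") <- c("No" = 0, "Yes" = 1)
as.numeric(smoker)    # 0 1 0 1

# The labels can still be looked up for display when needed
smoker_labels <- attr(smoker, "value_labels")
smoker_text <- names(smoker_labels)[match(smoker, smoker_labels)]
smoker_text           # "No" "Yes" "No" "Yes"
}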
}
\keyword{internal}
