Adds variable labels and value labels to a data frame based on a metadata
dictionary. This is particularly useful for preparing datasets for use with
packages like haven or for exporting to formats like SPSS or Stata.
Arguments
- data
A data frame containing the raw dataset.
- metadata
A data frame that serves as a metadata dictionary. It must contain at least the columns variable_name, label, and type. Optionally, it may include a valueset column for categorical variables, which should be a list column of data frames containing value and label columns.
- ...
Additional arguments (currently unused).
- set_data_types
Logical; if TRUE, attempts to coerce column data types to match those implied by the metadata. (Note: currently not fully implemented.)
Value
A `tibble` with the same data as data, but with added attributes:
- Variable labels (via the label attribute)
- Value labels (as a haven::labelled class, if applicable)
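The two kinds of attributes can be read back with base R and haven. The sketch below builds a labelled vector by hand rather than calling add_metadata(), so it only illustrates what such a column looks like; it assumes the haven package is installed.

```r
# Build a labelled vector directly (stand-in for a column returned by the function).
x <- haven::labelled(
  c(1, 2, 1),
  labels = c(Male = 1, Female = 2),
  label = "Gender"
)

# Variable label: stored in the "label" attribute.
var_label <- attr(x, "label")

# Value labels: a named vector stored in the "labels" attribute.
value_labels <- attr(x, "labels")

# Convert the codes to a factor that uses the value labels as levels.
x_factor <- haven::as_factor(x)
```

Converting with haven::as_factor() is often convenient before plotting or modelling, since most modelling functions do not use value labels directly.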
Details
The function first checks the structure of the metadata using an internal helper.
Then, for each variable listed in metadata, it:
- Adds a label using the label attribute
- Converts values to labelled vectors using haven::labelled() if a valueset is provided
If value labels are present, the function tries to align data types between the data and the valueset (e.g., converting character codes to integers if necessary).
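The per-variable step described above can be sketched roughly as follows. This is not the package's actual implementation: label_one_variable() is a hypothetical helper written for illustration, and the type alignment shown (character codes converted to numeric) is only one assumed way of handling a mismatch.

```r
# Hypothetical helper, not part of the package: labels a single column.
label_one_variable <- function(x, var_label, valueset = NULL) {
  if (is.null(valueset)) {
    # No valueset: attach only the variable label.
    attr(x, "label") <- var_label
    return(x)
  }
  codes <- valueset$value
  # Assumed alignment rule: if the data are numeric but the codes are stored
  # as text, convert the codes (they must parse cleanly as numbers).
  if (is.numeric(x) && is.character(codes)) {
    codes <- as.numeric(codes)
  }
  # haven::labelled() expects a named vector of labels: names are the
  # label text, values are the codes.
  haven::labelled(
    x,
    labels = stats::setNames(codes, valueset$label),
    label = var_label
  )
}

# Codes stored as character in the valueset, numeric in the data.
sex <- label_one_variable(
  c(1, 2, 1),
  "Gender",
  data.frame(value = c("1", "2"), label = c("Male", "Female"))
)

# A variable without a valueset keeps its type and gains a label.
age <- label_one_variable(c(23, 45, 34), "Age in years")
```

Aligning the codes to the data, rather than the data to the codes, leaves the original column values untouched; the real function may make a different choice.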
Examples
data <- data.frame(
  sex = c(1, 2, 1),
  age = c(23, 45, 34)
)
metadata <- data.frame(
  variable_name = c("sex", "age"),
  label = c("Gender", "Age in years"),
  type = c("categorical", "numeric"),
  valueset = I(list(
    data.frame(value = c(1, 2), label = c("Male", "Female")),
    NULL
  ))
)
labelled_data <- add_metadata(data, metadata)
str(labelled_data)
#> tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
#> $ sex: dbl+lbl [1:3] 1, 2, 1
#> ..@ labels: Named num [1:2] 1 2
#> .. ..- attr(*, "names")= chr [1:2] "Male" "Female"
#> ..@ label : chr "Gender"
#> $ age: num [1:3] 23 45 34
#> ..- attr(*, "label")= chr "Age in years"