Skip to contents

Adds variable labels and value labels to a data frame based on a metadata dictionary. This is particularly useful for preparing datasets for use with packages like haven or for exporting to formats like SPSS or Stata.

Usage

add_metadata(data, metadata, ..., set_data_types = FALSE)

Arguments

data

A data frame containing the raw dataset.

metadata

A data frame that serves as a metadata dictionary. It must contain at least the columns: variable_name, label, and type. Optionally, it may include a valueset column for categorical variables, which should be a list column with data frames containing value and label columns.

...

Additional arguments (currently unused).

set_data_types

Logical; if TRUE, attempts to coerce column data types to match those implied by the metadata. (Note: currently not fully implemented.)

Value

A `tibble` with the same data as data, but with added attributes: - Variable labels (via the label attribute) - Value labels (as a haven::labelled class, if applicable)

Details

The function first checks the structure of the metadata using an internal helper. Then, for each variable listed in metadata, it: - Adds a label using the label attribute - Converts values to labelled vectors using haven::labelled() if a valueset is provided

If value labels are present, the function tries to align data types between the data and the valueset (e.g., converting character codes to integers if necessary).

Examples

data <- data.frame(
  sex = c(1, 2, 1),
  age = c(23, 45, 34)
)

metadata <- data.frame(
  variable_name = c("sex", "age"),
  label = c("Gender", "Age in years"),
  type = c("categorical", "numeric"),
  valueset = I(list(
    data.frame(value = c(1, 2), label = c("Male", "Female")),
    NULL
  ))
)

labelled_data <- add_metadata(data, metadata)
str(labelled_data)
#> tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
#>  $ sex: dbl+lbl [1:3] 1, 2, 1
#>    ..@ labels: Named num [1:2] 1 2
#>    .. ..- attr(*, "names")= chr [1:2] "Male" "Female"
#>    ..@ label : chr "Gender"
#>  $ age: num [1:3] 23 45 34
#>   ..- attr(*, "label")= chr "Age in years"