Support labelled vectors #73
Comments
Could you give a small usage example? I have no experience with the labelled package. |
Of course, see below.
# Load packages
library(tidyverse)
library(labelled)
library(readstata13)
library(here)
# Create data frame with labelled vectors
dta <-
  data.frame(
    lab_vct =
      sample(c(1:3, 999), 100, replace = TRUE) %>%
        labelled(
          labels =
            c(
              "a" = 1,
              "b" = 2,
              "c" = 3,
              "dk" = 999
            )
        )
  )
# Add variable label
var_label(dta$lab_vct) <- "Variable information here"
# Write to disk using save.dta13 (includes no variable/value labels)
save.dta13(data = dta, file = here("file.dta"), compress = TRUE)
When we open the file in Stata, we see that there are no variable or value labels. If we instead save with write_dta from the haven package, the labels are there (though unfortunately write_dta cannot compress the Stata file, which can yield huge files for large data sets).
Finally, I would add that labelled vectors are often used when creating data for Stata because they let you specify exact label-value combinations (e.g. that "don't know" = 999), whereas factors number everything sequentially. They also let you add variable labels. |
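For anyone who wants to reproduce the haven comparison, here is a minimal sketch. It assumes the haven package is installed; the object names (`dta_haven`, `path`, `back`) and the temporary file are illustrative and not part of the original example.

```r
library(haven)

# Illustrative data: same values and labels as in the example above
dta_haven <- data.frame(
  lab_vct = labelled(
    c(1, 2, 3, 999),
    labels = c("a" = 1, "b" = 2, "c" = 3, "dk" = 999),
    label = "Variable information here"
  )
)

# Write with haven (no compression), then read the file back
path <- tempfile(fileext = ".dta")
write_dta(dta_haven, path)
back <- read_dta(path)

# Inspect what survived the round trip
var_lab <- attr(back$lab_vct, "label")    # variable label
val_labs <- attr(back$lab_vct, "labels")  # value labels
print(var_lab)
print(val_labs)
```

Reading the file back in R is only a stand-in for opening it in Stata, but it shows whether the labels were written at all.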
It might be a good idea to add some code for labelled columns. I want to add some functions for working with labels anyway. In the meantime you could just prepare your data manually before exporting:
|
Thanks. I agree that supporting labelled vectors would be useful. While I could convert the labelled vectors to factors first, it's not useful in this case as it changes the numbers to be sequential. However, we need, for example, "Don't know" to be 999 in all cases and can't do this with factors that number each value label sequentially. |
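To make the renumbering concrete, here is a minimal base-R sketch. The vector `codes` and its values are made up for illustration. It shows that converting coded values to a factor replaces the original codes with sequential integers, so the 999 code does not survive.

```r
# Illustrative codes; 999 stands for "don't know"
codes <- c(1, 2, 3, 999, 2)

# Convert to a factor with the desired value labels
f <- factor(codes, levels = c(1, 2, 3, 999), labels = c("a", "b", "c", "dk"))

# The underlying integer codes are positions in the level set,
# so the original value 999 is replaced by its position
factor_codes <- as.integer(f)
print(factor_codes)
```

A labelled vector, by contrast, keeps the stored values as they are and attaches the labels as an attribute.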
I think it is also possible to keep the numeric codes. I’ll take a look into that. |
I did a rough implementation of labelled vectors in d8af207. The code will probably change in the future, but you might give it a try:
# Install readstata13 from the label branch
remotes::install_github("sjewo/readstata13@label")
library(tidyverse)
library(labelled)
library(readstata13)
library(here)
# Create data frame with labelled vectors
dta <-
  data.frame(
    lab_vct =
      sample(c(1:3, 999), 100, replace = TRUE) %>%
        labelled(
          labels =
            c(
              "a" = 1,
              "b" = 2,
              "c" = 3,
              "dk" = 999
            )
        )
  )
# Add variable label
var_label(dta$lab_vct) <- "Variable information here"
# Get variable labels
var_labs <- var_label(dta)
# Replace missing labels with ""
var_labs[sapply(var_labs, is.null)] <- ""
# Unlist and order variable labels
var_labs <- unlist(var_labs)[names(dta)]
# Save variable labels as attribute
attr(dta, "var.labels") <- var_labs
# Write to disk using save.dta13 (now includes variable and value labels)
save.dta13(data = dta, file = here("file.dta"), compress = TRUE) |
Thanks for the package. It would be great if (missing) values such as 999 (don't know) could be recoded into Stata missing values such as .a |
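For reference, the haven package represents Stata-style extended missing values in R as "tagged" NAs, so a recode along these lines is one possible starting point. This is only a sketch of the R-side representation: whether save.dta13 would write such values as .a is not something this thread establishes, and the names `x` and `recoded` are illustrative.

```r
library(haven)

# Illustrative vector; 999 stands for "don't know"
x <- c(1, 2, 999, 3)

# Replace 999 with a tagged NA carrying the tag "a"
# (haven's in-memory counterpart of Stata's .a)
recoded <- x
recoded[x == 999] <- tagged_na("a")

# Tagged NAs behave like ordinary NAs, but the tag can be recovered
print(is.na(recoded))
print(na_tag(recoded))
```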
At the moment, readstata13 is the only package available that can write compressed Stata .dta files. But the package does not play nicely with labelled vectors from the tidyverse labelled package. Instead, it treats them as numeric and, thus, removes all value and variable labels when it writes them to disk.
Any chance that readstata13 could support labelled vectors?