This repository maintains snapshots of publicly available NHANES data and documentation. The primary goal is to use these to eventually populate a database containing the same information in a more structured manner.
A recent version of R is required, along with the nhanesA package, which can be installed from CRAN or GitHub. To avoid downloading files multiple times if the process is interrupted and needs to be restarted, local caching can be enabled using some Bioconductor tools. The following code should set up the system appropriately.
install.packages(c("httpuv", "BiocManager"), repos = "https://cloud.r-project.org")
BiocManager::install("BiocFileCache")
BiocManager::install_github("cjendres1/nhanes")
BiocManager::install_github("ccb-hms/cachehttp")
The first step is to start up the caching server. The following code must be run in a separate R session, which must not terminate before all downloads are complete.
require(cachehttp)
## register a cache for the CDC website, restricted to .htm and .xpt files
add_cache("cdc", "https://wwwn.cdc.gov",
          fun = function(x) {
              x <- tolower(x)
              ok <- endsWith(x, ".htm") || endsWith(x, ".xpt")
              cat(x, if (ok) " [ TRUE ]" else " [ FALSE ]", fill = TRUE)
              ok
          })
## start the server, serving cached files from the BiocFileCache directory
s <- start_cache(host = "0.0.0.0", port = 8080,
                 static_path = BiocFileCache::bfccache(BiocFileCache::BiocFileCache()))
## httpuv::stopServer(s) # to stop the httpuv server
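To confirm that the server is working, one can request a known documentation page through the cache from yet another session. This is only a sanity check; the DEMO_J page below is an illustrative example of the usual CDC URL layout, accessed via the cdc prefix registered above.

readLines("http://127.0.0.1:8080/cdc/Nchs/Nhanes/2017-2018/DEMO_J.htm", n = 5)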
It is best to keep an eye on this session to see if anything unexpected is happening; if it does, closing the session and restarting usually works. One thing to watch out for is that the temporary directory does not run out of space.
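A rough check of how much space the session's temporary directory and the download cache are currently using can be done along the following lines (this is only a convenience, not part of the workflow).

sum(file.size(list.files(tempdir(), recursive = TRUE, full.names = TRUE)), na.rm = TRUE)
sum(file.size(list.files(BiocFileCache::bfccache(BiocFileCache::BiocFileCache()),
                         recursive = TRUE, full.names = TRUE)), na.rm = TRUE)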
The advantage of doing it this way is that files will not be downloaded from the CDC website more than once unless it has been updated, even if the process is repeated after six months (provided the cache has not been removed).
Run (from the top-level directory)
R --vanilla < code/update_manifests.R
This downloads and saves current manifests of public and limited access data tables from the CDC NHANES website, along with a manifest of available variables. Subsequent downloads are based on the information in these manifests.
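The script presumably relies on the nhanesA package for this; although the exact calls are not reproduced here, a sketch along the following lines shows the kind of information the manifests contain (nhanesManifest() is available in recent versions of nhanesA).

library(nhanesA)
publicTables <- nhanesManifest("public")     # one row per public data table
variables <- nhanesManifest("variables")     # one row per documented variable
head(publicTables)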
The current date is recorded in a file named COLLECTION_DATE.
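In essence this amounts to something like the following one-liner, though the exact format written by the script may differ.

writeLines(format(Sys.Date()), "COLLECTION_DATE")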
The script code/convert2csv.R
downloads the XPT files and converts
them to CSV files, keeping track of updates. It should be run from the
top-level directory. If the local cache is enabled as described above,
this can be done from the command line using
export NHANES_TABLE_BASE="http://127.0.0.1:8080/cdc"
R --vanilla < code/convert2csv.R
To download directly from the CDC website, skip setting the environment variable.
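For a single table, the conversion essentially amounts to downloading the XPT file and writing it back out as CSV, roughly as sketched below; the actual script iterates over the manifests and keeps track of updates. The DEMO_J table and the use of the foreign package here are illustrative assumptions, not a description of what the script does internally.

base <- Sys.getenv("NHANES_TABLE_BASE", "https://wwwn.cdc.gov")
xpt <- tempfile(fileext = ".XPT")
download.file(paste0(base, "/Nchs/Nhanes/2017-2018/DEMO_J.XPT"), xpt, mode = "wb")
demo <- foreign::read.xport(xpt)            # read SAS transport (XPT) format
write.csv(demo, "DEMO_J.csv", row.names = FALSE)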
To download the HTML documentation files (for publicly accessible datasets), similarly run
export NHANES_TABLE_BASE="http://127.0.0.1:8080/cdc"
R --vanilla < code/htmldoc.R
To download the HTML documentation files for limited access datasets, run
export NHANES_TABLE_BASE="http://127.0.0.1:8080/cdc"
R --vanilla < code/htmldoc-limited-access.R
To extract metadata and codebook information from the documentation files, we follow a two-step process. The first step is to run
R --vanilla < code/extract-metadata.R
to process all available documentation files and extract variable
metadata and codebooks. This step is a little slow (~15 minutes) as it
needs to process all documentation files one by one. The combined
results are stored in metadata/all_metadata.rds.
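Once this step has finished, the combined results can be inspected directly; the comments below about its structure are assumptions rather than guarantees.

all_metadata <- readRDS("metadata/all_metadata.rds")
length(all_metadata)                  # presumably one element per documentation file
str(all_metadata[[1]], max.level = 1)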
The second step is to run
R --vanilla < code/process-metadata.R
This uses the manifests and the results of the first step to produce three data frames:
- codebookDF containing codebooks for all variables
- variablesDF containing descriptions of variables
- tablesDF containing descriptions of tables
These are saved as corresponding RDS files in the metadata
directory.
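Assuming the files are named after the corresponding data frames (e.g. metadata/tablesDF.rds), they can be read back in for inspection or for populating the database.

codebookDF <- readRDS("metadata/codebookDF.rds")     # assumed file name
variablesDF <- readRDS("metadata/variablesDF.rds")   # assumed file name
tablesDF <- readRDS("metadata/tablesDF.rds")         # assumed file name
str(tablesDF)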