Releases: ccb-hms/NHANES-metadata
v3.11.1
v3.11.0
Change Log
Internal changes:
- Decouple ontology mapping details from the nhanes_variables.tsv table into a separate table called nhanes_variables_processed.tsv, which is used downstream for ontology mapping (closes #32).
  - Specifically, the columns ProcessedText and Tags have been removed from nhanes_variables.tsv.
- Add the expert-verified oral health phenotype mappings to the nhanes_variables_mappings.tsv table with the tag "human verified" (closes #34).
v3.10.0
Change Log
Internal changes:
- Modify blocklist to include all variables that match patterns identified in phonto's vignettes (VariableClassification.md) (closes #31).
- Fix bug where unmapped terms were duplicated in the output mappings table (closes #25).
- Investigate whether replacing labels of oral health variables with their synonyms and remapping them would yield more accurate mappings, which it did not (closes #26).
- Add the oral health, ontology-mapped variables to blocklist.csv such that they no longer need to be mapped (closes #33).
- Fix issue where the value for Use Constraints contained newline characters or was empty when it should be "None".
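The Use Constraints fix above can be illustrated with a minimal sketch. The function name `normalize_use_constraints` is hypothetical and the pipeline's actual implementation is not shown in these notes; this only demonstrates the described behavior (no embedded newlines, and "None" instead of an empty value).

```python
def normalize_use_constraints(value):
    """Return a single-line Use Constraints value, or "None" if it is empty.

    Hypothetical helper: the name and exact rules are assumptions made
    for illustration, not the pipeline's actual code.
    """
    if value is None:
        return "None"
    # str.split() with no arguments splits on any whitespace run,
    # including newlines, so joining with a space flattens the value.
    cleaned = " ".join(str(value).split())
    return cleaned if cleaned else "None"
```

For example, `normalize_use_constraints("\n")` yields `"None"`, and a value broken across two lines is joined with a single space.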
v3.9.1
Change Log
User-facing changes:
- Modify the metadata pipeline to output nhanes_variables.tsv with capitalized TRUE/FALSE in the IsPhenotype and OntologyMapped columns (closes #29).
- Update the metadata README to describe in more detail the metadata acquisition process and the additional processing done on the metadata.
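As a rough sketch of the capitalized TRUE/FALSE output described above: Python's default string form of a boolean is `True`/`False`, so a writer has to map the values explicitly. The helper names and the `Variable` column below are assumptions for illustration; only IsPhenotype and OntologyMapped come from these notes.

```python
import csv

# Column order is an assumption; only IsPhenotype and OntologyMapped
# are named in the release notes.
FIELDNAMES = ["Variable", "IsPhenotype", "OntologyMapped"]


def format_flag(value):
    """Render a boolean as the capitalized TRUE/FALSE used in the table."""
    return "TRUE" if value else "FALSE"


def write_variables_tsv(rows, handle):
    """Write rows (dicts with boolean flags) to an open text handle as TSV."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDNAMES, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {
                "Variable": row["Variable"],
                "IsPhenotype": format_flag(row["IsPhenotype"]),
                "OntologyMapped": format_flag(row["OntologyMapped"]),
            }
        )
```

Capitalized TRUE/FALSE is the form R reads as logical values, which presumably matters for downstream R tooling such as nhanesA and phonto.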
v3.9.0
Change Log
User-facing changes:
- Add a column to the nhanes_variables.tsv table called IsPhenotype that specifies if a variable is a phenotype or not (closes #15).
- Add oral health variable mappings and synonyms contributed by the Harvard School of Dental Medicine (closes #1).
  - Add table metadata/synonym_table.tsv of 'variable label synonyms' containing alternate descriptions of variables in oral health tables.
  - Add table ontology-mappings/nhanes_oral_health_mappings.tsv of human-verified mappings for variables in oral health tables.
Internal changes:
- Add non-phenotype templates in phonto vignettes to our (non-phenotype) blocklist (#15).
- Add new patterns of non-phenotypes found in low-scored mappings (#16).
- Add variables with only EnglishInstructions (e.g., "check item" variables or instructions-only variables) to the non-phenotype "blocklist" (closes #23).
v3.8.2
v3.8.1
v3.8.0
Change Log
User-facing changes:
- Update Metadata tables to contain all Variable and Table IDs in uppercase (closes #20).
- Add EnglishInstructions column to the nhanes_variables.tsv table.
  - Some variables do not have a name or description but contain some instructions.
- Update ontology versions to EFO v3.60.0 and NCI Thesaurus v23.09d
Internal changes:
v3.7.0
Change Log
User-facing changes:
- Add pandemic (P_) tables.
- Add NHANES tables metadata table to nhanes_metadata.db.
- Add CC-BY 4.0 License.
Internal changes:
- Use nhanesA from its latest state on GitHub.
v3.6.0
Change Log
User-facing changes:
- Update the variable codebooks table nhanes_variables_codebooks.tsv with additional codebooks that were previously unattainable using nhanesA, and which we can now get with nhanesA_0.8.0.
- Add disease locations associated with NCIT and FOODON terms.
  - Also includes locations expressed in universal restrictions (i.e., 'pancreas disease' disease_has_location only 'pancreas').
- Update ontology set to use EFO v3.57.0, which results in updated ontology mappings (or just updated scores).
- Include Disease Ontology (DOID) terms as potential mapping targets.
- Modify prototype search module nhanes_metadata_search.py to allow queries with multiple search terms.
Internal changes:
- Use optimized nhanesA v0.8.0 for faster codebook retrieval.
- Add dedicated module to build a sqlite DB of the metadata, which outputs a (xz compressed) sqlite database file containing all the generated tables.
  - This facilitates browsing and testing the generated tables.
  - The resulting database is used with nhanes_metadata_search.py.
- Use SemSQL gzip database distributions, which provide more up to date ontology versions.
- Add argument to include (or not) disease locations associated with ontology terms.
- Update conditions used to flag if variables are mapped, since the output table now includes all variables, even those that have not been mapped (these get a mapping score of 0).
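
For readers who want to browse the xz-compressed sqlite database mentioned above, the following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the archive file name and the tables it contains are not stated in these notes, so both paths must be supplied by the caller.

```python
import lzma
import shutil
import sqlite3
from contextlib import closing


def decompress_database(xz_path, db_path):
    """Decompress an xz-compressed sqlite file to db_path and return db_path."""
    with lzma.open(xz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return db_path


def list_tables(db_path):
    """Return the names of all tables in the sqlite database, sorted."""
    # closing() is needed because a sqlite3 connection used as a context
    # manager only commits or rolls back; it does not close itself.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

A typical use would be to call `decompress_database` on the downloaded archive and then pass the resulting path to `list_tables` to see which generated tables are available.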