GREI 4: Task 6 - Assemble and provide metrics about HDV data collection for therapeutic areas #229

Closed
5 tasks done
Tracked by #118
cmbz opened this issue Apr 18, 2024 · 7 comments
Labels
GREI 4 Analytics and Reporting Harvard Dataverse Issues related to Harvard Dataverse Repository

Comments

@cmbz
Contributor

cmbz commented Apr 18, 2024

Overview

  • Assemble and provide metrics about HDV data collection for therapeutic areas
  • Therapeutic areas include: Cardiology, Systems Neuroscience, Quality of Life, Bioinformatics Software, and Real-Time Polymerase Chain Reaction

Tasks

  • Identify HDV source for information about therapeutic areas (e.g., Dataverse subjects and keywords)
  • Map existing HDV information to therapeutic areas (e.g., specific keywords may map to a therapeutic area)
  • Collect metrics about HDV datasets exhibiting related keywords and subjects
  • Share metrics in designated GREI spreadsheet
  • Consider approaches to operationalize this reporting on a regular basis (if needed)
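
The mapping and counting tasks above could be sketched roughly as follows. This is only an illustration: the keyword-to-area table, the example dataset records, and the function names are all hypothetical, and this is not necessarily the mapping used for the GREI spreadsheet.

```python
from collections import Counter

# Hypothetical mapping from lowercase keywords/subjects to therapeutic areas.
# The real mapping would need review by someone with domain knowledge.
KEYWORD_TO_AREA = {
    "cardiology": "Cardiology",
    "heart failure": "Cardiology",
    "quality of life": "Quality of Life",
    "rt-pcr": "Real-Time Polymerase Chain Reaction",
}


def areas_for_dataset(terms):
    """Return the set of therapeutic areas matched by a dataset's terms."""
    normalized = (term.strip().lower() for term in terms)
    return {KEYWORD_TO_AREA[term] for term in normalized if term in KEYWORD_TO_AREA}


def count_datasets_by_area(datasets):
    """Count datasets per area; a dataset counts once per area it matches."""
    counts = Counter()
    for terms in datasets.values():
        counts.update(areas_for_dataset(terms))
    return counts


# Made-up example records: persistent ID -> keywords and subjects.
example = {
    "doi:10.0000/EXAMPLE1": ["Heart failure", "Cardiology"],
    "doi:10.0000/EXAMPLE2": ["Quality of Life"],
    "doi:10.0000/EXAMPLE3": ["Economics"],
}
counts = count_datasets_by_area(example)
```

Using a set per dataset keeps a dataset with several keywords for the same area from being counted more than once for that area.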

Resources

@cmbz cmbz added GREI 4 Analytics and Reporting Harvard Dataverse Issues related to Harvard Dataverse Repository labels Apr 18, 2024
@cmbz cmbz changed the title from "Task 6: Assemble and provide metrics about HDV data collection for therapeutic areas" to "GREI 5: Task 6 - Assemble and provide metrics about HDV data collection for therapeutic areas" Apr 18, 2024
@cmbz
Contributor Author

cmbz commented May 7, 2024

Status: May 2024

  • Julian and Sonia met and discussed a plan to gather as much content as possible, given the lack of standards to guide this search. This is now due June 6th.
  • Julian shared the search results with Sonia and continued adjusting the search as other GREI repositories added search terms to the GREI spreadsheet

@jggautier

jggautier commented May 7, 2024

@sbarbosadataverse, I searched for datasets whose metadata contains:

Could you review the datasets from that search to see if many of these datasets seem irrelevant?

This will help me evaluate the way I'm finding these datasets, before we consider using more search terms like we spoke about, such as the "therapeutic areas" at https://www.cdisc.org/standards/therapeutic-areas/disease-area and the names of NIH centers and institutes.

We also spoke about looking at the keywords and topic classifications in the metadata of datasets from NIH-funded research in the Harvard Dataverse Repository (#217), and using those as search terms, too.

I put those keywords and topic classifications in tabs in the spreadsheet at https://docs.google.com/spreadsheets/d/1OAQiSkgyeb_YdM4rFhl439FUeNadvmg5R5r0d4PN4us. Could you take a look?

My impression is that someone with domain knowledge would need to review these before we can use them for searching. Feels like many of the keywords and especially the topic classifications wouldn't be that helpful, but I'm not sure. Maybe we could use only the keywords when we see that it comes from a relevant vocabulary, like MeSH, SNOMED-CT, and NCIT.
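
The idea of using only keywords that come from a relevant vocabulary could look something like the sketch below. It assumes each keyword has already been extracted into a dict with hypothetical `term` and `vocabulary` keys; the actual field names depend on how the metadata was exported.

```python
# Vocabularies named in the comment above, compared case-insensitively.
TRUSTED_VOCABULARIES = {"mesh", "snomed-ct", "ncit"}


def keywords_from_trusted_vocabularies(keywords):
    """Keep keyword terms whose declared vocabulary is in the trusted set.

    Each keyword is assumed to be a dict with "term" and "vocabulary" keys
    (hypothetical names used only for this sketch).
    """
    kept = []
    for keyword in keywords:
        vocabulary = (keyword.get("vocabulary") or "").strip().lower()
        term = (keyword.get("term") or "").strip()
        if term and vocabulary in TRUSTED_VOCABULARIES:
            kept.append(term)
    return kept


# Made-up sample keywords for illustration.
sample_keywords = [
    {"term": "Myocardial Infarction", "vocabulary": "MeSH"},
    {"term": "survey", "vocabulary": ""},
    {"term": "Neoplasms", "vocabulary": "NCIT"},
    {"term": "misc", "vocabulary": None},
]
trusted_terms = keywords_from_trusted_vocabularies(sample_keywords)
```

An exact match on the vocabulary name is a simplification. If depositors enter the vocabulary as free text, variants such as "Medical Subject Headings" would be missed and would need their own normalization step.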

@cmbz
Contributor Author

cmbz commented May 8, 2024

Status: June 2024

  • @jggautier added categories and dataset/holding counts from his search results to the "Harvard Dataverse" tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet, along with a note in that spreadsheet about how he got the counts. At @sbarbosadataverse's suggestion, categories with fewer than 10 datasets/holdings were not added to the spreadsheet.
  • @jggautier updated the "Harvard Dataverse" tab and the "Aggregate" tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet with counts of datasets where "covid19" appears in the metadata
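
For anyone who wants to reproduce counts like these, one possible approach is the Dataverse Search API, which reports a `total_count` for a query. The sketch below is an assumption about how such a count could be obtained, not a record of the method used here (that is described in the note in the spreadsheet). The helper names and the sample counts are hypothetical; the 10-dataset minimum comes from the status update above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://dataverse.harvard.edu/api/search"
MIN_DATASETS = 10  # categories below this were left out of the spreadsheet


def build_count_url(term):
    """Build a Search API URL whose response carries the total dataset count."""
    # per_page=1 keeps the response small; only the total is needed.
    params = {"q": term, "type": "dataset", "per_page": 1}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def total_count(response_json):
    """Read the total number of matches from a parsed Search API response."""
    return response_json["data"]["total_count"]


def categories_to_report(counts, minimum=MIN_DATASETS):
    """Drop categories with fewer than `minimum` datasets."""
    return {name: n for name, n in counts.items() if n >= minimum}


covid_url = build_count_url("covid19")

# Made-up counts, only to show the threshold being applied.
reported = categories_to_report({"Cardiology": 42, "Example Rare Category": 3})
```

Fetching `covid_url` with any HTTP client and passing the parsed JSON to `total_count` would give the number of matching datasets at the time of the request.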

@jggautier

jggautier commented Jun 20, 2024

@sbarbosadataverse and @cmbz, I'm going to close this GitHub issue. I'm curious how these counts will be used and during the GREI-Monthly CWG Meeting on July 10 I plan to ask about them (unless someone else brings them up).

@jggautier

jggautier commented Jun 25, 2024

Re-opening this issue. @sbarbosadataverse asked that I include counts of datasets that include the term "covid19" in their metadata. I'm getting that count now and will update the Harvard Dataverse tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet today.

@jggautier jggautier reopened this Jun 25, 2024
@jggautier

jggautier commented Jun 25, 2024

I updated the "Harvard Dataverse" tab and the "Aggregate" tab of the Top Biomedical Research Categories in GREI Repository Holdings spreadsheet.

@sbarbosadataverse

Checked off the last remaining task. In a new issue, we'll consider how to make use of the information we collected for this request from the GREI planning unit.

@sbarbosadataverse sbarbosadataverse changed the title from "GREI 5: Task 6 - Assemble and provide metrics about HDV data collection for therapeutic areas" to "GREI 4: Task 6 - Assemble and provide metrics about HDV data collection for therapeutic areas" Jun 25, 2024

3 participants