Level of granularity of ctgov sponsor data #17

Open
ccunningham101 opened this issue Aug 3, 2023 · 1 comment

ccunningham101 commented Aug 3, 2023

Do we actually want to use ROR for ctgov sponsor names?
Around 5% of sponsors get mapped to another sponsor name.

Most generally make sense:
array(['Columbia University', 'Teachers College, Columbia University'],
dtype=object)
array(['Second Affiliated Hospital, School of Medicine, Zhejiang University',
'Zhejiang University'], dtype=object)
array(['University of Michigan Rogel Cancer Center',
'University of Michigan'], dtype=object)
array(['National Taiwan University Hospital',
'National Taiwan University Hospital Hsin-Chu Branch'],
dtype=object)

Some we might deem necessary:
array(['University of Alexandria', 'Alexandria University'], dtype=object)

But maybe there was a reason to keep these separate, and we should trust that separate accounts were created on ctgov deliberately:
array(['NYU Langone Health', 'New York University',
'NYU College of Dentistry'], dtype=object)
array(['University of North Carolina, Chapel Hill',
'UNC Lineberger Comprehensive Cancer Center'], dtype=object)
array(['Wake Forest University', 'Wake Forest University Health Sciences'],
dtype=object)
array(['Mansoura University', 'Mansoura University Children Hospital'],
dtype=object)

And some are incorrect (maybe because one or more of the sites does not exist in ROR):
array(['Mayo Clinic', 'Malo Clinic'], dtype=object)

In the absence of other information, such as city/country, it will be hard to check the ROR match.
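
For reference, a minimal sketch of how groups like the ones above could be listed: collect sponsor names by matched ROR id and keep the ids that more than one name maps to. The `matches` pairs, the `ror-A`-style ids, and the helper names are all hypothetical placeholders, not real ROR identifiers or code from this repo, and the way the share of affected sponsors is counted here is only one possible definition.

```python
from collections import defaultdict

# Hypothetical (sponsor name, matched ROR id) pairs. The ids are placeholders.
matches = [
    ("Columbia University", "ror-A"),
    ("Teachers College, Columbia University", "ror-A"),
    ("Mayo Clinic", "ror-B"),
    ("Malo Clinic", "ror-B"),
    ("Alexandria University", "ror-C"),
    ("University of Alexandria", "ror-C"),
    ("Oxford University", "ror-D"),
    ("Oxford University Hospitals NHS Trust", "ror-E"),
]


def merged_groups(pairs):
    """Return {ror_id: sorted names} for ids matched by more than one name."""
    by_id = defaultdict(set)
    for name, ror_id in pairs:
        by_id[ror_id].add(name)
    return {rid: sorted(names) for rid, names in by_id.items() if len(names) > 1}


def merged_fraction(pairs):
    """Share of distinct sponsor names that share a ROR id with another name."""
    all_names = {name for name, _ in pairs}
    merged = {n for group in merged_groups(pairs).values() for n in group}
    return len(merged) / len(all_names)


groups = merged_groups(matches)
for ror_id, names in groups.items():
    print(ror_id, names)
print(f"{merged_fraction(matches):.0%} of sponsor names share a ROR id")
```

On the real data the same grouping would surface every merged set at once, which makes it easier to sort them into "fine", "debatable", and "wrong".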

@ccunningham101
Contributor Author

  1. Decision: go with the level of granularity that ROR provides, i.e. if it combines NYU Langone Health and New York University but does not combine Oxford University Hospitals NHS Trust and Oxford University, that is okay.
  2. We can scan the data manually for incorrect mappings, e.g. Mayo Clinic/Malo Clinic.
  3. We could maybe borrow site data to get city/country to add to each sponsor.
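
To make the manual scan in point 2 less tedious, one option is to pre-filter the merged groups so only pairs that are not obviously a parent/child unit or a reordered name end up in the review queue. This is a rough sketch under stated assumptions: the subset-of-words rule, the `STOPWORDS` set, and the function names are made up for illustration, and the similarity score only orders the queue, it does not decide anything.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed filler words to ignore when comparing names.
STOPWORDS = {"of", "the", "and"}


def tokens(name):
    """Lower-cased words of a sponsor name, without commas or filler words."""
    cleaned = name.lower().replace(",", " ")
    return {t for t in cleaned.split() if t not in STOPWORDS}


def needs_review(a, b):
    """True unless one name's words are a subset of the other's."""
    ta, tb = tokens(a), tokens(b)
    return not (ta <= tb or tb <= ta)


def review_queue(groups):
    """List (similarity, name_a, name_b) for pairs to check, most similar first."""
    queue = []
    for names in groups:
        for a, b in combinations(names, 2):
            if needs_review(a, b):
                score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                queue.append((round(score, 2), a, b))
    return sorted(queue, reverse=True)


merged = [
    ["Columbia University", "Teachers College, Columbia University"],
    ["University of Alexandria", "Alexandria University"],
    ["Mayo Clinic", "Malo Clinic"],
    ["NYU Langone Health", "New York University", "NYU College of Dentistry"],
]
for score, a, b in review_queue(merged):
    print(score, a, "|", b)
```

With these example groups, the Columbia and Alexandria pairs are skipped, while Mayo Clinic/Malo Clinic and the three NYU pairs are queued. Near-identical names that differ in a single word tend to sort to the top, which is where typo-style mismatches would show up.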
