Problems generating some UMLS2018AA sources with umls2rdf #23

Open
zzgulu opened this issue Jul 27, 2018 · 9 comments

@zzgulu

zzgulu commented Jul 27, 2018

Hi,
It seems that some sources, such as ICD9CM and ICD10CM, are not generated correctly with the newest version of UMLS (2018AA). Thanks

@jvendetti
Member

@zzgulu - could you please be more specific? What are you seeing that appears to be incorrect?

@zzgulu
Author

zzgulu commented Jul 27, 2018

@jvendetti The ttl files for LOINC, ATC, and MESH were created correctly. However, when I tried ICD9CM or ICD10CM, the result was an incomplete 30k ttl file, even though the script printed a "successful" message in the terminal (please see attached). Thanks!
ICD10CM.ttl.zip

@jvendetti
Member

@zzgulu - I downloaded ICD9CM and ICD10CM via the BioPortal UI and the REST API multiple times and I'm not able to reproduce this. I get a 25.1 MB file ICD9CM.ttl and 100.2 MB file for ICD10CM.ttl. I've checked the contents of both and they look complete. Are you perhaps trying to download over a slow connection?

@zzgulu
Author

zzgulu commented Jul 29, 2018

My apologies for not being clear enough. I am running umls2rdf against a local MySQL database of UMLS2018AA. The script correctly creates LOINC and ATC but not ICD9CM or ICD10CM. I am not downloading the ICD files from NCBO BioPortal. Do you think my local UMLS2018AA database might be corrupted? Thanks
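
One way to probe the "is my local database incomplete?" question is to count rows per source in the `MRCONSO` table, which carries the source abbreviation in its `SAB` column. The sketch below is not part of umls2rdf; the helper name `rows_per_source` is hypothetical, and it assumes any Python DB-API connection to the UMLS database. An in-memory SQLite table with made-up rows stands in for the real MySQL database here.

```python
import sqlite3


def rows_per_source(conn, sources):
    """Return {SAB: number of MRCONSO rows} for each requested source.

    `conn` is any DB-API connection (hypothetical helper, not umls2rdf code).
    Sources with no rows at all are reported with a count of 0.
    """
    cur = conn.cursor()
    # No bound parameters, so the same query text works across DB-API drivers.
    cur.execute("SELECT SAB, COUNT(*) FROM MRCONSO GROUP BY SAB")
    counts = {sab: n for sab, n in cur.fetchall()}
    return {sab: counts.get(sab, 0) for sab in sources}


# Stand-in database with made-up rows; a real check would connect to MySQL.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE MRCONSO (CUI TEXT, SAB TEXT, CODE TEXT, STR TEXT)")
demo.executemany(
    "INSERT INTO MRCONSO VALUES (?, ?, ?, ?)",
    [
        ("C0000001", "LNC", "X1", "made-up term 1"),
        ("C0000002", "LNC", "X2", "made-up term 2"),
        ("C0000003", "ATC", "Y1", "made-up term 3"),
    ],
)
print(rows_per_source(demo, ["LNC", "ATC", "ICD10CM"]))
```

A zero or unexpectedly small count for ICD9CM or ICD10CM would point at the database load rather than at the converter.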

@ncbo-deployer

ncbo-deployer commented Jul 29, 2018 via email

@rwynne
Contributor

rwynne commented Aug 6, 2018

@jvendetti I'm able to replicate this with 2018AA-full. The ICD9 and ICD10CM outputs are attached.
It appears that only STYs (semantic types) are converted. My umls.conf is below.

ICD10,ICD10_codes.ttl,load_on_codes
ICD9CM;ICD9CM,ICD9CM_codes.ttl,load_on_codes

MedDRA and SNOMEDCT_US have been successful.

ICD9_ICD10CM.zip

EDIT:
I also tried the following, to see whether it was a parameter-parsing issue.

ICD10;ICD10,ICD10_codes.ttl,load_on_codes
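
To rule out a parsing problem by hand, the conf lines quoted in this comment can be split into their fields. The sketch below is an illustration only and is not umls2rdf's own parser: it assumes three comma-separated fields (source, output file, load mode) and that an optional `;` in the first field separates the source from a second identifier. The names `ConfEntry` and `parse_conf_line` are hypothetical.

```python
from typing import NamedTuple, Optional


class ConfEntry(NamedTuple):
    source: str           # text before the optional ';' in the first field
    alias: Optional[str]  # text after the ';', or None when there is no ';'
    output_file: str
    load_on: str


def parse_conf_line(line):
    """Split one conf line into its fields (assumed layout, see lead-in)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 comma-separated fields, got {len(fields)}: {line!r}")
    source_field, output_file, load_on = fields
    source, _, alias = source_field.partition(";")
    return ConfEntry(source, alias or None, output_file, load_on)


for conf_line in (
    "ICD10,ICD10_codes.ttl,load_on_codes",
    "ICD9CM;ICD9CM,ICD9CM_codes.ttl,load_on_codes",
    "ICD10;ICD10,ICD10_codes.ttl,load_on_codes",
):
    print(parse_conf_line(conf_line))
```

Under these assumptions all three lines split cleanly, which is consistent with the problem lying elsewhere than in the conf syntax.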

@graybeal
Contributor

graybeal commented Aug 6, 2018

This is interesting feedback, thanks @rwynne. If I recall correctly, the example I looked at with data from @zzgulu showed a line terminating in the middle. It was not a well-formed output, so it may be a separate issue.
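
An output that stops in the middle of a line can be spotted without a full RDF parser. The sketch below is a rough heuristic, not a Turtle validator: it relies on the fact that Turtle statements and directives end with a `.`, and flags a file whose last non-blank, non-comment line does not. The function name `looks_truncated` is hypothetical, and a final statement followed by a trailing comment on the same line would be flagged incorrectly.

```python
def looks_truncated(path):
    """Return True if the file's last meaningful line does not end with '.'.

    Blank lines and lines that are entirely comments are skipped.
    An empty file is also reported as truncated.
    """
    last = ""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                last = stripped
    return not last.endswith(".")
```

For example, `looks_truncated("ICD10CM.ttl")` on a file like the one attached earlier in this thread would show whether it ends on a complete statement. Passing this check does not prove the file is complete; it only rules out the mid-line cutoff described above.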

@rwynne
Contributor

rwynne commented Aug 7, 2018

@graybeal I think you're right. This could be an mmsys user/GUI issue. It turns out I was creating a subset of ICD10 together with its included content, such as ICD10AE. If these included RSABs are loaded into the database, could they confuse the converter? My ICD10 output, generated without these selections, looks correct on first inspection.

mmsys_selection

ICD10_codes.zip

@graybeal
Contributor

Yes, I think that could cause the problems you were seeing, @rwynne.

graybeal changed the title from "UMLS2018AA" to "Problems generating some UMLS2018AA sources with umls2rdf" on Sep 19, 2018

5 participants