Problems generating some UMLS2018AA sources with umls2rdf #23
Comments
@zzgulu - could you please be more specific? What are you seeing that appears to be incorrect?
@jvendetti ttl files for LOINC, ATC, and MESH were created correctly. However, when I tried to do ICD9CM or ICD10CM, the result was an incomplete 30k ttl file with a "successful" script message in the terminal (please see attached). Thanks!
@zzgulu - I downloaded ICD9CM and ICD10CM via the BioPortal UI and the REST API multiple times and I'm not able to reproduce this. I get a 25.1 MB file for ICD9CM.ttl and a 100.2 MB file for ICD10CM.ttl. I've checked the contents of both and they look complete. Are you perhaps trying to download over a slow connection?
My apologies, I wasn't clear enough. I am using umls2rdf against a local MySQL database of UMLS2018AA. The script correctly creates LOINC or ATC but not the ICD CM sources. I am not downloading ICDx from NCBO BioPortal. Do you think perhaps my local UMLS2018AA database is corrupted? Thanks
Without more information it is difficult to do more than guess.
I note that it is the middle of ICD10CM that is missing; the beginning and end appear coherent. This is a bit 'old school', but the only thing I can think of that would do that is if the receiving process for your converted ICD9/10 content is blocked, and the pipe for the data stream cannot pause the sending stream and does not have a large buffer.
In this case the beginning of the stream gets processed, but then the receiving process is blocked by other priorities. Finally when the sending process completes, the receiving process completes what is in the buffer. If the buffer is 32K bytes (often a default), that is all that ends up in the "rest of the file."
This guess is consistent with the size of the file, which is just a bit over 32K characters (in fact, the overage is almost exactly the size of the beginning of your ICD10CM file).
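A quick way to test the size part of this guess is to check whether a generated file sits close to a 32K buffer. The sketch below is only illustrative: the function name, the 25% tolerance, and the 32 KiB figure are assumptions taken from the guess above, not anything umls2rdf itself reports.

```python
import os

# Assumed buffer size from the guess above (32K bytes); not a umls2rdf setting.
PIPE_BUFFER_GUESS = 32 * 1024


def looks_buffer_sized(path, buffer_size=PIPE_BUFFER_GUESS, slack=0.25):
    """Return True if the file size is within `slack` (a fraction) of `buffer_size`.

    A hit does not prove truncation; it only flags files whose size is
    suspiciously close to one buffer's worth of data.
    """
    size = os.path.getsize(path)
    return abs(size - buffer_size) <= slack * buffer_size
```

A complete ICD9CM.ttl of about 25 MB would return False here, while a roughly 30k file would return True and deserves a closer look.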
@jvendetti I'm able to replicate this with 2018AA-full. The ICD9 and ICD10CM outputs are attached.
ICD10,ICD10_codes.ttl,load_on_codes
ICD9CM;ICD9CM,ICD9CM_codes.ttl,load_on_codes
MedDRA and SNOMEDCT_US have been successful.
EDIT: ICD10;ICD10,ICD10_codes.ttl,load_on_codes
@graybeal I think you're right. This could be a mmsys user/GUI issue. It turns out I was creating a subset of ICD10 together with its included content, including ICD10AE. If these included RSABs are loaded into the database, could they confuse the converter? My ICD10, without these selections, looks correct now on first inspection.
Yes, I think that could cause the problems you were seeing @rwynne.
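To see which related sources actually ended up in a local load, it can help to count rows per source abbreviation before running the converter. The following is a minimal sketch, assuming the database contains the standard MRCONSO table with its SAB column; the function name is hypothetical, and `conn` is assumed to be any Python DB-API connection to that database (for example one opened with a MySQL driver).

```python
def count_rows_per_sab(conn, prefix):
    """Return {SAB: row count} for MRCONSO sources whose SAB starts with `prefix`.

    `conn` is assumed to be a DB-API 2.0 connection to a database that
    holds the standard UMLS MRCONSO table.
    """
    cur = conn.cursor()
    # No bind parameters are used, so the query does not depend on the
    # driver's placeholder style; the prefix filter is applied in Python.
    cur.execute("SELECT SAB, COUNT(*) FROM MRCONSO GROUP BY SAB ORDER BY SAB")
    rows = cur.fetchall()
    cur.close()
    return {sab: n for sab, n in rows if sab.startswith(prefix)}
```

Calling `count_rows_per_sab(conn, "ICD10")` on a subset that pulled in included content would be expected to list entries such as ICD10AE next to ICD10, which is the situation described above.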
Hi
It seems some of the sources like ICD9/10CM are not being generated correctly with the newest version of UMLS (2018AA). Thanks