Still "writing" MTHSPL triples after 24 hrs, even with 244 GB RAM #29
I'm trying again now with UMLS 2019AA and a fresh pull of Python 2.7 and Ubuntu 18 on an AWS EC2 instance. I set it up with MTHSPL as the only source:
It's been running for about 45 minutes now, most of that time completely idle. 0% CPU activity and 0 bytes/second disk activity.
head:
tail:
After 9 hours
That's ~350 classes/hour. https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MTHSPL/ says
Is this really going to take 170,000/350 ≈ 486 hours!?
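For reference, the back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch: it assumes the write rate stays constant at the observed ~350 classes/hour, and the function name `estimate_hours` is made up for illustration (it is not part of umls2rdf).

```python
def estimate_hours(total_items, items_per_hour):
    """Estimated hours to process total_items at a constant rate (assumed)."""
    if items_per_hour <= 0:
        raise ValueError("items_per_hour must be positive")
    return float(total_items) / items_per_hour


# Figures taken from this comment: 170,000 classes at ~350 classes/hour.
hours = estimate_hours(170000, 350)
days = hours / 24
print("%.1f hours (about %.1f days)" % (hours, days))
```

At that rate the run would need roughly 486 hours, or about 20 days.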
If I set debug mode to
I'm running the umls2rdf script on an Ubuntu 16 AWS EC2 server. I bump the RAM up to 128 GB when I'm doing this. I have extracted several other, larger sources with zero or minimal difficulty. I'm using UMLS 2018AA. I'm extracting on CUIs.
I haven't done any MySQL tuning, but the SQL portion of the extraction goes quickly... less than 5 minutes, I think. I have tried to do this with the MTHSPL content combined with other sources in a single mmsys extract/MySQL database, and I have also tried doing MTHSPL in a database all by itself, which has been helpful with some of the other sources.
The triples writing has been going for over 1 day, but I don't think the Turtle file's size has grown beyond roughly 400 MB in the last 10 hours.
`top` shows the python process at 100% CPU but a pretty small RAM usage... ~10 GB, I think.
`select count(distinct CUI) from MRCONSO;` in an MTHSPL-only database says there are 58,041 CUIs used by MTHSPL. I have loaded the Turtle content that I have after one day into a triplestore, and that only shows 3,633 CUIs from MTHSPL.
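A rough way to track progress without loading the partial output into a triplestore is to count the distinct CUIs that appear in the Turtle file and compare that with the MRCONSO count. The sketch below assumes CUIs appear literally in the output as `C` followed by seven digits. The helper names and the file name `MTHSPL.ttl` are hypothetical, and the regex may also match other identifiers of the same shape, so treat the result as an approximation.

```python
import re

# A UMLS CUI is "C" followed by seven digits; the lookarounds avoid
# matching inside longer alphanumeric tokens.
CUI_PATTERN = re.compile(r"(?<![A-Za-z0-9])C\d{7}(?!\d)")


def count_distinct_cuis(lines):
    """Count distinct CUI-shaped tokens across an iterable of text lines."""
    seen = set()
    for line in lines:
        seen.update(CUI_PATTERN.findall(line))
    return len(seen)


def percent_done(written, expected):
    """Share of expected CUIs already written, as a percentage."""
    return 100.0 * written / expected


# Figures from this issue: 3,633 CUIs written out of 58,041 in MRCONSO.
print("%.1f%% of CUIs written" % percent_done(3633, 58041))

# To scan a real file (hypothetical name), stream it line by line:
# with open("MTHSPL.ttl") as fh:
#     print(count_distinct_cuis(fh))
```

With the numbers reported here, only about 6% of the expected CUIs had been written after one day.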