TTL file load errors due to chunked data loading feature #155

Closed
alexskr opened this issue May 8, 2024 · 2 comments · Fixed by #156
Comments

@alexskr
Member

alexskr commented May 8, 2024

We encountered an error while parsing the UMLS (TTL) ontology:

I, [2024-05-08T22:01:02.600753 #1470563]  INFO -- : ["Starting to process http://data.bioontology.org/ontologies/MDRGER/submissions/8"]
I, [2024-05-08T22:01:02.606373 #1470563]  INFO -- : ["Starting to process MDRGER/submissions/8"]
I, [2024-05-08T22:01:02.801761 #1470563]  INFO -- : ["Using UMLS turtle file found, skipping OWLAPI parse"]
E, [2024-05-08T22:01:11.151685 #1470563] ERROR -- : ["Error sending data to triple store - 400 RestClient::BadRequest: MALFORMED DATA: Turtle parser error while parsing an input stream on or around line 500000: Expected mandatory token '.', got 'eof'"]

This problem is related to PR #122, which introduced chunked data loading. The feature fails on TTL files exceeding 500000 lines when used with the AllegroGraph triplestore, because of its strict Turtle parser. AllegroGraph expects each load to contain complete Turtle statements that end with a period (.), but chunked data loading splits the file by line count and can cut a Turtle statement before its terminating period. We have not tested this with 4store, so a similar issue might exist there.

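UMLS ontologies are processed differently from the other types: the .ttl file is loaded into the triplestore instead of owlapi.xrdf. The excerpt below shows the lines around line 500000, where the chunk boundary falls in the middle of a statement: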

 499984 <http://purl.bioontology.org/ontology/MDRGER/10071099> a owl:Class ;
 499985         skos:prefLabel """H5N1-Influenza-Impfung"""@de ;
 499986         skos:notation """10071099"""^^xsd:string ;
 499987         <http://purl.bioontology.org/ontology/MDRGER/classified_as> <http://purl.bioontology.org/ontology/MDRGER/10059429> ;
 499988         umls:cui """C3160880"""^^xsd:string ;
 499989         umls:tui """T061"""^^xsd:string ;
 499990         umls:hasSTY <http://purl.bioontology.org/ontology/STY/T061> ;
 499991  .
 499992
 499993 <http://purl.bioontology.org/ontology/MDRGER/10064980> a owl:Class ;
 499994         skos:prefLabel """Neutralisierende Antikoerper positiv"""@de ;
 499995         skos:notation """10064980"""^^xsd:string ;
 499996         rdfs:subClassOf <http://purl.bioontology.org/ontology/MDRGER/10021504> ;
 499997         <http://purl.bioontology.org/ontology/MDRGER/classifies> <http://purl.bioontology.org/ontology/MDRGER/10064983> ;
 499998         <http://purl.bioontology.org/ontology/MDRGER/member_of> <http://purl.bioontology.org/ontology/MDRGER/20000214> ;
 499999         <http://purl.bioontology.org/ontology/MDRGER/SMQ_TERM_LEVEL> """4"""^^xsd:string ;
 500000         <http://purl.bioontology.org/ontology/MDRGER/MPS> """10022891"""^^xsd:string ;
 500001         umls:cui """C1609515"""^^xsd:string ;
 500002         umls:tui """T034"""^^xsd:string ;
 500003         umls:hasSTY <http://purl.bioontology.org/ontology/STY/T034> ;
 500004  .
 500005
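
To make the failure mode concrete, here is a minimal sketch of line-count chunking applied to a tiny Turtle snippet. The helper names (`chunk_by_lines`, `ends_on_statement_boundary?`) and the sample data are hypothetical and are not taken from the project's loader; the check only looks at how a chunk ends, which is the condition the AllegroGraph error above complains about.

```ruby
# Hypothetical illustration; this is not the project's loader code.
SAMPLE_TTL = <<~TURTLE
  <http://example.org/A> a owl:Class ;
          skos:notation "1" ;
   .

  <http://example.org/B> a owl:Class ;
          skos:notation "2" ;
   .
TURTLE

# Split the text into chunks of a fixed number of lines,
# the way a purely line-count based chunker would.
def chunk_by_lines(text, chunk_lines)
  text.lines.each_slice(chunk_lines).map(&:join)
end

# Rough check: the last non-blank line of the chunk must be a
# statement terminator. It does not verify where the chunk starts.
def ends_on_statement_boundary?(chunk)
  last = chunk.lines.map(&:strip).reject(&:empty?).last
  !last.nil? && last.end_with?('.')
end

chunks = chunk_by_lines(SAMPLE_TTL, 5)
chunks.each_with_index do |chunk, i|
  puts "chunk #{i}: ends on boundary = #{ends_on_statement_boundary?(chunk)}"
end
```

With a chunk size of 5, the first chunk ends on the opening line of the second statement, so a strict parser would hit end of input while still expecting the '.' token.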
@alexskr
Member Author

alexskr commented May 8, 2024

A temporary workaround is to raise chunk_lines from 500,000 to a larger number, which effectively disables the feature without a full rollback:

chunk_lines = 500_000 # number of lines per chunk
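
As a rough sketch of why this works, assuming the loader slices the file into groups of `chunk_lines` lines (the `count_chunks` helper and the line counts below are hypothetical stand-ins, not the project's code): once `chunk_lines` exceeds the file length, the whole file is sent as a single chunk and no statement is split.

```ruby
# Hypothetical stand-in for a line-count chunker; the real code may differ.
# Counts how many chunks a file of total_lines lines would be split into.
def count_chunks(total_lines, chunk_lines)
  (1..total_lines).each_slice(chunk_lines).count
end

default_setting = count_chunks(1_200_000, 500_000)    # file is split
raised_setting  = count_chunks(1_200_000, 10_000_000) # file stays whole

puts "chunks with default setting: #{default_setting}"
puts "chunks with raised setting:  #{raised_setting}"
```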

@syphax-bouazzouni

Hello @alexskr,

The chunked load works only for the N-Triples format, not for TTL.

The fix here is either to skip the chunked load for TTL, or to use another method of chunking for it: not by number of lines, but by number of Turtle blocks.
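
A minimal sketch of the second option, chunking by Turtle blocks, could look like the following. It is not the fix that was merged: `each_turtle_chunk` and `blocks_per_chunk` are hypothetical names, and the code assumes the UMLS-style layout shown in this issue, where every statement ends with a line containing only "." and all @prefix lines sit at the top of the file. General Turtle would need a real parser to find statement boundaries.

```ruby
# Sketch only. Assumptions: each statement ends with a line that is just ".",
# and @prefix declarations appear before any statement.
def each_turtle_chunk(lines, blocks_per_chunk)
  return enum_for(:each_turtle_chunk, lines, blocks_per_chunk) unless block_given?

  prefixes = [] # repeated at the top of every chunk so it parses on its own
  buffer = []
  blocks = 0

  lines.each do |line|
    if line.start_with?('@prefix')
      prefixes << line
      next
    end

    buffer << line
    next unless line.strip == '.' # only cut after a complete statement

    blocks += 1
    next if blocks < blocks_per_chunk

    yield((prefixes + buffer).join)
    buffer = []
    blocks = 0
  end

  # Flush a final, partially filled chunk unless only blank lines remain.
  yield((prefixes + buffer).join) unless buffer.all? { |l| l.strip.empty? }
end

sample = <<~TURTLE
  @prefix owl: <http://www.w3.org/2002/07/owl#> .

  <http://example.org/A> a owl:Class ;
   .

  <http://example.org/B> a owl:Class ;
   .

  <http://example.org/C> a owl:Class ;
   .
TURTLE

chunks = each_turtle_chunk(sample.lines, 2).to_a
chunks.each_with_index { |c, i| puts "chunk #{i}: #{c.lines.size} lines" }
```

Each chunk carries the prefix declarations and always ends on a statement terminator, so a strict parser can load it independently.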

We didn't run into this bug at AgroPortal, as we don't have UMLS or any other TTL ontologies.
