New taxon format causing our GAF readers to crash: Curie(namespace='NCBITaxon', identity='1280') #287
Comments
Does it only show up in goa_human.gpad, or does this occur in other GPAD files as well? Parsing this wouldn't slow you down too much. If parsing your original pattern But yeah, I do think it's worth bringing this up here just to see if this still matches the GPAD file spec / standards. Haibao
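As a rough illustration of why supporting the extra format need not cost much, here is a minimal sketch of a tolerant taxon-field parser. It is not the parser used in this project: the function name `parse_taxon` is hypothetical, and the `taxon:` / `NCBITaxon:` prefixes it accepts are assumptions. Only the `Curie(...)` text and the bare taxon number come from this thread. Well-formed fields take a cheap string path, and only the malformed lines fall through to the regex.

```python
import re

# The leaked object repr reported in this issue, e.g.
# Curie(namespace='NCBITaxon', identity='1280')
_CURIE_REPR = re.compile(r"Curie\(namespace='NCBITaxon', identity='(\d+)'\)")


def parse_taxon(field):
    """Return the taxon number as an int, or None for an empty field.

    Hypothetical helper for illustration only.
    """
    field = field.strip()
    if not field:
        return None
    # Fast path: a bare number such as "1280", or a prefixed form such as
    # "taxon:1280" or "NCBITaxon:1280" (the prefixes are an assumption).
    number = field.rpartition(":")[2]
    if number.isdecimal():
        return int(number)
    # Slow path: only lines with the Curie(...) repr pay for the regex.
    match = _CURIE_REPR.fullmatch(field)
    if match:
        return int(match.group(1))
    raise ValueError(f"unrecognized taxon field: {field!r}")
```

Raising on anything unrecognized, rather than skipping the line, keeps a future format change from being dropped silently.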
Hi Haibao, Our parsers sometimes find small errors in the GPAD files. I am reporting this in case it is an error that will break parsers other than our own, which may silently fail to process the data correctly. The last time we found an error reading a GPAD file, it turned out to be a showstopper bug with missing data in the annotation files (https://github.com/geneontology/go-annotation/issues/2885). So, seeing a new format in the GPAD files, I think it is best to report it. Of course, we can always change the parser to read the new format, but in case it affects other researchers, it is best to report it.
#183
2. Changed code to work around new formats in Gene Ontology Consortium's annotations https://github.com/geneontology/go-annotation/issues/3373 geneontology/go-annotation#3523
3. Moved reldepth calculations into their own module to support Wang's method and to give researchers the ability to calculate reldepths with a subset of relationships geneontology/go-annotation#3523
Hi @dvklopfenstein, Thank you for reporting this. According to the current GPAD file format (you can find this at http://geneontology.org/docs/gene-product-association-data-gpad-format/), we expect files to have only the taxon number, without the other characters. I will pass this info on to either resolve the issue upstream and/or add in a check for this column.
@dougli1sqrd, would this be a Rules issue? Or, @kltm @pgaudet, is this an issue from upstream files? Although most of the lines are attributed to UniProt, lines from ParkinsonsUK-UCL, BHF-UCL, CAFA, ARUK-UCL, and DIBU also have the incorrect format in goa_human.gpad.
Exploration for this issue is underway at biolink/ontobio#489.
Great, thanks. @dvklopfenstein, thank you again for reporting this, and I apologise that it was buried before we could address the issue. If you find any other issues in the future, please feel free to report them in this repo. I am closing this ticket to avoid duplicating biolink/ontobio#489. If you follow the ontobio ticket, it sounds like they are working on a fix.
Hello,
Thank you for the great annotations. They are extremely helpful.
Our GPAD reader tests are FAILing due to a taxon in goa_human.gpad, downloaded just now from http://current.geneontology.org/annotations, having this format in 526 lines:
Curie(namespace='NCBITaxon', identity='<some integer>')
rather than the format of either:
Should we support this new format, which will make reading GPAD files slower, or can the 526 GPAD file taxon lines be changed from:
to:
cc: @tanghaibao @JudoWill @dvklopfenstein