Small number of GAF annotations have wrong aspect (BP, MF, CC) #288

Closed
dvklopfenstein opened this issue Aug 27, 2019 · 17 comments

@dvklopfenstein

Thank you for GO and the annotations. It is crucial to be able to write scripts to manage gene products based on GO.

I am seeing a few annotations in the GAF files which have incorrect aspects (biological_process, molecular_function, and cellular_component).

For example, I only see one mismatch in goa_cow.gaf, Date Generated by GOC: 2019-07-01, on line 106,182 where the aspect for GO:0030247 is P, meaning biological_process, but the namespace in go-basic.obo (data-version: releases/2019-07-01) is molecular_function:

[Term]
id: GO:0030247
name: polysaccharide binding
namespace: molecular_function

Fields              | Values
--------------------|-----------------------
DB                  | UniProtKB
DB_ID               | Q30309
DB_Symbol           | BoLA-DRA
GO_ID               | GO:0030247
DB_Reference        | set(['GO_REF:0000107'])
Evidence_Code       | IEA
With_From           | set(['ensembl:ENSP00000378786', 'UniProtKB:P01903'])
Aspect              | P
DB_Name             | set(['BoLA-DR-alpha'])
DB_Synonym          | set(['BOLA-DRA', 'HLA-DRA', 'BoLA-DRA'])
DB_Type             | protein
Taxon               | [9913]
Date                | 2019-06-01
Assigned_By         | Ensembl

The attached file shows the other annotations with mismarked namespaces. The table below shows the quantity of mismatches per file.

Mismatches | GAF files
-----------|-------------
         1 | goa_cow.gaf
         1 | goa_dog.gaf
         1 | goa_pig.gaf
         2 | goa_human.gaf
         3 | pamgo_mgrisea.gaf
         3 | rgd.gaf
        88 | jcvi.gaf
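
For readers who want to reproduce this kind of check, here is a minimal sketch in Python. It is not the GOATOOLS code that produced the counts above; the function names `read_obo_namespaces` and `find_aspect_mismatches` are hypothetical. It assumes the tab-separated GAF 2.x layout, where the GO ID is column 5 and the aspect is column 9, and it reads the `namespace:` tag from `[Term]` stanzas in the OBO file. The sample GAF line is rebuilt from the field table above; the qualifier column is left empty and the order of the pipe-separated values is a guess.

```python
NAMESPACE_TO_ASPECT = {
    "biological_process": "P",
    "molecular_function": "F",
    "cellular_component": "C",
}


def read_obo_namespaces(lines):
    """Map each GO ID found in a [Term] stanza to its one-letter aspect."""
    aspects = {}
    in_term = False
    term_id = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and term_id and line.startswith("namespace: "):
            aspect = NAMESPACE_TO_ASPECT.get(line[len("namespace: "):])
            if aspect is not None:
                aspects[term_id] = aspect
    return aspects


def find_aspect_mismatches(gaf_lines, aspects):
    """Yield (line_number, db_id, go_id, gaf_aspect, obo_aspect) per mismatch."""
    for line_number, raw in enumerate(gaf_lines, start=1):
        if raw.startswith("!") or not raw.strip():
            continue  # skip header comments and blank lines
        cols = raw.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete annotation line
        go_id, aspect = cols[4], cols[8]
        expected = aspects.get(go_id)
        if expected is not None and aspect != expected:
            yield line_number, cols[1], go_id, aspect, expected


# Sample data: the OBO stanza and the cow annotation quoted in this issue.
OBO_LINES = [
    "[Term]",
    "id: GO:0030247",
    "name: polysaccharide binding",
    "namespace: molecular_function",
]
GAF_LINES = [
    "!gaf-version: 2.1\n",
    "\t".join([
        "UniProtKB", "Q30309", "BoLA-DRA", "", "GO:0030247",
        "GO_REF:0000107", "IEA", "ensembl:ENSP00000378786|UniProtKB:P01903",
        "P", "BoLA-DR-alpha", "BOLA-DRA|HLA-DRA", "protein",
        "taxon:9913", "20190601", "Ensembl", "", "",
    ]) + "\n",
]

ASPECTS = read_obo_namespaces(OBO_LINES)
for mismatch in find_aspect_mismatches(GAF_LINES, ASPECTS):
    print(mismatch)
```

To run it on real files, pass open file handles for go-basic.obo and a GAF file in place of the two sample lists.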

namespace_errors.txt

@ValWood

ValWood commented Nov 29, 2020

@cmungall @pgaudet @thomaspd

Outreach needs to monitor this tracker.

It would be helpful if a template guided the submitter on whom to tag or assign.

@dvklopfenstein
Author

Thanks. Yes, where should I be reporting the changes in annotation format or other anomalies seen while running the GOATOOLS test suite?

@pgaudet pgaudet transferred this issue from geneontology/go-annotation Nov 30, 2020
@pgaudet
Contributor

pgaudet commented Nov 30, 2020

Hi @dvklopfenstein

The helpdesk tracker is a good place to add those issues; we can transfer them to the right tracker as appropriate.

In this case I think this is a pipeline issue, caused by the ontology file being slightly out of sync with the annotation file. @kltm Is this right?

Thanks, Pascale

@pgaudet
Contributor

pgaudet commented Nov 30, 2020

Sorry, my comment was about the obsolete terms.

@kltm Can we add a rule to repair bad ontology aspects in the GAF files we export?

@kltm
Member

kltm commented Dec 1, 2020

@pgaudet That's a possibility, and possibly easy, assuming that we have the closures already on hand. Tagging @dougli1sqrd to see if I'm correct.

@dougli1sqrd

We actually already have a repair rule in place, GORULE:0000028, and it has been running in our pipeline and performing repairs since May 2019. I found a place where this rule is working, even in the release you're referencing, @dvklopfenstein, here: http://release.geneontology.org/2019-07-01/reports/gramene_oryza-report.html#gorule-0000028. (Allow a few moments for the file to fully load, as that report is large.)

So there's a minor mystery as to why it wouldn't be repairing in this case. The pipeline is looking at the namespace field in the ontology, like you are pointing out, and if the aspect doesn't match then it's replaced with the one stated in the ontology. This could go wrong if:

  • The ontology was somehow wrong (which doesn't seem to be the case here?)
  • The OBO JSON ontology format that the pipeline uses in practice doesn't correctly have the OIO:hasOBONamespace property for some terms?
  • Maybe some other more mysterious mechanism
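
The repair behaviour described above can be sketched as follows. This is an illustration only, not the pipeline's actual GORULE:0000028 implementation; the function name `repair_aspect` and the `ASPECT_BY_TERM` lookup are hypothetical. It assumes GAF columns have already been split on tabs (GO ID at index 4, aspect at index 8). It also shows how the second failure mode would surface: if a term has no namespace in the lookup, nothing can be repaired and the original aspect passes through.

```python
# Hypothetical lookup built from the ontology's namespace metadata.
ASPECT_BY_TERM = {"GO:0030247": "F"}


def repair_aspect(cols, aspect_by_term):
    """Return (columns, report); report is None when the aspect already matches."""
    go_id, aspect = cols[4], cols[8]
    expected = aspect_by_term.get(go_id)
    if expected is None:
        # No namespace metadata for this term: the aspect cannot be repaired.
        return cols, f"{go_id}: no namespace found, aspect left as {aspect!r}"
    if aspect == expected:
        return cols, None
    repaired = list(cols)  # copy so the caller's list is not modified
    repaired[8] = expected
    return repaired, f"{go_id}: aspect {aspect!r} repaired to {expected!r}"


annotation = ["UniProtKB", "Q30309", "BoLA-DRA", "", "GO:0030247",
              "GO_REF:0000107", "IEA", "", "P"]
fixed, report = repair_aspect(annotation, ASPECT_BY_TERM)
print(fixed[8], "|", report)

unchanged, warning = repair_aspect(annotation, {})
print(unchanged[8], "|", warning)
```

Reporting the unrepaired case separately makes a silent gap in the namespace metadata visible in a rule report rather than leaving the wrong aspect in place unnoticed.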

@kltm We do have a function in ontobio as well that can compute the aspect of a term from its place in the ontology by computing the ancestor closure, but it's not being used; the pipeline favors the metadata strategy above.
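
As a rough illustration of the closure-based alternative (this is not the ontobio function, whose name and signature are not given in this thread), the sketch below walks a term's ancestors until it reaches one of the three GO root terms. The function name `aspect_from_ancestors` and the toy `PARENTS` map are hypothetical; the map is written by hand for illustration, not copied from a release. Following only is_a edges is an assumption of this sketch, since other relation types can connect terms in different aspects in some editions of the ontology.

```python
# The three GO root terms and the aspect each one stands for.
ROOT_ASPECTS = {
    "GO:0008150": "P",  # biological_process
    "GO:0003674": "F",  # molecular_function
    "GO:0005575": "C",  # cellular_component
}

# Toy is_a parent map for illustration only.
PARENTS = {
    "GO:0030247": ["GO:0030246"],
    "GO:0030246": ["GO:0005488"],
    "GO:0005488": ["GO:0003674"],
}


def aspect_from_ancestors(go_id, parents):
    """Return 'P', 'F' or 'C' if exactly one root is an ancestor, else None."""
    seen = set()
    found = set()
    stack = [go_id]
    while stack:
        term = stack.pop()
        if term in seen:
            continue
        seen.add(term)
        if term in ROOT_ASPECTS:
            found.add(ROOT_ASPECTS[term])
        stack.extend(parents.get(term, ()))
    # An unknown term or an ambiguous result yields None rather than a guess.
    return found.pop() if len(found) == 1 else None


print(aspect_from_ancestors("GO:0030247", PARENTS))
```

Unlike the metadata lookup, this approach does not depend on each term carrying a namespace property, at the cost of needing the ancestor graph on hand.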

@pgaudet
Contributor

pgaudet commented Dec 3, 2020

Is this repair taking place at the parsing step? Because I think the problem is with the 'predictions' files (for goa_human-prediction.gaf); are those re-parsed and repaired?

My understanding was that the 'prediction' software was passing on the original GO aspect rather than looking up the new one.

Thanks, Pascale

@dougli1sqrd

@pgaudet oh interesting, okay. Yeah, I'm less familiar with the predictions process, as owltools does that still.

But no, they are not re-parsed and repaired. So, in the original comment above, are the files listed with incorrect aspects actually the prediction versions of those files?

@pgaudet
Contributor

pgaudet commented Dec 3, 2020

Actually, maybe that's not right. I am not sure we export the predictions.

@pgaudet
Contributor

pgaudet commented Dec 3, 2020

Plus P01903 has an IDA to polysaccharide binding, so it would not be a prediction. (I think)

@pgaudet
Contributor

pgaudet commented Dec 3, 2020

It seems a bit suspicious that gorule-0000028 has 0 errors:
http://release.geneontology.org/2020-11-17/reports/gorule-report.html

We need to start implementing the examples again to make sure the rules are working.

Pascale

@dougli1sqrd

It hasn't always had zero errors. Previous releases for gramene_oryza had gorule-0000028 errors.

@ValWood

ValWood commented Dec 3, 2020

Since these are GOA GAFs, maybe this is an EBI issue, @alexsign?
Some of the annotations come from the Ensembl pipeline and originate at GOA....

@alexsign

alexsign commented Dec 4, 2020

@ValWood Hi Val, can you please give me a real example that you see anywhere in the GOA files? We do not store aspect information in our database at all. Each annotation gets its aspect (F, P or C) from the GO term itself on unload. We can only have them all right or all wrong.

@pgaudet
Contributor

pgaudet commented Dec 4, 2020

OK - I don't know what happened in the 2019-07 release. But in the current release, at least for the example here (Q30309, GO:0030247), the term is correctly assigned to the F aspect in both the source and the GAF we produce and export.

I also checked the dog, pig, human and rat files, and they are all OK with respect to the error reported here. I did not check pamgo and jcvi, as they are now incorporated in the uniprot-all file.

In any case, this is not an issue as of today. @dvklopfenstein hopefully you can use a newer version of the GO data.

Thanks, Pascale

@pgaudet pgaudet closed this as completed Dec 4, 2020
@pgaudet
Contributor

pgaudet commented Dec 4, 2020

@alexsign I think this is out of date; whatever the bug was, it seems fixed.

@ValWood

ValWood commented Dec 4, 2020

OK, maybe I'm wrong. It seemed odd that some of the annotations in the txt file listed above originate from GOA. This means, I assume, that the namespace must get munged later?
