Genes without locus tag in GCF_030052815.1 #397

Open
manulera opened this issue Sep 4, 2024 · 1 comment
Labels
bug Something isn't working

Comments


manulera commented Sep 4, 2024

Hello,

I have two questions. Not sure if this is a generic NCBI issue, or related to the datasets API. Happy to forward the query elsewhere.

I came across this problem recently for the genome of Hevea brasiliensis - taxid 3981 - reference genome assembly GCF_030052815.1.

I thought that having locus_tags was a requirement for genomes to be deposited in, or queried from, NCBI. However, it seems that the genes in the nuclear genome of this assembly do not have locus_tags:

https://ncbi.nlm.nih.gov/datasets/gene/GCF_030052815.1/?search=rubber

Question 1: is it to be expected that locus_tags are missing, or is it an issue with this assembly in particular?

I looked at the RefSeq (https://www.ncbi.nlm.nih.gov/nuccore/NC_079493.1/) and GenBank (https://www.ncbi.nlm.nih.gov/nuccore/CM057502.1?report=genbank&log$=seqview) records. Below is an example of the same CDS in both records:

NC_079493

     CDS             complement(642191..642643)
                     /gene="LOC110662440"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Gnomon."
                     /codon_start=1
                     /product="ferredoxin, root R-B2"
                     /protein_id="XP_021677096.2"
                     /db_xref="GeneID:110662440"
                     /translation="MATVTVPSQCMVKIAPKNQFASTIIKNPCSLGSVRSISKSFRLK
                     CSQNFKASMAVYKIKLIGPEGEEQEFDAADDTYILDAAENAGVELPYSCRAGACSTCA
                     GKMVSGSVDQSDGSFLDETQMKEGYLLTCISYPTSDCVIYTHQESELC"

CM057502

     CDS             complement(642191..642643)
                     /locus_tag="P3X46_000044"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="KAJ9188672.1"
                     /translation="MATVTVPSQCMVKIAPKNQFASTIIKNPCSLGSVRSISKSFRLK
                     CSQNFKASMAVYKIKLIGPEGEEQEFDAADDTYILDAAENAGVELPYSCRAGACSTCA
                     GKMVSGSVDQSDGSFLDETQMKEGYLLTCISYPTSDCVIYTHQESELC"

Here the sequence is identical in both records, but only the GenBank record has a locus_tag. There are also features that exist in one record but not the other.
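To make the comparison above easier to repeat, here is a minimal sketch in plain Python that parses the qualifiers of the two CDS excerpts and reports which ones match, differ, or appear in only one record. The helper names `parse_qualifiers` and `compare_features` are hypothetical and written for this illustration; they are not part of any NCBI tool or library, and the regex only handles simple `/key="value"` and `/key=value` qualifiers as they appear in these excerpts.

```python
import re

# The two CDS feature blocks quoted above, copied from the flat-file records.
REFSEQ_CDS = '''
     CDS             complement(642191..642643)
                     /gene="LOC110662440"
                     /note="Derived by automated computational analysis using
                     gene prediction method: Gnomon."
                     /codon_start=1
                     /product="ferredoxin, root R-B2"
                     /protein_id="XP_021677096.2"
                     /db_xref="GeneID:110662440"
                     /translation="MATVTVPSQCMVKIAPKNQFASTIIKNPCSLGSVRSISKSFRLK
                     CSQNFKASMAVYKIKLIGPEGEEQEFDAADDTYILDAAENAGVELPYSCRAGACSTCA
                     GKMVSGSVDQSDGSFLDETQMKEGYLLTCISYPTSDCVIYTHQESELC"
'''

GENBANK_CDS = '''
     CDS             complement(642191..642643)
                     /locus_tag="P3X46_000044"
                     /codon_start=1
                     /product="hypothetical protein"
                     /protein_id="KAJ9188672.1"
                     /translation="MATVTVPSQCMVKIAPKNQFASTIIKNPCSLGSVRSISKSFRLK
                     CSQNFKASMAVYKIKLIGPEGEEQEFDAADDTYILDAAENAGVELPYSCRAGACSTCA
                     GKMVSGSVDQSDGSFLDETQMKEGYLLTCISYPTSDCVIYTHQESELC"
'''

# Matches /key="quoted value" (possibly wrapped over lines) or /key=bare_value.
QUALIFIER_RE = re.compile(r'/(\w+)=(?:"([^"]*)"|(\S+))')


def parse_qualifiers(feature_text):
    """Return a dict of qualifier name -> value for one feature block.

    Simplification for this sketch: if a qualifier is repeated,
    only the last value is kept.
    """
    qualifiers = {}
    for key, quoted, bare in QUALIFIER_RE.findall(feature_text):
        value = bare if bare else quoted
        if key == "translation":
            # Protein sequences wrap across lines; drop all whitespace.
            value = "".join(value.split())
        else:
            # Other wrapped values: collapse line breaks and indentation.
            value = " ".join(value.split())
        qualifiers[key] = value
    return qualifiers


def compare_features(first, second):
    """Classify each qualifier as same, differs, or present in one side only."""
    report = {}
    for key in sorted(set(first) | set(second)):
        if key not in second:
            report[key] = "only in first"
        elif key not in first:
            report[key] = "only in second"
        elif first[key] == second[key]:
            report[key] = "same"
        else:
            report[key] = "differs"
    return report


refseq = parse_qualifiers(REFSEQ_CDS)
genbank = parse_qualifiers(GENBANK_CDS)
report = compare_features(refseq, genbank)

for name, status in report.items():
    print(f"{name}: {status}")
```

With RefSeq as the first record and GenBank as the second, this shows `translation` and `codon_start` as identical, `locus_tag` only in the GenBank record, `gene`, `note` and `db_xref` only in the RefSeq record, and `product` and `protein_id` as differing. For whole records, a proper flat-file parser would be a better choice than this regex.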

Question 2: is it common that the annotations in GenBank and RefSeq records differ?

Thank you so much for your help!

Best,
Manu

manulera added the bug label Sep 4, 2024
manulera changed the title from "Genes without locus tag" to "Genes without locus tag in GCF_030052815.1" Sep 4, 2024

syntheticgio commented Dec 5, 2024

Hi @manulera - sorry for the delay in responding, this issue seems to have slipped by.

I've looked at the underlying data that we use for these genes, and it looks like there aren't any locus tags for the examples you gave (by contrast, if you look at the same organism without the filter: https://ncbi.nlm.nih.gov/datasets/gene/GCF_030052815.1/ you will see locus tags).

So, bottom line, this looks like some type of data curation issue. We're bringing it up internally to see if we can get this fixed. Someone will update here once we have some more information.

As for your second question, the GenBank record is the original submitted reference while the RefSeq record is a copy of this with different types of added value (which can lead to different names, etc.). The idea is not to interfere with the original submission but to be able to add various types of curation. Hope that helps!

Thanks!
John
