Hi @manulera - sorry for the delay in responding, this issue seems to have slipped by.
I've looked at the underlying data that we use for these genes, and it looks like there are no locus tags for the examples you gave. By contrast, if you look at the same organism without the search filter (https://ncbi.nlm.nih.gov/datasets/gene/GCF_030052815.1/), you will see locus tags.
So, bottom line: this looks like some type of data curation issue. We're bringing it up internally to see if we can get it fixed. Someone will update here once we have more information.
As for your second question, the GenBank record is the original submitted reference while the RefSeq record is a copy of this with different types of added value (which can lead to different names, etc.). The idea is not to interfere with the original submission but to be able to add various types of curation. Hope that helps!
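For anyone who wants to check this on downloaded records, here is a minimal sketch that lists the CDS features lacking a `/locus_tag` qualifier in GenBank flat-file text. It uses only the Python standard library and a deliberately simplified reading of the feature table (it ignores multi-line locations and multi-line qualifier values). The function names `features_with_qualifiers` and `missing_locus_tag` are made up for illustration and are not part of any NCBI tool.

```python
import re

# Simplified view of the GenBank flat-file feature table (an assumption of
# this sketch): feature keys are indented by exactly 5 spaces, qualifiers
# by more than 5 spaces and start with "/".
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")
QUALIFIER_LINE = re.compile(r"^ {6,}/(\w+)(?:=(.*))?$")


def features_with_qualifiers(flatfile_text):
    """Return a list of (feature_key, location, qualifiers_dict) tuples."""
    features = []
    in_table = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # an unindented line (ORIGIN, CONTIG, //) ends the table
        feature = FEATURE_LINE.match(line)
        if feature:
            features.append((feature.group(1), feature.group(2), {}))
            continue
        qualifier = QUALIFIER_LINE.match(line)
        if qualifier and features:
            # Attach the qualifier to the most recently seen feature.
            value = (qualifier.group(2) or "").strip('"')
            features[-1][2][qualifier.group(1)] = value
    return features


def missing_locus_tag(flatfile_text, key="CDS"):
    """Return the locations of `key` features that have no /locus_tag."""
    return [
        location
        for feature_key, location, qualifiers in features_with_qualifiers(flatfile_text)
        if feature_key == key and "locus_tag" not in qualifiers
    ]
```

Running `missing_locus_tag` on the text of the RefSeq record and of the GenBank record, and comparing the two lists, would show which CDS features carry a locus tag in one record but not the other. For real work, a full parser such as Biopython's `SeqIO` is likely a better choice than this hand-rolled scan.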
Hello,
I have two questions. Not sure if this is a generic NCBI issue, or related to the datasets API. Happy to forward the query elsewhere.
I came across this problem recently for the genome of Hevea brasiliensis - taxid 3981 - reference genome assembly GCF_030052815.1.
I thought that having `locus_tag`s was a requirement for genomes to be deposited / queried in the NCBI. However, it seems like the genes in the nuclear genome of this assembly do not have locus_tags: https://ncbi.nlm.nih.gov/datasets/gene/GCF_030052815.1/?search=rubber
Question 1: is it to be expected that `locus_tags` are missing, or is it an issue with this assembly in particular?
I went to the RefSeq (https://www.ncbi.nlm.nih.gov/nuccore/NC_079493.1/) and GenBank (https://www.ncbi.nlm.nih.gov/nuccore/CM057502.1?report=genbank&log$=seqview) records. Below is an example of the same CDS in both records:
NC_079493
CM057502
Here the sequence of both is identical but only one has a locus_tag. There are also cases where there are features that exist in one but not the other.
Question 2: is it common that the annotations in GenBank and RefSeq records differ?
Thank you so much for your help!
Best,
Manu