Way to access NCBI STAT data in bulk #11
Comments
I don't know if this will be helpful for you, but I eventually found the
answer to my question by contacting the NCBI help email.
The SRA does not host this sort of data itself; hosting is contracted out to
cloud services, where you can access it freely. They sent me these links:
https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery/
https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery-examples/
https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-examples/
By setting up a BigQuery free sandbox account, I have been able to access
all of the raw analysis output I was hoping for, i.e. for any run you can
get the list of taxonomic names that was generated. So you can set up a
search that selects from the metadata table only the records mentioning
bat coronavirus, and then use the accession ID to cross-reference the
taxonomy information, or whatever else you might want to do.
I hope this helps,
-Rocky Whitesell
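The cross-referencing idea above could be sketched roughly as follows. This is only an illustration, not something NCBI sent: the `tax_analysis` table name is the one quoted in this thread, while the metadata table name `nih-sra-datastore.sra.metadata` and the columns `organism` and `total_count` are assumptions to verify against the NCBI BigQuery docs linked above. The helper names are hypothetical, and running the query needs the `google-cloud-bigquery` package plus Google Cloud credentials.

```python
"""Sketch: cross-reference SRA metadata with STAT taxonomy results in BigQuery."""

# The tax_analysis table name comes from this thread. The metadata table name
# and every column other than `acc` and `name` are ASSUMPTIONS; check them
# against the NCBI SRA BigQuery documentation before relying on this.
METADATA_TABLE = "nih-sra-datastore.sra.metadata"
TAX_TABLE = "nih-sra-datastore.sra_tax_analysis_tool.tax_analysis"


def build_crossref_query(limit=None):
    """Return SQL joining metadata rows to their STAT taxonomy rows by accession."""
    sql = (
        "SELECT m.acc, m.organism, t.name, t.total_count\n"
        f"FROM `{METADATA_TABLE}` AS m\n"
        f"JOIN `{TAX_TABLE}` AS t ON t.acc = m.acc\n"
        # @pattern is a named query parameter, filled in by run_crossref().
        "WHERE LOWER(m.organism) LIKE @pattern"
    )
    if limit is not None:
        sql += f"\nLIMIT {int(limit)}"
    return sql


def run_crossref(organism_substring, limit=100):
    """Run the query and return the result rows (needs credentials)."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "pattern", "STRING", f"%{organism_substring.lower()}%"
            )
        ]
    )
    job = client.query(build_crossref_query(limit), job_config=job_config)
    return list(job.result())


if __name__ == "__main__":
    # Printing the SQL lets you paste it into the BigQuery console instead;
    # in the console, replace @pattern with a literal such as '%bat coronavirus%'.
    print(build_crossref_query(limit=100))
```

Passing the search text as a query parameter rather than pasting it into the SQL string avoids quoting problems; the `LIMIT` is there only to keep a first exploratory run small.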
On Thu, May 20, 2021 at 4:39 PM babarlelephant ***@***.***> wrote:
Same question: is there any way to find the list of SRA runs containing bat
coronavirus in the taxonomy?
Some people have shown that, even when a hit looks like obvious nonsense,
checking the few viral reads can lead to interesting results and even new
genome assemblies, at least for high-quality runs.
Thanks a lot @Jalapenobadger. I could get all the accessions mentioning Coronaviridae in the taxonomy analysis (the full analysis is visible in the HTML source code; the Analysis tab of https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR2063951 only shows the best matches). I created a Gmail account (I had to enter my phone number), then in https://console.cloud.google.com/bigquery I ran
SELECT acc FROM nih-sra-datastore.sra_tax_analysis_tool.tax_analysis
WHERE name = "Coronaviridae"
I saved the result as "local csv", obtaining 16000 results. To obtain all 229293 results I used "save on google drive". Be careful: this interface is limited for free accounts unless you enter a credit card number and get $300 in free credits.
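For anyone who would rather script the export than use the console's save options, here is a minimal sketch of the same accession query run from Python and streamed to a CSV file. It assumes the `google-cloud-bigquery` package and working Google Cloud credentials, and I have not checked how free-account limits apply to it. The function names and the output path are hypothetical.

```python
"""Sketch: export all accessions for one taxon name to a local CSV file."""
import csv
import io

# Table name as quoted in this thread.
TAX_TABLE = "nih-sra-datastore.sra_tax_analysis_tool.tax_analysis"


def build_accession_query():
    """Return the accession query with the taxon name as a named parameter."""
    return f"SELECT acc FROM `{TAX_TABLE}` WHERE name = @name"


def write_accessions(rows, out):
    """Write one accession per line to a file-like object; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["acc"])
    count = 0
    for row in rows:
        # Works for BigQuery Row objects and for plain dicts alike.
        writer.writerow([row["acc"]])
        count += 1
    return count


def export_accessions(taxon_name, path):
    """Run the query and stream every result row to `path` (needs credentials)."""
    # Imported lazily so the helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("name", "STRING", taxon_name)
        ]
    )
    rows = client.query(build_accession_query(), job_config=job_config).result()
    with open(path, "w", newline="") as handle:
        return write_accessions(rows, handle)


if __name__ == "__main__":
    # Offline demonstration with made-up accessions; no network access needed.
    demo = io.StringIO()
    write_accessions([{"acc": "SRR0000001"}, {"acc": "SRR0000002"}], demo)
    print(demo.getvalue())
```

With credentials in place, a call such as `export_accessions("Coronaviridae", "coronaviridae_accessions.csv")` would be the scripted counterpart of the console steps described above.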
Hey, glad I could help. I think there's a whole lot of potential being
overlooked in these databases; I wish they were more widely known.
…On Fri, May 21, 2021 at 5:44 PM babarlelephant ***@***.***> wrote:
Thanks a lot @Jalapenobadger <https://github.com/Jalapenobadger>. I could
get all the accessions mentioning Coronaviridae in the taxonomy analysis
(the full one visible in the html source code,
https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR2063951 is only showing
the best matches)
I created a gmail account (I had to enter my phone number) then at
https://console.cloud.google.com/bigquery
and typed
SELECT acc FROM nih-sra-datastore.sra_tax_analysis_tool.tax_analysis
WHERE name= "Coronaviridae"
Then I did save as "local csv"
Hi,
I'm wondering whether there is any way to access the taxonomic data that STAT automatically generates for each NCBI run. Every metagenomic upload to the SRA has this analysis generated and displayed as a Krona chart, but is there a route by which we could download the data in simple text form, for playing around with association rule mining?
Also, is there a roadmap or website, besides GitHub, dedicated to this project? Is there anywhere people can find more information about STAT, such as who works on it or what your future goals for it might be?
Thanks!
-Pete