How to download all count data associated with a species? #10

Open
kMutagene opened this issue Jul 13, 2022 · 2 comments

@kMutagene

kMutagene commented Jul 13, 2022

Hi there, thank you for this project!

What's the recommended procedure for downloading all available count data for a species? I am trying this:

mdat <- getDEE2Metadata("athaliana")
accessions <- as.vector(mdat$SRR_accession)

full_dataset <- getDEE2(
  species = "athaliana",
  SRRvec = accessions,
  metadata = mdat,
  outfile = "C:/Users/schneike/Documents/dee2_athaliana_full.zip"
)

However, the URL constructed for this query seems to be too long (it contains 45k accession numbers), as I am getting this error message:

Error in getURL(URL = murl, FUN = download.file, N.TRIES = 1L, destfile = zipname,  : 
  'getURL()' failed: <LARGE TRUNCATED URL>
In addition: Warning messages:
1: In FUN(URL, ...) : downloaded length 0 != reported length 321
2: In FUN(URL, ...) :
  cannot open URL <LARGE TRUNCATED URL>

Is there a suggested procedure for downloading this in chunks and combining the results afterwards?
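A minimal sketch of the chunked approach asked about here, assuming that `getDEE2()` returns a SummarizedExperiment for each chunk and that those objects can be combined with `cbind`. The helper names `chunk_vector` and `get_dee2_chunked` are hypothetical, and the default chunk size of 500 is an arbitrary guess rather than a documented limit of the server:

```r
# Split a vector into consecutive chunks of at most `size` elements.
chunk_vector <- function(x, size) {
  split(x, ceiling(seq_along(x) / size))
}

# Hypothetical helper: download each chunk with getDEE2() and column-bind
# the results. Assumes every chunk comes back with the same genes (rows).
get_dee2_chunked <- function(species, accessions, metadata, chunk_size = 500) {
  chunks <- chunk_vector(accessions, chunk_size)
  se_list <- lapply(chunks, function(srr) {
    getDEE2::getDEE2(species = species, SRRvec = srr, metadata = metadata)
  })
  # Drop the chunk names so they are not passed to cbind as argument names.
  do.call(cbind, unname(se_list))
}
```

With the objects from the snippet above, this would be called as `get_dee2_chunked("athaliana", accessions, mdat)`. Whether a given chunk size keeps the URL short enough would need to be checked by trial.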

@markziemann
Owner

Bulk data dumps can be downloaded over HTTP from https://dee2.io/mx/

The data is in long format with three columns: run, gene, count.
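As a rough illustration of working with that long format, the sketch below pivots a three-column table (run, gene, count) into a genes-by-runs matrix in base R. The function name `long_to_wide` is hypothetical, the toy data stands in for a slice of a real dump, and the column names are assigned inside the function because whether the dump files carry a header row is an assumption not confirmed here:

```r
# Pivot a long-format count table (run, gene, count) into a genes x runs matrix.
long_to_wide <- function(long) {
  stopifnot(ncol(long) == 3)
  # Assumed column order, taken from the description: run, gene, count.
  names(long) <- c("run", "gene", "count")
  runs  <- unique(as.character(long$run))
  genes <- unique(as.character(long$gene))
  # Cells with no record in the input stay NA rather than being set to zero.
  wide <- matrix(NA_integer_, nrow = length(genes), ncol = length(runs),
                 dimnames = list(genes, runs))
  # Matrix indexing by (row, column) pairs fills every observed cell at once.
  idx <- cbind(match(as.character(long$gene), genes),
               match(as.character(long$run), runs))
  wide[idx] <- as.integer(long$count)
  wide
}

# Toy input standing in for a slice of the real dump.
toy <- data.frame(
  run   = c("SRR1", "SRR1", "SRR2", "SRR2"),
  gene  = c("AT1G01010", "AT1G01020", "AT1G01010", "AT1G01020"),
  count = c(10, 0, 7, 3)
)
counts <- long_to_wide(toy)
```

For a full species dump, a base R matrix like this may be too slow or memory-hungry, and a dedicated table library would likely be a better fit.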

I hope this helps.

I could add a new function to getDEE2 to wrap this process if you think it would be useful?

@kMutagene
Author

> Bulk data dumps can be downloaded with http from this address https://dee2.io/mx/ It is in long format with 3 columns. Run, gene, count.

I knew about those, but I was hoping for an easy way to combine all runs into a single SummarizedExperiment, as I am also interested in the metadata. My current approach is to download all bundles via getDEE2_bundle and then combine the resulting SummarizedExperiments via cbind.
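The bundle-based approach described here could look roughly like the sketch below. It assumes that `list_bundles()` returns a table with an `SRP_accession` column and that `getDEE2_bundle()` accepts `species`, `query`, `col`, and `bundles` arguments. The wrapper names `combine_columns` and `get_all_bundles` are hypothetical, and it is also an assumption that all bundles share the same rows and metadata columns so that `cbind` succeeds:

```r
# Combine a list of per-bundle results column-wise, skipping failed downloads.
combine_columns <- function(parts) {
  parts <- Filter(Negate(is.null), parts)
  do.call(cbind, unname(parts))
}

# Hypothetical wrapper: fetch every bundle listed for a species and combine them.
get_all_bundles <- function(species) {
  bundles <- getDEE2::list_bundles(species)
  parts <- lapply(unique(bundles$SRP_accession), function(srp) {
    tryCatch(
      getDEE2::getDEE2_bundle(species = species, query = srp,
                              col = "SRP_accession", bundles = bundles),
      error = function(e) NULL  # keep going if a single bundle fails
    )
  })
  combine_columns(parts)
}
```

Silently skipping failed bundles keeps a long download running, but it also hides gaps, so logging the failed accessions would be a sensible addition.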

> I could add a new function to getDEE2 to wrap this process if you think it would be useful?

I think that would be great, especially since the count data and QC summary are separate files in the dump, while a combined SummarizedExperiment contains both counts and metadata.

Thanks again for DEE2. It is a great project and has already saved me a ton of time.
