How do I download a subset of dataset from google fs using the least amount of bandwidth? #8721

andycjw · 2024-02-08T09:39:24Z

This is related to an issue I filed on another repo:
weiji14/foss4g2023oceania#10

https://github.com/weiji14/foss4g2023oceania/blob/main/0_weatherbench2zarr.py

I tried to download a subset of a Zarr dataset with xarray, selecting the subset with the code above.

dataset.nbytes reports around 16.99GB. Even on Google Colab, with its fast connection to Google Cloud Storage, the download still took 3 hours and used more than 1TB of downloaded data just to get 65GB of subset data.

My expectation was that it should only take 16.99GB of download to get the data from cloud storage. How should I do that?
I can't run this on my local setup every time if it needs 1TB of data access; that is too much for my home network.

hansukyang · 2024-02-23T21:38:48Z

Unfortunately, I think you're a bit stuck here, because in the gs://weatherbench2/datasets/era5/1959-2022-full_37-6h-0p25deg-chunk-1.zarr-v2 dataset the u and v components of the wind are chunked with all 37 levels together.

So even though you'd like to download the data for just the 500 hPa level, you still have to download all 37 levels, because each chunk is stored as a single object with dimensions 1 x 37 x 721 x 1440. A chunk is the smallest unit that can be fetched, so selecting one level still transfers the whole chunk.
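To put rough numbers on that, here is a minimal sketch of the arithmetic in plain Python. The chunk shape is the one quoted above; the 4-byte item size (float32) and the level index used in the selection are assumptions for illustration, and the helper names are made up for this example. It counts uncompressed bytes, so the actual transfer will be smaller by whatever the compression ratio is, but the ratio between fetched and wanted data stays the same.

```python
from math import prod


def chunks_touched(selection, chunk_shape):
    """Count the chunks overlapped by a selection.

    `selection` is a list of (start, stop) index ranges, one per dimension.
    """
    count = 1
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size
        last = (stop - 1) // size
        count *= last - first + 1
    return count


def read_amplification(selection, chunk_shape):
    """Ratio of elements fetched (whole chunks) to elements actually wanted.

    Ignores partial chunks at the array edge, so it is an upper bound there.
    """
    wanted = prod(stop - start for start, stop in selection)
    fetched = chunks_touched(selection, chunk_shape) * prod(chunk_shape)
    return fetched / wanted


# Chunk shape from the thread: (time, level, latitude, longitude).
chunk_shape = (1, 37, 721, 1440)

# One time step, one level, the full grid.
# The level index 0 is arbitrary; any single level gives the same result.
selection = [(0, 1), (0, 1), (0, 721), (0, 1440)]

itemsize = 4  # assumption: float32 values
chunk_bytes = prod(chunk_shape) * itemsize
amplification = read_amplification(selection, chunk_shape)

print(f"uncompressed chunk size: {chunk_bytes / 1e6:.1f} MB")
print(f"read amplification for one level: {amplification:.0f}x")
```

With xarray, the on-disk chunk shape of a variable opened from Zarr is normally available under `ds[name].encoding["chunks"]`, which is a quick way to check this before starting a large download.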
