YAXArrays seems to download too much data #358

Open
SimonDanisch opened this issue Jan 11, 2024 · 2 comments

@SimonDanisch

I'm trying the example from the docs:

using Zarr, YAXArrays, Dates, DimensionalData

store = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"
g = open_dataset(zopen(store, consolidated=true))
c = g["tas"]
ct = c[Ti=At(Date("2018-08-01"):Day(10):Date("2050-08-01"))]

in_memory = ct.data[:, :, :]

This takes a really long time and fills up all my RAM (32 GB).
Some details:

The selected slice:

image

Download speed of the julia process

image

I was expecting it to download only the 328 MB, but judging by the download speed and RAM usage, I suspect it is downloading much more data, which makes it almost impossible to fetch this part of the dataset.
Am I missing something, or is this a bug, or simply a limitation of the package?

@Balinus
Contributor

Balinus commented Jan 12, 2024

One thought from reading the example, though I might be wrong.

Depending on how the Zarr store on Google Cloud Storage is chunked, the requested slice may still require downloading every chunk between 2018 and 2050, and probably a little more at the 2018 and 2050 edges. The whole dataset between 2018 and 2050 is 3.21 GB. Is that closer to what you measured?
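To make the chunking argument concrete, here is a minimal sketch that counts how many chunks along the time axis a strided selection touches. The chunk length of 600 time steps is purely an assumption for illustration (the real value is recorded in the store's `.zarray` metadata and was not checked here), and the helper `chunks_touched` is hypothetical. The sketch also assumes one time step per day over the requested period.

```python
import math
from datetime import date

def chunks_touched(indices, chunk_len):
    """Return the set of chunk numbers (along one axis) that hold the given indices."""
    return {i // chunk_len for i in indices}

# Assumption: one time step per day from 2018-08-01 to 2050-08-01, inclusive.
n_time = (date(2050, 8, 1) - date(2018, 8, 1)).days + 1

stride = 10      # every 10th day, as in the original selection
chunk_len = 600  # ASSUMED chunk length along time, not read from the store

wanted = range(0, n_time, stride)
touched = chunks_touched(wanted, chunk_len)
total_chunks = math.ceil(n_time / chunk_len)

print(f"time steps requested: {len(wanted)} of {n_time}")
print(f"chunks touched:       {len(touched)} of {total_chunks}")
```

Whenever the stride is smaller than the chunk length, every chunk contains at least one requested time step, so the whole chunk has to be fetched and decompressed even though only about a tenth of its values are kept.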

c = g["tas"]
ct = c[Ti=At(Date("2018-08-01"):Date("2050-08-01"))]
384×192×11689 YAXArray{Float32,3} with dimensions: 
  Dim{:lon} Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} Float64[-89.28422753251364, -88.35700351866494, …, 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  Ti Sampled{DateTime} DateTime[2018-08-01T00:00:00, …, 2050-08-01T00:00:00] ForwardOrdered Irregular Points
units: K
name: tas
Total size: 3.21 GB
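
As a sanity check on the two sizes mentioned in this thread, the arithmetic can be reproduced from the array shape printed above. This is only a sketch: the helper `array_size_bytes` is hypothetical, and the count of 1169 time steps for the strided selection is derived from taking every 10th of the 11689 steps rather than read from the store.

```python
BYTES_PER_FLOAT32 = 4

def array_size_bytes(shape, itemsize=BYTES_PER_FLOAT32):
    """Uncompressed in-memory size of a dense array with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize

# Shape from the YAXArray summary above: lon x lat x time.
full = array_size_bytes((384, 192, 11689))

# Every 10th time step of the 11689 steps (derived, not read from the store).
n_strided = len(range(0, 11689, 10))
strided = array_size_bytes((384, 192, n_strided))

print(f"full selection:    {full / 2**30:.2f} GiB")
print(f"strided selection: {strided / 2**20:.1f} MiB")
```

Both figures only match the reported 3.21 GB and 328 MB when computed in binary units (GiB and MiB), and both describe uncompressed Float32 data in memory, not the number of bytes transferred over the network.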

@Balinus
Contributor

Balinus commented Jan 12, 2024

Note that I tried the same approach in Python, and it seems to behave similarly.

(In Python, I selected the whole time series between 2018 and 2050 for simplicity.)

import xarray as xr
import zarr

file = 'gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/'
ds = xr.open_dataset(file, engine='zarr')

c = ds.tas
ct = c.sel(time=slice("2018-08-01", "2050-08-01"))
%time ct.values

CPU times: user 3min 19s, sys: 1min 29s, total: 4min 49s
Wall time: 21min 58s
Out[12]:
array([[[216.41226, 216.48257, 216.44742, ..., 216.32828, 216.38297,
         216.40054],
