Feature request: Save YAXArray or Dataset into a Zarr group #348

Open
danlooo opened this issue Oct 25, 2023 · 2 comments

danlooo commented Oct 25, 2023

Multiple Datasets in the Common Data Model V4 can be stored in the same file.
They are organized in (nested) groups, analogous to files in directories and subdirectories.

For example, xarray.Dataset.to_zarr has the option group to specify the path inside the Zarr store in which the dataset should be stored.
Similarly, zarr.hierarchy.group has the option path to specify the (group) path. The prototype xarray-datatree (also part of the xarray roadmap) uses this to represent a tree of Datasets as its own type. I think this is already implemented in Zarr.jl via the name option of the function Zarr.zcreate.

This is particularly important when storing data cubes of different spatio-temporal resolutions in the same store. It'd be great to have an additional group option for the functions savedataset and savecube.
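
To make the directory analogy concrete, here is a minimal sketch of what two groups could look like on disk in a Zarr v2 directory store. It uses only the Python standard library and writes metadata files by hand; it is not how xarray, Zarr.jl or YAXArrays do it. The store name pyramid.zarr, the group names high_res and low_res, the array name data, and the helper functions are all hypothetical and chosen only for illustration. No chunk data is written.

```python
import json
import os
import tempfile
from pathlib import Path


def make_group(path: Path) -> Path:
    """Mark a directory as a Zarr v2 group by writing a .zgroup file."""
    path.mkdir(parents=True, exist_ok=True)
    (path / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    return path


def make_array_metadata(path: Path, shape, chunks) -> Path:
    """Write only the .zarray metadata of an uncompressed float64 array."""
    path.mkdir(parents=True, exist_ok=True)
    meta = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(chunks),
        "dtype": "<f8",
        "compressor": None,
        "fill_value": None,
        "order": "C",
        "filters": None,
    }
    (path / ".zarray").write_text(json.dumps(meta))
    return path


def list_groups(root: Path):
    """Return the paths of all groups relative to the store root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if ".zgroup" in filenames:
            found.append(Path(dirpath).relative_to(root).as_posix())
    return sorted(found)


# Hypothetical store with one group per resolution.
root = Path(tempfile.mkdtemp()) / "pyramid.zarr"
make_group(root)
high = make_group(root / "high_res")
low = make_group(root / "low_res")
make_array_metadata(high / "data", (10, 10, 3), (10, 10, 3))
make_array_metadata(low / "data", (5, 5, 3), (5, 5, 3))
print(list_groups(root))
```

Each resolution gets its own group and therefore its own set of axes, so arrays of different shapes never have to share a grid. A group option on savedataset and savecube would presumably select such a sub-path inside the store.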

@danlooo danlooo changed the title Feature request: Save YAXArray into a Zarr group Feature request: Save YAXArray or Dataset into a Zarr group Oct 25, 2023
@lazarusA

> data cubes of different spatio-temporal resolutions

https://juliadatacubes.github.io/YAXArrays.jl/dev/examples/generated/UserGuide/creating/#creating-a-dataset

Isn't this case already covered? You can always pass a bunch of YAXArrays with different dimensions into a Dataset that can be saved as a .zarr file, or?

danlooo commented Oct 27, 2023

Datasets are meant to store multiple variables sampled over the same grid, which is defined by their shared axes. However, the axes (e.g. the spatial ones) of cubes with different resolutions are not the same. Trying this:

using YAXArrays
using Zarr
high_res_cube = YAXArray(rand(10, 10, 3))
low_res_cube = YAXArray(rand(5, 5, 3))
ds = Dataset(high_res = high_res_cube, low_res = low_res_cube)
savedataset(ds; path = "foo.zarr", driver=:zarr)

also returns an error when saving the dataset to disk:

ERROR: ArgumentError: Can not construct YAXArray, supplied data size is (10, 10, 3) while axis lenghts are (5, 5, 3)
Stacktrace:
  [1] YAXArray(axes::Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, data::ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, properties::Dict{String, Any}, chunks::DiskArrays.GridChunks{3}, cleaner::Vector{YAXArrays.Cubes.CleanMe})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:110
  [2] #YAXArray#5
    @ ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:129 [inlined]
  [3] collectfromhandle(e::NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}, dshandle::YAXArrayBase.ZarrDataset, cleaner::Vector{YAXArrays.Cubes.CleanMe})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:403
  [4] #102
    @ ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564 [inlined]
  [5] iterate
    @ ./generator.jl:47 [inlined]
  [6] collect_to!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, offs::Int64, st::Int64)
    @ Base ./array.jl:840
  [7] collect_to_with_first!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, v1::YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, st::Int64)
    @ Base ./array.jl:818
  [8] _collect(c::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
    @ Base ./array.jl:812
  [9] collect_similar(cont::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}})
    @ Base ./array.jl:711
 [10] map(f::Function, A::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}})
    @ Base ./abstractarray.jl:3261
 [11] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, backend::Symbol, driver::Symbol, max_cache::Float64, writefac::Float64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564
 [12] top-level scope
    @ REPL[20]:1
