Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting calendar of an Array or Dataset? #357

Closed
Balinus opened this issue Jan 10, 2024 · 5 comments
Closed

Converting calendar of an Array or Dataset? #357

Balinus opened this issue Jan 10, 2024 · 5 comments

Comments

@Balinus
Copy link
Contributor

Balinus commented Jan 10, 2024

Hello!

I'd like to convert a given dataset from a "regular" calendar to a DateTimeNoLeap calendar (and dropping all 29th Feb data along the way).

Is there a way to do it? I tried to find a solution, but they all involved loading the data in-memory, checking datetime vector, dropping the indexes associated with Feb 29th and then rebuilding a Dataset. I'd like it to be lazy because most of the datasets are in the 20GB range.

Thanks for any pointer!

@felixcremer
Copy link
Member

You can use the Not selector with a vector of the leap days either with an inner At or Near selector.

julia> winter = r1[Ti=Not(At([DateTime(1995, 4,16),DateTime(1995,10,16,12)]))]
julia> typeof(winter.data)
DiskArrays.SubDiskArray{Union{Missing, Float32}, 3}

You have to be careful to match the Date or DateTime type with the type of your time axis.
This subsetting will give you a lazy SubDiskArray which handles the subsetting via the metadata without having to load the data to memory.

@Balinus
Copy link
Contributor Author

Balinus commented Jan 10, 2024

Thanks! I was about to write a potentiel solution that seems to work no matter the type of Date/DateTime

ref_tasmax[Time=Dates.monthday.(lookup(ref_tasmax, :Ti)) .!= [(2,29)]]

Seems to be lazy and worked with dataset with regular Datetime and DateTimeNoLeap calendars. Then, I will need to replace the axis with a new type (e.g. from DateTime to DateTimeNoLeap). I guess using the rebuild function?

Does this code make sense?

EDIT
Here's what I got, seems to work, not sure about lazyness, but it seems right when I look at @time ouputs.

using YAXArrays
using CFTime

function drop29thfeb(ds)
    
    ds_subset = ds[Time=Dates.monthday.(lookup(ds, :Ti)) .!= [(2,29)]]
    
    date_vec = lookup(ds_subset, :Ti)
    # New time vector
    datevec_noleap = CFTime.reinterpret.(DateTimeNoLeap, date_vec)
    
    axlist = (    
        Dim{:lon}(lookup(ds_subset,:lon)),
        Dim{:lat}(lookup(ds_subset,:lat)),  
        Dim{:Time}(datevec_noleap),
        )
    
    return YAXArray(axlist, ds_subset.data)
    
end

@felixcremer
Copy link
Member

That looks good.
You could replace your subsetting with a Where selector like this:

ds[Ti=Where(x-> Dates.monthday(x) != (2,29))]

but I am not sure whether there is much of a difference.
I think there might also a way to only rebuild the time axis and leave the other axes as is.
But this is some DimensionalData function that I would also need to lookup.
You might want to not rebuild the lon and lat axis like that but use `dims(ds_subset, :lon) instead this way you get the whole axis with all of the correct reversings and so on.

Could you add this example to the docs. That would be really nice.

@Balinus
Copy link
Contributor Author

Balinus commented Jan 12, 2024

That looks good. You could replace your subsetting with a Where selector like this:

ds[Ti=Where(x-> Dates.monthday(x) != (2,29))]

but I am not sure whether there is much of a difference. I think there might also a way to only rebuild the time axis and leave the other axes as is. But this is some DimensionalData function that I would also need to lookup. You might want to not rebuild the lon and lat axis like that but use `dims(ds_subset, :lon) instead this way you get the whole axis with all of the correct reversings and so on.

Could you add this example to the docs. That would be really nice.

Thanks! Yes, I will include it. Somewhere specific in the documentation?

@Balinus Balinus closed this as completed Jan 12, 2024
@Balinus
Copy link
Contributor Author

Balinus commented Jan 12, 2024

Also, I'm unsure how the "Time" dimension (Ti?) is handled. It seems to be a "special" dimensions and when I rebuild the time dimension now behaves as a regular dimension. Sometimes, the time dimension is accessible with Ti, sometimes not.

For example:

using Zarr, YAXArrays, Dates, DimensionalData

store = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"
g = open_dataset(zopen(store, consolidated=true))

YAXArray Dataset
Shared Axes: 
()
Variables: 
height, 
tas
 with dimensions: 
  Dim{:lon} Sampled{Float64} 0.0:0.9375:359.0625 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} Float64[-89.28422753251364, -88.35700351866494, , 88.35700351866494, 89.28422753251364] ForwardOrdered Irregular Points,
  Ti Sampled{DateTime} DateTime[2015-01-01T03:00:00, , 2101-01-01T00:00:00] ForwardOrdered Irregular Points
Properties: Dict{String, Any}("initialization_index" => 1, "realm" => "atmos", "variable_id" => "tas", "external_variables" => "areacella", "branch_time_in_child" => 60265.0, "data_specs_version" => "01.00.30", "history" => "2019-07-21T06:26:13Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards.", "forcing_index" => 1, "parent_variant_label" => "r1i1p1f1", "table_id" => "3hr")

The printed dataset suggest using Ti. But:

g.Ti
ArgumentError: Ti not found in Dataset

Stacktrace:
 [1] getindex(x::Dataset, i::Symbol)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:144
 [2] getproperty(x::Dataset, k::Symbol)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:141
 [3] top-level scope
   @ In[6]:1

while using time works:

g.time
Ti DateTime[2015-01-01T03:00:00, , 2101-01-01T00:00:00]

However, using the lookup method, the inverse is found:

lookup(Cube(g), :Ti)
Sampled{DateTime} ForwardOrdered Irregular DimensionalData.Dimensions.LookupArrays.Points
wrapping: 251288-element Vector{DateTime}:
 2015-01-01T03:00:00
 2015-01-01T06:00:00
 2015-01-01T09:00:00
 2015-01-01T12:00:00
  
 2100-12-31T15:00:00
 2100-12-31T18:00:00
 2100-12-31T21:00:00
 2101-01-01T00:00:00

lookup(Cube(g), :time)
MethodError: no method matching lookup(::Nothing)

Closest candidates are:
  lookup(::Union{Val{<:DimensionalData.Dimensions.Dimension}, Type{<:DimensionalData.Dimensions.Dimension}})
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:176
  lookup(::DimensionalData.Dimensions.AnonDim)
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:383
  lookup(::DimensionalData.Dimensions.Dimension{<:AbstractArray})
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:175
  ...


Stacktrace:
 [1] lookup(ds::Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}, I::Symbol)
   @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/R7veM/src/Dimensions/dimension.jl:233
 [2] lookup(A::YAXArray{Float32, 3, ZArray{Float32, 3, Zarr.BloscCompressor, GCStore}, Tuple{Dim{:lon, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, Vector{Float64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Ti{DimensionalData.Dimensions.LookupArrays.Sampled{DateTime, Vector{DateTime}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, args::Symbol)
   @ DimensionalData ~/.julia/packages/DimensionalData/R7veM/src/array/array.jl:62
 [3] top-level scope
   @ In[16]:1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants