64-bit offset NetCDF and unlimited dimensions #231

Closed
Alexander-Barth opened this issue Oct 10, 2023 · 8 comments
Comments

@Alexander-Barth
Member

Alexander-Barth commented Oct 10, 2023

This code, using a 64-bit offset NetCDF file and an unlimited dimension, fails with the current master:

using NCDatasets
using Test
sz = (4,5,3)
filename = tempname()

NCDataset(filename,"c",format=:netcdf3_64bit_offset) do ds
    ds.dim["lon"] = sz[1]
    ds.dim["lat"] = sz[2]
    ds.dim["time"] = Inf

    T = Float32
    v = defVar(ds,"var-$T",T,("lon","lat","time"))
    j = 1
    v[:,:,j] = fill(T(j), sz[1:2])
end

This is the stack-trace:

julia> include("/home/abarth/.julia/dev/NCDatasets/test/test_variable_unlim2.jl");
ERROR: LoadError: ArgumentError: Chunk sizes must be strictly positive
Stacktrace:
  [1] RegularChunks
    @ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:26 [inlined]
  [2] #24
    @ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:133 [inlined]
  [3] map (repeats 3 times)
    @ ./tuple.jl:318 [inlined]
  [4] DiskArrays.GridChunks(a::Tuple{Int64, Int64, Int64}, chunksize::Tuple{Int64, Int64, Int64}; offset::Tuple{Int64, Int64, Int64})                              
    @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:132
  [5] GridChunks
    @ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:131 [inlined]
  [6] #GridChunks#18
    @ ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130 [inlined]
  [7] DiskArrays.GridChunks(a::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, chunksize::Tuple{Int64, Int64, Int64})                                                  
    @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130
  [8] eachchunk(v::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}})
    @ NCDatasets ~/.julia/dev/NCDatasets/src/variable.jl:407
  [9] _writeblock!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Array{Float32, 3}, ::Base.OneTo{Int64}, ::Vararg{AbstractVector})                            
    @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:187
 [10] writeblock!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Array{Float32, 3}, ::Base.OneTo{Int64}, ::Vararg{AbstractVector})                            
    @ NCDatasets ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:213
 [11] setindex_disk!(::NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, ::Matrix{Float32}, ::Function, ::Vararg{Any})                                                   
    @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:57
 [12] setindex!
    @ ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:229 [inlined]
 [13] setindex!(::CommonDataModel.CFVariable{Float32, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Vararg{Nothing, 5}}}}, ::Matrix{Float32}, ::Colon, ::Colon, ::Int64)                                                                                  
    @ CommonDataModel ~/.julia/dev/CommonDataModel/src/cfvariable.jl:419
 [14] (::var"#29#30")(ds::NCDataset{Nothing})
    @ Main ~/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:18
 [15] NCDataset(::var"#29#30", ::String, ::Vararg{String}; kwargs::Base.Pairs{Symbol, Symbol, Tuple{Symbol}, NamedTuple{(:format,), Tuple{Symbol}}})                 
    @ NCDatasets ~/.julia/dev/NCDatasets/src/dataset.jl:255
 [16] top-level scope
    @ ~/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:7
 [17] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [18] top-level scope
    @ REPL[36]:1
in expression starting at /home/abarth/.julia/dev/NCDatasets/test/test_variable_unlim2.jl:7

I am using DiskArrays 0.3.22. Note that 64-bit offset NetCDF files are always unchunked. The error does not appear with NetCDF4 files or with NCDatasets 0.12.

@tcarion or @rafaqz, do you have any ideas?

My tentative reproduction with NetCDF.jl fails with a different error:

import NetCDF
fname2 = tempname()
time_dim = NetCDF.NcDim("time",Int[],unlimited=true);
lon_dim = NetCDF.NcDim("lon",1:3);
v = NetCDF.NcVar("obs",[lon_dim,time_dim]);
NetCDF.create(fname2,v; mode=NetCDF.NC_64BIT_OFFSET) do nc
    nc["obs"][:,1:3] = rand(3,3)
end;

which produces "Not a valid data type or _FillValue type mismatch".

@rafaqz
Member

rafaqz commented Oct 10, 2023

Hmm

@meggart recently added a check to avoid broken chunks (zero or negative size)

https://github.com/meggart/DiskArrays.jl/blob/7d297f8b4a0067f7c93cb84e8c2d42b88754e54e/src/chunks.jl#L26

This check now errors explicitly instead of allowing an object with broken chunks. It is good to have that guarantee in most cases, except yours, where you are trying to break the chunks!

I have warned that the chunks would break if you used setindex! like this ;)

But I don't know what to do besides removing the safety check and letting you have broken chunks. I do have a simple solution that will take a few hours to implement, but you know all about that...

@Alexander-Barth
Member Author

Alexander-Barth commented Oct 10, 2023

I think the problem in this issue is not that difficult.
For a NetCDF 3 file, the NetCDF library always reports a variable as unchunked, and the chunk size defaults to the initial array size (here, for example, (4,5,0)). But technically, for variables with unlimited dimensions, the data is actually chunked in NetCDF3 (see e.g. https://www.unidata.ucar.edu/software/netcdf/workshops/2007/performance/FileFormat.html).
Simply replacing any zero in the chunk size by one (in NCDatasets) gives the actual chunk size on disk and may solve this issue.

See: https://github.com/Alexander-Barth/NCDatasets.jl/pull/232/files
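The zero-replacement idea above can be sketched as follows (a minimal illustration, not the actual NCDatasets implementation; the function name is hypothetical):

```julia
# Hypothetical sketch of the proposed fix: for a NetCDF3 variable with an
# unlimited dimension, the reported chunk size contains a zero along the
# unlimited dimension (e.g. (4, 5, 0)). Replacing each zero by one yields
# the effective on-disk chunk size: one record per chunk.
fix_chunksize(chunksize) = map(s -> max(s, 1), chunksize)

fix_chunksize((4, 5, 0))  # returns (4, 5, 1)
```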

@rafaqz
Member

rafaqz commented Oct 10, 2023

Well, it seems slightly hacky, but if it works!

@Alexander-Barth
Member Author

Yes, it seems quite hacky. I am wondering whether the NetCDF library is actually correct to report variables with unlimited dimensions as contiguous while they are saved in a non-contiguous manner on disk.

Ref:
Unidata/netcdf-c#2224

@rafaqz
Member

rafaqz commented Oct 12, 2023

Is it possible to check for unlimited variables manually when you load a file, and effectively ignore what netcdf reports?

@Alexander-Barth
Member Author

Yes, this is certainly possible, but it could cause some confusion. Maybe NCDatasets.chunking should just report what nc_inq_var_chunking says, but for DiskArrays (and NCDatasets.getchunksize) we would use the chunk size that corresponds to the contiguous data blocks on file (for performance).
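That split could look roughly like this (a sketch under assumptions: `storage` stands in for the result of nc_inq_var_chunking, and `effective_chunksize` is a hypothetical name, not part of the NCDatasets API):

```julia
# Sketch of deriving the DiskArrays-facing chunk size from what the NetCDF
# library reports. A chunking-style query would return `storage` and
# `reported` unchanged; the DiskArrays side would replace zeros along
# unlimited dimensions, matching the contiguous record blocks on disk.
function effective_chunksize(storage::Symbol, reported::NTuple{N,Int}) where {N}
    storage == :chunked && return reported
    # A :contiguous NetCDF3 variable with an unlimited dimension is stored
    # one record (slice along the unlimited dimension) at a time.
    return map(s -> max(s, 1), reported)
end

effective_chunksize(:contiguous, (4, 5, 0))  # returns (4, 5, 1)
effective_chunksize(:chunked, (2, 2, 1))     # returns (2, 2, 1)
```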

@rafaqz
Member

rafaqz commented Oct 12, 2023

Yes, that makes sense: the effective chunk pattern is different from the official chunk pattern.

Alexander-Barth added a commit that referenced this issue Oct 26, 2023
NetCDF3 files (64-bit offset NetCDF,...) and unlimited dimensions (issue #231)
@Alexander-Barth
Member Author

Resolved in a0ca027 .
