views of variables don't use the DiskArrays interface #274

Open
tiemvanderdeure opened this issue Feb 6, 2025 · 2 comments
Comments

@tiemvanderdeure

Calling view on a Variable returns a SubVariable from CommonDataModel, which doesn't implement the DiskArray interface.

This unfortunately means that any chunked operation (such as lazy raster operations) is extremely slow, about two orders of magnitude slower in the timings below, as discussed in rafaqz/Rasters.jl#889

For example:

using Rasters, NCDatasets
ras = Rasters.create("ras1.nc", Float64, (X(1:100), Y(1:100)); force = true, chunks = (10, 10))
A = read(ras)
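open(ras; write = true) do dest
    @time dest .= A               # in-memory array -> disk variable
    @time view(dest, :,:) .= A    # same write, but through a view
    @time dest .= dest            # disk -> disk, copied chunk by chunk
    return
end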
  0.002006 seconds (10.72 k allocations: 560.047 KiB)
  0.237490 seconds (1.14 M allocations: 41.352 MiB, 3.64% gc time)
  0.228919 seconds (1.15 M allocations: 41.738 MiB, 3.57% gc time)

The last operation is slow because copying a DiskArray to a DiskArray proceeds chunk by chunk, and view is called internally for each chunk, so it hits the same slow path as the explicit view in the second operation.

Two possible ways forward are to implement (parts of) the DiskArray interface for SubVariable, or to return a SubDiskArray from view on Variable. Arguably, NCDatasets violates the DiskArray interface here.
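
To check which type view returns on a given installation, something like the following minimal sketch may help. It only prints the types of the variable and of its view; the file path is a hypothetical scratch file, and the result depends on the installed NCDatasets and CommonDataModel versions, so no particular output is assumed here.

```julia
using NCDatasets

# Hypothetical scratch file; any writable path works.
path = tempname() * ".nc"

# Create a small chunked variable, then record the types of the
# variable itself and of a full view on it.
var_type, view_type = NCDataset(path, "c") do ds
    defVar(ds, "temp", rand(100, 100), ("lon", "lat"); chunksizes = [10, 10])
    v = variable(ds, "temp")
    sub = view(v, :, :)
    (typeof(v), typeof(sub))
end

println("variable: ", var_type)
println("view:     ", view_type)
```

If the view type printed is a SubVariable from CommonDataModel, chunked operations on it go through the slow path described above.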

@tiemvanderdeure
Author

Just to demonstrate one possible fix: removing these view methods resolves the problem entirely.
After

Base.delete_method.(methods(view, (NCDatasets.CommonDataModel.AbstractVariable, Colon)))

The very same code as above gives

  0.002933 seconds (10.72 k allocations: 560.047 KiB)
  0.004322 seconds (21.53 k allocations: 982.578 KiB)
  0.008718 seconds (33.40 k allocations: 1.483 MiB)

@tiemvanderdeure
Author

Here is an example using only NCDatasets, which might make it easier to see what is going on:

using NCDatasets
NCDataset("test_file.nc","c") do ds
    defVar(ds,"temp",rand(100,100),("lon","lat");
        chunksizes = [10,10]
    )
    @time variable(ds, "temp") .+= 1
end;

Before deleting the view methods: 0.781584 seconds (1.15 M allocations: 41.739 MiB, 1.31% gc time)
After deleting the view methods: 0.007317 seconds (33.43 k allocations: 1.484 MiB)
