writing a lazy raster doesn't use chunks #889
Comments
I guess there isn't always a super efficient way to do this since the raster to be read and the raster to be written might have different chunks. What if one is chunked in rows and the other one in columns?
Yeah, that's the hard problem. And what if one is a view with an offset? The garbage collection is another problem, but mostly because your chunks are waaay too small. Try 256 * 256. Reusing memory blobs would make more sense inside a broadcast as we know it's not threaded. But this is likely a simple bug where dispatch isn't working somewhere. What happens with a tiff?
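To make the rows-versus-columns case concrete, here is a small sketch in plain Julia that counts how many source chunks each destination chunk overlaps. It uses no Rasters or DiskArrays code; `chunk_ranges` and `overlapping_chunks` are hypothetical helpers made up for this illustration, and the 100 x 100 size with bands of 10 is an assumed toy layout.

```julia
# Hypothetical helper: the index ranges that tile one axis of length n
# with chunks of length c.
chunk_ranges(n, c) = [i:min(i + c - 1, n) for i in 1:c:n]

# Count how many source chunks overlap one destination chunk (2D).
# Two chunks overlap only if their ranges intersect on every axis.
function overlapping_chunks(dest_chunk, src_chunks)
    count(src_chunks) do src
        all(!isempty(intersect(d, s)) for (d, s) in zip(dest_chunk, src))
    end
end

n = 100
# Assumed layout: source chunked in bands of rows (10 x 100),
# destination chunked in bands of columns (100 x 10).
src_chunks  = [(r, c) for r in chunk_ranges(n, 10),  c in chunk_ranges(n, 100)]
dest_chunks = [(r, c) for r in chunk_ranges(n, 100), c in chunk_ranges(n, 10)]

# Every destination chunk crosses all 10 source chunks.
reads_per_dest_chunk = [overlapping_chunks(d, src_chunks) for d in dest_chunks]
total_reads = sum(reads_per_dest_chunk)
```

With this layout a chunk-by-chunk write has to touch every source chunk for each destination chunk, so `total_reads` is 100 rather than the 10 reads an aligned layout would need.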
I found out DiskArrays has a function called
This was just a toy example to not have to wait for minutes if an attempted fix doesn't work
But otherwise I think this might be a CommonDataModel or NCDatasets problem. To compare:
ras1 = Raster("ras1.nc"; lazy = true, missingval = missing)
ras2 = Raster("ras1.nc"; lazy = true)
open(ras1; write = true) do dest
    @show typeof(parent(dest))
    @time dest .= dest
end;
open(ras2; write = true) do dest
    @show typeof(parent(dest))
    @time dest .= dest
end;
So I think open is flattening the parent into a Variable, which is then indexed one by one. I'm not sure if we should change it so that it's always wrapped in a FileArray, or if this is just a bug in NCDatasets.
Maybe make a DiskArrays.jl issue about It can't be wrapped in a (welcome to hell btw)
Okay, I think I figured it out! Actually Just to illustrate this:
using Rasters, NCDatasets
ras = Rasters.create("ras1.nc", Float64, (X(1:100), Y(1:100)); force = true, chunks = (10, 10))
A = read(ras)
open(ras; write = true) do dest
    @time view(dest, :, :) .= A
    @time dest .= A
    @time dest .= dest
    return
end
The last one is slow because DiskArrays loops over the chunks if you copy data from a chunked raster. So yeah, basically a bug in NCDatasets. I don't know what the best fix is, though.
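For anyone following along, the cost difference between one write per index and one write per chunk can be sketched with a toy in-memory store that just counts write calls. This is not the DiskArrays.jl or NCDatasets API; `CountingStore`, `write_block!`, `copy_elementwise!` and `copy_chunkwise!` are hypothetical names invented for the illustration, and the 100 x 100 array with 10 x 10 chunks mirrors the toy example above.

```julia
# Toy stand-in for a chunked on-disk array. NOT the DiskArrays.jl API.
mutable struct CountingStore
    data::Matrix{Float64}
    chunksize::Tuple{Int,Int}
    nwrites::Int   # number of simulated disk write calls
end
CountingStore(sz, chunksize) = CountingStore(zeros(sz), chunksize, 0)

# Simulated write of a rectangular block: one "disk call" regardless of size.
function write_block!(s::CountingStore, block, rows, cols)
    s.data[rows, cols] .= block
    s.nwrites += 1
    return s
end

# Element-by-element copy: one write call per index.
function copy_elementwise!(s::CountingStore, A)
    for j in axes(A, 2), i in axes(A, 1)
        write_block!(s, A[i, j], i:i, j:j)
    end
    return s
end

# Chunk-wise copy: one write call per chunk of the store.
function copy_chunkwise!(s::CountingStore, A)
    cr, cc = s.chunksize
    for j in 1:cc:size(A, 2), i in 1:cr:size(A, 1)
        rows = i:min(i + cr - 1, size(A, 1))
        cols = j:min(j + cc - 1, size(A, 2))
        write_block!(s, view(A, rows, cols), rows, cols)
    end
    return s
end

A = rand(100, 100)
slow = copy_elementwise!(CountingStore(size(A), (10, 10)), A)
fast = copy_chunkwise!(CountingStore(size(A), (10, 10)), A)
(slow.nwrites, fast.nwrites)   # (10000, 100)
```

Both copies produce the same data, but the element-wise path makes 10000 write calls where the chunk-wise path makes 100. On a real file each call also carries I/O and allocation overhead, so the gap in wall time may look different from this count.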
Thanks
I've actually spoken to @Alexander-Barth about this bug in person. Alexander, did you try the solution from the DiskArrays code we discussed in Lisbon? (the fix is to remove the SubArray thing in CommonDataModel/NCDatasets wherever it is and use But @tiemvanderdeure if you also make an NCDatasets issue for it that would help make it explicit. We need more people pushing for this interop to actually work, I'm getting tired
An immediate solution is to always wrap with
MWE:
is slow!
0.284892 seconds (1.14 M allocations: 41.458 MiB)
I checked and it does call setindex_disk! for every index instead of once per chunk. YAXArrays does manage to do this chunk-wise (although it's extremely slow because of a GC.gc()).
Something like this is similarly slow