P2P Rechunking graph prohibits culling and partial release #8326
Comments
In theory, detecting such split points should be pretty easy to do by walking through the input and output chunks of each individual axis and checking where the chunking matches up. If an axis contains unknown sizes, it must be unchanged, so that's even the best case for us.
To be clear, there are also slightly less trivial cases, e.g. original = `(1, 50, 1)`, new = `(5, 10, -1)`, which is... but I still would expect this to break apart into many decoupled P2Ps.
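As a minimal sketch of that boundary walk (the helper and its name are mine, not part of dask): compute the cumulative chunk boundaries of the old and new chunking along one axis and keep the offsets that appear in both; at each shared offset the rechunk along that axis decouples.

```python
from itertools import accumulate

def split_points(old_chunks, new_chunks):
    # Hypothetical helper, not part of dask: offsets along one axis at which
    # both chunkings share a boundary.  At each such offset the rechunk along
    # this axis decouples into independent pieces.
    old_bounds = set(accumulate(old_chunks))
    new_bounds = set(accumulate(new_chunks))
    return sorted(old_bounds & new_bounds)

# Axis rechunked from ten chunks of 10 to four chunks of 25: the boundaries
# at 50 and 100 are shared (the final boundary is always shared), so the
# axis splits into two independent halves.
print(split_points((10,) * 10, (25,) * 4))  # [50, 100]
```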
I have not taken a proper look at the ERA5 code. Some considerations, though:
```python
import dask
import dask.array as da

shape = (200, 200)
# 101 chunks per axis: 1, 2, 2, ..., 2, 1
before = (1,) + (2,) * 99 + (1,)
before = (before, before)
# Target: uniform chunks of 2 along both axes
after = 2

arr = da.random.random(shape, chunks=before)

for method in ["p2p", "tasks"]:
    dask.config.set({"array.rechunk.method": method})
    rechunked = arr.rechunk(after)

    # Compare optimized graph sizes for the full rechunk and for slices
    # taken before and after the rechunk.
    coll = arr
    print(method, "original:", len(dask.optimize(coll)[0].__dask_graph__()))
    coll = rechunked
    print(method, "rechunk:", len(dask.optimize(coll)[0].__dask_graph__()))
    coll = rechunked[:, :4]
    print(method, "rechunk->slice:", len(dask.optimize(coll)[0].__dask_graph__()))
    coll = arr[:, :4].rechunk(after)
    print(method, "slice->rechunk:", len(dask.optimize(coll)[0].__dask_graph__()))
```
It depends. I expect P2P to be faster than the ordinary split-tasks approach once we're through with optimizing, although I agree that it is not as powerful here as for more complex problems. Besides, I'm only showing the simplest example. I frequently encounter examples that exhibit an all-to-all pattern for some of the dimensions of the array while there is just a simple concat/split for the other dimensions. This is very common for xarray workloads, and I believe this kind of split could help.
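A hedged illustration of that mixed pattern, with made-up shapes and chunk sizes: one axis keeps its chunking while the other is merged into a single chunk, so the all-to-all coupling is confined to one dimension.

```python
import dask.array as da

# Made-up shapes: axis 0 keeps its chunking, axis 1 is merged into a single
# chunk.  The all-to-all coupling is confined to axis 1, so each row of input
# chunks only feeds the one output chunk in the same row and could, in
# principle, get its own small P2P transfer.
arr = da.random.random((1000, 1000), chunks=(100, 10))
rechunked = arr.rechunk((100, 1000))
```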
If you don't have a barrier but n**2 edges, you'd have the same problem.
I'm not sure yet how bad it is, but having to maintain more edges in the scheduler means larger sets on all tasks, which requires more memory. It also makes the
This is actually a slightly different problem. While it could manage to do the slicing, everything else wouldn't work (e.g. partial release of data stored to disk, or starting reducing tasks earlier). Besides, we haven't even started with dask-expr for arrays yet, and I suspect it will still be a little while until we get there.
I think this is a really good use case as it forces us to reason in generalized terms.
Just chiming in to support this. In Earth Science workloads, we basically have two chunking schemes (ignoring variables with a vertical dimension), and it is very common for users to want to rechunk from one to the other, as in this workload.
I added this simply to get "reasonable chunk sizes"; I would expect it to be a very common approach.
The P2P rechunk graph is, similar to the shuffle graph, one large, globally connected graph. However, many rechunk applications only operate on slices of the data and often do not connect all input and output partitions but only planes in the dataset.
Having only slices/planes connected by the graph is beneficial since it allows us to cull the graph, start downstream reducers earlier, and release disk space sooner.
A real-life example showing this effect uses the public ERA5 dataset (see code below), but a simpler variant can be constructed using artificial data.
ERA5 example
A simpler toy example showing the same principle just concatenates along a single dimension.
For P2P to do the same, we should, instead of generating one large P2P layer, generate many small P2P layers so that only the relevant slices are connected.
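A user-level sketch of the idea, not the proposed scheduler-side change (the helper name is made up): cut the array at offsets where the old and new chunking already share a boundary, rechunk each piece on its own so that each piece gets its own small P2P layer, and concatenate the results.

```python
import dask.array as da

def rechunk_in_pieces(arr, new_chunks, boundaries, axis=0):
    # Hypothetical helper: rechunk each slice between shared boundaries
    # separately, so slicing or culling the result only touches the small
    # P2P layers it actually needs.
    pieces = []
    start = 0
    for stop in boundaries:
        index = [slice(None)] * arr.ndim
        index[axis] = slice(start, stop)
        pieces.append(arr[tuple(index)].rechunk(new_chunks))
        start = stop
    return da.concatenate(pieces, axis=axis)

# Example with boundaries that both chunkings share along axis 0.
arr = da.random.random((100, 100), chunks=(10, 10))
result = rechunk_in_pieces(arr, 25, boundaries=[50, 100], axis=0)
```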