Dagger has previously been a purely-functional model, expecting that new values are calculated by applying a function to a set of existing values, without allowing for arguments to be mutated or changed in any way. This model is convenient from an optimization perspective (as we can do aggressive caching and duplication of data, memoization of results, etc.), but it's also quite limiting for users operating in an imperative language like Julia.
I've long wanted a way to relax this model, and in the course of working on #454, I've found that the simple In/Out model of OpenMP's data dependency system (which spawn_datadeps implements) is quite powerful: it allows for a very convenient programming interface while still enabling a variety of performance optimizations. It also seems to leave room for more fine-grained synchronization and mutation systems going forward.
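For readers who haven't used it, here is a minimal sketch of what the In/Out model looks like with spawn_datadeps. It assumes a Dagger version that ships spawn_datadeps and exports In, Out, and InOut; the helper add! and the array names are made up for illustration.

```julia
using Dagger

# Hypothetical helper that mutates its first argument in place
add!(X, Y) = (X .+= Y; nothing)

A  = rand(1000)
A0 = copy(A)          # kept only so the result can be checked afterwards
B  = rand(1000)
C  = zeros(1000)

Dagger.spawn_datadeps() do
    # Reads and writes A, only reads B
    Dagger.@spawn add!(InOut(A), In(B))
    # Reads A and overwrites C, so it has to run after the task above
    Dagger.@spawn copyto!(Out(C), In(A))
end

# The region waits for its tasks, so the mutations are visible here
println(A ≈ A0 .+ B && C == A)
```

The second task is ordered after the first purely because of the annotations on A; nothing in the code waits on a task result explicitly.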
Of course, spawn_datadeps is rather limiting if we want to optimize over a larger region of code, including nested datadeps regions, which aren't trivial to support with the current implementation. Therefore, I'd like to consider what would be required to move support for the same programming model and optimizations directly into Dagger, to see what that would look like.
Here is the basic set of high-level changes that we'll need to support this model:
In/Out unwrapping and propagation in the frontend
Data ownership tracking (exclusive (write) vs. shared (read)) in the scheduler
Awareness of processor-associated memory spaces and their aliasing properties
copyto! alternative which integrates with Dagger's synchronization (required for GPU support)
Relaxed semantics w.r.t writing data into the original user-provided data containers (lazy writing allows for reduced data transfers and better parallelism), similar to Tapir's sync regions
(Speculative) Data copying and versioning system in the scheduler, to make copies of data for reading and track which copies are up-to-date w.r.t last write
A plan for how to deal with non-Chunk arguments, i.e. what we'll need to do to ensure we can still track them appropriately and enable distributed parallelism
A system for utilizing Julia's compiler to determine dependencies automatically, when unspecified
A system for specifying dependencies ahead-of-time for a given dispatch signature
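To make the ownership tracking item more concrete, here is a rough sketch in plain Julia of how exclusive (write) vs. shared (read) accesses could be turned into task dependencies. Everything in it (Access, Ownership, derive_deps) is hypothetical and only illustrates the idea; it is not how Dagger's scheduler is or would be implemented, and it ignores memory spaces, aliasing, and data copies entirely.

```julia
# Hypothetical types for illustration only; not Dagger internals.
struct Access
    data::Any       # the object being accessed
    writes::Bool    # true = exclusive (Out/InOut), false = shared (In)
end

mutable struct Ownership
    last_writer::Union{Int,Nothing}  # id of the most recent writing task
    readers::Vector{Int}             # ids of tasks that read since that write
end

# For each task (in submission order), compute the ids of the tasks it must wait on.
function derive_deps(tasks::Vector{Vector{Access}})
    state = IdDict{Any,Ownership}()  # keyed by object identity
    deps = [Int[] for _ in tasks]
    for (tid, accesses) in enumerate(tasks)
        for acc in accesses
            if !haskey(state, acc.data)
                state[acc.data] = Ownership(nothing, Int[])
            end
            own = state[acc.data]
            # Every access waits on the last writer (read-after-write, write-after-write)
            own.last_writer === nothing || push!(deps[tid], own.last_writer)
            if acc.writes
                # A writer also waits on all readers since the last write (write-after-read)
                append!(deps[tid], own.readers)
                own.last_writer = tid
                empty!(own.readers)
            else
                push!(own.readers, tid)
            end
        end
        unique!(sort!(deps[tid]))
        filter!(!=(tid), deps[tid])  # a task never depends on itself
    end
    return deps
end

A = zeros(4)
B = ones(4)
tasks = [
    [Access(A, true)],                    # task 1: writes A
    [Access(A, false)],                   # task 2: reads A
    [Access(A, false), Access(B, true)],  # task 3: reads A, writes B
    [Access(A, true), Access(B, false)],  # task 4: writes A, reads B
]
deps = derive_deps(tasks)
println(deps)
```

In this sketch tasks 2 and 3 both depend only on task 1, so they are free to run concurrently as shared readers of A, while task 4 has to wait for all three before it can take exclusive ownership of A.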