-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize remote array copies by creating a local copy of the domain when possible (off by default) #24391
Conversation
After hitting chapel-lang#13213 this afternoon and talking to Jade about it, I started wondering aloud whether it might be very easy to implement an optimization that would localize the domain of an array variable being created from scratch if the original was a remote, const domain. This quick sketch suggests maybe it is, where some remaining TODOs are: - [ ] disable the optimization if the domain is already privatized - [ ] make sure that the array element copy is done in bulk - [ ] figure out what's going on with the (incorrect?) compiler warnings - [ ] make sure it doesn't break other tests --- Signed-off-by: Brad Chamberlain <[email protected]>
… check Yet... From a user's perspective, it sure seems like '.domain.locale' should be OK without generating a warning... Need to add a TODO to look into that. --- Signed-off-by: Brad Chamberlain <[email protected]>
modules/internal/ChapelArray.chpl
Outdated
var lhs = chpl__coerceCopy(rhs.type, rhs, definedConst); | ||
return lhs; | ||
writeln("In initCopy(definedConst=", definedConst, ")", "domain definedConst: ", rhs.domain.definedConst); | ||
if rhs.domain.definedConst && rhs.domain._value.locale != here { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case it helps save some time: I have bumped into different cases where .locale
ended up to be a performance-killer. It has a very complex implementation including some dynamic dispatch if I am not remembering wrong. If you do time measurements or this is intended to optimize quick-running idioms, you might want to use _wide_get_locale
primitive. I am doing that here: https://github.com/chapel-lang/chapel/pull/24390/files#diff-52d2477eb48a33d9e81b64d64d74a50145a1bacdd880825e267b5bd5549559afR2481-R2486. I think we should consider:
- Improving
.locale
/here
performance (there's discussion about this under some issues, but I can't find anything quickly. Note thathere
can also hurt performance) - Having some helpers like
isLocal
at least internally for cases like that where we just use primitives for fast computation.
Interestingly, I ran back into this need while working on sparse matrix-matrix multiplication algorithms this past week. Specifically, in copying a remote locale's sparse array, I wanted to localize the domain as well for efficient access and realized that some minor tweaks to this PR would be a start in the right direction (though more work is required to get it working for sparse arrays…). |
…alizeConstDomains
…pt itself --- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
… locales --- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
This turns out to fall out naturally due to the check for |
By our definition of the language, if an array is constant, it implies that its domain can't be modified while the array is in scope; so taking a snapshot of the domain is completely reasonable as an optimization. --- Signed-off-by: Brad Chamberlain <[email protected]>
--- Signed-off-by: Brad Chamberlain <[email protected]>
Capturing this PR's OP here: After hitting #13213 this afternoon and talking to Jade about it, I started wondering aloud whether it might be very easy to implement an optimization that would localize the domain of an array variable being created from scratch if the original was a remote, const domain. This quick sketch suggests maybe it is, where some remaining TODOs are:
Since this is going in off-by-default, I've forked the items in this list that seem important to check before turning it on into #25238 |
--- Signed-off-by: Brad Chamberlain <[email protected]>
const localize = (localizeConstDomains && | ||
numLocales > 1 && | ||
(definedConst || rhs.domain.definedConst) && | ||
rhs.domain._value.locale != here); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to add compiledForSingleLocale()
here as well? This is a param and could avoid a branch when COMM=none. Arguably not important for now and you could probably just log it as future work in #25238
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. We could even potentially have both param and non-param aspects of this const
split apart and to have the conditional be something like:
if paramLocalize && localize {
…
} else {
…
}
s.t. if paramLocalize
was false, we'd fold the conditional? I like the idea of forking this off to its own PR, primarily to keep working on things that aren't done yet and adding it to the follow-on issue if you're good with that (though of all the things on that issue, I may do it first—possibly even tomorrow if time permits).
[reviewed by @e-kayrakli] This PR adds the `doiBulkTransfer[To|From]Known()` routines to the two sparse layouts we currently support in order to enable sparse arrays to be initialized using patterns like `var copy = mySparseArray;` where, previously, this resulted in a complaint about being unable to zip sparse arrays due to falling into the most general implementation of array-array transfers. Here, I'm adding some tests of this capability using the domain localization PR that I merged today (#24391) to prove that sparse arrays can be localized and accessed without incurring communication. I've turned on comm counts to track how many transfers are required, where I expect we could significantly reduce this over time by adding support for bulk transfer of the sparse domains themselves in the future (?). Interestingly, one of the two cases that ought to cause my domain localization optimization to fire doesn't currently, suggesting that we don't set up definedConst properly for sparse subdomains; happily, the case I care about (const array, non-const domain) does work properly, so here I've checked in the tests to show that the other case doesn't work—another area for future improvement and a pre-existing issue not related to this effort.
This is a code I've been writing, looking at sparse matrix-matrix multiplication in Chapel to start to get more experience with our sparse features, look for gaps, check performance, etc. So far, the effort has resulted in improvements in the forms of PRs #25152, #25244, #24391, and #25243. This is a work-in-progress that almost certainly needs additional improvements in terms of performance, generality, ergonomics, but it's a good start and in a state to start exercising some of the aforementioned PRs. See the README for additional details.
[reviewed by @jabraham17] This enables the localization-of-domains optimization by default, which was added in #24391. We've been running this in the multi-locale performance playground for the past month and haven't seen any degradations due to the change, and are aware of the benefits when it fires, so are enabling it by default. This PR simply flips the existing flag and removes the explicit setting of it in tests that were designed to make sure it was working as expected.
This PR adds an optimization that localizes an array's domain when a copy of the array is made across locales. For example, given a pattern like:
B will typically share A's domain so that if D is reassigned (resized), B will reflect that. This has the downside that operations on 'A' may need to refer back to 'D' in order to get its size, bounds, etc.
However, if
D
is declaredconst
, it cannot be re-assigned, so rather than referring back to the original domain, a local copy of D could be created and B could be declared over that, permitting operations on 'B' to be done without referring to a remote locale. Effectively, the compiler could translate the code into:Similarly, if 'B' is
const
, the same optimization can be done because we say that the domain of aconst
array cannot be re-assigned while the array is alive.This PR implements this optimization in module code and adds a few tests of it. It also updates one existing test that locked in the old behavior, by testing it with and without the optimization.
The optimization is currently off by default because we are close to the 2.1 deadline and want some more time to live with it and understand the implications before merging. That said, it passed a full paratest GASNet run. I've forked off #25238 to capture the intention to enable this by default and some of the things we'd discussed checking before doing so.
Thanks to @jabraham17 for the conversation that led to the insight that this might be a simple optimization (it was) after years of having people hit this issue and be surprised by it (see issue #13213, e.g.). And to @e-kayrakli and others on the @chapel-lang/perf-team for feedback on the initial draft.