Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize remote array copies by creating a local copy of the domain when possible (off by default) #24391

Merged
merged 10 commits into from
Jun 12, 2024

Conversation

bradcray
Copy link
Member

@bradcray bradcray commented Feb 10, 2024

This PR adds an optimization that localizes an array's domain when a copy of the array is made across locales. For example, given a pattern like:

var D = {1..100, 1..100};
var A: [D] real;
on Locales[1] {
  var B = A;
  B[i] = …;
}

B will typically share A's domain so that if D is reassigned (resized), B will reflect that. This has the downside that operations on 'A' may need to refer back to 'D' in order to get its size, bounds, etc.

However, if D is declared const, it cannot be re-assigned, so rather than referring back to the original domain, a local copy of D could be created and B could be declared over that, permitting operations on 'B' to be done without referring to a remote locale. Effectively, the compiler could translate the code into:

const D = {1..100, 1..100};
var A: [D] real;
on Locales[1] {
  const localD = D;
  var B:[D] A.eltType = A;
  B[i] = …;
}

Similarly, if 'B' is const, the same optimization can be done because we say that the domain of a const array cannot be re-assigned while the array is alive.

This PR implements this optimization in module code and adds a few tests of it. It also updates one existing test that locked in the old behavior, by testing it with and without the optimization.

The optimization is currently off by default because we are close to the 2.1 deadline and want some more time to live with it and understand the implications before merging. That said, it passed a full paratest GASNet run. I've forked off #25238 to capture the intention to enable this by default and some of the things we'd discussed checking before doing so.

Thanks to @jabraham17 for the conversation that led to the insight that this might be a simple optimization (it was) after years of having people hit this issue and be surprised by it (see issue #13213, e.g.). And to @e-kayrakli and others on the @chapel-lang/perf-team for feedback on the initial draft.

After hitting chapel-lang#13213 this afternoon and talking to Jade about it, I
started wondering aloud whether it might be very easy to implement an
optimization that would localize the domain of an array variable being
created from scratch if the original was a remote, const domain.  This
quick sketch suggests maybe it is, where some remaining TODOs are:

- [ ] disable the optimization if the domain is already privatized
- [ ] make sure that the array element copy is done in bulk
- [ ] figure out what's going on with the (incorrect?) compiler warnings
- [ ] make sure it doesn't break other tests

---
Signed-off-by: Brad Chamberlain <[email protected]>
… check

Yet...  From a user's perspective, it sure seems like '.domain.locale'
should be OK without generating a warning...  Need to add a TODO to
look into that.

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray bradcray changed the title Draft optimization to localize const domains Draft optimization to localize const domains in an inferred array type copy setting Feb 10, 2024
var lhs = chpl__coerceCopy(rhs.type, rhs, definedConst);
return lhs;
writeln("In initCopy(definedConst=", definedConst, ")", "domain definedConst: ", rhs.domain.definedConst);
if rhs.domain.definedConst && rhs.domain._value.locale != here {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case it helps save some time: I have bumped into different cases where .locale ended up to be a performance-killer. It has a very complex implementation including some dynamic dispatch if I am not remembering wrong. If you do time measurements or this is intended to optimize quick-running idioms, you might want to use _wide_get_locale primitive. I am doing that here: https://github.com/chapel-lang/chapel/pull/24390/files#diff-52d2477eb48a33d9e81b64d64d74a50145a1bacdd880825e267b5bd5549559afR2481-R2486. I think we should consider:

  1. Improving .locale/here performance (there's discussion about this under some issues, but I can't find anything quickly. Note that here can also hurt performance)
  2. Having some helpers like isLocal at least internally for cases like that where we just use primitives for fast computation.

@bradcray
Copy link
Member Author

Interestingly, I ran back into this need while working on sparse matrix-matrix multiplication algorithms this past week. Specifically, in copying a remote locale's sparse array, I wanted to localize the domain as well for efficient access and realized that some minor tweaks to this PR would be a start in the right direction (though more work is required to get it working for sparse arrays…).

@bradcray
Copy link
Member Author

disable the optimization if the domain is already privatized

This turns out to fall out naturally due to the check for _value, which will use the local class by default.

bradcray added 2 commits June 11, 2024 20:22
By our definition of the language, if an array is constant, it implies
that its domain can't be modified while the array is in scope; so
taking a snapshot of the domain is completely reasonable as an
optimization.

---
Signed-off-by: Brad Chamberlain <[email protected]>
@bradcray bradcray changed the title Draft optimization to localize const domains in an inferred array type copy setting Optimization that creates a local copy of a remote array's domain when that's OK (off by default) Jun 12, 2024
@bradcray
Copy link
Member Author

Capturing this PR's OP here:

After hitting #13213 this afternoon and talking to Jade about it, I started wondering aloud whether it might be very easy to implement an optimization that would localize the domain of an array variable being created from scratch if the original was a remote, const domain. This quick sketch suggests maybe it is, where some remaining TODOs are:

  • disable the optimization if the domain is already privatized
  • make sure that the array element copy is done in bulk
  • figure out what's going on with the (incorrect?) compiler warnings
  • make sure it doesn't break other tests
  • file an issue + future about myArr.domain.locale generating a warning? Is the answer wrong too? (seemed like it wasn't...)
  • impact on:
    • sparse
    • associative
  • creating copy would thwart "identical domain" check—does that matter?
    • impact on swap, say?
    • impact on array-of-arrays?
  • which ops go through domain with --fast?
    • and what fields could we re-store in the array to avoid comm
  • relationship with Engin's notion of optimizing arrays with anonymous domains to avoid reference counting statically

Since this is going in off-by-default, I've forked the items in this list that seem important to check before turning it on into #25238

@bradcray bradcray marked this pull request as ready for review June 12, 2024 22:42
@bradcray bradcray changed the title Optimization that creates a local copy of a remote array's domain when that's OK (off by default) Optimize remote array copies by creating a local copy of the domain when possible (off by default) Jun 12, 2024
Comment on lines +3035 to +3038
const localize = (localizeConstDomains &&
numLocales > 1 &&
(definedConst || rhs.domain.definedConst) &&
rhs.domain._value.locale != here);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add compiledForSingleLocale() here as well? This is a param and could avoid a branch when COMM=none. Arguably not important for now and you could probably just log it as future work in #25238

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. We could even potentially have both param and non-param aspects of this const split apart and to have the conditional be something like:

if paramLocalize && localize {
  …
} else {
  …
}

s.t. if paramLocalize was false, we'd fold the conditional? I like the idea of forking this off to its own PR, primarily to keep working on things that aren't done yet and adding it to the follow-on issue if you're good with that (though of all the things on that issue, I may do it first—possibly even tomorrow if time permits).

@bradcray bradcray merged commit ed7a3e5 into chapel-lang:main Jun 12, 2024
7 checks passed
@bradcray bradcray deleted the localizeConstDomains branch June 12, 2024 23:55
bradcray added a commit that referenced this pull request Jun 13, 2024
[reviewed by @e-kayrakli]

This PR adds the `doiBulkTransfer[To|From]Known()` routines to the two
sparse layouts we currently support in order to enable sparse arrays to
be initialized using patterns like `var copy = mySparseArray;` where,
previously, this resulted in a complaint about being unable to zip
sparse arrays due to falling into the most general implementation of
array-array transfers.

Here, I'm adding some tests of this capability using the domain
localization PR that I merged today (#24391) to prove that sparse arrays
can be localized and accessed without incurring communication. I've
turned on comm counts to track how many transfers are required, where I
expect we could significantly reduce this over time by adding support
for bulk transfer of the sparse domains themselves in the future (?).

Interestingly, one of the two cases that ought to cause my domain
localization optimization to fire doesn't currently, suggesting that we
don't set up definedConst properly for sparse subdomains; happily, the
case I care about (const array, non-const domain) does work properly, so
here I've checked in the tests to show that the other case doesn't
work—another area for future improvement and a pre-existing issue not
related to this effort.
bradcray added a commit that referenced this pull request Jun 14, 2024
This is a code I've been writing, looking at sparse matrix-matrix
multiplication in Chapel to start to get more experience with our sparse
features, look for gaps, check performance, etc. So far, the effort has
resulted in improvements in the forms of PRs #25152, #25244, #24391, and
#25243. This is a work-in-progress that almost certainly needs
additional improvements in terms of performance, generality, ergonomics,
but it's a good start and in a state to start exercising some of the
aforementioned PRs. See the README for additional details.
bradcray added a commit that referenced this pull request Aug 23, 2024
[reviewed by @jabraham17]

This enables the localization-of-domains optimization by default, which
was added in #24391. We've been running this in the multi-locale
performance playground for the past month and haven't seen any
degradations due to the change, and are aware of the benefits when it
fires, so are enabling it by default.

This PR simply flips the existing flag and removes the explicit setting
of it in tests that were designed to make sure it was working as
expected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants