-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Part of #1867: String GroupBy optimizations are timing out nightly tests #1869
Part of #1867: String GroupBy optimizations are timing out nightly tests #1869
Conversation
…nightly tests This PR (part of Bears-R-Us#1867): - Changes the medium str logic (`8 <= max_bytes < 16`) to not use the array return variation of `computeOnSegments` - Updates `bytesToUintArr` to run in parallel If the nightly still has issues, I think the next step is to remove the special logic for medium str until the next iteration
} | ||
return concat; | ||
var localSlice = new lowLevelLocalizingSlice(values, rng); | ||
return | reduce [i in 0..#rng.size] (localSlice.ptr(i):uint)<<(8*(rng.size-1-i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this should be equivalent to a more explicit forall, something like
var concat: uint;
forall i in 0..#rng.size with (| reduce concat) {
concat |= (localSlice.ptr(i):uint)<<(8*(rng.size-1-i));
}
return concat;
@ronawho @bradcray, I was wondering if there's any performance benefit between the two? I feel like the one line version could save on memory because (i don't think) we would need shadow variables?
In this particular the example the loop should be small, so probably not a big deal, but it be nice to know for future reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't feel like the localizing slice should be necessary since we're calling using computeOnSegments
, so things should be local? But I know a lot of the other computeOnSegments functions call either interpretAsString
or interpretAsBytes
which do a localizing slice, so it seemed worth a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pierce314159 : Sorry for the delayed response, I was out most of last week. I think the difference between your new code in 1441-1442, and your more explicit forall in the comment above should be a wash. Specifically, Chapel typically rewrites op reduce operand-expr
expressions into the equivalent forall
loop with shadow variables, so as long as the expression driving the forall
loop and the operand-expr
are the same, you should feel free to use whichever style you prefer. The shadow variables in cases like this are very modest in size (a scalar per task), so should not be a big deal.
I'm not terribly familiar with lowLevelLocalizingSlice()
, so also found myself wondering whether this could just be written as something like: return | reduce [i in 0..<rng.size] (values[i]:uint)<<(8*(rng.size-1-i));
without much loss in performance by eliminating the need for the slice. Or, if you're correct that all indices i
are local, values.localAccess(i)
instead of values[i]
. This paragraph is very much a shot in the dark, though, and if the performance is good enough and the style doesn't bother you, definitely ignorable.
proc twoReturnComputeOnSegments(segments: [?D] int, values: [?vD] ?t, param function: SegFunction, type retType) throws { | ||
// I hate duplicating the code from computeOnSegments but I need to return 2 arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't immediately see any issues with this code but it's def the most complicated/part i understand the least, so it seemed like the natural choice to revert considering the timeout issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to fix the issue and is now completing in multi locale tests:
And also seems to be scaling:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have not done a thorough review, but seems to fix the timeout
This PR (part of #1867):
8 <= max_bytes < 16
) to not use the 2 array return variation ofcomputeOnSegments
bytesToUintArr
to run in parallelIf the nightly still has issues, I think the next step is to remove the special logic for medium str until the next iteration