Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

stress-tess · 2022-10-27T19:32:53Z

This PR (part of #1867):

Changes the medium str logic (8 <= max_bytes < 16) to not use the 2 array return variation of computeOnSegments
Updates bytesToUintArr to run in parallel

If the nightly still has issues, I think the next step is to remove the special logic for medium str until the next iteration

…nightly tests This PR (part of Bears-R-Us#1867): - Changes the medium str logic (`8 <= max_bytes < 16`) to not use the array return variation of `computeOnSegments` - Updates `bytesToUintArr` to run in parallel If the nightly still has issues, I think the next step is to remove the special logic for medium str until the next iteration

stress-tess · 2022-10-27T19:36:37Z

src/SegmentedString.chpl

-      }
-      return concat;
+      var localSlice = new lowLevelLocalizingSlice(values, rng);
+      return | reduce [i in 0..#rng.size] (localSlice.ptr(i):uint)<<(8*(rng.size-1-i));


I know this should be equivalent to a more explicit forall, something like

var concat: uint; forall i in 0..#rng.size with (| reduce concat) { concat |= (localSlice.ptr(i):uint)<<(8*(rng.size-1-i)); } return concat;

@ronawho @bradcray, I was wondering if there's any performance benefit between the two? I feel like the one line version could save on memory because (i don't think) we would need shadow variables?

In this particular the example the loop should be small, so probably not a big deal, but it be nice to know for future reference

Also, I don't feel like the localizing slice should be necessary since we're calling using computeOnSegments, so things should be local? But I know a lot of the other computeOnSegments functions call either interpretAsString or interpretAsBytes which do a localizing slice, so it seemed worth a try

@pierce314159 : Sorry for the delayed response, I was out most of last week. I think the difference between your new code in 1441-1442, and your more explicit forall in the comment above should be a wash. Specifically, Chapel typically rewrites op reduce operand-expr expressions into the equivalent forall loop with shadow variables, so as long as the expression driving the forall loop and the operand-expr are the same, you should feel free to use whichever style you prefer. The shadow variables in cases like this are very modest in size (a scalar per task), so should not be a big deal.

I'm not terribly familiar with lowLevelLocalizingSlice(), so also found myself wondering whether this could just be written as something like: return | reduce [i in 0..<rng.size] (values[i]:uint)<<(8*(rng.size-1-i)); without much loss in performance by eliminating the need for the slice. Or, if you're correct that all indices i are local, values.localAccess(i) instead of values[i]. This paragraph is very much a shot in the dark, though, and if the performance is good enough and the style doesn't bother you, definitely ignorable.

stress-tess · 2022-10-27T19:40:14Z

src/SegmentedComputation.chpl

-  proc twoReturnComputeOnSegments(segments: [?D] int, values: [?vD] ?t, param function: SegFunction, type retType) throws {
-    // I hate duplicating the code from computeOnSegments but I need to return 2 arrays


I don't immediately see any issues with this code but it's def the most complicated/part i understand the least, so it seemed like the natural choice to revert considering the timeout issue

Ethan-DeBandi99

This makes sense to me as a good starting point to fix the issues with the nightly. I am interested in the input of @ronawho and @bradcray which is why I am leaving this as a comment and not an approval overall.

bmcdonald3 · 2022-10-28T16:45:28Z

This appears to fix the issue and is now completing in multi locale tests:

array size = 100,000,000
number of trials =  1
numLocales = 16, N = 1,600,000,000
small str array Average time = 5.1369 sec
small str array Average rate = 1.1603 GiB/sec
medium str array Average time = 11.2357 sec
medium str array Average rate = 0.9284 GiB/sec
big str array Average time = 10.1455 sec
big str array Average rate = 1.9093 GiB/sec

And also seems to be scaling:

array size = 100,000,000
number of trials =  1
numLocales = 2, N = 200,000,000
small str array Average time = 6.7848 sec
small str array Average rate = 0.1098 GiB/sec
medium str array Average time = 7.3457 sec
medium str array Average rate = 0.1775 GiB/sec
big str array Average time = 6.6441 sec
big str array Average rate = 0.3645 GiB/sec

bmcdonald3

Have not done a thorough review, but seems to fix the timeout

stress-tess requested review from ronawho, Ethan-DeBandi99, reuster986 and mhmerrill October 27, 2022 19:32

stress-tess commented Oct 27, 2022

View reviewed changes

Ethan-DeBandi99 reviewed Oct 28, 2022

View reviewed changes

stress-tess requested review from bmcdonald3 and bradcray October 28, 2022 15:32

bmcdonald3 approved these changes Oct 28, 2022

View reviewed changes

Ethan-DeBandi99 merged commit fe3da03 into Bears-R-Us:master Oct 31, 2022

stress-tess deleted the 1867_str_groupby_optimization_timeout branch October 31, 2022 13:50

stress-tess mentioned this pull request Nov 1, 2022

String groupby optimizations are timing out nightly multi-locale #1867

Closed

stress-tess mentioned this pull request Dec 6, 2022

Investigate adding back twoReturnComputeOnSegments for small str groupby optimizations #1955

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

stress-tess commented Oct 27, 2022

stress-tess Oct 27, 2022 •

edited

Loading

stress-tess Oct 27, 2022

bradcray Oct 31, 2022

stress-tess Oct 27, 2022

Ethan-DeBandi99 left a comment

bmcdonald3 commented Oct 28, 2022

bmcdonald3 left a comment

		proc twoReturnComputeOnSegments(segments: [?D] int, values: [?vD] ?t, param function: SegFunction, type retType) throws {
		// I hate duplicating the code from computeOnSegments but I need to return 2 arrays

Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

Conversation

stress-tess commented Oct 27, 2022

stress-tess Oct 27, 2022 • edited Loading

Choose a reason for hiding this comment

stress-tess Oct 27, 2022

Choose a reason for hiding this comment

bradcray Oct 31, 2022

Choose a reason for hiding this comment

stress-tess Oct 27, 2022

Choose a reason for hiding this comment

Ethan-DeBandi99 left a comment

Choose a reason for hiding this comment

bmcdonald3 commented Oct 28, 2022

bmcdonald3 left a comment

Choose a reason for hiding this comment

stress-tess Oct 27, 2022 •

edited

Loading