
Part of #1867: String GroupBy optimizations are timing out nightly tests #1869

Merged
2 changes: 1 addition & 1 deletion benchmarks/small-str-groupby.py
@@ -5,7 +5,7 @@

import arkouda as ak

SIZES = {"small": 7, "medium": 14, "big": 28}
SIZES = {"small": 6, "medium": 12, "big": 24}


def time_ak_groupby(N_per_locale, trials, seed):
51 changes: 0 additions & 51 deletions src/SegmentedComputation.chpl
@@ -49,7 +49,6 @@ module SegmentedComputation {
StringIsUpper,
StringIsTitle,
StringBytesToUintArr,
StringBytesTo2UintArrs,
}

proc computeOnSegments(segments: [?D] int, values: [?vD] ?t, param function: SegFunction, type retType, const strArg: string = "") throws {
@@ -130,54 +129,4 @@
}
return res;
}

proc twoReturnComputeOnSegments(segments: [?D] int, values: [?vD] ?t, param function: SegFunction, type retType) throws {
// I hate duplicating the code from computeOnSegments but I need to return 2 arrays
Comment on lines -134 to -135
Member Author
I don't immediately see any issues with this code, but it's definitely the most complicated part and the one I understand the least, so it seemed like the natural choice to revert considering the timeout issue

  var res1: [D] retType;
  var res2: [D] retType;
  if D.size == 0 {
    return (res1, res2);
  }

  const (startSegInds, numSegs, lengths) = computeSegmentOwnership(segments, vD);

  // Start task parallelism
  coforall loc in Locales {
    on loc {
      const myFirstSegIdx = startSegInds[loc.id];
      const myNumSegs = max(0, numSegs[loc.id]);
      const mySegInds = {myFirstSegIdx..#myNumSegs};
      // Segment offsets whose bytes are owned by loc
      // Lengths of segments whose bytes are owned by loc
      var mySegs, myLens: [mySegInds] int;
      forall i in mySegInds with (var agg = new SrcAggregator(int)) {
        agg.copy(mySegs[i], segments[i]);
        agg.copy(myLens[i], lengths[i]);
      }
      try {
        // Apply function to bytes of each owned segment, aggregating return values to res1 and res2
        forall (start, len, i) in zip(mySegs, myLens, mySegInds) with (var agg1 = newDstAggregator(retType),
                                                                       var agg2 = newDstAggregator(retType)) {
          select function {
            when SegFunction.StringBytesTo2UintArrs {
              var half = (len/2):int;
              agg1.copy(res1[i], stringBytesToUintArr(values, start..#half));
              agg2.copy(res2[i], stringBytesToUintArr(values, (start+half)..#half));
            }
            otherwise {
              compilerError("Unrecognized segmented function");
            }
          }
        }
      } catch {
        throw new owned ErrorWithContext("Error computing %s on string or segment".format(function:string),
                                         getLineNumber(),
                                         getRoutineName(),
                                         getModuleName(),
                                         "IllegalArgumentError");
      }
    }
  }
  return (res1, res2);
}
}
20 changes: 13 additions & 7 deletions src/SegmentedString.chpl
@@ -481,7 +481,7 @@ module SegmentedString {
return computeOnSegments(offsets.a, values.a, SegFunction.StringIsTitle, bool);
}

proc bytesToUintArr(max_bytes:int, st) throws {
proc bytesToUintArr(const max_bytes:int, st) throws {
// bytes contained in strings < 128 bits, so concatenating is better than the hash
if max_bytes < 8 {
// we only need one uint array
@@ -492,7 +492,16 @@
}
else {
// we need two uint arrays
var (numeric1, numeric2) = twoReturnComputeOnSegments(offsets.a, values.a, SegFunction.StringBytesTo2UintArrs, uint);
ref off = offsets.a;
ref vals = values.a;
// should we do strings.getLengths()-1 to not account for null byte
const lens = getLengths();
var numeric1, numeric2: [offsets.aD] uint;
forall (o, l, n1, n2) in zip(off, lens, numeric1, numeric2) {
const half = (l/2):int;
n1 = stringBytesToUintArr(vals, o..#half);
n2 = stringBytesToUintArr(vals, (o+half)..#half);
}
const concat1Name = st.nextName();
const concat2Name = st.nextName();
st.addEntry(concat1Name, new shared SymEntry(numeric1));
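
For intuition on the two-array path above: each string's byte range is split in half, and each half is concatenated big-endian into its own uint, so two strings agree exactly when both packed values agree. Below is a small standalone sketch of that scheme, not code from this PR; packHalf is a made-up stand-in for stringBytesToUintArr and it assumes each half fits in 64 bits.

// Hypothetical sketch of the split-in-half packing used in the else branch above.
// packHalf mimics stringBytesToUintArr: concatenate the bytes big-endian.
proc packHalf(bytes: [] uint(8), rng: range): uint {
  var concat: uint;
  for i in rng do
    concat = (concat << 8) | bytes[i]:uint;
  return concat;
}

const s = "abcdefghij";               // 10 bytes, too long for the single-uint path
var bytes: [0..#s.numBytes] uint(8);
for i in 0..#s.numBytes do
  bytes[i] = s.byte(i);

const half = s.numBytes / 2;
const n1 = packHalf(bytes, 0..#half);      // packs "abcde"
const n2 = packHalf(bytes, half..#half);   // packs "fghij"
writeln((n1, n2));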
@@ -1429,11 +1438,8 @@
The SegFunction called by computeOnSegments for bytesToUintArr
*/
inline proc stringBytesToUintArr(values, rng) throws {
var concat: uint;
for v in values[rng] {
concat = concat<<8 + v:uint;
}
return concat;
var localSlice = new lowLevelLocalizingSlice(values, rng);
return | reduce [i in 0..#rng.size] (localSlice.ptr(i):uint)<<(8*(rng.size-1-i));
@stress-tess (Member, Author), Oct 27, 2022
I know this should be equivalent to a more explicit forall, something like

var concat: uint;
forall i in 0..#rng.size with (| reduce concat) {
  concat |= (localSlice.ptr(i):uint)<<(8*(rng.size-1-i));
}
return concat;

@ronawho @bradcray, I was wondering if there's any performance difference between the two? I feel like the one-line version could save on memory because (I don't think) we would need shadow variables?

In this particular example the loop should be small, so it's probably not a big deal, but it would be nice to know for future reference

Member Author
Also, I don't feel like the localizing slice should be necessary since we're calling this through computeOnSegments, so things should already be local? But I know a lot of the other computeOnSegments functions call either interpretAsString or interpretAsBytes, which do a localizing slice, so it seemed worth a try

Contributor
@pierce314159: Sorry for the delayed response, I was out most of last week. I think the difference between your new code in lines 1441-1442 and your more explicit forall in the comment above should be a wash. Specifically, Chapel typically rewrites op reduce operand-expr expressions into the equivalent forall loop with shadow variables, so as long as the expression driving the forall loop and the operand-expr are the same, you should feel free to use whichever style you prefer. The shadow variables in cases like this are very modest in size (a scalar per task), so should not be a big deal.

I'm not terribly familiar with lowLevelLocalizingSlice(), so I also found myself wondering whether this could just be written as something like: return | reduce [i in 0..<rng.size] (values[i]:uint)<<(8*(rng.size-1-i)); without much loss in performance by eliminating the need for the slice. Or, if you're correct that all indices i are local, values.localAccess(i) instead of values[i]. This paragraph is very much a shot in the dark, though, and if the performance is good enough and the style doesn't bother you, definitely ignorable.
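
For reference, here is a tiny standalone sketch (not Arkouda code) of the two styles from this thread, applied to a plain local array; per the explanation above, the reduce expression and the explicit forall with a reduce intent should boil down to essentially the same loop.

// One-line reduce expression vs. explicit forall with a reduce intent;
// both pack the bytes big-endian into a single uint.
const data: [0..#3] uint(8) = [0x61: uint(8), 0x62: uint(8), 0x63: uint(8)]; // "abc"
const n = data.size;

const packed1 = | reduce [i in 0..#n] (data[i]:uint) << (8*(n-1-i));

var packed2: uint;
forall i in 0..#n with (| reduce packed2) {
  packed2 |= (data[i]:uint) << (8*(n-1-i));
}

writeln(packed1 == packed2, " ", packed1);  // true 6382179 (0x616263)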

}

/* Test array of strings for membership in another array (set) of strings. Returns
3 changes: 2 additions & 1 deletion src/UniqueMsg.chpl
@@ -168,7 +168,8 @@ module UniqueMsg
// Only one array which is a strings
var (myNames, _) = namesList[0].splitMsgToTuple("+", 2);
var strings = getSegString(myNames, st);
var max_bytes = max reduce strings.getLengths();
// should we do strings.getLengths()-1 to not account for null
const max_bytes = max reduce strings.getLengths();
if max_bytes < 16 {
var str_names = strings.bytesToUintArr(max_bytes, st).split("+");
var (totalDigits, bitWidths, negs) = getNumDigitsNumericArrays(str_names, st);
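
For context, this is where the byte-packing optimization gets selected: combined with the max_bytes < 8 branch inside bytesToUintArr, the dispatch on the longest string (as returned by getLengths(), which seemingly still counts the null byte, hence the comment above) looks roughly like the standalone sketch below. chooseStrategy is a made-up name used purely for illustration, and the final branch assumes the pre-existing hash-based path is what runs when the packing optimization doesn't apply.

// Hypothetical summary of the dispatch implied by this diff:
// under 8 bytes -> one uint array, under 16 -> two, otherwise fall back to hashing.
proc chooseStrategy(maxBytes: int): string {
  if maxBytes < 8 {
    return "one uint array";     // bytesToUintArr, single concatenated uint per string
  } else if maxBytes < 16 {
    return "two uint arrays";    // bytesToUintArr, each string split in half
  } else {
    return "hash";               // presumably the existing hash-based path
  }
}

writeln(chooseStrategy(6), " ", chooseStrategy(12), " ", chooseStrategy(24));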