Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter-groupby interaction #1892

Merged
merged 12 commits into from
Jul 26, 2017
Merged

Filter-groupby interaction #1892

merged 12 commits into from
Jul 26, 2017

Conversation

rreusser
Copy link
Contributor

This PR adds a failing test for filter + groupby. The goal is to rework this into a PR that solves the issue. That may or may not be possible due to the way one is a calcTransform and the other is a transform. I'll dig in and see if I can figure out what it would take to fix this.

See: #1843

@rreusser
Copy link
Contributor Author

rreusser commented Jul 25, 2017

@alexcjohnson @etpinard I added some more challenging tests for transforms. The actual change to groupby was smaller than I expected (and deserves a close look).

I implemented recursive transform application to lock down the way transforms are recursively expanded and iterated, but I wasn't actually able to come up with a test case that it solves, so I removed it. It might be worth keeping in mind though since I suspect it might be a more robust/extensible way to lock down application of transforms.

@rreusser rreusser changed the title [DO NOT MERGE] Add failing test for filter-groupby interaction Filter-groupby interaction Jul 25, 2017
// destination. This function does not break the array reference
// connection between the split transforms it creates. That's handled in
// initialize, which creates a new empty array for each arrayAttr.
function cloneTransforms(newTrace) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting fix 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rreusser is that something we should do for all transforms?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh hah - of course, because the original Lib.extendDeepNoArrays({}, trace) doesn't dive into the transforms array. Do we need a Lib.extendDeepOnlyTheseArrays({}, trace, ['transforms']) ? 🙈

is that something we should do for all transforms?

This only applies to transforms that generate multiple traces, right? I do kind of like the idea of recursive transforms, but until we do something like that, that might be more inherently robust, I think we should manage this case-by-case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So… arrayAttrs does pick up on data_array arrays nested inside transforms. The particular challenge is executing a transform after another has created expanded traces. Because if the first transform splits it into multiple traces, then each expanded trace needs its own copy of the other transforms since they, for example, have different split target or groups data. By expanding deep with no arrays and then cloning the data arrays (which are in arrayAttrs, they all have their own copy that can be filtered appropriately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed a bit of subtlety. Let me check on the identity of the split transforms. Unless I'm mistaken, something somewhere is cloning those… Let me dig that up.

Copy link
Contributor Author

@rreusser rreusser Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etpinard I wondered the same. The downfall of transforms at the moment is that there are too many corner cases and different things transforms might need to do to lock them down to a completely airtight state such that they can only do what they need to do. The result is that there are some patterns like this that should not be a concern of individual transforms at all but which might need to be applied to them all separately. Or if a clear pattern arises, then it can be abstracted. At the moment, the sample size is just a bit to small for me to see overarching patterns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if a clear pattern arises, then it can be abstracted. At the moment, the sample size is just a bit to small for me to see overarching patterns.

Very good answer. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson Never mind. I read my code more carefully. I'm just manually transferring the transforms. 👍

if(groups[j] !== groupName) continue;

arrayAttrs.forEach(pasteArray.bind(0, newTrace, trace, j));
arrayAttrs.forEach(pasteArray.bind(null, newTrace, trace, j));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't new here - except that the change you made reminded me how bind always confuses me - but it occurs to me that there's no reason to use it at all if we unwind the forEach to a for. Also pasteArray is a bit inefficient in the way it calls Lib.nestedProperty(newTrace, a) twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🎉 🎉 Glad to change 😄

Copy link
Contributor Author

@rreusser rreusser Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexcjohnson @etpinard I've refactored… actually a fair amount of groupby. It was mostly trivial reordering and inlining in order to cut down on function calls and slice and concat operations, just to streamline things a bit function-call and memory-wise.

Oh, and also it was iterating over all points for each group, which is unnecessary. Now it just iterates over all points once and uses an index to pick the right expanded trace destination.

@rreusser
Copy link
Contributor Author

rreusser commented Jul 26, 2017

@monfera's groupby tests saved me here. Refactored, optimized, and fixed some test cases. I'm content. 👍


// Initialize empty arrays for the arrayAttrs, to be split in the next step
for(j = 0; j < arrayAttrs.length; j++) {
Lib.nestedProperty(newTrace, arrayAttrs[j]).set([]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - what you're doing here is pretty similar in fact to what I'm doing with the aggregate transform 🎉

I suspect you could speed it up even a bit more by hanging onto these arrays as you create them, and avoid the inner-loop Lib.nestedProperty calls below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was bugging me, though I know the effect is small in the grand scheme of things. You know exactly what you'll need, so I just indexed them directly and then looked them up by group index. It's still lots of nestedProperty instantiation, but unless we get fancy (read: fragile), I think this is somewhat minimal for the task at hand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


for(i = 0; i < groupNames.length; i++) {
groupName = groupNames[i];
groupIndex[groupName] = i;
Copy link
Contributor Author

@rreusser rreusser Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is tricky. It's actually the array index that I'm indexing. So it should be called indexIndex 😕

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexLookup. Better, maybe. A variable by any other name would smell… as… hmm… 🌹

@alexcjohnson
Copy link
Collaborator

Looks great! 💃

@rreusser
Copy link
Contributor Author

I'll defer to someone else on whether this is a bugfix or a feature. Since the output for interactions was simply incorrect/nonsensical before, I'd tend to consider this a bugfix.

@alexcjohnson alexcjohnson added the bug something broken label Jul 26, 2017
@rreusser rreusser merged commit 658a096 into master Jul 26, 2017
@rreusser rreusser deleted the filter-groupby-combo branch July 26, 2017 19:11
@etpinard
Copy link
Contributor

I'd tend to consider this a bugfix.

Yeah. This is a 🪲 fix for sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants