-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge identical transforms at forks #4029
Conversation
546be4c
to
bb4def7
Compare
d2e2cde
to
201f48b
Compare
@@ -70,16 +70,6 @@ | |||
"name": "data_3", | |||
"source": "data_1", | |||
"transform": [{"type": "filter", "expr": "datum.label !== 'PoleStar'"}] | |||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future PRs, we should try to see how we can make data_2 and data_3 merge for this example. (No need for this PR.)
src/compile/data/filter.ts
Outdated
@@ -22,4 +22,9 @@ export class FilterNode extends DataFlowNode { | |||
expr: this.expr | |||
}; | |||
} | |||
|
|||
public hash() { | |||
// FIX include something from model in hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I thought we agreed to decouple SelectionFilter from ExpressionFilter.
So we only merge ExpressionFilter, not SelectionFilter.
That said, the current code means that it will always be merged. The question is whether this is the right behavior. Maybe it's fine.
To test it, create an spec where the same selection is applied two layers and see if merging the results still produce correct output. (Maybe concat one unit spec with a layered spec then apply selection from the unit spec to both layers in the other spec.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, modify interactive_stocks_nearest_index to use timeUnit would demonstrate if this works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sad news is that adding timeUnit
to interactive_stocks_nearest_index
doesn't even work, but the diff that this PR is causing to that version seems reasonable.
Thus, I wonder if we should ignore this case for now.
@domoritz any thoughts?
In any case, @invokesus we better remove // FIX include something from model in hash
. I don't think we should add stuff from the model to the hash, but rather get rid of model from Filter #4084
@@ -161,6 +161,7 @@ export function optimizeDataflow(dataComponent: DataComponent) { | |||
|
|||
roots.forEach(moveFacetDown); | |||
roots.forEach(mergeParse); | |||
roots.forEach(optimizers.mergeIdenticalTransforms); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, why is the call pattern different from others (mergeParse isn't in optimizers?)
cc: @domoritz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mergeParse should be in optimizers. It's not super clear because in a way it's necessary for correctness as well.
LGTM, but I'll let @domoritz approve. |
97e3ac4
to
0b2f712
Compare
The coverage test is failing but I don't think it makes sense to write new tests for the concerned files. |
607f032
to
0b2f712
Compare
const hashes = transforms.map(x => x.hash()); | ||
const buckets = {}; | ||
for (let i = 0; i < hashes.length; i++) { | ||
if (buckets[hashes[i]] === undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buckets[hashes[i]] = (buckets[hashes[i]] || []).push(transforms[i])
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push
doesn't return the array only the element pushed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could use concat but whatever.
@@ -161,6 +161,7 @@ export function optimizeDataflow(dataComponent: DataComponent) { | |||
|
|||
roots.forEach(moveFacetDown); | |||
roots.forEach(mergeParse); | |||
roots.forEach(optimizers.mergeIdenticalTransforms); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to start from the leaves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think so. But I thought it would be better to have the top-down version working with the mergeParse
(which is also top-down) and refactor them both to bottom-up when I've got the multiple pass thing working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, just don't forget
src/compile/data/bin.ts
Outdated
@@ -144,6 +144,9 @@ export class BinNode extends DataFlowNode { | |||
return out; | |||
} | |||
|
|||
public hash() { | |||
return hash(this.bins); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could there be problems since the node type is not part of the hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll include it in the hash.
What's the best way though? Adding a nodeType
key?
Maybe something like
return hash({nodeType: "BinNode", ...this.bins});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just type + hash
src/compile/data/window.ts
Outdated
@@ -59,6 +59,10 @@ export class WindowTransformNode extends DataFlowNode { | |||
return windowFieldDef.as || vgField(windowFieldDef); | |||
} | |||
|
|||
public hash() { | |||
return hash(this.transform); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it makes sense to have tests for these hash functions so we can read them in the diff and see whether they look good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting tests like
i) hash(node) === hash(node.clone())
or
ii) hash(node)
=== '__hash_value__'
I see the benefit that test ii might provide, but the value will have to be updated each time we change things.
However, I think I should write some tests for the hash function itself with probable inputs in util.test.ts
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latter
0882e01
to
e6e6f03
Compare
src/compile/data/stack.ts
Outdated
@@ -191,6 +192,10 @@ export class StackNode extends DataFlowNode { | |||
}, {}); | |||
} | |||
|
|||
public hash() { | |||
return 'StackNode' + hash(this._stack); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stack ${hash(this._stack)}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
colon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't care but @kanitw suggested that we already use colons in the hash. It doesn't really matter so I'd say space is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks nicer in the test without colon lol -- not that it's a big deal.
ff19493
to
d16240b
Compare
d16240b
to
ea53045
Compare
07632f9
to
e7ebd7a
Compare
Let me know if I missed anything. |
Remove "Node" from hash |
608f261
to
eca0d12
Compare
🎉 |
No description provided.