fix: correct references to flattened fields in selection signals. #5351
Looks good. Are you adding flattening for fields that are only referenced in selections as well?
I'm not planning to, primarily due to time constraints (with my limited time, I'd rather push forward on code paths I'm already familiar with, and this PR felt like a principled fix rather than a hack/workaround).
Okay. Can you explain what you meant by "this PR felt like a principled fix rather than a hack/workaround"?
Sure -- I mean that the description/reason I gave when opening this PR feels principled to me in the sense that selection signals do not need to make any assumptions about what is or isn't happening to the
Just to clarify, if the user starts with this spec:
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"selection": {
"grid": {
"type": "multi", "fields": ["nested.b"]
}
},
"data": {
"values": [
{"nested":{ "a": "1", "b": 28}},
{"nested":{ "a": "2", "b": 55}},
{"nested":{ "a": "3", "b": 43}},
{"nested":{ "a": "4", "b": 91}},
{"nested":{ "a": "5", "b": 81}},
{"nested":{ "a": "6", "b": 53}},
{"nested":{ "a": "7", "b": 19}},
{"nested":{ "a": "8", "b": 87}},
{"nested":{ "a": "8", "b": 52}}
]
},
"mark": "point",
"encoding": {
"y": {
"field": "nested.a",
"type": "quantitative"
},
"x": {
"field": "nested.a",
"type": "quantitative"
},
"color": {
"condition": {"selection": "grid", "value": "red"}
}
}
}
And now they add a calculate transform to derive fields. In the spec below, the user has to know that they have to write
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"selection": {
"grid": {
"type": "multi", "fields": ["nested.bb"]
}
},
"data": {
"values": [
{"nested":{ "a": "1", "b": 28}},
{"nested":{ "a": "2", "b": 55}},
{"nested":{ "a": "3", "b": 43}},
{"nested":{ "a": "4", "b": 91}},
{"nested":{ "a": "5", "b": 81}},
{"nested":{ "a": "6", "b": 53}},
{"nested":{ "a": "7", "b": 19}},
{"nested":{ "a": "8", "b": 87}},
{"nested":{ "a": "8", "b": 52}}
]
},
"transform": [{
"calculate": "datum.nested.a", "as": "nested.aa"
}, {
"calculate": "datum.nested.b", "as": "nested.bb"
}],
"mark": "point",
"encoding": {
"y": {
"field": "nested.aa",
"type": "quantitative"
},
"x": {
"field": "nested.aa",
"type": "quantitative"
},
"color": {
"condition": {"selection": "grid", "value": "red"}
}
}
}
Thanks for that example, @domoritz! That helped me understand the full scope of flattening. And it turns out updating |
The second example in #5351 (comment) still doesn't work for me. Clicking the circles doesn't make them red.
If I replace
{
"name": "grid_tuple_fields",
"value": [{"type": "E", "field": "nested.bb"}]
},
with
{
"name": "grid_tuple_fields",
"value": [{"type": "E", "field": "nested\\.bb"}]
},
in the generated Vega, it works.
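The difference between the two field strings can be illustrated with a simplified path splitter. This is a hypothetical sketch, not Vega's actual implementation: it assumes that an unescaped dot separates nested property names, while `\.` stands for a literal dot inside a single property name. The names `splitPath` and `lookup` are invented for this example.

```typescript
// Hypothetical sketch: split a field string on unescaped dots only.
function splitPath(field: string): string[] {
  const parts: string[] = [];
  let current = "";
  for (let i = 0; i < field.length; i++) {
    const ch = field.charAt(i);
    if (ch === "\\" && i + 1 < field.length) {
      // An escaped character is taken literally (e.g. "\." becomes ".").
      current += field.charAt(i + 1);
      i++;
    } else if (ch === ".") {
      // An unescaped dot ends the current path segment.
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

// Walk the path through an object; return undefined if any step is missing.
function lookup(datum: unknown, field: string): unknown {
  let value: unknown = datum;
  for (const key of splitPath(field)) {
    if (value === null || typeof value !== "object") return undefined;
    value = (value as Record<string, unknown>)[key];
  }
  return value;
}

// A datum after the calculate transform: "nested.bb" is a flat, top-level key.
const datum = { nested: { a: "1", b: 28 }, "nested.bb": 28 };

const unescaped = lookup(datum, "nested.bb"); // resolves datum.nested.bb
const escaped = lookup(datum, "nested\\.bb"); // resolves datum["nested.bb"]
```

Under these assumptions, the unescaped string looks for a `bb` property inside `nested` and finds nothing, while the escaped string reads the flat key that the calculate transform created.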
src/compile/data/parse.ts
Outdated
@@ -322,6 +322,7 @@ export function parseData(model: Model): DataComponent {
  }

  head = ParseNode.makeImplicitFromEncoding(head, model, ancestorParse) || head;
I believe that we may create unnecessary flattening for fields that are already flattened in makeImplicitFromEncoding. I see three solutions. First, we could extract the code that computes implicit for encodings and selections and then call makeWithAncestors once for a single implicit dict. Second, makeImplicitFromEncoding could return its implicit, which we pass to makeImplicitFromSelection, and there we ignore fields that are already flattened. Third, you could merge makeImplicitFromEncoding and makeImplicitFromSelection so that they share the same implicit dict. I guess this is like the first approach but encapsulates the logic. I'd prefer the first as it's most explicit.
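The shared-dict idea behind these options can be sketched in isolation. This is a minimal illustration under stated assumptions, not Vega-Lite's code: `ParseDict`, `addImplicit`, and the `"flatten"` marker value are hypothetical stand-ins, and a field is treated as nested simply because its name contains a dot.

```typescript
// Hypothetical sketch: one dict collects every nested field that needs flattening.
type ParseDict = Record<string, string>;

// Record nested fields (names containing a dot), skipping ones already present.
function addImplicit(implicit: ParseDict, fields: string[]): ParseDict {
  for (const field of fields) {
    if (field.indexOf(".") !== -1 && !(field in implicit)) {
      implicit[field] = "flatten";
    }
  }
  return implicit;
}

// The same nested field is used by the x and y encodings and by the selection.
const encodingFields = ["nested.b", "nested.b"];
const selectionFields = ["nested.b"];

// Both sources write into a single dict, so each field is recorded once.
const implicit: ParseDict = {};
addImplicit(implicit, encodingFields);
addImplicit(implicit, selectionFields);

const flattened = Object.keys(implicit);
```

Because the dict is keyed by field name, a field referenced by both an encoding and a selection yields a single entry, which is the property the review comment is asking for.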
Thanks for the comment. Yes, I'd thought about this and tested it informally (e.g., using the spec below in which both makeImplicitFromEncoding and makeImplicitFromSelection are called). As far as I could tell, duplicate flattening does not occur. But I wasn't sure how to formally test this as a unit test (i.e., focused on just parsing/assembling the ParseNode). I'm happy to add a test case that parses the whole model, assembles the full dataflow, and tests what's in the transform array, but that doesn't feel like the right approach. Open to ideas/suggestions here.
Re: your suggested solutions, option (1) won't work because the existing fromEncoding logic expects TypedFieldDefs -- selections only hold field and channel names as strings to simplify assembly. I'm happy to merge into a single makeImplicitFromEncodingAndSelection such that only a single implicit object is populated and passed to makeWithAncestors if we prefer that. I chose the current approach because I thought that better separated concerns.
Let me know what you'd prefer.
{
"$schema": "https://vega.github.io/schema/vega-lite/v3.json",
"selection": {
"grid": {
"type": "multi", "fields": ["nested.b"]
}
},
"data": {
"values": [
{"nested":{ "a": "1", "b": 28}},
{"nested":{ "a": "2", "b": 55}},
{"nested":{ "a": "3", "b": 43}},
{"nested":{ "a": "4", "b": 91}},
{"nested":{ "a": "5", "b": 81}},
{"nested":{ "a": "6", "b": 53}},
{"nested":{ "a": "7", "b": 19}},
{"nested":{ "a": "8", "b": 87}},
{"nested":{ "a": "8", "b": 52}}
]
},
"mark": "point",
"encoding": {
"y": {
"field": "nested.b",
"type": "quantitative"
},
"x": {
"field": "nested.b",
"type": "quantitative"
},
"color": {
"condition": {"selection": "grid", "value": "red"}
}
}
}
does not yield any duplicate flattening (though it should, were it an issue I think?).
Which field would you expect to be flattened twice?
I can take care of avoiding duplicate flattening since I am more familiar with the dataflow stuff.
Sorry, I think we did a simultaneous update. In the updated spec above, the selection is projected over nested.b, which is also used in encoding. We would expect to see duplicates, but we don't. I think this was your original concern in #5351 (comment) but perhaps I'm misunderstanding?
Thanks, I'd forgotten to send a PR for updating the |
Is it necessary to update Vega, or should Vega-Lite generate a different spec as I suggested above?
We should update Vega to keep our logic consistent. Otherwise, we introduce more complexity of needing to remember when we're referring to flattened fields, and when we're referring to nested fields. I'd rather not need to remember how to coordinate that, and just reference flattened fields always.
Oh I just noticed that Vega-Lite outputs |
We have to remember already since when we access a |
That’s a nice solution! Thanks @domoritz!
Fields are only flattened if they participate in encoding rules. Thus, we continue to use bracket notation to access datum values in signal expressions, as we cannot guarantee that the flattened field will be present. However, the field name stored as part of a SelectionProjection will always use dot notation, as these names are either returned to us flattened by model.vgField or explicitly specified by users in the input spec (in which case, dot notation is their only recourse). Thus, when referencing the field via a selection's top-level signal, we directly access the flattened field name rather than unpacking it via bracket notation.
Fixes #5334.
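The two access styles described above can be sketched as expression-string builders. These helpers are hypothetical and written only for illustration (they are not Vega-Lite functions), and they assume the field name contains no escaped dots.

```typescript
// Hypothetical helper: bracket notation walks the nested object step by step,
// so the expression works even when no flattened field exists on the datum.
function bracketAccess(field: string, datum: string = "datum"): string {
  return datum + field.split(".").map(p => `[${JSON.stringify(p)}]`).join("");
}

// Hypothetical helper: flat access treats the whole dotted name as one key,
// which only works when the field has been flattened onto the datum.
function flatAccess(field: string, datum: string = "datum"): string {
  return `${datum}[${JSON.stringify(field)}]`;
}

const nestedExpr = bracketAccess("nested.b"); // datum["nested"]["b"]
const flatExpr = flatAccess("nested.b");      // datum["nested.b"]
```

The first form matches the case where flattening cannot be assumed (signal expressions over raw data); the second matches the case where the name is already a flattened field, as with names stored in a selection projection.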