Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correct references to flattened fields in selection signals. #5351

Merged
merged 4 commits into from
Sep 10, 2019

Conversation

arvind
Copy link
Member

@arvind arvind commented Aug 28, 2019

Fields are only flattened if they participate in encoding rules. Thus, we continue to use bracket notation to access datum values in signal expressions as we cannot be guaranteed that the flattened field will be present. However, the field name stored as part of a SelectionProjection will always use dot notation as these names are either returned to us flattened by model.vgField or users explicitly specify them in the input spec (in which case, dot notation is their only recourse). Thus, when referencing the field via a selection's top-level signal, we just directly access the flattened field name, rather than unpacking it via bracket notation.

Fixes #5334.

@arvind arvind requested a review from domoritz August 28, 2019 23:58
@arvind arvind changed the title Correct references to flattened fields in selection signals. fix: correct references to flattened fields in selection signals. Aug 29, 2019
@domoritz
Copy link
Member

Looks good. Are you adding flattening for fields that are only referenced in selections as well?

@arvind
Copy link
Member Author

arvind commented Aug 29, 2019

I'm not planning to primarily due to time constraints (with my limited time, I'd rather push forward on code paths I'm already familiar and this PR felt like a principled fix rather than a hack/workaround).

@domoritz
Copy link
Member

Okay. Can you explain what you meant by "this PR felt like a principled fix rather than a hack/workaround"?

@arvind
Copy link
Member Author

arvind commented Aug 29, 2019

Sure -- I mean that the description/reason I gave when opening this PR feels principled to me in the sense that selection signals do not need to make any assumptions about what is or isn't happening to the datum. Instead, selections can remain agnostic/independent about whether any flattening has occurred and assemble purely based on what the user specifies (which matches what is returned by model.vgField).

@domoritz domoritz force-pushed the as/selectionNestedFields branch from 8837838 to 23849a7 Compare August 29, 2019 00:44
@domoritz
Copy link
Member

domoritz commented Aug 29, 2019

Just to clarify, if the user starts with this spec.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "selection": {
    "grid": {
      "type": "multi", "fields": ["nested.b"]
    }
  },
  "data": {
    "values": [
      {"nested":{ "a": "1", "b": 28}},
      {"nested":{ "a": "2", "b": 55}},
      {"nested":{ "a": "3", "b": 43}},
      {"nested":{ "a": "4", "b": 91}},
      {"nested":{ "a": "5", "b": 81}},
      {"nested":{ "a": "6", "b": 53}},
      {"nested":{ "a": "7", "b": 19}},
      {"nested":{ "a": "8", "b": 87}},     
      {"nested":{ "a": "8", "b": 52}}
    ]
  },
  "mark": "point",
  "encoding": {
    "y": {
      "field": "nested.a",
      "type": "quantitative"
    },
    "x": {
      "field": "nested.a",
      "type": "quantitative"
    },
    "color": {
      "condition": {"selection": "grid", "value": "red"}
    }
  }
}

And now they add a calculate transform to derive fields nested.aa and nested.bb.

In the spec below, the user has to know that they have to write "fields": ["nested\\.bb"] instead of "fields": ["nested.bb"] in the selection. They don't need to change anything in the encoding.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "selection": {
    "grid": {
      "type": "multi", "fields": ["nested.bb"]
    }
  },
  "data": {
    "values": [
      {"nested":{ "a": "1", "b": 28}},
      {"nested":{ "a": "2", "b": 55}},
      {"nested":{ "a": "3", "b": 43}},
      {"nested":{ "a": "4", "b": 91}},
      {"nested":{ "a": "5", "b": 81}},
      {"nested":{ "a": "6", "b": 53}},
      {"nested":{ "a": "7", "b": 19}},
      {"nested":{ "a": "8", "b": 87}},     
      {"nested":{ "a": "8", "b": 52}}
    ]
  },
  "transform": [{
    "calculate": "datum.nested.a", "as": "nested.aa"
  }, {
    "calculate": "datum.nested.b", "as": "nested.bb"
  }],
  "mark": "point",
  "encoding": {
    "y": {
      "field": "nested.aa",
      "type": "quantitative"
    },
    "x": {
      "field": "nested.aa",
      "type": "quantitative"
    },
    "color": {
      "condition": {"selection": "grid", "value": "red"}
    }
  }
}

@arvind arvind force-pushed the as/selectionNestedFields branch from 23849a7 to 96d9071 Compare September 7, 2019 17:56
@arvind
Copy link
Member Author

arvind commented Sep 7, 2019

Thanks for that example, @domoritz! That helped me understand the full scope of flattening. And it turns out updating formatparse wasn't as hard/as much work as I'd initially feared, yay! Updated this PR to more thoroughly address this issue, with corresponding test cases. So it should be ready for re-review.

Copy link
Member

@domoritz domoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second example in #5351 (comment) still doesn't work for me. Clicking the circles doesn't make them red.

If I replace

{
      "name": "grid_tuple_fields",
      "value": [{"type": "E", "field": "nested.bb"}]
    },

with

{
      "name": "grid_tuple_fields",
      "value": [{"type": "E", "field": "nested\\.bb"}]
    },

in the generated Vega, it works.

@@ -322,6 +322,7 @@ export function parseData(model: Model): DataComponent {
}

head = ParseNode.makeImplicitFromEncoding(head, model, ancestorParse) || head;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we may create unnecessarily flattening for fields that are already flattened in makeImplicitFromEncoding. I see three solutions. First, we could extract the code that computes implicit for encodings and selections and then call makeWithAncestors once for a single implicit dict. Second, makeImplicitFromEncoding could return its implicit, which we pass to makeImplicitFromSelection, and there we ignore fields that are already flattened otherwise. Second, you could merge makeImplicitFromEncoding and makeImplicitFromSelection so that they share the same implicit dict. I guess this is like the first approach but encapsulates the logic. I'd prefer the first as it's most explicit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. Yes, I'd thought about this and tested it informally (e.g., using the spec below in which both makeImplicitFromEncoding and makeImplicitFromSelection are called). As far as I could tell, duplicate flattening does not occur. But, I wasn't sure how to formally test this as a unit test (i.e., focused on just parsing/assembling the ParseNode). I'm happy to add a test case that parses the whole model and assembles the full dataflow and test what's in the transform array, but that doesn't feel like the right approach. Open to ideas/suggestions here.

Re: your suggested solutions, option (1) won't work because the existing fromEncoding logic expects TypedFieldDefs -- selections only hold field and channel names as strings to simplify assembly. I'm happy to merge into a single makeImplicitFromEncodingAndSelection such that only a single implicit object is populated and passed to makeWithAncestors if we prefer that. I chose the current approach because I thought that better separated concerns.

Let me know what you'd prefer.

Copy link
Member Author

@arvind arvind Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v3.json",
  "selection": {
    "grid": {
      "type": "multi", "fields": ["nested.b"]
    }
  },
  "data": {
    "values": [
      {"nested":{ "a": "1", "b": 28}},
      {"nested":{ "a": "2", "b": 55}},
      {"nested":{ "a": "3", "b": 43}},
      {"nested":{ "a": "4", "b": 91}},
      {"nested":{ "a": "5", "b": 81}},
      {"nested":{ "a": "6", "b": 53}},
      {"nested":{ "a": "7", "b": 19}},
      {"nested":{ "a": "8", "b": 87}},     
      {"nested":{ "a": "8", "b": 52}}
    ]
  },
  "mark": "point",
  "encoding": {
    "y": {
      "field": "nested.b",
      "type": "quantitative"
    },
    "x": {
      "field": "nested.b",
      "type": "quantitative"
    },
    "color": {
      "condition": {"selection": "grid", "value": "red"}
    }
  }
}

does not yield any duplicate flattening (though it should, were it an issue I think?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which field would you expect to be flattened twice?

I can take care of avoiding duplicate flattening since I am more familiar with the dataflow stuff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think we did a simultaneous update. In the updated spec above, the selection is projected over nested.b which is also used in encoding. We would expect to see duplicates, but we don't. I think this was your original concern in #5351 (comment) but perhaps I'm misunderstanding?

@arvind
Copy link
Member Author

arvind commented Sep 9, 2019

Thanks, I'd forgotten to send a PR for updating the vlSelectionTest function to no longer use a field accessor function. We should see the expected behavior with vega/vega#2002.

@domoritz
Copy link
Member

domoritz commented Sep 9, 2019

Is it necessary to update Vega or should Vega-Lite generate different spec as I suggested above?

@arvind
Copy link
Member Author

arvind commented Sep 9, 2019

We should update Vega to keep our logic consistent. Otherwise, we introduce more complexity of needing to remember when we're referring to flattened fields, and when we're referring to nested fields. I'd rather not need to remember how to coordinate that, and just reference flattened fields always.

@arvind
Copy link
Member Author

arvind commented Sep 9, 2019

Oh I just noticed that Vega-Lite outputs nested\\.b for field refs elsewhere. Which function should I be calling to output the escaped version instead?

@domoritz
Copy link
Member

domoritz commented Sep 9, 2019

We have to remember already since when we access a field in Vega and use a ., Vega interprets it as a nested access. But if we use . to write a field with as, the field is flattened and . becomes part of the name of the field.

@domoritz
Copy link
Member

@arvind Please have a look at afc6337 and merge if you think it looks good.

@domoritz domoritz self-requested a review September 10, 2019 04:20
Copy link
Member

@domoritz domoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge after reviewing afc6337. @arvind

@arvind
Copy link
Member Author

arvind commented Sep 10, 2019

That’s a nice solution! Thanks @domoritz!

@arvind arvind merged commit 88333ef into master Sep 10, 2019
@arvind arvind deleted the as/selectionNestedFields branch September 10, 2019 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Selections are not working with nested data
2 participants