Make facet support sort #2176

Closed
10 tasks
kanitw opened this issue Apr 16, 2017 · 15 comments · Fixed by #3854
Labels
Area - Data & Transform P1 Critical -- to fix ASAP

Comments

@kanitw
Member

kanitw commented Apr 16, 2017

Output modify

  • New window transform to derive the field to sort by

  • facet's groupby needs to include the derived field

  • sort of the cell group needs to use the derived field instead

  • add similar calculation to the header aggregate calculation

  • use the aggregated field to sort the header too

@kanitw kanitw added this to the 2.0.0-α-2 Layer & Facet milestone Apr 16, 2017
@kanitw kanitw self-assigned this Apr 21, 2017
@kanitw kanitw assigned domoritz and unassigned kanitw Apr 28, 2017
@kanitw kanitw modified the milestones: 2.0.0-β Important Feature & Patches, 2.0.0-α-2 Layer & Facet Apr 28, 2017
@domoritz
Member

domoritz commented May 24, 2017

@kanitw
Member Author

kanitw commented Jun 6, 2017

Sorting when there is only row or column should be straightforward.
However, sorting both row and column can be tricky as we want to sort by the global value of each row and each column, not each cell.

Thus, group mark's sort is not sufficient if sort is specified for both row and column.
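
To make the distinction concrete, here is a minimal sketch in plain JavaScript. It is not Vega-Lite internals: the field names `r`, `c`, `v` and the helpers `totalsBy` and `sortFacetGrid` are made up for illustration, and "global value" is assumed to mean a sum. It ranks rows and columns by their global totals, then orders cells by those totals rather than by each cell's own value.

```javascript
// Hypothetical helper: total of `v` per distinct value of `key`.
function totalsBy(cells, key) {
  const totals = new Map();
  for (const cell of cells) {
    totals.set(cell[key], (totals.get(cell[key]) || 0) + cell.v);
  }
  return totals;
}

// Order cells by the global row total first, then by the global column total.
function sortFacetGrid(cells) {
  const rowTotals = totalsBy(cells, "r");
  const colTotals = totalsBy(cells, "c");
  return [...cells].sort(
    (a, b) =>
      rowTotals.get(a.r) - rowTotals.get(b.r) ||
      colTotals.get(a.c) - colTotals.get(b.c)
  );
}

const cells = [
  { r: "A", c: "x", v: 1 },
  { r: "A", c: "y", v: 9 },
  { r: "B", c: "x", v: 4 },
  { r: "B", c: "y", v: 2 },
];
// Row totals: A = 10, B = 6. Column totals: x = 5, y = 11.
const ordered = sortFacetGrid(cells).map((d) => d.r + d.c);
console.log(ordered); // [ 'Bx', 'By', 'Ax', 'Ay' ]
```

Sorting the same cells by their own `v` would give Ax, By, Bx, Ay, which no longer forms a consistent grid. That is the reason a per-cell sort alone is not enough here.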

cc: @jheer

@kanitw kanitw self-assigned this Jun 7, 2017
@jheer
Member

jheer commented Jun 7, 2017

Before the new layout features, I believe this would be handled by sorting scale domains. In that case, sorting works by generating an aggregate transform under the hood and sorting by that. An analogous approach could be taken here. Either (1) one could create an explicit aggregate and sort by the results using the group mark sort (possibly requiring a lookup along the way), or (2) one could generate a scale (with the same parameterization as before) whose sole purpose is to generate range values (e.g., over [0,1]) that one can then sort by. I think option (1) is cleaner, less convoluted, and possibly more performant.
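
Option (1) can be sketched in plain JavaScript as three steps: aggregate, look up, sort. This is only an illustration of the idea, not Vega's dataflow code. The helper names, the `party`/`pct` fields, and the choice of `mean` as the summary are all assumptions.

```javascript
// Step 1: aggregate, giving one summary value per facet key (here: the mean).
function aggregateMean(rows, groupField, valueField) {
  const acc = new Map();
  for (const row of rows) {
    const entry = acc.get(row[groupField]) || { sum: 0, count: 0 };
    entry.sum += row[valueField];
    entry.count += 1;
    acc.set(row[groupField], entry);
  }
  const means = new Map();
  for (const [key, { sum, count }] of acc) means.set(key, sum / count);
  return means;
}

// Step 2: lookup, copying each group's summary back onto its rows.
function lookup(rows, groupField, summary, as) {
  return rows.map((row) => ({ ...row, [as]: summary.get(row[groupField]) }));
}

// Step 3: sort the facet keys by the looked-up value (ascending).
function sortedFacetKeys(rows, groupField, sortField) {
  const keyed = new Map(rows.map((row) => [row[groupField], row[sortField]]));
  return [...keyed.entries()].sort((a, b) => a[1] - b[1]).map(([key]) => key);
}

const polls = [
  { party: "P1", pct: 30 },
  { party: "P2", pct: 10 },
  { party: "P1", pct: 20 },
  { party: "P3", pct: 18 },
];
const means = aggregateMean(polls, "party", "pct");
const joined = lookup(polls, "party", means, "mean_pct");
const facetOrder = sortedFacetKeys(joined, "party", "mean_pct");
console.log(facetOrder); // [ 'P2', 'P3', 'P1' ]
```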

@kanitw
Member Author

kanitw commented Jun 7, 2017

> Before the new layout features, I believe this would be handled by sorting scale domains.

Yep.

> Either (1) one could create an explicit aggregate and sort by the results using the group mark sort (possibly requiring a lookup along the way)

I was thinking that we need to do this too, but that will add some more complexity to the dataflow, as we will have to fork two datasets to calculate the aggregates for row and for column and then "merge" both of them using lookup (twice). Thus, I wanna run it by you first to check whether this is the only way.
(This is quite complicated, so I also wonder if we had better support this after 2.0.)

Meanwhile, I have been wondering for a while whether Vega should support a transform that adds a new column with group summary values (like in dplyr). This would be equivalent to forking a data source, aggregating it, and joining it back, but could be done within one transform. This could be useful in many cases. For example, calculating a group residual (value - group mean) would require such an operation. Obviously, this transform would make the sorting operation here way simpler too.
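
As a rough sketch of what such a transform would do, the following plain JavaScript adds a group mean to every row without collapsing the rows, then derives the group residual from it. The helper name `withGroupMean` and the fields `g`, `value`, and `group_mean` are hypothetical and not part of any Vega API.

```javascript
// Hypothetical helper: attach the per-group mean of `valueField` to each row.
function withGroupMean(rows, groupField, valueField, as) {
  // First pass: accumulate sum and count per group.
  const stats = new Map();
  for (const row of rows) {
    const s = stats.get(row[groupField]) || { sum: 0, n: 0 };
    s.sum += row[valueField];
    s.n += 1;
    stats.set(row[groupField], s);
  }
  // Second pass: write the summary back; the number of rows is unchanged.
  return rows.map((row) => {
    const s = stats.get(row[groupField]);
    return { ...row, [as]: s.sum / s.n };
  });
}

const measurements = [
  { g: "a", value: 1 },
  { g: "a", value: 3 },
  { g: "b", value: 10 },
];
const residuals = withGroupMean(measurements, "g", "value", "group_mean").map(
  (d) => ({ ...d, residual: d.value - d.group_mean })
);
console.log(residuals.map((d) => d.residual)); // [ -1, 1, 0 ]
```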

In any case, I think we should start by supporting sorting when there is only row or only column, as doing so does not require forking new data sources and lookups. Then we can see if we have enough time to support sorting when there are both row and column.

@jheer
Member

jheer commented Jun 7, 2017

Sure, I could imagine adding an operator that performs an aggregate and automatically joins the results back onto the input stream. That could be useful for a number of calculations. The internal implementation could reuse the existing aggregate and I assume the actual parameters would be more or less identical.

Any thoughts on a good name for such an operator?

@kanitw
Member Author

kanitw commented Jun 7, 2017

> Any thoughts on a good name for such an operator?

Good question. (I originally did not name it above because I don't have a good name off the top of my head either :p)

Now that I'm forced to think, I'm gonna throw out some bad ideas to brainstorm:

  • deriveaggregate (rationale: derive a new aggregate field)
  • augment/supplementaggregate (augment/supplement the data with new aggregate field)

Another possible idea is to have a boolean flag for the original aggregate transform.

  • augment/supplement (augment/supplement existing table)
  • addfields

I personally think that this is more like a different transform, so I don't really like the boolean flag -- but two-word names are also bad -- so it's probably worth mentioning here. Plus, it seems like MongoDB uses $addFields for similar functionality. (But their $addFields has additional parameters.)

cc: @arvind

@kanitw
Member Author

kanitw commented Jun 7, 2017

Thinking a bit more -- I'm starting to like the "boolean" flag approach a bit more, as it can fit in naturally with our summarize transform in Vega-Lite. That said, there is probably some room for improvement -- so I wonder what you guys think.

@jheer
Member

jheer commented Jun 8, 2017

OK, I've added a new joinaggregate operator, which has been released in vega-dataflow 2.0.0-beta.26. This was surprisingly clean to add by simply subclassing the existing aggregate, and is more efficient than using a combination of aggregate and lookup.
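
For orientation, a use of such a transform in a Vega data definition might look like the sketch below, written as a JavaScript object literal. The parameter names (`groupby`, `fields`, `ops`, `as`) are those of Vega's `aggregate` transform, which `joinaggregate` is described above as subclassing. The data set name, field names, and values are invented, and the exact parameters in the beta release should be checked against the released documentation.

```javascript
// Sketch only: data set name, field names, and values are hypothetical.
const tableData = {
  name: "table",
  values: [
    { variable: "TV", value: 4 },
    { variable: "TV", value: 6 },
    { variable: "Social media", value: 9 },
  ],
  transform: [
    // Adds `mean_value` to every row of its group; rows are not collapsed.
    {
      type: "joinaggregate",
      groupby: ["variable"],
      fields: ["value"],
      ops: ["mean"],
      as: ["mean_value"],
    },
    // A later transform can then use the joined field directly.
    { type: "formula", expr: "datum.value - datum.mean_value", as: "residual" },
  ],
};
```

Compared with `aggregate` followed by `lookup`, the derived field is available on the original rows in a single step, so a facet's groups can be sorted by it without forking the data source.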

The new operator and documentation will be included in the next Vega beta release.

@kanitw kanitw assigned kanitw and unassigned kanitw and domoritz Jun 15, 2017
@kanitw kanitw assigned domoritz and yhoonkim and unassigned kanitw and domoritz Jun 24, 2017
@kanitw
Member Author

kanitw commented Jun 27, 2017

@domoritz -- I believe the right place to add joinaggregate for sorting would be before adding FacetNode?

(Since facetNode will be moved down, an aggregate might show up above the facetNode, so we will have to make sure that the facetNode takes data that still has the output from joinaggregate.)

We will have to help with the dataflow part and @yhoonkim is happy to help with the rest of the related code.

@kanitw kanitw modified the milestones: 2.1? Important Patches, 2.0.0 Critical Feature & Patches Jun 27, 2017
@kanitw kanitw changed the title Make facet support sort Make facet support sort by field Aug 13, 2017
@kanitw kanitw modified the milestones: 2.x? Important Patches, 2.x Data Transforms Sep 22, 2017
@kanitw kanitw added the Blocked 🕐 For issues that are blocked by other issues label Oct 12, 2017
@curran

curran commented Nov 3, 2017

Greetings,

I'm wondering if there's a way to specify the ordering of a FacetFieldDef?

For example, I would have expected to be able to specify the scale domain as follows, but this configuration is not changing the ordering:

    {
      "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
      "data": { values: data },
      "mark": "bar",
      "encoding": {...},
        "y": {...},
        "column": {
          "field": "variable",
          "type": "ordinal",
          "scale": {
            "type": "ordinal",
            "domain": ["Online news sites","Social media","Printed newspapers","TV"] // <--- This here is not working.
          }
        },

Perhaps I'm missing something? Or is it not supported to control the ordering of facets? Thank you.

@yhoonkim yhoonkim removed their assignment Jan 17, 2018
@kanitw kanitw changed the title Make facet support sort by field Make facet support sort Mar 18, 2018
@phivk

phivk commented Mar 18, 2018

Hi, I have a use case where I want to sort only a column facet (no row).
Specifically, I'd like to sort on 2018 percentage in this grouped bar chart:
src: https://vega.github.io/editor/#/gist/vega-lite/phivk/0692b23a7009dd430699d58bc63a322d/53f36d8041e281669fdd7fc082761335c12f44a0/sk_2016vs2018_polls.json

I've tried adding an order attribute as was suggested as a workaround for stacked charts, but alas, no luck...
I understand that this is currently not supported, but was wondering if there is another workaround to achieve this kind of sort?

@kanitw
Member Author

kanitw commented Mar 18, 2018

One workaround is to modify the underlying Vega spec.

@phivk

phivk commented Mar 18, 2018

Thanks for the quick reply!
I just got started using Vega-Lite. Could you point me to a place in the Vega docs where I should start looking into this?

@kanitw
Member Author

kanitw commented Mar 18, 2018

You can use the Vega editor to generate Vega output that you can start from. Then you can use the mark sort property to sort the group marks (see https://vega.github.io/vega/docs/marks/).
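
A hedged sketch of that workaround follows. It assumes the compiled Vega spec contains a group mark with a data-driven `from.facet` definition, that Vega's facet `aggregate` parameter (`fields`, `ops`, `as`) is available in the Vega version in use, and that mark `sort` accepts a `datum.`-prefixed field. The helper `sortFacetGroups` and the `party`/`pct` names are hypothetical, so check the generated spec and the Vega marks documentation before relying on it.

```javascript
// Hypothetical helper: patch a Vega spec so facet groups are sorted by an aggregate.
// Note: this overwrites any `aggregate` already present on the facet definition.
function sortFacetGroups(spec, { field, op, order }) {
  const as = `${op}_${field}`;
  const visit = (marks = []) => {
    for (const mark of marks) {
      const facet = mark.type === "group" && mark.from && mark.from.facet;
      if (facet && facet.groupby) {
        // Per-facet summary value, stored on each facet group's datum.
        facet.aggregate = { fields: [field], ops: [op], as: [as] };
        // Mark sort: the `datum.` prefix refers to the group's backing datum.
        mark.sort = { field: `datum.${as}`, order };
      }
      visit(mark.marks); // facet groups may be nested inside other groups
    }
  };
  visit(spec.marks);
  return spec;
}

// Tiny stand-in for editor output; a real compiled spec is much larger.
const compiled = {
  marks: [
    {
      type: "group",
      from: { facet: { name: "facet", data: "source", groupby: ["party"] } },
      marks: [{ type: "rect" }],
    },
  ],
};
sortFacetGroups(compiled, { field: "pct", op: "max", order: "descending" });
console.log(JSON.stringify(compiled.marks[0].sort));
```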

@kanitw kanitw removed the Blocked 🕐 For issues that are blocked by other issues label Mar 18, 2018
@kanitw kanitw added Help wanted P1 Critical -- to fix ASAP labels Apr 29, 2018
@kanitw
Member Author

kanitw commented Jun 6, 2018

Fixed in #3854 -- only one case left (supporting sort array) -- but let's fork a new issue for that.
