-
Notifications
You must be signed in to change notification settings - Fork 626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom Sort Order #3423
Custom Sort Order #3423
Conversation
Thanks. Before we review, exclude SVG changes first? If the vega files don't change, don't include SVG changes. |
a097b13
to
749f35c
Compare
@kanitw Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress! Still some comments to be addressed. Please also add unit tests.
We're getting close!
_config.yml
Outdated
@@ -49,7 +49,7 @@ kramdown: | |||
syntax_highlighter: rouge | |||
auto_id_stripping: true | |||
|
|||
plugins: | |||
plugins_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this related to this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put this in a separate PR.
site/docs/encoding/sort.md
Outdated
@@ -55,14 +55,22 @@ If the channel has a discrete scale (`band`, `point` or `ordinal`), the field's | |||
|
|||
3) Unsorted – `null` – The field is not sorted. This is equivalent to specifying `sort: false` in [Vega's scales](https://vega.github.io/vega/docs/scales/#sort). | |||
|
|||
4) Specify custom order by providing custom `scale`'s [`domain`](scale.html#domain). (In this case, you don't need to use `sort` property.) | |||
4) Specify custom order by: | |||
1) providing `sort` with an array that lists valid input domains |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite misleading.
Using scale domain literally lists "(all) valid input domains". But the whole point of having sort
supports an array is to overcome this limitation and instead lists "preferred order of values".
Thus, instead of having sub 1) and 2) here. I would rather:
- make it focus on sort array
- explain what happen if data field contains unlisted values? + add an example
- Add a
__Note:__
why sort array is different from using scale domain / they should prefer usingsort
array unless they really want to set domain.
src/fielddef.ts
Outdated
@@ -168,12 +168,12 @@ export interface ScaleFieldDef<F> extends FieldDef<F> { | |||
|
|||
/** | |||
* Sort order for the encoded field. | |||
* Supported `sort` values include `"ascending"`, `"descending"` and `null` (no sorting). | |||
* Supported `sort` values include `"ascending"`, `"descending"`, `"array of field's domains"` and `null` (no sorting). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"array of field domain" is misleading.
"array specifying preferred order of the values"?
Please also briefly explain what happens if sort doesn't include all domain values.
src/compile/data/calculate.ts
Outdated
public static calculateExpressionFromSortField(field: string, sortFields: string[]): string { | ||
let expression = ''; | ||
let count = 0; | ||
for (const sortField of sortFields) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you need the index anyway, better just use normal for loop and do sortFields[i]
.
for ... of
is normally for when you don't care about index. (With babel, it expands to be full normal for loop.)
src/compile/data/calculate.ts
Outdated
@@ -15,6 +18,32 @@ export class CalculateNode extends DataFlowNode { | |||
super(); | |||
} | |||
|
|||
public static makeAllFromSort(model: ModelWithField): CalculateNode[] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is quite vague for its very specific purpose. ("Make all" of what? This is actually a special purposed calculate node, not a generic calculate node.)
Perhaps, makeAllForSortIndex
is better?
(Also feel free to suggest better name.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also start wondering that this is quite a special purposed one, so I wonder if having a distinct type of node for this will help with optimization in longer term?
(But since this is already working, maybe it's waste of time to change it now -- I'm just asking.)
@domoritz -- any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is that special and I can't think of a special optimization that we would do. Let's leave it for now.
}, | ||
"mark": "bar", | ||
"encoding": { | ||
"x": {"field": "a", "type": "ordinal", "sort": ["D", "G", "F"]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's better distinguish between (1) the example that show how sort order normally works when all items are included and (2) another example to explain behavior when some values is "accidentally" ignored?
The first can just have three values B, A, C.
The second can have 2-3 more (D,E,F?).
No need for 9 categories -- only make it more confusing.
src/compile/data/calculate.ts
Outdated
if (isScaleFieldDef(fieldDef) && isSortArray(fieldDef.sort)) { | ||
const transform: CalculateTransform = { | ||
calculate: CalculateNode.calculateExpressionFromSortField(fieldDef.field, fieldDef.sort), | ||
as: `${fieldDef.field}_sort_index` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should channel be a part of the name?
(e.g., ${channel}_${fieldDef.field}_sort_index
.)
Imagine a troll case where I have x and y encode the same field, but use different sort
array. Your current code will behave incorrectly.
src/compile/scale/domain.ts
Outdated
if (isSortArray(sort)) { | ||
return { | ||
op: 'min', | ||
field: `${model.vgField(channel)}_sort_index`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/compile/scale/domain.ts
Outdated
@@ -266,6 +266,15 @@ export function domainSort(model: UnitModel, channel: ScaleChannel, scaleType: S | |||
|
|||
const sort = model.sort(channel); | |||
|
|||
// if the sort is specified with array, use the created data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the derived sort index field
src/compile/data/parse.ts
Outdated
@@ -249,6 +249,11 @@ export function parseData(model: Model): DataComponent { | |||
tu.parent = head; | |||
head = tu; | |||
} | |||
|
|||
for (const calculate of CalculateNode.makeAllFromSort(model)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also update visual diagram on the top.
acc88c5
to
985dbfc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're getting close! Still some changes needed though. Keep up the good work!
site/docs/encoding/sort.md
Outdated
4) Specify custom order by: | ||
1) providing `sort` with an array that lists valid input domains | ||
2) providing custom `scale`'s [`domain`](scale.html#domain) | ||
4) Specify custom order by providing custom `scale`'s [`domain`](scale.html#domain). (In this case, you don't need to use `sort` property.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned earlier, this shouldn't be a "suggested" way for sorting. Plus, putting unpreferred way on top of preferred way is weird.
Please just make this like a note that it can achieve similar effect but has some pitfall(s). (The sort
array can be 4. and this should become just a note.)
Actually, we should also mention about sort
in scale's domain
documentation too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already have that in domain
documentation (https://vega.github.io/vega-lite/docs/scale.html#domain)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it didn't say what's the different between domain array and sort array yet so briefly mention about it and link to the full example in sort page would be nice.
site/docs/encoding/sort.md
Outdated
|
||
#### Example: Sorting Ordinal Scale by Another Field | ||
5) Sorting by preferred order of values by providing `sort` with array specifying preferred order of the values. Unspecified values will assume their original orders in the data source, preceded by the orders in the sort array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting by preferred order of values by providing
sort
with array specifying preferred order of the values.
"preferred order of values" showing up twice, quite redundant.
"providing sort
as an array specifying preferred order of the values." ?
src/fielddef.ts
Outdated
@@ -169,8 +169,9 @@ export interface ScaleFieldDef<F> extends FieldDef<F> { | |||
|
|||
/** | |||
* Sort order for the encoded field. | |||
* Supported `sort` values include `"ascending"`, `"descending"`, `"array of field's domains"` and `null` (no sorting). | |||
* Supported `sort` values include `"ascending"`, `"descending"`, `"array specifying the preferred order of values"` and `null` (no sorting). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"array specifying the preferred order of values"
suggests that this is an enum array, so it's a bit misleading.
"ascending"
, "descending"
, null
, or an array specifying the preferred order of values
src/fielddef.ts
Outdated
* For fields with discrete domains, `sort` can also be a [sort field definition object](https://vega.github.io/vega-lite/docs/sort.html#sort-field). | ||
* For `sort` as an "array specifying the preferred order of values", the sort order will obey the values in the array, followed by any unspecified values in their original order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why quoting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add link to the section that you provide an example for sort array.
site/docs/encoding/sort.md
Outdated
|
||
The following example sorts x by mean of Horsepower. | ||
In the case that sort array contains every field value, the sort order will follow the specified values in the array. | ||
<div class="vl-example" data-name="bar_custom_sort_full"></div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line break before DIV?
site/docs/encoding/sort.md
Outdated
|
||
<div class="vl-example" data-name="histogram_sort_mean"></div> | ||
Sort order will precede by the specified values in the array, even though some value is ignored. Note how `D`, `E` and `F` still follows their original order, while `B`, `A` and `C` follows the order in sort array. | ||
<div class="vl-example" data-name="bar_custom_sort_partial"></div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line break before DIV?
Oh and please make tests pass. :) |
site/static/main.ts
Outdated
@@ -72,7 +72,7 @@ function embedExample($target: any, spec: TopLevelExtendedSpec, actions=true, to | |||
}); | |||
|
|||
if (tooltip) { | |||
vegaLite(view, spec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you fix this in a separate PR that we can undo easily
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup
fix jekyll error and linting
create examples and fix jekyll error in base.html
undo jekyll errors
@kanitw I just rebased. It should be good to go! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Only have a few last commennts and this should be good to go :)
I also made a pass over the docs. :)
"data": { | ||
"values": [ | ||
{"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43}, | ||
{"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, change this to Z, Y, X?
I believe the result would be B, A, C, Z, Y, X.
(The reason we should change is that using B, A, C, D, E, F, it won't be clear whether it's D E F because of the original order or because D is naturally before E and F.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Please double check to make sure I'm not crazy :p)
src/compile/data/parse.ts
Outdated
@@ -202,6 +204,10 @@ export function parseData(model: Model): DataComponent { | |||
} | |||
|
|||
head = TimeUnitNode.makeFromEncoding(head, model) || head; | |||
|
|||
for (const calculate of CalculateNode.makeAllForSortIndex(head, model)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really still need for loop here? Can't we just set head to the last item in the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it seems like the constructor and assemble code for CalculateNode
doesn't handle the case where it has to generate multiple nodes from one instance. I could change it so that the constructor doesn't require for loop, but that would affect the code we already have for general CalculateNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Why would the following code be different from the for loop you have here?
const sortIndexNodes = CalculateNode.makeAllForSortIndex(head, model);
head = sortIndexNodes[sortIndexNodes.length - 1];
Can you explain more if I overlook anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what I'm trying to do might be different here. I saw that @domoritz made some changes to the way bin
and timeUnit
nodes are created and assembled. Previously we had to iterate over for loop to construct nodes returned from methods like makeAll
, but the new changes made it so that makeAll
returns only one node with all the components inside a dictionary with field name as a key(instead of multiple nodes with separate components). So the constructor of bin
and timeUnit
now takes the parent and the dictionary as params.
Anyway, I will follow your suggestion since I think it doesn't make sense for CalculateNode
to have the same behavior as bin
and timeUnit
node as I explained above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously we had to iterate over for loop to construct nodes returned from methods like makeAll, but the new changes made it so that makeAll returns only one node with all the components inside a dictionary with field name as a key(instead of multiple nodes with separate components). So the constructor of bin and timeUnit now takes the parent and the dictionary as params.
I think the old loop is updating head like you do down here in the past, but it's kinda annoying because you need to update head and keep passing current head to the next one.
So we incorporate head into constructor to make the code in data/parse.ts shorten by half.
returns only one node with all the components inside a dictionary with field name as a key
I think your case is more like geopoint/ geojson transform. Which basically updating parent within the parseAll
method. So I think you could follow the pattern and do
head = CalculateNode.parseAllSortIndex(head, model);
where your parseAllSortIndex
returns just the final head (parent)
cc: @domoritz Please correct me if I'm wrong.
src/compile/scale/domain.ts
Outdated
if (isSortArray(sort)) { | ||
return { | ||
op: 'min', | ||
field: `${channel}_${model.vgField(channel)}_sort_index`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you use model.vgField(channel)
but in the calculate.ts file you use ${channel}_{fieldDef.field}_sort_index
.
For the most cases, this should be fine, but I wonder if there are cases where we apply aggregation function for categorical value -- in such case model.vgField(channel)
would return different result from fieldDef.field
.
I think we should have a helper method in calculate.ts
for generating this field name (from a fieldDef
?) and make both calculate.ts and the code here call the same method for safety in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we stick to vgField
or just fieldDef.field
as a field name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vgField(fieldDef)
?
src/fielddef.ts
Outdated
@@ -182,12 +182,13 @@ export interface ScaleFieldDef<F> extends FieldDef<F> { | |||
|
|||
/** | |||
* Sort order for the encoded field. | |||
* Supported `sort` values include `"ascending"`, `"descending"` and `null` (no sorting). | |||
* Supported `sort` values include `"ascending"`, `"descending"`, `array specifying the preferred order of values` and `null` (no sorting). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sort
property can be "ascending"
, "descending"
, null
, or an array specifying the preferred order of values.
(Remove back ticks for the array case too since you have currently have inside backticks is not the exact string you gonna put in the JSON.)
site/docs/encoding/sort.md
Outdated
|
||
<div class="vl-example" data-name="bar_custom_sort_full"></div> | ||
|
||
Sort order will precede by the specified values in the array, even though some value is ignored. Note how `D`, `E` and `F` still follows their original order, while `B`, `A` and `C` follows the order in sort array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Z, Y, X as suggested?
@kanitw Hopefully this fixes the issue with for loop in node parsing. I move that iteration into how |
…e export function for sort index field name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, provided that you make this one last change :)
if (isScaleFieldDef(fieldDef) && isSortArray(fieldDef.sort)) { | ||
const transform: CalculateTransform = { | ||
calculate: CalculateNode.calculateExpressionFromSortField(fieldDef.field, fieldDef.sort), | ||
as: sortArrayIndexField(model, channel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's weird that you're still using model
and channel
given that this is already under model.forEachFieldDef
.
I'd say sortArrayIndexField
should just accept a fieldDef
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, how can we know what the channel the given fieldDef
represents?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still leave channel
as a function param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long reply, since this function is also used in domain.ts
to look up the sort field, given that only model
and channel
is provided in that code, I think accepting model
and channel
here is reasonable, even though it seems redundant in model.forEachFieldDef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. :)
Please:
site/docs/
yarn lint
andyarn test
. If your change affects Vega outputs of some examples, please runyarn build:example EXAMPLE_NAME
to re-compile a specific example oryarn build:examples
to re-compile all examples.)master
branch.#1
)Pro-Tip: https://medium.com/@greenberg/writing-pull-requests-your-coworkers-might-enjoy-reading-9d0307e93da3 is a nice article about writing a nice pull request.