Axis autotype update #3070

alexcjohnson · 2018-10-03T18:09:35Z

Fixes #3039 - treat boolean as category data in autotype

Fixes #2473, fixes #1413, by counting only distinct values while determining date and category autotype, as discussed in #2473 (comment). This way we don't need to include any explicit "missing" values; data with up to 2 non-numeric values for every numeric value will still be interpreted as numbers, but even a single non-numeric string (including the previous special values 'None' and '') will be interpreted as a valid category if there are no numbers present.
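As a rough illustration of the distinct-value rule above, here is a minimal sketch. It is not the plotly.js source: `looksLikeCategory` and `isNumericLike` are hypothetical names, and the numeric check is a simplified stand-in for whatever number cleaning the library actually does. It assumes only what is stated above: booleans and non-numeric strings count toward categories, each distinct value is counted once, and the data is treated as categorical only when there are more than 2 distinct non-numeric values per distinct numeric value.

```javascript
// Hypothetical, simplified numeric test (an assumption, not the library's own check).
function isNumericLike(v) {
    if (typeof v === 'number') return Number.isFinite(v);
    if (typeof v === 'string' && v.trim() !== '') return Number.isFinite(Number(v));
    return false;
}

// Sketch of category detection that counts each distinct value only once.
function looksLikeCategory(values) {
    const seen = new Set();
    let nums = 0;
    let cats = 0;

    for (const v of values) {
        const key = String(v);
        if (seen.has(key)) continue; // repeated values do not add weight
        seen.add(key);

        if (typeof v === 'boolean') cats++;       // booleans count as categories
        else if (isNumericLike(v)) nums++;        // numbers and numeric strings
        else if (typeof v === 'string') cats++;   // any other string, including 'None' and ''
    }

    // Category only if distinct non-numeric values outnumber distinct numbers by more than 2:1.
    return cats > nums * 2;
}
```

Under this rule, `[0, 'None', true, 'None', 'None']` has one distinct number and two distinct non-numeric values, so it stays numeric, while `['None']` on its own is treated as a category.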

Note that as part of ^^ I also converted the date determination to counting distinct values. Nobody had complained about this part, but if I didn't do that there could be some strange cases where you have only date strings and numbers but our result is 'category' 🤔

There's probably a way to improve performance (and make it clearer that we can't generate such strange results) by combining moreDates and category into one loop... but I don't think that's a major cost anymore at least since we switched to testing max 1000 points.
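For context on the 1000-point cap, here is a sketch of one way to subsample an array before counting. `sampleAtMost` is a hypothetical helper and the integer stride is an assumption; the actual sampling in plotly.js may pick points differently.

```javascript
// Hypothetical helper: return at most maxPoints evenly strided values.
function sampleAtMost(values, maxPoints = 1000) {
    // An integer stride of at least ceil(n / maxPoints) guarantees the cap is respected.
    const step = Math.max(1, Math.ceil(values.length / maxPoints));
    const out = [];
    for (let i = 0; i < values.length; i += step) {
        out.push(values[i]);
    }
    return out;
}
```

With a cap like this, each autotype pass touches a bounded number of points regardless of trace size, so the remaining cost scales with the number of axes rather than the number of data points.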

cc @etpinard @antoinerg @archmoj

etpinard · 2018-10-03T20:12:45Z

test/jasmine/tests/axes_test.js

+                    y: [0, 'None', true, 'None', 'None', 'None', 'None', 'None']
+                });
+                checkTypes('linear', 'linear');
+            });


Beautiful! This test case here is enough to convince me this PR is non-breaking.

etpinard · 2018-10-03T20:22:27Z

> There's probably a way to improve performance (and make it clearer that we can't generate such strange results) by combining moreDates and category into one loop... but I don't think that's a major cost anymore at least since we switched to testing max 1000 points.

Hmm. It would be nice to check with a large splom (e.g. https://codepen.io/etpinard/pen/wjmqmO). You're right, looping over <1000 points shouldn't be too slow, but looping over <1000 pts once per axis ~100 times could have an impact. Related: #2549

Note: I'll check this myself off my latest splom commits.

etpinard · 2018-10-03T20:36:09Z

> Note: I'll check this myself off my latest splom commits.

The results are in: these commits here don't make much of a difference in axis-autotype perf.

But I never realized how much time large splom traces spend in axis_autotype. In https://codepen.io/etpinard/pen/wjmqmO, we spend roughly 40ms in axis_autotype! I'll add a comment about this in #2549. No need to handle this now, of course.

💃

alexcjohnson added 2 commits October 3, 2018 09:58

autotype booleans as categories

7176e7a

count *distinct* values for category and date axis autotype

8d446be

alexcjohnson added bug something broken status: reviewable labels Oct 3, 2018

alexcjohnson added this to the v1.42.0 milestone Oct 3, 2018

alexcjohnson mentioned this pull request Oct 3, 2018

Cant print 'None' string in x axis in Scatter plot if first value #1413

Closed

etpinard reviewed Oct 3, 2018

etpinard mentioned this pull request Oct 3, 2018

Speed up cartesian axes supplyDefaults #2549

Closed

alexcjohnson merged commit 48209a0 into master Oct 3, 2018

alexcjohnson deleted the bool-cats branch October 3, 2018 20:47
