Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Axis autotype update #3070

Merged
merged 2 commits into from
Oct 3, 2018
Merged

Axis autotype update #3070

merged 2 commits into from
Oct 3, 2018

Conversation

alexcjohnson
Copy link
Collaborator

Fixes #3039 - treat boolean as category data in autotype

Fixes #2473, fixes #1413, by counting only distinct values while determining date and category autotype, as discussed in #2473 (comment). This way we don't need to include any explicit "missing" values; data with up to 2 non-numeric values for every numeric value will still be interpreted as numbers, but even a single non-numeric string (including the previous special values 'None' and '') will be interpreted as a valid category if there are no numbers present.

Note that as part of ^^ I also converted the date determination to counting distinct values. Nobody had complained about this part, but if I didn't do that there could be some strange cases where you have only date strings and numbers but our result is 'category' 🤔

There's probably a way to improve performance (and make it clearer that we can't generate such strange results) by combining moreDates and category into one loop... but I don't think that's a major cost anymore at least since we switched to testing max 1000 points.

cc @etpinard @antoinerg @archmoj

y: [0, 'None', true, 'None', 'None', 'None', 'None', 'None']
});
checkTypes('linear', 'linear');
});
Copy link
Contributor

@etpinard etpinard Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful! This test case here is enough to convince me this PR is non-breaking.

@etpinard
Copy link
Contributor

etpinard commented Oct 3, 2018

There's probably a way to improve performance (and make it clearer that we can't generate such strange results) by combining moreDates and category into one loop... but I don't think that's a major cost anymore at least since we switched to testing max 1000 points.

Hmm. I would be nice to check with a large splom (e.g. https://codepen.io/etpinard/pen/wjmqmO). You're right, looping over <1000 points shouldn't be too slow, but looping over <1000 pts once per axis ~100 times could have an impact. Related: #2549

Note: I'll check this myself off my latest splom commits.

@etpinard
Copy link
Contributor

etpinard commented Oct 3, 2018

Note: I'll check this myself off my latest splom commits.

The results are in: these commits here don't make much of a difference in axis-autotype perf.

But I never realized how much time large splom traces spend in axis_autotype. In https://codepen.io/etpinard/pen/wjmqmO, we spend roughly 40ms in axis_autotype ! I'll add a comment about this in #2549. No need to handle this now of course.

💃

@alexcjohnson alexcjohnson merged commit 48209a0 into master Oct 3, 2018
@alexcjohnson alexcjohnson deleted the bool-cats branch October 3, 2018 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug something broken
Projects
None yet
2 participants