Fixes #3039 - treat `boolean` as category data in autotype

Fixes #2473, fixes #1413, by counting only distinct values while determining date and category autotype, as discussed in #2473 (comment). This way we don't need to include any explicit "missing" values: data with up to 2 distinct non-numeric values for every distinct numeric value will still be interpreted as numbers, but even a single non-numeric string (including the previous special values `'None'` and `''`) will be interpreted as a valid category if there are no numbers present.

Note that as part of ^^ I also converted the date determination to counting distinct values. Nobody had complained about that part, but if I hadn't done it there could be some strange cases where you have only date strings and numbers but our result is `'category'` 🤔

There's probably a way to improve performance (and make it clearer that we can't generate such strange results) by combining `moreDates` and `category` into one loop... but I don't think that's a major cost anymore, at least since we switched to testing max 1000 points.

cc @etpinard @antoinerg @archmoj
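For reviewers, here is a minimal sketch of the distinct-value counting described above. It is a simplified illustration and not the plotly.js source. Assumptions: the helper names (`sampleStep`, `isNumericValue`, `isDateString`, `countDistinct`, `autotypeSketch`) are hypothetical, the date check only recognizes plain `YYYY-MM-DD` strings, and the 2:1 threshold for dates is assumed to mirror the category ratio. Only the category ratio, the boolean handling, and the max-1000-points sampling come from the text above. It uses a single loop for both dates and categories, roughly in the spirit of the `moreDates`/`category` merge idea.

```javascript
// Simplified sketch of distinct-value autotype detection (NOT plotly.js source).
// Hypothetical: all helper names, the YYYY-MM-DD date regex, and the 2:1 date
// threshold. From the PR text: the 2:1 category ratio, booleans as categories,
// and testing at most ~1000 points.

const MAX_SAMPLES = 1000;

// Pick a stride so that at most ~MAX_SAMPLES evenly spaced points are tested.
function sampleStep(values) {
  return Math.max(1, Math.round(values.length / MAX_SAMPLES));
}

// Hypothetical numeric check: finite numbers, or non-blank strings that parse as one.
function isNumericValue(v) {
  if (typeof v === 'number') return Number.isFinite(v);
  if (typeof v === 'string' && v.trim() !== '') return Number.isFinite(Number(v));
  return false;
}

// Hypothetical date check: only plain YYYY-MM-DD strings count as dates here.
function isDateString(v) {
  return typeof v === 'string' && /^\d{4}-\d{2}-\d{2}$/.test(v);
}

// Count each distinct value once, keyed by its string form, in one loop.
function countDistinct(values) {
  const step = sampleStep(values);
  const seen = new Set();
  let nums = 0, dates = 0, cats = 0;
  for (let i = 0; i < values.length; i += step) {
    const v = values[i];
    const key = String(v);
    if (seen.has(key)) continue; // repeated values are only counted once
    seen.add(key);
    if (typeof v === 'boolean') {
      cats++; // booleans are category data (#3039)
    } else if (isNumericValue(v)) {
      nums++;
    } else if (typeof v === 'string') {
      cats++; // any other string, including '' and 'None'
      if (isDateString(v)) dates++;
    }
  }
  return { nums, dates, cats };
}

// Dates are checked first; then category wins only if distinct non-numeric
// values outnumber distinct numbers by more than 2:1.
function autotypeSketch(values) {
  const { nums, dates, cats } = countDistinct(values);
  if (dates > nums * 2) return 'date';
  if (cats > nums * 2) return 'category';
  if (nums > 0) return 'linear';
  return '-';
}
```

With this sketch, `autotypeSketch([1, 'a', 'a', 'a', 'a', 'a'])` gives `'linear'` because the repeated `'a'` counts once, `autotypeSketch([1, 'a', 'b', 'c'])` gives `'category'`, and `autotypeSketch(['None'])` or `autotypeSketch([true, false])` gives `'category'` since no numbers are present.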