-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: SQL Lab sorting of non-numbers #18006
fix: SQL Lab sorting of non-numbers #18006
Conversation
27549a7
to
4fb1780
Compare
Codecov Report
@@ Coverage Diff @@
## master #18006 +/- ##
=======================================
Coverage 66.32% 66.32%
=======================================
Files 1569 1569
Lines 61658 61661 +3
Branches 6232 6233 +1
=======================================
+ Hits 40892 40895 +3
Misses 19167 19167
Partials 1599 1599
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a small suggestion.
Did you check whether the performance is acceptable when table has the maximum number of rows? I'd imagine regex match for every row to be quite expensive. In regards to the RegEx used, it's probably more important what the database returns other than what JavaScript accepts. For example, some database may return Infinity
as Inf
.
I think in long run we would want to apply different sorting function based on column types. We should pass down the column type prop to this component and apply number parsing only to numeric columns. We may also want to migrate this table to react-table as well. It has a sortType configuration we can use for free.
// exponential notation, NaN, and Infinity. | ||
// See https://stackoverflow.com/a/30987109 for more details | ||
const onlyNumberRegEx = | ||
/^(NaN|-?((\d*\.\d+|\d+)([Ee][+-]?\d+)?|Infinity))$/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this to a global constant or create a helper function?
@ktmud I did not check yet, although aren't regexes pretty fast usually? I suppose i could do a faster check first before the more expensive regex, but my thought was that the As for matching what databases may return, I suppose i could try to handle everything every database returns, but i figured having something consistent with a single language would be worthwhile. If this was breaking for you with a specific database, then you could always write So in summary, i'll test perf and if that looks good, will merge (with the moving of the regex to a global constant, sorry for not doing that in the first place) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching/fixing this issue @etr2460!
/testenv up |
@etr2460 Ephemeral environment spinning up at http://54.69.38.190:8080. Credentials are |
verified that 10k rows still sort pretty much instantly, both for numeric and non numeric columns |
4fb1780
to
2159734
Compare
Ephemeral environment shutdown and build artifacts deleted. |
SUMMARY
#17573 fixed the sorting of floating point numbers, but broke sorting strings that start with numbers. The most common of these are dses like
2022-01-01
, andparseFloat("2022-01-01") === 2022
, so sorting got all janky.I've fixed this by ensuring that we only try to parse numbers if they pass a regex that matches JS's format of numbers.
Note that this solution isn't 100% accurate, and will still cause undefined behavior when sorting a list containing both numeric and non numeric strings. I don't have a good answer for fixing this, but perhaps in the future we'll need to refactor this further. I'd imagine a per column parameter that defines the sorting function (convert to number and then sort, do string sort, sort in hexadecimal because all the strings start with
0x
and contain only0-f
, something else) but that was very much out of scope of this bug fix.TESTING INSTRUCTIONS
New unit test (and ensure the unit test added in #17573 still works)
ADDITIONAL INFORMATION
to: @lyndsiWilliams @eschutho @graceguo-supercat @ktmud