Remove unit tests that measure some operation #8204
Comments
@radeusgd, can we adjust this test to be a benchmark rather than a unit test?
This is not entirely true - it measures performance relative to other operations, so it should work regardless of whether it's run on a fast or slow machine. The tests have been there for a long time and were failing very rarely. I think the benefit is that we get some kind of notification in case we regress on stuff like this. With benchmarks we don't have any checking or notification, so unless we are tracking them regularly, we can very easily miss such regressions - and since benchmarks are run only on already merged PRs, we'd notice them only after the regression is introduced anyway.

But what is fair is that we have these tests for just a few cases, and the performance of these operations is not really special in any way compared to all other operations. Maybe we should send ourselves a notification if a new benchmark run is slower than the previous run by some threshold? If we can keep track of this (without too many false positives), tests like this won't be needed indeed.
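The threshold-based notification suggested above could be sketched roughly like this (an illustrative assumption, not existing CI tooling; the function and threshold names are made up):

```python
# Sketch: flag benchmark regressions between two runs, given mean times
# keyed by benchmark name. REGRESSION_THRESHOLD and the benchmark names
# below are assumptions for illustration only.

REGRESSION_THRESHOLD = 1.25  # alert if a benchmark got 25% slower


def find_regressions(previous: dict, current: dict) -> list:
    """Return names of benchmarks whose time grew past the threshold."""
    regressions = []
    for name, new_time in current.items():
        old_time = previous.get(name)
        if old_time is not None and new_time > old_time * REGRESSION_THRESHOLD:
            regressions.append(name)
    return regressions


prev = {"Is_In.builtin": 100.0, "Join.hash": 50.0}
curr = {"Is_In.builtin": 140.0, "Join.hash": 52.0}
print(find_regressions(prev, curr))  # -> ['Is_In.builtin']
```

The main open question, as noted, is tuning the threshold so that normal run-to-run noise does not produce false positives.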
…ension (#8212)
- Closes #5303
- Refactors `JoinStrategy`, allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach, this will still not be as fast as possible for cases where there is more than one `Between` condition in a single query - the benchmarks try to demonstrate that.
  - We could replace sorting with d-dimensional [Range Trees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are the sizes of the joined tables, `d` is the number of `Between` conditions used in the query, and `k` is the result set size).
  - Follow-up ticket for consideration later: #8216
- Closes #8215
  - After all, it turned out that `TreeSet` was problematic (because of insufficient flexibility in duplicate key handling), so the simplest solution was to implement this sub-task immediately.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`. Before, passing a Column as `new_name` (i.e. mistakenly mixing up the order of arguments) led to a hard-to-understand error, ``Method `if_then_else` of type Column could not be found.``; now it fails with the type error `expected Text, got Column`.
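The sort-based strategy for a single `Between` condition can be sketched as follows (an illustrative assumption, not the actual Enso implementation): sort the right-hand keys once, then binary-search the `[low, high]` bounds for each left-hand row, which gives the `O((n + m) log m + k)`-style behavior discussed above.

```python
# Sketch of a sort join for one Between condition. `left` holds
# (low, high) intervals; `right` holds scalar keys. Returns matching
# (left_index, right_index) pairs where low <= key <= high.
import bisect


def sort_join_between(left, right):
    # Sort right-side indices by key once: O(m log m).
    order = sorted(range(len(right)), key=lambda i: right[i])
    keys = [right[i] for i in order]
    matches = []
    for li, (low, high) in enumerate(left):
        # Two binary searches locate the contiguous run of matching keys.
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        for pos in range(start, end):
            matches.append((li, order[pos]))
    return matches


print(sort_join_between([(1, 3), (10, 20)], [2, 5, 15]))  # -> [(0, 0), (1, 2)]
```

With a second `Between` condition, each contiguous run found here would still need filtering on the other dimension, which is why a d-dimensional range tree would be needed to keep the lookup logarithmic.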
Recently I ran into a failing test, `FAILED should perform Is_In efficiently for builtin types`, that measures the `Is_In` operation and fails if it takes longer than 150 ms. These timing-sensitive tests should not be part of our testing suite at all. If performance is important, let's keep it in benchmarks.
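For contrast, the machine-independent check described in the comments (comparing the operation against a baseline operation on the same machine, rather than against an absolute 150 ms cutoff) could look roughly like this. This is a hypothetical Python sketch; the helper name, factor, and the set-vs-list workload are assumptions, not the actual test.

```python
# Sketch: a "relative timing" check. Instead of an absolute millisecond
# budget, require the operation to be within some factor of a baseline
# operation measured on the same machine in the same run.
import timeit


def is_within_factor(op, baseline, factor=10.0, repeats=5):
    """True if op's best observed time is within `factor` of baseline's."""
    op_time = min(timeit.repeat(op, number=1, repeat=repeats))
    base_time = min(timeit.repeat(baseline, number=1, repeat=repeats))
    return op_time <= base_time * factor


data = list(range(100_000))
lookup = set(data)
# A set membership test should be far cheaper than a full list scan,
# regardless of how fast or slow the machine is.
print(is_within_factor(lambda: 99_999 in lookup, lambda: 99_999 in data))
```

Such relative checks are less machine-sensitive than absolute limits, though they can still be flaky under load, which is the trade-off weighed in this issue.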