Remove unit tests that measure some operation #8204

Akirathan · 2023-11-01T14:40:10Z

Recently I stepped into a failing test FAILED should perform Is_In efficiently for builtin types that measures Is_In operation and fails if it takes longer than 150 ms. These timing sensitive tests should not be part of our testing suite at all. If performance is important, let's keep it in benchmarks.

The text was updated successfully, but these errors were encountered:

jdunkerley · 2023-11-01T14:42:08Z

@radeusgd can we adjust this test to be a benchmark rather than a unit test.

radeusgd · 2023-11-02T08:48:41Z

fails if it takes longer than 150 ms

This is not entirely true - it measures performance relative to other operations - so it should work regardless of whether its run on a fast or slow machine. The tests have been there for a long time and were failing very rarely.

I think the benefit of this is that we get some kind of notification in case we regress on stuff like this. With benchmarks we don't have any kind of checking/notification, so unless we are tracking them regularly, we can very easily not notice such regressions - and since benchmarks are run only on already merged PRs we'd notice them after the regression is introduced anyway.

But I guess what is fair is that we have these tests for just a few cases and the performance of these operations is not really special in any way compared to all other operations.

Maybe we should try to send us a notification if a new benchmark run is slower by some threshold than the previous run? If we can keep track of this (without too many false positives), tests like this won't be needed indeed.

…ension (#8212) - Closes #5303 - Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`) - Adds benchmarks for join - Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks. - We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size). - Follow up ticket for consideration later: #8216 - Closes #8215 - After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task. - Closes #8204 - Unrelated, but I ran into this here: adds type checks to other arguments of `set`. - Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.

Akirathan added the -libs Libraries: New libraries to be implemented label Nov 1, 2023

Akirathan assigned jdunkerley Nov 1, 2023

github-project-automation bot added this to Issues Board Nov 1, 2023

github-project-automation bot moved this to ❓New in Issues Board Nov 1, 2023

jdunkerley assigned radeusgd and unassigned jdunkerley Nov 1, 2023

radeusgd moved this from ❓New to 🔧 Implementation in Issues Board Nov 2, 2023

radeusgd mentioned this issue Nov 2, 2023

Improve performance of Join_Condition.Between by sorting on one dimension #8212

Merged

5 tasks

radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Nov 2, 2023

mergify bot closed this as completed in #8212 Nov 8, 2023

github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Nov 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unit tests that measure some operation #8204

Remove unit tests that measure some operation #8204

Akirathan commented Nov 1, 2023

jdunkerley commented Nov 1, 2023

radeusgd commented Nov 2, 2023

Remove unit tests that measure some operation #8204

Remove unit tests that measure some operation #8204

Comments

Akirathan commented Nov 1, 2023

jdunkerley commented Nov 1, 2023

radeusgd commented Nov 2, 2023