Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unit tests that measure some operation #8204

Closed
Akirathan opened this issue Nov 1, 2023 · 2 comments · Fixed by #8212
Closed

Remove unit tests that measure some operation #8204

Akirathan opened this issue Nov 1, 2023 · 2 comments · Fixed by #8212
Assignees
Labels
-libs Libraries: New libraries to be implemented

Comments

@Akirathan
Copy link
Member

Recently I stepped into a failing test FAILED should perform Is_In efficiently for builtin types that measures Is_In operation and fails if it takes longer than 150 ms. These timing sensitive tests should not be part of our testing suite at all. If performance is important, let's keep it in benchmarks.

@Akirathan Akirathan added the -libs Libraries: New libraries to be implemented label Nov 1, 2023
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Nov 1, 2023
@jdunkerley jdunkerley assigned radeusgd and unassigned jdunkerley Nov 1, 2023
@jdunkerley
Copy link
Member

@radeusgd can we adjust this test to be a benchmark rather than a unit test.

@radeusgd
Copy link
Member

radeusgd commented Nov 2, 2023

fails if it takes longer than 150 ms

This is not entirely true - it measures performance relative to other operations - so it should work regardless of whether its run on a fast or slow machine. The tests have been there for a long time and were failing very rarely.

I think the benefit of this is that we get some kind of notification in case we regress on stuff like this. With benchmarks we don't have any kind of checking/notification, so unless we are tracking them regularly, we can very easily not notice such regressions - and since benchmarks are run only on already merged PRs we'd notice them after the regression is introduced anyway.

But I guess what is fair is that we have these tests for just a few cases and the performance of these operations is not really special in any way compared to all other operations.

Maybe we should try to send us a notification if a new benchmark run is slower by some threshold than the previous run? If we can keep track of this (without too many false positives), tests like this won't be needed indeed.

@radeusgd radeusgd moved this from ❓New to 🔧 Implementation in Issues Board Nov 2, 2023
@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Nov 2, 2023
@mergify mergify bot closed this as completed in #8212 Nov 8, 2023
mergify bot pushed a commit that referenced this issue Nov 8, 2023
…ension (#8212)

- Closes #5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes #8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants