[FEA] Run CI tests with injected OOMs #7813

Closed
abellina opened this issue Feb 23, 2023 · 0 comments · Fixed by #7925
Labels: reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

Comments

@abellina
Collaborator

The idea behind this task is to leverage our Python integration tests to trigger retries, most likely at random, though the injection could also be configured to happen deterministically. A test that is retried and still passes CI (where the output is the same as the CPU) is very valuable, and it stresses this code much better than the alternative of replicating unit tests that each touch a narrower scope.
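To illustrate the "random, but configurable to be deterministic" part, here is a minimal Python sketch. The function name `should_inject_oom` and its `mode`, `probability`, and `seed` parameters are all hypothetical and are not taken from the prototype; the only idea shown is that seeding the random choice from the test ID makes a given run reproducible.

```python
import random


def should_inject_oom(test_id, mode="random", probability=0.5, seed=0):
    """Decide whether a test should run with an injected OOM.

    Hypothetical helper: the names and modes here are assumptions made for
    illustration, not the plugin's actual configuration.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "random":
        # Seed a private generator from the seed and the test ID, so the same
        # seed always selects the same tests and a failure can be reproduced.
        rng = random.Random(f"{seed}:{test_id}")
        return rng.random() < probability
    raise ValueError(f"unknown mode: {mode}")
```

With a fixed `seed`, rerunning CI selects the same tests for injection; changing the seed gives a different but equally reproducible selection.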

I have prototyped this locally, but I can't open a PR yet because I need some hooks from RmmSpark, plus the thread association/disassociation that @revans2 is already plumbing into the retry framework. I am thinking about this in the context of code that uses withRetry (#7256), which gives us a natural entry point for reading a config that lets us selectively inject the retry exceptions. Code outside withRetry has no natural entry point that I can think of, and it wouldn't know how to retry anyway.
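As a rough analogy for why a retry wrapper is a natural injection point, here is a small Python sketch. It is not the plugin's withRetry and does not use RmmSpark; `InjectedOOM`, `with_retry`, and the `inject_oom` flag are hypothetical names, and the flag stands in for whatever config the real entry point would read.

```python
class InjectedOOM(MemoryError):
    """Hypothetical stand-in for a retryable OOM that is raised on purpose."""


def with_retry(fn, inject_oom=False, max_attempts=2):
    """Run fn(), retrying when a MemoryError is raised.

    When inject_oom is True, the first attempt raises InjectedOOM before fn()
    is called, which forces the retry path to run at least once.
    """
    injected = False
    for attempt in range(1, max_attempts + 1):
        try:
            if inject_oom and not injected:
                injected = True
                raise InjectedOOM("injected OOM for testing")
            return fn()
        except MemoryError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller


# Usage: the work runs once, after the injected failure has been retried.
calls = []
result = with_retry(lambda: calls.append("ran") or 42, inject_oom=True)
```

Because the wrapper owns both the injection and the retry loop, the wrapped code needs no changes, which is the property that code outside such a wrapper lacks.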

We need the tests to be very loud about when they are getting an injected OOM. What I have prototyped adds "INJECT_OOM" to the test name, so it should be really easy to tell that a failure is likely related to OOM handling. For example:

../../src/main/python/hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[32][INJECT_OOM]

vs

../../src/main/python/hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[33]
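The naming scheme shown above can be sketched as a small helper. `labelled_test_id` is a hypothetical name, and how the prototype actually hooks into test collection is not described here, so this only shows the string transformation.

```python
INJECT_OOM_TAG = "[INJECT_OOM]"


def labelled_test_id(test_id, inject_oom):
    """Append the INJECT_OOM tag to a test ID when an OOM will be injected.

    Hypothetical helper for illustration; untagged IDs are returned unchanged.
    """
    return test_id + INJECT_OOM_TAG if inject_oom else test_id


example = labelled_test_id(
    "hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[32]",
    inject_oom=True,
)
```

Keeping the tag as a suffix means the tagged tests can be found by searching the CI log for `INJECT_OOM`.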
@abellina added the labels feature request (New feature or request), ? - Needs Triage (Need team to review and classify), and reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) on Feb 23, 2023
@sameerz removed the labels feature request (New feature or request) and ? - Needs Triage (Need team to review and classify) on Feb 28, 2023