[FEA] Run CI tests with injected OOMs #7813

abellina · 2023-02-23T23:05:38Z

The idea behind this task is to use our Python integration tests to exercise retries. The injection would likely be random, but it could be configured to happen deterministically. A retried test that passes CI (where the output is the same as the CPU's) is very valuable and stresses this code much better than the alternative, which is replicated unit tests that touch a narrower scope.

I have prototyped this locally, but I can't open a PR yet because I need some hooks from RmmSpark, plus the thread association/disassociation that @revans2 is already plumbing in with the retry framework. I am thinking about this in the context of code using withRetry (#7256), which gives us a natural entry point where a config can be read to selectively inject the retry exceptions. Code outside withRetry doesn't have a natural entry point that I can think of, and it wouldn't know how to retry anyway.
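To make the idea concrete, here is a minimal sketch in Python of a retry wrapper that consults an injector before running the wrapped work. It is not the plugin's code: `RetryOOM`, `OomInjector`, `with_retry`, and every parameter name below are hypothetical stand-ins, and the real hooks would come from RmmSpark and the retry framework.

```python
import random


class RetryOOM(Exception):
    """Hypothetical stand-in for the retry exception raised on an OOM."""


class OomInjector:
    """Decides whether to inject an OOM (all names here are assumptions).

    probability: chance of injecting on the first attempt (random mode).
    seed:        fixes the random stream so a run can be reproduced.
    force_first: if True, always inject on the first attempt (deterministic mode).
    """

    def __init__(self, probability=0.0, seed=None, force_first=False):
        self._rng = random.Random(seed)
        self.probability = probability
        self.force_first = force_first
        self.injected = 0  # how many OOMs were injected so far

    def maybe_inject(self, attempt):
        # Only inject on the first attempt so that the retry can succeed.
        if attempt != 0:
            return
        if self.force_first or self._rng.random() < self.probability:
            self.injected += 1
            raise RetryOOM("injected OOM")


def with_retry(fn, injector, max_attempts=3):
    """Run fn, retrying when a RetryOOM (real or injected) is raised."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            # The wrapper is the natural place to read the config and inject.
            injector.maybe_inject(attempt)
            return fn()
        except RetryOOM:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the failure
```

With `force_first=True` every wrapped call sees exactly one injected failure followed by a successful retry, which is the deterministic mode. Setting `probability` and `seed` instead gives the random mode while keeping a run reproducible.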

We need the tests to be very loud when they get an injected OOM. What I have prototyped appends "INJECT_OOM" to the test name, so it should be really easy to figure out that a failure is likely related to OOM handling.

../../src/main/python/hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[32][INJECT_OOM]

vs

../../src/main/python/hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[33]
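The naming scheme above could be sketched roughly as follows. This is only an illustration under assumptions, not the prototype itself: `should_inject`, `tagged_id`, and the `mode`, `seed`, and `ratio` parameters are hypothetical names. A stable hash is used so that the same seed always selects the same tests.

```python
import zlib

INJECT_TAG = "[INJECT_OOM]"


def should_inject(nodeid, mode="random", seed=0, ratio=0.5):
    """Decide whether a test gets an injected OOM (hypothetical helper).

    mode "always" / "never" force the decision; any other mode picks
    roughly `ratio` of the tests, stable for a given seed and test id.
    """
    if mode == "always":
        return True
    if mode == "never":
        return False
    # crc32 is stable across processes, unlike Python's built-in hash().
    bucket = zlib.crc32(f"{seed}:{nodeid}".encode("utf-8")) % 1000
    return bucket < ratio * 1000


def tagged_id(nodeid, inject):
    """Append the loud marker to the test id when an OOM will be injected."""
    return nodeid + INJECT_TAG if inject else nodeid


if __name__ == "__main__":
    name = "hash_aggregate_test.py::test_hash_reduction_decimal_overflow_sum[32]"
    print(tagged_id(name, should_inject(name, mode="always")))
```

In a pytest setup, a helper like this would presumably be called from a collection hook such as `pytest_collection_modifyitems`, but how the prototype actually renames the tests is not stated here.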


abellina added the "feature request", "? - Needs Triage", and "reliability" labels Feb 23, 2023

abellina mentioned this issue Feb 23, 2023

[FEA] Avoid memory over usage on GPU nodes in the SparkPlan #7252

Closed

7 tasks

sameerz removed the "feature request" and "? - Needs Triage" labels Feb 28, 2023

mattahrens assigned abellina Mar 10, 2023

abellina mentioned this issue Mar 22, 2023

Inject RetryOOM in CI where retry iterator is used #7925

Merged

abellina closed this as completed in #7925 Mar 30, 2023
