The idea behind this task is to leverage our Python integration tests to inject retries, most likely at random, though injection could also be configured to happen deterministically. A test that is retried and still passes CI (i.e. its output matches the CPU's) is very valuable: it stresses this code far more than the alternative, replicated unit tests that cover a narrower scope.
I have prototyped this locally, but can't really open a PR yet because I need some hooks from `RmmSpark` and the thread association/disassociation that @revans2 is already plumbing in with the retry framework. I am thinking about this in the context of code using `withRetry` (#7256). That gives us a natural entry point to read a config that selectively injects the retry exceptions. Code outside `withRetry` has no natural entry point that I can think of, and it wouldn't know how to retry anyway.
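To illustrate why a `withRetry`-style wrapper is a convenient place for injection, here is a minimal Python sketch, assuming the wrapper can see a config map and that the retryable failure is a dedicated exception type. `RetryOOM`, `with_retry`, and the `test.injectRetryOOM` key are all hypothetical; the real framework from #7256 is Scala and does more (e.g. spilling or splitting input), whereas this sketch only re-runs the same closure.

```python
import random
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


class RetryOOM(Exception):
    """Hypothetical stand-in for the retry OOM exception raised by the real framework."""


# Hypothetical config key; not an actual spark-rapids setting.
INJECT_CONF = "test.injectRetryOOM"


def with_retry(
    attempt: Callable[[], T],
    conf: Mapping[str, str],
    max_attempts: int = 3,
    rng: Optional[random.Random] = None,
) -> T:
    """Run `attempt`, retrying when it raises RetryOOM.

    Before the first attempt, a synthetic RetryOOM may be injected based on
    conf[INJECT_CONF]:
      "off"    -> never inject (default)
      "first"  -> always inject on the first attempt (deterministic)
      "random" -> inject on the first attempt with probability 0.5
    """
    if max_attempts < 2:
        raise ValueError("need at least two attempts to survive an injected OOM")
    rng = rng or random.Random()
    mode = conf.get(INJECT_CONF, "off")
    for i in range(max_attempts):
        try:
            inject = i == 0 and (
                mode == "first" or (mode == "random" and rng.random() < 0.5)
            )
            if inject:
                raise RetryOOM("INJECT_OOM: synthetic OOM on first attempt")
            return attempt()
        except RetryOOM:
            if i == max_attempts - 1:
                raise  # out of attempts: surface the OOM loudly
    raise AssertionError("unreachable")
```

With `"first"` the retry path is exercised on every run, so a passing test shows the retried result matches an uninjected run; `"random"` trades that determinism for broader coverage across many CI runs.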
We need the tests to be very loud when they hit an injected OOM. My prototype adds "INJECT_OOM" to the test name, so it should be easy to tell that a failure is likely related to OOM handling.
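One way to make injected runs stand out, assuming each integration test can run once normally and once with injection enabled, is to generate a second variant of every test whose name carries the `INJECT_OOM` tag. This is a sketch only: the helper `expand_with_oom_injection` and the `test.injectRetryOOM` config key are hypothetical, not part of the prototype described above.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

# Marker from the issue: any test run with injection carries it in its name.
INJECT_TAG = "INJECT_OOM"
# Hypothetical config key used to switch injection on; not a real setting.
INJECT_CONF = "test.injectRetryOOM"


def expand_with_oom_injection(
    test_names: Iterable[str], base_conf: Mapping[str, str], mode: str = "random"
) -> List[Tuple[str, Dict[str, str]]]:
    """Return a plain and an OOM-injected (name, conf) case for every test."""
    cases: List[Tuple[str, Dict[str, str]]] = []
    for name in test_names:
        # Plain run: base config untouched.
        cases.append((name, dict(base_conf)))
        # Injected run: same config plus the injection switch, tagged name.
        injected = dict(base_conf)
        injected[INJECT_CONF] = mode
        cases.append((f"{name}[{INJECT_TAG}]", injected))
    return cases
```

With pytest, the generated names could be passed through the `ids` argument of `pytest.mark.parametrize`, so the tag appears directly in the failure report.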