
Flaky test detection #2083

Closed
jan-goral opened this issue Jul 12, 2021 · 3 comments · Fixed by #2121

@jan-goral
Contributor

jan-goral commented Jul 12, 2021

Motivation

Flaky test detection is not supported by the `am instrument` command, so a custom solution is required for this problem.

Goal

Flank detects flaky tests and reports them in `JUnitReport.xml`.

Design

Flaky test detection requires running a test several times; if the collected results are not homogeneous, the test is flaky.

Flaky tests cannot be predicted at the sharding level, so depending on the details of the detection algorithm, the overall execution time can increase.
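The homogeneity check described above can be sketched as follows. This is illustrative only; `Result` and `isFlaky` are hypothetical names, not Flank APIs:

```kotlin
// Hypothetical sketch: a test is considered flaky when repeated runs disagree.
enum class Result { PASSED, FAILED }

// Mixed PASSED and FAILED outcomes => results are not homogeneous => flaky.
fun isFlaky(results: List<Result>): Boolean =
    results.distinct().size > 1

fun main() {
    println(isFlaky(listOf(Result.PASSED, Result.FAILED, Result.PASSED))) // true
    println(isFlaky(listOf(Result.FAILED, Result.FAILED)))                // false
}
```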

Drawback

The impact on overall duration is the biggest drawback of this feature, because execution time is one of the most important benefits of parallel test execution.

Options

  • num-flaky-test-attempts: Rerun failed only - Rerunning only the failed tests can save time, but it also has a smaller chance of detecting flaky tests with a high success ratio.
  • num-test-runs: Rerun all - Depending on the rerun count, this option will at least double the overall execution time, but it also increases the chance of detecting flaky tests with a high success ratio.
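The two options can be sketched as follows. The option names are the real flags discussed in this issue, but the code is a hypothetical illustration; `runTest` stands in for an actual test execution:

```kotlin
// Hypothetical sketch of the two rerun strategies.

// num-flaky-test-attempts: rerun only the tests that failed the first run.
fun rerunFailedOnly(
    firstRun: Map<String, Boolean>,          // test name -> passed?
    attempts: Int,
    runTest: (String) -> Boolean
): Map<String, List<Boolean>> =
    firstRun.filterValues { passed -> !passed }
        .mapValues { (name, _) -> List(attempts) { runTest(name) } }

// num-test-runs: rerun the whole suite a fixed number of times.
fun rerunAll(
    tests: List<String>,
    runs: Int,
    runTest: (String) -> Boolean
): Map<String, List<Boolean>> =
    tests.associateWith { name -> List(runs) { runTest(name) } }
```

The trade-off is visible in the shapes: `rerunFailedOnly` touches only the failing subset, while `rerunAll` multiplies the whole suite by the run count.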

Devices

With the Rerun failed only option, the decision to rerun a test can occur at any time during execution, so it is important to implement proper device management to reduce the impact on overall duration.

Always rerunning a test on the same device may not be efficient in some cases. For example, when all tests in a shard turn out to be flaky, this can greatly increase the overall duration because of a single shard.

Preparation

The device needs to have the app and test APKs installed before the test run. If tests are dispatched dynamically, this also requires dynamic APK installation and dynamic device creation. This could be solved using queues (channels) for events to dispatch and available devices.

NOTE: The APK upload and installation time could also be taken into account in sharding calculations.
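The queue-based idea could look roughly like this. A real implementation would likely use coroutine channels; this stdlib-only sketch uses simple queues to show the shape, and all names (`TestJob`, `installApks`, etc.) are hypothetical:

```kotlin
// Hypothetical sketch: one queue of test jobs to dispatch, one queue of ready
// devices. APKs are installed lazily, the first time a device picks up a job.
data class TestJob(val testCase: String)
data class Device(val id: String, var apksInstalled: Boolean = false)

fun dispatch(
    jobs: ArrayDeque<TestJob>,
    ready: ArrayDeque<Device>,
    installApks: (Device) -> Unit,           // dynamic APK installation
    run: (Device, TestJob) -> Unit
) {
    while (jobs.isNotEmpty() && ready.isNotEmpty()) {
        val device = ready.removeFirst()
        if (!device.apksInstalled) {
            installApks(device)              // only on the device's first job
            device.apksInstalled = true
        }
        run(device, jobs.removeFirst())
        ready.addLast(device)                // device returns to the pool
    }
}
```

Because installation happens inside the dispatch loop, its cost naturally lands on the first job of each device, which is why the note above suggests including it in sharding calculations.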

@bootstraponline
Contributor

  • Rerun failed only

the ability to rerun only failed tests is a feature FTL users have been asking for. :)

Ideally num-flaky-test-attempts on Corellium would retry only the failed individual tests.

Rerun all

I don't think users want to rerun successful tests, that's expensive and slow. num-test-runs is how we support running a test suite an arbitrary number of times to find problems (similar to load testing).

@jan-goral
Contributor Author

@bootstraponline

  • Rerun failed only

the ability to rerun only failed tests is a feature FTL users have been asking for. :)

Ideally num-flaky-test-attempts on Corellium would retry only the failed individual tests.

* [Test case based retries #778](https://github.com/Flank/flank/issues/778)

This is exactly how I will implement this feature. Each failed test recognized during test execution will be immediately dispatched to run on the first compatible idle device from the device pool.
Additionally, this feature could also provide an option to specify a number of devices dedicated only to rerunning flaky tests, which can speed up execution in some cases.
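The dispatch rule described here could be sketched as below. The types are hypothetical, and a real compatibility check would consider more than an API level (ABI, OS version, etc.):

```kotlin
// Hypothetical sketch: dispatch a failed test to the first compatible idle
// device; compatibility is modeled here as a minimum API level only.
data class PoolDevice(val id: String, val apiLevel: Int, val busy: Boolean)

fun firstIdleCompatible(pool: List<PoolDevice>, minApi: Int): PoolDevice? =
    pool.firstOrNull { device -> !device.busy && device.apiLevel >= minApi }
```

A dedicated rerun pool, as suggested above, would simply be a second such list consulted before the shared one.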

Rerun all

I don't think users want to rerun successful tests, that's expensive and slow. num-test-runs is how we support running a test suite an arbitrary number of times to find problems (similar to load testing).

I agree this is expensive to run and not very useful for most cases. But I guess the implementation effort will be small enough to make it a nice-to-have feature for finding flaky tests with a high pass ratio.

@bootstraponline
Contributor

Sounds good. When possible let's keep the semantics the same so users get the same behavior regardless of backend.

@jan-goral jan-goral self-assigned this Jul 19, 2021
@jan-goral jan-goral changed the title from Flaky test detection (DRAFT) to Flaky test detection Jul 19, 2021
jan-goral added a commit that referenced this issue Jul 23, 2021
Related to #2083

* Expose function for wrapping state into context in ContextProvider.
* Add a generic function for creating a state property with static type checking.
* Improve DeadlockError message.
* Remove redundant data value from DependenciesError messages.
mergify bot pushed a commit that referenced this issue Jul 23, 2021
Related to #2083 

Expose num-flaky-test-attempts CLI option in `:corellium:cli` module.

## Checklist

- [x] Documented
- [x] Unit tested
mergify bot pushed a commit that referenced this issue Jul 30, 2021
Related to #2083

## Test Plan
> How do we know the code works?

From repository root, build flank:
```
. .env
flankScripts assemble flank -d
```
Run tests:
```
flank corellium test android run -c="./test_configs/flank-corellium-many.yml"
```
The execution should finish without errors. The generated `JUnitReport.xml` should contain 2 types of suites prefixed with:
* `shard` - for each shard's execution.
* `rerun` - for failed test reruns. 
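Given those suite prefixes, flaky tests can be identified by cross-referencing the two suite types. This is an illustrative sketch, not the actual Flank report code:

```kotlin
// Hypothetical sketch: a test that failed in a `shard` suite but passed in a
// `rerun` suite behaved inconsistently, so it can be marked flaky.
data class Case(val suiteName: String, val testName: String, val passed: Boolean)

fun flakyTests(cases: List<Case>): Set<String> {
    val failedInShard = cases
        .filter { it.suiteName.startsWith("shard") && !it.passed }
        .map { it.testName }.toSet()
    val passedInRerun = cases
        .filter { it.suiteName.startsWith("rerun") && it.passed }
        .map { it.testName }.toSet()
    return failedInShard intersect passedInRerun
}
```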

## New execution graph

Core execution.

![TestAndroid.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid-execute.puml)

Device sub-execution triggered for each shard or rerun by the `Device.Tests` task.

![TestAndroid.Device.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid_Device-execute.puml)


## Checklist

- [x] Documented
- [x] Unit tested