
Flaky test detection #2083

Closed
jan-goral opened this issue Jul 12, 2021 · 3 comments · Fixed by #2121

@jan-goral
Contributor

jan-goral commented Jul 12, 2021

Motivation

Flaky test detection is not supported by the `am instrument` command, so a custom solution is required for this problem.

Goal

Flank detects flaky tests and reports them in `JUnitReport.xml`.

Design

Flaky test detection requires running a test several times; if the collected results are not homogeneous, the test is flaky.

Flaky tests cannot be predicted at the sharding level, so depending on the details of the detection algorithm, the overall execution time can increase.
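The homogeneity check described above can be sketched as follows. This is illustrative only; `Result` and `isFlaky` are hypothetical names, not Flank APIs:

```kotlin
// Hypothetical sketch: a test is considered flaky when repeated runs disagree.
enum class Result { PASSED, FAILED }

// Mixed PASSED and FAILED outcomes => results are not homogeneous => flaky.
fun isFlaky(results: List<Result>): Boolean =
    results.distinct().size > 1

fun main() {
    println(isFlaky(listOf(Result.PASSED, Result.FAILED, Result.PASSED))) // true
    println(isFlaky(listOf(Result.FAILED, Result.FAILED)))                // false
}
```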

Drawback

The impact on overall duration is the biggest drawback of this feature, because execution time is one of the most important benefits of parallel test execution.

Options

  • num-flaky-test-attempts: Rerun failed only - Rerunning only the failed tests can save time, but it also has a smaller chance of detecting flaky tests with a high success ratio.
  • num-test-runs: Rerun all - Depending on the rerun count, this option will at least double the overall execution time, but it also increases the chance of detecting flaky tests with a high success ratio.
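The two options can be sketched as follows. The option names are the real flags discussed in this issue, but the code is a hypothetical illustration; `runTest` stands in for an actual test execution:

```kotlin
// Hypothetical sketch of the two rerun strategies.

// num-flaky-test-attempts: rerun only the tests that failed the first run.
fun rerunFailedOnly(
    firstRun: Map<String, Boolean>,          // test name -> passed?
    attempts: Int,
    runTest: (String) -> Boolean
): Map<String, List<Boolean>> =
    firstRun.filterValues { passed -> !passed }
        .mapValues { (name, _) -> List(attempts) { runTest(name) } }

// num-test-runs: rerun the whole suite a fixed number of times.
fun rerunAll(
    tests: List<String>,
    runs: Int,
    runTest: (String) -> Boolean
): Map<String, List<Boolean>> =
    tests.associateWith { name -> List(runs) { runTest(name) } }
```

The trade-off is visible in the shapes: `rerunFailedOnly` touches only the failing subset, while `rerunAll` multiplies the whole suite by the run count.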

Devices

With the Rerun failed only option, the decision to rerun a test can occur at any time during execution, so it is important to implement proper device management to reduce the impact on overall duration.

Always rerunning a test on the same device may not be efficient in some cases. For example, when all tests in a shard turn out to be flaky, this can greatly increase the overall duration because of a single shard.

Preparation

The device needs to have the app and test APKs installed before the test run. If tests are dispatched dynamically, this also requires dynamic APK installation and dynamic device creation. This could be solved using queues (channels) for events to dispatch and available devices.

NOTE: The APK upload and installation time could also be taken into account in sharding calculations.
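The queue-based idea could look roughly like this. A real implementation would likely use coroutine channels; this stdlib-only sketch uses simple queues to show the shape, and all names (`TestJob`, `installApks`, etc.) are hypothetical:

```kotlin
// Hypothetical sketch: one queue of test jobs to dispatch, one queue of ready
// devices. APKs are installed lazily, the first time a device picks up a job.
data class TestJob(val testCase: String)
data class Device(val id: String, var apksInstalled: Boolean = false)

fun dispatch(
    jobs: ArrayDeque<TestJob>,
    ready: ArrayDeque<Device>,
    installApks: (Device) -> Unit,           // dynamic APK installation
    run: (Device, TestJob) -> Unit
) {
    while (jobs.isNotEmpty() && ready.isNotEmpty()) {
        val device = ready.removeFirst()
        if (!device.apksInstalled) {
            installApks(device)              // only on the device's first job
            device.apksInstalled = true
        }
        run(device, jobs.removeFirst())
        ready.addLast(device)                // device returns to the pool
    }
}
```

Because installation happens inside the dispatch loop, its cost naturally lands on the first job of each device, which is why the note above suggests including it in sharding calculations.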

@bootstraponline
Contributor

  • Rerun failed only

the ability to rerun only failed tests is a feature FTL users have been asking for. :)

Ideally num-flaky-test-attempts on Corellium would retry only the failed individual tests.

Rerun all

I don't think users want to rerun successful tests, that's expensive and slow. num-test-runs is how we support running a test suite an arbitrary number of times to find problems (similar to load testing).

@jan-goral
Contributor Author

@bootstraponline

  • Rerun failed only

the ability to rerun only failed tests is a feature FTL users have been asking for. :)

Ideally num-flaky-test-attempts on Corellium would retry only the failed individual tests.

* [Test case based retries #778](https://github.com/Flank/flank/issues/778)

This is exactly how I will implement this feature. Each failed test recognized during test execution will be immediately dispatched to run on the first compatible idle device from the device pool.
Additionally, this feature could also provide an option to specify a number of devices dedicated only to rerunning flaky tests, which can speed up execution in some cases.
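The dispatch rule described here could be sketched as below. The types are hypothetical, and a real compatibility check would consider more than an API level (ABI, OS version, etc.):

```kotlin
// Hypothetical sketch: dispatch a failed test to the first compatible idle
// device; compatibility is modeled here as a minimum API level only.
data class PoolDevice(val id: String, val apiLevel: Int, val busy: Boolean)

fun firstIdleCompatible(pool: List<PoolDevice>, minApi: Int): PoolDevice? =
    pool.firstOrNull { device -> !device.busy && device.apiLevel >= minApi }
```

A dedicated rerun pool, as suggested above, would simply be a second such list consulted before the shared one.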

Rerun all

I don't think users want to rerun successful tests, that's expensive and slow. num-test-runs is how we support running a test suite an arbitrary number of times to find problems (similar to load testing).

I agree this is expensive to run and not very useful for most cases. But I guess the implementation effort will be small enough to make it a nice-to-have feature for finding flaky tests with a high pass ratio.

@bootstraponline
Contributor

Sounds good. When possible let's keep the semantics the same so users get the same behavior regardless of backend.

@jan-goral jan-goral self-assigned this Jul 19, 2021
@jan-goral jan-goral changed the title from Flaky test detection (DRAFT) to Flaky test detection Jul 19, 2021
jan-goral added a commit that referenced this issue Jul 23, 2021
Related to #2083

* Expose function for wrapping state into context in ContextProvider.
* Add a generic function for creating a state property with static type checking.
* Improve DeadlockError message.
* Remove redundant data value from DependenciesError messages.
mergify bot pushed a commit that referenced this issue Jul 23, 2021
Related to #2083 

Expose num-flaky-test-attempts CLI option in `:corellium:cli` module.

## Checklist

- [x] Documented
- [x] Unit tested
mergify bot pushed a commit that referenced this issue Jul 30, 2021
Related to #2083

## Test Plan
> How do we know the code works?

From repository root, build flank:
```
. .env
flankScripts assemble flank -d
```
Run tests:
```
flank corellium test android run -c="./test_configs/flank-corellium-many.yml"
```
The execution should finish without errors. The generated `JUnitReport.xml` should contain 2 types of suites prefixed with:
* `shard` - for each shard's execution.
* `rerun` - for failed test reruns. 
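Given those suite prefixes, flaky tests can be identified by cross-referencing the two suite types. This is an illustrative sketch, not the actual Flank report code:

```kotlin
// Hypothetical sketch: a test that failed in a `shard` suite but passed in a
// `rerun` suite behaved inconsistently, so it can be marked flaky.
data class Case(val suiteName: String, val testName: String, val passed: Boolean)

fun flakyTests(cases: List<Case>): Set<String> {
    val failedInShard = cases
        .filter { it.suiteName.startsWith("shard") && !it.passed }
        .map { it.testName }.toSet()
    val passedInRerun = cases
        .filter { it.suiteName.startsWith("rerun") && it.passed }
        .map { it.testName }.toSet()
    return failedInShard intersect passedInRerun
}
```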

## New execution graph

Core execution.

![TestAndroid.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid-execute.puml)

Device sub-execution triggered for each shard or rerun by the `Device.Tests` task.

![TestAndroid.Device.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid_Device-execute.puml)


## Checklist

- [x] Documented
- [x] Unit tested