Flaky test detection #2083
The ability to rerun only failed tests is a feature FTL users have been asking for. :) Ideally users wouldn't rerun successful tests; that's expensive and slow.
This is exactly how I will implement this feature. Each failed test recognized during test execution will be immediately dispatched to run on the first compatible idle device from the device pool.
I agree this is expensive to run and not really useful for most cases. But I guess the implementation effort will be small enough to make it a nice-to-have feature for finding flaky tests with a high success ratio.
Sounds good. When possible, let's keep the semantics the same so users get the same behavior regardless of backend.
Related to #2083
* Expose function for wrapping state into context in ContextProvider.
* Add generic function for creating state property with static type checking.
* Improve DeadlockError message.
* Remove redundant data value from DependenciesError messages.
Related to #2083
Expose num-flaky-test-attempts CLI option in `:corellium:cli` module.
## Checklist
- [x] Documented
- [x] Unit tested
Related to #2083

## Test Plan
> How do we know the code works?

From the repository root, build flank:
```
. .env
flankScripts assemble flank -d
```
Run tests:
```
flank corellium test android run -c="./test_configs/flank-corellium-many.yml"
```
The execution should finish without errors. The generated `JUnitReport.xml` should contain 2 types of suites prefixed with:
* `shard` - for each shard's execution.
* `rerun` - for failed test reruns.

## New execution graph
Core execution.
![TestAndroid.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid-execute.puml)
Device sub-execution triggered for each shard or rerun by the `Device.Tests` task.
![TestAndroid.Device.execute](http://www.plantuml.com/plantuml/proxy?cache=no&fmt=svg&src=https://raw.githubusercontent.com/Flank/flank/2083_test_dispatch_flow/corellium/domain/TestAndroid_Device-execute.puml)

## Checklist
- [x] Documented
- [x] Unit tested
Motivation
Flaky test detection is not supported by the `am instrument` command, so a custom solution is required for this problem.
Goal
Flank detects flaky tests and reports them in `JUnitReport.xml`.
Design
Flaky test detection requires running a test several times; if the collected results are not homogeneous, the test is flaky, as illustrated below.
Flaky tests cannot be predicted at the sharding level, so depending on the detection algorithm details, the overall execution can take longer.
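As a minimal illustration of this rule (the type and function below are hypothetical, not Flank's actual API):

```kotlin
// Hypothetical result type for illustration; Flank's real test model differs.
enum class TestResult { PASSED, FAILED }

// A test is flaky when repeated runs of it disagree,
// i.e. the collected results are not homogeneous.
fun isFlaky(results: List<TestResult>): Boolean =
    results.distinct().size > 1
```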
Drawback
Impact on overall duration is the biggest drawback of this feature, because short execution time is one of the most important benefits of parallel test execution.
Options
`num-flaky-test-attempts`: Rerun failed only - Rerunning only the failed tests can save time, but it also has a smaller chance of detecting flaky tests that pass most of the time (a high success ratio).
`num-test-runs`: Rerun all - Depending on the rerun count, this option increases the overall execution time at least twofold, but it also increases the chance of detecting flaky tests with a high success ratio. A sketch of the difference follows.
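A rough sketch of how the two options could differ in what gets rescheduled (all names here are illustrative, mirroring the CLI flags rather than Flank's internals):

```kotlin
// Hypothetical options holder; fields mirror the CLI flag names.
data class RerunOptions(
    val numFlakyTestAttempts: Int = 0, // rerun failed tests only
    val numTestRuns: Int = 1,          // run the whole suite this many times
)

// Sketch: decide which tests to schedule for the next round.
fun nextRound(
    options: RerunOptions,
    fullRunsDone: Int,   // completed runs of the whole suite
    rerunsDone: Int,     // completed rerun attempts of failed tests
    allTests: List<String>,
    failedTests: List<String>,
): List<String> = when {
    fullRunsDone < options.numTestRuns -> allTests           // "rerun all"
    rerunsDone < options.numFlakyTestAttempts -> failedTests // "rerun failed only"
    else -> emptyList()
}
```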
Devices
With the Rerun failed only option, the decision to rerun a test can occur at any time during the execution, so it is important to implement proper device management to reduce the impact on the overall duration. Always rerunning a test on the same device may be inefficient; for example, when all tests in a shard turn out to be flaky, a single shard can significantly increase the overall duration.
Preparation
The device needs to have the app and test APKs installed before the test run. If tests are dispatched dynamically, this also requires dynamic APK installation and dynamic device creation. This could be solved using queues (channels) for the events to dispatch and for the available devices, as sketched below.
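A minimal sketch of that queue-based dispatch using Kotlin coroutine channels (all types and names here are hypothetical, not Flank's Corellium domain API):

```kotlin
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Hypothetical types for illustration only.
data class Device(val id: String)
data class TestTask(val testName: String)

// Tests to (re)run arrive on one channel, idle devices on another;
// the dispatcher pairs them as both become available, so a rerun can
// start on the first idle device instead of waiting for its original shard.
suspend fun dispatch(
    tasks: Channel<TestTask>,
    idleDevices: Channel<Device>,
    runTest: suspend (Device, TestTask) -> Unit,
): Unit = coroutineScope {
    for (task in tasks) {                  // suspends until a task is dispatched
        val device = idleDevices.receive() // suspends until a device idles
        launch {
            runTest(device, task)          // install APKs if needed, then run
            idleDevices.send(device)       // return the device to the pool
        }
    }
}
```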
NOTE: The APK upload and installation time could also be taken into account in sharding calculations.
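For instance, a hypothetical estimate (not Flank's actual sharding code) could simply add the install overhead to the shard's test time:

```kotlin
// Hypothetical sketch: include upload + installation overhead when
// estimating how long a shard will occupy a device.
fun estimatedShardDuration(
    testDurations: List<Double>, // expected seconds per test in the shard
    apkInstallTime: Double,      // upload + installation overhead per device
): Double = apkInstallTime + testDurations.sum()
```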