KAFKA-18223 Flaky test report script #17938

santhoshct · 2024-11-25T10:37:58Z

Summary

This pull request introduces a new script, develocity_reports.py, that generates detailed reports on flaky tests. It uses the Develocity API to fetch and analyze test results, focusing on quarantined tests with high failure rates. The goal is to help developers and CI quickly identify problematic tests that need attention, improving the overall quality of the codebase.

Changes

New Script Addition: Introduced develocity_reports.py to the .github/scripts directory.
Functionality:
- Fetches test results from the Develocity API for a specified project and test type.
- Analyzes test outcomes to identify flaky and failed tests.
- Generates reports highlighting high-priority quarantined tests based on failure rates and quarantine duration.
- Provides detailed timelines and statistics for each test and test case.
Logging: Integrated logging to track the script's execution, with exceptions handled gracefully.
Configuration: Allows configuration of API base URL, authentication token, project name, and thresholds for quarantine and failure rates.
Output: Produces a console report summarizing the most problematic tests, including detailed statistics and recent execution timelines.
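A minimal sketch of how the configurable thresholds could drive the selection of high-priority quarantined tests. The class names, field names, and default threshold values below are hypothetical and only illustrate the idea; the real script's configuration may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportConfig:
    # Hypothetical fields mirroring the configuration options listed above.
    api_base_url: str
    token: str
    project: str
    quarantine_threshold_days: int = 7
    min_failure_rate: float = 0.10  # fraction in [0, 1]


@dataclass
class QuarantinedTest:
    name: str
    quarantined_days: int
    failure_rate: float  # fraction in [0, 1]


def select_high_priority(
    tests: List[QuarantinedTest],
    config: ReportConfig,
) -> List[QuarantinedTest]:
    """Keep tests quarantined long enough and failing often enough, worst first."""
    selected = [
        t for t in tests
        if t.quarantined_days >= config.quarantine_threshold_days
        and t.failure_rate >= config.min_failure_rate
    ]
    # Highest failure rate first; ties broken by the longest quarantine.
    return sorted(
        selected,
        key=lambda t: (t.failure_rate, t.quarantined_days),
        reverse=True,
    )
```

Sorting on a `(failure_rate, quarantined_days)` tuple keeps the ranking stable and puts long-quarantined tests ahead when two tests fail equally often.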

Updates

Added support for two more report types: flaky test regressions and tests that can be cleared from quarantine.
Added support for caching build info in the GitHub Actions cache. This speeds up report generation: instead of pulling build info for the entire date range every time, the script only pulls the delta date range.
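A minimal sketch of the delta-range idea, assuming the cache records a single high-water mark (`cached_until`) up to which all builds have already been fetched. The function name and parameters are hypothetical, not the script's actual API.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def delta_range(
    now: datetime,
    lookback: timedelta,
    cached_until: Optional[datetime],
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) window that still has to be fetched, or None.

    Hypothetical helper: assumes every build up to `cached_until` is already cached.
    """
    requested_start = now - lookback
    if cached_until is None or cached_until <= requested_start:
        # Cache is empty or too old to help: fetch the whole range.
        return requested_start, now
    if cached_until >= now:
        # Cache already covers the full range.
        return None
    # Only fetch builds newer than the cached high-water mark.
    return cached_until, now
```

With a 7-day lookback and a cache that is one day old, only the last day of builds would be requested.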

Testing

Tested manually. Example output:

org.apache.kafka.tiered.storage.integration.OffloadAndTxnConsumeFromLeaderTest
==============================================================================
Quarantined for 14 days
Container Failure Rate: 14.51%
Recent Failure Rate: 14.51%

Container Statistics:
  Total Runs: 1013
  Failed: 4
  Flaky: 143
  Passed: 866

Container Recent Executions:
  Date/Time (UTC)      Outcome    Build ID
  ------------------------------------------------
  2024-11-25 02:51  passed     5kqy57pu3uwxs
  2024-11-25 03:18  passed     rgjbtcmlfk7so
  2024-11-25 03:18  passed     hhko6esalsqco
  2024-11-25 03:30  passed     i35nqmpusibpw
  2024-11-25 03:31  flaky      jddx23jdksg5m

Test Cases (Last 7 Days):
  ------------------------------------------------

  → executeTieredStorageTest(String, String)[1]
    Failure Rate: 10.92%
    Runs: 476 | Failed:   0 | Flaky:  52 | Passed: 424

    Recent Executions:
    Date/Time (UTC)      Outcome    Build ID
    --------------------------------------------
    2024-11-25 03:18  passed     hhko6esalsqco
    2024-11-25 03:30  passed     i35nqmpusibpw
    2024-11-25 03:31  flaky      jddx23jdksg5m

  → executeTieredStorageTest(String, String)[2]
    Failure Rate: 5.25%
    Runs: 476 | Failed:   0 | Flaky:  25 | Passed: 451

    Recent Executions:
    Date/Time (UTC)      Outcome    Build ID
    --------------------------------------------
    2024-11-25 03:18  passed     hhko6esalsqco
    2024-11-25 03:30  passed     i35nqmpusibpw
    2024-11-25 03:31  passed     jddx23jdksg5m
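The failure rates in the example output are consistent with treating both failed and flaky runs as failures: (4 + 143) / 1013 = 14.51% for the test class, and 52 / 476 = 10.92% for the first test case. This is inferred from the numbers above, not confirmed from the script; the `OutcomeCounts` class is a hypothetical sketch.

```python
from dataclasses import dataclass


@dataclass
class OutcomeCounts:
    # Hypothetical container for the per-test statistics printed in the report.
    failed: int
    flaky: int
    passed: int

    @property
    def total(self) -> int:
        return self.failed + self.flaky + self.passed

    def failure_rate(self) -> float:
        """Percentage of runs that were not a clean pass (failed or flaky)."""
        if self.total == 0:
            return 0.0
        return 100.0 * (self.failed + self.flaky) / self.total


# Test class statistics from the example output above.
stats = OutcomeCounts(failed=4, flaky=143, passed=866)
print(f"Total Runs: {stats.total}")                  # Total Runs: 1013
print(f"Failure Rate: {stats.failure_rate():.2f}%")  # Failure Rate: 14.51%
```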

Testing of the new report types:

Summary for PR:
==============

1. Flaky Test Regressions
-------------------------
No flaky test regressions found.

2. Cleared Tests (Ready for Unquarantine)
----------------------------------------
Several tests show consistent passing behavior:
- org.apache.kafka.clients.producer.KafkaProducerTest (99.02% success, 410 runs)
- kafka.network.DynamicConnectionQuotaTest (98.58% success, 422 runs)
- kafka.api.SslConsumerTest (98.82% success, 422 runs)
- kafka.api.SaslSslConsumerTest (99.05% success, 422 runs)
- org.apache.kafka.connect.integration.OffsetsApiIntegrationTest (84.60% success, 422 runs)

3. Quarantined Tests Analysis
----------------------------
Test: org.apache.kafka.tiered.storage.integration.OffloadAndTxnConsumeFromLeaderTest

Key Metrics:
- Quarantined for: 7 days
- Overall Failure Rate: 14.21%
- Total Runs: 366 (Failed: 4, Flaky: 48, Passed: 314)

Test Cases Analysis:
1. executeTieredStorageTest[1]:
   - Failure Rate: 11.20%
   - Distribution: 366 runs (2 Failed, 39 Flaky, 325 Passed)

2. executeTieredStorageTest[2]: 
   - Failure Rate: 6.28%
   - Distribution: 366 runs (0 Failed, 23 Flaky, 343 Passed)

Detailed logs and complete test history are available in the attached report file.

Test Analysis Report (2024-12-03 08:23:37 UTC).txt
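A minimal sketch of picking candidates for unquarantine, assuming the success rate is passed runs divided by total runs and that a minimum number of runs is required before a test is trusted. Both thresholds and the function name are hypothetical; the report above does not state which thresholds the script uses.

```python
from typing import Dict, List, Tuple


def unquarantine_candidates(
    stats: Dict[str, Tuple[int, int]],
    min_success_rate: float = 80.0,
    min_runs: int = 100,
) -> List[Tuple[str, float, int]]:
    """Return (test class, success %, runs) for tests that look stable enough.

    `stats` maps a test class name to (passed runs, total runs).
    """
    candidates = []
    for name, (passed, total) in stats.items():
        if total < min_runs:
            continue  # Not enough runs to judge stability yet.
        success = 100.0 * passed / total
        if success >= min_success_rate:
            candidates.append((name, success, total))
    # Most stable tests first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

Requiring a minimum run count avoids clearing a test that happened to pass a handful of times in a row.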

Updated report with a summary section:

Test Analysis Report (2024-12-11 11:47:27 UTC)
====================================================================================================

Summary of Most Problematic Tests
==================================================

org.apache.kafka.clients.consumer.internals.ConsumerHeartbeatRequestManagerTest
  → testUnsupportedVersion()                                     100.00%

org.apache.kafka.connect.integration.OffsetsApiIntegrationTest
  → testGetSinkConnectorOffsets()                                50.00%
  → testAlterSinkConnectorOffsetsDifferentKafkaClusterTargeted() 9.93%
  → testGetSinkConnectorOffsetsDifferentKafkaClusterTargeted()   6.14%
  → testResetSinkConnectorOffsets()                              5.23%
  → testResetSinkConnectorOffsetsOverriddenConsumerGroupId()     5.05%
  → testAlterSinkConnectorOffsetsOverriddenConsumerGroupId()     0.72%

org.apache.kafka.tiered.storage.integration.TransactionsWithTieredStoreTest
  → testReadCommittedConsumerShouldNotSeeUndecidedData(String, String)[2] 29.96%
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[1] 22.43%
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[2] 20.40%
  → testBumpTransactionalEpochWithTV2Disabled(String, String, boolean)[1] 17.69%

kafka.api.TransactionsTest
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[1] 22.43%
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[2] 20.04%
  → testBumpTransactionalEpochWithTV2Disabled(String, String, boolean)[1] 9.39%

kafka.api.PlaintextConsumerTest
  → testCloseLeavesGroupOnInterrupt(String, String)[2]           22.38%
  → testCoordinatorFailover(String, String)[2]                   2.17%
  → testCloseLeavesGroupOnInterrupt(String, String)[1]           1.62%
  → testCoordinatorFailover(String, String)[1]                   0.90%

kafka.coordinator.group.CoordinatorPartitionWriterTest
  → testDeleteRecordsResponseContainsError()                     14.29%
  → testDeleteRecordsSuccess()                                   14.29%

==================================================

Detailed Test Reports
====================================================================================================

Flaky Test Regressions
--------------------------------------------------
No flaky test regressions found.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)


mumrah

@santhoshct thanks for working on this! This is an excellent start 👍

The report is very detailed, which is great, but can we also include a summary at the top? For example, in the report I just ran it would be great to see something like this for the worst flaky tests:

org.apache.kafka.message.checker.MetadataSchemaCheckerToolTest
  → testVerifyEvolutionGit()  83.33%

org.apache.kafka.tiered.storage.integration.TransactionsWithTieredStoreTest
  → testReadCommittedConsumerShouldNotSeeUndecidedData(String, String)[2] 46.17%
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[1]  ...
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[2]  ...
  → testBumpTransactionalEpochWithTV2Disabled(String, String, boolean)[1] ...
  → testBumpTransactionalEpochWithTV2Disabled(String, String, boolean)[2] ...

kafka.api.TransactionsTest
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[1]  26.09%
  → testBumpTransactionalEpochWithTV2Enabled(String, String, boolean)[2]  ... 
  → testBumpTransactionalEpochWithTV2Disabled(String, String, boolean)[1] ...
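A minimal sketch of building the requested summary: group test cases by test class, rank classes by their worst test case, and list each class's cases from highest to lowest failure rate. The function name, input shape, and `top_classes` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_summary(
    cases: Iterable[Tuple[str, str, float]],
    top_classes: int = 5,
) -> str:
    """Group (test class, test case, failure %) rows by class, worst first."""
    by_class: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for class_name, case_name, rate in cases:
        by_class[class_name].append((case_name, rate))

    # Rank classes by their worst test case and keep only the top few.
    ranked = sorted(
        by_class.items(),
        key=lambda item: max(rate for _, rate in item[1]),
        reverse=True,
    )[:top_classes]

    lines = []
    for class_name, class_cases in ranked:
        lines.append(class_name)
        width = max(len(name) for name, _ in class_cases)
        for name, rate in sorted(class_cases, key=lambda c: c[1], reverse=True):
            # Pad names so the percentages line up within each class.
            lines.append(f"  → {name.ljust(width)} {rate:.2f}%")
        lines.append("")
    return "\n".join(lines).rstrip("\n")
```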

mumrah · 2024-12-04T19:57:39Z

.github/scripts/develocity_reports.py

@@ -0,0 +1,863 @@
+import os


Need a license here. See other scripts for an example.

Added the license

mumrah · 2024-12-04T19:57:54Z

.github/scripts/requirements.txt

@@ -1,3 +1,4 @@
+<<<<<<< HEAD


Looks like leftovers from a merge conflict

mumrah · 2024-12-04T20:01:05Z

.github/scripts/develocity_reports.py

+        """
+        return f'project:{project} buildStartTime:[{chunk_start.isoformat()} TO {chunk_end.isoformat()}] gradle.requestedTasks:{test_type}'
+
+    def process_chunk(self, chunk_start: datetime, chunk_end: datetime, project: str, 


For this and other methods with many arguments, use the following PEP-8 style:

def process_chunk(
    self,
    chunk_start: datetime,
    chunk_end: datetime,
    project: str,
    test_type: str,
    remaining_build_ids: set,
    max_builds_per_request: int
) -> Dict[str, BuildInfo]:

Corrected this.
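A minimal sketch of the chunked query in the PEP 8 multi-line signature style. `build_query` reproduces the query format from the line quoted in the diff above; splitting the date range into fixed-size windows with `iter_chunks` is an assumption based on the `chunk_start`/`chunk_end` parameter names, and both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def build_query(
    project: str,
    chunk_start: datetime,
    chunk_end: datetime,
    test_type: str,
) -> str:
    # Same query format as the line quoted in the diff above.
    return (
        f"project:{project} "
        f"buildStartTime:[{chunk_start.isoformat()} TO {chunk_end.isoformat()}] "
        f"gradle.requestedTasks:{test_type}"
    )


def iter_chunks(
    start: datetime,
    end: datetime,
    chunk_size: timedelta,
) -> Iterator[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of at most chunk_size."""
    current = start
    while current < end:
        chunk_end = min(current + chunk_size, end)
        yield current, chunk_end
        current = chunk_end
```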

mumrah · 2024-12-04T20:09:48Z

.github/scripts/develocity_reports.py

+                reverse=True
+            )
+
+            print(f"\nFound {len(sorted_tests)} high-priority quarantined test containers:")


The count here should be the number of flaky test cases rather than flaky test classes (containers).

Also, we should not use the term "container" in the report since it's somewhat confusing. As far as I know, for our purposes a container is always a test class.

Changed this to "test class".
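A small sketch of reporting the number of flaky test cases alongside the number of test classes, as the review suggests. The function name, input shape, and message wording are hypothetical.

```python
from typing import Dict, List


def count_flaky(flaky_cases_by_class: Dict[str, List[str]]) -> str:
    """Summarize flaky test cases rather than only counting test classes."""
    num_classes = sum(1 for cases in flaky_cases_by_class.values() if cases)
    num_cases = sum(len(cases) for cases in flaky_cases_by_class.values())
    return (
        f"Found {num_cases} high-priority quarantined test cases "
        f"across {num_classes} test classes"
    )
```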

mumrah · 2024-12-04T20:11:25Z

.github/scripts/develocity_reports.py

+
+                        # Show test case timeline
+                        if test_case.timeline:
+                            print("\n    Recent Executions:")


For "Recent" things, let's indicate how far back we're showing in the output.

Added info about the runs.
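A minimal sketch of a "Recent Executions" table whose header states how far back it goes (run count and oldest timestamp shown). The column layout follows the example output earlier in this PR; the function name, `max_rows` default, and header wording are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def format_recent_executions(
    timeline: List[Tuple[datetime, str, str]],
    max_rows: int = 5,
) -> str:
    """Render the last `max_rows` (time, outcome, build ID) runs with their time span."""
    recent = sorted(timeline)[-max_rows:]
    if not recent:
        return "Recent Executions: none"
    oldest = recent[0][0]
    lines = [
        f"Recent Executions (last {len(recent)} runs, since {oldest:%Y-%m-%d %H:%M} UTC):",
        f"  {'Date/Time (UTC)':<18} {'Outcome':<10} Build ID",
        "  " + "-" * 44,
    ]
    for when, outcome, build_id in recent:
        lines.append(f"  {when:%Y-%m-%d %H:%M}   {outcome:<10} {build_id}")
    return "\n".join(lines)
```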

2. Corrected the method signature to PEP 8 style.
3. Added the license to the script.
4. Changed the Develocity-specific term "container" to the more generic "test class".
5. Added more information to the recent executions output to make it more descriptive.

mumrah

@santhoshct I've run it locally and the output looks great! I'm going to go ahead and merge this so we can let people start trying it out.

Adds a python script to generate a detailed flaky test report using the Develocity API

Reviewers: David Arthur <[email protected]>

santhoshct added 5 commits November 25, 2024 15:49

added script to generate reports for flaky tests. starting with defective tests

68c3f9f

removed unnecessary comments

9f57613

added test timeline for priority quarantined tests

5b13055

added test case level details with timeline

5a566c5

changed log level to info

24d8c4d

github-actions bot added the build Gradle build or GitHub Actions label Nov 25, 2024

added support for local cache and github cache to pull build info. added support for reporting types flaky test regression and clear tests from quarantine

712ae27

mumrah added the ci-approved label Dec 4, 2024

mumrah reviewed Dec 4, 2024


mumrah changed the title ~~KIP 1090 - Reporting integration with Develocity API~~ KAFKA-18223 Flaky test report script Dec 12, 2024

mumrah approved these changes Dec 12, 2024


mumrah merged commit 5bb1ea4 into apache:trunk Dec 12, 2024
15 checks passed

mumrah mentioned this pull request Dec 13, 2024

KAFKA-18223 Add GHA to run report [2/n] #18170

Merged

tedyu pushed a commit to tedyu/kafka that referenced this pull request Jan 6, 2025

KAFKA-18223 Flaky test report script (apache#17938)

72e282b

