Improve ct_master #8946

essen · 2024-10-16T09:32:03Z

At RabbitMQ we have started using ct_master to greatly speed up our test runs in CI, moving away from an approach that cached test results and guessed which tests had to run again, to an approach that runs all tests but with greater concurrency. (We changed approaches due to circumstances beyond our control, not based on technical merit, but that's a story for another time.)

Greater concurrency here means running multiple test suites at the same time on a single machine (the same one that ct_master runs on).
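As a rough illustration of this setup, a test spec along the following lines assigns different suites to different worker nodes on the same host. This is a sketch only: the node names, suite names, and directories are hypothetical and are not taken from the RabbitMQ configuration.

```erlang
%% parallel.spec: hypothetical test spec for ct_master.
%% All node names, suite names and paths below are illustrative assumptions.

%% Two worker nodes, both on the machine that runs ct_master.
{node, worker1, 'ct_worker1@localhost'}.
{node, worker2, 'ct_worker2@localhost'}.

%% Where the workers and ct_master itself write their logs.
{logdir, all_nodes, "logs"}.
{logdir, master, "logs"}.

%% Each worker runs its own suite, so the two suites execute concurrently.
{suites, worker1, "test", [first_SUITE]}.
{suites, worker2, "test", [second_SUITE]}.
```

Such a spec would then be handed to `ct_master:run("parallel.spec")`, assuming the worker nodes are reachable under those names.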

With ct_master we were quickly able to run all test suites in the rabbit application (https://github.com/rabbitmq/rabbitmq-server/tree/main/deps/rabbit/test), and there are a lot of them, in under 30 minutes on our development machines.

We pushed forward and applied the same principles to CI for our two biggest applications and were able to cut down the run time to around 13 minutes using 4 workers for rabbit, and 1 worker for rabbitmq_mqtt, both using ct_master. Our other applications are smaller and have not needed this treatment applied to them.

I am also interested in making this parallel execution a feature of Erlang.mk at a later time, when the needed functionality is available in OTP directly.

While it is functional, ct_master is in a fairly bad state, and this PR aims to improve that. It includes a fix for #8911 as well as additional functionality. Most of the changes are not controversial, although the last two commits may be:

ct_master: Return results from ct_master:run: This is a breaking change. But I seriously doubt there's even one user of ct_master other than us (and we are using a forked module).
ct_master: Print auto-skipped and failed test cases: This uses the builtin event handler, which wasn't doing much before, so perhaps that's not wanted.

I also noted that the ct_master_status module appears to be completely unused; I am happy to add a commit to delete it.

The equivalent RabbitMQ PR is at rabbitmq/rabbitmq-server#12502 and one of the comments there links to a few test runs that use ct_master with all changes included in our PR here.

Note that I have not run (or added to!) OTP's CT tests at this point, hoping to get some feedback on the more controversial points, and whether master should be the target or if maint would be OK.

cc @lhoguin

ct_master: Return results from ct_master:run

Breaking change: instead of returning just `ok` to indicate that the spec file was handled, we return an OK tuple with the results of the tests (number of successful, failed, user-skipped and auto-skipped tests). This allows the caller to know whether any test error occurred.
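To illustrate why a caller benefits from this, here is a minimal sketch of a helper that turns such a result into a pass/fail decision. The exact shape of the tuple is an assumption: the sketch borrows the `{Ok, Failed, {UserSkipped, AutoSkipped}}` layout that `ct:run_test/1` returns and wraps it in `{ok, ...}`; the real return value of `ct_master:run` may differ. The module and function names are hypothetical.

```erlang
-module(ct_master_result_sketch).
-export([passed/1]).

%% ASSUMPTION: the result is {ok, {Ok, Failed, {UserSkipped, AutoSkipped}}}.
%% This layout is borrowed from ct:run_test/1 and is not confirmed
%% to be what ct_master:run returns.
-spec passed(term()) -> boolean().
passed({ok, {_Ok, 0, {_UserSkipped, 0}}}) ->
    %% No failed and no auto-skipped test cases: treat the run as passed.
    true;
passed(_Other) ->
    %% Any failure, auto-skip, or unexpected result counts as not passed.
    false.
```

A CI script could use the boolean to pick the exit code, which a bare `ok` return did not allow.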

github-actions · 2024-10-16T09:32:56Z

CT Test Results

2 files 58 suites 1h 24m 23s ⏱️
451 tests 438 ✅ 12 💤 1 ❌
487 runs 471 ✅ 15 💤 1 ❌

For more details on these failures, see this check.

Results for commit fd46027.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

essen · 2024-10-16T13:05:59Z

The test failure is because of the breaking change mentioned earlier.

essen added 7 commits October 16, 2024 09:54

ct_master: Don't refresh logs at the end of run

8be946d

The ct_run:run_test function already takes care of the node's logs. The ct_master_logs module takes care of ct_master itself.

ct_master: Fix the master_runs.html css file paths

9ca4b7a

Needed to file:set_cwd like in normal CT.

ct_master: Fix a small artefact in master_runs.html

63af8ed

ct_master: Handle all testspec instructions

87e5575

Before this commit the CT docs were lying as ct_master only handled a small number of testspec instructions. This commit fixes that.

ct_master: Sort the results printout from ct_master

47c98c1

It makes more sense to sort by node name than to have the results in the order they finished.
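The idea can be sketched as follows, assuming the results are collected as `{Node, Result}` pairs in completion order. That representation and the names used here are assumptions for illustration, not the data structure ct_master uses internally.

```erlang
-module(sort_results_sketch).
-export([by_node/1]).

%% ASSUMPTION: each entry is a {Node, Result} pair, appended as nodes finish.
%% Sorting on the first element gives a stable, node-name-ordered printout.
-spec by_node([{node(), term()}]) -> [{node(), term()}].
by_node(Results) ->
    lists:keysort(1, Results).
```

With this ordering, two runs of the same spec print their per-node results in the same order regardless of which node finished first.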

ct_master: Print auto-skipped and failed test cases

fd46027

At the end of a ct_master run. This uses the builtin CT Master event handler to gather the results.

lhoguin mentioned this pull request Oct 16, 2024

4.1/main: Make CI: Fix and enhance ct_master rabbitmq/rabbitmq-server#12502

Merged

u3s self-assigned this Oct 16, 2024

u3s added the team:PS (Assigned to OTP team PS) label Oct 16, 2024

IngelaAndin added the stalled (waiting for input by the Erlang/OTP team) and priority:low labels Dec 3, 2024
