Move Complement test matrix jobs definition to match Sytest and Trial. #14153

realtyem · 2022-10-12T10:40:02Z

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Pull request includes a sign off
Code style is correct
(run the linters)

In preparation for Faster joins, and further worker mode development, let's move configuration of the test options to the same place as Sytest and Trial and avoid configuration fragmentation.

Currently, the list of workers used by Complement for testing is defined in start_for_complement.sh, which is the entrypoint for that Docker image. I propose to locate this list in calculate_jobs.py instead. At this time, this is a drop-in replacement for existing code; no workers are removed or added.

I created a backwards-compatibility mode for running complement.sh on the command line so that existing functionality is preserved. Using WORKERS=1 without setting WORKER_TYPES still uses the values originally defined in start_for_complement.sh.
Signed-off-by: Jason Little [email protected]
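
The fallback described above can be sketched as follows. This is a minimal illustration in Python, not the actual shell logic in start_for_complement.sh; the function name `resolve_worker_types` and the contents of `DEFAULT_WORKER_TYPES` are hypothetical placeholders chosen for the example.

```python
from typing import List, Mapping

# Hypothetical stand-in for the list hard-coded in start_for_complement.sh.
# The real list is not reproduced here.
DEFAULT_WORKER_TYPES = ["federation_inbound", "federation_sender", "synchrotron"]


def resolve_worker_types(env: Mapping[str, str]) -> List[str]:
    """Return the worker types to start for a Complement run."""
    if not env.get("WORKERS"):
        # Monolith mode: WORKERS is unset or empty, so no workers are started.
        return []
    requested = env.get("WORKER_TYPES", "")
    if not requested:
        # Backwards-compatibility mode: WORKERS is set but WORKER_TYPES is not,
        # so fall back to the built-in default list.
        return list(DEFAULT_WORKER_TYPES)
    # Explicit override: split the comma-delimited string, ignoring blanks.
    return [name.strip() for name in requested.split(",") if name.strip()]
```

With this shape, `WORKERS=1` alone keeps the old behaviour, while `WORKERS=1 WORKER_TYPES=...` acts as an override.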

SYNAPSE_WORKER_TYPES is now passed through, so don't need that. Don't need to set it as an empty string if not running with workers, as that's done earlier. Add a little extra logging at the top to make sure it actually makes it all the way through.

docker/complement/conf/start_for_complement.sh

realtyem · 2022-10-12T21:31:01Z

scripts-dev/complement.sh

@@ -140,7 +140,7 @@ if [[ -n "$WORKERS" ]]; then
  export PASS_SYNAPSE_COMPLEMENT_USE_WORKERS=true

  # Pass through the workers defined. If none, it will be an empty string
-  export PASS_SYNAPSE_WORKER_TYPES="$SYNAPSE_WORKER_TYPES"
+  export PASS_SYNAPSE_WORKER_TYPES="$WORKER_TYPES"


This way it matches the variables being declared on the command line and in the environment at the same time. Standardization is good, redundancy is bad.

realtyem · 2022-10-16T10:01:11Z

As part of this, I was looking at the part of Synapse that parses the worker configuration and does some sanity checking before setting up the worker for operation. It looks like the things that are capable of being sharded,

* Federation Senders
* Pushers

need to be added to their special maps whether there is more than one of them or not. This could be a leftover relic of not being migrated to the generic worker app.

* pushers need pusher_instances
* federation senders need federation_sender_instances

IIRC, there is more than one way to declare whether master should handle these tasks. If I'm following this correctly (using federation senders as an example; the pusher logic is virtually identical):

* set global send_federation to true if it doesn't exist
* if federation_sender_instances has content
  *  use that
* else federation_sender_instances is empty
  * if global send_federation is true then use master
  * if using the deprecated worker_app 'synapse.app.federation_sender'
    * if send_federation is true error out
    * set worker_name into federation_sender_instances
* set worker's send_federation to true if this worker's name is in federation_sender_instances
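
The steps above can be sketched in Python as follows. This mirrors the summary in this comment rather than the Synapse source itself; the function names, the `ConfigError` class defined here, and the use of `"master"` as the main process's name are assumptions made for the illustration.

```python
from typing import Any, List, Mapping, Optional

LEGACY_FEDERATION_SENDER_APP = "synapse.app.federation_sender"


class ConfigError(Exception):
    """Raised when the legacy and modern options contradict each other."""


def federation_sender_names(
    config: Mapping[str, Any],
    worker_app: Optional[str] = None,
    worker_name: str = "master",
) -> List[str]:
    """Work out which instances are responsible for sending federation."""
    # The legacy flag is treated as true when it is absent.
    send_federation = config.get("send_federation", True)

    instances = config.get("federation_sender_instances")
    if instances:
        # The special map has content, so use it as-is.
        return list(instances)

    resolved: List[str] = []
    if send_federation:
        # Nothing declared: the main process handles the task.
        resolved = ["master"]
    if worker_app == LEGACY_FEDERATION_SENDER_APP:
        if send_federation:
            # The deprecated app requires the legacy flag to be switched off.
            raise ConfigError("send_federation must be false for the legacy app")
        resolved = [worker_name]
    return resolved


def should_send_federation(
    config: Mapping[str, Any],
    worker_app: Optional[str] = None,
    worker_name: str = "master",
) -> bool:
    """A process sends federation only if its name is in the resolved list."""
    return worker_name in federation_sender_names(config, worker_app, worker_name)
```

Under these assumptions, a worker listed in `federation_sender_instances` is selected without `send_federation` being set anywhere.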

So basically, you don't need to set send_federation (or start_pushers) at all any more if you are adding the worker instance to its special map. I guess we should mark those as deprecated but backward compatible?
Event persisters are special in that they don't have a special map. As long as they are part of stream_writers and instance_map they need no other configuration. It also appears that you can only have a single stream writer of each kind no matter how large your homeserver is. How does matrix.org handle this? Surely that's overwhelming?

I'll be testing this out shortly.

-Edit: Moving this thought/question into a separate PR

realtyem · 2022-10-17T06:44:54Z

Please sanity check me

DMRobertson

I'm not sure about this one, and I don't fully understand the motivation here.

Firstly: we moved the sytest and trial job definitions to a python script because we wanted to run a limited subset of them on PRs, but the full set on develop and release branches. We haven't made a similar change yet for complement (and this change doesn't seem to do so).

Secondly: I expect changing the complement configuration to spawn more individual workers is going to make it slower for complement tests to run. It's already the longest CI job as it is---I'm not sure if we can afford this. (At least, not on every PR.)

Thirdly: this PR is tricky to review. It changes two things at once: moving the matrix definition; then adding new jobs to that matrix. It's sometimes okay to change multiple things in one PR, but if so it's best to keep the commit history as a sequence of independent, self-contained changes---that's not the case here.

(One can clean up commit history after the fact using an interactive rebase---but we ask that contributors don't do this after requesting review, as it conflicts with GitHub's review tools.)

I've left some thoughts, but I think it would be best to proceed by breaking this up into smaller PRs.
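
To illustrate the first point above (a limited subset of jobs on PRs, the full set on develop and release branches), here is a minimal sketch in Python. It is not the contents of calculate_jobs.py; the function names and the job dictionaries are hypothetical. The only external fact relied on is that GitHub Actions sets `GITHUB_REF` to a value beginning with `refs/pull/` for pull request runs.

```python
from typing import Dict, List


def is_pull_request(github_ref: str) -> bool:
    """Pull request runs have a ref of the form refs/pull/<number>/merge."""
    return github_ref.startswith("refs/pull/")


def complement_jobs(is_pr: bool) -> List[Dict[str, str]]:
    """Return a job matrix: a small subset for PRs, everything otherwise."""
    # Hypothetical matrix entries; the keys and values are illustrative only.
    jobs = [{"arrangement": "monolith", "database": "SQLite"}]
    if not is_pr:
        # The slower configurations only run on develop and release branches.
        jobs.append({"arrangement": "monolith", "database": "Postgres"})
        jobs.append({"arrangement": "workers", "database": "Postgres"})
    return jobs
```

A matrix built this way would address the CI-time concern only if the worker-heavy jobs sit behind the `is_pr` check.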

DMRobertson · 2022-10-28T12:56:27Z

docs/development/contributing_guide.md

@@ -323,7 +323,8 @@ COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestImportHistoric
 The above will run a monolithic (single-process) Synapse with SQLite as the database. For other configurations, try:

 - Passing `POSTGRES=1` as an environment variable to use the Postgres database instead.
- Passing `WORKERS=1` as an environment variable to use a workerised setup instead. This option implies the use of Postgres.
+- Passing `WORKERS=1` as an environment variable to use a set of workers that mirrors what is used in Sytest. This option implies the use of Postgres.
+  - If setting `WORKERS=1`, optionally set `WORKER_TYPES=` to declare which worker types you wish to test. A simple comma-delimited string containing the worker types defined from the template in [here](https://github.com/matrix-org/synapse/blob/develop/docker/configure_workers_and_start.py). A safe example would be `WORKER_TYPES="federation_inbound,federation_sender,synchrotron,"`. See the [worker documentation](https://matrix-org.github.io/synapse/latest/workers.html) for additional information on workers.


I think you don't want to have a trailing comma after synchrotron?

Nice catch, that's right; it needs to not be there.
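
A short illustration of why the trailing comma matters, assuming the value is split on commas (the parsing code is not shown in this thread, so both helper functions below are hypothetical): a naive split produces an empty final entry, which would then be treated as a worker type unless it is filtered out.

```python
from typing import List


def split_naive(raw: str) -> List[str]:
    """Split on commas without any filtering."""
    return raw.split(",")


def split_filtered(raw: str) -> List[str]:
    """Split on commas, dropping blank entries such as a trailing one."""
    return [name.strip() for name in raw.split(",") if name.strip()]


example = "federation_inbound,federation_sender,synchrotron,"
# The naive version ends with an empty string because of the trailing comma.
naive = split_naive(example)
# The filtered version contains only the three real worker types.
filtered = split_filtered(example)
```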

DMRobertson · 2022-10-28T13:09:10Z

.github/workflows/tests.yml

+          set -o pipefail
+          COMPLEMENT_DIR=`pwd`/complement synapse/scripts-dev/complement.sh -json 2>&1 | synapse/.ci/scripts/gotestfmt
+
+  back-compat:


Why wouldn't this be included under the complement job above as part of the matrix?

It's also not clear what this is testing backwards compatibility with?

realtyem · 2022-10-28T20:35:03Z

This was one of my first tries at a PR and I've learned a lot since then. For some reason, the commit history includes a completely different PR, which didn't get documented here. This WAS two separate PRs. #14202

> Firstly: we moved the sytest and trial job definitions to a python script because we wanted to run a limited subset of them on PRs, but the full set on develop and release branches. We haven't made a similar change yet for complement (and this change doesn't seem to do so).

That is completely fair and is something I didn't know. Now I know how to look up old PRs correctly (for instance, I found the one you speak of in #13713).

> Secondly: I expect changing the complement configuration to spawn more individual workers is going to make it slower for complement tests to run. It's already the longest CI job as it is---I'm not sure if we can afford this. (At least, not on every PR.)

This PR wasn't supposed to spawn more workers than are currently used. I didn't know it would blend the commit history with another PR just by adding a Needs line in the description (which is the only way I can think of that this happened). The functionality here was only meant as a drop-in replacement, and no additional workers are spawned.

> Thirdly: this PR is tricky to review. It changes two things at once: moving the matrix definition; then adding new jobs to that matrix. It's sometimes okay to change multiple things in one PR, but if so it's best to keep the commit history as a sequence of independent, self-contained changes---that's not the case here.

This is absolutely true and, in hindsight, a good idea. I'm thinking I'll break the backwards-compatibility mode out into a PR first, so individual workers can be tested as an override. Then a second PR will move the matrix definition to calculate_jobs.py. The extra job that is run now is to be reverted (commit 7528511 is purely a test to show that the backwards-compatibility mode works as intended).

> (One can clean up commit history after the fact using an interactive rebase---but we ask that contributors don't do this after requesting review, as it conflicts with GitHub's review tools.)

I probably won't ever use that, as I didn't know it existed until now 😁

> I've left some thoughts, but I think it would be best to proceed by breaking this up into smaller PRs.

Thanks for the time, I completely agree. This got way messier than it was supposed to.

realtyem added 11 commits October 12, 2022 02:27

Duplicate worker types defined in start_for_complement.sh into calcul…

d7192cd

…ate_jobs.py Keep the set of workers defined as-is. Use string concatenation to form the string to not be messy. Perhaps alphabetize later? Make sure to accommodate monolith mode and database types.

Modify tests.yml to use the new job matrix.

5847f41

While there, make use of env to reduce an obnoxiously long commandline and do a little housecleaning.

Pass the SYNAPSE_WORKER_TYPES through to the Complement tests.

db446cc

Changelog.

84399a3

Merge branch 'develop' into realtyem/complement-move-workers-def-to-o…

2f9f79b

…ne-place

Adjust for PR matrix-org#14028 being merged.

6be11b6

Fix executable permission on start_for_complement.sh.

a246dda

Don't actually need to put the SYNAPSE_ prefix on env variables.

01992aa

Update docs and include example.

785cf2e

Merge branch 'develop' into realtyem/complement-move-workers-def-to-o…

a4306d1

…ne-place

realtyem commented Oct 12, 2022

realtyem added 3 commits October 15, 2022 21:38

Merge branch 'develop' into realtyem/complement-move-workers-def-to-o…

a5de665

…ne-place

Create backwards compatibility use for WORKERS=1 in start_for_complem…

079c5d9

…ent.sh. Restore original condition, then hide it behind an empty string condition. Add a bit of logging so you know it's working.

[REVERT THIS] Create quick test to make sure it all works as intended.

7528511

realtyem mentioned this pull request Oct 16, 2022

Add all Stream Writer worker types to configure_workers_and_start.py #14197

Merged

4 tasks

realtyem added a commit to realtyem/synapse-old that referenced this pull request Oct 17, 2022

Need matrix-org#14153 from this point, adjust names cuz pretty.

3aa6c96

realtyem added a commit to realtyem/synapse-old that referenced this pull request Oct 17, 2022

Make adjustments that are in line with requirements from PR matrix-or…

7d11c58

…g#14153.

realtyem mentioned this pull request Oct 17, 2022

Setup extended Complement Testing #14202

Closed

4 tasks

realtyem added 2 commits October 17, 2022 00:07

How did those get in there?

6846a1c

Update to docs.

80afb8a

realtyem marked this pull request as ready for review October 17, 2022 06:44

realtyem requested a review from a team as a code owner October 17, 2022 06:44

realtyem added 4 commits October 21, 2022 15:44

Merge branch 'develop' into realtyem/complement-move-workers-def-to-o…

eca4595

…ne-place

It's an underscore, not a hyphen

b16d4b3

Seriously? Editing on Github directly changes the executable permissi…

6c8675c

…on?.

Merge branch 'develop' into realtyem/complement-move-workers-def-to-o…

720f55e

…ne-place

realtyem mentioned this pull request Oct 26, 2022

Modernize configure_workers_and_start.py bootstrapping script for Dockerfile-workers. #14294

Merged

8 tasks

DMRobertson suggested changes Oct 28, 2022

realtyem closed this Oct 28, 2022

realtyem deleted the realtyem/complement-move-workers-def-to-one-place branch November 27, 2022 08:42
