This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

CI run against Twisted trunk is failing #14589

Closed
github-actions bot opened this issue Dec 1, 2022 · 5 comments
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@github-actions
Contributor

github-actions bot commented Dec 1, 2022

See https://github.com/matrix-org/synapse/actions/runs/4132151123

@DMRobertson
Contributor

$ grep ❌ ~/logs
2022-12-01T08:47:45.1397595Z ##[group]❌ TestFederatedClientSpaces (1m34.03s)
2022-12-01T08:47:48.7297968Z ##[group]❌ TestFederationKeyUploadQuery (1m28.3s)
2022-12-01T08:47:48.7300127Z ##[group]❌ TestFederationKeyUploadQuery/Parallel (0s)
2022-12-01T08:47:48.7300745Z ##[group]❌ TestFederationKeyUploadQuery/Parallel/Can_query_remote_device_keys_using_POST (30.02s)
2022-12-01T08:47:51.6025531Z ##[group]❌ TestFederationRoomsInvite (11m24.79s)
2022-12-01T08:47:51.6029201Z ##[group]❌ TestFederationRoomsInvite/Parallel (31.54s)
2022-12-01T08:47:51.6029838Z ##[group]❌ TestFederationRoomsInvite/Parallel/Invited_user_can_reject_invite_over_federation (30.74s)
2022-12-01T08:47:51.6030924Z ##[group]❌ TestFederationRoomsInvite/Parallel/Invited_user_can_reject_invite_over_federation_for_empty_room (30.77s)
2022-12-01T08:47:51.6031878Z ##[group]❌ TestFederationRoomsInvite/Parallel/Invited_user_can_reject_invite_over_federation_several_times (30.67s)
2022-12-01T08:47:51.6032883Z ##[group]❌ TestFederationRoomsInvite/Parallel/Invited_user_has_'is_direct'_flag_in_prev_content_after_joining (31.54s)
2022-12-01T08:47:51.6033797Z ##[group]❌ TestFederationRoomsInvite/Parallel/Remote_invited_user_can_see_room_metadata (30.68s)
2022-12-01T08:48:00.2516403Z ##[group]❌ TestRemotePresence (4m21.84s)
2022-12-01T08:48:00.2518648Z ##[group]❌ TestRemotePresence/Presence_changes_are_also_reported_to_remote_room_members (30.01s)
2022-12-01T08:48:00.2519625Z ##[group]❌ TestRemotePresence/Presence_changes_to_UNAVAILABLE_are_reported_to_remote_room_members (30.01s)
2022-12-01T08:48:04.7405122Z ##[group]❌ TestRestrictedRoomsRemoteJoin (1m35.77s)
2022-12-01T08:48:07.7478223Z ##[group]❌ TestRestrictedRoomsRemoteJoin/Join_should_fail_when_left_allowed_room (10ms)
2022-12-01T08:48:07.7479412Z ##[group]❌ TestRestrictedRoomsRemoteJoin/Join_should_succeed_when_joined_to_allowed_room (30.46s)
2022-12-01T08:48:07.7480658Z ##[group]❌ TestRestrictedRoomsRemoteJoinInMSC3787Room (1m36.71s)
2022-12-01T08:48:10.7830475Z ##[group]❌ TestRestrictedRoomsRemoteJoinInMSC3787Room/Join_should_fail_when_left_allowed_room (10ms)
2022-12-01T08:48:10.7831752Z ##[group]❌ TestRestrictedRoomsRemoteJoinInMSC3787Room/Join_should_succeed_when_joined_to_allowed_room (30.45s)
$ grep 'because we already tried to pull recently (backing off)' ~/logs | wc -l
198746

Lots of errors of the form

  2022-12-01T08:48:00.2519946Z     federation_presence_test.go:39: CSAPI.DoFunc response returned error: Get "http://127.0.0.1:49242/_matrix/client/v3/sync?timeout=0": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Guessing that there was some kind of network connectivity problem between the containers and the complement binary? Will re-run the test and see if the failure is reproducible.
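
To make the pattern in the grep output above easier to see, here is a minimal sketch that groups failed subtests under their top-level test. It assumes the log line shape shown above (`##[group]❌ <name> (<duration>)`); the function name `failed_tests_by_parent` is hypothetical and not part of Complement or Synapse.

```python
import re

# Matches lines such as:
#   2022-12-01T08:47:45.1397595Z ##[group]❌ TestFederatedClientSpaces (1m34.03s)
FAILED_LINE = re.compile(r"##\[group\]❌ (?P<name>\S+) \((?P<duration>[^)]+)\)")


def failed_tests_by_parent(lines):
    """Map each failed top-level test to a list of (subtest, duration) pairs."""
    grouped = {}
    for line in lines:
        match = FAILED_LINE.search(line)
        if match is None:
            continue  # not a failed-test line
        parent, _, subtest = match.group("name").partition("/")
        failures = grouped.setdefault(parent, [])
        if subtest:
            failures.append((subtest, match.group("duration")))
    return grouped


if __name__ == "__main__":
    import sys

    # Usage (assumed): python failed_tests.py < ~/logs
    for parent, subtests in failed_tests_by_parent(sys.stdin).items():
        print(parent)
        for subtest, duration in subtests:
            print(f"    {subtest} ({duration})")
```

Grouped this way, it should be easier to check whether the leaf subtests cluster around the same duration, which would fit the client timeout errors quoted above.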

@DMRobertson DMRobertson self-assigned this Dec 1, 2022
@DMRobertson DMRobertson added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Dec 1, 2022
@squahtx
Contributor

squahtx commented Dec 1, 2022

$ grep 'because we already tried to pull recently (backing off)' ~/logs | wc -l
198746

That suggests that during the blueprint setup, one of the servers in the blueprint decided it couldn't contact the other. And then every test that spawned that blueprint failed?

@squahtx
Contributor

squahtx commented Dec 1, 2022

When hs2 starts, it immediately resumes "Syncing state for room !rgmgoHtwqMRpgzvgCK:hs1 via hs1", which means it was shut down while it was only partially joined to the room in the blueprint. Previously, we blocked until fully joined when sending messages into partial state rooms, so the join would complete. Now we no longer block, so the join may not complete during the blueprint phase.
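
The race described above can be illustrated with a toy asyncio model. This is only a sketch of the timing argument, not Synapse's implementation: `ToyRoom`, `resync_state`, `run_blueprint` and the 0.05s delay are all hypothetical names and values chosen for illustration.

```python
import asyncio


class ToyRoom:
    """Hypothetical stand-in for a room that is joined in two phases."""

    def __init__(self):
        self.fully_joined = asyncio.Event()

    async def resync_state(self, delay):
        # Simulates the background work that completes a partial join.
        await asyncio.sleep(delay)
        self.fully_joined.set()


async def run_blueprint(block_on_send):
    """Return True if the join completed before the server was 'shut down'."""
    room = ToyRoom()
    resync = asyncio.create_task(room.resync_state(delay=0.05))

    # Sending a message: the old behaviour waits for the full join first,
    # the new behaviour returns immediately.
    if block_on_send:
        await room.fully_joined.wait()

    # Blueprint setup ends here and the server is shut down.
    completed = room.fully_joined.is_set()
    resync.cancel()
    try:
        await resync
    except asyncio.CancelledError:
        pass  # the resync was interrupted by the shutdown
    return completed


if __name__ == "__main__":
    print("blocking send:", asyncio.run(run_blueprint(block_on_send=True)))
    print("non-blocking send:", asyncio.run(run_blueprint(block_on_send=False)))
```

In this model the blocking variant always finishes the join before shutdown, while the non-blocking variant leaves the room partially joined, so the resync would have to resume on the next start.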

@DMRobertson
Contributor

On retry this failed with a regular flake (#14103).

Tempted to say "let's ignore this failure mode until it shows up again". WDYT Sean?

@squahtx
Contributor

squahtx commented Dec 1, 2022

Well, we did see it happen previously in https://github.com/matrix-org/synapse/actions/runs/3583850629/jobs/6029854268

@DMRobertson DMRobertson removed their assignment Dec 2, 2022