This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
From the logs it looks like a host admin deleted a room multiple times in a row within a few minutes. Initially this caused 403s on message sending for a bot user, which makes sense, but these soon turned into 500s until Synapse was restarted, after which a 403 was returned again. Some time after the room delete activity, sync also started throwing long stack traces with a 500 error. It is unconfirmed whether the restart fixed that too, but it seems likely given the stack mentions caches.
From logs:
Multiple occurrences of the following events within a few minutes:
```
synapse.handlers.room - 1339 - INFO - DELETE-6639- Shutting down room '!roomid:domain.tld'
synapse.handlers.room - 1348 - INFO - DELETE-6639- Kicking '@botuser:domain.tld' from '!roomid:domain.tld'...
```
@botuser:domain.tld starts to see 403s.
The purge job seems to run multiple times for the rooms within a short period, with various log lines like this:
```
synapse.storage.databases.main.purge_events - 412 - INFO - DELETE-6639- [purge] removing !roomid:domain.tld from event_push_summary
```
Room send 403 turns into 500:
```
2021-09-09 18:02:40,129 - synapse.http.server - 93 - ERROR - PUT-6693- Failed handle request via 'RoomSendEventRestServlet': <XForwardedForRequest at 0x7f978dff48b0 method='PUT' uri='/_matrix/client/r0/rooms/!roomid:domain.tld/send/m.room.message/m1627686538.10576?access_token=<redacted>' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 242, in on_POST
    ) = await self.event_creation_handler.create_and_send_nonmember_event(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 863, in create_and_send_nonmember_event
    event, context = await self.create_event(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 623, in create_event
    event, context = await self.create_new_client_event(
  File "/usr/local/lib/python3.8/site-packages/synapse/util/metrics.py", line 91, in measured_func
    r = await func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/message.py", line 943, in create_new_client_event
    assert (
AssertionError: Attempting to create an event with no prev_events
```
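The assertion fires because purging the room removes its forward extremities: event creation links each new event to the room's current forward extremities, and after the purge there is nothing left to link to, so the request dies with a 500 instead of the 403 the auth layer would otherwise return. A toy sketch of the invariant (illustrative only, not Synapse's actual code):

```python
# Toy model (NOT Synapse's actual code) of why a purged room yields
# "Attempting to create an event with no prev_events": event creation
# links a new event to the room's forward extremities, and a purge
# empties that set.

forward_extremities = {"!roomid:domain.tld": {"$last_event_id"}}

def create_event(room_id: str, event_type: str) -> dict:
    prev_events = forward_extremities.get(room_id, set())
    # Only the m.room.create event may legitimately have no prev_events.
    assert (
        event_type == "m.room.create" or len(prev_events) > 0
    ), "Attempting to create an event with no prev_events"
    return {"type": event_type, "prev_events": sorted(prev_events)}

def purge_room(room_id: str) -> None:
    forward_extremities.pop(room_id, None)

# A normal send succeeds...
event = create_event("!roomid:domain.tld", "m.room.message")
assert event["prev_events"] == ["$last_event_id"]

# ...but after the purge, a racing send trips the assertion (a 500)
# instead of being rejected cleanly.
purge_room("!roomid:domain.tld")
try:
    create_event("!roomid:domain.tld", "m.room.message")
    raised = False
except AssertionError:
    raised = True
assert raised
```
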
The admin continues to create and delete rooms (apparently scripted, judging by the number of API calls and them landing within a short period for the same rooms). Later, sync starts failing for the admin operating on behalf of another user:
```
2021-09-10 00:20:08,523 - synapse.access.http.8009 - 389 - INFO - GET-27763- 76.104.170.72 - 8009 - {@admin:domain.tld.@someuser:domain.tld} Processed request: 0.088sec/-0.000sec (0.064sec, 0.008sec) (0.045sec/0.162sec/14) 55B 500 "GET /_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1631233209635&access_token=<redacted> HTTP/1.1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36" [0 dbevts]
2021-09-10 00:20:15,977 - synapse.http.server - 93 - ERROR - GET-27765- Failed handle request via 'SyncRestServlet': <XForwardedForRequest at 0x7fcd04c930c0 method='GET' uri='/_matrix/client/r0/sync?filter=1&timeout=0&_cacheBuster=1631233217123&access_token=<redacted>' clientproto='HTTP/1.1' site=8009>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1657, in _inlineCallbacks
    result = current_context.run(
  File "/usr/local/lib/python3.8/site-packages/twisted/python/failure.py", line 500, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 362, in _wait_for_sync_for_user
    result: SyncResult = await self.current_sync_for_user(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 406, in current_sync_for_user
    sync_result = await self.generate_sync_result(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1072, in generate_sync_result
    res = await self._generate_sync_entry_for_rooms(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1524, in _generate_sync_entry_for_rooms
    await concurrently_execute(handle_room_entries, room_entries, 10)
  File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 1661, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/async_helpers.py", line 191, in _concurrently_execute_inner
    await maybe_awaitable(func(value))
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1512, in handle_room_entries
    res = await self._generate_room_entry(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 1936, in _generate_room_entry
    batch = await self._load_filtered_recents(
  File "/usr/local/lib/python3.8/site-packages/synapse/handlers/sync.py", line 583, in _load_filtered_recents
    loaded_recents = await filter_events_for_client(
  File "/usr/local/lib/python3.8/site-packages/synapse/visibility.py", line 84, in filter_events_for_client
    event_id_to_state = await storage.state.get_state_for_events(
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/state.py", line 466, in get_state_for_events
    group_to_state = await self.stores.state._get_state_for_groups(
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 237, in _get_state_for_groups
    (member_state, incomplete_groups_m,) = self._get_state_for_groups_using_cache(
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 301, in _get_state_for_groups_using_cache
    state_dict_ids, got_all = self._get_state_for_group_using_cache(
  File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/state/store.py", line 185, in _get_state_for_group_using_cache
    cache_entry = cache.get(group)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/dictionary_cache.py", line 96, in get
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 483, in cache_get
    move_node_to_front(node)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 439, in move_node_to_front
    node.move_to_front(real_clock, list_root)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/caches/lrucache.py", line 294, in move_to_front
    self._list_node.move_after(cache_list_root)
  File "/usr/local/lib/python3.8/site-packages/synapse/util/linked_list.py", line 93, in move_after
    assert self.prev_node
AssertionError
```
Until Synapse is restarted, the event send graph looks completely dead.
A restart seems to have made everything happy again (no more 500s, a good old 403 for event sending, as it should be).
Not sure if the admin would still see a sync error, but looking at the stack trace it was probably fixed by nuking the caches on restart?
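For context on why a restart plausibly fixes the sync 500s: the bottom of the stack trace is the LRU cache's doubly linked list asserting `self.prev_node`, i.e. something tried to move a node that had already been unlinked, which a restart cures by rebuilding the caches from scratch. A minimal toy reproduction of the broken invariant (illustrative only, not Synapse's `linked_list.py`):

```python
# Toy doubly linked list illustrating the invariant that fails in the
# sync stack trace: moving a node that was already removed (its
# prev_node cleared, e.g. by a cache invalidation racing a lookup)
# trips `assert self.prev_node`.

class Node:
    def __init__(self):
        self.prev_node = None
        self.next_node = None

def insert_after(node, anchor):
    node.prev_node = anchor
    node.next_node = anchor.next_node
    anchor.next_node.prev_node = node
    anchor.next_node = node

def remove(node):
    node.prev_node.next_node = node.next_node
    node.next_node.prev_node = node.prev_node
    node.prev_node = node.next_node = None  # detach completely

def move_after(node, root):
    assert node.prev_node  # fails if node was already removed
    remove(node)
    insert_after(node, root)

root = Node()
root.prev_node = root.next_node = root  # circular sentinel

entry = Node()
insert_after(entry, root)
move_after(entry, root)       # fine while the node is still linked

remove(entry)                 # e.g. an invalidation during the purge
try:
    move_after(entry, root)   # another actor still holds a reference
    raised = False
except AssertionError:
    raised = True
assert raised
```
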
Version information
Homeserver: EMS customer, please ping for host name and other details
Version: v1.42.0
Install method: EMS docker images
Workers: Synchrotron + initial synchrotron
Looks like the suggestion in the linked issue (#10797) is a good way to go: i.e. deduplicating multiple requests to shut down a room, and making the API friendlier to long-running calls.
Flagging as to-discuss as this has come up on multiple occasions recently (though I have no reason to believe this is a new issue, just a potentially pressing one, especially if it can be triggered easily by EMS customers).
@jaywink in the meantime, EMS may consider deduplicating the requests client-side until Synapse is more robust in this area.
I haven't looked too deeply into this, but one suggestion might be to put a ResponseCache on it — we do this for various other REST endpoints to deduplicate requests.
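For reference, the ResponseCache pattern boils down to keying in-flight requests so that concurrent duplicates await the same result instead of each racing through the shutdown logic. A hand-rolled asyncio sketch of the idea (illustrative only; the class and names below are made up, not Synapse's actual `ResponseCache` API):

```python
import asyncio

# Illustrative deduplication in the spirit of the ResponseCache
# suggestion: concurrent requests with the same key share one
# in-flight task instead of each starting its own room shutdown.

class InflightDedup:
    def __init__(self):
        self._inflight = {}  # key -> asyncio.Task

    async def wrap(self, key, callback):
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.ensure_future(callback())
            self._inflight[key] = task
            # Drop the entry once the work finishes, so a later
            # request starts fresh rather than getting a stale result.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task

calls = 0

async def shutdown_room():
    global calls
    calls += 1
    await asyncio.sleep(0.01)  # simulate the slow purge
    return "purged"

async def main():
    dedup = InflightDedup()
    # Three "admin" delete requests for the same room arrive back to back.
    return await asyncio.gather(
        *(dedup.wrap("!roomid:domain.tld", shutdown_room) for _ in range(3))
    )

results = asyncio.run(main())
assert results == ["purged"] * 3
assert calls == 1  # the shutdown ran only once
```
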