-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix missing sync events during historical batch imports #12319
Fix missing sync events during historical batch imports #12319
Conversation
This can be used to flip betweek topological ordering (the default) and stream ordering as needed by the caller.
fb38e08
to
e974d6c
Compare
room_id, | ||
end_token=stream_position.room_key, | ||
limit=1, | ||
order_by="stream", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This aligns with automatic selection of bounds depending on the token passed into the pagination function.
order_by
argument
@MadLittleMods I'm a bit confused on next steps here - this seems to have unearthed another (complex looking) bug I don't have time (currently) to work on a fix for. My suggestion would be to roll back to the explicit argument fix which resolves this issue and create a new issue to tackle the |
@Fizzadar Go for it 👍 We can get some feedback from the Synapse team whether that is acceptable |
…rovided Part of #12281 Context: #12319 (comment)
As an update, I've fixed up the other complex bug via #12370 So we can probably leave this PR as-is. I've assigned it back to the Synapse core for review ⏩ |
|
||
assert ( | ||
"m.room.create" in state_event_types | ||
), "Missing room full state in sync response" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and fails on develop
but passes with this PR ✅ (as expected)
Thanks @Fizzadar and @MadLittleMods for tracking this down! My main concern is how this changes clients back paginating using a I don't think that is a change we really want to make. My understanding is that that behaviour works correctly currently? Or does that also have issues with backfilled/historical events? |
@@ -175,10 +175,6 @@ async def get_state_events( | |||
state_filter = state_filter or StateFilter.all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is how this changes clients back paginating using a
prev_batch
token from a/sync
timeline
section. Those tokens are currently stream tokens, but/messages
will return them in topological order with a topological next token. This change means that the first/messages
will return events in stream order but will still return a topological next token, meaning that the next/messages
will return events in a topological order.
/messages
could be updated to always return in topological_ordering
regardless of what type of pagination token is given. That's what it was doing before this PR anyway.
I don't think that is a change we really want to make. My understanding is that that behaviour works correctly currently? Or does that also have issues with backfilled/historical events?
I think we have two assumptions that we should probably enforce:
/messages
should always sort bytopological_ordering
regardless of pagination token given/sync
should always sort bystream_ordering
regardless of pagination token given
This also aligns with what we have documented:
/sync
returns things in the order they arrive at the server (stream_ordering
)./messages
(and/backfill
in the federation API) return them in the order determined by the event graph(topological_ordering, stream_ordering)
.
The current behavior of /messages
and /sync
returns events
as expected but not state
.
There is a bug in how incremental /sync
returns state
which this PR aims to fix. The best way to understand the problem is probably reading #12281 (comment) - but basically when calculating what state
to return, /sync
is ordering events by topological_ordering
even though it should be stream_ordering
.
I initially assumed we would want flexibility in how these endpoints sorted events according to the type of pagination token given but it's sounding like we actually want to enforce a given sort according to the endpoint. In which case, we can revert to @Fizzadar initial approach of plumbing a order_by
argument which we can set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry for not getting to this yesterday, it sat at about the second thing on my todo list all day :/)
Thanks for the clarification. There's the added bonus that /sync
will return events in pagination order for the first batch of events for the room, i.e. in an initial sync or when a user joins the room. (The rationale being that it's equivalent to getting no events then immediately doing a back pagination). This is relevant as we use get_recent_events_for_room
for that case as well.
I initially assumed we would want flexibility in how these endpoints sorted events according to the type of pagination token given but it's sounding like we actually want to enforce a given sort according to the endpoint. In which case, we can revert to @Fizzadar initial approach of plumbing a
order_by
argument which we can set.
I wonder if we're just making this harder for ourselves by re-using the same query to do both /messages
and when looking up the state for /sync
? I think the query we want to run is simply:
SELECT event_id FROM events
WHERE room_id = ? AND stream_ordering <= ?
ORDER BY stream_ordering DESC
LIMIT 1
So it might be simpler to just have a separate get_last_event_in_room_before
function that we call when getting the state instead? Rather than extending the already slightly horrific _paginate_room_events_txn
function?
Entirely separately, and I'm not suggesting we necessarily do this now, but it occurs to me that we have a get_forward_extremities_for_room_at_stream_ordering
store function that maps from room_id
to the list of forward extremities in the room at the time. This can then be used to calculate the "current" state of the room at the time more accurately than the current method. The downside of this approach is that get_forward_extremities_for_room_at_stream_ordering
will fail if the stream ordering is too old.
Another approach that has also literally just now occurred to me is that we could use the current_state_delta_stream
to work out the changes in state between two stream orderings in the room, as the stream_id
column correspond to the stream orderings of the changes (I think?).
Anyway, just wanted to record my thoughts here before I forget them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fizzadar sorry for going around the houses really slowly on this 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries at all @erikjohnston - totally agree a separate function makes sense here; I've pushed that up in 9a78d14. I just wrapped an existing function get_room_event_before_stream_ordering
(which doesn't actually return an event) and pulled out the event.
Have undone the pagination ordering changes too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Will have a look after I've finished doing the v1.57.0rc1 release ❤️
This uses an existing method, `get_last_event_in_room_before_stream_ordering`, to retrieve the most recent (stream ordered) event in a room and replaces uses of the complex pagination txn.
2a5ee07
to
8e319cc
Compare
Co-authored-by: Erik Johnston <[email protected]>
Co-authored-by: Eric Eastwood <[email protected]>
Thank you again for bearing with us @Fizzadar ❤️ |
Synapse 1.58.0rc1 (2022-04-26) ============================== As of this release, the groups/communities feature in Synapse is now disabled by default. See [\#11584](matrix-org/synapse#11584) for details. As mentioned in [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1580), this feature will be removed in Synapse 1.61. Features -------- - Implement [MSC3383](matrix-org/matrix-spec-proposals#3383) for including the destination in server-to-server authentication headers. Contributed by @Bubu and @jcgruenhage for Famedly. ([\#11398](matrix-org/synapse#11398)) - Docker images and Debian packages from matrix.org now contain a locked set of Python dependencies, greatly improving build reproducibility. ([Board](https://github.com/orgs/matrix-org/projects/54), [\#11537](matrix-org/synapse#11537)) - Enable processing of device list updates asynchronously. ([\#12365](matrix-org/synapse#12365), [\#12465](matrix-org/synapse#12465)) - Implement [MSC2815](matrix-org/matrix-spec-proposals#2815) to allow room moderators to view redacted event content. Contributed by @tulir. ([\#12427](matrix-org/synapse#12427)) - Build Debian packages for Ubuntu 22.04 "Jammy Jellyfish". ([\#12543](matrix-org/synapse#12543)) Bugfixes -------- - Prevent a sync request from removing a user's busy presence status. ([\#12213](matrix-org/synapse#12213)) - Fix bug with incremental sync missing events when rejoining/backfilling. Contributed by Nick @ Beeper. ([\#12319](matrix-org/synapse#12319)) - Fix a long-standing bug which incorrectly caused `GET /_matrix/client/v3/rooms/{roomId}/event/{eventId}` to return edited events rather than the original. ([\#12476](matrix-org/synapse#12476)) - Fix a bug introduced in Synapse 1.27.0 where the admin API for [deleting forward extremities](https://github.com/matrix-org/synapse/blob/erikj/fix_delete_event_response_count/docs/admin_api/rooms.md#deleting-forward-extremities) would always return a count of 1, no matter how many extremities were deleted. ([\#12496](matrix-org/synapse#12496)) - Fix a long-standing bug where the image thumbnails embedded into email notifications were broken. ([\#12510](matrix-org/synapse#12510)) - Fix a bug in the implementation of [MSC3202](matrix-org/matrix-spec-proposals#3202) where Synapse would use the field name `device_unused_fallback_keys`, rather than `device_unused_fallback_key_types`. ([\#12520](matrix-org/synapse#12520)) - Fix a bug introduced in Synapse 0.99.3 which could cause Synapse to consume large amounts of RAM when back-paginating in a large room. ([\#12522](matrix-org/synapse#12522)) Improved Documentation ---------------------- - Fix rendering of the documentation site when using the 'print' feature. ([\#12340](matrix-org/synapse#12340)) - Add a manual documenting config file options. ([\#12368](matrix-org/synapse#12368), [\#12527](matrix-org/synapse#12527)) - Update documentation to reflect that both the `run_background_tasks_on` option and the options for moving stream writers off of the main process are no longer experimental. ([\#12451](matrix-org/synapse#12451)) - Update worker documentation and replace old `federation_reader` with `generic_worker`. ([\#12457](matrix-org/synapse#12457)) - Strongly recommend [Poetry](https://python-poetry.org/) for development. ([\#12475](matrix-org/synapse#12475)) - Add some example configurations for workers and update architectural diagram. ([\#12492](matrix-org/synapse#12492)) - Fix a broken link in `README.rst`. ([\#12495](matrix-org/synapse#12495)) - Add HAProxy delegation example with CORS headers to docs. ([\#12501](matrix-org/synapse#12501)) - Remove extraneous comma in User Admin API's device deletion section so that the example JSON is actually valid and works. Contributed by @olmari. ([\#12533](matrix-org/synapse#12533)) Deprecations and Removals ------------------------- - The groups/communities feature in Synapse is now disabled by default. ([\#12344](matrix-org/synapse#12344)) - Remove unstable identifiers from [MSC3440](matrix-org/matrix-spec-proposals#3440). ([\#12382](matrix-org/synapse#12382)) Internal Changes ---------------- - Preparation for faster-room-join work: start a background process to resynchronise the room state after a room join. ([\#12394](matrix-org/synapse#12394)) - Preparation for faster-room-join work: Implement a tracking mechanism to allow functions to wait for full room state to arrive. ([\#12399](matrix-org/synapse#12399)) - Remove an unstable identifier from [MSC3083](matrix-org/matrix-spec-proposals#3083). ([\#12395](matrix-org/synapse#12395)) - Run CI in the locked [Poetry](https://python-poetry.org/) environment, and remove corresponding `tox` jobs. ([\#12425](matrix-org/synapse#12425), [\#12434](matrix-org/synapse#12434), [\#12438](matrix-org/synapse#12438), [\#12441](matrix-org/synapse#12441), [\#12449](matrix-org/synapse#12449), [\#12478](matrix-org/synapse#12478), [\#12514](matrix-org/synapse#12514), [\#12472](matrix-org/synapse#12472)) - Change Mutual Rooms' `unstable_features` flag to `uk.half-shot.msc2666.mutual_rooms` which matches the current iteration of [MSC2666](matrix-org/matrix-spec-proposals#2666). ([\#12445](matrix-org/synapse#12445)) - Fix typo in the release script help string. ([\#12450](matrix-org/synapse#12450)) - Fix a minor typo in the Debian changelogs generated by the release script. ([\#12497](matrix-org/synapse#12497)) - Reintroduce the list of targets to the linter script, to avoid linting unwanted local-only directories during development. ([\#12455](matrix-org/synapse#12455)) - Limit length of `device_id` to less than 512 characters. ([\#12454](matrix-org/synapse#12454)) - Dockerfile-workers: reduce the amount we install in the image. ([\#12464](matrix-org/synapse#12464)) - Dockerfile-workers: give the master its own log config. ([\#12466](matrix-org/synapse#12466)) - complement-synapse-workers: factor out separate entry point script. ([\#12467](matrix-org/synapse#12467)) - Back out experimental implementation of [MSC2314](matrix-org/matrix-spec-proposals#2314). ([\#12474](matrix-org/synapse#12474)) - Fix grammatical error in federation error response when the room version of a room is unknown. ([\#12483](matrix-org/synapse#12483)) - Remove unnecessary configuration overrides in tests. ([\#12511](matrix-org/synapse#12511)) - Refactor the relations code for clarity. ([\#12519](matrix-org/synapse#12519)) - Add type hints so `docker` and `stubs` directories pass `mypy --disallow-untyped-defs`. ([\#12528](matrix-org/synapse#12528)) - Update `delay_cancellation` to accept any awaitable, rather than just `Deferred`s. ([\#12468](matrix-org/synapse#12468)) - Handle cancellation in `EventsWorkerStore._get_events_from_cache_or_db`. ([\#12529](matrix-org/synapse#12529))
Synapse 1.58.0 (2022-05-03) =========================== As of this release, the groups/communities feature in Synapse is now disabled by default. See [\#11584](matrix-org/synapse#11584) for details. As mentioned in [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1580), this feature will be removed in Synapse 1.61. No significant changes since 1.58.0rc2. Synapse 1.58.0rc2 (2022-04-26) ============================== This release candidate fixes bugs related to Synapse 1.58.0rc1's logic for handling device list updates. Bugfixes -------- - Fix a bug introduced in Synapse 1.58.0rc1 where the main process could consume excessive amounts of CPU and memory while handling sentry logging failures. ([\#12554](matrix-org/synapse#12554)) - Fix a bug introduced in Synapse 1.58.0rc1 where opentracing contexts were not correctly sent to whitelisted remote servers with device lists updates. ([\#12555](matrix-org/synapse#12555)) Internal Changes ---------------- - Reduce unnecessary work when handling remote device list updates. ([\#12557](matrix-org/synapse#12557)) Synapse 1.58.0rc1 (2022-04-26) ============================== Features -------- - Implement [MSC3383](matrix-org/matrix-spec-proposals#3383) for including the destination in server-to-server authentication headers. Contributed by @Bubu and @jcgruenhage for Famedly. ([\#11398](matrix-org/synapse#11398)) - Docker images and Debian packages from matrix.org now contain a locked set of Python dependencies, greatly improving build reproducibility. ([Board](https://github.com/orgs/matrix-org/projects/54), [\#11537](matrix-org/synapse#11537)) - Enable processing of device list updates asynchronously. ([\#12365](matrix-org/synapse#12365), [\#12465](matrix-org/synapse#12465)) - Implement [MSC2815](matrix-org/matrix-spec-proposals#2815) to allow room moderators to view redacted event content. Contributed by @tulir @ Beeper. ([\#12427](matrix-org/synapse#12427)) - Build Debian packages for Ubuntu 22.04 "Jammy Jellyfish". ([\#12543](matrix-org/synapse#12543)) Bugfixes -------- - Prevent a sync request from removing a user's busy presence status. ([\#12213](matrix-org/synapse#12213)) - Fix bug with incremental sync missing events when rejoining/backfilling. Contributed by Nick @ Beeper. ([\#12319](matrix-org/synapse#12319)) - Fix a long-standing bug which incorrectly caused `GET /_matrix/client/v3/rooms/{roomId}/event/{eventId}` to return edited events rather than the original. ([\#12476](matrix-org/synapse#12476)) - Fix a bug introduced in Synapse 1.27.0 where the admin API for [deleting forward extremities](https://github.com/matrix-org/synapse/blob/erikj/fix_delete_event_response_count/docs/admin_api/rooms.md#deleting-forward-extremities) would always return a count of 1, no matter how many extremities were deleted. ([\#12496](matrix-org/synapse#12496)) - Fix a long-standing bug where the image thumbnails embedded into email notifications were broken. ([\#12510](matrix-org/synapse#12510)) - Fix a bug in the implementation of [MSC3202](matrix-org/matrix-spec-proposals#3202) where Synapse would use the field name `device_unused_fallback_keys`, rather than `device_unused_fallback_key_types`. ([\#12520](matrix-org/synapse#12520)) - Fix a bug introduced in Synapse 0.99.3 which could cause Synapse to consume large amounts of RAM when back-paginating in a large room. ([\#12522](matrix-org/synapse#12522)) Improved Documentation ---------------------- - Fix rendering of the documentation site when using the 'print' feature. ([\#12340](matrix-org/synapse#12340)) - Add a manual documenting config file options. ([\#12368](matrix-org/synapse#12368), [\#12527](matrix-org/synapse#12527)) - Update documentation to reflect that both the `run_background_tasks_on` option and the options for moving stream writers off of the main process are no longer experimental. ([\#12451](matrix-org/synapse#12451)) - Update worker documentation and replace old `federation_reader` with `generic_worker`. ([\#12457](matrix-org/synapse#12457)) - Strongly recommend [Poetry](https://python-poetry.org/) for development. ([\#12475](matrix-org/synapse#12475)) - Add some example configurations for workers and update architectural diagram. ([\#12492](matrix-org/synapse#12492)) - Fix a broken link in `README.rst`. ([\#12495](matrix-org/synapse#12495)) - Add HAProxy delegation example with CORS headers to docs. ([\#12501](matrix-org/synapse#12501)) - Remove extraneous comma in User Admin API's device deletion section so that the example JSON is actually valid and works. Contributed by @olmari. ([\#12533](matrix-org/synapse#12533)) Deprecations and Removals ------------------------- - The groups/communities feature in Synapse is now disabled by default. ([\#12344](matrix-org/synapse#12344)) - Remove unstable identifiers from [MSC3440](matrix-org/matrix-spec-proposals#3440). ([\#12382](matrix-org/synapse#12382)) Internal Changes ---------------- - Preparation for faster-room-join work: start a background process to resynchronise the room state after a room join. ([\#12394](matrix-org/synapse#12394)) - Preparation for faster-room-join work: Implement a tracking mechanism to allow functions to wait for full room state to arrive. ([\#12399](matrix-org/synapse#12399)) - Remove an unstable identifier from [MSC3083](matrix-org/matrix-spec-proposals#3083). ([\#12395](matrix-org/synapse#12395)) - Run CI in the locked [Poetry](https://python-poetry.org/) environment, and remove corresponding `tox` jobs. ([\#12425](matrix-org/synapse#12425), [\#12434](matrix-org/synapse#12434), [\#12438](matrix-org/synapse#12438), [\#12441](matrix-org/synapse#12441), [\#12449](matrix-org/synapse#12449), [\#12478](matrix-org/synapse#12478), [\#12514](matrix-org/synapse#12514), [\#12472](matrix-org/synapse#12472)) - Change Mutual Rooms' `unstable_features` flag to `uk.half-shot.msc2666.mutual_rooms` which matches the current iteration of [MSC2666](matrix-org/matrix-spec-proposals#2666). ([\#12445](matrix-org/synapse#12445)) - Fix typo in the release script help string. ([\#12450](matrix-org/synapse#12450)) - Fix a minor typo in the Debian changelogs generated by the release script. ([\#12497](matrix-org/synapse#12497)) - Reintroduce the list of targets to the linter script, to avoid linting unwanted local-only directories during development. ([\#12455](matrix-org/synapse#12455)) - Limit length of `device_id` to less than 512 characters. ([\#12454](matrix-org/synapse#12454)) - Dockerfile-workers: reduce the amount we install in the image. ([\#12464](matrix-org/synapse#12464)) - Dockerfile-workers: give the master its own log config. ([\#12466](matrix-org/synapse#12466)) - complement-synapse-workers: factor out separate entry point script. ([\#12467](matrix-org/synapse#12467)) - Back out experimental implementation of [MSC2314](matrix-org/matrix-spec-proposals#2314). ([\#12474](matrix-org/synapse#12474)) - Fix grammatical error in federation error response when the room version of a room is unknown. ([\#12483](matrix-org/synapse#12483)) - Remove unnecessary configuration overrides in tests. ([\#12511](matrix-org/synapse#12511)) - Refactor the relations code for clarity. ([\#12519](matrix-org/synapse#12519)) - Add type hints so `docker` and `stubs` directories pass `mypy --disallow-untyped-defs`. ([\#12528](matrix-org/synapse#12528)) - Update `delay_cancellation` to accept any awaitable, rather than just `Deferred`s. ([\#12468](matrix-org/synapse#12468)) - Handle cancellation in `EventsWorkerStore._get_events_from_cache_or_db`. ([\#12529](matrix-org/synapse#12529))
Discovered after much in-depth investigation in #12281. Adding an explicit
order_by
argument to change the ordering of results returned from room pagination queries and updating use in the sync & message handlers.Closes: #12281
Closes: #3305
Signed off by: Nick Mills-Barrett [email protected]
Pull Request Checklist
(run the linters)