Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Don't schedule an async task on every sync #16312

Merged
merged 2 commits into from
Sep 13, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/16312.misc
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Delete device messages asynchronously and in staged batches using the task scheduler.
37 changes: 26 additions & 11 deletions synapse/handlers/sync.py
Original file line number Diff line number Diff line change
Expand Up @@ -362,21 +362,36 @@ async def _wait_for_sync_for_user(
# (since we now know that the device has received them)
if since_token is not None:
since_stream_id = since_token.to_device_key
# Delete device messages asynchronously and in batches using the task scheduler
await self._task_scheduler.schedule_task(
DELETE_DEVICE_MSGS_TASK_NAME,
resource_id=sync_config.device_id,
params={
"user_id": sync_config.user.to_string(),
"device_id": sync_config.device_id,
"up_to_stream_id": since_stream_id,
},
# Fast path: delete a limited number of to-device messages up front.
# We do this to avoid the overhead of scheduling a task for every
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this overhead entail? I would have guessed we're adding a DB row and sending out a replication notification, which sounds lightweight---but haven't been following.

Other thoughts:

  • should we have a metric for number of tasks spawned?
  • for this to make sense we need to hit the fast past most of the time. How confident are we that this is the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this overhead entail? I would have guessed we're adding a DB row and sending out a replication notification, which sounds lightweight---but haven't been following.

Yup, plus then starting a background process for each one, etc. This is fairly lighweight, but we can end up doing this for almost every sync request, so it does add up. Background processes in particular (with all their log context and metrics) aren't free.

should we have a metric for number of tasks spawned?

Yes. Though we might already.

for this to make sense we need to hit the fast past most of the time. How confident are we that this is the case?

Very, as we will hit this for every sync pretty much. Most syncs won't delete any devices.

# sync.
device_deletion_limit = 100
deleted = await self.store.delete_messages_for_device(
sync_config.user.to_string(),
sync_config.device_id,
since_stream_id,
limit=device_deletion_limit,
)
logger.debug(
"Deletion of to-device messages up to %d scheduled",
since_stream_id,
"Deleted %d to-device messages up to %d", deleted, since_stream_id
)

# If we hit the limit, schedule a background task to delete the rest.
if deleted >= device_deletion_limit:
await self._task_scheduler.schedule_task(
DELETE_DEVICE_MSGS_TASK_NAME,
resource_id=sync_config.device_id,
params={
"user_id": sync_config.user.to_string(),
"device_id": sync_config.device_id,
"up_to_stream_id": since_stream_id,
},
)
logger.debug(
"Deletion of to-device messages up to %d scheduled",
since_stream_id,
)

if timeout == 0 or since_token is None or full_state:
# we are going to return immediately, so don't bother calling
# notifier.wait_for_events.
Expand Down