Fix new scheduled tasks jumping the queue #17962

richvdh · 2024-11-26T13:12:40Z

Currently, when a new scheduled task is added and its scheduled time has already passed, we set it to ACTIVE. This is problematic, because it means it will jump the queue ahead of all other SCHEDULED tasks; furthermore, if the Synapse process gets restarted, it will jump ahead of any ACTIVE tasks which have been started but are taking a while to run.

Instead, we leave it set to SCHEDULED, but kick off a call to _launch_scheduled_tasks, which will decide if we actually have capacity to start a new task, and start the newly-added task if so.

richvdh · 2024-11-26T13:15:58Z

tests/util/test_task_scheduler.py

-        # We need to wait for the next run of the scheduler loop
-        self.reactor.advance((TaskScheduler.SCHEDULE_INTERVAL_MS / 1000))


this hasn't actually changed, I'm just tightening up the test a bit. We didn't need to wait for the next run of the loop, we just had to wait 0.1 seconds. Looks like this changed in matrix-org/synapse#16313 and matrix-org/synapse#16660 without the test being updated.

Currently, when a new scheduled task is added and its scheduled time has already passed, we set it to ACTIVE. This is problematic, because it means it will jump the queue ahead of all other SCHEDULED tasks; furthermore, if the Synapse process gets restarted, it will jump ahead of any ACTIVE tasks which have been started but are taking a while to run. Instead, we leave it set to SCHEDULED, but kick off a call to `_launch_scheduled_tasks`, which will decide if we actually have capacity to start a new task, and start the newly-added task if so.

MatMaul · 2024-11-28T09:26:22Z

synapse/replication/tcp/commands.py

@@ -495,7 +495,7 @@ def to_line(self) -> str:


 class NewActiveTaskCommand(_SimpleCommand):


Should we change the NAME to TASK_READY (and related methods) to match the new behavior ?

Pro it's cleaner, con we may lost a message when doing a rolling restart. Not sure it's a big deal since the scheduler will go over all the tasks in DB when starting up.

No strong opinions from me on this. Will await insight from @element-hq/synapse-core

I don't think its particularly useful for now, and I think we can always add it in later if it is useful (e.g. for metrics maybe?)

so what exactly are you suggesting, Erik? We remove the command altogether?

Ooops, sorry, misunderstood what was going on 🤦

Eh, I don't mind. No one really ever sees what's on the wire anyway 🤷

erikjohnston · 2024-11-28T15:26:37Z

synapse/util/task_scheduler.py

-        # Don't bother trying to launch new tasks if we're already at capacity.
-        if len(self._running_tasks) >= TaskScheduler.MAX_CONCURRENT_RUNNING_TASKS:
-            return
-


Any reason to remove this? It does avoid spinning up a background process, even if its redundant?

I don't quite follow.

This method was previously (only) called when a NEW_ACTIVE_TASK replication command was received, which (previously) meant that another worker had added a new scheduled task and set it to ACTIVE immediately, so we should start running it if possible.

The problem is that now, the other worker doesn't make the decision about whether to set the task to ACTIVE or not -- we first have to check how many tasks are already running, so the other worker has to delegate the SCHEDULED -> ACTIVE transition to the background-tasks worker. Soooo, NEW_ACTIVE_TASK now just means "there is a new task, and it its scheduled time has already passed", which is a hint to the background-tasks worker that it might want to run its scheduler now rather than waiting 60s for it to come round again.

All of which is a long way of saying: launch_task_by_id isn't very useful any more.

ohh, it was just the if len(self._running_tasks) >= TaskScheduler.MAX_CONCURRENT_RUNNING_TASKS you were talking about?

Right, yes, we could reinstate that. Bear with me.

065b11b changes this for both the 60s timer and for the NEW_ACTIVE_TASK command handler.

richvdh · 2024-11-28T16:36:25Z

synapse/util/task_scheduler.py

+    def on_new_task(self, task_id: str) -> None:
+        """Handle a notification that a new ready-to-run task has been added to the queue"""
+        # Just run the scheduler
+        self._launch_scheduled_tasks()

    @wrap_as_background_process("launch_scheduled_tasks")


can I just say: I am not a fan of this decorator. I keep forgetting that _launch_scheduled_tasks runs as a background process.

Karinza38 · 2024-11-28T18:33:52Z

d80cd57

Currently, when a new scheduled task is added and its scheduled time has already passed, we set it to ACTIVE. This is problematic, because it means it will jump the queue ahead of all other SCHEDULED tasks; furthermore, if the Synapse process gets restarted, it will jump ahead of any ACTIVE tasks which have been started but are taking a while to run. Instead, we leave it set to SCHEDULED, but kick off a call to `_launch_scheduled_tasks`, which will decide if we actually have capacity to start a new task, and start the newly-added task if so.

This release contains the security fixes from [v1.120.2](https://github.com/element-hq/synapse/releases/tag/v1.120.2). - Fix release process to not create duplicate releases. ([\#18025](element-hq/synapse#18025)) - Support for [MSC4190](matrix-org/matrix-spec-proposals#4190): device management for Application Services. ([\#17705](element-hq/synapse#17705)) - Update [MSC4186](matrix-org/matrix-spec-proposals#4186) Sliding Sync to include invite, ban, kick, targets when `$LAZY`-loading room members. ([\#17947](element-hq/synapse#17947)) - Use stable `M_USER_LOCKED` error code for locked accounts, as per [Matrix 1.12](https://spec.matrix.org/v1.12/client-server-api/#account-locking). ([\#17965](element-hq/synapse#17965)) - [MSC4076](matrix-org/matrix-spec-proposals#4076): Add `disable_badge_count` to pusher configuration. ([\#17975](element-hq/synapse#17975)) - Fix long-standing bug where read receipts could get overly delayed being sent over federation. ([\#17933](element-hq/synapse#17933)) - Add OIDC example configuration for Forgejo (fork of Gitea). ([\#17872](element-hq/synapse#17872)) - Link to element-docker-demo from contrib/docker*. ([\#17953](element-hq/synapse#17953)) - [MSC4108](matrix-org/matrix-spec-proposals#4108): Add a `Content-Type` header on the `PUT` response to work around a faulty behavior in some caching reverse proxies. ([\#17253](element-hq/synapse#17253)) - Fix incorrect comment in new schema delta. ([\#17936](element-hq/synapse#17936)) - Raise setuptools_rust version cap to 1.10.2. ([\#17944](element-hq/synapse#17944)) - Enable encrypted appservice related experimental features in the complement docker image. ([\#17945](element-hq/synapse#17945)) - Return whether the user is suspended when querying the user account in the Admin API. ([\#17952](element-hq/synapse#17952)) - Fix new scheduled tasks jumping the queue. ([\#17962](element-hq/synapse#17962)) - Bump pyo3 and dependencies to v0.23.2. ([\#17966](element-hq/synapse#17966)) - Update setuptools-rust and fix building abi3 wheels in latest version. ([\#17969](element-hq/synapse#17969)) - Consolidate SSO redirects through `/_matrix/client/v3/login/sso/redirect(/{idpId})`. ([\#17972](element-hq/synapse#17972)) - Fix Docker and Complement config to be able to use `public_baseurl`. ([\#17986](element-hq/synapse#17986)) - Fix building wheels for MacOS which was temporarily disabled in Synapse 1.120.2. ([\#17993](element-hq/synapse#17993)) - Fix release process to not create duplicate releases. ([\#17970](element-hq/synapse#17970), [\#17995](element-hq/synapse#17995)) * Bump bytes from 1.8.0 to 1.9.0. ([\#17982](element-hq/synapse#17982)) * Bump pysaml2 from 7.3.1 to 7.5.0. ([\#17978](element-hq/synapse#17978)) * Bump serde_json from 1.0.132 to 1.0.133. ([\#17939](element-hq/synapse#17939)) * Bump tomli from 2.0.2 to 2.1.0. ([\#17959](element-hq/synapse#17959)) * Bump tomli from 2.1.0 to 2.2.1. ([\#17979](element-hq/synapse#17979)) * Bump tornado from 6.4.1 to 6.4.2. ([\#17955](element-hq/synapse#17955))

richvdh requested a review from a team as a code owner November 26, 2024 13:12

richvdh commented Nov 26, 2024

View reviewed changes

richvdh force-pushed the rav/schedule_ready_tasks branch from 2865a43 to afb7d08 Compare November 26, 2024 15:43

MatMaul reviewed Nov 28, 2024

View reviewed changes

erikjohnston approved these changes Nov 28, 2024

View reviewed changes

richvdh commented Nov 28, 2024

View reviewed changes

Avoid starting bg process if we are at capacity

065b11b

erikjohnston enabled auto-merge (squash) November 28, 2024 16:51

erikjohnston merged commit d80cd57 into develop Nov 28, 2024
39 checks passed

erikjohnston deleted the rav/schedule_ready_tasks branch November 28, 2024 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix new scheduled tasks jumping the queue #17962

Fix new scheduled tasks jumping the queue #17962

richvdh commented Nov 26, 2024

richvdh Nov 26, 2024

MatMaul Nov 28, 2024

richvdh Nov 28, 2024

erikjohnston Nov 28, 2024

richvdh Nov 28, 2024

erikjohnston Nov 28, 2024

erikjohnston Nov 28, 2024

richvdh Nov 28, 2024

richvdh Nov 28, 2024

richvdh Nov 28, 2024

richvdh Nov 28, 2024 •

edited

Loading

Karinza38 commented Nov 28, 2024

		# We need to wait for the next run of the scheduler loop
		self.reactor.advance((TaskScheduler.SCHEDULE_INTERVAL_MS / 1000))

		@@ -495,7 +495,7 @@ def to_line(self) -> str:


		class NewActiveTaskCommand(_SimpleCommand):

Fix new scheduled tasks jumping the queue #17962

Fix new scheduled tasks jumping the queue #17962

Conversation

richvdh commented Nov 26, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richvdh Nov 28, 2024 • edited Loading

Choose a reason for hiding this comment

Karinza38 commented Nov 28, 2024

richvdh Nov 28, 2024 •

edited

Loading