Close race condition in procrastinate_fetch_job #231
Codecov Report
@@            Coverage Diff            @@
##            master     #231    +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           24        24
  Lines         1100      1100
  Branches       135       135
=========================================
  Hits          1100      1100
I unchecked documentation. I think we should have a dedicated page on serial locking. We should also explicitly mention that this is the current way of implementing a chain.
We have https://procrastinate.readthedocs.io/en/stable/howto/locks.html already, and I think it's quite explicit.
@elemoine We should remove this section of the doc: https://github.com/peopledoc/procrastinate/blob/master/docs/discussions.rst#the-procrastinate_job_locks-table, and maybe update the mention
👍
I do not see the problem with that statement, but that may be me.
Not wrong, but it was not very clear to me. I could suggest something like:
@@ -115,9 +115,7 @@ their identifiers could be used (there's no hard limit on the length of a lock s
but stay reasonable).

A task can only take a single lock so there's no dead-lock scenario possible where two
running tasks are waiting one another. That being said, if a worker dies with a lock, it
will be up to you to free it. If the task fails but the worker survives though, the
lock will be freed.
Is this outdated?
We could rephrase it as:
If a worker is killed without ending its job, following jobs with the same lock will not run until the interrupted job is manually set to either "failed" or "succeeded". If a job simply fails, following jobs with the same lock may run.
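To make the proposed wording concrete, here is a minimal in-memory sketch in Python. It is not Procrastinate's code: the `can_run` helper, the dictionary layout, and the "todo" and "doing" status names are assumptions made for illustration; only "failed" and "succeeded" come from the wording above.

```python
# Hypothetical model: each job id maps to a (lock, status) pair.
# Assumption: a job whose status is "todo" or "doing" still holds its lock.
HOLDING_STATUSES = ("todo", "doing")


def can_run(job_id, jobs):
    """Return True if the job is waiting and no earlier job holds its lock."""
    lock, status = jobs[job_id]
    if status != "todo":
        return False
    return not any(
        other_lock == lock
        and other_id < job_id
        and other_status in HOLDING_STATUSES
        for other_id, (other_lock, other_status) in jobs.items()
    )


# Job 1 was being processed when its worker was killed: it stays "doing".
jobs = {1: ("customer-42", "doing"), 2: ("customer-42", "todo")}
print(can_run(2, jobs))  # job 2 is stuck behind job 1

# Someone manually marks the interrupted job as failed: job 2 may now run.
jobs[1] = ("customer-42", "failed")
print(can_run(2, jobs))
```

In this model a job that simply fails ends up in the "failed" status on its own, which is why following jobs with the same lock are free to run in that case.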
@@ -0,0 +1,48 @@
DROP TABLE IF EXISTS procrastinate_job_locks;
We'll be explicit in the changelog that with those migrations, the workers will need to be stopped when running the migration.
Of course, one can always write better migrations.
one small suggestion...
docs/discussions.rst
running tasks are waiting one another. That being said, if a worker dies with a lock, it
will be up to you to free it. If the task fails but the worker survives though, the
lock will be freed.
running tasks are waiting one another.
- running tasks are waiting one another.
+ running tasks are waiting for one another.
This closes a race condition in the procrastinate_fetch_job plpgsql function, where jobs sharing the same lock can be run out of order.

With this commit jobs with the same lock are **always** executed in order, whatever their ETAs and queues.

In effect:
- if job A in queue 1 (id 1) and job B in queue 2 (id 2) have the same lock, and no workers process queue 1, then job B won't be executed, because job A must be executed first
- if job A is deferred with ETA 1 year, no other jobs with the same lock will be executed for 1 year

The lock name may change from "lock" to "serial lock" in the future.
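The ordering rule in this commit message can be sketched as a small in-memory model in Python. This is an illustration under stated assumptions, not the plpgsql function itself: the `Job` class, the `fetch_job` helper, and the "todo" and "doing" status names are hypothetical, and the real query may differ in how it picks among eligible jobs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Job:
    id: int
    queue: str
    lock: Optional[str]
    status: str = "todo"         # assumed statuses: todo, doing, succeeded, failed
    eta: Optional[float] = None  # earliest start time; None means "as soon as possible"


def fetch_job(jobs: List[Job], queues: Iterable[str], now: float) -> Optional[Job]:
    """Return the lowest-id runnable job in the given queues, or None."""
    wanted = set(queues)
    for job in sorted(jobs, key=lambda j: j.id):
        if job.status != "todo" or job.queue not in wanted:
            continue
        if job.eta is not None and job.eta > now:
            continue
        # A job is blocked while any earlier job with the same lock is
        # unfinished, whatever that earlier job's queue or ETA.
        blocked = job.lock is not None and any(
            other.lock == job.lock
            and other.id < job.id
            and other.status in ("todo", "doing")
            for other in jobs
        )
        if not blocked:
            return job
    return None


# Job A (id 1, queue 1) and job B (id 2, queue 2) share a lock.
jobs = [Job(1, "queue1", "shared"), Job(2, "queue2", "shared")]
print(fetch_job(jobs, ["queue2"], now=0.0))  # None: B must wait for A

# Job A is deferred far in the future: B stays blocked until A has run.
deferred = [Job(1, "queue1", "shared", eta=1000.0), Job(2, "queue1", "shared")]
print(fetch_job(deferred, ["queue1"], now=0.0))  # None
```

The sketch checks the lock against every earlier unfinished job rather than only jobs in the worker's own queues, which is what makes the two cases in the list above come out as described.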
The result is the same, but it makes the query more easily readable.
We wanted to rename the lock to "serial lock", but this will change a LOT of code, so it will have its own PR.
Closes #212.