Error removing jobs from the reserved queue #47356
Comments
Heya, thanks for reporting. We'll need more info and/or code to debug this further. Can you please create a repository with the command below, commit the code that reproduces the issue as one separate commit on the main/master branch and share the repository here? Please make sure that you have the latest version of the Laravel installer in order to run this command. Please also make sure you have both Git & the GitHub CLI tool properly set up.

`laravel new bug-report --github="--public"`

Please do not amend and create a separate commit with your custom changes. After you've posted the repository, we'll try to reproduce the issue. Thanks!
Is this reproducible when the serializer and compression options are removed?
@tillkruss Seems that works.
My first thought is Lua scripts. They are used by the Queue component, but will disregard any serializer/compression options set on the connection. For this scenario the only solution is: don't use data mutations (serializer/compression) on the connection the queue uses. Alternatively, refactor everything to not use Lua (not a bad idea) and use atomic commands instead. It may be good to actually unset the two options on the queue connection.
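For illustration, a rough sketch of that last suggestion, assuming the phpredis client and Laravel's per-connection serializer/compression keys (connection names, env keys, and database numbers are made up; depending on the Laravel version the options can also live in the global redis `options` array): keep the options on the default connection, give the queue its own connection without them, and point the redis queue driver at it.

```php
// config/database.php (sketch) -- the default connection keeps igbinary + zstd,
// while a dedicated "queue" connection is left untouched, so the members written
// by the queue's Lua scripts match what ZREM later sends.
'redis' => [
    'client' => env('REDIS_CLIENT', 'phpredis'),

    'default' => [
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_DB', '0'),
        'serializer' => \Redis::SERIALIZER_IGBINARY,
        'compression' => \Redis::COMPRESSION_ZSTD,
    ],

    'queue' => [
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_QUEUE_DB', '1'),
        // no serializer / compression here
    ],
],
```

```php
// config/queue.php (sketch) -- point the redis queue driver at that connection.
'redis' => [
    'driver' => 'redis',
    'connection' => 'queue',
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 90,
],
```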
@tillkruss nice finds. Thanks! @andrej-schaefer are you willing to work on a PR?
@driesvints I'm sorry, but I'm not. BTW: do I still have to create the reproduction commit if the issue is already reproduced? :) Due to my limited time it will take a while.
No worries. I'll keep this open then and see if anyone can help here.
Thank you for reporting this issue! As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub. If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team. Thank you!
Maybe that's something for the docs?
I mean, when creating unique jobs through the ShouldBeUnique interface, does that mean the job is unique for the current execution, i.e. that multiple workers are not picking up a job with the same jobId in parallel?
@tillkruss: Would this be something relay.so, being the successor of phpredis, would/could handle differently? Besides, I'd guess there's quite a connection to #40569 & #41718, which sadly weren't tackled back then.
@Huggyduggy: Theoretically, yes. But I think it's much better to unset any compression/serializer on the connection the queue uses.
Thanks @tillkruss - I've drafted a PR fixing the same issue as reported by @andrej-schaefer. I'll submit it against the current 10.x branch tomorrow. Sad that it prohibits using compression within queues; guess we'll stick to manually compressing our larger objects within our event listeners & jobs prior to dispatching 😌
Can somebody advise whether this issue would also affect Laravel Horizon? I can't reproduce the issue from a quick test.
Looks like Taylor noted he'd appreciate a note being added to the docs: #48180 (comment).
Laravel Version
10.13.1
PHP Version
8.2.6
Database Driver & Version
No response
Description
We encountered problems with job uniqueness. The jobs, which are configured to have a maximum of 1 attempt, failed multiple times and couldn't be written to the failed_jobs table due to the failed_jobs_uuid_unique constraint.
It seems that the failed_jobs_uuid_unique constraint violation may be a separate bug for jobs that implement ShouldBeUnique, where error handling breaks when they fail multiple times. Upon further investigation through the stack trace, we found that the issue was caused by the MaxAttemptsExceededException.
After removing the unique constraint from the database (for easier debugging), we narrowed down the error to the following log line:
After diving deeper into debugging, we discovered that the jobs were not being removed from the "reserved" queue correctly, and the workers were executing them again, resulting in the MaxAttemptsExceededException.
Further examination of the source code revealed a failed removal of entries in \Illuminate\Queue\RedisQueue::deleteReserved. The result of the execution was always 0, indicating that the entry was not being removed.
After spending more hours experimenting with different Redis clients and configurations, we found that enabling compression and/or serialization in the Redis database configuration resulted in a faulty zrem request to Redis. As a workaround, we disabled the two options for the default database, which is used by the worker, and did not investigate the problem further.
Our hypothesis is that the member stored in the reserved sorted set is written without compression or serialization, so it does not match the member sent in the zrem request, and the entry is therefore never deleted.
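To make that hypothesis concrete, here is a minimal standalone sketch (not taken from our application; the key name and payload are made up) showing the mismatch with plain phpredis, assuming a build with igbinary and zstd support and that EVAL arguments bypass the client-side serializer/compressor, as described above:

```php
<?php

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_SERIALIZER, Redis::SERIALIZER_IGBINARY);
$redis->setOption(Redis::OPT_COMPRESSION, Redis::COMPRESSION_ZSTD);

$payload = '{"uuid":"example","job":"ExampleJob"}'; // stand-in for a queue payload

// Stored via EVAL, the member reaches Redis as the raw string -- comparable to
// how the queue's Lua scripts move the payload into the :reserved sorted set.
$redis->eval("return redis.call('zadd', KEYS[1], 1, ARGV[1])", ['queues:default:reserved', $payload], 1);

// zRem runs the member through the serializer + compressor first, so the bytes
// sent to Redis no longer match what was stored: ZREM removes nothing.
var_dump($redis->zRem('queues:default:reserved', $payload)); // int(0)

// With both options disabled the raw bytes match and the member is removed.
$redis->setOption(Redis::OPT_SERIALIZER, Redis::SERIALIZER_NONE);
$redis->setOption(Redis::OPT_COMPRESSION, Redis::COMPRESSION_NONE);
var_dump($redis->zRem('queues:default:reserved', $payload)); // int(1)
```

If that is indeed what happens, it would explain why the return value of deleteReserved was always 0.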
The workaround was to disable the serializer and compression options.
Hope I haven't forgotten anything :)
Thanks
Steps To Reproduce
1. Configure a Redis database with:
   - phpredis as the client
   - the serializer option set to 2 (Redis::SERIALIZER_IGBINARY)
   - the compression option set to 2 (Redis::COMPRESSION_ZSTD)
2. Create a simple job with tries set to 1 (I have also added ShouldBeUnique to provoke the DB constraint error, which from my point of view is a separate error); see the sketch below.
3. Dispatch the job and wait for the second execution.

As described above, this will result in multiple executions on the second worker run, because the job was not removed from the reserved queue.
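For step 2, a minimal sketch of such a job (class name and log line are illustrative, not taken from the original report):

```php
<?php

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// A single allowed attempt, plus ShouldBeUnique only to provoke the
// failed_jobs_uuid_unique constraint error mentioned above.
class ExampleJob implements ShouldQueue, ShouldBeUnique
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 1;

    public function handle(): void
    {
        logger()->info('ExampleJob executed at '.now());
    }
}
```

Dispatching it once with ExampleJob::dispatch() and running php artisan queue:work redis should then show the job being picked up again on the next worker loop and failing with MaxAttemptsExceededException, because the reserved entry was never removed.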