
Error removing jobs from the reserved queue #47356

Closed

andrej-schaefer opened this issue Jun 5, 2023 · 15 comments

@andrej-schaefer

Laravel Version

10.13.1

PHP Version

8.2.6

Database Driver & Version

No response

Description

We encountered problems with job uniqueness. Jobs configured with a maximum of 1 attempt failed multiple times and could not be written to the failed_jobs table because of the failed_jobs_uuid_unique constraint.
The failed_jobs_uuid_unique violation may be a separate bug for jobs that implement ShouldBeUnique, where the error handling misbehaves when they fail multiple times. Following the stack trace, we found that the repeated failures were caused by a MaxAttemptsExceededException.

[2023-06-05 13:59:10] production.ERROR: SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'b0985078-42cd-4512-a70f-91fad1f69ca5' for key 'failed_jobs.failed_jobs_uuid_unique' (Connection: mysql, SQL: insert into `failed_jobs` (`uuid`, `connection`, `queue`, `payload`, `exception`, `failed_at`) values (b0985078-42cd-4512-a70f-91fad1f69ca5, redis, default, {"backoff":null,"uuid":"b0985078-42cd-4512-a70f-91fad1f69ca5","timeout":290,"tags":{},"id":"b0985078-42cd-4512-a70f-91fad1f69ca5","job":"Illuminate\\Queue\\CallQueuedHandler@call","pushedAt":"1685973367.0205","maxExceptions":null,"maxTries":1,"data":{"command":"O:47:\"App\\Domain\\Caching\\Jobs\\FullyCacheIdentifierJob\":1:{s:56:\"\u0000App\\Domain\\Caching\\Jobs\\FullyCacheIdentifierJob\u0000refUuid\";s:36:\"63f3cb9c-66f4-47af-a915-882d37e999d6\";}","commandName":"App\\Domain\\Caching\\Jobs\\FullyCacheIdentifierJob"},"failOnTimeout":true,"silenced":false,"retryUntil":null,"type":"job","displayName":"App\\Domain\\Caching\\Jobs\\FullyCacheIdentifierJob","attempts":2}, Illuminate\Queue\MaxAttemptsExceededException: App\Domain\Caching\Jobs\FullyCacheIdentifierJob has been attempted too many times or run too long. The job may have previously timed out. in /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Worker.php:780
Stack trace:
#0 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(514): Illuminate\Queue\Worker->maxAttemptsExceededException(Object(Illuminate\Queue\Jobs\RedisJob))
#1 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(423): Illuminate\Queue\Worker->markJobAsFailedIfAlreadyExceedsMaxAttempts('redis', Object(Illuminate\Queue\Jobs\RedisJob), 1)
#2 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(384): Illuminate\Queue\Worker->process('redis', Object(Illuminate\Queue\Jobs\RedisJob), Object(Illuminate\Queue\WorkerOptions))
#3 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Worker.php(175): Illuminate\Queue\Worker->runJob(Object(Illuminate\Queue\Jobs\RedisJob), 'redis', Object(Illuminate\Queue\WorkerOptions))
#4 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(136): Illuminate\Queue\Worker->daemon('redis', 'default', Object(Illuminate\Queue\WorkerOptions))
#5 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Queue/Console/WorkCommand.php(119): Illuminate\Queue\Console\WorkCommand->runWorker('redis', 'default')
#6 /home/www-data/api/vendor/laravel/horizon/src/Console/WorkCommand.php(51): Illuminate\Queue\Console\WorkCommand->handle()
#7 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(36): Laravel\Horizon\Console\WorkCommand->handle()
#8 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Container/Util.php(41): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
#9 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(93): Illuminate\Container\Util::unwrapIfClosure(Object(Closure))
#10 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(35): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
#11 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Container/Container.php(661): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
#12 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Console/Command.php(194): Illuminate\Container\Container->call(Array)
#13 /home/www-data/api/vendor/symfony/console/Command/Command.php(312): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#14 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Console/Command.php(163): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
#15 /home/www-data/api/vendor/symfony/console/Application.php(1040): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#16 /home/www-data/api/vendor/symfony/console/Application.php(314): Symfony\Component\Console\Application->doRunCommand(Object(Laravel\Horizon\Console\WorkCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#17 /home/www-data/api/vendor/symfony/console/Application.php(168): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#18 /home/www-data/api/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(200): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#19 /home/www-data/api/artisan(35): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#20 {main}, 2023-06-05 14:00:40)) {"exception":{"errorInfo":["23000",1062,"Duplicate entry '3ca6725e-ed61-45b1-8df2-109631e964bb' for key 'failed_jobs.failed_jobs_uuid_unique'"],"connectionName":"mysql"}}

After removing the unique constraint from the database (for easier debugging), we narrowed down the error to the following log line:

[2023-06-05 13:43:54] production.ERROR: App\Domain\Caching\Jobs\BuildCacheJob has been attempted too many times or run too long. The job may have previously timed out. {"exception":[]}

Digging deeper, we discovered that jobs were not being removed correctly from the "reserved" queue, so the workers executed them again, resulting in the MaxAttemptsExceededException.
Further examination of the source code revealed a failed removal of entries in \Illuminate\Queue\RedisQueue::deleteReserved: the call always returned 0, indicating that the entry was not being removed.

After spending more hours experimenting with different Redis clients and configurations, we found that enabling compression and/or serialization in the Redis database configuration resulted in a faulty ZREM request to Redis. As a workaround, we disabled the two options for the default database, which is used by the worker, and did not investigate the problem further.

Our hypothesis is that compression/serialization causes a mismatch between the member sent in the ZREM request and the member stored in the reserved set, so the entry is never deleted. A minimal sketch of this hypothesis follows.
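To illustrate the suspected mismatch, here is a minimal, hypothetical sketch (the key name queues:default:reserved is illustrative; assumes ext-redis built with igbinary support). Storing the member server-side via Lua bypasses the client serializer, while a client-side ZREM serializes the member before sending it, so the bytes never match:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->setOption(Redis::OPT_SERIALIZER, Redis::SERIALIZER_IGBINARY);

$payload = '{"uuid":"b0985078-42cd-4512-a70f-91fad1f69ca5"}'; // raw job JSON

// EVAL arguments are passed through untouched, so the raw string is stored:
$redis->eval(
    "return redis.call('zadd', KEYS[1], ARGV[1], ARGV[2])",
    ['queues:default:reserved', 1685973367, $payload],
    1 // number of KEYS
);

// Removing via the client igbinary-serializes the member first; those
// bytes never match the raw JSON stored above, so nothing is removed:
var_dump($redis->zRem('queues:default:reserved', $payload)); // int(0)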
Our workaround was to disable the serializer and compression for the default connection, which is used by the worker:

'redis' => [

    'client' => env('REDIS_CLIENT', 'phpredis'),

    'options' => [
        'cluster' => env('REDIS_CLUSTER', 'redis'),
        'prefix' => 'XXX:',
        'serializer' => 2, // Redis::SERIALIZER_IGBINARY,
        'compression' => 2 // Redis::COMPRESSION_ZSTD,
    ],

    'default' => [
        'url' => env('REDIS_URL'),
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'password' => env('REDIS_PASSWORD'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_DB', '0'),
        'scheme' => env('REDIS_SCHEME', 'tcp'),
        'options' => [
            'serializer' => 0, // disabled the global setting due to it breaking removal of jobs from the reserved queue
            'compression' => 0, // disabled the global setting due to it breaking removal of jobs from the reserved queue
        ],
    ],

    'cache' => [
        'url' => env('REDIS_URL'),
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'password' => env('REDIS_PASSWORD'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_CACHE_DB', '1'),
        'scheme' => env('REDIS_SCHEME', 'tcp'),
    ],
],

Hope I haven't forgotten anything :)
Thanks

Steps To Reproduce

Configure the Redis database with:
phpredis as the client
the serializer option set to 2 (Redis::SERIALIZER_IGBINARY)
the compression option set to 2 (Redis::COMPRESSION_ZSTD)

'redis' => [

    'client' => env('REDIS_CLIENT', 'phpredis'),

    'options' => [
        'cluster' => env('REDIS_CLUSTER', 'redis'),
        'prefix' => 'XXX:',
        'serializer' => 2, // Redis::SERIALIZER_IGBINARY,
        'compression' => 2 // Redis::COMPRESSION_ZSTD,
    ],

    'default' => [
        'url' => env('REDIS_URL'),
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'password' => env('REDIS_PASSWORD'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_DB', '0'),
        'scheme' => env('REDIS_SCHEME', 'tcp'),
    ],

    'cache' => [
        'url' => env('REDIS_URL'),
        'host' => env('REDIS_HOST', '127.0.0.1'),
        'password' => env('REDIS_PASSWORD'),
        'port' => env('REDIS_PORT', '6379'),
        'database' => env('REDIS_CACHE_DB', '1'),
        'scheme' => env('REDIS_SCHEME', 'tcp'),
    ],
],

Create a simple job with tries set to 1 (I have also added ShouldBeUnique to provoke the DB constraint error, which from my point of view is a separate bug):

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldBeUnique;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class SomeJob implements ShouldQueue, ShouldBeUnique
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 1;

    public function handle(): void
    {
        sleep(1);
    }

    public function uniqueId(): string
    {
        return '7411'; // return a string to match the declared return type
    }
}

Dispatch the job and wait for the second execution (a minimal dispatch sketch follows below).
As described above, this results in the job being executed again on the second worker run because it was never removed from the reserved queue.
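For completeness, the dispatch step as a minimal sketch (SomeJob is the reproduction job above; the worker invocation is assumed from the config):

// Dispatch once:
SomeJob::dispatch();

// Then run a worker against the default connection, e.g.:
//   php artisan queue:work redis
// With tries = 1, the second pickup of the still-reserved job raises
// MaxAttemptsExceededException instead of the job completing and being deleted.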

@driesvints
Member

Heya, thanks for reporting.

We'll need more info and/or code to debug this further. Can you please create a repository with the command below, commit the code that reproduces the issue as one separate commit on the main/master branch and share the repository here? Please make sure that you have the latest version of the Laravel installer in order to run this command. Please also make sure you have both Git & the GitHub CLI tool properly set up.

laravel new bug-report --github="--public"

Please do not amend and create a separate commit with your custom changes. After you've posted the repository, we'll try to reproduce the issue.

Thanks!

@tillkruss
Contributor

Is this reproducible when the serializer and compression options are disabled?

@driesvints
Member

@tillkruss seems that works:

[Screenshot omitted: 2023-06-06 at 15:22:16]

@tillkruss
Contributor

My first thought is Lua scripts. They are used by the Queue component, but will disregard any serializer or compression options that PhpRedis may use. So PhpRedis may set queue:job-123 to {foo: "bar"} and then the Lua script would consider it gibberish and fail.

For this scenario the only solution is: Don't use data mutations for queue Redis connections.

Alternatively, refactor everything to not use Lua (not a bad idea) and use atomic MULTI transactions.

It may be good to actually unset the two Redis::OPT_* options on the connection in the queue module; something along these lines.
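A minimal sketch of that idea (phpredis client assumed; the connection name 'default' is illustrative), using the Redis facade to reach the underlying client:

use Illuminate\Support\Facades\Redis;

// Resolve the raw \Redis instance behind the queue connection and switch
// both data-mutation options off so the bytes round-trip unchanged:
$client = Redis::connection('default')->client();

$client->setOption(\Redis::OPT_SERIALIZER, \Redis::SERIALIZER_NONE);
$client->setOption(\Redis::OPT_COMPRESSION, \Redis::COMPRESSION_NONE);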

@driesvints
Member

@tillkruss nice finds. Thanks!

@andrej-schaefer are you willing to work on a PR?

@andrej-schaefer
Author

@driesvints I'm sorry, but I'm not.
I'm not deep enough into Laravel to find the error quickly enough, and I have very limited time.
What I can offer is discussion and analysis of proposals :)

BTW: do I still have to create the reproduction commit if the issue is already reproduced? :) Due to my limited time it will take a while.
And keep in mind the second error regarding the DB constraint on unique jobs that fail multiple times (if you want, I can prepare a separate ticket for it). Or would you say it is not a bug and I have misunderstood the documentation for unique jobs?

@driesvints
Member

No worries. I'll keep this open then and see if anyone can help here.

@github-actions

github-actions bot commented Jun 7, 2023

Thank you for reporting this issue!

As Laravel is an open source project, we rely on the community to help us diagnose and fix issues as it is not possible to research and fix every issue reported to us via GitHub.

If possible, please make a pull request fixing the issue you have described, along with corresponding tests. All pull requests are promptly reviewed by the Laravel team.

Thank you!

@driesvints
Member

And keep in mind the second error regarding the DB constraint on unique jobs that fail multiple times (if you want, I can prepare a separate ticket for it). Or would you say it is not a bug and I have misunderstood the documentation for unique jobs?

Maybe that's something for the docs?

@andrej-schaefer
Author

I mean, when creating unique jobs through the ShouldBeUnique interface, the job is unique for the current execution, i.e. multiple workers cannot pick up a job with the same jobId in parallel?
If so, then when the same job with the same jobId fails multiple times (e.g. first run fails, second run succeeds, third run fails again), it will result in a unique constraint error.
I think it is solvable by using the WithoutOverlapping middleware, since that does not affect the jobId; see the sketch below.
As I said, maybe I misunderstand the ShouldBeUnique interface, but even then it is a bug (I think :) )
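A rough sketch of what I mean (the lock key mirrors the uniqueId from the reproduction job and is purely illustrative):

use Illuminate\Queue\Middleware\WithoutOverlapping;

class SomeJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $tries = 1;

    // Serializes execution per key without deriving the job's UUID from it,
    // unlike ShouldBeUnique:
    public function middleware(): array
    {
        return [new WithoutOverlapping('7411')];
    }

    public function handle(): void
    {
        sleep(1);
    }
}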

@Huggyduggy

@tillkruss : Would this be something relay.so, being the successor of phpredis, would/could handle differently?

Besides, I'd guess there's quite a connection to #40569 & #41718, which sadly weren't tackled back then.

@tillkruss
Contributor

@Huggyduggy: Theoretically, yes. But I think it's much better to unset any compression/serializer options on the queue Redis connection; that will solve it, and it's not hard to do, it just needs a PR.

@Huggyduggy

Thanks @tillkruss - I've drafted a PR fixing the issue reported by @andrej-schaefer. I'll submit it against the current 10.x branch tomorrow.

Sad that this prohibits using compression within queues; I guess we'll stick to manually compressing our larger objects in our event listeners & jobs prior to dispatching 😌

@jameshulse
Contributor

Can somebody advise whether this issue would also affect Laravel Horizon? I couldn't reproduce it in a quick test.

@driesvints
Member

Looks like Taylor noted he'd appreciate a note being added to the docs: #48180 (comment).
