[Bug]: moveToFailed throws an exception when using Elasticache serverless #2787

Closed
amit-opus opened this issue Sep 26, 2024 · 10 comments
Labels
enhancement New feature or request

Comments

@amit-opus

amit-opus commented Sep 26, 2024

Version

5.12.0

Platform

NodeJS

What happened?

We are using an ElastiCache Serverless instance (Redis v7.1).
When a job is added to the queue and then fails, the following exception is thrown:

ReplyError: EXECABORT Transaction discarded because of previous errors.
    at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
  command: { name: 'exec', args: [] },
  previousErrors: [
    ReplyError: ERR command not supported inside transaction
        at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
        at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
      command: [Object]
    },
    ReplyError: ERR command not supported inside transaction
        at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
        at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
      command: [Object]
    }
  ]
}

How to reproduce.

Replace some-serverless-host with the host of a relevant Redis instance.

import {Worker, Queue, UnrecoverableError} from 'bullmq';
import Redis from 'ioredis';

const clusterQueue = new Queue('test-queue', {
    prefix: '{bullMQ}',
    connection: new Redis.Cluster([
        {host: 'some-serverless-host', port: 6379},
    ], {
        dnsLookup: (address, callback) => callback(null, address),
        redisOptions: {
            tls: true,
        }
    })
})

export async function renderQueue() {
    await clusterQueue.add('name:some-name', 'some-job-data')
}

const WorkerQueue = new Worker('test-queue', async (job) => {
    throw new UnrecoverableError('test cluster exception')
}, {
    connection: new Redis.Cluster([
        {host: 'some-serverless-host', port: 6379},
    ], {
        dnsLookup: (address, callback) => callback(null, address),
        redisOptions: {
            tls: true,
        }
    }),
    prefix: '{bullMQ}'
})

WorkerQueue.on('waiting', () => console.log('waiting completed'))
WorkerQueue.on('completed', () => console.log('jobs completed'))
WorkerQueue.on('failed', () => console.log('failed completed'))

Relevant log output

ReplyError: EXECABORT Transaction discarded because of previous errors.
    at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
    at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
  command: { name: 'exec', args: [] },
  previousErrors: [
    ReplyError: ERR command not supported inside transaction
        at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
        at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
      command: [Object]
    },
    ReplyError: ERR command not supported inside transaction
        at parseError (bull-monitor/node_modules/redis-parser/lib/parser.js:179:12)
        at parseType (bull-monitor/node_modules/redis-parser/lib/parser.js:302:14) {
      command: [Object]
    }
  ]
}


### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
amit-opus added the "bug" (Something isn't working) label Sep 26, 2024
@roggervalf
Collaborator

Hey, from what I can see in the stack trace, it points to bull-monitor and redis-parser internals.

@roggervalf
Collaborator

Another comment: you must not use job names that include ":", as we will throw an error.

@or-opus

or-opus commented Oct 1, 2024

Hi @roggervalf
Basically, we are using BullMQ with ElastiCache Serverless.
For some reason we get that error every time a task fails and BullMQ tries to move it to failed.
Any idea why?
(With regular ElastiCache it works as expected.)

@manast
Contributor

manast commented Nov 16, 2024

ChatGPT told me the following: "Based on the detailed information you’ve provided, the error you’re encountering stems from using AWS ElastiCache Redis Serverless, which has certain limitations compared to standard Redis installations. Specifically, it does not support some commands that BullMQ relies on, such as EVAL and EVALSHA, especially within transactions. This incompatibility leads to the ERR command not supported inside transaction error when BullMQ tries to execute these commands."

So it seems that ElastiCache Serverless does not support calling Lua scripts within a transaction, which is something that moveToFailed does. Although not used extensively, there are other places where we use evalsha inside multi/exec transactions, such as when adding jobs in bulk. The only way to support AWS ElastiCache Serverless would be to convert these transactions into pure Lua scripts, which is doable but probably a couple of days of work. Maybe AWS also plans to support this themselves?
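To illustrate what "converting a transaction into a pure Lua script" could look like, here is a minimal sketch. It is not BullMQ's actual implementation: the key layout, the script body, and the names `buildMoveToFailedCall` and `moveToFailedSketch` are hypothetical. The `ScriptClient` interface is a stand-in shaped after ioredis's `eval(script, numKeys, ...keysAndArgs)`. The idea is that both steps run inside one script, so the client sends a single EVAL and never needs MULTI/EXEC.

```typescript
// Hypothetical sketch: key names and script are illustrative, not BullMQ's real ones.

// Minimal client shape, modelled on ioredis's eval(script, numKeys, ...keysAndArgs).
interface ScriptClient {
  eval(script: string, numKeys: number, ...keysAndArgs: (string | number)[]): Promise<unknown>;
}

// Both operations run inside one script, so no MULTI/EXEC wrapper is needed.
const MOVE_TO_FAILED_LUA = `
redis.call('LREM', KEYS[1], 0, ARGV[1])
redis.call('ZADD', KEYS[2], ARGV[2], ARGV[1])
return 1
`;

type EvalCall = [string, number, ...(string | number)[]];

// Builds the arguments for a single EVAL call.
// With a hash-tag prefix such as '{bullMQ}', both keys map to the same cluster slot.
function buildMoveToFailedCall(
  prefix: string,
  queue: string,
  jobId: string,
  failedAt: number,
): EvalCall {
  const activeKey = `${prefix}:${queue}:active`; // assumed list of running jobs
  const failedKey = `${prefix}:${queue}:failed`; // assumed sorted set of failed jobs
  return [MOVE_TO_FAILED_LUA, 2, activeKey, failedKey, jobId, failedAt];
}

// Sends the script as one command instead of queuing commands between MULTI and EXEC.
async function moveToFailedSketch(
  client: ScriptClient,
  prefix: string,
  queue: string,
  jobId: string,
): Promise<unknown> {
  return client.eval(...buildMoveToFailedCall(prefix, queue, jobId, Date.now()));
}
```

Because a Lua script executes atomically on the server, this keeps the all-or-nothing behaviour the transaction was providing while avoiding script calls inside a transaction.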

manast added the "enhancement" (New feature or request) label and removed the "bug" (Something isn't working) label Nov 16, 2024
@manast
Contributor

manast commented Nov 16, 2024

I will keep this open as an enhancement, as moving to pure Lua scripts is a long-term goal anyway: it is more robust than using multi/exec from a transactional perspective (you get better atomicity guarantees).

@or-opus

or-opus commented Nov 18, 2024

@manast thanks for the info!

@mariuszbeltowski

+1, serverless Redis is becoming the first-choice option in AWS nowadays. It looks like it will soon be serverless Valkey due to Redis licensing.

@bowenzhou222

bowenzhou222 commented Nov 29, 2024

+1. We use AWS ElastiCache Serverless and recently got an error complaining about EVALSHA, which breaks the job lock functionality. As a result, any job running for more than 30s is put back in the queue and executed twice.

@manast Your previous comment suggested that EVALSHA is not supported by Serverless; however, I found it in the docs. Is there any other reason why this command might not work at all?
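The double execution described above can be modelled with a small sketch. This is a simplified, hypothetical model of lock expiry, not BullMQ's code: `isLockExpired` and `wouldBeRequeued` are made-up names, and the 30-second lock duration is taken from the report above. It only shows why a job that outlives its lock becomes eligible for re-queuing once renewals stop succeeding.

```typescript
// Hypothetical model of job-lock expiry; not BullMQ's implementation.
const LOCK_DURATION_MS = 30000; // assumption: the 30s mentioned in the report

interface LockState {
  lastRenewedAt: number; // timestamp (ms) of the last successful renewal
}

// A lock counts as expired once a full lock duration has passed without renewal.
function isLockExpired(
  lock: LockState,
  now: number,
  lockDurationMs: number = LOCK_DURATION_MS,
): boolean {
  return now - lock.lastRenewedAt >= lockDurationMs;
}

// Simulates one job: renewal is attempted every renewEveryMs and either
// always works or always fails (for example, because the script call errors).
function wouldBeRequeued(
  jobRuntimeMs: number,
  renewEveryMs: number,
  renewalWorks: boolean,
): boolean {
  const lock: LockState = { lastRenewedAt: 0 };
  for (let t = renewEveryMs; t <= jobRuntimeMs; t += renewEveryMs) {
    if (isLockExpired(lock, t)) return true; // lock lapsed while the job was still running
    if (renewalWorks) lock.lastRenewedAt = t;
  }
  return isLockExpired(lock, jobRuntimeMs);
}
```

In this model, a 45-second job with failing renewals loses its lock at the 30-second mark, while the same job with working renewals keeps it until completion.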

@manast
Contributor

manast commented Dec 8, 2024

@bowenzhou222 EVAL and EVALSHA work; what does not work is calling these commands within a multi/exec Redis transaction. However, I cannot find where this is documented, nor where it could be reported so that they could implement it in the future. For now I am trying to eliminate the use of multi + eval in the most-used code paths of BullMQ, but some features will not work because they are too complicated to fix, such as flows and adding jobs in bulk.

@manast
Contributor

manast commented Dec 9, 2024

The PR that was just merged should resolve the issue with failed jobs and lock extension. However, some features will not work yet, such as flows and add bulk, which use multi as well; unfortunately, they are too complex to solve the way we did for moveToFailed. I think it would be good if you contacted AWS customer support and asked them about this missing feature. It may be something they could easily support if they realise it is important for some users.

@manast manast closed this as completed Dec 9, 2024
6 participants