🐛 Large data in the store caused the app to crash #71

Open
AdelUnito opened this issue Jan 20, 2021 · 0 comments

I'm using better-queue in production with a PostgreSQL store.

We had an issue in the service running the queue, and a significant amount of data piled up in the store (2,312,233 tasks).
[Screenshot: Screen Shot 2021-01-19 at 10 23 47 PM]

This caused the application to crash one minute after startup. The crash was completely silent: no unhandledRejection, no uncaughtException, and no signal traps either (SIGTERM, SIGINT, SIGUSR2). The machine's resources were not exhausted; the CPU was at 40% and memory consumption was very low.
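For context, here is a minimal sketch of the kind of process-level handlers that can surface this sort of silent exit (the `logger` instance is borrowed from the snippets below; nothing here is specific to better-queue):

// Hypothetical diagnostics: log every way the process can terminate.
process.on('unhandledRejection', (reason) => {
  logger.error('unhandledRejection', reason);
});
process.on('uncaughtException', (err) => {
  logger.error('uncaughtException', err);
});
(['SIGTERM', 'SIGINT', 'SIGUSR2'] as NodeJS.Signals[]).forEach((signal) => {
  process.on(signal, () => logger.warn(`Received ${signal}`));
});
// 'exit' fires even on a silent exit and reports the exit code;
// only synchronous calls are safe here, hence console.error.
process.on('exit', (code) => {
  console.error(`Process exiting with code ${code}`);
});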

I have logging set up at the beginning of the queue processing function and on error events, as you can see in the following code:

const queueOptions = {
  batchSize: 250000,
  batchDelay: 3600000, // 1 hour in ms
  concurrent: 1,
  maxRetries: Infinity,
  autoResume: false, // I tested with and without `autoResume`
  retryDelay: 3600000 + 10, // 1 hour + 10 ms
  afterProcessDelay: 3600000, // 1 hour in ms
  precondition: async (cb: any) => {
    try {
      const lock = await cacheInstance.getValue(LOCK_KEY);
      if (lock) {
        logger.info('Precondition failed, resources still locked');
        cb(null, false);
      } else {
        cb(null, true);
      }
    } catch (err) {
      logger.warn('Couldn\'t check the queue precondition', err);
      cb(err);
    }
  },
  preconditionRetryTimeout: 3600000, // 1 hour in ms
};
if (config.env === 'production') {
  // @ts-ignore
  queueOptions.store = {
    type: 'sql',
    dialect: 'postgres',
    host: process.env.DATABASE_HOST,
    port: process.env.DATABASE_PORT,
    username: process.env.DATABASE_USERNAME,
    password: process.env.DATABASE_PASSWORD,
    dbname: process.env.DATABASE_NAME,
    tableName: 'my_queue_store', // The table will be created automatically.
  };
}
const myQueue = new Queue(async (payload: any, cb: any) => {
  try {
    const lock = await cacheInstance.lock(LOCK_KEY, 3600000);
    // await doTheProcessing() and release the lock.
    cb(null, 'queue_processed');
  } catch (err) {
    // Release the lock.
    cb(err);
  }
}, queueOptions);

// Queue logs
myQueue.on('batch_failed', (error: string) => {
  logger.warn(`Failed to process the queue`, error);
});
myQueue.on('batch_finish', () => {
  logger.info(`Processed the queue batch`);
});
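As a side note, better-queue also exposes per-task and queue-level events; a hedged sketch of extra listeners (assuming the standard event names from the better-queue README) would be:

// Per-task visibility in addition to the batch events above.
myQueue.on('task_finish', (taskId: string, result: any) => {
  logger.verbose(`Task ${taskId} finished`, result);
});
myQueue.on('task_failed', (taskId: string, error: any) => {
  logger.warn(`Task ${taskId} failed`, error);
});
myQueue.on('empty', () => logger.info('Queue is empty (no waiting tasks)'));
myQueue.on('drain', () => logger.info('Queue drained (nothing left to process)'));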

I also have the following logging when I push data to the queue:

myQueue.push(payload)
        .on('finish', (result) => {
          logger.verbose(`Pushed an event to the queue`, result);
        })
        .on('failed', (err) => {
          logger.warn(`Failed to push an event to queue`, err);
        });

The absence of logs made it very hard to find the issue. I only discovered it after disabling the SQL store and falling back to the default in-memory store, at which point the app stopped crashing.

My only solution, for now, was to back up the my_queue_store table and truncate it.
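For reference, a minimal sketch of that workaround as a one-off script, assuming the node-postgres (`pg`) client and a made-up backup table name:

import { Client } from 'pg';

// Copy the piled-up tasks aside, then clear the live store table.
async function backupAndTruncateStore(): Promise<void> {
  const client = new Client({
    host: process.env.DATABASE_HOST,
    port: Number(process.env.DATABASE_PORT),
    user: process.env.DATABASE_USERNAME,
    password: process.env.DATABASE_PASSWORD,
    database: process.env.DATABASE_NAME,
  });
  await client.connect();
  try {
    await client.query('CREATE TABLE my_queue_store_backup AS TABLE my_queue_store');
    await client.query('TRUNCATE TABLE my_queue_store');
  } finally {
    await client.end();
  }
}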

My tech stack is the following:
OS: 64bit Amazon Linux 2/5.2.1 running in EBS
Node version: Node.js 12
better-queue: 3.8.2
better-queue-sql: 1.0.3

How can this be avoided?
How can logging be improved for a similar situation?

Thank you 💗
