Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

I found the cause of SQS Queue Worker's memory leak #1380

Open
sh-ogawa opened this issue Nov 6, 2018 · 14 comments
Open

I found the cause of SQS Queue Worker's memory leak #1380

sh-ogawa opened this issue Nov 6, 2018 · 14 comments

Comments

@sh-ogawa
Copy link

sh-ogawa commented Nov 6, 2018

This behavior sounds similar to #1645 where we found that the SDK (or one of its dependencies) does appear to have cyclic references that have to be cleaned up by PHP's garbage collector, however garbage collection does occur naturally after a certain amount of references have built up.

This problem seems to be due to the implementation of SqsClient 's promise usage, which still has references left.

However, this problem can be solved by using gc_collect_cycles().

The easiest place to insert is in the Illuminate\Queue\Worker::daemon().
But I do not know if I can put it here, so I raised the issue.

We will share the exchange with aws-sdk-php issue, so would you please proceed with the direction to fix this problem?

aws/aws-sdk-php#1667 (comment)

@deleugpn
Copy link

Since this is a problem in a specific driver, it would be better if we could solve it at the driver itself. Perhaps at the class Laravel uses to interact with the SDK.

@sh-ogawa
Copy link
Author

It is true as it is, but it is impossible to put it in because it can not do postprocessing as it is currently implemented.

@sisve
Copy link

sisve commented Nov 14, 2018

Would it work to call gc_collect_cycles() in the Looping event?

@deleugpn
Copy link

I was thinking that it could be something done right at the pop method of SqsQueue (https://github.com/laravel/framework/blob/5.7/src/Illuminate/Queue/SqsQueue.php#L117).

The reasoning is that since it's a SqsQueue problem, it might be best to have a SqsQueue solution instead of affecting all drivers. Of course, if it doesn't cause harm, we might as well do it in the Worker loop.

Ideally, we'd do it right after, which means we would add a worked method to the Job contract and have it called after working a job. The SqsQueue implementation could do the clean up. However, given the lack of need for a worked until now, we might want to hold off that approach until it's more needed. Hence the proposal for the pop. Right before getting a new Job, let's clear (gc_collect_cycles) any residue from previous jobs.

@sisve
Copy link

sisve commented Nov 15, 2018

Keeping it in SqsQueue would make sense in the context of focusing on this single problem. Putting it in the Worker makes sense if you consider that it's the Worker's responsibility to shut down when memory usage exceeds what is set by --memory. I was slightly surprised that the Worker didn't call gc_collect_cycles before shutting down to see if it could free memory.

@sh-ogawa
Copy link
Author

sh-ogawa commented Nov 15, 2018

Call a gc_collect_cycles before sqs pop, if there are similar problems in future, will not you do the same kind of correspondence?

Mr, @sisve
It feels good, certainly it is good to release memory once before checking if shutdown the queue worker when used memory exceeds.

Laravel users will not be (maybe) welcomed to shutdown queue workers due to memory overruns.

@t202wes
Copy link

t202wes commented Dec 24, 2018

This still hasn't been updated in a recent release has it?

Is it possible just to run this function from a cronjob to resolve the memory issue?

Are we able to insert gc_collect_cycles() into our own codebase, into each command to easily resolve this?

What's the easiest way for something to implement this fix, without having to wait for Laravel, any other packages to update, or trying to host our own customized packages outside our main application files?

@mfn
Copy link

mfn commented Dec 24, 2018

Are we able to insert gc_collect_cycles() into our own codebase, into each command to easily resolve this?

If you are not using horizon, I think you can:

  • extend \Illuminate\Queue\Console\WorkCommand (i.e. create your own version of the command)
    which
  • __construct receives your Worker class which extends \Illuminate\Queue\Worker and overrides the daemon method and adds gc_collect_cycles()
  • and then call that particular command

If you're using horizon, I'm not sure. Because horizon itself employs this very same technique, it extends \Illuminate\Queue\Console\WorkCommand as part of \Laravel\Horizon\Console\WorkCommand and you would somehow need to find a way to teach horizon not to invoke the horizon:work but your own version the new command, etc.

I think that Laravels worker could benefit from calling gc_collect_cycles() but one first have to proof someone it has no ill effects.

I.e. performance might be affected when a worker runs many many small/fast jobs and thus calls gc_collect_cycles() often, thus redundantly. Maybe it should be able to be tuned (only trigger once in a while) or turned off/on. But asking these question already shows it needs more preparation to land this properly.

@sh-ogawa
Copy link
Author

I was struggling to incorporate opinions.
In my opinion,
I think that real time nature has not been requested so far at the time of queue job, and our team does not ask for it.
If you are really looking for real-time nature, just use another method.

My problem is that the queue job will continue to increase memory just by checking for jobs.
This may also affect other programs running on the same server.
Also, if you are using EC 2, this problem may cause disk IO to occur frequently and you may need to pay CPU credits.
Is this obviously disadvantageous to the user?

@sisve
Copy link

sisve commented Dec 26, 2018

@sh-ogawa Did you try calling call gc_collect_cycles() in a custom listener that is listening for the Looping event? It should be a workaround for your application until a suitable framework solution is implemented.

@sh-ogawa
Copy link
Author

@sisve Is it about our queue job?
If it is, that can not be done.
This is because the SQS Queue Worker just increases the memory by simply checking the message in SQS.
If there is no message at this time, our queue job will not be called, so there is no timing to call gc_collect_cycles.

@sisve
Copy link

sisve commented Dec 26, 2018

You're probably thinking of the JobProcessing/JobProcessed/JobFailed events which is tied to a specific job and wouldn't execute on an empty queue.

The Looping event is fired every time the worker checks for an available job (and either executes it, or sleep if there was no available job). I use dedicated listener classes for my logic, but you can just throw this into your favorite service provider to execute gc_collect_cycles() between every job. It's probably not appropriate to use on a framework level, but you can probably use it as a workaround in your code base until Laravel has better fix.

Queue::looping(function() {
    gc_collect_cycles();
});

@sh-ogawa
Copy link
Author

@sisve
thank you!!
I write that in AppServiceProvider and confirmed that memory does not increase.
As a provisional correspondence, write this code and avoid it.

@mfn
Copy link

mfn commented Nov 15, 2019

There was an update at aws/aws-sdk-php#1667 (comment) which points to an interesting thing:

The short of it being something with cURL SSL verification being leaky.

From https://forums.aws.amazon.com/message.jspa?messageID=695298 (linked in above comment):

memory_get_usage() does NOT report the leak, it shows steady usage. Using top and ps aux --sort -vsz | head from the CLI shows the VSize and RSize memory increasing consistently until the entire instance goes down.

I think the bug is either in libcurl or in the SDK's usage of curl, regarding SSL peer verification. It does not matter which service we were contacting; S3, SQS, DynamoDb, and others all exhibit the same behavior in a long-running process.

Turning off peer verification as in the following code fixed the leak:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants