[🐛 BUG]: RR [v2.5.7] doesn't construct new workers after call resetting command
#1168
Comments
Hey @obrazkowv 👋🏻 . Could you please try the |
Also, |
This issue is difficult to reproduce. It appears at different moments and I don't know what causes it. |
I removed the stderr redirect to /dev/null from the reset script and started resetting the rr workers. After that I found that I sometimes get different errors:
|
@obrazkowv was that after upgrade or still with the old version? |
I was still on the same version, but a few minutes ago I upgraded to "rr version 2.10.3 (build time: 2022-06-02T10:51:28+0000, go1.18.2)" and even after that I hit the issue. P.S. I found that the versions of the rr binary and the PHP lib differ. The PHP lib was at 2.0.0, which is already outdated and also doesn't match the version of the rr binary. So I will try to reproduce this problem with the same version (2.10.3). |
After upgrading to 2.10.3, unfortunately, nothing changed
Here is what I am doing, maybe it helps: I will try to debug what is happening with strace, maybe I will find something useful, and finally I will run tcpdump on port 6001 (the RPC listener). |
I started calling rr reset and at some point it got stuck without any response. After that, no other command responds either (rr workers). I have a simple endpoint that always answers with a 200 code, and I wrote a bash script with an infinite loop that makes 10 simultaneous calls on each iteration
These are the logs I found in syslog
And this is what I found in the rr logs
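A load script of the kind described above could look roughly like the sketch below. The URL, the `REQUEST_CMD` variable and the `fire_batch` name are assumptions made for illustration; the original script is not shown in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical endpoint: the real URL is not given in the thread.
URL="${URL:-http://127.0.0.1:8080/}"
# Command used for one request; left unquoted on purpose so flags are split.
REQUEST_CMD="${REQUEST_CMD:-curl -s -o /dev/null}"

# Start N requests in the background, then wait for all of them to finish.
fire_batch() {
  n="$1"
  i=0
  while [ "$i" -lt "$n" ]; do
    $REQUEST_CMD "$URL" &
    i=$((i + 1))
  done
  wait
}

# Endless load with 10 simultaneous calls per iteration:
#   while true; do fire_batch 10; done
```

The infinite loop is left as a comment so the file can be sourced without starting the load immediately.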
|
@obrazkowv Could you please update to EDIT: |
RR won't re-read a new path when you update a symlink. You have to restart RR in that case.
What's the name of the lib? |
What do you mean by restarting rr? Do you mean resetting the workers, or do I need to restart the application fully? If I need to restart the application by stopping the serving process, how can I avoid downtime during the restart?
I mean the spiral/roadrunner lib that is used in the application (workers). |
No problem, I will try to test with the new version later today or tomorrow |
I'm not quite sure what you are doing. You may explain your approach if you want.
It's not quite necessary to have the same version. IIRC there were no BCs between RR-PHP and RR server. |
I see the same behavior after upgrading to 2.10.4. I will try to reproduce this issue in a project built from scratch, but I don't think anything will change, because I believe the answer from Go is wrong. To reproduce the issue, I start the app, then in another tab I start a simple bash script that sends requests to the app (I think ab or wrk would also work for this purpose), and finally I reset the workers until I hit the error
|
I was able to reproduce this with code from scratch
|
Also, I found that you can skip sending requests and just run the reset command in a loop. So, I started rr and then started the script
And the script got stuck on receiving the signal
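A reset loop like the one described could be sketched as follows. It wraps each reset in `timeout` so that a hang is reported instead of blocking forever. The `RESET_CMD` and `TIMEOUT_SECS` variables, the 10 second default and the `reset_until_stuck` name are assumptions; the script actually used is not shown in the thread.

```shell
#!/usr/bin/env bash
# Command to run for each reset; "rr reset" is the command named in the thread.
RESET_CMD="${RESET_CMD:-rr reset}"
# Assumed limit after which a reset is treated as stuck.
TIMEOUT_SECS="${TIMEOUT_SECS:-10}"

# Run up to $1 resets back to back; stop at the first failure or timeout.
reset_until_stuck() {
  max="$1"
  i=0
  while [ "$i" -lt "$max" ]; do
    i=$((i + 1))
    if ! timeout "$TIMEOUT_SECS" $RESET_CMD; then
      echo "reset #$i failed or timed out after ${TIMEOUT_SECS}s"
      return 1
    fi
  done
  echo "completed $i resets without a hang"
}

# Example: reset_until_stuck 1000
```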
|
It's not possible to reset workers from workers. |
I mean, you should use the |
what do you mean? |
I started the same command on the server, which worked in the loop for eight days. |
So, first of all, let's clarify some things:
php -r '
require_once "/var/www/...../vendor/autoload.php";
$rpc = \Spiral\Goridge\RPC\RPC::create("tcp://127.0.0.1:6001");
$rpc->call("resetter.Reset", "http");
'
Doing a reset from PHP is not possible because this command wasn't designed to work with PHP. |
Could the server's capacity be related to the problem? I am testing this behavior on an AWS t3.medium. But I doubt it... |
I have already changed the command: I am using the "rr reset" command next to the rr.yaml file, so I don't call it through PHP, but it doesn't help |
Could you please try to do that with 1 RR and 1 |
Do you mean 1 worker in the config when you said 1 RR? |
I assume that there is smt wrong with your test cases. Keep in mind of |
I mean 1 RR process. You have dozen of them running in your |
I'll try to find what the cause of this issue could be, maybe it's related to the OS settings. |
You are wrong, I have only one main process. "rr reset" was called by the ubuntu user |
Also, as I mentioned (in general, this is not the issue, but...) you are running an |
It might be child PHP processes; however, I don't know. |
So, to summarize, I was trying to reproduce your case for eight days with a RR (with 30 workers, the latest PHP) and didn't see the result you showed.
|
I already said that you can skip the step of sending requests. The reset command freezes when you run resets sequentially. Try removing the sleep after each call, idk. I did it on Ubuntu 18.04, is that fresh enough? I already have similar code
And finally, I think the different user doesn't matter for this issue, because otherwise I would have hit it on the first iteration. Okay, I will try to find the cause of the problem, and if I find it, I will add an explanation. Thank you for your time |
When I changed the relay from pipe to socket, after some time I got a warning. I hope it helps... |
I tried different scenarios, but no freezes.
It is fine. This is not an error, that might happen when you're doing a lot of resets.
It's very old.
|
I found the cause of the issue: we are using OS signals in our PHP application, so each time I send the signal to the process, rr freezes...
So now you should be able to reproduce this problem, and I hope it helps you |
If you send a SIGUSR2 signal to the PHP app, this is not a problem. If you are using reset at the same time, this is a problem. You're not allowed to do that at the moment. This is not a bug, this is expected behavior. |
I don't understand, why you need to stop your worker and then stop all the workers at the same time with |
Why do you think that I can use it only as a signal to stop? I use this signal to notify another process about something, and I was using it in the application before I started using roadrunner, so I didn't even suspect it could be a problem. I think you should add this point to the documentation, because it can be unexpected behavior for someone like me: you can change how your app reacts to the signal, I mean, you can use it for more than termination. |
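The point that a process can replace the default reaction to a signal can be shown with a small shell analogue. The PHP application in question would install its handler in its own code, which is not shown in the thread; the sketch below only demonstrates the general mechanism, and the `handle_usr2_demo` name is hypothetical.

```shell
#!/usr/bin/env bash
# The default action for SIGUSR2 is to terminate the process.
# With a handler installed, the process is notified and keeps running.
handle_usr2_demo() {
  bash -c '
    trap "echo got USR2, still running" USR2
    kill -USR2 $$          # the child shell signals itself
    echo "exiting normally"
  '
}

handle_usr2_demo
```

This only shows that a handled signal need not stop a process. It does not make sending such a signal safe while a reset is in progress, which is the combination described in the thread.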
Because if you send a syscall and worker won't stop, this is not a problem for the
PRs are always welcome 😉 You may also create a |
I think this cannot be resolved in the roadrunner app (the application written in Go), because you don't know who sent the signal or how it must be handled. The developer must take care of this when he starts using this library, but he must be aware of this potential issue. |
Sorry for repeating this, but again, you're free to use signals as long as they don't stop the worker. The problem is not in the signal (any of them), the problem is in the signals which stop the worker at the same time with the
As you can see, this is not a problem. RR will just create a new worker instead of stopping one. And then you may use a So, feel free to use syscalls that lead to worker stopping OR |
The main problem is that you don't know whether you can use signals right now, I mean simultaneously. I know how to resolve the issue in my environment, but it may not be so obvious in another case. I also need to add that I use the signal to notify that something changed, so I don't use it for stopping. You can set up the reaction to a specific signal in PHP, and the worker won't stop after the signal is sent. https://www.gnu.org/software/libc/manual/html_node/Miscellaneous-Signals.html Finally, I think we can close this issue. |
My pleasure 🤝 |
@obrazkowv This is the correct description of the problem: #1180. |
No duplicates 🥲.
What happened?
Sometimes I hit an issue where rr doesn't start workers after a reset
I use this command to reload the application after deploying a new version
And this works fine. In the roadrunner logs, I can find entries like these.
But sometimes when I try to reload the workers, I hit unexpected behavior where rr doesn't construct new workers
New workers were constructed only after restarting the service
And that's all. It seems like the reload plugin was stopped by something. After that, I need to restart the service completely.
Also, I have these logs in syslog, which appeared after restarting the service, but I don't know whether they relate to the issue or not
PHP 7.4 (opcache_cli enabled without storing in files)
Ubuntu 18.04 x86_64
rr version 2.5.7 (build time: 2021-11-13T16:43:25+0000, go1.17.2)
Maybe I am doing something wrong when I try to reload the workers?
Thank you for any help.
Version
2.5.7
Relevant log output
No response