Fix halfway-stopping of listeners #348
Hi Viktor, long time no see 😅
Yes, that would be a blueprint for a test case for my second point, when a listener is already in the halfway-stopped state. The hard-to-test part revolves around my first point, which should make it impossible to even put a listener in that state by "normal means" and a crash at an unfortunate time =^^=
@essen? WDYT of the approach I used and the PR in general?
Why isn't this done in
Sure, could be done, not a bad idea :) I'll make a draft to show to you, maybe tomorrow.
... or maybe today =^^= WDYT?
I would not move the stop logic to Something like
Ah, ok, so you mean leave the stopping code in the The way you propose would enable anyone to run any code in the
Yes.
We can restrict the M and F to ranch and do_stop_listener or whatever the name is, for now.
@essen What's the point of this extra abstraction? To me it looks like over-engineering. Why not just do the simple protection (or workaround, if considered an OTP bug) for the fact that
We need to do it in a process that won't be the caller, it can either be a random process that we spawn for this specifically, or Not sure which part is over-engineering.
If
Thanks, makes sense.
Force-pushed from 1c3b4f7 to de6d4b6.
Force-pushed from de6d4b6 to a2c959e.
Looks good, thanks! I'm off for the next two weeks but I'll look into merging and releasing a new version when I get back.
Have a nice vacation then 👋
@essen what about this PR? 3+ months have passed 😅
I have updated the CI, please force push to run CI again.
Force-pushed from b90878a to 70e0dc4.
@essen Hm. Looks like we have a race condition there now. It should be of little to no consequence in reality, but it messes up the tests. With the rewrite of At the end of most tests, we call So, I could put back the explicit WDYT?
Can we make
Maybe... This could be done in 3 ways, as I see it:
I would do the following:
Then depending on what the tests tell us we may need to do more.
Looking at the code, the late cleanup is likely problematic in a tight stop/start loop.
An explicit call to
A check-wait-retry loop also contains a race condition where, if
🤷‍♂️
It's OK that there are race conditions when start and stop are called by different processes. I don't think we should support more than "stop then start" from within the same process. It's up to the user to synchronize if they need more.
If the explicit cleanup allows tight stop/start loops to work, then let's go with this.
Have just been thinking... if we explicitly call
Not sure if something can go wrong when stopping. But yes we could swap the order we do things if it makes things better.
Pushed a new commit with that. I don't think anything much could go wrong. For one, we are running the stop procedure in an isolated process now, so whatever happens to the process calling
The start/stop loop test I mentioned would also be good to have if we don't already have it.
I'll have to see to that later =^^= Which means, probably next week if I can't get back to it today.
There :)
I think it looks good. Please rebase and squash and we can merge. If there are hard macOS failures in CI when you do this it's because the runner image is being changed and this makes the cached OTP build invalid for that environment.
* if the process calling ranch:stop_listener crashes before finishing, the stopping procedure is still executed completely
* if a listener is terminated but not deleted, calling ranch:stop_listener removes the remnant
Force-pushed from 791e095 to 80e012e.
Done 👍
There is another failure now, macOS again, not sure what this one is about. Seems unrelated to changes in this PR, anyway 🤷‍♂️
Merged, thanks!
Fixes #347.
The changes in this PR actually do two things:
* a crash of the process calling ranch:stop_listener at an unfortunate time (namely, between the supervisor:terminate_child and supervisor:delete_child calls involved) should no longer prevent the stopping procedure from going to completion
* if a listener is terminated but not deleted, calling ranch:stop_listener removes the remnant
There are no tests yet (and for the former case, I don't see how we could reliably create the scenario of the calling process crashing at just the right time). I want to hear what you think about this approach first before investing more time.