Restart handler mechanism appears to have problems #48
Fixes 1st part of issue: galaxyproject#48
Just wanted to add that since 17.09 (also before, but we fixed some additional issues in 17.09), Galaxy restarts are not necessary after installing tools or data managers, in case that helps.
@mvdbeek Thanks - it would be great to remove the restarter functionality from this role - it does seem very out of place. I had a chat with Enis about this and he suggested that we mark it as deprecated and leave it for another release, because 17.09 is still fairly new.

Having said that, I think the issue is that all this time, the restarter generally did not get called unless you used the playbook in isolation. The reason is that the ansible-galaxy role also has a handler with the same name, which was effectively hiding the handler defined in ansible-galaxy-tools. This can be seen in the kickstart playbook. This is the Travis build log prior to merging PR #47:

However, PR #47 fixes this and the correct handler now gets called, which caused a downstream build failure in GalaxyKickStart here: https://travis-ci.org/ARTbio/GalaxyKickStart/jobs/312871378#L1844

So the underlying reason, I think, is that supervisorctl restart galaxy doesn't work as intended. All the other restarters (docker and run.sh) work, because they are tested in this role and elsewhere. Fixing the supervisor process definition downstream therefore seems to be the correct fix. Does that sound right to you?
Sure, that's fine. You may also want to look at the API version to decide whether or not to restart.
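The API-version check suggested above could look something like the sketch below. It is only an illustration: it assumes Galaxy's `/api/version` endpoint (which reports a `version_major` string such as `"17.09"`) and takes 17.09 as the hypothetical cutoff after which tool installs no longer require a restart; the function names are invented for this example.

```python
# Hypothetical sketch: skip the restart when Galaxy is new enough.
# The 17.09 cutoff and function names are assumptions for illustration.
import json
from urllib.request import urlopen


def needs_restart(version_str, cutoff=(17, 9)):
    """Return True if this Galaxy is older than the release (17.09)
    from which tool installs no longer require a restart."""
    major, minor = (int(part) for part in version_str.split(".")[:2])
    return (major, minor) < cutoff


def galaxy_needs_restart(galaxy_url):
    # /api/version returns JSON like {"version_major": "17.09"}
    with urlopen(galaxy_url.rstrip("/") + "/api/version") as resp:
        info = json.load(resp)
    return needs_restart(info["version_major"])
```

A role could then guard the restart handler behind the result of such a check instead of restarting unconditionally.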
@mvdbeek Sounds good. We were also thinking it would be good not to lose this functionality entirely and instead move it to the ansible-galaxy role, where it's currently awaiting an implementation. All three restart mechanisms are fairly generic, so they should be usable at that level. We can also use a topic listener to allow downstream roles to intercept should they wish to do so.
@mvdbeek @nuwang Sounds good to me too. Yet, two things: generally, the galaxy-tools role is the most downstream, isn't it? The other is that the reason why the most recent handler is failing is far from clear to me: what is puzzling is that there is a supervisorctl stop-then-start task, for instance here https://github.com/ARTbio/GalaxyKickStart/blob/master/roles/galaxy.movedata/tasks/export.yml, and I think we can say it works. The only difference I see, considering that a handler is also a suite of tasks, is that @nuwang used state=restarted, whereas in the pointed file it's a state=stopped followed by another task with state=started. I am not sure, but I seem to remember noticing that state=restarted in the supervisorctl Ansible module was quite slow.
@drosofff Yes, the handler here uses restarted instead of a stop followed by a start:
However, since a restart is in effect the same, the probable reason is that Galaxy has not fully stopped before the restart attempts to start it again. In the kickstart playbook, there are several intermediate actions between the stop and the start, which probably give Galaxy enough time to shut down. I didn't author the original restart handler, so I've never checked this, but it's most likely a misconfiguration in the supervisor config. You may need to check whether
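The stop/wait/start alternative discussed here could be sketched as follows. This is a hypothetical handler layout, not the role's actual code: the program name `galaxy`, the port `8080`, and the topic name are assumptions, and the `listen` keyword requires Ansible 2.2 or later.

```yaml
# Hypothetical sketch: split the single state=restarted handler into a
# stop / wait / start sequence so Galaxy has time to shut down fully.
# Program name, port, and topic name are assumptions for illustration.
- name: stop galaxy
  supervisorctl:
    name: galaxy
    state: stopped
  listen: restart galaxy

- name: wait for galaxy port to be released
  wait_for:
    port: 8080
    state: stopped   # wait until nothing is listening on the port
    timeout: 60
  listen: restart galaxy

- name: start galaxy
  supervisorctl:
    name: galaxy
    state: started
  listen: restart galaxy
```

Tasks would then use `notify: restart galaxy`, and all three handlers fire in order, which avoids the port conflict that a too-fast restart can cause.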
@nuwang Yeah, actually I forgot to reply to this: you cannot imagine the number of times I already did. It's true that sometimes uwsgi is not properly killed by supervisorctl stop or restart, which causes a port conflict. However, this is sporadic (I would say 1/20 or 1/30). I noticed that in the last situation it's 100% (I have not properly tested the recent PR yet, sorry). This said, if we can improve the supervisor/conf.d/galaxy.conf or related config files, I would be more than happy! I'll give it a shot when I have time, which unfortunately is not the thing I have the most of these days. Cheers
@drosofff Thanks for testing that. It's great if we can get supervisor's uwsgi control fixed - please keep us posted on that! Cheers!
@drosofff A killall on uwsgi may not work, because this role could be run on a machine that has other uwsgi processes running, and processes other than Galaxy could get killed. There are some good suggestions in this post about changes to the supervisor config that sound promising, e.g. the

I assume you're using the supervisor config in galaxy-extras? As a temporary measure, I've added a flag in this PR: #50 which allows control over whether or not to run the restart handler, and defaulted it to no, so that the original behaviour is restored.
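The supervisor-side changes being discussed might look like the sketch below. This is not the galaxy-extras config, just an illustration under common assumptions: uwsgi shuts down on SIGINT rather than SIGTERM (which it treats as a reload), and `stopasgroup`/`killasgroup` make sure worker processes die with the master. The command and paths are placeholders.

```ini
; Hypothetical supervisor program entry for a uwsgi-based Galaxy.
; Command and paths are placeholders; signal settings reflect uwsgi's
; documented behaviour (SIGTERM = reload, SIGINT = shutdown).
[program:galaxy]
command      = /home/galaxy/galaxy/.venv/bin/uwsgi --yaml config/galaxy.yml
directory    = /home/galaxy/galaxy
stopsignal   = INT    ; uwsgi exits cleanly on SIGINT, not SIGTERM
stopasgroup  = true   ; signal the whole process group, not just the master
killasgroup  = true   ; ensure worker processes are killed too
stopwaitsecs = 60     ; give Galaxy time to finish before SIGKILL
autostart    = true
autorestart  = true
```

With settings like these, `supervisorctl restart galaxy` is less likely to leave a stray uwsgi worker holding the port, which is the symptom described above.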
Hi @nuwang Thanks for the suggestion, I'll test it asap. Yes, we are using galaxy-extras, so I guess that any change of that sort should be proposed against that role, right? On another note, I can confirm that currently the handler triggers as many supervisorctl restarts as there are installed tools: 2 times here https://travis-ci.org/ARTbio/ansible-galaxy-tools/builds/316153504#L702, and in a separate test outside Travis I got 20 restarts with twenty tools! I know this is not the expected behavior, but it does happen!
@drosofff Yes, I think a PR against galaxy-extras would be great. WRT the handler being triggered unnecessarily, I'm seeing that in the old build logs too, and it looks like this bug in Ansible, which was resolved in Ansible 2.3.0: ansible/ansible#18178. Which version of Ansible are you using? I've also updated the Ansible version used by Travis, so this may have resolved the problem, but it's no longer immediately apparent because the restart handlers don't run automatically anymore.
We use ansible==2.2, but I don't think it would be a problem to update to 2.3.0 or even 2.4, apart from the warnings about syntax changes. I tried a couple of times and it worked. We would just need to lint all the roles at some point.
I can confirm that switching to ansible==2.3.0 fixes the behavior of the handler.
Problem:
The restart handler mechanism doesn't appear to work consistently and/or work well with other roles.
Background:
See discussion here: #47
Issues:
There are several attendant issues that also need to be solved:
but this can cause the incorrect handler to be called when both the galaxy and galaxy-tools roles are used together. We could keep the same handler name by using a "generic topic" (available from Ansible 2.2 onwards), or make the handler name unique, which is what was done in PR #47 (Make sure ansible-galaxy-tools version of the restart handler is called). However, the latter seems to have caused a build break for unknown reasons.
Since a restart command should succeed, whether Galaxy is running or not, there may also be a supervisor config issue in the extras playbook.
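The "generic topic" alternative mentioned above could be sketched as follows. Each role keeps a uniquely named handler, and both subscribe to a shared topic via `listen` (Ansible 2.2+), so tasks can notify one topic without the roles shadowing each other's handlers. All names below are illustrative, not the roles' actual identifiers.

```yaml
# Sketch of the "generic topic" approach; handler, variable, and topic
# names are assumptions for illustration.

# in ansible-galaxy/handlers/main.yml
- name: galaxy role restart
  supervisorctl:
    name: galaxy
    state: restarted
  listen: restart galaxy

# in ansible-galaxy-tools/handlers/main.yml
- name: galaxy tools restart
  command: "{{ galaxy_restart_command }}"
  listen: restart galaxy

# tasks in either role then notify the shared topic:
#   notify: restart galaxy
# and every handler listening on that topic fires, so a downstream
# role can intercept by adding its own listener.
```

This keeps the notification name stable across roles while avoiding the name collision that PR #47 worked around with a unique handler name.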