Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move signal handling into the MiqServer object #15206

Merged
merged 3 commits into from
May 23, 2017

Conversation

carbonin
Copy link
Member

This insures the @workers and @workers_lock instance variables are still accessible when trying to communicate messages to the workers for a clean shut down.

Previously the process survived, but the MiqServer instance was no longer in scope. This caused the @workers and @workers_lock objects used for message passing via drb to be nil, ensuring that the workers never got the message to exit.

This will let them be properly handled, rather than attempting
a reconnect.
@@ -369,6 +369,13 @@ def monitor_loop
_log.info "Server Monitoring Complete - Timings: #{timings.inspect}" unless timings[:total_time] < server_log_timings_threshold
sleep monitor_poll
end
rescue Interrupt => e
_log.info("Recieved #{e.message} signal, killing server")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor typo... Recieved => Received (Same on the one below)

rescue Interrupt => e
_log.info("Recieved #{e.message} signal, killing server")
self.class.kill
exit 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's really strange for a regular method on a model to exit 1, however that being said, this is no normal method. We really need to rip out these "MiqServer" classes into standalone classes.

carbonin added 2 commits May 23, 2017 14:58
This insures the workers and workers_lock instance variables
are still accessible when trying to communicate messages to
the workers for a clean shut down.

Previously the process survived, but the MiqServer instance was
no longer in scope. This caused the workers and workers_lock objects
used for message passing via drb to be nil, ensuring that the workers
never got the message to exit.
@carbonin carbonin force-pushed the wait_for_stop_on_sigterm branch from 7589f00 to 9944176 Compare May 23, 2017 18:58
rescue Interrupt => e
_log.info("Received #{e.message} signal, killing server")
self.class.kill
exit 1
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This got squashed from @Fryguy

I think it's really strange for a regular method on a model to exit 1, however that being said, this is no normal method. We really need to rip out these "MiqServer" classes into standalone classes.

Copy link
Member Author

@carbonin carbonin May 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fryguy I agree, the alternative was to create a separate method (like #kill_and_exit or something) and just call it from here.

I thought that was a bit overkill, but I have no real opinion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just raise instead of exit 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, to be fair, we were exiting before so I guess that's unchanged behavior.

@miq-bot
Copy link
Member

miq-bot commented May 23, 2017

Checked commits carbonin/manageiq@a4504ac~...9944176 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 1 offense detected

app/models/miq_server.rb

_log.info("Received #{e.message} signal, killing server")
self.class.kill
exit 1
rescue SignalException => e
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, we're now handling other soft signals I believe. I don't recall why we were only handling term, usr1, and usr2 as soft signals before. Maybe it doesn't matter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going with "it doesn't matter". Plus I think this is objectively better behavior. The alternative would be to exit the server process with the exception (it was being re-raised previously) and have the workers go down with DRb connection errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's see what happens and make it more complicated if we need to


it "unhandled signal SIGALRM" do
allow(MiqServer).to receive(:start).and_raise(SignalException, "SIGALRM")
expect { server.start }.to raise_error(SignalException, "SIGALRM")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, does this type matter? We're now treating them as soft signals where we weren't before...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above, we're going to let these other signals be treated differently when we find it's a problem.

@jrafanie jrafanie merged commit 972cec5 into ManageIQ:master May 23, 2017
@jrafanie jrafanie added this to the Sprint 62 Ending Jun 5, 2017 milestone May 23, 2017
@carbonin carbonin deleted the wait_for_stop_on_sigterm branch October 13, 2017 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants