Monitor multiple servers when we're running in pods #19734
Conversation
lib/workers/evm_server.rb
Outdated
@@ -31,4 +33,149 @@ def set_process_title
  def self.start(*args)
    new.start
  end

  def start_server_environment(server)
Is it ok if `server` is nil here? It can be nil the first time, before `seed_primordial` is run.
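If it can indeed be nil on first boot, a guard like the following might be what's needed (a sketch only; the early return and its placement are assumptions, not the PR's actual fix):

```ruby
def start_server_environment(server)
  # On a fresh database there is no MiqServer record yet
  # (seed_primordial hasn't run), so bail out rather than
  # calling methods on nil.
  return if server.nil?

  # ...existing environment setup continues here...
end
```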
lib/workers/evm_server.rb
Outdated
    end
  end

  unless server.new_record?
💣 if nil, right?
lib/workers/evm_server.rb
Outdated
  Vmdb::Appliance.log_config_on_startup

  @server.ntp_reload
Where is `@server` initialized?
Weird, I couldn't find `impersonate_server` called anywhere before... Maybe I fat-fingered my search for it or only looked at one commit... thanks!
I'm having a problem with settings ... Specifically when I call ... I'm not sure how to deal with this. Right now, what happens is that we are doing a bit of extra work on each monitor loop and each time the worker heartbeats, so nothing should be really broken ...

Edit: Actually I think this will happen on the appliance too, which is not great. I could special case ...
lib/workers/evm_server.rb
Outdated
  end

  def servers_from_db
    MiqEnvironment::Command.is_podified? ? MiqServer.all.to_a : [MiqServer.my_server(true)]
Wow, I like how the podified/non-podified code is 99% the same and the only difference is the return from this method. 👍
lib/workers/evm_server.rb
Outdated
    save_local_network_info
    set_local_server_vm
    reset_server_runtime_info
    log_server_info
I gotta see how the above methods work with server impersonation. I wonder how we want to deal with server failover. I'm guessing we'll need really good detection for when the pod needs to be bounced.
Right, so right now we are checking the health of this pod by just ensuring that the MIQ Server process is running (ref).
If we're concerned that the actual monitor process might get stuck, we could add a heartbeat to a file, similar to the worker heartbeat, to ensure that it's still looping.
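A minimal sketch of that file heartbeat, assuming a hypothetical file path and helper name (the real worker heartbeat mechanism may differ):

```ruby
require "time"

HEARTBEAT_FILE = "/tmp/evm_server.hb".freeze # hypothetical location

# Touch the heartbeat file with the current time; a liveness probe can
# then fail the pod if the timestamp gets too stale.
def heartbeat_to_file
  File.write(HEARTBEAT_FILE, Time.now.utc.iso8601)
end

# Called once per iteration of the monitor loop, so a stuck loop
# stops refreshing the file and the probe eventually fails.
```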
Force-pushed from 90fef18 to f68e918
Force-pushed from f68e918 to 5f6e17b
lib/workers/evm_server.rb
Outdated
  end

  def impersonate_server(s)
    return if s == @server
I think this would be clearer named `@current_server` or something to that effect.
lib/workers/evm_server.rb
Outdated
  # such as MiqServer.my_server and MiqServer.my_guid, and also the
  # contents of the global ::Settings constant.
  ######################################################################
  def for_each_server
The method name `for_each_server` feels too generic to me. Perhaps `as_each_server`, or `each_impersonated_server`?
lib/workers/evm_server.rb
Outdated
    # It is important that we continue to use the same server instance here.
    # A lot of "global" state is stored in instance variables on the server.
    @server = s
    Vmdb::Settings.init
As discussed, I think you can use `Vmdb::Settings.reset_settings_constant`.
lib/workers/evm_server.rb
Outdated
    # Remove and shutdown a server if we're monitoring it and it is no longer in the database
    servers_to_monitor.delete_if do |monitor_server|
      servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
        monitor_server.shutdown if should_delete
Think we also need to impersonate here, as `MiqServer.quiesce_workers_loop` uses `@worker_monitor_settings`. Wondering if renaming ...
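A sketch of what impersonating before the shutdown might look like, layered onto the `delete_if` block from the diff above (the `impersonate_server` call here is the reviewer's suggestion, not the merged code):

```ruby
servers_to_monitor.delete_if do |monitor_server|
  servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
    if should_delete
      # Switch globals (guid, ::Settings) to this server first so that
      # quiesce_workers_loop reads the right @worker_monitor_settings.
      impersonate_server(monitor_server)
      monitor_server.shutdown
    end
  end
end
```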
The settings issue should be mostly dealt with after #19758 is merged.
MiqServer#shutdown_and_exit and MiqServer.kill take it upon themselves to exit the current process when they finish dealing with a single server. When monitoring multiple servers we can't have that, so this lightly re-implements them in EvmServer so that they don't exit.
Prior to this change we would try to actually kill the worker's pid on the orchestrator. That's never going to do what we want and at worst could kill something we don't want to, so this change implements MiqWorker#kill_process as #destroy_container_objects when we're running as the orchestrator. This isn't really different from what #stop would do, but it's better than what we currently have.
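A hedged sketch of that behavior; `is_podified?` and `destroy_container_objects` appear in this PR, while the class body and the appliance branch are illustrative:

```ruby
class MiqWorker < ApplicationRecord
  def kill_process
    if MiqEnvironment::Command.is_podified?
      # The orchestrator has no local process to kill; remove the
      # worker's container objects instead (similar to #stop).
      destroy_container_objects
    else
      # Appliance: the worker really is a local process.
      Process.kill("KILL", pid)
    end
  end
end
```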
An important bit of this commit is that the particular instances that we are using in EvmServer are preserved by the refresh process. We can't just treat it as a cache and overwrite the local list. This is because a bunch of each server's state is stored in instance variables. If we were to overwrite the instance we would lose all of that information which would cause (unquestionably bad) undefined behavior.
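A sketch of a refresh that preserves the held instances (method names beyond `servers_from_db` are assumptions):

```ruby
def refresh_servers_to_monitor
  db_servers = servers_from_db.index_by(&:id)

  # Drop servers that no longer exist in the database...
  servers_to_monitor.delete_if { |server| !db_servers.key?(server.id) }

  # ...and add newly discovered ones, but never replace an instance we
  # already hold: its instance variables carry that server's state.
  known_ids = servers_to_monitor.map(&:id)
  db_servers.each_value do |db_server|
    servers_to_monitor << db_server unless known_ids.include?(db_server.id)
  end
end
```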
We create the initial server record between initializing the object and starting the server, so the first time you start on a fresh database the list would be empty.
Now that we're monitoring multiple servers, it's possible that a worker of the same type belonging to a different server is relying on that service. It's easier to leave the service alone than to figure out if it is safe to delete. It might even be best to define the service in the pods repo rather than creating it here. The httpd config already assumes the services will be created with hardcoded names ...
In containers, sometimes worker records are not being created even though the worker pod is running. This causes the orchestrator to thrash trying to create the worker even though it exists in OpenShift.
Before this change, a single server would init the settings every time it was monitored. This causes an unnecessary amount of extra work in the appliance model where we will always have only a single server. To avoid this, return from #impersonate_server if we already are the requested server, and only reset the caches if we changed servers.
…erver

Additionally clear the server cache when impersonating a new server. Previously this was only working because Vmdb::Settings.init was calling MiqServer.my_server(true), which busted the cache for us.
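Putting those two commits together, the guard might look something like this (a sketch; only `MiqServer.my_server(true)` and `Vmdb::Settings.init` are confirmed by the discussion, and `@current_server` follows the rename suggested in review):

```ruby
def impersonate_server(server)
  # Already impersonating this server: skip the expensive reset. On an
  # appliance there is only one server, so this returns every time.
  return if server == @current_server

  MiqServer.my_server(true) # bust the cached local server explicitly
  Vmdb::Settings.init       # rebuild ::Settings for the new server
  @current_server = server
end
```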
Force-pushed from bddb7fa to a6f642c
Checked commits carbonin/manageiq@257f12e~...a6f642c with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
lib/workers/evm_server.rb
Okay, so right now deleting servers doesn't make a ton of sense, but it seems to work in the tests that I've run. In the next PR I'm going to bypass the check that is preventing us from removing servers and remove all the servers in a zone when the zone is removed (all only for pods, of course). So then I'll be able to work out the kinks with the removal side here.
  end

  def self.start(*args)
    new.start
  end

  private

  def monitoring_server?(server)
😵 🔥 🗑 🚽
This PR allows `EvmServer` to loop through multiple servers as a part of the monitor loop in a single process.

The biggest issue solved here is that the "local" server is defined by a bunch of global state, specifically the guid class variable on `MiqServer`. When changing servers in this patch we set the guid on the class and reset the global settings variable. This allows all the normal monitoring methods to work as they did when the servers were separate processes.

When on an appliance, we will only monitor the actual local server.

This gets us very close to solving ManageIQ/manageiq-pods#353
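For illustration, the overall shape described above might be condensed as follows (a sketch; the real loop body does considerably more per server):

```ruby
def monitor_servers
  servers_to_monitor.each do |server|
    impersonate_server(server) # point guid/::Settings globals at this server
    server.monitor             # then run the usual per-server monitor pass
  end
end
```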