
Monitor multiple servers when we're running in pods #19734

Merged 19 commits into ManageIQ:master from monitor_multiple_servers on Jan 27, 2020

Conversation

@carbonin (Member) commented Jan 17, 2020

This PR allows EvmServer to loop through multiple servers as part of the monitor loop in a single process.

The biggest issue solved here is that the "local" server is defined by a bunch of global state, specifically the guid class variable on MiqServer. When switching servers, this patch sets the guid on the class and resets the global settings variable. This allows all of the normal monitoring methods to work as they did when the servers were separate processes.

When on an appliance, we will only monitor the actual local server.

This gets us very close to solving ManageIQ/manageiq-pods#353
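
At a high level, the loop looks something like the sketch below. This is only a minimal illustration of the approach described above; helper names such as `refresh_servers_to_monitor` and `monitor_poll` are assumptions, not the merged implementation.

```ruby
# Illustrative only: one process impersonates each server before running the
# normal monitoring pass for it.
def monitor_loop
  loop do
    refresh_servers_to_monitor          # assumed helper: sync the list with the database

    servers_to_monitor.each do |server|
      impersonate_server(server)        # swap MiqServer's guid and reset ::Settings
      server.monitor                    # the usual single-server monitor pass
    end

    sleep monitor_poll                  # assumed poll interval
  end
end
```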

@@ -31,4 +33,149 @@ def set_process_title
def self.start(*args)
new.start
end

def start_server_environment(server)
Member

Is it ok if server is nil here? It can be the first time, before seed_primordial is run.

end
end

unless server.new_record?
Member

💣 if nil, right?


Vmdb::Appliance.log_config_on_startup

@server.ntp_reload
Member

Where is @server initialized?

Member

Weird, I couldn't find impersonate_server called anywhere before... Maybe I fat-fingered my search for it or only looked at one commit... thanks!

@carbonin (Member Author) commented Jan 21, 2020

I'm having a problem with settings...

Specifically, when I call Vmdb::Settings.init in impersonate_server it resets the last_loaded value, which causes all the workers to re-fetch the settings even though it's likely that no settings have actually changed.

I'm not sure how to deal with this. Right now we're just doing a bit of extra work on each monitor loop and each time a worker heartbeats, so nothing should really be broken...

Edit:
I'm not sure this needs to be fixed on this first pass; I'm mostly putting it here to see if there are any ideas.

Edit 2:
Actually, I think this will happen on the appliance too, which is not great. I could special-case impersonate_server when there is only one server and we're trying to impersonate the one we're already on...

lib/workers/evm_server.rb — outdated review thread, resolved
end

def servers_from_db
MiqEnvironment::Command.is_podified? ? MiqServer.all.to_a : [MiqServer.my_server(true)]
Member

Wow, I like how the podified/non-podified code is 99% the same and the only difference is the return from this method. 👍

save_local_network_info
set_local_server_vm
reset_server_runtime_info
log_server_info
Member

I gotta see how the above methods work with server impersonation. I wonder how we want to deal with server failover. I'm guessing we'll need really good detection for when the pod needs to be bounced.

Member Author

Right, so right now we're checking the health of this pod just by ensuring that the MIQ Server process is running (ref).

If we're concerned that the actual monitor process might get stuck, we could add a heartbeat to a file, similar to the worker heartbeat, to ensure that it's still looping.
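
For illustration, such a file heartbeat could be as small as the sketch below; the path, method names, and 120-second threshold are all made up here and are not part of this PR.

```ruby
require "time"

HEARTBEAT_FILE = "/tmp/evm_monitor.hb".freeze # hypothetical location

# Written once per monitor loop iteration to prove the loop is still running.
def heartbeat_to_file
  File.write(HEARTBEAT_FILE, Time.now.utc.iso8601)
end

# A liveness probe script could then check that the timestamp is fresh.
def monitor_alive?(threshold = 120)
  Time.now.utc - Time.parse(File.read(HEARTBEAT_FILE)) < threshold
end
```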

@carbonin force-pushed the monitor_multiple_servers branch from 90fef18 to f68e918 on January 22, 2020 20:32
@carbonin force-pushed the monitor_multiple_servers branch from f68e918 to 5f6e17b on January 22, 2020 21:35
@carbonin changed the title from "[WIP] Use EvmServer to monitor multiple server objects when running in containers" to "Monitor multiple servers when we're running in pods" on Jan 22, 2020
@miq-bot removed the wip label on Jan 22, 2020
lib/workers/evm_server.rb — outdated review thread, resolved
end

def impersonate_server(s)
return if s == @server
Member

I think this would be clearer named @current_server or something to that effect.

# such as MiqServer.my_server and MiqServer.my_guid, and also the
# contents of the global ::Settings constant.
######################################################################
def for_each_server
Member

The method name for_each_server feels too generic to me. Perhaps as_each_server or each_impersonated_server?

# It is important that we continue to use the same server instance here.
# A lot of "global" state is stored in instance variables on the server.
@server = s
Vmdb::Settings.init
@Fryguy (Member) commented Jan 22, 2020

As discussed, I think you can use Vmdb::Settings.reset_settings_constant

lib/workers/evm_server.rb — outdated review thread, resolved
# Remove and shutdown a server if we're monitoring it and it is no longer in the database
servers_to_monitor.delete_if do |monitor_server|
servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
monitor_server.shutdown if should_delete
Member Author

I think we also need to impersonate here, as MiqServer.quiesce_workers_loop uses @worker_monitor_settings.
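
That suggestion would reshape the block above roughly like this (illustrative only, not the final code):

```ruby
servers_to_monitor.delete_if do |monitor_server|
  servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
    if should_delete
      impersonate_server(monitor_server) # so quiesce_workers_loop sees this server's settings
      monitor_server.shutdown
    end
  end
end
```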

@chessbyte (Member)

Wondering if renaming EvmServer to MiqOrchestrator would make things easier to understand. The EVM part of the name is a throwback to ManageIQ, Inc. and its product name, Enterprise Virtualization Manager (EVM), and is probably pretty confusing in 2020.

@carbonin (Member Author)

The settings issue should be mostly dealt with after #19758 is merged.

MiqServer#shutdown_and_exit and MiqServer.kill take it upon
themselves to exit the current process when they finish dealing
with a single server. When monitoring multiple servers we can't
have that, so lightly re-implement them in EvmServer so that they
don't exit.

Prior to this change we would try to actually kill the worker's
pid on the orchestrator. That's never going to do what we want
and at worst could kill something we don't want to, so this change
implements MiqWorker#kill_process as #destroy_container_objects
when we're running as the orchestrator.

This isn't really different from what #stop would do, but it's better
than what we currently have.
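
A rough sketch of that shape is below; the podified check and #destroy_container_objects come from this discussion, while the non-podified branch is only an assumed stand-in for the existing local-process behavior.

```ruby
# Illustrative shape of MiqWorker#kill_process under this change.
def kill_process
  if MiqEnvironment::Command.is_podified?
    destroy_container_objects   # delete the worker's container objects instead of signalling a pid
  else
    Process.kill("KILL", pid)   # assumed stand-in for the existing local-process path
  end
end
```
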
An important bit of this commit is that the particular instances
that we are using in EvmServer are preserved by the refresh process.
We can't just treat it as a cache and overwrite the local list.

This is because a bunch of each server's state is stored in instance
variables. If we were to overwrite the instance, we would lose all
of that information, which would cause (unquestionably bad) undefined
behavior.

We create the initial server record between initializing the object
and starting the server, so the first time you start on a fresh database
the list would be empty.

Now that we're monitoring multiple servers, it's possible that
a worker of the same type belonging to a different server is
relying on that service.

It's easier to leave the service alone than to figure out if it is
safe to delete. It might even be best to define the service in the
pods repo rather than creating it here. The httpd config already
assumes the services will be created with hardcoded names ...

In containers, sometimes worker records are not being created
even though the worker pod is running.

This causes the orchestrator to thrash trying to create the worker
even though it exists in OpenShift.

Before this change, a single server would init the settings every
time it was monitored. This causes an unnecessary amount of extra
work in the appliance model where we will always have only a single
server.

To avoid this, return from #impersonate_server if we already are
the requested server, and only reset the caches if we changed servers.
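
In other words, something shaped like the following sketch, assuming a `@current_server` ivar tracks the impersonated server and that a class-level guid writer exists (neither is guaranteed to match the merged code exactly):

```ruby
def impersonate_server(s)
  return if s == @current_server # already impersonating this server; skip the reset

  MiqServer.my_guid = s.guid     # assumed writer for the class-level guid state
  MiqServer.my_server(true)      # bust the cached my_server lookup for the new guid
  Vmdb::Settings.init            # rebuild the global ::Settings for this server

  @current_server = s
end
```
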
@miq-bot added the wip label on Jan 24, 2020
…erver

Additionally clear the server cache when impersonating a new server.
Previously this was only working because Vmdb::Settings.init was calling
MiqServer.my_server(true) which busted the cache for us.
@carbonin force-pushed the monitor_multiple_servers branch from bddb7fa to a6f642c on January 24, 2020 19:58
@miq-bot (Member) commented Jan 24, 2020

Checked commits carbonin/manageiq@257f12e~...a6f642c with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
5 files checked, 5 offenses detected

lib/workers/evm_server.rb

@carbonin changed the title from "[WIP] Monitor multiple servers when we're running in pods" to "Monitor multiple servers when we're running in pods" on Jan 24, 2020
@carbonin removed the wip label on Jan 24, 2020
@carbonin (Member Author)

Okay, so right now deleting servers doesn't make a ton of sense, but it seems to work in the tests that I've run.

In the next PR I'm going to bypass the check that is preventing us from removing servers and remove all the servers in a zone when the zone is removed (all only for pods, of course). So then I'll be able to work out the kinks with the removal side here.

end

def self.start(*args)
new.start
end

private

def monitoring_server?(server)
Member

😵 🔥 🗑 🚽

@Fryguy merged commit 218f76f into ManageIQ:master on Jan 27, 2020
@Fryguy added this to the Sprint 129 Ending Feb 3, 2020 milestone on Jan 27, 2020
@carbonin deleted the monitor_multiple_servers branch on January 27, 2020 19:07