Monitor multiple servers when we're running in pods #19734
Conversation
lib/workers/evm_server.rb
Outdated
@@ -31,4 +33,149 @@ def set_process_title
  def self.start(*args)
    new.start
  end

  def start_server_environment(server)
Is it ok if `server` is nil here? It can be nil the first time, before `seed_primordial` is run.
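If it can indeed be nil on first boot, a guard like the following might be what's needed (a sketch only; the early return and its placement are assumptions, not the PR's actual fix):

```ruby
def start_server_environment(server)
  # On a fresh database there is no MiqServer record yet
  # (seed_primordial hasn't run), so bail out rather than
  # calling methods on nil.
  return if server.nil?

  # ...existing environment setup continues here...
end
```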
lib/workers/evm_server.rb
Outdated
    end
  end

  unless server.new_record?
💣 if nil, right?
lib/workers/evm_server.rb
Outdated
  Vmdb::Appliance.log_config_on_startup

  @server.ntp_reload
Where is `@server` initialized?
Weird, I couldn't find `impersonate_server` called anywhere before... Maybe I fat-fingered my search for it or only looked at one commit... thanks!
I'm having a problem with settings ... Specifically when I call ... I'm not sure how to deal with this. Right now, what happens is that we are doing a bit of extra work on each monitor loop and each time the worker heartbeats, so nothing should be really broken ...

Edit: Actually I think this will happen on the appliance too, which is not great. I could special case ...
lib/workers/evm_server.rb
Outdated
  end

  def servers_from_db
    MiqEnvironment::Command.is_podified? ? MiqServer.all.to_a : [MiqServer.my_server(true)]
Wow, I like how the podified/non-podified code is 99% the same and the only difference is the return from this method. 👍
lib/workers/evm_server.rb
Outdated
    save_local_network_info
    set_local_server_vm
    reset_server_runtime_info
    log_server_info
I gotta see how the above methods work with server impersonation. I wonder how we want to deal with server failover. I'm guessing we'll need really good detection for when the pod needs to be bounced.
Right, so right now we are checking the health of this pod by just ensuring that the MIQ Server process is running (ref).
If we're concerned that the actual monitor process might get stuck, we could add a heartbeat to a file, similar to the worker heartbeat, to ensure that it's still looping.
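A minimal sketch of that file heartbeat, assuming a hypothetical file path and helper name (the real worker heartbeat mechanism may differ):

```ruby
require "time"

HEARTBEAT_FILE = "/tmp/evm_server.hb".freeze # hypothetical location

# Touch the heartbeat file with the current time; a liveness probe can
# then fail the pod if the timestamp gets too stale.
def heartbeat_to_file
  File.write(HEARTBEAT_FILE, Time.now.utc.iso8601)
end

# Called once per iteration of the monitor loop, so a stuck loop
# stops refreshing the file and the probe eventually fails.
```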
Force-pushed from 90fef18 to f68e918
Force-pushed from f68e918 to 5f6e17b
lib/workers/evm_server.rb
Outdated
  end

  def impersonate_server(s)
    return if s == @server
I think this would be clearer named `@current_server` or something to that effect.
lib/workers/evm_server.rb
Outdated
  # such as MiqServer.my_server and MiqServer.my_guid, and also the
  # contents of the global ::Settings constant.
  ######################################################################
  def for_each_server
The method name `for_each_server` feels too generic to me. Perhaps `as_each_server`, or `each_impersonated_server`?
lib/workers/evm_server.rb
Outdated
    # It is important that we continue to use the same server instance here.
    # A lot of "global" state is stored in instance variables on the server.
    @server = s
    Vmdb::Settings.init
As discussed, I think you can use `Vmdb::Settings.reset_settings_constant`.
lib/workers/evm_server.rb
Outdated
    # Remove and shutdown a server if we're monitoring it and it is no longer in the database
    servers_to_monitor.delete_if do |monitor_server|
      servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
        monitor_server.shutdown if should_delete
Think we also need to impersonate here, as `MiqServer.quiesce_workers_loop` uses `@worker_monitor_settings`. Wondering if renaming ...
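A sketch of what impersonating before the shutdown might look like, layered onto the `delete_if` block from the diff above (the `impersonate_server` call here is the reviewer's suggestion, not the merged code):

```ruby
servers_to_monitor.delete_if do |monitor_server|
  servers_from_db.none? { |db_server| db_server.id == monitor_server.id }.tap do |should_delete|
    if should_delete
      # Switch globals (guid, ::Settings) to this server first so that
      # quiesce_workers_loop reads the right @worker_monitor_settings.
      impersonate_server(monitor_server)
      monitor_server.shutdown
    end
  end
end
```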
The settings issue should be mostly dealt with after #19758 is merged.
MiqServer#shutdown_and_exit and MiqServer.kill take it upon themselves to exit the current process when they finish dealing with a single server. When monitoring multiple servers we can't have that, so this lightly re-implements them in EvmServer so that they don't exit.
Prior to this change we would try to actually kill the worker's pid on the orchestrator. That's never going to do what we want and at worst could kill something we don't want to, so this change implements MiqWorker#kill_process as #destroy_container_objects when we're running as the orchestrator. This isn't really different from what #stop would do, but it's better than what we currently have.
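A hedged sketch of that behavior; `is_podified?` and `destroy_container_objects` appear in this PR, while the class body and the appliance branch are illustrative:

```ruby
class MiqWorker < ApplicationRecord
  def kill_process
    if MiqEnvironment::Command.is_podified?
      # The orchestrator has no local process to kill; remove the
      # worker's container objects instead (similar to #stop).
      destroy_container_objects
    else
      # Appliance: the worker really is a local process.
      Process.kill("KILL", pid)
    end
  end
end
```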
An important bit of this commit is that the particular instances that we are using in EvmServer are preserved by the refresh process. We can't just treat it as a cache and overwrite the local list. This is because a bunch of each server's state is stored in instance variables. If we were to overwrite the instance we would lose all of that information which would cause (unquestionably bad) undefined behavior.
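A sketch of a refresh that preserves the held instances (method names beyond `servers_from_db` are assumptions):

```ruby
def refresh_servers_to_monitor
  db_servers = servers_from_db.index_by(&:id)

  # Drop servers that no longer exist in the database...
  servers_to_monitor.delete_if { |server| !db_servers.key?(server.id) }

  # ...and add newly discovered ones, but never replace an instance we
  # already hold: its instance variables carry that server's state.
  known_ids = servers_to_monitor.map(&:id)
  db_servers.each_value do |db_server|
    servers_to_monitor << db_server unless known_ids.include?(db_server.id)
  end
end
```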
We create the initial server record between initializing the object and starting the server, so the first time you start on a fresh database the list would be empty.
Now that we're monitoring multiple servers, it's possible that a worker of the same type belonging to a different server is relying on that service. It's easier to leave the service alone than to figure out if it is safe to delete. It might even be best to define the service in the pods repo rather than creating it here. The httpd config already assumes the services will be created with hardcoded names ...
In containers, sometimes worker records are not being created even though the worker pod is running. This causes the orchestrator to thrash trying to create the worker even though it exists in OpenShift.
Before this change, a single server would init the settings every time it was monitored. This causes an unnecessary amount of extra work in the appliance model where we will always have only a single server. To avoid this, return from #impersonate_server if we already are the requested server, and only reset the caches if we changed servers.
…erver

Additionally clear the server cache when impersonating a new server. Previously this was only working because Vmdb::Settings.init was calling MiqServer.my_server(true), which busted the cache for us.
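Putting those two commits together, the guard might look something like this (a sketch; only `MiqServer.my_server(true)` and `Vmdb::Settings.init` are confirmed by the discussion, and `@current_server` follows the rename suggested in review):

```ruby
def impersonate_server(server)
  # Already impersonating this server: skip the expensive reset. On an
  # appliance there is only one server, so this returns every time.
  return if server == @current_server

  MiqServer.my_server(true) # bust the cached local server explicitly
  Vmdb::Settings.init       # rebuild ::Settings for the new server
  @current_server = server
end
```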
Force-pushed from bddb7fa to a6f642c
Checked commits carbonin/manageiq@257f12e~...a6f642c with ruby 2.5.5, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
lib/workers/evm_server.rb
Okay, so right now deleting servers doesn't make a ton of sense, but it seems to work in the tests that I've run. In the next PR I'm going to bypass the check that is preventing us from removing servers and remove all the servers in a zone when the zone is removed (all only for pods, of course). So then I'll be able to work out the kinks with the removal side here.
  end

  def self.start(*args)
    new.start
  end

  private

  def monitoring_server?(server)
😵 🔥 🗑 🚽
This PR allows `EvmServer` to loop through multiple servers as a part of the monitor loop in a single process.

The biggest issue solved here is that the "local" server is defined by a bunch of global state, specifically the guid class variable on `MiqServer`. When changing servers in this patch we set the guid on the class and reset the global settings variable. This allows all the normal monitoring methods to work as they did when the servers were separate processes.

When on an appliance, we will only monitor the actual local server.

This gets us very close to solving ManageIQ/manageiq-pods#353
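For illustration, the overall shape described above might be condensed as follows (a sketch; the real loop body does considerably more per server):

```ruby
def monitor_servers
  servers_to_monitor.each do |server|
    impersonate_server(server) # point guid/::Settings globals at this server
    server.monitor             # then run the usual per-server monitor pass
  end
end
```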