
Only run kube_monitor threads on master MiqServer #21532

Merged

Conversation


@agrare agrare commented Nov 1, 2021

On podified there are multiple miq_server instances but only one "master". All miq_servers run on the same Kubernetes cluster, so only one set of monitor_pods/monitor_deployments threads is needed to update the global current_pods/current_deployments hashes.

Follow-up to #21525

ensure_kube_monitors_started
cleanup_orphaned_worker_rows
sync_deployment_settings
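
As a rough sketch of the intent (the placement here is illustrative only; the actual guards are in the diff below), the cluster-wide calls end up gated on the master server:

# Illustrative sketch: start the cluster-wide monitor threads on a single server,
# since every miq_server shares the same Kubernetes cluster.
ensure_kube_monitors_started if my_server.is_master?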

def sync_workers
# Before syncing the workers check for any orphaned worker rows that don't have
# a current pod and delete them
cleanup_orphaned_worker_rows
Member Author

Wondering if this belongs in the cleanup_failed_workers section

Member

We may have to ensure that the master server goes first in the list... otherwise, if a non-master server goes first, it will try to compare against an empty "global" list.

Member

Either that, or we spawn off the Kubernetes watches as a completely separate thing beforehand, and then piggyback on the master server only to ensure the thread is still alive or not.

Member

It's unfortunate that podified does this thing where it creates servers for zones. Anything that hits the miq_workers table is scoped to the server of the zone it's currently processing, but anything that looks at current_pods is global and needs the check to make sure only the primary server does it.

It does make sense to put this with delete_failed_deployments in cleanup_failed_workers as long as the timing works, but the latter is global, so it has the primary? check, whereas this is per-server and doesn't. It's similar behavior at a different scope.

cleanup_orphaned_worker_rows

# Update worker deployments with updated settings such as cpu/memory limits
sync_deployment_settings
Member Author

TODO this seems like something that sync_workers should handle when looping through the worker classes

Member

Yes, although we'd be trading guard clauses:

# currently we loop through `podified_miq_workers` and skip duplicate deployment names we already processed:

next if checked_deployments.include?(worker.worker_deployment_name)

Instead, we'd have to skip worker classes that have no existing podified_miq_workers / deployment.
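
A hypothetical sketch of what that alternative would look like if sync_workers handled it while iterating the worker classes (worker_classes and sync_deployment_settings_for are illustrative names, not from this PR):

# Hypothetical: iterate the worker classes and skip any that have no podified
# worker rows (and therefore no deployment whose settings need syncing).
worker_classes.each do |worker_class|
  next if podified_miq_workers.none? { |worker| worker.is_a?(worker_class) }

  sync_deployment_settings_for(worker_class)
end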

Member

Either way, it's easier now; previously it was just messy to have sync_workers do podified? checks plus introspect the server's current_pods and current_deployments.

@@ -131,6 +144,8 @@ def ensure_kube_monitors_started
end

def delete_failed_deployments
return unless my_server.is_master?

failed_deployments.each do |failed|
Member Author

failed_deployments has "global" scope (i.e., not scoped to a specific miq_server), so this should also only be run on the master server.

@agrare agrare force-pushed the only_run_kube_monitor_on_master_miq_server branch from 70789e1 to 59abe1f on November 1, 2021 18:57
On podified there are multiple miq_server instances but only one
"master".  All miq_servers run on the same kubernetes cluster and thus
only have to run one set of monitor_pods/monitor_deployments threads to
update the global current_pods/current_deployments hashes.
@agrare agrare force-pushed the only_run_kube_monitor_on_master_miq_server branch from 59abe1f to 91a9230 on November 1, 2021 18:58

Fryguy commented Nov 1, 2021

Does it make sense to go all the way and extract the Kubernetes monitoring into its own objects? Like MiqServer::WorkerManagement::Kubernetes::Watcher?

ensure_kube_monitors_started
# All miq_server instances have to reside on the same Kubernetes cluster, so
# we only have to sync the list of pods and deployments once
ensure_kube_monitors_started if my_server.is_master?
Member Author

Hm, so just is_master? won't do it because that is only set after initial server startup. We might have to check whether my_zone is default (and make sure that server is first).

Member

I'm thinking maybe we encapsulate that my_zone == "default" logic into a method.
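
For example, something along these lines (a sketch only; the actual method body may differ):

# Sketch: encapsulate the "primary server" check; the "default" zone name
# comes from the discussion above.
def my_server_is_primary?
  my_server.zone&.name == "default"
end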

ensure_kube_monitors_started
# All miq_server instances have to reside on the same Kubernetes cluster, so
# we only have to sync the list of pods and deployments once
ensure_kube_monitors_started if my_server_is_primary?
Member

With this change I think we now need the "default" server to be first in the list.

Member

Actually, as I'm looking at all of this closer, this is all really weird to me. We should really be starting this thread outside of the as_each_server { monitor } loop and also ensure it's still alive outside of the loop. This whole monitoring kube thing has nothing to do with individual servers. The only thing we really need individual servers for is to know if that server's workers are now orphaned or not. Even then we don't need to do it server by server, because we can just make it check all the workers in_my_region. I'm tempted to say let's just get rid of this whole "looping over the servers" thing.

Member

Oh, good point. I don't think we need anything to be done on each individual server. The code outside of the as_each_server block could easily check the deployments, settings, and pods/rows for the whole region.
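
Roughly, the restructuring being discussed could look like this (a sketch only, not part of this PR; the enclosing method name is illustrative):

# Sketch: do the cluster-wide work once per region, outside the per-server loop.
def monitor_workers
  ensure_kube_monitors_started   # one set of watch threads for the whole cluster
  sync_deployment_settings
  cleanup_orphaned_worker_rows   # could compare current_pods against MiqWorker.in_my_region

  as_each_server { sync_workers } # per-server concerns only
end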

Member Author

Yeah, we're definitely overloading the miq_server concept here; it doesn't help that we're creating multiple miq_servers for a single "platform" instance (previously 1 miq_server <=> 1 VM).

I think we need to decide what exactly an miq_server is: does it track 1:1 with a runtime platform (VM, k8s namespace), or is it just a logical grouping for zones and other resources?

Member Author

Okay, short term I've unshifted the "real" miq_server to the front of the line. Long term we definitely need to either extract this out of miq_server or not create "fake" miq_servers for zones.

Not sure if there is enough here to be considered short-term; maybe we just jump straight into reworking this.
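
For illustration, the "unshift the real server to the front" ordering could look something like this (hypothetical; not necessarily the exact code in the commit):

# Hypothetical: put the "real" (primary) miq_server first so its kube monitor
# threads populate the global caches before the per-zone servers are processed.
servers = miq_servers.to_a
primary = servers.detect { |server| server.zone&.name == "default" }
servers.unshift(servers.delete(primary)) if primary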

Member

Yeah, I have a solid idea in my head on that - I'll rip out a quick WIP PR once this is merged to discuss.


miq-bot commented Nov 2, 2021

Checked commits agrare/manageiq@91a9230~...3773816 with ruby 2.6.3, rubocop 1.13.0, haml-lint 0.35.0, and yamllint
3 files checked, 0 offenses detected
Everything looks fine. ⭐

@Fryguy Fryguy left a comment


LGTM - want @jrafanie to also approve before merge.


jrafanie commented Nov 2, 2021

LGTM

@jrafanie jrafanie merged commit d684403 into ManageIQ:master Nov 2, 2021
@agrare agrare deleted the only_run_kube_monitor_on_master_miq_server branch November 2, 2021 18:17
@agrare agrare added the bug label Nov 2, 2021