[Rearch] Use memcached for sending messages to workers #15471

gtanzillo · 2017-06-28T22:35:33Z

Use memcached and a timestamp as a mechanism for telling workers to synchronize their config and role without going through a Drb connection.

When the worker monitor detects that they have changes it stores the timestamp in memcached in the key that corresponds to the thing that changes. The workers check memcachd each time they heartbeat and if the timestamp is greater than the timestamp of the last sync it will fabricate the message that corresponds to the thing that changed and process it.

miq-bot · 2017-07-20T18:37:16Z

This pull request is not mergeable. Please rebase and repush.

gtanzillo · 2017-08-14T20:00:51Z

@jrafanie @Fryguy This is ready to go

…hange

NickLaMuro

So, I realize you didn't ask for my review, but I am giving it anyway! 😤

But in all seriousness, this seems to coincide with some of the research I am doing around SIGTERM with workers (or rather, how MiqServer currently tells workers to exit), and specifically that the same message interface is used for killing workers currently as it is for syncing configs:

https://github.com/ManageIQ/manageiq/blob/8ca456cdc/app/models/miq_server/worker_management/monitor/stop.rb#L44

I don't think it is necessary to do anything for in this PR, especially since the workers will be killed through OpenShift/kubernetes I believe (not sure what we have in place for that so far). But if any messages might be needed to be passed around, this would probably be how we would do it (that said, some of my review comments suggest getting rid of a shared mechanism for this, so those are dependent on if message passing is really needed).

Anyway, just some things I noticed while taking a look at what you have done here.

NickLaMuro · 2017-08-14T23:24:01Z

app/models/miq_server/worker_management/monitor.rb

@@ -147,8 +147,22 @@ def sync_needed?
      start_apache               if roles_changed &&  apache_needed?

      reset_queue_messages       if config_changed || roles_changed
+
+      update_sync_timestamp(@last_sync)


Should this be gated with if ENV["WORKER_HEARTBEAT_METHOD"], otherwise these messages won't get processed, right?

I guess this isn't really a "message to be processed", and just a single key/val... so maybe not so relevant.

I figured it wouldn't hurt to always set the timestamp on memcached even if the worker is still getting its messages from Drb.

NickLaMuro · 2017-08-14T23:29:27Z

app/models/miq_worker/runner.rb

+
+  def key_store
+    @key_store ||= Dalli::Client.new(MiqMemcached.server_address, :namespace => "server_monitor")
+  end


Since this is used in both the server and the worker models, should this be tossed into a module/mixin?

One thing I would say is we probably want to split this into three modules (don't hate me... 😅 ):

MiqWorkerMonitorMemcache::Connection

MiqWorkerMonitorMemcache::Reader

MiqWorkerMonitorMemcache::Writer

Names are a total WIP, but the idea being that they would share the connection stuff, but probably only the worker_monitor would be writing to Memcache, while the workers would just be reading (otherwise there is mutex stuff you would have to deal with). You might need to for other messages though, but that depends on how we implement some things.

Good point. Is it premature? My hope was that we'd know better about how this would look when we figure out how and if we'd use memcached/redis/etc. to tell a worker (who's caught a SIGINT/SIGTERM and continued finishing the queue work it was doing) to exit. Either way, @NickLaMuro's research into how we let workers gracefully finish their work might drive the interface for this.

@NickLaMuro's research into how we let workers gracefully finish their work might drive the interface for this.

Most of that work should be handled here: #15818

And that said, I don't know that there are any messages we would have to worry about doing this for after the above, so I would agree that my suggestion is probably premature knowing that.

NickLaMuro · 2017-08-14T23:49:46Z

app/models/miq_worker/runner.rb

+  def get_messages
+    messages = []
+    now = Time.now.utc
+    @my_last_config_change ||= now


Can we simplify these to lines to this?

@my_last_config_change ||= Time.now.utc

That way, we are only calling Time.now.utc if @my_last_config_change isn't set, and it doesn't look like the now variable is being used elsewhere.

NickLaMuro · 2017-08-14T23:53:49Z

app/models/miq_worker/runner.rb

+      _log.info("#{log_prefix} Configuration has changed, New TS: #{last_config_change}, Old TS: #{@my_last_config_change}")
+      messages << ["sync_config"]
+
+      @my_last_config_change = last_config_change


Should we change this to this maybe?

@my_last_config_change = Time.now.utc

Reason being, is you do a > check in the if, and if you just set it to the same time, then it will just continue calling sync_config every time.

Good call, I'll make that change, thanks!

Actually, I think this is ok the way it is. When @my_last_config_change and last_config_change are the same it won't call sync_config.

@gtanzillo You are right, I had read the if statement like:

@my_last_config_change < last_config_change

Instead of what you have. Yours works correctly, so lets just stick with that.

NickLaMuro · 2017-08-14T23:55:49Z

app/models/miq_worker/runner.rb

@@ -374,6 +375,32 @@ def heartbeat_to_drb
  def heartbeat_to_file(timeout = nil)
    timeout ||= worker_settings[:heartbeat_timeout] || Workers::MiqDefaults.heartbeat_timeout
    File.write(@worker.heartbeat_file, (Time.now.utc + timeout).to_s)
+
+    get_messages.each { |msg, *args| process_message(msg, *args) }


Do we see a need going through all of the extra work of looping through an array of messages if there will only ever be one type we will be handling via memcache? We could simplify get_messages to something like this:

def sync_config_if_needed @my_last_config_change ||= Time.now.utc last_config_change = server_last_change(:last_config_change) if last_config_change && last_config_change > @my_last_config_change _log.info("#{log_prefix} Configuration has changed, New TS: #{last_config_change}, Old TS: #{@my_last_config_change}") sync_config @my_last_config_change = Time.now.utc end end

And just change this line to sync_config_if_needed? No new array needed, and we will just exit early if the if in the above isn't satisfied.

For this PR, I was just looking to get to functional equivalence with messaging through Drb. We can start evolving this in future PRs as we start removing Drb from the workers.

…m key store with last change timestamp

miq-bot · 2017-08-16T13:36:50Z

Some comments on commits gtanzillo/manageiq@3be2350~...829ce96

spec/models/miq_worker/runner_spec.rb

⚠️ - 44 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.

miq-bot · 2017-08-16T13:37:48Z

Checked commits gtanzillo/manageiq@3be2350~...829ce96 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
4 files checked, 1 offense detected

app/models/miq_worker/runner.rb

❗ - Line 382, Col 7 - Style/AccessorMethodName - Do not prefix reader method names with get_.

jrafanie

Looks good to me. @NickLaMuro what do you think?

jrafanie · 2017-08-17T13:46:42Z

YOLO MERGING. I believe @NickLaMuro told me he was good with the changes yesterday... If not, we can fix in a followup PR.

NickLaMuro · 2017-08-17T15:47:12Z

@jrafanie I am mostly good with the changes, but I did want to have @gtanzillo address the test using allow_any_instance_of and RuboCop error (which I figured could have been changed to fetch_messages instead).

NickLaMuro · 2017-08-17T15:52:01Z

spec/models/miq_worker/runner_spec.rb

+  context "#get_messages" do
+    before do
+      allow_any_instance_of(MiqWorker::Runner).to receive(:worker_initialization)
+      @worker_base = MiqWorker::Runner.new


Possibly you could have added this to the top of the spec:

module MiqWorkerForSpecs module Runner < MiqWorker::Runner def initialize(cfg ={}) end end end

Then these two lines could have changed to this single line:

@worker_base = MiqWorkerForSpecs::Runner.new

NickLaMuro · 2017-08-17T15:52:29Z

app/models/miq_worker/runner.rb

+    get_messages.each { |msg, *args| process_message(msg, *args) }
+  end
+
+  def get_messages


Change this to fetch_messages maybe to avoid the rubocop warning.

NickLaMuro · 2017-08-17T15:52:59Z

Left comments for future fixup in a later PR.

gtanzillo added enhancement wip labels Jun 28, 2017

chessbyte assigned Fryguy Jul 5, 2017

gtanzillo mentioned this pull request Jul 18, 2017

[Rearch] Combine worker messages 'sync_config' and sync_active_role' into a single 'sync_config' message. #15597

Merged

miq-bot added the unmergeable label Jul 20, 2017

gtanzillo force-pushed the rearch-messages-key-store branch from c266896 to 4ad30dc Compare August 2, 2017 21:56

miq-bot removed the unmergeable label Aug 2, 2017

gtanzillo added core/workers and removed wip labels Aug 14, 2017

gtanzillo changed the title ~~[WIP] [Rearch] Use memcached for sending messages to workers~~ [Rearch] Use memcached for sending messages to workers Aug 14, 2017

gtanzillo force-pushed the rearch-messages-key-store branch from 4ad30dc to 1053c9b Compare August 14, 2017 20:33

Using memcached as a key store, save timestamp when config or roles c…

3be2350

…hange

gtanzillo force-pushed the rearch-messages-key-store branch from 1053c9b to 6a536af Compare August 14, 2017 20:46

NickLaMuro reviewed Aug 15, 2017

View reviewed changes

Process server monitor sync_config message by comparing timestamp fro…

829ce96

…m key store with last change timestamp

gtanzillo force-pushed the rearch-messages-key-store branch from 6a536af to 829ce96 Compare August 16, 2017 13:34

jrafanie approved these changes Aug 16, 2017

View reviewed changes

jrafanie merged commit 02cfa8f into ManageIQ:master Aug 17, 2017

jrafanie added this to the Sprint 67 Ending Aug 21, 2017 milestone Aug 17, 2017

NickLaMuro reviewed Aug 17, 2017

View reviewed changes

jrafanie mentioned this pull request Apr 9, 2018

Require dalli before using it #17269

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Rearch] Use memcached for sending messages to workers #15471

[Rearch] Use memcached for sending messages to workers #15471

gtanzillo commented Jun 28, 2017 •

edited

Loading

miq-bot commented Jul 20, 2017

gtanzillo commented Aug 14, 2017

NickLaMuro left a comment

NickLaMuro Aug 14, 2017

NickLaMuro Aug 15, 2017

gtanzillo Aug 15, 2017

NickLaMuro Aug 14, 2017

jrafanie Aug 15, 2017

NickLaMuro Aug 16, 2017 •

edited

Loading

NickLaMuro Aug 14, 2017

gtanzillo Aug 15, 2017

NickLaMuro Aug 14, 2017

gtanzillo Aug 15, 2017

gtanzillo Aug 16, 2017

NickLaMuro Aug 16, 2017

NickLaMuro Aug 14, 2017

gtanzillo Aug 15, 2017

miq-bot commented Aug 16, 2017

miq-bot commented Aug 16, 2017

jrafanie left a comment

jrafanie commented Aug 17, 2017

NickLaMuro commented Aug 17, 2017

NickLaMuro Aug 17, 2017

NickLaMuro Aug 17, 2017

NickLaMuro commented Aug 17, 2017

[Rearch] Use memcached for sending messages to workers #15471

[Rearch] Use memcached for sending messages to workers #15471

Conversation

gtanzillo commented Jun 28, 2017 • edited Loading

miq-bot commented Jul 20, 2017

gtanzillo commented Aug 14, 2017

NickLaMuro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NickLaMuro Aug 16, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

miq-bot commented Aug 16, 2017

miq-bot commented Aug 16, 2017

jrafanie left a comment

Choose a reason for hiding this comment

jrafanie commented Aug 17, 2017

NickLaMuro commented Aug 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NickLaMuro commented Aug 17, 2017

gtanzillo commented Jun 28, 2017 •

edited

Loading

NickLaMuro Aug 16, 2017 •

edited

Loading