[V2V] Modify active_tasks so that it always reloads #18860
Conversation
If you use .count instead of .size then it won't execute the query; we'll instead do a COUNT(*) every time, and thus there's no need for reload.
@Fryguy Ah, true. Well, unless you use a ...
Not sure what's happening with 2.5.x, but the failures appear to be unrelated.
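The caching behavior under discussion can be sketched with a small stand-in class (a hypothetical illustration, not ManageIQ or ActiveRecord code; `FakeRelation` and its store are invented names): once a relation-like object has loaded its records, .size reuses them, while .count goes back to the source every time.

```ruby
# Hypothetical stand-in for an ActiveRecord relation: once loaded, the
# records are cached, so .size can go stale while .count stays fresh.
class FakeRelation
  def initialize(store)
    @store = store            # stands in for the database table
  end

  def records
    @records ||= @store.dup   # first access "runs the query" and caches
  end

  def size
    records.size              # counts the cached array
  end

  def count
    @store.size               # re-queries the "table" every time
  end
end

store = [:task1, :task2]
rel = FakeRelation.new(store)
rel.size      # => 2 (loads and caches the records)
store << :task3               # another worker starts a task
rel.size      # => 2 (stale: still the cached copy)
rel.count     # => 3 (fresh, like SELECT COUNT(*))
```

This mirrors why swapping size for count removes the need for an explicit reload in the throttler.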
@@ -1,10 +1,10 @@
 class InfraConversionThrottler
   def self.start_conversions
     pending_conversion_jobs.each do |ems, jobs|
-      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.size }
+      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.count }
This feels like a terrible N+1 (query in a loop is bad)
cc @kbrock
There will be fewer than 20 conversion hosts.
I did come up with a single query for this, but since it is not in the primary loop, that can hold off for another day. This is only called once per EMS and is a lower concern.
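The single query hinted at here could group the counts in one round trip. Below is a hedged plain-Ruby mimic of a GROUP BY count (the task hashes are invented sample data; the real version would presumably be something along the lines of ActiveRecord's group(:conversion_host_id).count over the tasks table):

```ruby
# Stand-in rows for active migration tasks, keyed by conversion host.
tasks = [
  { conversion_host_id: 1 },
  { conversion_host_id: 1 },
  { conversion_host_id: 2 },
]

# Mimics: SELECT conversion_host_id, COUNT(*) ... GROUP BY conversion_host_id
counts = tasks.group_by { |t| t[:conversion_host_id] }
              .transform_values(&:size)
counts          # => {1 => 2, 2 => 1}

# One sum over the grouped counts replaces the per-host COUNT queries.
running = counts.values.sum   # => 3
```

One grouped query like this replaces N separate COUNTs, which is the N+1 being discussed.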
The newer changes introduce N+1s (which may have been there previously if the relations weren't cached). cc @kbrock
I don't see a way around it; we have to have up-to-date information every time. Plus, it looks to me like the database caches that explain plan (the plan, not the result), so subsequent runs are zippier than the first. Unless @kbrock has a suggestion, I'm afraid I don't know how to avoid it without losing accuracy.
Use count instead of reload.
Checked commit https://github.com/djberg96/manageiq/commit/9efc1a621415c60c3a2864bc1b7d740eabb2857d with ruby 2.3.3, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0 lib/infra_conversion_throttler.rb
I'm not fixing those current cops, they're dumb.
ok, was chatting with @djberg96. Defining a virtual total gets us 95% of the way there (wasn't able to ...):

class ConversionHost
  virtual_total :total_tasks, :active_tasks
end

class ExtManagementSystem
  def total_conversion_host_tasks
    vm_conversion_hosts.sum(:total_tasks) + host_conversion_hosts.sum(:total_tasks)
  end
end
Also of note:
While I'm not sure that cached counts are really our culprit, I can respect us wanting to get the current task count.

Just s/size/count will force a 2N+1 at the very least. It will slow down this method too much.

The time between downloading all these records (or counts) and acting upon them is a kind of race condition that will cause the imbalances that you are seeing. So getting this as fast as possible will shorten this window and give us better results.

Let's start off defining a virtual_total, to at least get the counts into the individual {vm,host}_conversion_host collections. If nothing else, we'll be able to use select(:total_tasks) to prefetch these total values in one fell swoop.

Then let's see if we can figure out how to get eligible into the db. That will allow us to pull back only 2 conversion_host records.

Running counts in the db may be quicker, but it will also be atomic, giving us a much better chance of picking the best host quickly.
 def check_concurrent_tasks
   max_tasks = max_concurrent_tasks || Settings.transformation.limits.max_concurrent_tasks_per_host
-  active_tasks.size < max_tasks
+  active_tasks.count < max_tasks
Please introduce total_tasks and reference that here. That way we can preload this value in this query and not cause an N+1.
I understand that you don't want to have a cached value from more than 10 seconds ago, but caching it within a single query/second seems prudent and non-wasteful.
Also of note: this count is checked within a loop that also runs counts, so a separate count here doesn't make sense.
(Also, using size and prefetching all active_tasks isn't much better.)
consensus: this is bad and should be changed
BUT
not today
==> Keep it with count
@@ -1,15 +1,24 @@
 class InfraConversionThrottler
   def self.start_conversions
     pending_conversion_jobs.each do |ems, jobs|
-      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.size }
+      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.count }
Looks like it may be tricky treating all conversion_hosts the same. I'll try and think of a way, but adding together vm_conversion_hosts values and host_conversion_hosts values may be the next best thing.
Encapsulating this and putting it into the ems may at least make this method look good.
       jobs.each do |job|
-        eligible_hosts = ems.conversion_hosts.select(&:eligible?).sort_by { |ch| ch.active_tasks.size }
+        eligible_hosts = ems.conversion_hosts.select(&:eligible?).sort_by { |ch| ch.active_tasks.count }
Not a big fan of bringing back every conversion host to then select out the eligible ones, and then hit the database for each of those.
Would like to find a way to get eligible into the query, and then possibly do the sum in the database too.
         if eligible_hosts.size > 0
           $log&.debug("The following conversion hosts are currently eligible: " + eligible_hosts.map(&:name).join(', '))
         end

         break if slots <= 0 || eligible_hosts.empty?
         job.migration_task.update_attributes!(:conversion_host => eligible_hosts.first)
I know this is not you but...
Do we really need to bring back every eligible host?
If we could get it into the query, could we just bring back the top host and vm and pick one of those?
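The "top host and vm" idea might look like this in miniature (a hedged sketch with invented sample data; the real version would push the ordering and limit into SQL rather than load every host):

```ruby
# Least-loaded candidate from each collection (invented sample data,
# standing in for one row pulled back per conversion-host collection).
top_vm_ch   = { name: 'vm_ch_1',   active_tasks: 2 }
top_host_ch = { name: 'host_ch_1', active_tasks: 1 }

# Compare just the two candidates instead of sorting every host in Ruby.
best = [top_vm_ch, top_host_ch].min_by { |ch| ch[:active_tasks] }
best[:name]   # => "host_ch_1"
```

Only two rows ever leave the database in this scheme, which is the point of the comment above.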
punting on this idea - looks like it may make sense to just cache the conversion hosts
         if eligible_hosts.size > 0
           $log&.debug("The following conversion hosts are currently eligible: " + eligible_hosts.map(&:name).join(', '))
         end

         break if slots <= 0 || eligible_hosts.empty?
Again, not you, but: if there are no slots, do we even need to do this jobs.each? What will that buy us?
We're going to throw all this work away anyway.
@kbrock thanks for all these comments. They show how bad our original code is :)
@miq-bot add-label transformation, bug, hammer/yes
@@ -1,15 +1,24 @@
 class InfraConversionThrottler
   def self.start_conversions
     pending_conversion_jobs.each do |ems, jobs|
-      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.size }
+      running = ems.conversion_hosts.inject(0) { |sum, ch| sum + ch.active_tasks.count }
+      $log&.debug("There are currently #{running} conversion hosts running.")
Why $log&.debug? $log shouldn't be nil. Also, you probably only want to log to debug if the logger is in debug mode, so something like this may be more appropriate:

$log.debug { "There are currently #{running} conversion hosts running." }
yes, please remove this from the PR
@kbrock I appreciate your insights here and there's no doubt this could use some refactoring. The problem is that I don't really understand the solutions you're proposing, e.g. I have no idea what ... . I wouldn't feel comfortable submitting a PR where I don't really understand the code, so I would like for you to submit a separate PR to refactor this at a later time. Given that we are pressed for time, I would ask that this be given a pass for now.
FUTURE: move the eligible (minus the max check) out of the jobs loop and up into the ems loop
NOW: ship it
       slots = (ems.miq_custom_get('Max Transformation Runners') || Settings.transformation.limits.max_concurrent_tasks_per_ems).to_i - running
       $log&.debug("The maximum number of concurrent tasks for the EMS is: #{slots}.")
can we remove this from the PR as well?
@kbrock I'll leave this one up to you to review and merge.
@kbrock, as @fdupont-redhat said above, I'd be OK with merging a less-than-optimal change if it fixes the code and we can do a followup to remove the N+1s or other optimizations later, but I'll defer to your judgement on this one.
[V2V] Modify active_tasks so that it always reloads (cherry picked from commit 660387c)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1721117
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1721118
Hammer backport details:
At the moment the relation we created for active_tasks is getting cached, which is causing failures for v2v concurrency. By calling count instead of size it effectively forces a reload on active tasks so that we always get an up-to-date value when checking the number of concurrent tasks.

BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1716283
BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1698761

Thanks to @Fryguy for the suggestion. :)