[WIP] Cap & U - ems centric cleanup #16099

Closed
wants to merge 5 commits into from

Member

@kbrock kbrock commented Oct 3, 2017

Relevant PRs:

Overview

Currently perf_capture_timer, perf_capture, and perf_process do not work well together. They create a large number of objects and a large number of queue entries. They also rely upon MiqQueue#put_or_update, which is problematic when moving to a real queue implementation.

The goal is to reorganize the code so that it maps more closely to how providers best communicate metrics and to where our bottlenecks are.

Before

  • Determine the objects to be collected in a single zone.
  • This code is generic.
  • A message is sent requesting that each object be collected.
  • Each object is captured, processed, and persisted.

Good: Collection can be done in parallel. (The bottleneck is persisting, not collecting)
Bad: This floods the queue and causes a number of issues for our users. It also causes multiple requests to collect the same objects.

determine objects (perf_capture_timer)
    -queue->
        collect + persist (perf_capture, perf_process)

After

  • Determine the objects to be collected for an ems.
  • This code is moved to metric_capture.rb and can be extended.
  • Capture metrics in the main process. Code moved to metric_capture.rb to allow for extensibility.
  • Send request to process a group of objects.
  • The group of objects is processed and persisted by perf_process.
  • A callback on those messages allows Hosts to be rolled up into an EmsCluster metrics record.

Good: able to extend the collection. Already seen big benefits for VMware.
Bad: collection is not able to run in parallel (as said before, it is not a bottleneck)

determine objects + collect (perf_capture_timer capture_metrics)
    -queue->
        process + persist (perf_process)
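
To make the queue-volume difference between the two flows concrete, here is a minimal plain-Ruby sketch. `Target`, `queue_per_target`, `queue_per_group`, and the group size of 4 are illustrative assumptions, not code or settings from this PR; the hashes only imitate the shape of a queue message.

```ruby
# Hypothetical stand-in for a capture target (Vm, Host, ...).
Target = Struct.new(:id)

# "Before": one queue message per target, each doing capture + persist.
def queue_per_target(targets)
  targets.map { |t| { :method_name => "perf_capture", :target_ids => [t.id] } }
end

# "After": metrics are captured in the calling process, and only the
# processing step is queued, one message per group of targets.
# group_size is an assumption; the PR does not state how groups are formed.
def queue_per_group(targets, group_size)
  targets.each_slice(group_size).map do |group|
    { :method_name => "perf_process", :target_ids => group.map(&:id) }
  end
end

targets    = (1..10).map { |i| Target.new(i) }
per_target = queue_per_target(targets)
per_group  = queue_per_group(targets, 4)

puts "per-target messages: #{per_target.size}"
puts "per-group messages:  #{per_group.size}"
```

With ten targets the per-target flow produces ten messages, while the grouped flow produces three, which is the kind of reduction the description is aiming for.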

Notes

storage.rb overrode all of the perf_capture work. Since it does not communicate with the provider/ems, I left it intact and just offloaded it to the processor.

@@ -1,4 +1,5 @@
module Metric::Capture
class Metric
class Capture
Member

Is this so you didn't have to indent the whole rest of the file?

Member Author

yea, I could do class Metric::Capture

Member Author

deleted

@@ -38,6 +39,12 @@ def self.alert_capture_threshold(target)
::Settings.performance.capture_threshold_with_alerts.default)
end

attr_accessor :ems, :interval_name
def initialize(ems = nil, interval_name = "realtime")
Member

Is a nil ems a valid default here? Looks like it is required further down (at least for ems.zone and maybe Metric::Targets.all_capture_targets(ems)...not sure what that does with a nil arg yet)

Member Author

this object is not valid without an ems
But I feel we should always be able to create objects without args for things like specs

Member Author

fixed
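
One way to reconcile both positions in this thread (objects can be built without arguments, but are not usable without an ems) is to validate at the point of use. This is a sketch only, not necessarily how the PR resolved it; `CaptureSketch`, `require_ems!`, and `FakeEms` are hypothetical names, and only the `initialize` signature mirrors the diff above.

```ruby
# Hypothetical class; only the initialize signature comes from the diff.
class CaptureSketch
  attr_accessor :ems, :interval_name

  def initialize(ems = nil, interval_name = "realtime")
    @ems = ems
    @interval_name = interval_name
  end

  # Fail early with a clear message instead of a NoMethodError on nil.
  def zone
    require_ems!.zone
  end

  private

  def require_ems!
    ems || raise(ArgumentError, "ems is required before capturing")
  end
end

# Hypothetical stand-in for an ExtManagementSystem record.
FakeEms = Struct.new(:zone)

puts CaptureSketch.new(FakeEms.new("default")).zone
```

Specs can still call `CaptureSketch.new` with no arguments, and any method that actually needs the ems raises an `ArgumentError` that names the missing piece.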

Member Author

kbrock commented Oct 3, 2017

@agrare wants this in MetricsCapture

@kbrock kbrock force-pushed the fetch_collector_collect branch 4 times, most recently from 981452f to 62431a8 Compare October 5, 2017 18:31
_log.info("Queueing performance capture...Complete")
def self.perf_collect_all_metrics(ems, interval_name = "realtime", start_time = nil, end_time = nil, options = nil)
ems = ExtManagementSystem.find(ems) unless ems.kind_of?(ExtManagementSystem)
klass = ems.class::MetricsCapture rescue ManageIQ::Providers::BaseManager::MetricsCapture
Member

I don't think we need to worry about this. If an EMS doesn't have a MetricsCapture, we will just blow up later in just_perf_capture when calling perf_collect_metrics.

Member Author

was thinking this gave us compatibility for all providers while we rolled out new code

Since this is going into an older branch, thought we'd go for a minimal change to each of those repos

Member Author

fixed
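
For reference, the inline `rescue` modifier in the diff rescues any `StandardError`, not just a missing constant. If a fallback like the one discussed here were kept, an explicit constant check is a narrower alternative. The sketch below is plain Ruby with hypothetical classes (`SketchProviders::*`, `metrics_capture_class_for`); it does not reproduce the real provider class hierarchy.

```ruby
# Hypothetical stand-ins for provider manager classes.
module SketchProviders
  class BaseMetricsCapture; end

  class ManagerWithCapture
    class MetricsCapture < BaseMetricsCapture; end
  end

  class ManagerWithoutCapture; end
end

# Look up a nested MetricsCapture constant, falling back to the base
# class only when the manager itself does not define one.
# Passing `false` limits the lookup to the class itself (no ancestors).
def metrics_capture_class_for(manager_class)
  if manager_class.const_defined?(:MetricsCapture, false)
    manager_class.const_get(:MetricsCapture, false)
  else
    SketchProviders::BaseMetricsCapture
  end
end

puts metrics_capture_class_for(SketchProviders::ManagerWithCapture)
puts metrics_capture_class_for(SketchProviders::ManagerWithoutCapture)
```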

@kbrock kbrock force-pushed the fetch_collector_collect branch 6 times, most recently from 52574b7 to 2f9e4af Compare October 6, 2017 21:14
start_time ||= ems.last_metrics_success_date
# For hourly on the first capture, we don't want to get all of the
# historical data, so we shorten the query
start_time ||= 4.hours.ago.utc if interval_name == 'hourly'
Member

We should probably move 4.hours.ago.utc as the "realtime cutoff" to a constant since we use it here too

Member Author

fixed
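
A sketch of what pulling the cutoff into a constant could look like. The names `CaptureWindow`, `REALTIME_CUTOFF_SECONDS`, and `default_start_time` are assumptions (the PR does not show what the constant was called), and plain seconds stand in for ActiveSupport's `4.hours` so the example runs without Rails.

```ruby
# Hypothetical module and constant names.
module CaptureWindow
  REALTIME_CUTOFF_SECONDS = 4 * 60 * 60 # plain-Ruby stand-in for 4.hours

  # Mirrors the `||=` chain in the diff: keep an explicit start_time, and
  # only apply the cutoff on the first hourly capture.
  def self.default_start_time(start_time, interval_name, now = Time.now.utc)
    return start_time if start_time

    interval_name == "hourly" ? now - REALTIME_CUTOFF_SECONDS : nil
  end
end

now = Time.utc(2017, 10, 9, 12, 0, 0)
puts CaptureWindow.default_start_time(nil, "hourly", now)
```

Passing `now` as a parameter keeps the method deterministic in specs.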

@agrare agrare requested a review from Fryguy October 9, 2017 15:05
Member

agrare commented Oct 9, 2017

So this isn't using each target's last_perf_capture_on to drive the start_time, and is instead using the ems's last_metrics_success_date, right?

Member

agrare commented Oct 9, 2017

@Fryguy can you take a look?

Member

miq-bot commented Oct 16, 2017

This pull request is not mergeable. Please rebase and repush.

@kbrock kbrock force-pushed the fetch_collector_collect branch from 2f9e4af to 13a83a4 Compare October 23, 2017 13:53
@kbrock kbrock force-pushed the fetch_collector_collect branch from 13a83a4 to 54c8809 Compare October 25, 2017 13:30
@kbrock kbrock changed the title [WIP] Cap & U - ems centric [WIP] Cap & U - ems centric cleanup Oct 27, 2017
@kbrock kbrock force-pushed the fetch_collector_collect branch from 54c8809 to 9babbae Compare October 27, 2017 15:37
# called by ui
# run a perf capture zone for a zone or ems
def self.perf_capture_gap_queue(start_time, end_time, zone = nil)
emses = if zone.kind_of?(ExtManagementSystem)
Contributor

Can we refactor this method's signature? A param named zone that can also be an ems is counterintuitive.

emses = if zone.kind_of?(ExtManagementSystem)
zone
else
zone = Zone.find(zone) if zone && !zone.kind_of?(Zone)
Contributor

Does this Zone.find take in anything and return the zone that the thing is in?

Member Author

it takes integers or strings - typically from the queue
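
As an illustration of the signature concern raised above, one option is to take the zone and the ems as separate keyword arguments rather than overloading a single positional `zone`. This is a hypothetical sketch: `emses_for_gap_capture`, `ZoneSketch`, and `EmsSketch` are not from the PR, the id lookup via `Zone.find` is left out, and returning an empty list when neither argument is given is a placeholder for whatever default the real method applies.

```ruby
# Hypothetical stand-ins for Zone and ExtManagementSystem records.
ZoneSketch = Struct.new(:name, :emses)
EmsSketch  = Struct.new(:name)

# Resolve the list of emses to capture from either a zone or a single ems.
def emses_for_gap_capture(zone: nil, ems: nil)
  raise ArgumentError, "pass either zone: or ems:, not both" if zone && ems
  return [ems] if ems
  return zone.emses if zone

  [] # placeholder default; the real behavior is not shown in this thread
end

ems  = EmsSketch.new("vc1")
zone = ZoneSketch.new("default", [ems])

puts emses_for_gap_capture(:zone => zone).map(&:name).inspect
puts emses_for_gap_capture(:ems => ems).map(&:name).inspect
```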

@kbrock kbrock force-pushed the fetch_collector_collect branch from 3eb8d47 to c128c36 Compare November 6, 2017 15:45
@@ -62,111 +53,12 @@
}
end

it "should queue up enabled targets" do
expect(MiqQueue.group(:class_name, :method_name).count).to eq(expected_queue_items)
Contributor

@jameswnl jameswnl Nov 7, 2017

The let block defining the now-deleted expected_queue_items should be removed as well.

@@ -34,15 +34,6 @@
end

context "executing capture_targets" do
it "should find enabled targets" do
targets = Metric::Targets.capture_targets
assert_infra_targets_enabled targets, %w(ManageIQ::Providers::Vmware::InfraManager::Vm ManageIQ::Providers::Vmware::InfraManager::Host ManageIQ::Providers::Vmware::InfraManager::Host ManageIQ::Providers::Vmware::InfraManager::Vm ManageIQ::Providers::Vmware::InfraManager::Host Storage)
Contributor

This assert_infra_targets_enabled seems obsolete as well.

Member Author

Thanks - I'll delete Metric::Targets.capture_targets as well

:class_name => "Metric::Capture",
:method_name => "perf_collect_all_metrics",
:args => [ems.id, "realtime", nil, nil, :exclude_storage => true],
:zone => ems.zone_name
Contributor

👍 so this should be able to catch the zone_name vs zone.id issue

Member Author

provided that the test was written correctly in the first place :( - thanks for the catch

end

# legacy messages on the queue
# went away in 4.6
Contributor

Is this "# went away in 4.6" still true?
We need this for 4.6 because 4.5 could have left behind queued items for this deprecated method.

Member Author

@kbrock kbrock Nov 8, 2017

Well, the callers have all gone away.
I put this note in there to let people know that it is just around for legacy messages.

Are you thinking the scheduler should be pared down and still generate this message?
So all the intelligence will be in this process instead of the Scheduler::Runner?

Alternatively, we could delete the contents of this method since there will only ever be one message in the queue and the contents will just get picked up a few minutes later when the scheduler submits it again.

perf_collect_all_metrics_queue(zone.ext_management_systems, "realtime")
end

def self.perf_collect_all_metrics_queue(ems, interval_name = "realtime", start_time = nil, end_time = nil, options = {})
Contributor

Question: why not rename the param ems to emses, so that we have no need for the Array.wrap?
(Is this Ruby style, to make things easier for the caller?)

Member Author

@kbrock kbrock Nov 8, 2017

I had not realized that everything coming in is an array.

My initial intent was for us to queue a single metrics collect.
Sad to see the Array.wrap go, but think you are right - making edit
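
For readers unfamiliar with the trade-off in this thread: wrapping lets a method accept either one object or a list, at the cost of a less explicit signature. The sketch below uses `Kernel#Array` as a plain-Ruby stand-in for ActiveSupport's `Array.wrap` (the two agree for nil, single objects, and arrays, but differ for hashes). `queue_names_wrapping` and `queue_names` are hypothetical helpers, not methods from the PR.

```ruby
# Accepts a single ems, a list, or nil, and normalizes to a list.
def queue_names_wrapping(ems)
  Array(ems).map { |e| "collect:#{e}" }
end

# Accepts only a list; the caller is responsible for passing one.
def queue_names(emses)
  emses.map { |e| "collect:#{e}" }
end

puts queue_names_wrapping("ems1").inspect
puts queue_names_wrapping(%w(ems1 ems2)).inspect
puts queue_names(%w(ems1 ems2)).inspect
```

If every caller already passes a collection, as noted above, the second form is simpler and makes the expected argument obvious from the name.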

it "triggers better capture" do
EvmSpecHelper.local_miq_server.zone
ems # default zone
ems2 # other zone
Contributor

and ems3 as well

@kbrock kbrock force-pushed the fetch_collector_collect branch from c128c36 to 26c10fe Compare November 8, 2017 23:48
@kbrock kbrock changed the title Cap & U - ems centric cleanup [WIP] Cap & U - ems centric cleanup Nov 14, 2017
@kbrock kbrock added the wip label Nov 14, 2017
@kbrock kbrock force-pushed the fetch_collector_collect branch 2 times, most recently from 872c419 to c58df51 Compare November 14, 2017 15:07
Member

miq-bot commented Nov 14, 2017

This pull request is not mergeable. Please rebase and repush.

@kbrock kbrock force-pushed the fetch_collector_collect branch from c58df51 to a65edb1 Compare November 22, 2017 16:58
@kbrock kbrock force-pushed the fetch_collector_collect branch 2 times, most recently from e09afa4 to 1cde354 Compare January 25, 2018 22:29
Member

miq-bot commented Feb 16, 2018

This pull request is not mergeable. Please rebase and repush.

@kbrock kbrock force-pushed the fetch_collector_collect branch from b1d6402 to fc76af2 Compare February 27, 2018 02:39
Member

miq-bot commented Feb 27, 2018

Some comments on commits kbrock/manageiq@45d685c~...fc76af2

spec/models/miq_vim_broker_worker_spec.rb

  • ⚠️ - 41 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.
  • ⚠️ - 7 - Detected allow_any_instance_of. This RSpec method is highly discouraged, please only use when absolutely necessary.

Member

miq-bot commented Feb 27, 2018

Checked commits kbrock/manageiq@45d685c~...fc76af2 with ruby 2.3.3, rubocop 0.52.0, haml-lint 0.20.0, and yamllint 1.10.0
14 files checked, 4 offenses detected

app/models/manageiq/providers/base_manager/metrics_capture.rb

app/models/vim_performance_state.rb

  • ❗ - Line 249, Col 7 - Style/SafeNavigation - Use safe navigation (&.) instead of checking if an object exists before calling the method.
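
For context on the Style/SafeNavigation offense, the two forms it contrasts look like the following. The struct and method names are hypothetical; the actual code at line 249 of vim_performance_state.rb is not shown in this conversation.

```ruby
# Hypothetical stand-in for an object that may be nil.
StateSketch = Struct.new(:capture_interval)

# The pattern the cop flags: check for existence, then call the method.
def interval_with_guard(state)
  state && state.capture_interval
end

# The suggested form, using the safe navigation operator (Ruby 2.3+).
def interval_with_safe_navigation(state)
  state&.capture_interval
end

puts interval_with_guard(StateSketch.new(20))
puts interval_with_safe_navigation(StateSketch.new(20))
puts interval_with_safe_navigation(nil).inspect
```

Both forms return nil for a nil receiver; they differ only when the receiver is `false`, where `&.` still calls the method.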

Member

miq-bot commented Mar 20, 2018

This pull request is not mergeable. Please rebase and repush.

Member

miq-bot commented Sep 24, 2018

This pull request has been automatically closed because it has not been updated for at least 6 months.

Feel free to reopen this pull request if these changes are still valid.

Thank you for all your contributions!
