Improve metrics saving #15976
Conversation
Force-pushed from 89d5384 to 54ddd52
@agrare so this is just a super WIP, but the numbers look good. The example is saving 25920 AWS samples (I am changing the generic saving code, so it will apply to any metrics), which is 6 days of data. @kbrock @Fryguy ^ check the :process_perfs_db, the saving is 20x-30x faster just by doing simple batching (batch INSERT and batch UPDATE), with more room to optimize (batch upsert, rules instead of triggers, etc...)
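To illustrate the batching idea (a hypothetical sketch, not the PR's actual code): instead of issuing one INSERT per metric row, the rows can be collected and emitted as a single multi-row INSERT, cutting DB round trips from N to 1:

```ruby
# Hypothetical sketch of batched INSERT (not this PR's actual code).
# One multi-row statement replaces N single-row statements, so a remote
# DB pays one round trip instead of N.
def batch_insert_sql(table, columns, rows)
  values = rows.map do |row|
    "(" + columns.map { |c| row[c].inspect }.join(", ") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

rows = [
  { :timestamp => "2017-10-01T00:00:00Z", :cpu_usage_rate_average => 12.5 },
  { :timestamp => "2017-10-01T00:00:20Z", :cpu_usage_rate_average => 13.1 }
]
sql = batch_insert_sql("metrics", [:timestamp, :cpu_usage_rate_average], rows)
# Real code would use proper SQL quoting/bind parameters; `inspect` is
# only used here to keep the sketch dependency-free.
```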
Hm, I hadn't thought of using an InventoryCollection for metrics...
Force-pushed from 54ddd52 to ec220d8
@agrare so I am thinking that I should add a postgre_batch adapter, so we can easily switch back to the old version?
@Ladas is that something we could set with an options hash? I don't know how much logic is in the adapter, but duplicating a whole adapter for one option sounds like a lot.
@@ -35,6 +35,32 @@ def write(metric)
  def write_multiple(*_metrics)
    raise NotImplementedError, "must implemented by the adapter"
  end

  def transform_parameters(resource, interval_name, _start_time, _end_time, rt_rows)
@Ladas can this stay in perf_process? It seems weird to call into ActiveMetrics then call back out to Metric::Helper.process_derived_columns
right, I am extracting it here since the PG adapter will have specific code. Right now we are transforming the data something like 4 times from the original data, ending up back in the original data format. :-)
perf_process(interval_name, start_range, end_range, counters, counter_values)

# Convert to format allowing to send multiple resources at once
counters_data = {
Agreed, this will be perfect to queue perf_process for the metrics_processor_worker.
cool 👍
For 5.0, I would like to move to the generic Persistor format, so we can reuse the inventory Persistors (and using grouping, we should be able to make the generic worker a dedicated worker?)
We are moving over to using the queue before 5.0, so it would be nice to move to the generic Persistor format within the next week or so here.
I think this format is fine since we're going to be invoking perf_process directly in the metrics_processor_worker, not some generic persister.
@agrare @Fryguy @kbrock these are the current time/memory benefits of the optimized code. In a follow-up PR, I am going to bring back the old code as a postgre_legacy adapter, and I'll try to create a spec that verifies the two postgre adapters produce exactly the same numbers (plus I'll try to expand the specs a bit; we do not test this as much as I would like). Benchmark scenario:
The most important metric is "create all metrics", as 95% of metrics are just created, not updated. But we can get into an "update all metrics" state if we need to rerun collection for some reason. The scenario is done on 6 days of data for 1 VM, but it's the same as 1 day of data for 6 VMs, etc. It's also worth noting that the old code does ~75k extra queries (1 per metric, plus begin/end transaction for each) compared to the new code. So on a remote DB with ~1ms latency, that is +75s. The total time will easily be 20x faster for the new code. :-)
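The round-trip arithmetic in that comment works out as a quick sanity check:

```ruby
# Sanity check of the overhead estimate above: ~75k extra queries at an
# assumed ~1ms remote-DB round trip each adds roughly 75 seconds of
# wall time, independent of any per-query work.
extra_queries = 75_000   # ~1 INSERT + BEGIN/COMMIT per metric
round_trip_ms = 1        # assumed remote DB latency
overhead_s = (extra_queries * round_trip_ms) / 1000.0
# roughly 75 extra seconds spent purely on round trips
```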
@miq-bot remove_label wip
@@ -175,6 +175,7 @@ def perf_capture(interval_name, start_time = nil, end_time = nil)
  start_range = end_range = counters = counter_values = nil
  _, t = Benchmark.realtime_block(:total_time) do
    # TODO why we call capture here? We call the same in processing.
You mean via vim_performance_state_for_ts? It looks like that's only in the event that we are missing the state:

state = vim_performance_states.find_by(:timestamp => ts)
...
state ||= perf_capture_state

So this is the primary place it should be called from. Maybe we can get rid of the "backup" in perf_process?
yeah, possibly. My tests are a bit weird with this: since I run capture 6 days back, most of the vim_performance_states are missing. So then it does a lot of duplicate queries trying to fetch them, ending up assigning the last vim_performance_state to every ts.
I'm not thrilled with the ts lookup caching. There may be a way to explicitly look them up and populate them. I suspect this code is causing a memory leak that @NickLaMuro is researching. But also, it would be nice to load the correct ts values for @Ladas.
(Sorry @NickLaMuro - I may have misunderstood and unnecessarily roped you in)
Hmm, is it? I was thinking it could leak memory, but it just stores cached values in an instance variable? (or is it doing something nasty inside?)
@kbrock right, so I am at least calling preload_vim_performance_state_for_ts_iso8601, so it doesn't do n+1 queries; but it still does n+1 queries if it can't find some hour's state. Yeah, more refactoring can be done here, but perf-wise this is not the bottleneck now.
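A minimal sketch of the preloading idea (hypothetical names, not the actual preload_vim_performance_state_for_ts_iso8601 implementation): fetch all states once, index them by timestamp, and serve later lookups from memory instead of issuing one query per timestamp:

```ruby
# Hypothetical sketch of preloading to avoid N+1 lookups (not the
# actual ManageIQ implementation). States are indexed once by
# timestamp; subsequent lookups are O(1) hash hits, not DB queries.
State = Struct.new(:timestamp, :derived_memory)

def index_states_by_timestamp(states)
  states.each_with_object({}) { |s, index| index[s.timestamp] = s }
end

states = [
  State.new("2017-10-01T00:00:00Z", 1024),
  State.new("2017-10-01T01:00:00Z", 2048)
]
by_ts = index_states_by_timestamp(states)
state = by_ts["2017-10-01T01:00:00Z"] # in-memory lookup, no extra query
```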
@kbrock for me this is good to go. (I just wanted to add YARD doc, haha) For backport: I am using the GraphRefresh batch saving to build the batched query, so we can't really backport this unless we backport the whole graph refresh machinery (I'd just copy-paste the whole manager_refresh dir, so it's theoretically doable :-))
@@ -35,6 +35,33 @@ def write(metric)
  def write_multiple(*_metrics)
    raise NotImplementedError, "must implemented by the adapter"
  end

  def transform_parameters(_resources, interval_name, _start_time, _end_time, rt_rows)
@Ladas I still think this belongs in Metric::CiMixin::Processing, not in ActiveMetrics. It is very specific to the rt_rows that we return from Metric::CiMixin::Capture, and it already calls back out to Metric::Processing for process_derived_columns.
Right, so would you rather do the parsing based on the adapter class, in the Metric::CiMixin::Processing code? (there is a different transform_parameters for the new postgre adapter)
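One hedged sketch of that option (class names here are illustrative, not ManageIQ's actual API): the processing code could pick a transform based on the configured adapter class, keeping the adapter itself free of rt_rows knowledge:

```ruby
# Hypothetical sketch of adapter-based dispatch in processing code.
# Stub classes stand in for the real ActiveMetrics adapters.
module ActiveMetricsStub
  class PostgresLegacyAdapter; end
  class PostgresBatchAdapter;  end
end

# Pick the parameter transform for the configured adapter; the batch
# adapter gets the multi-resource format, everything else the original.
def transform_parameters_for(adapter)
  case adapter
  when ActiveMetricsStub::PostgresBatchAdapter
    :batch_transform
  else
    :legacy_transform
  end
end

transform_parameters_for(ActiveMetricsStub::PostgresBatchAdapter.new)
```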
Save metrics in batches
This pull request is not mergeable. Please rebase and repush.
Force-pushed from f422a44 to 8aa20fd
Tweak graph refresh to allow timestamp as an index
More effective interface for passing data to pg adapter
Optimize the memory some more
Preload vim_performance_state_for_ts to avoid n+1 queries
Allow perf_process to receive data of multiple resources
Fetch all records for 1 model in 1 query, instead of doing n+1 queries.
Move transforming parameters back to processing code. Based on reviews, this doesn't belong in the ActiveMetric adapter code.
Force-pushed from 8aa20fd to 7523d62
Fix rubocop issues
Force-pushed from 7523d62 to 47fc2e6
Remove clarified TODO
Pack transform_resources! in a helper method
Checked commits Ladas/manageiq@c3b649c~...ce5ebca with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
app/models/metric/ci_mixin/capture.rb
app/models/metric/ci_mixin/processing.rb
👍 LGTM. Looking forward to getting rid of the legacy adapter.
Improve metrics saving
Let's optimize metrics saving