Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate metric_rollups not dealing with active relation #16166

Merged
merged 1 commit into from
Oct 13, 2017

Conversation

jameswnl
Copy link
Contributor

https://bugzilla.redhat.com/show_bug.cgi?id=1463165

Duplicated rollup records are still there in reports.

@jameswnl
Copy link
Contributor Author

@miq-bot add_label wip, bug

@jameswnl
Copy link
Contributor Author

This change would fix the issue. it’s a wip coz not sure why we had that check in first place, i guess it’s for performance consideration, wanna to hold back hitting db til last.

@jameswnl
Copy link
Contributor Author

@lpichler @Ladas

if recs.respond_to?(:klass) # active record relation
return recs unless recs.klass.kind_of?(Metric) || recs.klass.kind_of?(MetricRollup)
elsif recs.empty? || !recs.all? { |r| r.kind_of?(Metric) || r.kind_of?(MetricRollup) }
if recs.empty? || !recs.all? { |r| r.kind_of?(Metric) || r.kind_of?(MetricRollup) }
Copy link
Contributor

@lpichler lpichler Oct 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still use previous code and only change
this condition:
https://github.com/ManageIQ/manageiq/blob/master/app/models/metric/helper.rb#L124
to

 return recs unless recs.klass <= Metric || recs.klass <=MetricRollup

because kind_of? is for instances and not for classes.

I think that with this change we can fix the bug and keep @kbrock's intention in #14224.

it should work also with VmPerformance because it is inherited from MetricRollup.

It would be also nice to add specs when the method has AR scope as input (MetricRollup.where(..))

thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Done

@jameswnl jameswnl force-pushed the dedup branch 2 times, most recently from cad295a to 9b508f9 Compare October 11, 2017 14:24
@miq-bot
Copy link
Member

miq-bot commented Oct 11, 2017

Checked commit jameswnl@d338c6f with ruby 2.3.3, rubocop 0.47.1, and haml-lint 0.20.0
2 files checked, 0 offenses detected
Everything looks fine. ⭐

@jameswnl jameswnl changed the title [WIP] Remove duplicate metric_rollups not dealing with active relation Remove duplicate metric_rollups not dealing with active relation Oct 11, 2017
@jameswnl
Copy link
Contributor Author

@miq-bot remove_label wip

@miq-bot miq-bot removed the wip label Oct 11, 2017
@lpichler
Copy link
Contributor

cc @gtanzillo - just letting you know - it could affect chargeback (not sure how much when we are using max or avg)

Copy link
Contributor

@lpichler lpichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 thanks!

Copy link
Contributor

@Ladas Ladas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome

@blomquisg
Copy link
Member

it could affect chargeback (not sure how much when we are using max or avg)

@lpichler can you elaborate on your comment?

What's the overlap of max/avg and this dedup code? Are you expecting the max/avg results to change, but now be correct?

@jameswnl
Copy link
Contributor Author

@miq-bot add_labels fine/yes, euwe/yes
Both fine/euwe are affected

@@ -121,7 +121,7 @@ def self.sanitize_start_end_time(interval, interval_name, start_time, end_time)

def self.remove_duplicate_timestamps(recs)
if recs.respond_to?(:klass) # active record relation
return recs unless recs.klass.kind_of?(Metric) || recs.klass.kind_of?(MetricRollup)
return recs unless recs.klass <= Metric || recs.klass <= MetricRollup
elsif recs.empty? || !recs.all? { |r| r.kind_of?(Metric) || r.kind_of?(MetricRollup) }
return recs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameswnl I just noticed this looking at the diff in git, can you remove the tailing whitespace here? It was added in 9dab1c4

@agrare
Copy link
Member

agrare commented Oct 13, 2017

@lpichler @gtanzillo it looks to me that this would make chargeback "more correct" but I understand any change in chargeback isn't good. Is this something that we should do more planning on or do you think this is okay?

@lpichler
Copy link
Contributor

it could affect chargeback (not sure how much when we are using max or avg)
@lpichler can you elaborate on your comment?
"more correct"

yes we are now more correct but how much it is affected - it is based on how much duplicated records have been in the system for specific date range ( range the related to report whether it is monthly, weekly, daily )

Chargeback rating calculations are not related to counts of metric rollups.
Metric value (of any metric, which is used for chargeback cost calculation) is taken for the special date range and there is applied MAX or AVG.

so it can affect AVG then MAX - because for AVG we are summing up metric values and we are diving it by hours in the date range.

But previously(year before) when I was facing this issue(#11017) duplicated records mean that (duplicate records are records with same resource and timestamp)
I have one metric rollup record with any values and lot of them with empty metric values.
and this case shouldn't affect AVG neither MAX at all.

No sure how looks data where were we found the issue - if they have duplicated records with non-empty metric values.

So it depends how duplication is frequentant (especially for AVGs).

@gtanzillo
Copy link
Member

I agree that this change should make any calculations based on a range of metrics more accurate. However, it's not apparent what the cause of the dups is and whether they are truly dups - all values are the same or whether they are dups based on timestamp and resource but with different metric values. If it's the latter, is there a risk of throwing away valid metrics?

I guess my question is somewhat beyond the scope of this PR since we were already doing this. And if this make the process more accurate, I'm good with it.

@agrare
Copy link
Member

agrare commented Oct 13, 2017

@gtanzillo so we believe the cause of the dups is because we have >1 metrics_processor_workers and we have overlapping date ranges queued so they both process the same rollup range at the same time and without unique indexes on the table or some external locking it will be difficult to prevent this. This is something we're planning on addressing in rearch.

@Ladas @jameswnl please correct anything I got wrong in ^^ :D

@lpichler
Copy link
Contributor

resource but with different metric values. If it's the latter, is there a risk of throwing away valid metrics?

the method is doing 'merging' values. It means that if any value is missing(is nil) it is populated from other duplicate record with any value. So if duplicated records have values, the winner is the first record with any value. (For chargeback query is ordered by 'resource_id, timestamp' which are same, so I don't know how to determine what is first record in the set of duplicates.)

@agrare agrare added this to the Sprint 71 Ending Oct 16, 2017 milestone Oct 13, 2017
@agrare agrare merged commit d338c6f into ManageIQ:master Oct 13, 2017
agrare added a commit that referenced this pull request Oct 13, 2017
Remove duplicate metric_rollups not dealing with active relation
@agrare
Copy link
Member

agrare commented Oct 13, 2017

Nice find @jameswnl !

@Ladas
Copy link
Contributor

Ladas commented Oct 13, 2017

@gtanzillo yes what @agrare said, we will be adding unique indexes for 5.0, so any duplicates will not be possible. Until then, we need to live with it, since it's pretty hard to avoid duplication.

@jameswnl jameswnl deleted the dedup branch October 13, 2017 21:36
simaishi pushed a commit that referenced this pull request Oct 23, 2017
Remove duplicate metric_rollups not dealing with active relation
(cherry picked from commit 15e48b8)

https://bugzilla.redhat.com/show_bug.cgi?id=1505417
@simaishi
Copy link
Contributor

Euwe backport details:

$ git log -1
commit 76255e3f57ecc6e78e7361b95b430a8602719380
Author: Adam Grare <[email protected]>
Date:   Fri Oct 13 09:32:10 2017 -0400

    Merge pull request #16166 from jameswnl/dedup
    
    Remove duplicate metric_rollups not dealing with active relation
    (cherry picked from commit 15e48b8e96c1aa28e5f91fa67035d14e0c98baf9)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1505417

simaishi pushed a commit that referenced this pull request Nov 13, 2017
Remove duplicate metric_rollups not dealing with active relation
(cherry picked from commit 15e48b8)

https://bugzilla.redhat.com/show_bug.cgi?id=1505415
@simaishi
Copy link
Contributor

Fine backport details:

$ git log -1
commit 5702efb5e85650bbd506b7178138d928ab2e3c2c
Author: Adam Grare <[email protected]>
Date:   Fri Oct 13 09:32:10 2017 -0400

    Merge pull request #16166 from jameswnl/dedup
    
    Remove duplicate metric_rollups not dealing with active relation
    (cherry picked from commit 15e48b8e96c1aa28e5f91fa67035d14e0c98baf9)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1505415

d-m-u pushed a commit to d-m-u/manageiq that referenced this pull request Jun 6, 2018
Remove duplicate metric_rollups not dealing with active relation
(cherry picked from commit 15e48b8)

https://bugzilla.redhat.com/show_bug.cgi?id=1505415
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants