Skip to content
This repository has been archived by the owner on Jul 31, 2023. It is now read-only.

clear min and max on every ExportView #1182

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

marwan-at-work
Copy link

@marwan-at-work marwan-at-work commented Oct 30, 2019

Fixes #1181

Or at least I think so :)

I am not familiar with the code that much so would love some close eyes on this. I am not sure what kind of impact this one line of code would have on performance. However, it does seem to fix the behavior described in the issue above and all the unit tests did pass locally.

@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) {
e.ExportView(viewData)
}
exportersMu.Unlock()
v.clearRows()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect sum/count and bucket count in Distribution. It will also affect count aggregation which is expected to be cumulative.

I understand the problem of min and max that you are trying to solve. Reseting just min and max might be the solution.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rghetia Thanks for the review :)

So I don't think it's just the min/max. I'm not sure about the sum/count in Distribution just yet, but for the Count-Aggregation, it should probably also be reset and let the Exporter be responsible for the cumulativeness side of things.

The reason I say that is because if the server restarts for whatever reason, then the count is completely gone. Furthermore, if the Count is displayed on a timeseries graph, then a lot of these values will be falsely duplicated.

For example, imagine a server receives 10 requests per second..after 10 seconds, the graph will say there are 100 requests in the last second (which is wrong, there were only 10) and if the serve restarts you will see a sharp drop from 100 back to 10.

If we collect the "count per interval", then that should solve the issue.

Would also love to know more about which data should never be reset, even if the process restarts and the data gets "forced" to be reset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Distributions and Counters have Start-time. Start-time lets backend detect the restart.
It also lets it calculate the rate correctly.
Example:

At time T0: Application starts 
T1: First request served. Start-time is set to T1
T2: 100 request served. 
T3: 200 request served. Export enabled.
T4: 300 request served. First export happens, Count = 300, Start-time = T1. Rate=300/(T4-T1)
T5: 400 request served.
T6: Application Restarts
T7: First request served after restart, Start-time= T7
T8: 100 request served. Export count = 100, Start-time = T7, Rate = 100/(T8-T7)

Resetting all metrics would be a behavior change and would have an adverse consequence on some backends.

Copy link
Author

@marwan-at-work marwan-at-work Nov 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rghetia awesome. However, from what I can see there's no way for a backend to determine that:

The CountData only has access to Value and no idea of start time: https://github.com/census-instrumentation/opencensus-go/blob/master/stats/view/aggregation_data.go#L42

In your example, the Exporter does not have access to T1 (does it?), it can have access to T0 by just putting func init() { t1 = time.Now() } I imagine but that's not accurate.

Furthermore, dividing Count by (Tnow-T1) will just make an even distribution across time, and not the real distribution. For example, if the first minute we got 300 requests, but the next minute we got 10 requests, it will not show that accurately no?

Finally, do you have an example of an exporter doing this time calculation?

Thanks again :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startTime per view is stored here

There are two interface for exporters to get the data

  1. ExportView(vd *ViewData)
    starttime for ViewData is set here
  2. ExportMetric(context.Context, []*metricdata.Metric)
    startTime for Metric is retrieved here
    and set here

Regarding even distribution - If backend has individual data points then it can correctly represent the rate for each time window. For example
T0: Count = 300
T1: Count = 400
T2: Count = 410
Rate for the window T1-T0 = (400-300)/(T1-T0)
Rate for the window T2-T1 = (410-400)/(T2-T1)

ViewData and Metric both have end time as well. So the backend can compute the rate accordingly.

I'll post a screenshot shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer

So in this case, min/max are restarted, count/sum/avg are diffed, and SumOfSquaredDev is an unsolvable bug. This means Exporters should probably be aware of those details to export the metrics accurately.

over

Would you prefer the above over view.DisableMonotonicity(true) configuration?

because, the later changes the meaning of aggregation as currently defined. It would be sort of a hack.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rghetia sounds good, then feel free to review/merge this pr 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rghetia there's a potential new issue: how can we ensure that ExportView does not get called if a stats.Record() was never called after the last interval?

AFAIK, in the StatsD model, you only report to the backend when there's data. But OpenCensus will infinitely call ExportData even if the code hasn't called stats.Record since the last ReportingPeriod...right?

Copy link
Author

@marwan-at-work marwan-at-work Nov 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the original solution, clearRows made sure that there are no rows, and therefore no new data is reported.

I'm not sure how I can replicate that or find a way to determine that no new data has been given. This is because if I take the diff from the current data minus the previous data, if the value is zero it can either mean that the there's no new data, or that by accident the current data is equal to the previous data.

Copy link

@AlexandreYang AlexandreYang Feb 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Create Delta types for both Count [and Sum?] and DistributionData and implement them everywhere

@marwan-at-work @rghetia I'm not sure if that ^ option has been discarded or not. Wouldn't this solve all issues mentioned if implemented ?

@rghetia rghetia changed the title clearRows on every ExportView clear min and max on every ExportView Nov 18, 2019
@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) {
e.ExportView(viewData)
}
exportersMu.Unlock()
v.resetValues()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This should be also called from w.Read() method. This is for new mechanism (ExportMetrics)
  2. Apart from that reportUsage() should not go through all the views unless there is an exporter registered (used for old mechanism ExportView())
    If user configures both ways or multiple exporter using new mechanism then values of min/max is unpredictable.
  3. Min/Max doc should be updated to reflect the behavior.
  4. Add tests for min/max.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rghetia done, let me know if the last commit is what you were looking for 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test is good.
Item 2 above is not taken care of.

There should be new option in ReadAndExport method. If the option to reset min/max is specified then only do the reset.

Similarly, for old approach (export viewdata) there should be a new method view.ResetMinMaxOnExport() - It is rather a long name but I don't have a good suggestion. Alternatively, provide reset functionality only using new approach with ReadAndExport().

d, ok := vd.Rows[0].Data.(*CountData)
if !ok {
debug.PrintStack()
panic("BYE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary?

@@ -233,6 +233,7 @@ func (w *worker) reportView(v *viewInternal, now time.Time) {
e.ExportView(viewData)
}
exportersMu.Unlock()
v.resetValues()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test is good.
Item 2 above is not taken care of.

There should be new option in ReadAndExport method. If the option to reset min/max is specified then only do the reset.

Similarly, for old approach (export viewdata) there should be a new method view.ResetMinMaxOnExport() - It is rather a long name but I don't have a good suggestion. Alternatively, provide reset functionality only using new approach with ReadAndExport().

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DistributionData and view.Rows in general do not get reset between Reporting Intervals
4 participants