-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add decompressed datapoint based cost accounting for query engine #1207
Conversation
f262e6c
to
aa653af
Compare
Codecov Report
@@ Coverage Diff @@
## master #1207 +/- ##
========================================
+ Coverage 70.8% 70.9% +0.1%
========================================
Files 836 840 +4
Lines 71520 71753 +233
========================================
+ Hits 50655 50918 +263
+ Misses 17551 17506 -45
- Partials 3314 3329 +15
Continue to review full report at Codecov.
|
2b30cdb
to
68f95d6
Compare
0e62976
to
6b93db8
Compare
87243cc
to
2af43c3
Compare
0310b7d
to
3c086c4
Compare
bcd59ff
to
6916737
Compare
d6b3861
to
2f4a6e3
Compare
986e77a
to
30f9ba6
Compare
This is a prerequisite diff for cost accounting in m3query. It lifts the cost package from statsdex, modifying it slightly to work in the m3 monorepo context, and tweaking some interfaces that I found inconvenient. I use this in #1207 to implement datapoint based cost accounting. Note: I haven't maintained the statsdex commit history here, but @jeromefroe is the actual author of most of this code.
2f4a6e3
to
c3f95c8
Compare
c74939d
to
35d47f1
Compare
perQuery: | ||
maxComputedDatapoints: 12000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a discussion q, but should these be added to the sample config or do we want to keep that as simple as possible (i.e. no max)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample_config
is maybe a bad name for this; I actually use it as a test case. I'll rename it to reflect that.
If we do use it as an actual sample though, I always like it when repos have two provided configs, a simple one with minimal config and a "kitchen sink"/documentation config, which uses every conceivable option and comments on them.
src/x/cost/enforcer.go
Outdated
|
||
type noopEnforcerReporter struct{} | ||
|
||
func (noopEnforcerReporter) ReportCost(c Cost) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: don't know if it's a common stylistic choice, but for noopX, we usually jam the methods in a single line without line breaks in between, for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah works for me; I think my IDE just generates implementations with the line. I'll tweak that.
src/x/cost/enforcer.go
Outdated
@@ -151,11 +159,56 @@ func NoopEnforcer() Enforcer { | |||
return noopEnforcer | |||
} | |||
|
|||
// An EnforcerReporter is a listener for Enforcer events. | |||
type EnforcerReporter interface { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: newline, also I think we tend to put interfaces (especially public ones) in types.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, will move.
src/query/cost/cost.go
Outdated
reporter chainedReporter | ||
} | ||
|
||
var noopChainedEnforcer, _ = NewChainedEnforcer("", []cost.Enforcer{cost.NoopEnforcer()}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you pull this into the NoopChainedEnforcer()
function? The overhead to building a new noop enforcer on errors would be negligible, and otherwise it's potentially mutable right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and otherwise it's potentially mutable right?
Technically yes, but it's a private variable; chances of someone bothering to mutate it are slim (that is, I don't know why someone would).
That said, there's no real reason to hide that error; I created a small func mustNoopChainedEnforcer() ChainedEnforcer
to panic on it (panic is appropriate since it's an initialization error).
src/query/server/server.go
Outdated
} | ||
|
||
func (globalReporter) ReportOverLimit(enabled bool) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Add a comment explaining why this noops
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch; it really shouldn't noop haha. I'll implement it.
src/query/cost/cost.go
Outdated
OnChildRelease(currentCost cost.Cost) | ||
|
||
// OnRelease is called whenever this reporter's chainedEnforcer is released. | ||
OnRelease(currentCost cost.Cost) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just spitballing here, but would it be prudent for these to return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an easy refactor later if we want to make them return an error (this isn't a library so I don't think back compatibility is a huge concern). Going to leave it as is for now.
9d59070
to
add5124
Compare
add5124
to
cb44777
Compare
src/x/cost/enforcer.go
Outdated
@@ -151,11 +159,56 @@ func NoopEnforcer() Enforcer { | |||
return noopEnforcer | |||
} | |||
|
|||
// An EnforcerReporter is a listener for Enforcer events. | |||
type EnforcerReporter interface { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, will move.
@@ -184,16 +214,6 @@ func (c Configuration) LookbackDurationOrDefault() (time.Duration, error) { | |||
return v, nil | |||
} | |||
|
|||
// LimitsOrDefault returns the specified limit configuration if provided, or the | |||
// default value otherwise. | |||
func (c Configuration) LimitsOrDefault() *LimitsConfiguration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardartoul I reverted your changes around this. The previous default was to have limits disabled, which I think is still probably the better default (at least for the moment). The defaultLimitsConfiguration
variable was an unfortunate holdover--I didn't catch it in the original PR, and forgot to remove it even though it was unused. I'm assuming your intent with introducing LimitsOrDefault()
was to fix the unused variable--let me know if that wasn't the case!
cb44777
to
0351872
Compare
Adds a `queryContext` argument to `OpNode.Process` to hold any per query state. I use this in both my [cost accounting](#1207) and [tracing](#1321) PR's. At the time, I based my tracing branch off of the cost accounting branch. Tracing is closer to landing though, so I've now factored out the common changes, and rebased them both against this branch.
04daac4
to
52796d1
Compare
src/x/cost/types.go
Outdated
|
||
// An EnforcerReporter is a listener for Enforcer events. | ||
type EnforcerReporter interface { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extraneous newline
maxDatapointsHistMetric = "max_datapoints_hist" | ||
) | ||
|
||
// newConfiguredChainedEnforcer returns a ChainedEnforcer with 3 configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice comment! Might be worth extracting/duplicating some of this in docs since there's not a lot of visibility into cost accounting at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! Suggestions on where to put those? I may do that in a quick followup diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably in a query howto? We should sync with Ben about this as well, he's been handling a lot of documentation
fbc1a97
to
3c6a5bc
Compare
type LimitsConfiguration struct { | ||
MaxComputedDatapoints int64 `yaml:"maxComputedDatapoints"` | ||
// deprecated: use PerQuery.MaxComputedDatapoints instead. | ||
DeprecatedMaxComputedDatapoints int64 `yaml:"maxComputedDatapoints"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:| That's so dumb, how is that the default parsing behavior for yaml haha
src/query/executor/engine_test.go
Outdated
}, results) | ||
|
||
// drain the channel | ||
res := <-results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this channel should already be closing as a defer step in ExecuteExpr https://github.com/m3db/m3/blob/master/src/query/executor/engine.go#L135; was thinking it might be worth checking that only a single value is put into this channel to ensure we don't leave any dangling open channels around
} | ||
|
||
func (gr *globalReporter) ReportCost(c cost.Cost) { | ||
if c > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would make more sense I think
maxDatapointsHistMetric = "max_datapoints_hist" | ||
) | ||
|
||
// newConfiguredChainedEnforcer returns a ChainedEnforcer with 3 configured |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably in a query howto? We should sync with Ben about this as well, he's been handling a lot of documentation
src/x/cost/test/assert.go
Outdated
|
||
// AssertCurCost is a helper assertion to check that an enforcer has the | ||
// given cost. | ||
func AssertCurCost(t assert.TestingT, expectedCost cost.Cost, ef cost.Enforcer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think AssertCurrentCost
might be better here
return defaultCostExceededError(cost, limit) | ||
} | ||
return costExceededError(e.costMsg, cost, limit) | ||
return NewCostExceededError(e.costMsg, cost, limit.Threshold) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think having the check above be cost <= limit.Threshold || !limit.Enabled
; would clear up error messages such as ... 6 exceeds limit of 6...
which are a bit questionable imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah agreed that the error message was confusing; I've changed it to: limit reached (current = 1, limit = 2)
. Any better? Semantically I think <
is more typical though for limits (e.g. the classic for i := 0; i < n; i++
, but it doesn't really matter too much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks better, yeah
return r.Error | ||
} | ||
|
||
cb.blockDatapoints.Inc(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a pretty good guarantee towards that at the moment; unmatched series lengths are pretty bad news and should probably be an explicit failure case... just a bit worried about having a lock (somewhere in the stack for this) per point being a little slow for big queries, but that's likely premature
@@ -180,6 +190,13 @@ func (cb ColumnBlockBuilder) AppendValue(idx int, value float64) error { | |||
return fmt.Errorf("idx out of range for append: %d", idx) | |||
} | |||
|
|||
r := cb.enforcer.Add(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind putting up an issue for it? Can probably just reference this chain and paste in the discussion from chat
3c6a5bc
to
478b281
Compare
fcece39
to
f4d13bc
Compare
@arnikola @benraskin92 here's a screenshot of the metrics: One piece of weirdness--that gauge is always zero. I don't think I have a bug; I've added prints around where the value is set, and it does get set to non-zero values. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved; the gauge not showing is weird but may be worth handling in a followup?
f4d13bc
to
509bee5
Compare
As discussed on chat--looks like the issue was just that queries finished faster than the scrape interval. Adding a spurious time.Sleep to a node to slow things down makes them show up. |
This PR tracks and limits datapoint usage in the query engine. Tracking/limiting occurs at both the per-query and global levels. This is the second of 2 PR's, building on top of @jeromefroe 's x/cost package which I lifted and modified from statsdex.
It adds a
query/cost
package to track both per query and global resource usage, and integrates it through the query lifecycle (starting at engine, and going down toaccounted_series_iter.go
).The approach here is to hook a
cost.ChainedEnforcer
instance into creation of any blocks. Cost is added when datapoints are added to a block; cost is reduced when the block is closed (ChainedEnforcer.Release()
). Datapoints are added:accounted_series_iter.go
), since that's the first point at which we know datapoints have actually been used.ColumnBlockBuilder
, since we allocate further slices of datapoints at that point.The datapoints are considered "in use" until the block is closed.
Currently the code reports these stats:
coordinator_cost_global_datapoints
: gauge; the number of datapoints currently in use globally across this instance. Increases when datapoints are created (via decompression or adding to a block); decreases when blocks are closed.coordinator_cost_global_datapoints_counter
: counter; different view of global datapoints, which doesn't trackfree
's (since it's a counter). We can potentially remove this in favor of only the gauge.I currently don't have stats at the per-query level, because I wasn't sure what the best thing to report would be. One option: count total usage of datapoints in the query; report that as a histogram.