Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add decompressed datapoint based cost accounting for query engine #1207

Merged
merged 1 commit into from
Mar 13, 2019

Conversation

andrewmains12
Copy link
Contributor

@andrewmains12 andrewmains12 commented Nov 27, 2018

This PR tracks and limits datapoint usage in the query engine. Tracking/limiting occurs at both the per-query and global levels. This is the second of 2 PR's, building on top of @jeromefroe 's x/cost package which I lifted and modified from statsdex.

It adds a query/cost package to track both per query and global resource usage, and integrates it through the query lifecycle (starting at engine, and going down to accounted_series_iter.go).

The approach here is to hook a cost.ChainedEnforcer instance into creation of any blocks. Cost is added when datapoints are added to a block; cost is reduced when the block is closed (ChainedEnforcer.Release()). Datapoints are added:

  • At the decompression step (see accounted_series_iter.go), since that's the first point at which we know datapoints have actually been used.
  • In ColumnBlockBuilder, since we allocate further slices of datapoints at that point.

The datapoints are considered "in use" until the block is closed.

Currently the code reports these stats:

coordinator_cost_global_datapoints: gauge; the number of datapoints currently in use globally across this instance. Increases when datapoints are created (via decompression or adding to a block); decreases when blocks are closed.
coordinator_cost_global_datapoints_counter: counter; different view of global datapoints, which doesn't track free's (since it's a counter). We can potentially remove this in favor of only the gauge.

I currently don't have stats at the per-query level, because I wasn't sure what the best thing to report would be. One option: count total usage of datapoints in the query; report that as a histogram.

@codecov
Copy link

codecov bot commented Nov 27, 2018

Codecov Report

Merging #1207 into master will increase coverage by 0.1%.
The diff coverage is 80.7%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master   #1207     +/-   ##
========================================
+ Coverage    70.8%   70.9%   +0.1%     
========================================
  Files         836     840      +4     
  Lines       71520   71753    +233     
========================================
+ Hits        50655   50918    +263     
+ Misses      17551   17506     -45     
- Partials     3314    3329     +15
Flag Coverage Δ
#aggregator 82.3% <ø> (ø) ⬆️
#cluster 85.8% <ø> (ø) ⬆️
#collector 63.7% <ø> (ø) ⬆️
#dbnode 80.8% <ø> (+0.1%) ⬆️
#m3em 73.2% <ø> (ø) ⬆️
#m3ninx 74.2% <ø> (ø) ⬆️
#m3nsch 51.1% <ø> (ø) ⬆️
#metrics 17.6% <ø> (ø) ⬆️
#msg 75% <ø> (+0.1%) ⬆️
#query 66.2% <79.7%> (+0.4%) ⬆️
#x 76.3% <87.8%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2fbb1bf...509bee5. Read the comment docs.

@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 5 times, most recently from 2b30cdb to 68f95d6 Compare November 28, 2018 03:38
@andrewmains12 andrewmains12 changed the base branch from master to andrewmains12/x_cost November 28, 2018 16:54
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 2 times, most recently from 0e62976 to 6b93db8 Compare November 28, 2018 16:55
src/dbnode/encoding/series_iterators.go Outdated Show resolved Hide resolved
src/query/api/v1/handler/prometheus/native/read.go Outdated Show resolved Hide resolved
src/cmd/services/m3query/config/config.go Outdated Show resolved Hide resolved
src/cmd/services/m3query/config/config.go Outdated Show resolved Hide resolved
src/dbnode/encoding/types.go Outdated Show resolved Hide resolved
src/query/cost/cost.go Outdated Show resolved Hide resolved
src/query/cost/cost.go Outdated Show resolved Hide resolved
src/query/cost/cost.go Outdated Show resolved Hide resolved
src/query/executor/engine.go Outdated Show resolved Hide resolved
src/query/storage/m3/storage.go Outdated Show resolved Hide resolved
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 5 times, most recently from 0310b7d to 3c086c4 Compare January 8, 2019 19:01
@andrewmains12 andrewmains12 force-pushed the andrewmains12/x_cost branch 2 times, most recently from bcd59ff to 6916737 Compare January 14, 2019 16:30
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 3 times, most recently from d6b3861 to 2f4a6e3 Compare January 14, 2019 21:13
andrewmains12 added a commit that referenced this pull request Jan 18, 2019
This is a prerequisite diff for cost accounting in m3query. It lifts the cost package from statsdex, modifying it slightly to work in the m3 monorepo context, and tweaking some interfaces that I found inconvenient. I use this in #1207 to implement datapoint based cost accounting.

Note: I haven't maintained the statsdex commit history here, but @jeromefroe is the actual author of most of this code.
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from 2f4a6e3 to c3f95c8 Compare January 18, 2019 18:02
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from c74939d to 35d47f1 Compare February 14, 2019 16:22
perQuery:
maxComputedDatapoints: 12000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More of a discussion q, but should these be added to the sample config or do we want to keep that as simple as possible (i.e. no max)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample_config is maybe a bad name for this; I actually use it as a test case. I'll rename it to reflect that.

If we do use it as an actual sample though, I always like it when repos have two provided configs, a simple one with minimal config and a "kitchen sink"/documentation config, which uses every conceivable option and comments on them.


type noopEnforcerReporter struct{}

func (noopEnforcerReporter) ReportCost(c Cost) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: don't know if it's a common stylistic choice, but for noopX, we usually jam the methods in a single line without line breaks in between, for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah works for me; I think my IDE just generates implementations with the line. I'll tweak that.

@@ -151,11 +159,56 @@ func NoopEnforcer() Enforcer {
return noopEnforcer
}

// An EnforcerReporter is a listener for Enforcer events.
type EnforcerReporter interface {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: newline, also I think we tend to put interfaces (especially public ones) in types.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, will move.

src/x/cost/enforcer.go Show resolved Hide resolved
src/query/storage/m3/storage.go Show resolved Hide resolved
src/query/cost/cost.go Outdated Show resolved Hide resolved
reporter chainedReporter
}

var noopChainedEnforcer, _ = NewChainedEnforcer("", []cost.Enforcer{cost.NoopEnforcer()})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you pull this into the NoopChainedEnforcer() function? The overhead to building a new noop enforcer on errors would be negligible, and otherwise it's potentially mutable right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and otherwise it's potentially mutable right?

Technically yes, but it's a private variable; chances of someone bothering to mutate it are slim (that is, I don't know why someone would).

That said, there's no real reason to hide that error; I created a small func mustNoopChainedEnforcer() ChainedEnforcer to panic on it (panic is appropriate since it's an initialization error).

}

func (globalReporter) ReportOverLimit(enabled bool) {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add a comment explaining why this noops

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; it really shouldn't noop haha. I'll implement it.

OnChildRelease(currentCost cost.Cost)

// OnRelease is called whenever this reporter's chainedEnforcer is released.
OnRelease(currentCost cost.Cost)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just spitballing here, but would it be prudent for these to return an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an easy refactor later if we want to make them return an error (this isn't a library so I don't think back compatibility is a huge concern). Going to leave it as is for now.

@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 2 times, most recently from 9d59070 to add5124 Compare February 15, 2019 22:29
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from add5124 to cb44777 Compare February 21, 2019 04:37
@@ -151,11 +159,56 @@ func NoopEnforcer() Enforcer {
return noopEnforcer
}

// An EnforcerReporter is a listener for Enforcer events.
type EnforcerReporter interface {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, will move.

src/x/cost/enforcer.go Show resolved Hide resolved
src/query/storage/m3/storage.go Show resolved Hide resolved
src/query/storage/m3/accounted_series_iter.go Show resolved Hide resolved
@@ -184,16 +214,6 @@ func (c Configuration) LookbackDurationOrDefault() (time.Duration, error) {
return v, nil
}

// LimitsOrDefault returns the specified limit configuration if provided, or the
// default value otherwise.
func (c Configuration) LimitsOrDefault() *LimitsConfiguration {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardartoul I reverted your changes around this. The previous default was to have limits disabled, which I think is still probably the better default (at least for the moment). The defaultLimitsConfiguration variable was an unfortunate holdover--I didn't catch it in the original PR, and forgot to remove it even though it was unused. I'm assuming your intent with introducing LimitsOrDefault() was to fix the unused variable--let me know if that wasn't the case!

@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from cb44777 to 0351872 Compare February 21, 2019 17:25
andrewmains12 added a commit that referenced this pull request Feb 21, 2019
Adds a `queryContext` argument to `OpNode.Process` to hold any per query state. I use this in both my [cost accounting](#1207) and [tracing](#1321) PR's. At the time, I based my tracing branch off of the cost accounting branch. Tracing is closer to landing though, so I've now factored out the common changes, and rebased them both against this branch.
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 4 times, most recently from 04daac4 to 52796d1 Compare February 27, 2019 00:11
src/x/cost/options.go Show resolved Hide resolved

// An EnforcerReporter is a listener for Enforcer events.
type EnforcerReporter interface {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extraneous newline

src/query/server/cost_reporters.go Outdated Show resolved Hide resolved
src/query/storage/types.go Outdated Show resolved Hide resolved
src/query/storage/block.go Outdated Show resolved Hide resolved
src/query/cost/cost_prop_test.go Outdated Show resolved Hide resolved
src/query/cost/cost_test.go Outdated Show resolved Hide resolved
maxDatapointsHistMetric = "max_datapoints_hist"
)

// newConfiguredChainedEnforcer returns a ChainedEnforcer with 3 configured
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment! Might be worth extracting/duplicating some of this in docs since there's not a lot of visibility into cost accounting at the moment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Suggestions on where to put those? I may do that in a quick followup diff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably in a query howto? We should sync with Ben about this as well, he's been handling a lot of documentation

src/query/cost/cost.go Show resolved Hide resolved
src/query/cost/cost.go Show resolved Hide resolved
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 3 times, most recently from fbc1a97 to 3c6a5bc Compare March 1, 2019 16:17
type LimitsConfiguration struct {
MaxComputedDatapoints int64 `yaml:"maxComputedDatapoints"`
// deprecated: use PerQuery.MaxComputedDatapoints instead.
DeprecatedMaxComputedDatapoints int64 `yaml:"maxComputedDatapoints"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:| That's so dumb, how is that the default parsing behavior for yaml haha

}, results)

// drain the channel
res := <-results
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this channel should already be closing as a defer step in ExecuteExpr https://github.com/m3db/m3/blob/master/src/query/executor/engine.go#L135; was thinking it might be worth checking that only a single value is put into this channel to ensure we don't leave any dangling open channels around

src/query/cost/cost.go Show resolved Hide resolved
}

func (gr *globalReporter) ReportCost(c cost.Cost) {
if c > 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would make more sense I think

maxDatapointsHistMetric = "max_datapoints_hist"
)

// newConfiguredChainedEnforcer returns a ChainedEnforcer with 3 configured
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably in a query howto? We should sync with Ben about this as well, he's been handling a lot of documentation


// AssertCurCost is a helper assertion to check that an enforcer has the
// given cost.
func AssertCurCost(t assert.TestingT, expectedCost cost.Cost, ef cost.Enforcer) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think AssertCurrentCost might be better here

return defaultCostExceededError(cost, limit)
}
return costExceededError(e.costMsg, cost, limit)
return NewCostExceededError(e.costMsg, cost, limit.Threshold)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think having the check above be cost <= limit.Threshold || !limit.Enabled; would clear up error messages such as ... 6 exceeds limit of 6... which are a bit questionable imo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agreed that the error message was confusing; I've changed it to: limit reached (current = 1, limit = 2). Any better? Semantically I think < is more typical though for limits (e.g. the classic for i := 0; i < n; i++, but it doesn't really matter too much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks better, yeah

return r.Error
}

cb.blockDatapoints.Inc(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a pretty good guarantee towards that at the moment; unmatched series lengths are pretty bad news and should probably be an explicit failure case... just a bit worried about having a lock (somewhere in the stack for this) per point being a little slow for big queries, but that's likely premature

@@ -180,6 +190,13 @@ func (cb ColumnBlockBuilder) AppendValue(idx int, value float64) error {
return fmt.Errorf("idx out of range for append: %d", idx)
}

r := cb.enforcer.Add(1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind putting up an issue for it? Can probably just reference this chain and paste in the discussion from chat

src/query/cost/cost.go Show resolved Hide resolved
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from 3c6a5bc to 478b281 Compare March 4, 2019 16:45
@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch 2 times, most recently from fcece39 to f4d13bc Compare March 13, 2019 02:57
@andrewmains12
Copy link
Contributor Author

@arnikola @benraskin92 here's a screenshot of the metrics:

Screen Shot 2019-03-13 at 11 40 40 AM

One piece of weirdness--that gauge is always zero. I don't think I have a bug; I've added prints around where the value is set, and it does get set to non-zero values. Thoughts?

Copy link
Collaborator

@arnikola arnikola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved; the gauge not showing is weird but may be worth handling in a followup?

@andrewmains12 andrewmains12 force-pushed the andrewmains12/cost_accounting branch from f4d13bc to 509bee5 Compare March 13, 2019 17:55
@andrewmains12 andrewmains12 merged commit b1213c2 into master Mar 13, 2019
@andrewmains12
Copy link
Contributor Author

Approved; the gauge not showing is weird but may be worth handling in a followup?

As discussed on chat--looks like the issue was just that queries finished faster than the scrape interval. Adding a spurious time.Sleep to a node to slow things down makes them show up.

@justinjc justinjc deleted the andrewmains12/cost_accounting branch June 17, 2019 21:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants