node-artedi histograms should be compatible with Prometheus #17

Closed · wants to merge 14 commits · Changes from 8 commits
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -3,6 +3,13 @@
## Not yet released
None

## 2.0.0 [backward incompatible if you use histograms]
* #17 Histogram buckets completely changed to be compatible with Prometheus. You
can now pass in an array of bucket values. If you do not pass in bucket
values, you will get the default buckets:

[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

## 1.4.1
* #15 improve the performance of hashObj()

Expand Down
35 changes: 21 additions & 14 deletions README.md
@@ -39,12 +39,13 @@ collector.collect(artedi.FMT_PROM, function (err, metrics) {
});

var histogram = collector.histogram({
    name: 'http_request_latency_seconds',
    help: 'latency of muskie http requests',
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});

// Observe a latency of 0.998 seconds (998ms) for a 'putobjectdir' request.
histogram.observe(0.998, {
    method: 'putobjectdir'
});

@@ -59,17 +60,23 @@ collector.collect(artedi.FMT_PROM, function (err, metrics) {
// Prints:
// # HELP http_requests_completed count of muskie http requests completed
// # TYPE http_requests_completed counter
// http_requests_completed{method="getobject",code="200",zone="e5d3"} 1
// # HELP http_request_latency_seconds latency of muskie http requests
// # TYPE http_request_latency_seconds histogram
// http_request_latency_seconds{method="putobjectdir",le="0.005"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.01"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.025"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.05"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.1"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.25"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.5"} 0
// http_request_latency_seconds{method="putobjectdir",le="1"} 1
// http_request_latency_seconds{method="putobjectdir",le="2.5"} 1
// http_request_latency_seconds{method="putobjectdir",le="5"} 1
// http_request_latency_seconds{method="putobjectdir",le="10"} 1
// http_request_latency_seconds{method="putobjectdir",le="+Inf"} 1
// http_request_latency_seconds_count{method="putobjectdir"} 1
// http_request_latency_seconds_sum{method="putobjectdir"} 0.998
Contributor comment: I think this _sum value would be .998, right?

});
```

68 changes: 58 additions & 10 deletions docs/API.md
@@ -83,21 +83,29 @@ var gauge = collector.gauge({
});
```
### collector.histogram(opts) : Histogram
Creates a new Histogram object with the given options (incl. labels and
buckets). This call is idempotent. `opts` must include 'help' and 'name' fields,
and may optionally include a 'labels' object and/or a 'buckets' array.

Example:
```javascript
var histogram = collector.histogram({
    name: 'http_request_latency_seconds',
    help: 'latency of http requests',
    labels: {
        component: 'muskie'
    },
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});
```

Note: If `buckets` is not specified, the default buckets will be:

```
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
```


### collector.addTriggerFunction(func(Collector, callback))
Adds `func` to a list of triggers to call immediately before metrics are
collected during a call to `collector.collect()`.
@@ -191,6 +199,7 @@ Example:
counter.getValue( { operation: 'click' } );
```


## Gauge
Gauges are similar to counters. Gauges can count up or down relative
to their current value, or be set to an arbitrary value. Gauges start with an
@@ -233,15 +242,54 @@ count values that fall between a number of buckets.
### histogram.observe(value, labels)
Increments all buckets whose upper bound is >= `value`.

Example:
```javascript
histogram.observe(1111, {
    method: 'putobject',
    code: 204
});
```
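
To make the bucket semantics concrete, here is a small standalone sketch (not
artedi's actual implementation) of Prometheus-style cumulative bucketing: every
bucket whose upper bound is >= the observed value is incremented, along with
the implicit `+Inf` bucket.

```javascript
// Minimal sketch of cumulative (Prometheus-style) bucket counting.
// Illustrative only; artedi's internal implementation may differ.
function observeValue(buckets, counts, value) {
    // `counts` has one slot per bucket plus a final slot for +Inf.
    buckets.forEach(function (le, i) {
        if (value <= le) {
            counts[i]++;
        }
    });
    counts[buckets.length]++; // the +Inf bucket counts every observation
}

var buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
var counts = new Array(buckets.length + 1).fill(0);

observeValue(buckets, counts, 0.998);
// counts for le=1, 2.5, 5, 10 and +Inf are now 1; all smaller buckets stay 0.
```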

### Bucket Generators
Artedi includes several generator functions that help create `buckets` arrays
for use with histograms.

#### artedi.linearBuckets(min, width, count)
Generates `count` buckets starting at `min`, with each bucket being `width`
larger than the previous.

Example:
```javascript
artedi.linearBuckets(0.5, 0.5, 10);
// returns [ 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5 ]
```

Note: The `min` parameter must be > 0. It will likely be common to use the same
value for `width` and `min`, as in the example above.
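
A minimal sketch of the linear generator's semantics (illustrative only; it
assumes plain floating-point arithmetic and does no validation, which artedi's
implementation may handle differently):

```javascript
// Illustrative only: `count` buckets, each `width` apart, starting at `min`.
function linearBuckets(min, width, count) {
    var buckets = [];
    for (var i = 0; i < count; i++) {
        buckets.push(min + (i * width));
    }
    return buckets;
}

linearBuckets(0.5, 0.5, 10);
// [ 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5 ]
```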

#### artedi.exponentialBuckets(min, factor, count)
Generates `count` buckets starting at `min`, with each bucket being `factor`
times larger than the previous.

Example:
```javascript
artedi.exponentialBuckets(1, 2, 5);
// returns [ 1, 2, 4, 8, 16 ]
```
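
Likewise, a rough sketch of the exponential generator (again illustrative, not
artedi's actual code):

```javascript
// Illustrative only: `count` buckets, each `factor` times the previous,
// starting at `min`.
function exponentialBuckets(min, factor, count) {
    var buckets = [];
    var current = min;
    for (var i = 0; i < count; i++) {
        buckets.push(current);
        current = current * factor;
    }
    return buckets;
}

exponentialBuckets(1, 2, 5);
// [ 1, 2, 4, 8, 16 ]
```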

#### artedi.logLinearBuckets(base, lowPower, highPower, bucketsPerMagnitude)
Contributor comment: We're missing the artedi. namespace preface here

Generate a set of log-linear buckets. This will create `bucketsPerMagnitude`
buckets for the magnitude that contains base^lowPower, and for each subsequent
magnitude up to and including the one that begins at base^highPower.

Example:

```javascript
artedi.logLinearBuckets(10, -2, 1, 5);
// returns [ 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10, 20, 40, 60, 80, 100 ]
```

Note in the above example, the lowPower was -2 so we started with 10^-2 = 0.01
and used that magnitude (10^-2 to 10^-1) as the first set of 5 buckets. Then we
created buckets for the magnitudes 10^-1 to 10^0, 10^0 to 10^1 and finally 10^1
(our highPower parameter) to 10^2.
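
For illustration, here is one way the log-linear generation described above
could be sketched in JavaScript. This mirrors the documented example but is not
artedi's actual implementation, and it ignores floating-point rounding and edge
cases (such as `bucketsPerMagnitude` larger than `base`):

```javascript
// Illustrative only: `bucketsPerMagnitude` evenly spaced buckets per magnitude,
// for the magnitudes base^lowPower..base^(lowPower+1) through
// base^highPower..base^(highPower+1).
function logLinearBuckets(base, lowPower, highPower, bucketsPerMagnitude) {
    var buckets = [];
    for (var power = lowPower; power <= highPower; power++) {
        var magnitudeTop = Math.pow(base, power + 1);
        var step = magnitudeTop / bucketsPerMagnitude;
        for (var i = 1; i <= bucketsPerMagnitude; i++) {
            buckets.push(i * step);
        }
    }
    return buckets;
}

logLinearBuckets(10, -2, 1, 5);
// ~ [ 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1,
//     2, 4, 6, 8, 10, 20, 40, 60, 80, 100 ] (modulo floating-point noise)
```
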
105 changes: 0 additions & 105 deletions docs/DESIGN.md
@@ -75,111 +75,6 @@ The reported metric from that operation will look something like this:
marlin_agent_jobs_running{zone="e5d03bc",key="value",owner="kkantor"} 1
```

### Log/Linear Buckets
One of the major problems with creating histograms in existing metric clients
is that they require the user to provide a static list of bucket values for
values to fall into. There are some problems with this, and to understand them
we first need to understand what role buckets serve.

Buckets are upper bounds on the value being tracked. For example, if a histogram
is tracking request latency, you may have the following buckets: [100, 200, 300,
400, 500, Inf], where each number represents time to completion in milliseconds.
Inf is a special bucket that counts ALL values, including those that are greater
than the largest bucket, so the value of Inf is >= the count in every other
bucket. Each bucket counts values that are less than or equal to its upper
bound. So for a request that took 222ms, the 300, 400, 500, and Inf buckets will
be incremented.
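
For instance, a single 222ms observation against those buckets would be exposed
in the Prometheus text format roughly as follows (the metric name here is
hypothetical):

```
http_request_latency_ms{le="100"} 0
http_request_latency_ms{le="200"} 0
http_request_latency_ms{le="300"} 1
http_request_latency_ms{le="400"} 1
http_request_latency_ms{le="500"} 1
http_request_latency_ms{le="+Inf"} 1
http_request_latency_ms_count{} 1
http_request_latency_ms_sum{} 222
```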

This sounds good if we know that we'll have a normal distribution of inputs and
we know the approximate values that we should be receiving. This makes a lot of
sense for simple use cases, like a webserver that serves text files. The latency
of something simple like that should be relatively consistent. The usefulness of
static buckets degrades quickly when workloads become much more varied.

In Muskie, for example, we have some operations that finish quickly
(`putdirectory`), and some that can take a long time (`putobject`). The latency
of `putdirectory` will be relatively stable and low when compared to
`putobject`. The latency of `putobject` can vary widely based on how large the
object being uploaded is. We would like fine granularity when monitoring the
latency of `putdirectory`, and a coarse granularity when monitoring `putobject`.

With current metric clients, we have to either know the
expected behavior of our application, or sacrifice either fine or coarse
granularity in order to get accurate measurements from histograms.

Luckily, this problem has been solved in-house already! DTrace has support
for log/linear quantization. In short, it gives us the ability to represent
both fine and coarse granularity in the same histogram. For more information on
log/linear quantization, see
[this DTrace blog post](http://dtrace.org/blogs/bmc/2011/02/08/llquantize/).

Rather than requiring the user to provide a static list of buckets, log/linear
buckets can be automatically generated. They are still static buckets in that
the value of the bucket itself doesn't change. They are, however, dynamic in that
the number of buckets can expand out as needed by the client's observations.

Let's take an example. If my application observes a latency of 6ms, this is what
a set of log/linear buckets would produce:
```
# HELP http_request_latency_ms latency of requests completed
# TYPE muskie_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="+Inf"} 1
http_request_latency_ms_count{} 1
http_request_latency_ms_sum{} 6
```

Now if my application observes a latency of 600ms, this is what we receive:
```
# HELP http_request_latency_ms latency of requests completed
# TYPE http_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="81"} 1
http_request_latency_ms{le="243"} 1
http_request_latency_ms{le="405"} 1
http_request_latency_ms{le="567"} 1
http_request_latency_ms{le="729"} 2
http_request_latency_ms{le="+Inf"} 2
http_request_latency_ms_count{} 2
http_request_latency_ms_sum{} 606
```

And then maybe we observe a latency of 60000ms!
```
# HELP http_request_latency_ms latency of requests completed
# TYPE http_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="81"} 1
http_request_latency_ms{le="243"} 1
http_request_latency_ms{le="405"} 1
http_request_latency_ms{le="567"} 1
http_request_latency_ms{le="729"} 2
http_request_latency_ms{le="59049"} 2
http_request_latency_ms{le="177147"} 3
http_request_latency_ms{le="295245"} 3
http_request_latency_ms{le="413343"} 3
http_request_latency_ms{le="531441"} 3
http_request_latency_ms{le="+Inf"} 3
http_request_latency_ms_count{} 3
http_request_latency_ms_sum{} 60606
```

We can see that log/linear buckets give us a lot of flexibility while giving up
only a little precision.

Prometheus is fine with this way of adding additional buckets between
scrape periods as long as we don't modify the bucket values themselves.

### Dynamic Labelling
We can see in the last example that the metric inherited two labels, and
2 changes: 0 additions & 2 deletions docs/private_api.md
@@ -189,5 +189,3 @@ metrics are collected. See collector.processTriggers() for more information.
|prometheus|callback|iterates through the Counters, calling `prometheus()` on their `MetricVector` object. The results are stitched together and added to the result of calling `prometheus()` on the Gauge's MetricVector|None (string and error via callback)|

`Histogram()` is called by the parent object's `histogram()` function.
Buckets will be created using the log/linear method, similar to how it's done in
[DTrace](http://dtrace.org/blogs/bmc/2011/02/08/llquantize/).