node-artedi histograms should be compatible with Prometheus #17

Closed · wants to merge 14 commits · Changes from 8 commits
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -3,6 +3,13 @@
## Not yet released
None

## 2.0.0 [backward incompatible if you use histograms]
* #17 Histogram buckets completely changed to be compatible with Prometheus. You
can now pass in an array of bucket values. If you do not pass in bucket
values, you will get the default buckets:

[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]

## 1.4.1
* #15 improve the performance of hashObj()

Expand Down
35 changes: 21 additions & 14 deletions README.md
@@ -39,12 +39,13 @@ collector.collect(artedi.FMT_PROM, function (err, metrics) {
});

var histogram = collector.histogram({
    name: 'http_request_latency_seconds',
    help: 'latency of muskie http requests',
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});

// Observe a latency of 0.998 seconds (998ms) for a 'putobjectdir' request.
histogram.observe(0.998, {
    method: 'putobjectdir'
});

@@ -59,17 +60,23 @@ collector.collect(artedi.FMT_PROM, function (err, metrics) {
// Prints:
// # HELP http_requests_completed count of muskie http requests completed
// # TYPE http_requests_completed counter
// http_requests_completed{method="getobject",code="200",zone="e5d3"} 1
// # HELP http_request_latency_seconds latency of muskie http requests
// # TYPE http_request_latency_seconds histogram
// http_request_latency_seconds{method="putobjectdir",le="0.005"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.01"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.025"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.05"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.1"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.25"} 0
// http_request_latency_seconds{method="putobjectdir",le="0.5"} 0
// http_request_latency_seconds{method="putobjectdir",le="1"} 1
// http_request_latency_seconds{method="putobjectdir",le="2.5"} 1
// http_request_latency_seconds{method="putobjectdir",le="5"} 1
// http_request_latency_seconds{method="putobjectdir",le="10"} 1
// http_request_latency_seconds{method="putobjectdir",le="+Inf"} 1
// http_request_latency_seconds_count{method="putobjectdir"} 1
// http_request_latency_seconds_sum{method="putobjectdir"} 0.998
Contributor comment: I think this _sum value would be .998, right?

});
```

68 changes: 58 additions & 10 deletions docs/API.md
@@ -83,21 +83,29 @@ var gauge = collector.gauge({
});
```
### collector.histogram(opts) : Histogram
Creates a new Histogram object with the given options (incl. labels and
buckets). This call is idempotent. `opts` must include 'help' and 'name' fields,
and may optionally include a 'labels' object and/or a 'buckets' array.

Example:
```javascript
var histogram = collector.histogram({
    name: 'http_request_latency_seconds',
    help: 'latency of http requests',
    labels: {
        component: 'muskie'
    },
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
});
```

Note: If `buckets` is not specified, the default buckets will be:

```
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
```


### collector.addTriggerFunction(func(Collector, callback))
Adds `func` to a list of triggers to call immediately before metrics are
collected during a call to `collector.collect()`.
@@ -191,6 +199,7 @@ Example:
counter.getValue( { operation: 'click' } );
```


## Gauge
Gauges are similar to counters. Gauges can count up or down relative
to their current value, or be set to an arbitrary value. Gauges start with an
@@ -233,15 +242,54 @@ count values that fall between a number of buckets.
### histogram.observe(value, labels)
Increments all buckets whose upper bound is >= `value`.

Example:
```javascript
histogram.observe(1111, {
    method: 'putobject',
    code: 204
});
```
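
To make the bucket semantics concrete, here is a small standalone sketch (not
artedi's actual implementation) of Prometheus-style cumulative bucketing: every
bucket whose upper bound is >= the observed value is incremented, along with
the implicit `+Inf` bucket.

```javascript
// Minimal sketch of cumulative (Prometheus-style) bucket counting.
// Illustrative only; artedi's internal implementation may differ.
function observeValue(buckets, counts, value) {
    // `counts` has one slot per bucket plus a final slot for +Inf.
    buckets.forEach(function (le, i) {
        if (value <= le) {
            counts[i]++;
        }
    });
    counts[buckets.length]++; // the +Inf bucket counts every observation
}

var buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];
var counts = new Array(buckets.length + 1).fill(0);

observeValue(buckets, counts, 0.998);
// counts for le=1, 2.5, 5, 10 and +Inf are now 1; all smaller buckets stay 0.
```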

### Bucket Generators
Artedi includes several generator functions that help create `buckets` arrays
for use with histograms.

#### artedi.linearBuckets(min, width, count)
Generates `count` buckets starting at `min`, with each bucket being `width`
larger than the previous.

Example:
```javascript
artedi.linearBuckets(0.5, 0.5, 10);
// returns [ 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5 ]
```

Note: The `min` parameter must be > 0. It will likely be common to use the same
value for `width` and `min`, as in the example above.
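
A minimal sketch of the linear generator's semantics (illustrative only; it
assumes plain floating-point arithmetic and does no validation, which artedi's
implementation may handle differently):

```javascript
// Illustrative only: `count` buckets, each `width` apart, starting at `min`.
function linearBuckets(min, width, count) {
    var buckets = [];
    for (var i = 0; i < count; i++) {
        buckets.push(min + (i * width));
    }
    return buckets;
}

linearBuckets(0.5, 0.5, 10);
// [ 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5 ]
```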

#### artedi.exponentialBuckets(min, factor, count)
Generates `count` buckets starting at `min`, with each bucket being `factor`
times larger than the previous.

Example:
```javascript
artedi.exponentialBuckets(1, 2, 5);
// returns [ 1, 2, 4, 8, 16 ]
```
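
Likewise, a rough sketch of the exponential generator (again illustrative, not
artedi's actual code):

```javascript
// Illustrative only: `count` buckets, each `factor` times the previous,
// starting at `min`.
function exponentialBuckets(min, factor, count) {
    var buckets = [];
    var current = min;
    for (var i = 0; i < count; i++) {
        buckets.push(current);
        current = current * factor;
    }
    return buckets;
}

exponentialBuckets(1, 2, 5);
// [ 1, 2, 4, 8, 16 ]
```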

#### artedi.logLinearBuckets(base, lowPower, highPower, bucketsPerMagnitude)
Contributor comment: We're missing the artedi. namespace preface here

Generate a set of log-linear buckets. This will create `bucketsPerMagnitude`
buckets for the magnitude that contains base^lowPower, and for each subsequent
magnitude up to and including the one that begins at base^highPower.

Example:

```javascript
artedi.logLinearBuckets(10, -2, 1, 5);
// returns [ 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 2, 4, 6, 8, 10, 20, 40, 60, 80, 100 ]
```

Note in the above example, the lowPower was -2 so we started with 10^-2 = 0.01
and used that magnitude (10^-2 to 10^-1) as the first set of 5 buckets. Then we
created buckets for the magnitudes 10^-1 to 10^0, 10^0 to 10^1 and finally 10^1
(our highPower parameter) to 10^2.
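
For illustration, here is one way the log-linear generation described above
could be sketched in JavaScript. This mirrors the documented example but is not
artedi's actual implementation, and it ignores floating-point rounding and edge
cases (such as `bucketsPerMagnitude` larger than `base`):

```javascript
// Illustrative only: `bucketsPerMagnitude` evenly spaced buckets per magnitude,
// for the magnitudes base^lowPower..base^(lowPower+1) through
// base^highPower..base^(highPower+1).
function logLinearBuckets(base, lowPower, highPower, bucketsPerMagnitude) {
    var buckets = [];
    for (var power = lowPower; power <= highPower; power++) {
        var magnitudeTop = Math.pow(base, power + 1);
        var step = magnitudeTop / bucketsPerMagnitude;
        for (var i = 1; i <= bucketsPerMagnitude; i++) {
            buckets.push(i * step);
        }
    }
    return buckets;
}

logLinearBuckets(10, -2, 1, 5);
// ~ [ 0.02, 0.04, 0.06, 0.08, 0.1, 0.2, 0.4, 0.6, 0.8, 1,
//     2, 4, 6, 8, 10, 20, 40, 60, 80, 100 ] (modulo floating-point noise)
```
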
105 changes: 0 additions & 105 deletions docs/DESIGN.md
@@ -75,111 +75,6 @@ The reported metric from that operation will look something like this:
marlin_agent_jobs_running{zone="e5d03bc",key="value",owner="kkantor"} 1
```

### Log/Linear Buckets
One of the major problems with creating histograms in existing metric clients
is that they require the user to provide a static list of bucket values for
values to fall into. There are some problems with this, and to understand them
we first need to understand what role buckets serve.

Buckets are upper bounds on the value being tracked. For example, if a histogram
is tracking request latency, you may have the following buckets: [100, 200, 300,
400, 500, Inf], where each number represents time to completion in milliseconds.
Inf is a special bucket that counts ALL values, including those that are greater
than the largest bucket, so the value of Inf is >= the count in every other
bucket. Each bucket counts values that are less than or equal to its upper
bound. So for a request that took 222ms, the 300, 400, 500, and Inf buckets will
be incremented.
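
For instance, a single 222ms observation against those buckets would be exposed
in the Prometheus text format roughly as follows (the metric name here is
hypothetical):

```
http_request_latency_ms{le="100"} 0
http_request_latency_ms{le="200"} 0
http_request_latency_ms{le="300"} 1
http_request_latency_ms{le="400"} 1
http_request_latency_ms{le="500"} 1
http_request_latency_ms{le="+Inf"} 1
http_request_latency_ms_count{} 1
http_request_latency_ms_sum{} 222
```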

This sounds good if we know that we'll have a normal distribution of inputs and
we know the approximate values that we should be receiving. This makes a lot of
sense for simple use cases, like a webserver that serves text files. The latency
of something simple like that should be relatively consistent. The usefulness of
static buckets degrades quickly when workloads become much more varied.

In Muskie, for example, we have some operations that finish quickly
(`putdirectory`), and some that can take a long time (`putobject`). The latency
of `putdirectory` will be relatively stable and low when compared to
`putobject`. The latency of `putobject` can vary widely based on how large the
object being uploaded is. We would like fine granularity when monitoring the
latency of `putdirectory`, and a coarse granularity when monitoring `putobject`.

With current metric clients, we have to either know the
expected behavior of our application, or sacrifice either fine or coarse
granularity in order to get accurate measurements from histograms.

Luckily, this problem has been solved in-house already! DTrace has support
for log/linear quantization. In short, it gives us the ability to represent
both fine and coarse granularity in the same histogram. For more information on
log/linear quantization, see
[this DTrace blog post](http://dtrace.org/blogs/bmc/2011/02/08/llquantize/).

Rather than requiring the user to provide a static list of buckets, log/linear
buckets can be automatically generated. They are still static buckets in that
the value of the bucket itself doesn't change. They are, however, dynamic in that
the number of buckets can expand out as needed by the client's observations.

Let's take an example. If my application observes a latency of 6ms, this is what
a set of log/linear buckets would produce:
```
# HELP http_request_latency_ms latency of requests completed
# TYPE muskie_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="+Inf"} 1
http_request_latency_ms_count{} 1
http_request_latency_ms_sum{} 6
```

Now if my application observes a latency of 600ms, this is what we receive:
```
# HELP http_request_latency_ms latency of requests completed
# TYPE http_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="81"} 1
http_request_latency_ms{le="243"} 1
http_request_latency_ms{le="405"} 1
http_request_latency_ms{le="567"} 1
http_request_latency_ms{le="729"} 2
http_request_latency_ms{le="+Inf"} 2
http_request_latency_ms_count{} 2
http_request_latency_ms_sum{} 606
```

And then maybe we observe a latency of 60000ms!
```
# HELP http_request_latency_ms latency of requests completed
# TYPE http_request_latency_ms histogram
http_request_latency_ms{le="1"} 0
http_request_latency_ms{le="3"} 0
http_request_latency_ms{le="5"} 0
http_request_latency_ms{le="7"} 1
http_request_latency_ms{le="9"} 1
http_request_latency_ms{le="81"} 1
http_request_latency_ms{le="243"} 1
http_request_latency_ms{le="405"} 1
http_request_latency_ms{le="567"} 1
http_request_latency_ms{le="729"} 2
http_request_latency_ms{le="59049"} 2
http_request_latency_ms{le="177147"} 3
http_request_latency_ms{le="295245"} 3
http_request_latency_ms{le="413343"} 3
http_request_latency_ms{le="531441"} 3
http_request_latency_ms{le="+Inf"} 3
http_request_latency_ms_count{} 3
http_request_latency_ms_sum{} 60606
```

We can see that log/linear buckets give us a lot of flexibility while giving up
only a little precision.

Prometheus is fine with this way of adding additional buckets between
scrape periods as long as we don't modify the bucket values themselves.

### Dynamic Labelling
We can see in the last example that the metric inherited two labels, and
2 changes: 0 additions & 2 deletions docs/private_api.md
@@ -189,5 +189,3 @@ metrics are collected. See collector.processTriggers() for more information.
|prometheus|callback|iterates through the Counters, calling `prometheus()` on their `MetricVector` object. The results are stitched together and added to the result of calling `prometheus()` on the Gauge's MetricVector|None (string and error via callback)|

`Histogram()` is called by the parent object's `histogram()` function.
Buckets will be created using the log/linear method, similar to how it's done in
[DTrace](http://dtrace.org/blogs/bmc/2011/02/08/llquantize/).