
[riakcs] support Riak CS 2.1+ stats format #2920

Merged 2 commits on Nov 17, 2016
Conversation

millerdev (Contributor)

What does this PR do?

Fixes #2340

Motivation

Our Riak CS integration has been saying "no data received" for a long time. This adapts the existing collector to the new stats format introduced in Riak CS 2.1.

Testing Guidelines

There are no tests for the riakcs stats check. However, I dropped riakcs.py from this PR on one of our servers and verified that it works and that the stats can be queried in the web interface.

Additional Notes

The new stats format includes many more stats (>1000), and the app link for riakcs in our infrastructure list on the website now takes FOREVER to load, and it freezes the whole page while it's loading. Seems like that could be optimized, but it's outside the scope of this change.


STATS_METHODS = {
    "one": "count",
    "total": "monotonic_count",
}
millerdev (author) commented:

For metrics ending in _one or _total, I guessed they would be better tracked as counts than as a simple gauge. It wasn't entirely clear which method was best for each of these. More info on the various stat types here: https://github.com/basho/riak_cs/wiki/%5BRFC%5D-Riak-CS-and-Stanchion-metrics-2.1
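To illustrate the idea, here is a minimal sketch (not the actual riakcs.py code; `method_for` is a hypothetical helper) of how a suffix-to-method mapping like this could pick the submission method for each stat:

```python
# Map a metric-name suffix to a Datadog submission method.
# <name>_one counts events in the last interval; <name>_total is an
# ever-increasing counter, hence monotonic_count.
STATS_METHODS = {
    "one": "count",
    "total": "monotonic_count",
}

def method_for(metric_name, default="gauge"):
    """Pick a submission method based on the metric name's suffix,
    falling back to a plain gauge for everything else."""
    suffix = metric_name.rsplit("_", 1)[-1]
    return STATS_METHODS.get(suffix, default)
```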

millerdev (author) commented:

As I'm reviewing these again, and after reviewing the stats reported in my test, I think total should perhaps use gauge rather than monotonic_count. Thoughts?
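For context on why the choice matters, a toy illustration (values are made up): a gauge reports each raw value as-is, while a monotonic count submits the delta between successive raw values of an increasing counter.

```python
# Successive scrapes of some hypothetical <name>_total stat.
raw_totals = [100, 140, 190]

# gauge: each raw value is reported unchanged.
gauge_points = raw_totals

# monotonic_count: only the increase between scrapes is reported.
monotonic_points = [b - a for a, b in zip(raw_totals, raw_totals[1:])]
```

If the upstream value resets (e.g. on a Riak CS restart), a monotonic count typically ignores the negative delta, whereas a gauge would show the drop.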

@masci masci added this to the Triage milestone Oct 18, 2016
@remh remh modified the milestones: 5.10.0, Triage Oct 21, 2016
remh commented Oct 25, 2016

Thanks a lot @millerdev!

Could you add some tests? The Riak CS tests are mocked (https://github.com/DataDog/dd-agent/blob/master/tests/checks/mock/test_riakcs.py), since setting up Riak CS in a test environment is not straightforward. Could you update that test with the new format?

Also, I'm a bit worried about the number of new metrics this collects. It looks like it will start collecting thousands of metrics where it previously collected hundreds. Is there any possibility to trim that down and focus on the most relevant metrics?

@remh remh self-assigned this Oct 25, 2016
@remh remh modified the milestones: 5.10.1, 5.10.0 Oct 27, 2016
millerdev (author) left a comment:

@remh I pared down the list of collected metrics and added tests for the new stats format.

# config; the value should be a list of metric names.
#
# Helpful references:
# - https://github.com/basho/riak_cs/wiki/Riak-cs-and-stanchion-metrics
millerdev commented Nov 10, 2016:

I chose to collect only a limited set of S3 API and memory metrics by default. I'm not sure how to identify the set that most people will want, so I went with a set that seemed reasonable for my usage (although I haven't done anything with it yet, so this may evolve). Here are some reasons for each of the exclusions listed above:

  • bucket_acl_(get|put) and object_acl_(get|put) - I'm not using ACLs.
  • bucket_policy_(get|put|delete) - I'm not doing any automated bucket policy updates.
  • *_in_(one|total) - guessing this is a similar metric to the _out_ variants.
  • *_time_error_* - it seems like the timing of errors is less important than the count of errors.
  • *_time_100 - had a hard time figuring out what this even means. Is it 99.9th percentile or worst case (max) latency? Decided to exclude it since it was not in the old metrics format.

The total number of metrics collected now (by default) is around 150.
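The exclusions above can be expressed as glob-style patterns. A hypothetical sketch (not the actual riakcs.py code) of filtering the 1000+ incoming stat names down to the default set:

```python
# Filter stat names against glob-style exclusion patterns using the
# stdlib fnmatch module. Pattern list mirrors the exclusions discussed.
from fnmatch import fnmatch

EXCLUDE_PATTERNS = [
    "bucket_acl_*", "object_acl_*",   # ACL operations
    "bucket_policy_*",                # automated bucket policy updates
    "*_in_one", "*_in_total",         # assumed similar to _out_ variants
    "*_time_error_*",                 # error timings (counts matter more)
    "*_time_100",                     # unclear meaning; not in old format
]

def keep_metric(name):
    """Return True if the stat name matches no exclusion pattern."""
    return not any(fnmatch(name, pat) for pat in EXCLUDE_PATTERNS)
```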

remh left a comment:

Thank you!

@remh remh merged commit 20485a6 into DataDog:master Nov 17, 2016