JMX Scraper - YAML config and integration test for HBase #1538

Merged
merged 10 commits into from
Nov 25, 2024
@@ -43,22 +43,22 @@ void endToEnd() {
metric,
"hbase.master.region_server.count",
"The number of region servers.",
"{servers}",
"{server}",
attrs -> attrs.contains(entry("state", "dead")),
attrs -> attrs.contains(entry("state", "live"))),
metric ->
assertSum(
metric,
"hbase.master.regions_in_transition.count",
"The number of regions that are in transition.",
"{regions}",
"{region}",
/* isMonotonic= */ false),
metric ->
assertSum(
metric,
"hbase.master.regions_in_transition.over_threshold",
"The number of regions that have been in transition longer than a threshold time.",
"{regions}",
"{region}",
/* isMonotonic= */ false),
metric ->
assertGauge(
@@ -71,14 +71,14 @@ void endToEnd() {
metric,
"hbase.region_server.region.count",
"The number of regions hosted by the region server.",
"{regions}",
"{region}",
attrs -> attrs.containsKey("region_server")),
metric ->
assertSumWithAttributes(
metric,
"hbase.region_server.disk.store_file.count",
"The number of store files on disk currently managed by the region server.",
"{files}",
"{file}",
attrs -> attrs.containsKey("region_server")),
metric ->
assertSumWithAttributes(
@@ -92,22 +92,22 @@ void endToEnd() {
metric,
"hbase.region_server.write_ahead_log.count",
"The number of write ahead logs not yet archived.",
"{logs}",
"{log}",
attrs -> attrs.containsKey("region_server")),
metric ->
assertSumWithAttributes(
metric,
"hbase.region_server.request.count",
"The number of requests received.",
"{requests}",
"{request}",
attrs -> attrs.contains(entry("state", "write")),
attrs -> attrs.contains(entry("state", "read"))),
metric ->
assertSumWithAttributes(
metric,
"hbase.region_server.queue.length",
"The number of RPC handlers actively servicing requests.",
"{handlers}",
"{handler}",
attrs -> attrs.contains(entry("state", "flush")),
attrs -> attrs.contains(entry("state", "compaction"))),
metric ->
@@ -122,7 +122,7 @@ void endToEnd() {
metric,
"hbase.region_server.request.count",
"The number of requests received.",
"{requests}",
"{request}",
attrs -> attrs.contains(entry("state", "write")),
attrs -> attrs.contains(entry("state", "read"))),
metric ->
@@ -347,7 +347,7 @@ void endToEnd() {
metric,
"hbase.region_server.operations.slow",
"Number of operations that took over 1000ms to complete.",
"{operations}",
"{operation}",
attrs -> attrs.contains(entry("operation", "delete")),
attrs -> attrs.contains(entry("operation", "append")),
attrs -> attrs.contains(entry("operation", "get")),
@@ -358,21 +358,21 @@ void endToEnd() {
metric,
"hbase.region_server.open_connection.count",
"The number of open connections at the RPC layer.",
"{connections}",
"{connection}",
attrs -> attrs.containsKey("region_server")),
metric ->
assertSumWithAttributes(
metric,
"hbase.region_server.active_handler.count",
"The number of RPC handlers actively servicing requests.",
"{handlers}",
"{handler}",
attrs -> attrs.containsKey("region_server")),
metric ->
assertSumWithAttributes(
metric,
"hbase.region_server.queue.request.count",
"The number of currently enqueued requests.",
"{requests}",
"{request}",
attrs -> attrs.contains(entry("state", "replication")),
attrs -> attrs.contains(entry("state", "user")),
attrs -> attrs.contains(entry("state", "priority"))),
@@ -381,7 +381,7 @@ void endToEnd() {
metric,
"hbase.region_server.authentication.count",
"Number of client connection authentication failures/successes.",
"{authentication requests}",
"{authentication request}",
attrs -> attrs.contains(entry("state", "successes")),
attrs -> attrs.contains(entry("state", "failures"))),
metric ->
30 changes: 15 additions & 15 deletions jmx-metrics/src/main/resources/target-systems/hbase.groovy
@@ -16,45 +16,45 @@

def beanMasterServer = otel.mbeans("Hadoop:service=HBase,name=Master,sub=Server")
otel.instrument(beanMasterServer, "hbase.master.region_server.count",
"The number of region servers.", "{servers}",
"The number of region servers.", "{server}",
Member

I'm probably misremembering, but I thought the plan was to not change things in the existing jmx-metrics component, and fix them only in the jmx-scraper implementation?

Contributor

That was our initial plan, but we (or at least I) have changed our minds a bit where units are concerned, because:

  • Units are just metadata: the captured values remain the same, and it's quite unlikely that any implementation relies strongly on the unit strings (for example, the change might break some UI/i18n, but the measurements themselves are unaffected). Changing a metric name or its attribute names would not have the same minor impact.
  • The semconv guidance on units is stable, so these changes will have to be made at some point, and making them now leaves one less difference to deal with when enhancing the mappings later.

We already made similar changes in other JMX scraper PRs, for example:

Contributor Author

I would only add that this change should be perfectly safe, since the only changes are to unit annotations, which parsers discard anyway (according to these docs).
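
To illustrate why the singular and plural annotations are interchangeable for a parser, here is a minimal sketch. It assumes UCUM-style handling, where text in curly braces is an annotation with no effect on the unit's meaning. The `UnitAnnotations` class and its `stripAnnotations` method are hypothetical names made up for this example; they are not part of jmx-metrics, jmx-scraper, or any real parser, and the sketch only covers simple unit strings like the ones in this diff.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only: not an API of any OpenTelemetry component.
public final class UnitAnnotations {
    // A UCUM-style annotation is free text wrapped in curly braces, e.g. "{region}".
    private static final Pattern ANNOTATION = Pattern.compile("\\{[^{}]*\\}");

    private UnitAnnotations() {}

    // Removes annotations; if nothing is left, the unit is treated as dimensionless ("1").
    public static String stripAnnotations(String unit) {
        String stripped = ANNOTATION.matcher(unit).replaceAll("");
        return stripped.isEmpty() ? "1" : stripped;
    }

    public static void main(String[] args) {
        // The old and new annotations reduce to the same dimensionless unit.
        System.out.println(stripAnnotations("{regions}"));
        System.out.println(stripAnnotations("{region}"));
        // Units without annotations, such as "ms" or "By", pass through unchanged.
        System.out.println(stripAnnotations("ms"));
    }
}
```

Under that assumption, `{regions}` and `{region}` normalize to the same unit, so only consumers that display or compare the raw unit string would notice the rename.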

Member

makes sense, thanks!

["numDeadRegionServers":["state" : {"dead"}], "numRegionServers": ["state" : {"live"}]],
otel.&longUpDownCounterCallback)

def beanMasterAssignmentManager = otel.mbean("Hadoop:service=HBase,name=Master,sub=AssignmentManager")
otel.instrument(beanMasterAssignmentManager, "hbase.master.regions_in_transition.count",
"The number of regions that are in transition.", "{regions}",
"The number of regions that are in transition.", "{region}",
"ritCount", otel.&longUpDownCounterCallback)
otel.instrument(beanMasterAssignmentManager, "hbase.master.regions_in_transition.over_threshold",
"The number of regions that have been in transition longer than a threshold time.", "{regions}",
"The number of regions that have been in transition longer than a threshold time.", "{region}",
"ritCountOverThreshold", otel.&longUpDownCounterCallback)
otel.instrument(beanMasterAssignmentManager, "hbase.master.regions_in_transition.oldest_age",
"The age of the longest region in transition.", "ms",
"ritOldestAge", otel.&longValueCallback)

def beanRegionServerServer = otel.mbean("Hadoop:service=HBase,name=RegionServer,sub=Server")
otel.instrument(beanRegionServerServer, "hbase.region_server.region.count",
"The number of regions hosted by the region server.", "{regions}",
"The number of regions hosted by the region server.", "{region}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"regionCount", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.disk.store_file.count",
"The number of store files on disk currently managed by the region server.", "{files}",
"The number of store files on disk currently managed by the region server.", "{file}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"storeFileCount", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.disk.store_file.size",
"Aggregate size of the store files on disk.", "By",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"storeFileSize", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.write_ahead_log.count",
"The number of write ahead logs not yet archived.", "{logs}",
"The number of write ahead logs not yet archived.", "{log}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"hlogFileCount", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.request.count",
"The number of requests received.", "{requests}",
"The number of requests received.", "{request}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
["writeRequestCount":["state" : {"write"}], "readRequestCount": ["state" : {"read"}]],
otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.queue.length",
"The number of RPC handlers actively servicing requests.", "{handlers}",
"The number of RPC handlers actively servicing requests.", "{handler}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
["flushQueueLength":["state" : {"flush"}], "compactionQueueLength": ["state" : {"compaction"}]],
otel.&longUpDownCounterCallback)
@@ -63,7 +63,7 @@ otel.instrument(beanRegionServerServer, "hbase.region_server.blocked_update.time
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"updatesBlockedTime", otel.&longValueCallback)
otel.instrument(beanRegionServerServer, "hbase.region_server.block_cache.operation.count",
"Number of block cache hits/misses.", "{operations}",
"Number of block cache hits/misses.", "{operation}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
["blockCacheMissCount":["state" : {"miss"}], "blockCacheHitCount": ["state" : {"hit"}]],
otel.&longValueCallback)
@@ -199,7 +199,7 @@ otel.instrument(beanRegionServerServer, "hbase.region_server.operation.increment
"Increment_median", otel.&longValueCallback)

otel.instrument(beanRegionServerServer, "hbase.region_server.operations.slow",
"Number of operations that took over 1000ms to complete.", "{operations}",
"Number of operations that took over 1000ms to complete.", "{operation}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
[
"slowDeleteCount":["operation" : {"delete"}],
@@ -212,15 +212,15 @@ otel.instrument(beanRegionServerServer, "hbase.region_server.operations.slow",

def beanRegionServerIPC = otel.mbean("Hadoop:service=HBase,name=RegionServer,sub=IPC")
otel.instrument(beanRegionServerIPC, "hbase.region_server.open_connection.count",
"The number of open connections at the RPC layer.", "{connections}",
"The number of open connections at the RPC layer.", "{connection}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"numOpenConnections", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerIPC, "hbase.region_server.active_handler.count",
"The number of RPC handlers actively servicing requests.", "{handlers}",
"The number of RPC handlers actively servicing requests.", "{handler}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"numActiveHandler", otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerIPC, "hbase.region_server.queue.request.count",
"The number of currently enqueued requests.", "{requests}",
"The number of currently enqueued requests.", "{request}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
[
"numCallsInReplicationQueue":["state" : {"replication"}],
@@ -229,7 +229,7 @@ otel.instrument(beanRegionServerIPC, "hbase.region_server.queue.request.count",
],
otel.&longUpDownCounterCallback)
otel.instrument(beanRegionServerIPC, "hbase.region_server.authentication.count",
"Number of client connection authentication failures/successes.", "{authentication requests}",
"Number of client connection authentication failures/successes.", "{authentication request}",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
["authenticationSuccesses":["state" : {"successes"}], "authenticationFailures": ["state" : {"failures"}]],
otel.&longUpDownCounterCallback)
@@ -246,4 +246,4 @@ otel.instrument(beanJVMMetrics, "hbase.region_server.gc.young_gen.time",
otel.instrument(beanJVMMetrics, "hbase.region_server.gc.old_gen.time",
"Time spent in garbage collection of the old generation.", "ms",
["region_server" : { mbean -> mbean.getProperty("tag.Hostname") }],
"GcTimeMillisConcurrentMarkSweep", otel.&longCounterCallback)
"GcTimeMillisConcurrentMarkSweep", otel.&longCounterCallback)