kv: add metrics for cross-region snapshots #104124

Closed
wenyihu6 opened this issue May 30, 2023 · 0 comments · Fixed by #104111
Labels: A-kv-observability, C-enhancement, T-kv

wenyihu6 commented May 30, 2023

Is your feature request related to a problem? Please describe.

Currently, there is no easy way to observe how many snapshot bytes are sent and
received across regions. This becomes a problem when we need to assess the
volume of cross-region snapshot traffic.

Describe the solution you'd like

One solution is to collect, for each store, byte metrics for the cross-region
snapshots it sends and receives.

Jira issue: CRDB-28363

@wenyihu6 wenyihu6 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv Anything in KV that doesn't belong in a more specific category. A-kv-observability labels May 30, 2023
@wenyihu6 wenyihu6 self-assigned this May 30, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue May 30, 2023
This commit refactors `getSnapshotBytesMetrics` in `replica_learner_test`
to return a `map[string]snapshotBytesMetrics` instead of
`map[SnapShotRequest_Priority]snapshotBytesMetrics`. This allows us to include
and compare different types of snapshot metrics, removing the constraint of
being limited to `SnapShotRequest_Priority`. This commit does not change any
existing functionality, and the main purpose is to make future commits cleaner.

Part of: cockroachdb#104124
Release note: none
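
A minimal sketch of the idea behind this refactor, not the actual test helper: once the map is keyed by string, entries that are not priorities can sit next to the priority-keyed ones. The `priority` enum, the struct fields, and the `byName` function are hypothetical stand-ins; only the `map[string]snapshotBytesMetrics` shape comes from the commit message.

```go
package main

import "fmt"

// snapshotBytesMetrics is a hypothetical stand-in for the value type used by
// the test helper; the real field names are not shown in this issue.
type snapshotBytesMetrics struct {
	sentBytes int64
	rcvdBytes int64
}

// priority is an illustrative enum standing in for the priority-typed key
// that the helper used before the refactor.
type priority int

const (
	recovery priority = iota
	rebalance
)

// String returns the name used as the map key after re-keying.
func (p priority) String() string {
	switch p {
	case recovery:
		return "recovery"
	case rebalance:
		return "rebalance"
	default:
		return "unknown"
	}
}

// byName re-keys enum-keyed metrics by string so that other kinds of
// snapshot metrics can be stored and compared in the same map.
func byName(in map[priority]snapshotBytesMetrics) map[string]snapshotBytesMetrics {
	out := make(map[string]snapshotBytesMetrics, len(in))
	for p, m := range in {
		out[p.String()] = m
	}
	return out
}

func main() {
	metrics := byName(map[priority]snapshotBytesMetrics{
		recovery:  {sentBytes: 10, rcvdBytes: 20},
		rebalance: {sentBytes: 30, rcvdBytes: 40},
	})
	// A key that is not a priority now fits in the same map.
	metrics["cross-region"] = snapshotBytesMetrics{sentBytes: 5, rcvdBytes: 7}
	fmt.Println(len(metrics), metrics["cross-region"].sentBytes)
}
```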
@wenyihu6 wenyihu6 added T-kv KV Team and removed A-kv Anything in KV that doesn't belong in a more specific category. labels May 30, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue May 30, 2023
Previously, there were no metrics to observe cross-region snapshot traffic in a
store.

To address this, this commit adds two new metrics to StoreMetrics -
RangeSnapShotCrossRegionSentBytes and RangeSnapShotCrossRegionRcvdBytes. These
metrics track the byte count of cross-region snapshots sent from and received
at a store.

Release note (ops change): Two new metrics - RangeSnapShotCrossRegionSentBytes,
RangeSnapShotCrossRegionRcvdBytes - have been added to StoreMetrics.

Fixes: cockroachdb#104124
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue May 31, 2023
Previously, there were no metrics to observe cross-region snapshot traffic in a
store.

To address this, this commit adds two new metrics to StoreMetrics -
RangeSnapShotCrossRegionSentBytes and RangeSnapShotCrossRegionRcvdBytes. These
metrics track the byte count of cross-region snapshots sent from and received
at a store.

Note: These metrics require nodes’ localities to include a tier with the key
“region”. If a node does not have this key but participates in cross-region
batch activities, metrics will remain unchanged, and an error message will be
logged. The region of each node is determined by using the locality field and
the “region” tier value. Unfortunately, the “region” key is hard coded here.
Ideally, we would prefer a more flexible approach to determine node locality.

Fixes: cockroachdb#104124
Release note (ops change): Two new metrics - RangeSnapShotCrossRegionSentBytes,
RangeSnapShotCrossRegionRcvdBytes - are now added to StoreMetrics. Note that
these metrics require nodes’ localities to include a “region” tier key. If a
node lacks this key but is involved in cross-region batch activities, an error
message will be logged.
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 1, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 1, 2023
Previously, there were no metrics to observe cross-region snapshot traffic
between stores within a cluster.

To address this, this commit adds two new store metrics -
`range.snapshots.cross-region.sent-bytes` and
`range.snapshots.cross-region.rcvd-bytes`. These metrics track the aggregate
snapshot bytes sent from and received at a store.

Fixes: cockroachdb#104124

Release note (ops change): Two new store metrics -
`range.snapshots.cross-region.sent-bytes` and
`range.snapshots.cross-region.rcvd-bytes` - are now added to track the aggregate
snapshot bytes sent from and received at a store. Note that these metrics
require nodes’ localities to include a “region” tier key. If a node lacks this
key but is involved in cross-region batch activities, an error message will be
logged.
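
As a rough illustration of what "aggregate snapshot bytes sent from and received at a store" means, the sketch below keeps two per-store counters that only grow when a snapshot crosses regions. The `storeMetrics` type and its methods are hypothetical; only the two metric names are taken from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// storeMetrics is a hypothetical per-store holder for the two counters.
type storeMetrics struct {
	crossRegionSentBytes int64
	crossRegionRcvdBytes int64
}

// recordSent adds the size of a sent snapshot, but only if it crossed regions.
func (m *storeMetrics) recordSent(bytes int64, crossRegion bool) {
	if crossRegion {
		atomic.AddInt64(&m.crossRegionSentBytes, bytes)
	}
}

// recordRcvd adds the size of a received snapshot, but only if it crossed regions.
func (m *storeMetrics) recordRcvd(bytes int64, crossRegion bool) {
	if crossRegion {
		atomic.AddInt64(&m.crossRegionRcvdBytes, bytes)
	}
}

// values returns the current counters keyed by the metric names from the
// commit message.
func (m *storeMetrics) values() map[string]int64 {
	return map[string]int64{
		"range.snapshots.cross-region.sent-bytes": atomic.LoadInt64(&m.crossRegionSentBytes),
		"range.snapshots.cross-region.rcvd-bytes": atomic.LoadInt64(&m.crossRegionRcvdBytes),
	}
}

func main() {
	var m storeMetrics
	m.recordSent(1024, true)  // cross-region: counted
	m.recordSent(2048, false) // same region: ignored
	m.recordRcvd(512, true)   // cross-region: counted
	for name, v := range m.values() {
		fmt.Println(name, v)
	}
}
```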
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 2, 2023
wenyihu6 added a commit to wenyihu6/cockroach that referenced this issue Jun 2, 2023
craig bot pushed a commit that referenced this issue Jun 5, 2023
104058: pgcrypto: add helper functions for PKCS padding r=rafiss a=andyyang890

This patch adds helper functions for PKCS padding/unpadding, which is
needed for pgcrypto's raw encryption functions.

Informs #21001 

Release note: None

104111: kvserver: add cross-region snapshot byte metrics to StoreMetrics r=kvoli,andrewbaptist a=wenyihu6

**kvserver: refactor getSnapshotBytesMetrics**

This commit refactors `getSnapshotBytesMetrics` in `replica_learner_test`
to return a `map[string]snapshotBytesMetrics` instead of
`map[SnapShotRequest_Priority]snapshotBytesMetrics`. This allows us to include
and compare different types of snapshot metrics, removing the constraint of
being limited to `SnapShotRequest_Priority`. This commit does not change any
existing functionality, and the main purpose is to make future commits cleaner.

Part of: #104124
Release note: none

---
 
**kvserver: add cross-region snapshot byte metrics to StoreMetrics**

Previously, there were no metrics to observe cross-region snapshot traffic
between stores within a cluster.

To address this, this commit adds two new store metrics -
`range.snapshots.cross-region.sent-bytes` and
`range.snapshots.cross-region.rcvd-bytes`. These metrics track the aggregate of
snapshot bytes sent from and received at a store across different regions.

Resolves: #104124

Release note (ops change): Two new store metrics -
`range.snapshots.cross-region.sent-bytes` and
`range.snapshots.cross-region.rcvd-bytes` - are now added to track the aggregate
of snapshot bytes sent from and received at a store across different regions.
Note that these metrics require nodes’ localities to include a “region” tier
key. If a node lacks this key but is involved in cross-region batch activities,
an error message will be logged.



Co-authored-by: Andy Yang
Co-authored-by: Wenyi