Feature Description
This FR is to add more visibility into reparenting operations in vtctld. Currently there are three counters covering Emergency Reparent Shard (ERS), but none for Planned Reparent Shard (PRS):
ers_counter
ers_failure_counter
ers_success_counter
Also, these metrics don't explain the performance of the reparent operation, which has been important in some investigations we've run into in production.
I would like to add:
Counters for PRS operations (same ones as ERS)
prs_counter
prs_failure_counter
prs_success_counter
Add the Keyspace label to all ERS and PRS counters
Shard would be nice too but the metric cardinality would be an issue
Timings for both ERS and PRS operations
Finally, an opinion/question: I don't think names prefixed with ers_ and prs_ are very friendly. Something more verbose, such as emergency_reparent_shard_ and planned_reparent_shard_, would be easier to search for in graphing tools. This may be a good time to rename them if others share this concern, but it's entirely optional.
Use Case(s)
Observing the frequency of ERS + PRS (by Keyspace)
Observing the timing/performance of ERS + PRS
This is useful during investigations where the timeline and performance of these operations need to be established
This is a good idea. We should add Shard as a label as well, or combine keyspace/shard into one label (cluster?) regardless of cardinality because that is the level at which the operation happens and having just keyspace may not be sufficient.
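As a rough way to weigh the cardinality trade-off, the sketch below counts how many distinct label values each counter would expose with a keyspace-only label versus a combined keyspace/shard label. The function names, the `keyspace/shard` format, and the example topology are assumptions made up for illustration, not anything defined by Vitess.

```go
package main

import "fmt"

// seriesCount returns how many distinct label values a single counter would
// expose when labelled by keyspace only versus by keyspace and shard.
// The input maps each keyspace to its number of shards.
func seriesCount(shardsPerKeyspace map[string]int) (byKeyspace, byShard int) {
	for _, n := range shardsPerKeyspace {
		byKeyspace++
		byShard += n
	}
	return byKeyspace, byShard
}

// shardLabel builds one combined label value (a hypothetical format).
func shardLabel(keyspace, shard string) string {
	return keyspace + "/" + shard
}

func main() {
	// Hypothetical topology: one unsharded keyspace and one with 16 shards.
	topo := map[string]int{"commerce": 1, "customer": 16}
	k, s := seriesCount(topo)
	fmt.Println(k, s)                          // prints "2 17"
	fmt.Println(shardLabel("customer", "-80")) // prints "customer/-80"
}
```

The per-shard count grows with the total number of shards rather than the number of keyspaces, and it applies to each counter and timing separately.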