cloud: add read and write metrics #89584

Merged 1 commit on Nov 4, 2022

@adityamaru (Contributor) commented Oct 7, 2022

This change adds cloud.read_bytes and cloud.write_bytes metrics.
These are updated on all read/write operations to external storage endpoints.

These metrics, while shared by all external storage interactions,
will provide an immediate signal into whether we're seeing an
abnormal rate of reads and writes during support investigations.
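
To illustrate the idea of updating byte counters on every read and write, here is a minimal sketch in Go. It is not the code from this PR: the `Metrics` struct, the `countingReader` and `countingWriter` wrappers, and the `copyExample` helper are hypothetical names chosen for illustration, and the counters are plain atomics rather than CockroachDB metric types.

```go
package main

import (
	"bytes"
	"io"
	"strings"
	"sync/atomic"
)

// Metrics holds two hypothetical byte counters, loosely modeled on the
// cloud.read_bytes and cloud.write_bytes metrics described above.
// This is an illustrative type, not the one added in pkg/cloud.
type Metrics struct {
	ReadBytes  atomic.Int64
	WriteBytes atomic.Int64
}

// countingReader wraps an io.Reader and adds every byte read to ReadBytes.
type countingReader struct {
	r io.Reader
	m *Metrics
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	// Count before checking err: a Read may return data together with an error.
	c.m.ReadBytes.Add(int64(n))
	return n, err
}

// countingWriter wraps an io.Writer and adds every byte written to WriteBytes.
type countingWriter struct {
	w io.Writer
	m *Metrics
}

func (c *countingWriter) Write(p []byte) (int, error) {
	n, err := c.w.Write(p)
	c.m.WriteBytes.Add(int64(n))
	return n, err
}

// Reader and Writer return instrumented wrappers that share the same counters,
// mirroring the "shared by all external storage interactions" design.
func (m *Metrics) Reader(r io.Reader) io.Reader { return &countingReader{r: r, m: m} }
func (m *Metrics) Writer(w io.Writer) io.Writer { return &countingWriter{w: w, m: m} }

// copyExample pushes src through both wrappers into an in-memory buffer and
// returns the resulting counter values.
func copyExample(src string) (read, written int64) {
	var m Metrics
	var dst bytes.Buffer
	_, _ = io.Copy(m.Writer(&dst), m.Reader(strings.NewReader(src)))
	return m.ReadBytes.Load(), m.WriteBytes.Load()
}
```

Because the counters are shared rather than per provider, a single pair of values moves for every endpoint. That matches the trade-off described above: a quick aggregate signal, without a per-provider breakdown.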

Fixes: #89242

Release note: None

@adityamaru adityamaru requested review from a team as code owners October 7, 2022 18:52
@adityamaru adityamaru requested a review from a team October 7, 2022 18:52
@adityamaru adityamaru requested review from renatolabs and benbardin and removed request for a team October 7, 2022 18:52
@adityamaru adityamaru requested review from stevendanna and removed request for a team and renatolabs October 7, 2022 18:52
@adityamaru (Contributor Author) commented:

A 2TB import; I paused it in the middle to fiddle with some cluster settings for funsies.

[Image: Screen Shot 2022-10-07 at 2 55 06 PM]

And then a backup of that data:

[Image: Screen Shot 2022-10-07 at 2 55 18 PM]

@adityamaru adityamaru force-pushed the metric-cloud branch 3 times, most recently from 3197ce2 to 120206b Compare October 8, 2022 13:11
@dt (Member) left a comment:

Are we concerned that the retry count is potentially not going to reflect retries in the implementation of some clients and/or at a higher layer either? A misleading metric might be worse than no metric, e.g., if someone looks at it, sees that it is flat, and dismisses retries even though they're happening inside foo-sdk or some other library.

@adityamaru (Contributor Author) commented:

> Are we concerned that the retry count is potentially not going to reflect retries in the implementation of some clients and/or at a higher layer either?

Yeah, that's a good point, and one that crossed my mind when I was plumbing counter increments into all the retry loops. We could try to be diligent about having every client read and write path that we retry in pkg/cloud/<provider> increment this counter, but maybe that's too optimistic. This doesn't address internal SDK retries or higher-level job retries either, but maybe tweaking the metric description to make this clear solves that?

Alternatively, we could break down the retries into cloud.resumingreader.error.retries and cloud.custom.error.retries to better define what they track, the latter being any custom error handling we inject into the SDKs.
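
The concern in this thread can be made concrete with a small sketch in Go. Everything here is hypothetical: `retryMetrics`, `withRetries`, and `sdkCall` are illustrative names, not code from pkg/cloud or any real SDK. The sketch assumes a retry loop that bumps a counter once per retry, wrapped around a stand-in client that also retries internally.

```go
package main

import (
	"errors"
	"sync/atomic"
)

// retryMetrics is a hypothetical breakdown along the lines proposed above.
// The field names only mirror the suggested metric names.
type retryMetrics struct {
	ResumingReaderRetries atomic.Int64 // would back cloud.resumingreader.error.retries
	CustomRetries         atomic.Int64 // would back cloud.custom.error.retries
}

// withRetries runs op up to maxAttempts times. It increments counter once for
// every attempt after the first, so the counter only sees retries made by
// this loop.
func withRetries(counter *atomic.Int64, maxAttempts int, op func() error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if attempt > 0 {
			counter.Add(1)
		}
		if err = op(); err == nil {
			return nil
		}
	}
	return err
}

// sdkCall stands in for a client library that retries internally. It makes up
// to three attempts; each attempt fails while *failures is positive. These
// internal attempts are invisible to withRetries.
func sdkCall(failures *int) error {
	for i := 0; i < 3; i++ {
		if *failures == 0 {
			return nil
		}
		*failures--
	}
	return errors.New("sdk: internal retries exhausted")
}
```

With two injected failures, `sdkCall` absorbs both internally and the outer counter stays at zero. With four, three are absorbed internally and the counter records a single retry. A flat or low counter in this setup does not mean no retries happened, which is the misleading-metric risk raised above.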

@adityamaru adityamaru changed the title cloud: add read, write and error retry metrics cloud: add read and write metrics Oct 24, 2022
@adityamaru adityamaru force-pushed the metric-cloud branch 3 times, most recently from 199cb4f to 1b8933b Compare October 25, 2022 13:35
@adityamaru adityamaru requested a review from a team as a code owner October 25, 2022 13:35
@adityamaru (Contributor Author) commented:

TFTR!

bors r=dt

craig bot (Contributor) commented Oct 26, 2022

Merge conflict.

@adityamaru (Contributor Author) commented:

bors r=dt,stevendanna

@adityamaru (Contributor Author) commented:

bors retry

@adityamaru (Contributor Author) commented:

bors r-

craig bot (Contributor) commented Nov 4, 2022

Canceled.

@adityamaru (Contributor Author) commented:

bors r+

@craig craig bot merged commit c57b340 into cockroachdb:master Nov 4, 2022
craig bot (Contributor) commented Nov 4, 2022

Build succeeded:

blathers-crl bot commented Nov 4, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 17a5159 to blathers/backport-release-22.2-89584: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Successfully merging this pull request may close these issues.

cloud: add a metrics recorder that tracks per provider metrics