Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: add pebble ingestion/flush metrics #81039

Merged
merged 1 commit into from
May 11, 2022

Conversation

tbg
Copy link
Member

@tbg tbg commented May 5, 2022

This adds:

storage.l0-bytes-flushed
storage.lX-num-files
storage.lX-bytes-ingested

which are all useful to understand inverted LSMs.

Release note: None

@tbg tbg requested review from sumeerbhola and a team May 5, 2022 09:11
@tbg tbg requested a review from a team as a code owner May 5, 2022 09:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)


pkg/kv/kvserver/metrics.go line 465 at r1 (raw file):

	"Number of SSTables in Level %d",
	"SSTables",
	metric.Unit_COUNT,

How will we use the number of files in the other levels?
The L0 files are exported because they are used in admission control, and now also as a compaction signal for L0.
That is not the case for the other levels.


pkg/kv/kvserver/metrics.go line 1871 at r1 (raw file):

		AverageQueriesPerSecond: metric.NewGaugeFloat64(metaAverageQueriesPerSecond),
		AverageWritesPerSecond:  metric.NewGaugeFloat64(metaAverageWritesPerSecond),
		// TODO(tbg): this histogram seems bogus? What are we tracking here?

I think it is part of the allocator decision making
@kvoli would know.


pkg/kv/kvserver/metrics.go line 1906 at r1 (raw file):

		RdbPendingCompaction:        metric.NewGauge(metaRdbPendingCompaction),
		RdbMarkedForCompactionFiles: metric.NewGauge(metaRdbMarkedForCompactionFiles),
		RdbL0BytesFlushed:           metric.NewGauge(metaRdbL0BytesFlushed),

Flushed and ingested are cumulative values, i.e., not gauges. I guess we are having to do this because of the deficiency of our metrics system.
We should at least have a code comment. A bunch of the existing metrics, *Hits, *Misses, *Count are also cumulative.

This adds:

```
storage.l0-bytes-flushed
storage.lX-num-files
storage.lX-bytes-ingested
```

which are all useful to understand inverted LSMs

Release note: None
@tbg tbg force-pushed the pebble-level-metrics branch from b28dc0c to 8dfd075 Compare May 11, 2022 01:37
@tbg tbg requested a review from sumeerbhola May 11, 2022 01:38
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @sumeerbhola from 3 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif and @sumeerbhola)


pkg/kv/kvserver/metrics.go line 465 at r1 (raw file):

Previously, sumeerbhola wrote…

How will we use the number of files in the other levels?
The L0 files are exported because they are used in admission control, and now also as a compaction signal for L0.
That is not the case for the other levels.

I removed all but L0.


pkg/kv/kvserver/metrics.go line 1906 at r1 (raw file):

Previously, sumeerbhola wrote…

Flushed and ingested are cumulative values, i.e., not gauges. I guess we are having to do this because of the deficiency of our metrics system.
We should at least have a code comment. A bunch of the existing metrics, *Hits, *Misses, *Count are also cumulative.

I added a comment.

@tbg
Copy link
Member Author

tbg commented May 11, 2022

bors r=sumeerbhola

TFTR!

@craig
Copy link
Contributor

craig bot commented May 11, 2022

Build failed:

@tbg
Copy link
Member Author

tbg commented May 11, 2022

bors r=sumeerbhola

TestLeaseholdersRejectClockUpdateWithJump
    test_log_scope.go:79: test logs captured to: /go/src/github.com/cockroachdb/cockroach/artifacts/logTestLeaseholdersRejectClockUpdateWithJump1654522692
    test_log_scope.go:80: use -show-logs to present logs inline
    client_replica_test.go:182: remote wall time is too far ahead (500.007519ms) to be trustworthy

#79641

@craig
Copy link
Contributor

craig bot commented May 11, 2022

Build succeeded:

@craig craig bot merged commit bc1ee7c into cockroachdb:master May 11, 2022
@tbg tbg deleted the pebble-level-metrics branch May 12, 2022 11:56
@irfansharif
Copy link
Contributor

blathers backport 22.1

@blathers-crl
Copy link

blathers-crl bot commented Nov 29, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 8dfd075 to blathers/backport-release-22.1-81039: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif
Copy link
Contributor

irfansharif commented Nov 29, 2022

@tbg, these metrics would've come in handy in this internal escalation. They were running 22.1, these metrics are only present in 22.2+. Could we backport?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants