
storage: debug page for RocksDB tickers/histograms #36196

Merged
merged 1 commit into cockroachdb:master from the log-rocks-stats branch on Apr 2, 2019

Conversation

ajkr
Contributor

@ajkr ajkr commented Mar 27, 2019

Set up a debug page at "/_status/enginestats/<node_id>" that contains
values of all RocksDB ticker stats and histograms. It is also included
in debug zip.

A portion of the debug page (including both tickers and histograms)
looks like this:

![tickershistograms](https://user-images.githubusercontent.com/4780362/55138005-aa88ab00-50ef-11e9-83d6-cc335cc10e6e.png)

The debug zip file contains a JSON file named "debug/nodes/<node_id>/enginestats" with all the tickers/histograms, and looks like this:

```
{
  "stats": [
    {
      "storeId": 1,
      "tickersAndHistograms": {
        "tickers": {
          "rocksdb.blobdb.blob.file.bytes.read": "0",
...
          "rocksdb.write.wal": "296"
        },
        "histograms": {
          "rocksdb.blobdb.blob.file.read.micros": {
            "mean": 0,
            "p50": 0,
            "p95": 0,
            "p99": 0,
            "max": 0,
            "count": "0",
            "sum": "0"
          },
...
          "rocksdb.write.raw.block.micros": {
            "mean": 1.8235294117647058,
            "p50": 0.6071428571428571,
            "p95": 10.749999999999993,
            "p99": 12,
            "max": 12,
            "count": "17",
            "sum": "31"
          }
        }
      }
    }
  ]
}
```
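
For reference, pulling this JSON straight from a running node looks roughly like the sketch below; the port, node ID, and insecure HTTP setup are assumptions for illustration, not part of this change:

```
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumes a local single-node cluster started with --insecure, so the
	// status endpoints are served over plain HTTP on the default port 8080,
	// and that the node of interest has node ID 1.
	resp, err := http.Get("http://localhost:8080/_status/enginestats/1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // JSON in the shape shown above
}
```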

Release Note: None

@ajkr ajkr requested a review from a team March 27, 2019 04:04
@cockroach-teamcity
Member

This change is Reviewable

@tbg
Member

tbg commented Mar 27, 2019

Hmm, 170 lines is hefty. You mention that we're collecting them already anyways (see snippets below), so what do we win by printing them in the logs? Just easier debuggability in situations in which we can't look at the time series? It seems hard to make good use of them since they're all counters and so you end up manually diffing them. For perf work I doubt we're ever going to want anything more than the time series, which also give you 10s granularity, while here we have 60s only.
I'm sorry I'm only bringing all this up now. I somehow assumed (for no good reason) that the stats dump would somehow be more compact and less metrics-y.

```
func (sm *StoreMetrics) updateRocksDBStats(stats engine.Stats) {
	// We do not grab a lock here, because it's not possible to get a point-in-
	// time snapshot of RocksDB stats. Retrieving RocksDB stats doesn't grab any
	// locks, and there's no way to retrieve multiple stats in a single operation.
	sm.RdbBlockCacheHits.Update(stats.BlockCacheHits)
	sm.RdbBlockCacheMisses.Update(stats.BlockCacheMisses)
	sm.RdbBlockCacheUsage.Update(stats.BlockCacheUsage)
	sm.RdbBlockCachePinnedUsage.Update(stats.BlockCachePinnedUsage)
	sm.RdbBloomFilterPrefixUseful.Update(stats.BloomFilterPrefixUseful)
	sm.RdbBloomFilterPrefixChecked.Update(stats.BloomFilterPrefixChecked)
	sm.RdbMemtableTotalSize.Update(stats.MemtableTotalSize)
	sm.RdbFlushes.Update(stats.Flushes)
	sm.RdbCompactions.Update(stats.Compactions)
	sm.RdbTableReadersMemEstimate.Update(stats.TableReadersMemEstimate)
}
```

What are examples of situations in which this dump would be useful? Can we make the timeseries useful instead? For example, which of the stats should we add to the snippet above? And how should we expose the timeseries? For example, we have cockroach debug dumpts, which dumps the time series contained in a cluster. That command is too bare-bones right now; for example, here's the output:

https://gist.github.com/tbg/006a2cfde07ddbfc9c96f10e5b3d4f33

(I think this means that the snippet is cr.store.totalbytes from store 1 and it contains three datapoints 10s apart over a 30s interval)

but it's already more pleasant than trying to piece together multiple stats dumps from the logs. Do you think that can work instead? We need to include that info in the debug zip (not all of the ts, that's too much, but some recent interval).

@ajkr
Contributor Author

ajkr commented Mar 27, 2019

You mention that we're collecting them already anyways (see snippets below), so what do we win by printing them in the logs? Just easier debuggability in situations in which we can't look at the time series?

Sorry "we're collecting" was the wrong word choice. What I meant is RocksDB is collecting them internally. Besides the handful of stats in the snippet you showed, the stats RocksDB is collecting don't appear to be exposed in Cockroach in any form.

It seems hard to make good use of them since they're all counters and so you end up manually diffing them. For perf work I doubt we're ever going to want anything more than the time series, which also give you 10s granularity, while here we have 60s only.

I don't know. I find non-granular stats useful sometimes. For example you can compare rocksdb.bloom.filter.prefix.checked and rocksdb.bloom.filter.prefix.useful to answer whether your prefix bloom filters are effective. Or compare rocksdb.number.iter.skip and rocksdb.number.db.next.found to answer whether iterators are hitting lots of tombstones. Or compare rocksdb.block.cache.miss and rocksdb.block.cache.hit to answer whether the memory we're allocating to block cache is worthwhile.
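
As a rough illustration of that kind of comparison, here is a sketch that assumes the tickers have already been parsed into a map; the helper and the sample numbers are made up, while the ticker names are the real RocksDB ones:

```
package main

import "fmt"

// summarizeTickers derives the ratios discussed above from a map of RocksDB
// ticker values (e.g. parsed out of the enginestats JSON). Illustrative only.
func summarizeTickers(t map[string]uint64) {
	ratio := func(num, den string) float64 {
		if t[den] == 0 {
			return 0
		}
		return float64(t[num]) / float64(t[den])
	}
	// Fraction of prefix bloom filter checks that actually saved work.
	fmt.Printf("bloom prefix useful rate: %.2f\n",
		ratio("rocksdb.bloom.filter.prefix.useful", "rocksdb.bloom.filter.prefix.checked"))
	// Internal keys skipped (e.g. tombstones) per Next() that found a key.
	fmt.Printf("skips per found next: %.2f\n",
		ratio("rocksdb.number.iter.skip", "rocksdb.number.db.next.found"))
	// Block cache hit rate.
	hits, misses := t["rocksdb.block.cache.hit"], t["rocksdb.block.cache.miss"]
	if hits+misses > 0 {
		fmt.Printf("block cache hit rate: %.2f\n", float64(hits)/float64(hits+misses))
	}
}

func main() {
	summarizeTickers(map[string]uint64{
		"rocksdb.bloom.filter.prefix.checked": 1000,
		"rocksdb.bloom.filter.prefix.useful":  900,
		"rocksdb.number.iter.skip":            5000,
		"rocksdb.number.db.next.found":        1000,
		"rocksdb.block.cache.hit":             8000,
		"rocksdb.block.cache.miss":            2000,
	})
}
```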

What are examples of some situations in which this dump would be useful? Can we make the timeseries useful instead? For example, which of the stats should we add to the snippet above?

It is useful for cases where we didn't predict a stat would be useful, so it isn't in the timeseries. Or maybe we upgrade RocksDB and don't notice a new stat. If we can think of a way to include everything in timeseries, that'd be preferable to me.

How should we expose the timeseries? For example, we have cockroach debug dumpts that dumps the time series contained in a cluster.

This is fine for me. I think the only question I have is whether you agree with preemptively exposing all stats as timeseries, or prefer to cherry-pick the ones we think are relevant?

@tbg
Member

tbg commented Mar 27, 2019

For example you can compare rocksdb.bloom.filter.prefix.checked and rocksdb.bloom.filter.prefix.useful to answer whether your prefix bloom filters are effective. Or compare rocksdb.number.iter.skip and rocksdb.number.db.next.found to answer whether iterators are hitting lots of tombstones. Or compare rocksdb.block.cache.miss and rocksdb.block.cache.hit to answer whether the memory we're allocating to block cache is worthwhile.

That's a good point.

I talked to @bdarnell and we agreed that we can't add them wholesale to either timeseries or the logs; it's just too much (we have <600 time series in total right now, so we don't want to add 170 at once).
My preference would thus be cherry-picking the timeseries you think are most relevant. We can expose the whole list via a debug page on top of that to give access to everything when desired. It won't be available historically, but we can at least include it in a debug zip. How many of the stats do you think are "relevant" enough for timeseries?

Regarding missing a newly added metric: if you worry about that, the thing to do, I think, is a unit test that extracts all the stat names from your dump string and fails whenever the list changes. I'll let you decide whether that is worth doing; I suspect not.
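
Something like this sketch, where tickerNames() and goldenTickerNames are made-up placeholders for however the names get extracted and pinned:

```
// TestRocksDBTickerNamesUnchanged fails whenever the set of ticker names in
// the stats dump changes, so a RocksDB upgrade that adds or removes tickers
// gets noticed. tickerNames is a hypothetical helper returning the names from
// the dump; goldenTickerNames is the checked-in expected list.
func TestRocksDBTickerNamesUnchanged(t *testing.T) {
	names := tickerNames()
	sort.Strings(names)
	if !reflect.DeepEqual(names, goldenTickerNames) {
		t.Errorf("RocksDB ticker list changed; update goldenTickerNames:\n%v", names)
	}
}
```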

@ajkr
Contributor Author

ajkr commented Mar 27, 2019

We can expose the whole list via a debug page on top of that to give access to everything when desired. It won't be available historically, but we can at least include it in a debug zip.

This is actually the main thing I'm interested in. Adding a few stats to timeseries is nice too, but that only works if we know in advance what we're looking for. For example, when I found that the data synthesized by our workload tool is incompressible, it was because I noticed the stat "rocksdb.bytes.compressed" was always zero. It's not a very good example, but maybe we'd notice interesting engine-related issues in production too if we could examine stats that we didn't decide months in advance that we'd need.

@ajkr ajkr force-pushed the log-rocks-stats branch from 6c89fc5 to d266d6f on March 28, 2019 07:18
@ajkr ajkr requested a review from a team March 28, 2019 07:18
@ajkr ajkr requested a review from a team as a code owner March 28, 2019 07:18
@ajkr ajkr changed the title from "storage: periodically dump RocksDB stats" to "[wip] storage: debug page for RocksDB tickers/histograms" on Mar 28, 2019
@ajkr
Contributor Author

ajkr commented Mar 28, 2019

The *.pb.{h,cc,go} and some other auto-generated-looking files show huge changes on GitHub but do not show any changes on my local machine. I am not sure what causes this.

Member

@tbg tbg left a comment

Looks good, thanks for banging this out so quickly! Only a few minor comments, plus you should write a test that calls the stats endpoint. I think an analogue of

```
// TestStatusGossipJson ensures that the output response for the full gossip
// info contains the required fields.
func TestStatusGossipJson(t *testing.T) {
	defer leaktest.AfterTest(t)()
	s, _, _ := serverutils.StartServer(t, base.TestServerArgs{})
	defer s.Stopper().Stop(context.TODO())
	var data gossip.InfoStatus
	if err := getStatusJSONProto(s, "gossip/local", &data); err != nil {
		t.Fatal(err)
	}
	if _, ok := data.Infos["first-range"]; !ok {
		t.Errorf("no first-range info returned: %v", data)
	}
	if _, ok := data.Infos["cluster-id"]; !ok {
		t.Errorf("no clusterID info returned: %v", data)
	}
	if _, ok := data.Infos["node:1"]; !ok {
		t.Errorf("no node 1 info returned: %v", data)
	}
	if _, ok := data.Infos["system-db"]; !ok {
		t.Errorf("no system config info returned: %v", data)
	}
}
```
is enough, so it shouldn't be much work.

For what's exposed as time series, do you know of any that should be added? It is helpful to have historical information for some key metrics and I'm curious what you think about the ones we have so far.

Reviewed 19 of 19 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajkr)


c-deps/libroach/engine.cc, line 224 at r1 (raw file):

// `GetTickersAndHistograms` retrieves maps of all RocksDB tickers and histograms.
// It differs from `GetStats` by getting _every_ ticker and histogram, and by not
// getting anything else (DB properties, for example).

Mention that the caller needs to free stats.{tickers,histograms}.


pkg/server/serverpb/status.proto, line 231 at r1 (raw file):

}

message EngineStatsResponse {

This should also contain the StoreID, otherwise with multiple stores you won't know which stats belong to which store.


pkg/storage/engine/rocksdb.go, line 1195 at r1 (raw file):

	}

	tickers := (*[1 << 20]C.TickerInfo)(

For consistency, use the same constant value instead of 1 << 20, as is done here:

src := (*[maxArrayLen]byte)(ptr)[:len:len]

(I know we don't ever expect to see 1 << 20 < tickers_len, but still.)
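
For reference, the shape of that pattern in plain Go, with maxArrayLen standing in for the shared constant (its value here and the backing data are placeholders):

```
package main

import (
	"fmt"
	"unsafe"
)

// maxArrayLen mirrors the shared constant mentioned above; the value here is
// a placeholder, not necessarily what rocksdb.go defines.
const maxArrayLen = 1 << 20

func main() {
	// Stand-in for a C-allocated buffer; in the real code the pointer comes
	// from cgo (e.g. stats.tickers) along with an element count. Using a Go
	// array here is purely for illustration.
	backing := [4]int64{10, 20, 30, 40}
	ptr := unsafe.Pointer(&backing[0])
	n := len(backing)

	// The pattern under discussion: view the raw pointer as a huge fixed-size
	// array and slice it down to the real length, reusing the named constant
	// instead of repeating the 1 << 20 literal.
	view := (*[maxArrayLen]int64)(ptr)[:n:n]
	fmt.Println(view) // [10 20 30 40]
}
```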


pkg/storage/engine/rocksdb.go, line 1206 at r1 (raw file):

	res.Histograms = make(map[string]*enginepb.HistogramData)
	histograms := (*[1 << 20]C.HistogramInfo)(

ditto


pkg/storage/engine/rocksdb.go, line 1219 at r1 (raw file):

			Sum: uint64(histogram.sum),
		}
		res.Histograms[name] = value

My preference would be to have Histograms be a map[string]HistogramData and not a map[string]*HistogramData. You should be able to make this happen in the .proto file by using the option as here:

map<int64, config.ZoneConfig> zone_configs = 8 [(gogoproto.nullable) = false];

Member

@tbg tbg left a comment

PS: I don't know about the diffs. You've definitely done the right thing (i.e. the files needed to change) and they're also supposed to be checked in. Locally, git suppresses these diffs (it treats them as binary files; we've set things up that way), so maybe you just didn't notice them.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajkr)

@ajkr
Contributor Author

ajkr commented Mar 28, 2019

Hm the tests use engine.InMem, while this is only supported for engine.RocksDB.

@tbg
Member

tbg commented Mar 28, 2019

I think you just need to feed it a config that has a StoreSpec with a real dir in it:

StoreSpecs []StoreSpec

You can get a dir with testutils.TempDir()

@tbg
Member

tbg commented Mar 28, 2019

Here's an example

```
dir, cleanupFn := testutils.TempDir(t)
defer cleanupFn()
s, err := serverutils.StartServerRaw(base.TestServerArgs{
	StoreSpecs: []base.StoreSpec{{
		Path: dir,
	}},
})
```
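
Putting that together with the TestStatusGossipJson pattern above, the new test could look roughly like this; the "enginestats/local" path and the Stats field name are assumptions on my end:

```
func TestStatusEngineStatsJson(t *testing.T) {
	defer leaktest.AfterTest(t)()

	dir, cleanupFn := testutils.TempDir(t)
	defer cleanupFn()

	// An on-disk store is required because engine stats are only supported
	// for engine.RocksDB, not engine.InMem.
	s, err := serverutils.StartServerRaw(base.TestServerArgs{
		StoreSpecs: []base.StoreSpec{{
			Path: dir,
		}},
	})
	if err != nil {
		t.Fatal(err)
	}
	defer s.Stopper().Stop(context.TODO())

	// EngineStatsResponse.Stats is assumed to be the per-store slice shown in
	// the JSON above ("stats").
	var data serverpb.EngineStatsResponse
	if err := getStatusJSONProto(s, "enginestats/local", &data); err != nil {
		t.Fatal(err)
	}
	if len(data.Stats) == 0 {
		t.Errorf("expected engine stats for at least one store, got %+v", data)
	}
}
```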

@ajkr
Contributor Author

ajkr commented Mar 28, 2019

That was fast, thanks!

@ajkr ajkr changed the title from "[wip] storage: debug page for RocksDB tickers/histograms" to "storage: debug page for RocksDB tickers/histograms" on Mar 28, 2019
@ajkr ajkr force-pushed the log-rocks-stats branch 2 times, most recently from 8016ae1 to fd7fdf1 on March 28, 2019 23:25
@ajkr
Contributor Author

ajkr commented Mar 28, 2019

Thanks a lot for the fast and detailed review. Addressed all your comments, I believe.

For what's exposed as time series, do you know of any that should be added? It is helpful to have historical information for some key metrics and I'm curious what you think about the ones we have so far.

OK, I went through them and picked a few that might be worth adding. Will do that in a separate PR, somewhat later.

  • "rocksdb.number.superversion_acquires", "rocksdb.number.superversion_releases", and "rocksdb.number.superversion_cleanups" - can be useful for debugging high memory/space usage. Superversions pin memtables and SSTs, so holding them too long can cause problems.
  • "rocksdb.num.iterator.created" and "rocksdb.num.iterator.deleted" - after identifying that old superversions are a problem, these can be useful to check whether those superversions are pinned by old iterators.
  • "rocksdb.number.iter.skip", "rocksdb.number.db.seek.found", "rocksdb.number.db.next.found", and "rocksdb.number.db.prev.found" - can be useful for debugging slow reads. Skips will be relatively high when there are lots of tombstones.
  • "rocksdb.no.file.opens" and "rocksdb.no.file.closes" - can be useful for debugging slow reads. Opening files in the read path is slower than getting them from table cache.
  • "rocksdb.stall.micros"

@ajkr ajkr force-pushed the log-rocks-stats branch 2 times, most recently from 446f105 to d266d6f on March 29, 2019 20:07
@ajkr ajkr force-pushed the log-rocks-stats branch from d266d6f to 166015e on March 29, 2019 20:15
Member

@tbg tbg left a comment

:lgtm: and thanks!

Reviewed 14 of 14 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajkr)

@ajkr
Contributor Author

ajkr commented Apr 2, 2019

bors r+

@craig
Contributor

craig bot commented Apr 2, 2019

Build failed

@tbg
Member

tbg commented Apr 2, 2019

CI failed on #36369

bors r+

craig bot pushed a commit that referenced this pull request Apr 2, 2019
36196: storage: debug page for RocksDB tickers/histograms r=tbg a=ajkr


Co-authored-by: Andrew Kryczka <[email protected]>
@craig
Contributor

craig bot commented Apr 2, 2019

Build succeeded

@craig craig bot merged commit 166015e into cockroachdb:master Apr 2, 2019
@ajkr
Contributor Author

ajkr commented Apr 3, 2019

@tbg What do you think about backporting this to 19.1? Having the RocksDB stats in debug zip might be useful.

@tbg
Member

tbg commented Apr 3, 2019

Absolutely, please backport this.
