Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestTenantRateLimiter failed #101901

Closed
cockroach-teamcity opened this issue Apr 20, 2023 · 7 comments · Fixed by #101949
Closed

kv/kvserver: TestTenantRateLimiter failed #101901

cockroach-teamcity opened this issue Apr 20, 2023 · 7 comments · Fixed by #101949
Assignees
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-robot Originated from a bot. T-kv KV Team
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Apr 20, 2023

kv/kvserver.TestTenantRateLimiter failed with artifacts on release-23.1 @ ad6ce866ea3b5c5bb47ba9a0ac19b721a0c98add:

      github.com/cockroachdb/cockroach/pkg/server/node_http_router.go:63 +0x163
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:312 +0x3c4
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler-fm()
      <autogenerated>:1 +0x57
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_http.go:106 +0x6a5
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.serverHandler.ServeHTTP()
      GOROOT/src/net/http/server.go:2947 +0x641
  net/http.(*conn).serve()
      GOROOT/src/net/http/server.go:1991 +0xbe4
  net/http.(*Server).Serve.func3()
      GOROOT/src/net/http/server.go:3102 +0x58

Goroutine 1264535 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x619
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:332 +0x1ae
  github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics()
      github.com/cockroachdb/cockroach/pkg/server/node.go:808 +0x26
  github.com/cockroachdb/cockroach/pkg/server.(*Node).start()
      github.com/cockroachdb/cockroach/pkg/server/node.go:580 +0xd16
  github.com/cockroachdb/cockroach/pkg/server.(*Server).PreStart()
      github.com/cockroachdb/cockroach/pkg/server/server.go:1699 +0x33ba
  github.com/cockroachdb/cockroach/pkg/server.(*TestServer).Start()
      github.com/cockroachdb/cockroach/pkg/server/testserver.go:614 +0x8f
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:330 +0x1b6
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:325 +0x96
  github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestTenantRateLimiter()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_tenant_test.go:168 +0x6a9
  testing.tRunner()
      GOROOT/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      GOROOT/src/testing/testing.go:1493 +0x47

Goroutine 1269778 (running) created at:
  net/http.(*Server).Serve()
      GOROOT/src/net/http/server.go:3102 +0x837
  github.com/cockroachdb/cockroach/pkg/server.startHTTPService.func4()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:280 +0x4f
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x1f6
==================

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-27180

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Apr 20, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Apr 20, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Apr 20, 2023
@erikgrinaker
Copy link
Contributor

erikgrinaker commented Apr 20, 2023

Looks like a legit race. Conservatively marking as GA blocker.

WARNING: DATA RACE
Write at 0x00c00e278758 by goroutine 1264535:
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Update()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:497 +0xe4
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetricsPeriodically()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3203 +0x3e7
  github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically.func1()
      github.com/cockroachdb/cockroach/pkg/server/node.go:832 +0xfa
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x5d
  github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range()
      github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:352 +0x330
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x75
  github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically()
      github.com/cockroachdb/cockroach/pkg/server/node.go:831 +0xac
  github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics.func1()
      github.com/cockroachdb/cockroach/pkg/server/node.go:816 +0x23b
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x1f6

Previous read at 0x00c00e278758 by goroutine 1269778:
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:166 +0x141
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:105 +0x16d
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus()
      github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:317 +0x2a7
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus-fm()
      <autogenerated>:1 +0x44
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeAndPrintAsText()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:134 +0xb3
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).PrintAsText()
      github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:329 +0xb2
  github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars()
      github.com/cockroachdb/cockroach/pkg/server/status.go:2024 +0x219
  github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars-fm()
      <autogenerated>:1 +0x84
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.(*ServeMux).ServeHTTP()
      GOROOT/src/net/http/server.go:2487 +0xc5
  net/http.(*ServeMux).ServeHTTP-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  github.com/NYTimes/gziphandler.NewGzipLevelAndMinSize.func1.1()
      github.com/NYTimes/gziphandler/external/com_github_nytimes_gziphandler/gzip.go:253 +0x352
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.Handler.ServeHTTP-fm()
      <autogenerated>:1 +0x75
  github.com/cockroachdb/cockroach/pkg/server.(*nodeProxy).nodeProxyHandler()
      github.com/cockroachdb/cockroach/pkg/server/node_http_router.go:63 +0x163
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:312 +0x3c4
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler-fm()
      <autogenerated>:1 +0x57
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_http.go:106 +0x6a5
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.serverHandler.ServeHTTP()
      GOROOT/src/net/http/server.go:2947 +0x641
  net/http.(*conn).serve()
      GOROOT/src/net/http/server.go:1991 +0xbe4
  net/http.(*Server).Serve.func3()
      GOROOT/src/net/http/server.go:3102 +0x58

@andrewbaptist
Copy link
Collaborator

This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric.

@kvoli
Copy link
Collaborator

kvoli commented Apr 20, 2023

This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric.

You're right, we aren't locking when accessing the mwh.cum field in ToPrometheusMetric. This was added in 4a2d06a but because we so rarely write to manual histograms, it was fairly unlikely we'd encounter a race.

I'll post a patch to resolve and backport.

@andrewbaptist
Copy link
Collaborator

It makes sense to also lock it in Inspect. Although I'm not sure if that is ever called, but if it was, it could hit the same race while iterating over the data.

@kvoli
Copy link
Collaborator

kvoli commented Apr 20, 2023

I don't think we are able to lock in Inspect without using a re-entrant lock - see the stack trace:

  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d

kvoli added a commit to kvoli/cockroach that referenced this issue Apr 20, 2023
It was possible for concurrent `Update` and `ToPrometheusMetric` calls
to race due to a read lock not being held when calling
`ToPrometheusMetric`.

This patch adds such a lock and updates the structure to conform to the
regular convention of a `mu` struct field, which indicates fields
requiring a lock.

Fixes: cockroachdb#101901

Release note: None
craig bot pushed a commit that referenced this issue Apr 21, 2023
101949: metric: fix race in manual histogram r=andrewbaptist a=kvoli

It was possible for concurrent `Update` and `ToPrometheusMetric` calls to race due to a read lock not being held when calling `ToPrometheusMetric`.

This patch adds such a lock and updates the structure to conform to the regular convention of a `mu` struct field, which indicates fields requiring a lock.

```
dev test pkg/kv/kvserver -f TestTenantRateLimiter -v --stress --race
...
288 runs so far, 0 failures, over 14m15s
```

Fixes: #101901

Release note: None

Co-authored-by: Austen McClernon <[email protected]>
@craig craig bot closed this as completed in 899a79a Apr 21, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 21, 2023
It was possible for concurrent `Update` and `ToPrometheusMetric` calls
to race due to a read lock not being held when calling
`ToPrometheusMetric`.

This patch adds such a lock and updates the structure to conform to the
regular convention of a `mu` struct field, which indicates fields
requiring a lock.

Fixes: #101901

Release note: None
@kvoli kvoli reopened this Apr 21, 2023
@kvoli
Copy link
Collaborator

kvoli commented Apr 21, 2023

Resolved when backports merged.

kvoli added a commit to kvoli/cockroach that referenced this issue Apr 21, 2023
Manual reconstruction of cockroachdb#101949.

It was possible for concurrent `Update` and `ToPrometheusMetric` calls
to race due to a read lock not being held when calling
`ToPrometheusMetric`.

This patch adds such a lock.

Fixes: cockroachdb#101901

Release note: None
@kvoli
Copy link
Collaborator

kvoli commented Apr 21, 2023

Closed on #102009

@kvoli kvoli closed this as completed Apr 21, 2023
@erikgrinaker erikgrinaker added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-test-failure Broken test (automatically or manually discovered). GA-blocker O-robot Originated from a bot. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants