Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: fix remaining data races #10341

Closed
wants to merge 12 commits into from
Closed

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Jun 2, 2021

Related to #8329
Fixes #9458 and includes a hacky fix for #9457
Enables a few more packages to run with the race detector.

There are still a few data races in serf, and a bunch in the agent package, but I'd like to see if the command and agent/local package will run reliably now with the race detector enabled.

TODO:

  • the "quick hack" in the second commit to fix the serf tag race seems to be causing a few tests to fail. needs a better solution

dnephin added 4 commits June 2, 2021 18:30
And remove unnecessary functions. This is done in preparation for
properly locking when updating tags.
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.

This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
@github-actions github-actions bot added theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics type/ci Relating to continuous integration (CI) tooling for testing or releases labels Jun 2, 2021
@dnephin dnephin added the pr/no-changelog PR does not need a corresponding .changelog entry label Jun 2, 2021
@dnephin dnephin changed the title WIP: fix remaining data rances WIP: fix remaining data races Jun 2, 2021
The LogOutput io.Writer used by TestAgent must allow concurrent reads and writes, and a
bytes.Buffer does not allow this. The bytes.Buffer must be wrapped with a lock to make this safe.
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 8, 2021 23:11 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 8, 2021 23:11 Inactive
The dnsConfig pulled from the atomic.Value is a pointer, so modifying it in place
creates a data race. Use the exported ReloadConfig interface instead.
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 8, 2021 23:26 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 8, 2021 23:26 Inactive
dnephin added 2 commits June 9, 2021 12:14
The test was modifying a pointer to a struct that had been passed to
another goroutine. Instead create a new struct to modify.

```
WARNING: DATA RACE
Write at 0x00c01407c3c0 by goroutine 832:
  github.com/hashicorp/consul/agent.TestServiceManager_PersistService_API()
      /home/daniel/pers/code/consul/agent/service_manager_test.go:446 +0x1d86
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Previous read at 0x00c01407c3c0 by goroutine 938:
  reflect.typedmemmove()
      /usr/lib/go/src/runtime/mbarrier.go:177 +0x0
  reflect.Value.Set()
      /usr/lib/go/src/reflect/value.go:1569 +0x13b
  github.com/mitchellh/copystructure.(*walker).Primitive()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/copystructure.go:289 +0x190
  github.com/mitchellh/reflectwalk.walkPrimitive()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:252 +0x31b
  github.com/mitchellh/reflectwalk.walk()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:179 +0x24d
  github.com/mitchellh/reflectwalk.walkStruct()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:386 +0x4ec
  github.com/mitchellh/reflectwalk.walk()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:188 +0x656
  github.com/mitchellh/reflectwalk.walkStruct()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:386 +0x4ec
  github.com/mitchellh/reflectwalk.walk()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:188 +0x656
  github.com/mitchellh/reflectwalk.Walk()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/reflectwalk.go:92 +0x164
  github.com/mitchellh/copystructure.Config.Copy()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/copystructure.go:69 +0xe7
  github.com/mitchellh/copystructure.Copy()
      /home/daniel/go/pkg/mod/github.com/mitchellh/[email protected]/copystructure.go:13 +0x84
  github.com/hashicorp/consul/agent.mergeServiceConfig()
      /home/daniel/pers/code/consul/agent/service_manager.go:362 +0x56
  github.com/hashicorp/consul/agent.(*serviceConfigWatch).handleUpdate()
      /home/daniel/pers/code/consul/agent/service_manager.go:279 +0x250
  github.com/hashicorp/consul/agent.(*serviceConfigWatch).runWatch()
      /home/daniel/pers/code/consul/agent/service_manager.go:246 +0x2d4

Goroutine 832 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1238 +0x5d7
  testing.runTests.func1()
      /usr/lib/go/src/testing/testing.go:1511 +0xa6
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
  testing.runTests()
      /usr/lib/go/src/testing/testing.go:1509 +0x612
  testing.(*M).Run()
      /usr/lib/go/src/testing/testing.go:1417 +0x3b3
  main.main()
      _testmain.go:1181 +0x236

Goroutine 938 (running) created at:
  github.com/hashicorp/consul/agent.(*serviceConfigWatch).start()
      /home/daniel/pers/code/consul/agent/service_manager.go:223 +0x4e4
  github.com/hashicorp/consul/agent.(*ServiceManager).AddService()
      /home/daniel/pers/code/consul/agent/service_manager.go:98 +0x344
  github.com/hashicorp/consul/agent.(*Agent).addServiceLocked()
      /home/daniel/pers/code/consul/agent/agent.go:1942 +0x2e4
  github.com/hashicorp/consul/agent.(*Agent).AddService()
      /home/daniel/pers/code/consul/agent/agent.go:1929 +0x337
  github.com/hashicorp/consul/agent.TestServiceManager_PersistService_API()
      /home/daniel/pers/code/consul/agent/service_manager_test.go:400 +0x17c4
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

```
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 9, 2021 16:18 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 9, 2021 16:18 Inactive
dnephin added 2 commits June 9, 2021 17:05
So that the tests in command can use the consul binary
To pick up data race fixes
@vercel vercel bot temporarily deployed to Preview – consul June 9, 2021 22:14 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 9, 2021 22:14 Inactive
dnephin added 2 commits June 10, 2021 11:49
By setting the hash when we create the policy.

```
WARNING: DATA RACE
Read at 0x00c0028b4b10 by goroutine 1182:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:701 +0x40d
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Previous write at 0x00c0028b4b10 by goroutine 1509:
  github.com/hashicorp/consul/agent/structs.(*ACLPolicy).SetHash()
      /home/daniel/pers/code/consul/agent/structs/acl.go:730 +0x3a8
  github.com/hashicorp/consul/agent/structs.ACLPolicies.resolveWithCache()
      /home/daniel/pers/code/consul/agent/structs/acl.go:779 +0xfe
  github.com/hashicorp/consul/agent/structs.ACLPolicies.Compile()
      /home/daniel/pers/code/consul/agent/structs/acl.go:809 +0xf1
  github.com/hashicorp/consul/agent/consul.(*ACLResolver).ResolveTokenToIdentityAndAuthorizer()
      /home/daniel/pers/code/consul/agent/consul/acl.go:1226 +0x6ef
  github.com/hashicorp/consul/agent/consul.resolveTokenAsync()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:66 +0x5c

Goroutine 1182 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1669 +0x459
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 1509 (running) created at:
  github.com/hashicorp/consul/agent/consul.TestACLResolver_Client.func4()
      /home/daniel/pers/code/consul/agent/consul/acl_test.go:1668 +0x415
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
This test is super racy, and we already have test coverage for this
functionality in the agent/cache package with TestCacheThrottle.

Instead of spending time rewriting this test, let's remove it.

```
WARNING: DATA RACE
Read at 0x00c01de410fc by goroutine 735:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:1024 +0x9af
  github.com/hashicorp/consul/testrpc.WaitForTestAgent()
      /home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Previous write at 0x00c01de410fc by goroutine 605:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1.2()
      /home/daniel/pers/code/consul/agent/agent_test.go:998 +0xe9

Goroutine 735 (running) created at:
  testing.(*T).Run()
      /usr/lib/go/src/testing/testing.go:1238 +0x5d7
  github.com/hashicorp/consul/agent.TestCacheRateLimit()
      /home/daniel/pers/code/consul/agent/agent_test.go:961 +0x375
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202

Goroutine 605 (finished) created at:
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:1022 +0x91e
  github.com/hashicorp/consul/testrpc.WaitForTestAgent()
      /home/daniel/pers/code/consul/testrpc/wait.go:99 +0x209
  github.com/hashicorp/consul/agent.TestCacheRateLimit.func1()
      /home/daniel/pers/code/consul/agent/agent_test.go:966 +0x1ad
  testing.tRunner()
      /usr/lib/go/src/testing/testing.go:1193 +0x202
```
@dnephin dnephin force-pushed the dnephin/fix-serf-tag-data-race branch from 9699aac to 40876bc Compare June 10, 2021 16:40
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging June 10, 2021 16:40 Inactive
@vercel vercel bot temporarily deployed to Preview – consul June 10, 2021 16:40 Inactive
This was referenced Jun 11, 2021
@dnephin
Copy link
Contributor Author

dnephin commented Jan 26, 2022

This should be easier to do now that we've removed legacy ACLs, but the commits will need to be rebased on main.

@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@jmurret
Copy link
Member

jmurret commented May 9, 2023

Closing as this is over two years old.

@jmurret jmurret closed this May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-changelog PR does not need a corresponding .changelog entry theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics type/ci Relating to continuous integration (CI) tooling for testing or releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

data race: TestLeader_Vault_PrimaryCA_IntermediateRenew modifies a global without a lock
3 participants