Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connect/proxy: fix a few problems with tests #10146

Merged
merged 2 commits into from
Apr 28, 2021

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Apr 28, 2021

We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector (example). While debugging this issue I found and fixed the
following problems.

  1. the net.Listener was not being closed properly when Listener.Stop was
    called. This caused the Listener.Serve goroutine to run forever.
    Fixed by storing a reference to net.Listener and closing it properly
    when Listener.Stop is called.
  2. call connWG.Add in the correct place. WaitGroup.Add must be called
    before starting a goroutine, not from inside the goroutine.
  3. Set metrics config EnableRuntimeMetrics to false so that we don't
    start a background goroutine in each test for no reason. There is no
    way to shutdown this goroutine, and it was an added distraction while
    debugging these timeouts.
  4. two tests were calling require.NoError from a goroutine.
    require.NoError calls t.FailNow, which MUST be called from the main
    test goroutine. Instead use t.Errorf, which can be called from other
    goroutines and will still fail the test.
  5. assertCurrentGaugeValue was breaking out of a for loop, which
    would cause the RWMutex.RUnlock to be missed. Fixed by calling
    unlock before break.

The core issue of a deadlock was fixed by hashicorp/go-metrics#124. If that merges soon I will include the vendor update in this PR.

These tests run in under 10ms, t.Parallel provides no value and makes debuging failures
more difficult.
@dnephin dnephin added the theme/testing Testing, and related enhancements label Apr 28, 2021
@dnephin dnephin requested a review from a team April 28, 2021 20:56
@github-actions github-actions bot added theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies and removed theme/testing Testing, and related enhancements labels Apr 28, 2021
@dnephin dnephin added the pr/no-changelog PR does not need a corresponding .changelog entry label Apr 28, 2021
connect/proxy/listener.go Outdated Show resolved Hide resolved
We noticed that TestUpstreamListener would deadlock sometimes when run
with the race detector. While debugging this issue I found and fixed the
following problems.

1. the net.Listener was not being closed properly when Listener.Stop was
   called. This caused the Listener.Serve goroutine to run forever.
   Fixed by storing a reference to net.Listener and closing it properly
   when Listener.Stop is called.
2. call connWG.Add in the correct place. WaitGroup.Add must be called
   before starting a goroutine, not from inside the goroutine.
3. Set metrics config EnableRuntimeMetrics to `false` so that we don't
   start a background goroutine in each test for no reason. There is no
   way to shutdown this goroutine, and it was an added distraction while
   debugging these timeouts.
5. two tests were calling require.NoError from a goroutine.
   require.NoError calls t.FailNow, which MUST be called from the main
   test goroutine. Instead use t.Errorf, which can be called from other
   goroutines and will still fail the test.
6. `assertCurrentGaugeValue` wass breaking out of a for loop, which
   would cause the `RWMutex.RUnlock` to be missed. Fixed by calling
   unlock before `break`.

The core issue of a deadlock  was fixed by hashicorp/go-metrics#124.
@dnephin dnephin force-pushed the dnephin/connect-proxy-test-deadlock branch from d8b7c75 to d18a03b Compare April 28, 2021 21:22
@vercel vercel bot temporarily deployed to Preview – consul April 28, 2021 21:22 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging April 28, 2021 21:22 Inactive
Copy link
Member

@rboyer rboyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnephin dnephin merged commit e19aef4 into master Apr 28, 2021
@dnephin dnephin deleted the dnephin/connect-proxy-test-deadlock branch April 28, 2021 21:53
@hc-github-team-consul-core
Copy link
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/358670.

mikemorris pushed a commit that referenced this pull request May 5, 2021
…eadlock

connect/proxy: fix a few problems with tests
mikemorris pushed a commit that referenced this pull request May 5, 2021
…eadlock

connect/proxy: fix a few problems with tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr/no-changelog PR does not need a corresponding .changelog entry theme/connect Anything related to Consul Connect, Service Mesh, Side Car Proxies
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants