Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(test): add missing unlock in TestPersistLFDiscardStats #1924

Merged
merged 3 commits into from
May 4, 2023

Conversation

joshua-goldstein
Copy link
Contributor

@joshua-goldstein joshua-goldstein commented Apr 12, 2023

Description

TestPersistLFDiscardStats is currently flaky. E.g., 1, 2, 3 (but this goes back further). To reproduce the problem:

go test -run=TestPersistLFDiscardStats -count=80 -timeout=1500s

From the stack traces in the logs, this looks like a deadlock. The test currently sleeps for 2 seconds to allow for compaction to complete. Assuming compaction does not complete in this time, and we make the call to db.vlog.discardStats.Lock(), seems to be enough to put us into this deadlock. This is supported experimentally; running the above test command with this change no longer produces the deadlock.

There is another sleep in the test to give discardStats a chance to be populated by the newly generated discardStats. We have not seen this issue in CI, but when running this test at very high frequencies (i.e. with -count=160), we start to see this failure as well. This is mitigated by increasing the sleep.

@joshua-goldstein joshua-goldstein marked this pull request as ready for review April 12, 2023 05:40
@@ -531,14 +531,15 @@ func TestPersistLFDiscardStats(t *testing.T) {
persistedMap[fid] = val
})

db.vlog.discardStats.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it odd that we were acquiring the lock before and not calling unlock. Could you elaborate more on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also found that very odd. I have no insight into why it was like this. Probably an oversight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like db variable is re-assigned and hence calling Lock again without Unlock is not a problem. But this is better.

@mangalaman93
Copy link
Contributor

@joshua-goldstein could you explain why would there be a deadlock if we don't put sleep altogether?

@joshua-goldstein joshua-goldstein changed the title fix(test): increase sleep in TestPersistLFDiscardStats fix(test): add missing unlock in TestPersistLFDiscardStats May 4, 2023
@mangalaman93 mangalaman93 merged commit 9afd0a0 into main May 4, 2023
@mangalaman93 mangalaman93 deleted the joshua/badger-deadlock branch May 4, 2023 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants