Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use consul lock properly #8310

Merged
merged 1 commit into from
Jul 8, 2021
Merged

Use consul lock properly #8310

merged 1 commit into from
Jul 8, 2021

Conversation

5antelope
Copy link
Member

Signed-off-by: crowu [email protected]

Description

Vitess assumes the process gets the lock if the error is nil https://github.com/vitessio/vitess/blob/main/go/vt/topo/locks.go

However, consul will return nil error when the context timeout’d before the process get the lock: https://github.com/hashicorp/consul/blob/master/api/lock.go#L185 - which means anyone will get the lock after the context timeout.

This means after the context timeout, everyone that is waiting for the lock will get one

This PR make sure we will raise error if the lost chan returned is nil, we should not assume we have the lock

Related Issue(s)

#8309

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: crowu <[email protected]>
@5antelope
Copy link
Member Author

Gentle reminder, thanks!

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good.
Is there a way to simulate the failure condition in a unit test?

@5antelope
Copy link
Member Author

@deepthi it is a little bit to simulate the failure in unit test.

iiuc the test we have for topo lock are here: https://github.com/vitessio/vitess/blob/main/go/vt/topo/test/lock.go#L136

we are using same process that tries to grab the lock, the check here will consume all the time in the context: https://github.com/vitessio/vitess/blob/main/go/vt/topo/consultopo/lock.go#L72 before we jump into the line that causes the real issue here https://github.com/vitessio/vitess/blob/main/go/vt/topo/consultopo/lock.go#L88

any suggestions to set this up properly?

@deepthi
Copy link
Member

deepthi commented Jul 8, 2021

@deepthi it is a little bit to simulate the failure in unit test.

iiuc the test we have for topo lock are here: https://github.com/vitessio/vitess/blob/main/go/vt/topo/test/lock.go#L136

we are using same process that tries to grab the lock, the check here will consume all the time in the context: https://github.com/vitessio/vitess/blob/main/go/vt/topo/consultopo/lock.go#L72 before we jump into the line that causes the real issue here https://github.com/vitessio/vitess/blob/main/go/vt/topo/consultopo/lock.go#L88

any suggestions to set this up properly?

Yeah I see the problem. I think it's ok to merge this as-is.

@deepthi deepthi merged commit 9fa6ff4 into vitessio:main Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants