
Race condition in CreateVolume #165

Closed
bertinatto opened this issue Jan 2, 2019 · 20 comments · Fixed by #982
Labels
good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug.
Milestone

Comments

@bertinatto
Member

/kind bug

What happened?

When the CO calls CreateVolume (through the sidecar container), it passes in a volume name. If a second call with the same name is issued, we should return the volume created by the first request, since CreateVolume is required to be idempotent.

Since there's no way to specify the volume name through the EC2 API, we currently store the volume name in a tag. However, this workaround has problems of its own.

Imagine this situation:

  • CreateVolume is called, so a volume named volume1 should be created.
  • EC2's CreateVolumeWithContext is called, and it takes about 15 seconds to actually create the volume.
  • CreateVolume is called again while the first call is still in progress.
  • The driver checks whether a volume with this name already exists, finds none, and creates a second volume.
  • Now we have 2 volumes tagged with the same name.

During attach, for instance, the driver notices that more than one volume has the same name and returns an error. However, this is just a workaround; a proper solution should ideally be implemented on the EC2 side.

This could be achieved by adding a Name field to ec2.CreateVolumeInput.
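The interleaving described above can be replayed deterministically in a minimal sketch (toy stand-ins for the EC2 lookup and creation calls, not the driver's actual code):

```go
package main

import "fmt"

func main() {
	// volumesByName is a toy stand-in for EC2 state queried by the name tag.
	volumesByName := map[string][]string{}

	exists := func(name string) bool { return len(volumesByName[name]) > 0 }
	create := func(name, id string) { volumesByName[name] = append(volumesByName[name], id) }

	// Both calls check for an existing volume before either creation lands:
	found1 := exists("volume1") // call 1: not found
	found2 := exists("volume1") // call 2: still not found
	if !found1 {
		create("volume1", "vol-aaa") // call 1 creates
	}
	if !found2 {
		create("volume1", "vol-bbb") // call 2 also creates
	}
	fmt.Println(len(volumesByName["volume1"])) // 2: two volumes share one name
}
```

Because the existence check and the creation are not atomic, any interleaving where both checks run before either create completes produces the duplicate.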

How to reproduce it (as minimally and precisely as possible)?

The easiest way is to:

  1. Run the CSI driver
  2. Use the csc tool to create 2 volumes:
$ csc controller new --endpoint tcp://127.0.0.1:10000 (..) && \
  csc controller new --endpoint tcp://127.0.0.1:10000 (...)
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 2, 2019
@dkoshkin
Contributor

dkoshkin commented Jan 4, 2019

The new InFlight struct from #163 is intended to make all long-running operations idempotent; we just wanted to open the PR with a single operation first to get agreement.
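The idea behind such a guard can be sketched as follows (a minimal sketch under assumed semantics; the actual InFlight struct in #163 may differ):

```go
package main

import (
	"fmt"
	"sync"
)

// InFlight tracks keys (e.g. volume names) that have an operation in
// progress. Insert returns false when the key is already present, so a
// concurrent duplicate request can fail fast instead of racing the first.
type InFlight struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func NewInFlight() *InFlight {
	return &InFlight{keys: make(map[string]struct{})}
}

// Insert records key as in flight; it returns false if key already is.
func (f *InFlight) Insert(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.keys[key]; ok {
		return false
	}
	f.keys[key] = struct{}{}
	return true
}

// Delete marks the operation for key as finished.
func (f *InFlight) Delete(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.keys, key)
}

func main() {
	f := NewInFlight()
	fmt.Println(f.Insert("volume1")) // true: first operation may proceed
	fmt.Println(f.Insert("volume1")) // false: duplicate while in flight
	f.Delete("volume1")
	fmt.Println(f.Insert("volume1")) // true: free again after completion
}
```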

@leakingtapan
Contributor

And we have an inFlight cache for volume attach idempotency. Could we make these idempotency solutions consistent?

@leakingtapan leakingtapan added this to the GA milestone Feb 13, 2019
@leakingtapan
Contributor

We should address this bug in Q2.

@leakingtapan leakingtapan modified the milestones: 0.4, 0.5 Mar 31, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 29, 2019
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 8, 2019
@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 6, 2019
@frittentheke
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 7, 2019
@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2020
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2020
@leakingtapan leakingtapan modified the milestones: 0.5, 0.6 Jan 8, 2020
@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 7, 2020
@leakingtapan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 13, 2020
@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 12, 2020
@bertinatto
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 13, 2020
@leakingtapan leakingtapan removed this from the 0.6 milestone Aug 9, 2020
@leakingtapan leakingtapan added this to the 0.7 milestone Aug 9, 2020
huffmanca pushed a commit to huffmanca/aws-ebs-csi-driver that referenced this issue Sep 3, 2020 (…-and-size):
Bug 1810470: Verify pending volume modifications and size both
@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2020
@bertinatto
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 16, 2020
@AndyXiangLi
Contributor

@bertinatto What is the expected behavior if the name is the same but other parameters, like VolumeCapabilities, are different?

@chrishenzie
Contributor

chrishenzie commented Feb 18, 2021

This issue is mitigated for now by the inFlight struct. Once an EBS API change that enables idempotency is available, this in-flight checking should be removed from the CreateVolume() call.
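The mitigation pattern reads roughly like this (hypothetical names and toy state; the driver's actual CreateVolume differs):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// inFlight is a minimal guard: one entry per volume name in progress.
type inFlight struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func (f *inFlight) Insert(key string) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.keys[key]; ok {
		return false
	}
	f.keys[key] = struct{}{}
	return true
}

func (f *inFlight) Delete(key string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	delete(f.keys, key)
}

var errInFlight = errors.New("CreateVolume for this name is already in flight")

// driver is a toy model; volumes stands in for EC2 state queried by name tag.
type driver struct {
	guard   inFlight
	volumes map[string]string // name -> volume ID
}

func (d *driver) CreateVolume(name string) (string, error) {
	if !d.guard.Insert(name) {
		return "", errInFlight // a concurrent duplicate fails fast
	}
	defer d.guard.Delete(name)

	if id, ok := d.volumes[name]; ok {
		return id, nil // idempotent: the same name returns the same volume
	}
	id := "vol-" + name // pretend EC2 assigned this ID
	d.volumes[name] = id
	return id, nil
}

func main() {
	d := &driver{
		guard:   inFlight{keys: map[string]struct{}{}},
		volumes: map[string]string{},
	}
	first, _ := d.CreateVolume("volume1")
	second, _ := d.CreateVolume("volume1")
	fmt.Println(first == second) // true: repeated calls return the same volume
}
```

With a native idempotency mechanism on the EC2 side, the guard and the name-tag lookup could both be dropped from this path.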

@ayberk
Contributor

ayberk commented Feb 19, 2021

@AndyXiangLi FYI. Can you follow up internally on the API change?

@fejta-bot
/lifecycle stale
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2021
@bertinatto
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 21, 2021
@vdhanan vdhanan added the good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. label Jun 17, 2021
10 participants