Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

return context timeout error for deallocate #119

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pperiyasamy
Copy link

return context timeout error back to cni/kubelet in case of pod delete, this would allow kubelet to retry for deallocate and avoids stale ip pool entries in the datastore after pod delete.

context timeout errors (due to default 10s request timeout) are frequently seen in highly scaled setup.

Signed-off-by: Periyasamy Palanisamy [email protected]

Signed-off-by: Periyasamy Palanisamy <[email protected]>
@pperiyasamy
Copy link
Author

/cc @JanScheurich

Signed-off-by: Periyasamy Palanisamy <[email protected]>
@s1061123
Copy link
Member

s1061123 commented Jul 7, 2021

In https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md, there is following section and your change violates the CNI specification because your code may return error in DEL action.

Plugins should generally complete a DEL action without error even if some resources are missing.

In addition, some case, upper CNI plugin expects that only one DEL command is invoked (without error), so the code might not be effective to various CNI plugins.

@JanScheurich
Copy link

Not returning an error means whereabouts accumulates stale IP addresses in its IP pools. It would require an off-line audit function to find and delete such stale IPs.
I read the CNI specification a bit differently: The CNI should not return an error at DEL if (part of) the resources to be deleted are missing already. But in our case, the actual deletion fails (typically due to too many collisions on the storage back-end).

@pperiyasamy
Copy link
Author

Plugins should generally complete a DEL action without error even if some resources are missing.

@s1061123 thanks for this cni spec pointer! good info. just found an issue with ovs-cni due to missing net ns directory. it needs fixing there.
but in this case, as jan mentioned, whereabouts needs to propagate context timeout error back to cni/kubelet when ip entry is not removed from datastore. otherwise having error return type for cmdDel doesn't make any sense at all.

@coveralls
Copy link

Pull Request Test Coverage Report for Build 1014889463

  • 14 of 46 (30.43%) changed or added relevant lines in 3 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 36.963%

Changes Missing Coverage Covered Lines Changed/Added Lines %
cmd/whereabouts.go 1 5 20.0%
pkg/storage/storage.go 0 28 0.0%
Files with Coverage Reduction New Missed Lines %
pkg/storage/storage.go 3 0%
Totals Coverage Status
Change from base Build 986532013: -0.1%
Covered Lines: 353
Relevant Lines: 955

💛 - Coveralls

@dougbtv
Copy link
Member

dougbtv commented Jun 23, 2022

Any chance you could rebase and see if this is still an issue, thanks!

nicklesimba pushed a commit to nicklesimba/whereabouts that referenced this pull request Sep 11, 2023
…m-sync-2023-03-22

Upstream sync 2023 03 29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants