Draining a node confuses CSI plugin node health #9810
Comments
Hi @apollo13! In 1.0.0 we shipped some fixes for the plugin counts in the UI and API, but it looks like we missed a case. It's interesting that the count fixes itself when the node plugin comes back... that might be a clue as to where the problem is. Thanks for opening this issue.
Hi @tgross, I began looking through my plugin and noted something interesting when the Nomad UI calls
Why is the toplevel
I just realized that simply stopping a single node allocation has the same effect. @tgross did you ever get around to reproducing this? I can reliably trigger this.
I am experiencing the same problem with the ceph-csi plugin. Stopping and starting an allocation several times causes the problem to appear.
@tgross As promised here is a ping ;) Is there any chance that we can work on fixing that? What do you need from me to move this forward?
Hi @apollo13! I don't think I need anything else from you, just a little time to get ramped back up and dig into it. Thanks for the ping... I'll assign myself so that I don't lose this.
Cross linking to #11758
Leaving a note that this issue appears to be related to but may have subtly different code paths from #11784. Note that I don't think this is actually related to #11758 or #10073 (or at least not by itself). Those issues are primarily about count state on the server, whereas here and in #11784 we have evidence of client-side reconnection issues with the CSI plugins. I'll be looking into that as well as the count state as separate patch sets.
I closely followed your fixes and am not sure yet either (massive kudos for those!). I'll deploy 1.2.5 and see if I can test it manually. In the meantime I will leave this open as a reminder for myself; don't waste time on it unless you have a hunch that this is still unsolved.
Nomad version
Nomad v1.0.1 (c9c68aa)
Operating system and Environment details
Debian stable
Issue
When draining a node, the CSI plugin node health display gets confused.
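For anyone trying to confirm the mismatch outside the UI, here is a minimal sketch that compares the summary counts with the per-node entries in a plugin object. It assumes the JSON returned for a CSI plugin carries `NodesHealthy`, `NodesExpected`, and a `Nodes` map whose entries have a `Healthy` flag; those field names are an assumption on my part and should be checked against the API docs for your Nomad version. The helper name and the sample payload are hypothetical, not taken from a real cluster.

```python
def summarize_plugin_health(plugin: dict) -> dict:
    """Compare the reported node counts with the per-node entries.

    Field names (Nodes, NodesHealthy, NodesExpected, Healthy) are
    assumptions about the plugin JSON, not confirmed by this thread.
    """
    nodes = plugin.get("Nodes") or {}
    # Count the node entries that individually report as healthy.
    healthy_entries = sum(
        1 for info in nodes.values() if info and info.get("Healthy")
    )
    reported_healthy = plugin.get("NodesHealthy", 0)
    expected = plugin.get("NodesExpected", 0)
    return {
        "reported_healthy": reported_healthy,
        "expected": expected,
        "healthy_entries": healthy_entries,
        # True when the summary count matches the per-node entries.
        "consistent": reported_healthy == healthy_entries,
        # True when more nodes are reported healthy than are expected.
        "over_expected": reported_healthy > expected,
    }


# Hypothetical payload: two node entries, one healthy, but a summary of 2.
sample = {
    "ID": "example-plugin",
    "NodesExpected": 2,
    "NodesHealthy": 2,
    "Nodes": {
        "node-a": {"Healthy": True},
        "node-b": {"Healthy": False},
    },
}

result = summarize_plugin_health(sample)
print(result)
```

Running this against the plugin JSON before the drain, during it, and after the node plugin comes back should show at which point the counts diverge.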
Reproduction steps