[demo] Kadalu CSI support for Nomad #11207

Merged
merged 2 commits into from
Oct 6, 2021

@leelavg (Contributor) commented Sep 20, 2021

- Access external gluster cluster from Nomad via Kadalu CSI

Signed-off-by: Leela Venkaiah G <[email protected]>

@hashicorp-cla commented Sep 20, 2021

CLA assistant check
All committers have signed the CLA.

@leelavg (Contributor Author) commented Sep 20, 2021

@leelavg changed the title from "[docs] Kadalu CSI support for Nomad" to "[demo] Kadalu CSI support for Nomad" on Sep 20, 2021

@leelavg (Contributor Author) commented Sep 21, 2021

@tgross, please review if you get time. Thanks.

@jrasell (Member) commented Sep 21, 2021

Hi @leelavg, and thanks for raising this PR. Tim no longer works at HashiCorp; however, a member of the team will review this PR when possible.

@apollo13 (Contributor) commented

@leelavg This looks interesting. How are you handling updates, though? Last time I checked, GlusterFS had a FUSE-based driver, so upgrading the node plugin pods will result in all mounts on that node getting lost.
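The mount loss described here usually shows up on Linux as `ENOTCONN` ("Transport endpoint is not connected") when anything touches a FUSE mount point whose daemon has exited, for example after the node plugin was restarted. The Python sketch below checks for that condition; `is_stale_fuse_mount` is a hypothetical helper for illustration, not part of Kadalu, GlusterFS, or Nomad.

```python
import errno
import os


def is_stale_fuse_mount(path: str) -> bool:
    """Return True if `path` looks like a FUSE mount whose daemon has gone away.

    Hypothetical helper: on Linux, stat() on such a mount point fails with
    ENOTCONN ("Transport endpoint is not connected").
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            return True
        raise  # other errors (e.g. a missing path) are not a stale-mount signal
    return False


if __name__ == "__main__":
    import sys

    # Usage: python check_mounts.py /mnt/vol1 /mnt/vol2
    for mount_point in sys.argv[1:]:
        state = "stale" if is_stale_fuse_mount(mount_point) else "ok"
        print(f"{mount_point}: {state}")
```

Re-raising unrelated errors keeps a typo in the path from being misreported as a broken mount.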

@leelavg (Contributor Author) commented Sep 21, 2021

@apollo13

  1. Yep, that is still the case for all FUSE-based drivers (in k8s too, where I guess there is no consensus yet; see "No easy way how to update CSI driver that uses fuse", kubernetes/kubernetes#70013).
  2. Generally, we need to drain the specific pods (if that's possible at all) that consume PVs from that node's nodeplugin, upgrade the nodeplugin on that node, and then repeat the same steps for the remaining nodeplugin pods.
  3. Basically, it's a rolling update of each node, draining the pods that use volumes from that node before updating it.
  4. I also took a stab at it in Kadalu ("[Obsolete] Remount PVC on nodeplugin reboot", kadalu/kadalu#645) but couldn't quite make it work yet (and even that implementation has shortcomings: pods lose the mount for a brief period of time).
  5. So, should the demo include I/U as well? In that case this PR is going to stay open for a long time 😞
  6. The gist is: if the nodeplugin is FUSE-based and it reboots or is upgraded, pods will definitely lose the mount. There is no way around it yet (or at least I'm not aware of any common solution).
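The rolling update from points 2 and 3 is essentially an ordering problem: handle one node at a time, stop the workloads that mount volumes through that node's plugin, upgrade the plugin, then bring those workloads back before moving on. The Python sketch below only computes that plan; `Step`, `rolling_upgrade_plan`, and the node and workload names are hypothetical, and the actual drain and upgrade actions would be carried out with the orchestrator's own tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    action: str                 # "drain", "upgrade-plugin" or "restore"
    node: str
    workloads: Tuple[str, ...]  # workloads touched by this step (empty for upgrades)


def rolling_upgrade_plan(volume_users: Dict[str, List[str]]) -> List[Step]:
    """Build a node-by-node upgrade plan for a FUSE-based CSI node plugin.

    `volume_users` maps each node name to the workloads on that node that
    consume volumes through its node plugin (hypothetical input shape).
    Nodes are processed one at a time, in sorted order, so at most one
    node's mounts are disrupted at any moment.
    """
    plan: List[Step] = []
    for node in sorted(volume_users):
        users = tuple(sorted(volume_users[node]))
        if users:
            # Stop the consumers first so they are not running when the mount disappears.
            plan.append(Step("drain", node, users))
        plan.append(Step("upgrade-plugin", node, ()))
        if users:
            # Bring the consumers back once the new plugin has remounted the volume.
            plan.append(Step("restore", node, users))
    return plan


if __name__ == "__main__":
    example = {"node-b": ["db"], "node-a": ["web", "cache"], "node-c": []}
    for step in rolling_upgrade_plan(example):
        print(step.action, step.node, ",".join(step.workloads))
```

Going one node at a time trades upgrade speed for a bounded blast radius; as noted in point 2, it still assumes the affected workloads can be drained at all.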

PS: Please don't mind the k8s terminology, muscle memory 😅

@apollo13 (Contributor) commented

Thanks, that makes sense. The main blocker I see here (for me personally; I am not a Nomad team member) is that it is currently (AFAIK) not really possible to upgrade node by node. This is why I export my GlusterFS volume via nfs-ganesha and use NFS to mount it: https://gitlab.com/rocketduck/csi-plugin-nfs. This works nicely since NFS is a kernel driver, so I can just upgrade the system job running the node plugins.

@leelavg (Contributor Author) commented Sep 21, 2021

  • Cool, that's a good workaround. It adds an extra layer of dependency, but I get the need 😅
  • Can you share the config/job files you are using, if they are public?
  • We are also planning an SMB export (with Gluster managed by k8s), and I guess NFS support is also doable, though the FUSE driver offers good latency.
  • I will ask one of the maintainers to comment on NFS functionality in case we extend Nomad support to Gluster deployed and managed internally in containers by Kadalu.
  • Until then, it is what it is: this PR demos one more CSI driver for Nomad.

Thanks for your inputs.

@apollo13 (Contributor) commented

This is the config for Nomad: https://gitlab.com/rocketduck/csi-plugin-nfs/-/tree/main/nomad, and the nfs-ganesha export configuration from the jobfile looks like this:

```
EXPORT
{
	Export_Id = 1;
	Path = "/nomad";
	Pseudo = "/nomad";
	Access_Type = RW;
	Squash = No_Root_Squash;
	SecType = "sys";

	FSAL {
		Name = "GLUSTER";
		Hostname = localhost;
		Volume = "nomad";
		Up_poll_usec = 10;
		Transport = tcp;
	}
}
```
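If several Gluster volumes need to be exported the same way, the EXPORT block above could be generated from a few parameters. The Python sketch below is a hypothetical helper (`render_ganesha_export` is not part of nfs-ganesha or the linked repository); it only reproduces the keys shown in the example, sets `Path` and `Pseudo` to the same value as the example does, and does not validate anything against the nfs-ganesha configuration reference.

```python
def render_ganesha_export(export_id: int, volume: str, path: str,
                          hostname: str = "localhost") -> str:
    """Render an nfs-ganesha EXPORT block that uses the GLUSTER FSAL.

    Mirrors the keys from the example config above; values are illustrative.
    """
    lines = [
        "EXPORT",
        "{",
        f"\tExport_Id = {export_id};",
        f'\tPath = "{path}";',
        f'\tPseudo = "{path}";',  # same value as Path, as in the example
        "\tAccess_Type = RW;",
        "\tSquash = No_Root_Squash;",
        '\tSecType = "sys";',
        "",
        "\tFSAL {",
        '\t\tName = "GLUSTER";',
        f"\t\tHostname = {hostname};",
        f'\t\tVolume = "{volume}";',
        "\t\tUp_poll_usec = 10;",
        "\t\tTransport = tcp;",
        "\t}",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Regenerates the block above for the "nomad" volume.
    print(render_ganesha_export(1, volume="nomad", path="/nomad"), end="")
```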

It is indeed an extra layer, but one that I need :)

@leelavg (Contributor Author) commented Sep 21, 2021

  • I guess setting network_mode to host (as in the NFS repo) in this PR might help the mounts survive reboots (not sure though).
  • Well, that's a good scenario to try out in Kadalu 😁

Edit: Reworded correctly

@apollo13 (Contributor) commented

> ig, setting network_mode to host (from nfs repo) in attached job files also might work during reboots (not sure though).

It is also required so that the mount survives pod updates; otherwise the underlying source IP for the NFS mount is no longer available on the host :D
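The failure mode described here (the NFS mount's source IP no longer being available on the host) can be checked directly. One common way to test whether an address is currently assigned to the host is to try binding a socket to it; on Linux the bind fails (normally with `EADDRNOTAVAIL`) for addresses the host does not own. The Python sketch below illustrates that check; `is_local_address` is a hypothetical helper, not something Nomad or the NFS plugin provides.

```python
import socket


def is_local_address(ip: str) -> bool:
    """Return True if `ip` is currently assigned to this host (IPv4 or IPv6).

    Binding to port 0 on an address the host does not own fails with an
    OSError (normally EADDRNOTAVAIL); any such bind failure is reported as False.
    """
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    import sys

    # Usage: python check_ip.py 10.0.0.5 192.0.2.1
    for addr in sys.argv[1:]:
        print(addr, "local" if is_local_address(addr) else "not local")
```

With host networking, the address used for the mount belongs to the host itself, so a check like this keeps passing after the allocation is replaced.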

@lgfa29 (Contributor) commented Oct 5, 2021

It's been hard for us to test and validate all the different CSI demos we receive, but we love to receive them and we trust you all know how to use these tools best 😄

So we recently changed the README file in that folder to make expectations clearer and lower the barrier to getting demos approved:

> Contributions are welcome but demos are not supported by the core Nomad development team. Please tag demo authors when filing issues about CSI demos.

@leelavg would you be OK with adding yourself as this demo's author? Here's an example. We may tag you in the future to help us solve problems with the demo if someone opens an issue about it.

@apollo13 does the PR look good to you, or are there any blockers left before we merge this?

@lgfa29 self-assigned this on Oct 5, 2021

@leelavg (Contributor Author) commented Oct 6, 2021

@lgfa29 Definitely. I raised this PR to give visibility to CSI projects that support Nomad.

The README specifically says to contact the Kadalu team about any issues.

I will list myself as the author, but I expect users to raise CSI issues against the Kadalu repo, and someone from Kadalu will raise an issue against Nomad if needed.

@apollo13 (Contributor) commented Oct 6, 2021

> @apollo13 does the PR look good to you, or are there any blockers left before we merge this?

I am not using GlusterFS in such a configuration because of the downsides of a FUSE-based driver. That said, at a quick glance it looks solid.

@lgfa29 (Contributor) left a comment

Thank you @leelavg for the contribution and @apollo13 for the feedback!

I made some minor changes to the text formatting and will get this merged. Feel free to open any follow-up PRs if I messed up anything 😄

@lgfa29 merged commit 3eb852f into hashicorp:main on Oct 6, 2021
@leelavg deleted the kadalu branch on October 7, 2021 03:53

@github-actions (bot) commented

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked this as resolved and limited conversation to collaborators on Nov 14, 2022