
Python implementation #19

Open
stevenstetzler opened this issue Dec 19, 2019 · 2 comments

Comments

stevenstetzler (Member) commented Dec 19, 2019

@mjuric I am playing around with a python version of the automator in the branch python-dev. You can find a short readme here: https://github.com/astronomy-commons/genesis-jupyterhub-automator/tree/python-dev/python

This is essentially an entire re-write of the code, and it kind of abandons templating of configuration files. It makes everything class-based and attempts to make it easy to add new Kubernetes resources to the cluster. It also separates the Kubernetes providers from whoever is managing the domain name; I imagine this can be used with any DNS provider that has a command-line or web-based API for creating an A record. Not much actually works yet deployment-wise, but I think the structure is there.

I am wondering if you think we should keep going in this direction or pull back and look at what we can gain from just re-writing parts of what we have in python.

I am also worried that I may be too ambitious with the changes I'm making, writing a lot of code that recreates what other tools might do better.

mjuric (Member) commented Dec 24, 2019

Thanks @stevenstetzler! Just a quick note that I'll be looking at this over the next day or two -- a few deadlines have kept me busy.

stevenstetzler (Member, Author) commented

I've been trying to develop this more, but I'm running into a few issues that make me think the class-based approach I am taking might not be worthwhile.

The way I'm thinking about the automator is as a script that takes a set of actions in a certain order. The steps we follow now are:

  1. Create the Kubernetes cluster
  2. Install Helm / Tiller (a Kubernetes deployment)
  3. Install the Helm charts (dashboard and jupyterhub) with custom values based on command-line arguments to configure/automator.
  4. Update DNS routing to point a custom domain at the jupyterhub external address

It makes sense to me to implement each of these actions as an individual python class that exposes a validate, create, and delete method (see e.g. https://github.com/astronomy-commons/genesis-jupyterhub-automator/blob/python-dev/python/components.py#L6-L57), since those are the three basic actions we end up taking whether we're creating the Kubernetes cluster, the Helm deployment, or even the DNS routing: we check whether what's currently there is valid, try to create it if not, or delete it if requested.
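For concreteness, the interface I have in mind is roughly the following (just a sketch; the names here are illustrative, and the actual definitions live in components.py at the link above):

```python
# Rough sketch of the action interface; names are illustrative, the real
# definitions live in python/components.py on the python-dev branch.
from abc import ABC, abstractmethod


class Action(ABC):
    """One step of the deployment: the cluster, Helm, a chart, DNS, ..."""

    @abstractmethod
    def validate(self) -> bool:
        """Return True if the resource already exists and looks healthy."""

    @abstractmethod
    def create(self) -> None:
        """Create the resource if it does not exist yet."""

    @abstractmethod
    def delete(self) -> None:
        """Tear the resource down when the user requests it."""
```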

Taking this route lets us abstract the code in what seems like a nice way: automator deploy can just be a for-loop over a set of deployment actions, calling validate and create on each of them (see https://github.com/astronomy-commons/genesis-jupyterhub-automator/blob/python-dev/python/automator.py#L266-L287), and automator init just sets up a configuration of these objects so that they create the resources you want. Doing this also makes resource creation very flexible: for a given action, create can contain any Python code you want.
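In other words, the deploy loop could be as simple as this (a hypothetical shape, continuing the Action sketch above):

```python
# Hypothetical shape of `automator deploy`: walk the deployment plan in
# order, creating anything that does not validate yet.
def deploy(actions):
    for action in actions:
        if not action.validate():
            action.create()


def teardown(actions):
    # Delete in reverse order so dependents go before their dependencies.
    for action in reversed(actions):
        action.delete()
```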

The first issue I'm running into is figuring out a way to make this easily extensible. How do you store and encode the "deployment plan" in a way that a user can extend or override at will? How do you add a new Helm chart, for example? I imagine it would look something like automator init --extra-helm my_helm_chart.CustomChart, which would specify a path to a python file my_helm_chart.py containing a class CustomChart that exposes validate, create, and delete methods (subclassing a class in our module, for example). I've tested this out, and you can do all sorts of trickery with dynamic importing, so there's no worry about whether the automator can see/import/call the class. But I am worried this approach will have limits, and it seems kind of cumbersome to create and import a whole new python class just to add a new helm chart to the deployment...
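The dynamic-import part itself is straightforward; something along these lines works (a sketch, and the --extra-helm flag and the spec format are hypothetical):

```python
# Sketch of how the automator could resolve "my_helm_chart.CustomChart"
# into a class at runtime; the flag and spec format are hypothetical.
import importlib.util


def load_action_class(spec: str):
    # "my_helm_chart.CustomChart" -> module file "my_helm_chart.py",
    # class name "CustomChart"
    module_name, _, class_name = spec.rpartition(".")
    file_spec = importlib.util.spec_from_file_location(module_name, f"{module_name}.py")
    module = importlib.util.module_from_spec(file_spec)
    file_spec.loader.exec_module(module)
    return getattr(module, class_name)


# e.g. CustomChart = load_action_class("my_helm_chart.CustomChart")
# would load ./my_helm_chart.py and return its CustomChart class.
```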

The second issue I'm running into is balancing transparency with flexibility. At the end of the day, the automator is a wrapper on top of a few bash commands (calls to kubectl and helm); nothing really needs to be that complex. I believe the user should have a deployment directory filled with bash scripts that they could run themselves instead of using the automator, so that they don't become dependent on the automator to interact with their cluster. This can be done with a class-based approach where each action class's create method is a wrapper around a bash script in the deployment directory, e.g. Helm().create() might just be a wrapper around hub.dirac.institute/k8s/helm/create.sh and JupyterHub().create() might just be a wrapper around hub.dirac.institute/helm/jupyterhub/create.sh.
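A sketch of what that wrapper could look like (the script names and paths below are illustrative, not what's in the repo today):

```python
# Sketch of the "transparent" variant: each method just shells out to a
# script the user could also run by hand. Paths are illustrative.
import subprocess
from pathlib import Path


class ScriptAction:
    """Wrap the validate/create/delete scripts living in one directory."""

    def __init__(self, script_dir):
        self.script_dir = Path(script_dir)

    def validate(self) -> bool:
        result = subprocess.run([str(self.script_dir / "validate.sh")])
        return result.returncode == 0

    def create(self) -> None:
        subprocess.run([str(self.script_dir / "create.sh")], check=True)

    def delete(self) -> None:
        subprocess.run([str(self.script_dir / "delete.sh")], check=True)


# e.g. ScriptAction("hub.dirac.institute/k8s/helm").create() would just run
# hub.dirac.institute/k8s/helm/create.sh.
```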

I think that taking a templating approach, in the spirit of the original code, might be a better route forward. I imagine this would look like automator init --templates=templates/, where templates/ is a folder containing a set of Jinja2 templates that are read, rendered, and written to the deployment directory with the same structure as templates/. All templates would be rendered with data from the arguments (and their defaults) of automator init (like the current configure script). Adding a new Helm chart might be as easy as adding a file templates/helm/my_new_chart.yaml.
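The rendering itself would be a small amount of code with Jinja2, something like the following (a sketch; the function and variable names are placeholders):

```python
# Sketch: render every file under template_dir into deploy_dir, mirroring
# the directory structure, with `config` as the template variables.
from pathlib import Path

from jinja2 import Environment, FileSystemLoader


def render_templates(template_dir: Path, deploy_dir: Path, config: dict) -> None:
    env = Environment(loader=FileSystemLoader(str(template_dir)))
    for template_path in template_dir.rglob("*"):
        if template_path.is_dir():
            continue
        relative = template_path.relative_to(template_dir)
        rendered = env.get_template(relative.as_posix()).render(**config)
        out_path = deploy_dir / relative
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(rendered)


# e.g. render_templates(Path("templates"), Path("hub.dirac.institute"), vars(args))
```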

I realize this is a lot of detail and a bit in the weeds, but I'm kind of spinning my wheels deciding between "make the code as flexible as possible", which is the approach I'm currently attempting, and "make the deployment transparent", which would mean falling back on just templating some YAML and bash scripts with Jinja.

I suppose a good question to ask is: what's the purpose of the python implementation, and what does it bring over the bash version? In my mind, it's to make changes easier -- for example, adding a new provider in my current code is just a matter of swapping the class you use from DOProvider to AWSProvider (or an entirely new class) based on the value of the --provider argument. I think it's also to make templating more flexible: we can't template in a for-loop to generate YAML in the bash version, but that's really easy in python, using either Jinja templates or just manipulating lists and dictionaries.
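For example, building a chart values snippet as lists and dictionaries and then serializing it is trivial in python (the keys below are made up, not the real chart values):

```python
# Illustration of "templating in a for-loop": build the values as data and
# let the YAML library serialize it, instead of splicing strings in bash.
import yaml

users = ["alice", "bob", "carol"]
values = {
    "auth": {
        "whitelist": {"users": users},  # one entry per user, no string splicing
    },
}
print(yaml.safe_dump(values, default_flow_style=False))
```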
