
Python implementation #19

Open
stevenstetzler opened this issue Dec 19, 2019 · 2 comments

Comments

stevenstetzler (Member) commented Dec 19, 2019

@mjuric I am playing around with a python version of the automator in the branch python-dev. You can find a short readme here: https://github.com/astronomy-commons/genesis-jupyterhub-automator/tree/python-dev/python

This is essentially an entire re-write of the code, and it kind of abandons templating of configuration files. It makes everything class-based and attempts to make it easy to add new Kubernetes resources to the cluster. It also separates the Kubernetes providers from whoever is managing the domain name; I imagine this can be used with any DNS provider that has a command-line or web-based API for creating an A record. Not much actually works yet deployment-wise, but I think the structure is there.

I am wondering if you think we should keep going in this direction or pull back and look at what we can gain from just re-writing parts of what we have in python.

I am also worried that I may be too ambitious with the changes I'm making, writing a lot of code that recreates what other tools might do better.

mjuric (Member) commented Dec 24, 2019

Thanks @stevenstetzler! Just a quick note that I'll be looking at this over the next day or two -- a few deadlines have kept me busy.

stevenstetzler (Member, Author) commented

I've been trying to develop this more, but I'm running into a few issues that make me think the class-based approach I am taking might not be worthwhile.

The way I'm thinking about the automator is as a script that takes a set of actions in a certain order. The steps we follow now are:

  1. Create the Kubernetes cluster
  2. Install Helm / Tiller (a Kubernetes deployment)
  3. Install the Helm charts (dashboard and jupyterhub) with custom values based on command-line arguments to configure/automator.
  4. Update DNS routing to point a custom domain at the jupyterhub external address

It makes sense to me to implement each of these actions as an individual python class that exposes a validate, create, and delete method (see e.g. https://github.com/astronomy-commons/genesis-jupyterhub-automator/blob/python-dev/python/components.py#L6-L57), since those are the three basic actions we end up taking whether we're creating the Kubernetes cluster, the Helm deployment, or even the DNS routing: we check whether what's currently there is valid, try to create it if not, or delete it if requested.
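For concreteness, the interface I have in mind is roughly the following (just a sketch; the names here are illustrative, and the actual definitions live in components.py at the link above):

```python
# Rough sketch of the action interface; names are illustrative, the real
# definitions live in python/components.py on the python-dev branch.
from abc import ABC, abstractmethod


class Action(ABC):
    """One step of the deployment: the cluster, Helm, a chart, DNS, ..."""

    @abstractmethod
    def validate(self) -> bool:
        """Return True if the resource already exists and looks healthy."""

    @abstractmethod
    def create(self) -> None:
        """Create the resource if it does not exist yet."""

    @abstractmethod
    def delete(self) -> None:
        """Tear the resource down when the user requests it."""
```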

Taking this route lets us abstract the code in what seems like a nice way: automator deploy can just be a for-loop over a set of deployment actions, calling validate and create on each of them (see https://github.com/astronomy-commons/genesis-jupyterhub-automator/blob/python-dev/python/automator.py#L266-L287), and automator init just sets up a configuration of these objects so that they create the resources you want. Doing this also makes resource creation very flexible: for a given action, create can contain any Python code you want.
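In other words, the deploy loop could be as simple as this (a hypothetical shape, continuing the Action sketch above):

```python
# Hypothetical shape of `automator deploy`: walk the deployment plan in
# order, creating anything that does not validate yet.
def deploy(actions):
    for action in actions:
        if not action.validate():
            action.create()


def teardown(actions):
    # Delete in reverse order so dependents go before their dependencies.
    for action in reversed(actions):
        action.delete()
```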

The first issue I'm running into is figuring out a way to make this easily extensible. How do you store and encode the "deployment plan" in a way that a user can extend or override at will? How do you add a new Helm chart, for example? I imagine it would look something like automator init --extra-helm my_helm_chart.CustomChart, which would specify a path to a python file my_helm_chart.py containing a class CustomChart that exposes validate, create, and delete methods (subclassing a class in our module, for example). I've tested this out, and you can do all sorts of trickery with dynamic importing, so there's no worry about whether the automator can see/import/call the class. But I am worried this approach will have limits, and it seems kind of cumbersome to create and import a whole new python class just to add a new helm chart to the deployment...
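The dynamic-import part itself is straightforward; something along these lines works (a sketch, and the --extra-helm flag and the spec format are hypothetical):

```python
# Sketch of how the automator could resolve "my_helm_chart.CustomChart"
# into a class at runtime; the flag and spec format are hypothetical.
import importlib.util


def load_action_class(spec: str):
    # "my_helm_chart.CustomChart" -> module file "my_helm_chart.py",
    # class name "CustomChart"
    module_name, _, class_name = spec.rpartition(".")
    file_spec = importlib.util.spec_from_file_location(module_name, f"{module_name}.py")
    module = importlib.util.module_from_spec(file_spec)
    file_spec.loader.exec_module(module)
    return getattr(module, class_name)


# e.g. CustomChart = load_action_class("my_helm_chart.CustomChart")
# would load ./my_helm_chart.py and return its CustomChart class.
```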

The second issue I'm running into is balancing transparency with flexibility. At the end of the day, the automator is a wrapper on top of a few bash commands (calls to kubectl and helm); nothing really needs to be that complex. I believe the user should have a deployment directory filled with bash scripts that they could run themselves instead of using the automator, so that they don't become dependent on the automator to interact with their cluster. This can be done with a class-based approach where each action class's create method is a wrapper around a bash script in the deployment directory, e.g. Helm().create() might just be a wrapper around hub.dirac.institute/k8s/helm/create.sh and JupyterHub().create() might just be a wrapper around hub.dirac.institute/helm/jupyterhub/create.sh.
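A sketch of what that wrapper could look like (the script names and paths below are illustrative, not what's in the repo today):

```python
# Sketch of the "transparent" variant: each method just shells out to a
# script the user could also run by hand. Paths are illustrative.
import subprocess
from pathlib import Path


class ScriptAction:
    """Wrap the validate/create/delete scripts living in one directory."""

    def __init__(self, script_dir):
        self.script_dir = Path(script_dir)

    def validate(self) -> bool:
        result = subprocess.run([str(self.script_dir / "validate.sh")])
        return result.returncode == 0

    def create(self) -> None:
        subprocess.run([str(self.script_dir / "create.sh")], check=True)

    def delete(self) -> None:
        subprocess.run([str(self.script_dir / "delete.sh")], check=True)


# e.g. ScriptAction("hub.dirac.institute/k8s/helm").create() would just run
# hub.dirac.institute/k8s/helm/create.sh.
```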

I think that taking a templating approach, in the spirit of the original code, might be a better route forward. I imagine this would look like automator init --templates=templates/, where templates/ is a folder containing a set of Jinja2 templates that are read, rendered, and written to the deployment directory with the same structure as templates/. All templates would be rendered with data from the arguments (and their defaults) of automator init (like the current configure script). Adding a new Helm chart might be as easy as adding a file templates/helm/my_new_chart.yaml.
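The rendering itself would be a small amount of code with Jinja2, something like the following (a sketch; the function and variable names are placeholders):

```python
# Sketch: render every file under template_dir into deploy_dir, mirroring
# the directory structure, with `config` as the template variables.
from pathlib import Path

from jinja2 import Environment, FileSystemLoader


def render_templates(template_dir: Path, deploy_dir: Path, config: dict) -> None:
    env = Environment(loader=FileSystemLoader(str(template_dir)))
    for template_path in template_dir.rglob("*"):
        if template_path.is_dir():
            continue
        relative = template_path.relative_to(template_dir)
        rendered = env.get_template(relative.as_posix()).render(**config)
        out_path = deploy_dir / relative
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(rendered)


# e.g. render_templates(Path("templates"), Path("hub.dirac.institute"), vars(args))
```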

I realize this is a lot of detail and a bit in the weeds, but I'm kind of spinning my wheels deciding between "make the code as flexible as possible", which is the approach I'm currently attempting, and "make the deployment transparent", which would mean falling back on just templating some YAML and bash scripts with Jinja.

I suppose a good question to ask is: what's the purpose of the python implementation, and what does it bring over the bash version? In my mind, it's to make changes easier -- for example, adding a new provider in my current code is just a matter of swapping the class you use from DOProvider to AWSProvider (or an entirely new class) based on the value of the --provider argument. I think it's also to make templating more flexible: we can't template in a for-loop to generate YAML in the bash version, but that's really easy in python, using either Jinja templates or just manipulating lists and dictionaries.
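For example, building a chart values snippet as lists and dictionaries and then serializing it is trivial in python (the keys below are made up, not the real chart values):

```python
# Illustration of "templating in a for-loop": build the values as data and
# let the YAML library serialize it, instead of splicing strings in bash.
import yaml

users = ["alice", "bob", "carol"]
values = {
    "auth": {
        "whitelist": {"users": users},  # one entry per user, no string splicing
    },
}
print(yaml.safe_dump(values, default_flow_style=False))
```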
