
Add a Kubernetes Ingress based Proxy implementation #64

Merged

merged 34 commits into master on Dec 21, 2017

Conversation

yuvipanda
Collaborator

You need an ingress controller set up to be able to use this. The simplest is the Traefik one, which you can install with:

helm install stable/traefik --namespace=kube-system --name=ingress
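A minimal configuration sketch for trying this out; the proxy class path below (kubespawner.proxy.KubeIngressProxy) is an assumption about where this implementation ends up, not something stated in this PR:

# jupyterhub_config.py (sketch; verify the proxy class path against the merged code)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
c.JupyterHub.proxy_class = 'kubespawner.proxy.KubeIngressProxy'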

@yuvipanda yuvipanda changed the title [WIP] Add a Kubernetes Ingress based Proxy implementation Add a Kubernetes Ingress based Proxy implementation Jul 25, 2017
Member

@minrk minrk left a comment

Looks sensible so far! I'm not sure about the hardcoded large thread pool, but 👍

super().__init__(*args, **kwargs)

# other attributes
self.executor = ThreadPoolExecutor(max_workers=24)
Member

Are you sure you want to run this in 24 threads? It's probably appropriate to make this configurable if it should be any value other than one.

When we talk to the docker API, we use one thread, because it can't really handle all those requests being concurrent anyway. The key is that the work is off the main thread; the requests aren't actually concurrent with each other.

Collaborator Author

Yeah, I agree - we should make this configurable. I need to remove all the hard coded numbers and make all of 'em configurable.

The kubernetes API master is highly concurrent, and benchmarks often hit it at hundreds/thousands of req/s without issues. We should be ok with a Threadpool I think!

In general we need to think about the various sizes that limit the number of concurrent actions we could be performing. Threadpool sizes are one of those!

Collaborator Author

So I spent more time thinking about this and playing with it.

I think at least for KubeSpawner, these should be a multiple of the maximum number of spawns we can have pending at a time. That'll allow the spawns to go without bottlenecking on waiting on the threadpool. It also allows admins one single knob to tweak - the maximum concurrent spawns.

Collaborator Author

I've made this 3x concurrent_spawn_limit now
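A tiny sketch of that sizing rule, with an assumed value standing in for the hub's configured limit:

from concurrent.futures import ThreadPoolExecutor

# Sketch: derive the pool size from the spawn limit instead of a hard-coded
# number, so admins have a single knob (c.JupyterHub.concurrent_spawn_limit).
concurrent_spawn_limit = 100  # assumed value of the hub setting
executor = ThreadPoolExecutor(max_workers=3 * concurrent_spawn_limit)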

return (pod.status.pod_ip, self.port)

def start_polling(self):
Member

I think you can already set poll_interval = 0 to disable polling without overriding the methods.
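A sketch of that suggestion as configuration rather than method overrides:

# jupyterhub_config.py: disable the hub's periodic polling of this spawner
c.KubeSpawner.poll_interval = 0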

Collaborator Author

Ah, ok! I'll do that!

@yuvipanda
Collaborator Author

I've rebased this, and I think it needs a lot of careful review and testing. Specifically, we need to have solid answers to the following questions:

  1. Will this cause us to miss events? Is just using resourceVersion good enough?
  2. If we miss events, will we recover appropriately?

We also need to do a lot of documentation work, and decide what the default mode for z2jh should be.

@yuvipanda
Collaborator Author

I've cleaned this up further, so this only adds the kubeproxy implementation. I have removed all the bits about making poll go away and be replaced with pure notifications. I'm not entirely sure if that is a good idea yet, so let's just do this one!

@yuvipanda
Collaborator Author

Note that this probably needs work to support named servers.

Commit messages from this PR:

- Could be used to add a proxy implementation!
- We create an Endpoints, Service & Ingress object for each of the pods. This is necessary, since Ingresses only take Services as backends, and you need a manual Endpoints object to point a Service at our pod. We can't use label selectors on the Service because the JupyterHub abstraction only gives us IPs! (See the sketch after this list.)
- 1. If something has already been deleted when we try to delete it, just let it go!
  2. If something has already been created when we try to create it, try to delete it first & create it again. But only do this once!
  This makes stuff slow, but is better than having an Ingress pointing to a not-yet-created Service; in ingress controllers that might delay how long it takes to pick up the Ingress, causing too-many-redirects errors.
- And change factors to 1.2 instead of 2, and use full jitter.
- I need to set aside, like, a month for writing tests.
- Deleting a Service object deletes the Endpoints associated with it. This causes problems when you have to 'recreate' Service objects by deleting & creating them: you have to go back and recreate the Endpoints too. Patch is also probably much faster for everyone involved.
- Lost in a rebase!
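A hypothetical sketch of the per-server objects described in the second commit message, using the kubernetes Python client and the extensions/v1beta1 Ingress API current at the time; names, namespace, port, and path are illustrative, and this is not the PR's exact code:

from kubernetes import client

name, namespace = 'jupyter-someuser', 'jhub'  # illustrative names
pod_ip, port = '10.0.0.23', 8888              # the IP/port JupyterHub hands us

# Manual Endpoints pointing at the pod IP, since we can't use a label selector
endpoints = client.V1Endpoints(
    metadata=client.V1ObjectMeta(name=name),
    subsets=[client.V1EndpointSubset(
        addresses=[client.V1EndpointAddress(ip=pod_ip)],
        ports=[client.V1EndpointPort(port=port)],
    )],
)

# Service with no selector; it is backed by the Endpoints object above
service = client.V1Service(
    metadata=client.V1ObjectMeta(name=name),
    spec=client.V1ServiceSpec(
        ports=[client.V1ServicePort(port=port, target_port=port)],
    ),
)

# Ingress routing the user's URL prefix to that Service
ingress = client.V1beta1Ingress(
    metadata=client.V1ObjectMeta(name=name),
    spec=client.V1beta1IngressSpec(rules=[client.V1beta1IngressRule(
        http=client.V1beta1HTTPIngressRuleValue(paths=[
            client.V1beta1HTTPIngressPath(
                path='/user/someuser/',
                backend=client.V1beta1IngressBackend(
                    service_name=name, service_port=port,
                ),
            ),
        ]),
    )]),
)

core_api = client.CoreV1Api()
core_api.create_namespaced_endpoints(namespace, endpoints)
core_api.create_namespaced_service(namespace, service)
client.ExtensionsV1beta1Api().create_namespaced_ingress(namespace, ingress)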
@yuvipanda
Collaborator Author

Actually no, the proxy doesn't have to know about named servers at all. Yay!

Collaborator Author

@yuvipanda yuvipanda left a comment

@minrk @willingc I would appreciate a thorough review of this PR. It's doing things with threading & async stuff, so there are most likely bugs!

@willingc willingc self-requested a review October 12, 2017 03:31
Member

@minrk minrk left a comment

Thanks @yuvipanda! I made a few mostly minor comments throughout.

from kubernetes.client.models.v1_persistent_volume_claim_spec import V1PersistentVolumeClaimSpec

from kubernetes.client.models import \
V1Pod, V1PodSpec, V1PodSecurityContext, \
Member

General style point - Python generally does multi-line imports like

from kubernetes.client.models import (
    V1Pod, V1PodSpec,
    V1ObjectMeta,
    V1Local...
)

which eliminates the need for line continuations


list_method_name = Unicode(
None,
allow_none=True,
Member

From the usage below, list_method_name is not allowed to be None, as getattr(obj, None) will always fail. Probably best to leave this without setting None or allow_none.


api_group_name = Unicode(
'CoreV1Api',
allow_none=False,
Member

No need to specify allow_none=False. In general, I don't think we should use allow_none on Unicode traits unless there's a specific, odd case for empty strings meaning something specific.

Collaborator Author

Yeah, I think this is one of the cases where the code I've written (default to None) has differed from the code you've written (default to "", don't allow None). Since your style is more prevalent in the Jupyter community and I can't make a compelling case for mine, I'll switch to yours from now on!

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

# We want 3x concurrent spawn limit as our threadpool. This means that if
Member

Maybe use 1x instead of 3x? It seems okay when hitting capacity to allow a single spawn's requests to serialize. It doesn't seem important that all possible requests are in flight at once.

self.__class__.executor = ThreadPoolExecutor(
max_workers=self.k8s_api_threadpool_workers
)
self.executor = self.__class__.executor
Member

This extra assignment is not necessary. self.executor resolves to the class attribute already.
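A minimal illustration of the attribute-lookup point, unrelated to the PR's actual classes:

class Foo:
    executor = None

Foo.executor = 'shared pool'
f = Foo()
print(f.executor)  # 'shared pool': instance lookup falls back to the class attribute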

parent=self, namespace=self.namespace,
on_failure=on_reflector_failure
)
self.pod_reflector = self.__class__.pod_reflector
Member

same here

Contributor

@willingc willingc left a comment

Thanks @yuvipanda. A few questions and comments.

namespace = Unicode(
config=True,
help="""
Kubernetes namespace to spawn ingresses for single-user servers in.
Contributor

Perhaps: Kubernetes namespace, specified by ingress, to spawn single-user servers in.

Collaborator Author

Each single-user server has an ingress associated with it, and both of these belong to a namespace. This is just for creating ingress objects that correspond to spawned servers. I'm confused by the alternate suggestion and am unsure what it means :)

Contributor

Is this accurate? What's confusing me is that I thought ingresses are essentially rules.

Kubernetes namespace for both single-user servers to be spawned and their ingresses.

Collaborator Author

Everything in kubernetes is an object that represents something. In this case, Ingresses are objects that represent HTTP routing rules.

This only sets the namespace for the ingresses to be created, though. The namespace for single-user servers is specified in KubeSpawner.namespace, rather than KubeProxy.namespace.

Collaborator Author

How about:

"Kubernetes namespace to create Ingress objects in.

This corresponds to KubeSpawner.namespace, which sets the namespace single-user servers
are spawned in. For most use cases, these two should be the same
"

Contributor

Love the suggestion. Thanks for the explanation @yuvipanda.

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

# We want 3x concurrent spawn limit as our threadpool. This means that if
Contributor

I agree with @minrk on 1x. It will be simpler to test initially and can be increased in a future refactor if needed.

grace_period_seconds=0
)

# This seems like cleanest way to parallelize all three of these while
Contributor

Question for my own understanding: does the order of deletion matter at all (i.e. is there any dependence between them)?

Collaborator Author

yup! Added a line here!

Collaborator Author

It does! I've added a comment!
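For context, a hypothetical helper in the spirit of the commit messages above ("if it's already gone, let it go"): only a 404 from the Kubernetes API is swallowed, everything else is re-raised.

from kubernetes.client.rest import ApiException

def delete_if_exists(delete_fn, *args, **kwargs):
    # Call a kubernetes delete_* method, treating 'already deleted' as success
    try:
        delete_fn(*args, **kwargs)
    except ApiException as e:
        if e.status != 404:
            raise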

# FIXME: Validate that this shallow copy *is* thread safe
ingress_copy = dict(self.ingress_reflector.ingresses)
routes = {
i.metadata.annotations['hub.jupyter.org/proxy-routespec']:
Contributor

What is i? Each route? Original before copy? Define in a comment or docstring. Thanks 👍

Collaborator Author

I've renamed it to 'ingress' which I think clarifies what they are (ingress objects). lmk if you think an additional doc line is still warranted!

Collaborator Author

I've renamed it to 'ingress' now, since these are all ingress objects. Let me know if that's good enough!

Contributor

+1 to renaming. no doc line needed.

"""
Local up-to-date copy of a set of kubernetes pods
Local up-to-date copy of a set of kubernetes resources.
Contributor

Is this more accurate:

Base class for a local up-to-date copy of a type of kubernetes resource

Collaborator Author

Yup! updated!

@@ -48,7 +81,7 @@ def __init__(self, *args, **kwargs):
config.load_incluster_config()
except config.ConfigException:
config.load_kube_config()
self.api = client.CoreV1Api()
self.api = getattr(client, self.api_group_name)()

# FIXME: Protect against malicious labels?
Contributor

Do we also need to protect against malicious targets (earlier in this PR)?

Collaborator Author

I don't think so right now, since these can only be specified by users editing jupyterhub_config.py files. I think we should take a pass in a separate PR doing more validation here, but for now I think this is fine.


# We want to have one threadpool executor that is shared across all spawner objects
# This is initialized by the first spawner that is created
executor = None
Contributor

Question on naming: Would it be better to name shared_threadpool_executor?

Either way we should move this import comment into the general class docstring.
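A sketch of the shared-pool pattern the code comment describes, with illustrative names rather than the PR's actual classes:

from concurrent.futures import ThreadPoolExecutor

class SpawnerLike:
    # One threadpool shared across all instances, created lazily by the
    # first instance so every spawner reuses the same worker threads.
    executor = None

    def __init__(self, workers=4):
        if SpawnerLike.executor is None:
            SpawnerLike.executor = ThreadPoolExecutor(max_workers=workers)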

Collaborator Author

@@ -727,7 +747,8 @@ def is_pod_running(self, pod):
pod must be a dictionary representing a Pod kubernetes API object.
"""
# FIXME: Validate if this is really the best way
is_running = pod.status.phase == 'Running' and \
is_running = pod is not None and \
Contributor

This should be converted to the Pythonic parentheses ( ) instead of the \ line continuation.
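That is, something along these lines (only the part of the condition visible in the diff is shown; the real expression checks more pod fields):

is_running = (
    pod is not None
    and pod.status.phase == 'Running'
)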

def make_request(url, **kwargs):
"""
Make & return a HTTPRequest object suited to making requests to a Kubernetes cluster
# If not, we pick the first limit - hash_length chars from slug & hash the rest.
Contributor

Put the approach we are using in the docstring if the slug is > 63 chars.
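A hypothetical sketch of the truncate-and-hash approach the code comment describes (function name and parameters are illustrative):

import hashlib

def safe_slug(slug, limit=63, hash_length=6):
    # Keep the first (limit - hash_length - 1) chars and append a short hash,
    # so long names stay unique under the 63-character limit mentioned above.
    if len(slug) <= limit:
        return slug
    digest = hashlib.sha256(slug.encode('utf-8')).hexdigest()[:hash_length]
    return slug[:limit - hash_length - 1] + '-' + digest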

Collaborator Author

Done.

Collaborator Author

Done

Collaborator Author

@yuvipanda yuvipanda left a comment

Thanks for your detailed comments, @minrk and @willingc. I've updated the PR now, I'll go through and see if I've missed any comments!

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

# We want 3x concurrent spawn limit as our threadpool. This means that if
Collaborator Author

Yeah, I agree! Upon further thought, this will also never use more than 1x, since we serialize these creates anyway. I'll update!


Contributor

@willingc willingc left a comment

Great job with the changes @yuvipanda. I have one additional question re: namespace, but this is good to merge from my perspective. Thanks ☀️

@yuvipanda
Collaborator Author

I definitely want to run a 5k user stress test before merging this though.

@minrk
Member

minrk commented Nov 24, 2017

@yuvipanda feel free to merge if/when you get that stress test done.

@yuvipanda yuvipanda merged commit 5c9bf38 into master Dec 21, 2017
@yuvipanda
Collaborator Author

I'm going to merge this now to prevent more merge conflicts, and do a perf test on it afterwards.

@yuvipanda yuvipanda deleted the kubeproxy branch December 21, 2017 23:49
yuvipanda added a commit to yuvipanda/zero-to-jupyterhub-k8s that referenced this pull request Dec 21, 2017
yuvipanda added a commit to yuvipanda/zero-to-jupyterhub-k8s that referenced this pull request Dec 23, 2017
@gsemet
Contributor

gsemet commented Mar 5, 2018

Hello. Is there any documentation on how to use Traefik as the ingress for JupyterHub? We use Traefik heavily on our Kubernetes cluster, and it rocks! So if we could somehow "remove" the "proxy" pod and let Traefik do the routing for started servers, it would be very nice!

@willingc
Contributor

willingc commented Mar 5, 2018

Hi @gsemet, We think Traefik rocks too :-)

I'm not aware of any documentation that currently exists for using Traefik. @yuvipanda @minrk are you aware of anyone using Traefik in production?

@gsemet
Contributor

gsemet commented Mar 5, 2018

Just to be clear: we already have Traefik in front of our JupyterHub. It's just that there is still this proxy pod, which appears to be a single point of failure.
