Add a Kubernetes Ingress based Proxy implementation #64
Conversation
Looks sensible so far! I'm not sure about the hardcoded large thread pool, but 👍
kubespawner/proxy.py
Outdated
    super().__init__(*args, **kwargs)

    # other attributes
    self.executor = ThreadPoolExecutor(max_workers=24)
Are you sure you want to run this in 24 threads? It's probably appropriate to make this configurable if it should be any value other than one.
When we talk to the docker API, we use one thread, because it can't actually handle all those requests being concurrent anyway, so the key is that it's off the main thread, but the requests aren't really concurrent with each other.
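For illustration, a minimal sketch of that pattern (the class, client, and method names here are hypothetical, not the actual DockerSpawner code): blocking client calls run on a single-worker executor, so they leave the main thread but stay serialized with respect to each other.

from concurrent.futures import ThreadPoolExecutor
from tornado.concurrent import run_on_executor

class SingleThreadedSpawner:
    # One worker: calls run off the main thread, but never concurrently
    # with each other.
    executor = ThreadPoolExecutor(max_workers=1)

    def __init__(self, client):
        self.client = client  # hypothetical blocking API client

    @run_on_executor
    def inspect(self, object_id):
        # Blocking call, executed on the executor thread; callers get a Future.
        return self.client.inspect(object_id)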
Yeah, I agree - we should make this configurable. I need to remove all the hard-coded numbers and make all of 'em configurable.
The Kubernetes API master is highly concurrent, and benchmarks often hit it at hundreds or thousands of requests per second without issues. We should be OK with a thread pool, I think!
In general we need to think about the various sizes that limit the number of concurrent actions we could be performing. Thread pool sizes are one of those!
So I spent more time thinking about this and playing with it.
I think, at least for KubeSpawner, these should be a multiple of the maximum number of spawns we can have pending at a time. That lets spawns proceed without bottlenecking on the thread pool, and it gives admins a single knob to tweak: the maximum number of concurrent spawns.
I've made this 3x concurrent_spawn_limit now
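For reference, an equivalent explicit override in jupyterhub_config.py might look like this (the numbers are purely illustrative; k8s_api_threadpool_workers is the KubeSpawner trait discussed further down, and concurrent_spawn_limit is standard JupyterHub configuration):

c.JupyterHub.concurrent_spawn_limit = 20
c.KubeSpawner.k8s_api_threadpool_workers = 3 * 20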
kubespawner/spawner.py
Outdated
    return (pod.status.pod_ip, self.port)

def start_polling(self):
I think you can already set poll_interval = 0 to disable polling without overriding the methods.
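For example, in jupyterhub_config.py (poll_interval is an existing Spawner trait; values of zero or below disable the periodic poll):

c.KubeSpawner.poll_interval = 0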
Ah, ok! I'll do that!
I've rebased this, and I think this needs a lot of careful review and testing. Specifically, we need to have solid answers for a few open questions.
We also need to do a lot of documenting, and decide what's the default mode for z2jh.
I've cleaned this up further, so this only adds the kubeproxy implementation. I have removed all the bits about making poll go away and be replaced with pure notifications. I'm not entirely sure if that is a good idea yet, so let's just do this one!
Note that this probably needs work to support named servers.
Could be used to add a proxy implementation!
We create an Endpoints, Service & Ingress object for each of the pods. This is necessary, since Ingresses only take Services as backends, and you need a manual Endpoint to point a service to our pod. We can't use label selectors on the service because the JupyterHub abstraction only gives us IPs!
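For illustration, a sketch of what building those three objects could look like with the kubernetes Python client. This is not the PR's actual code: the function, the routespec/port arguments, and the use of the current networking/v1 Ingress models are all assumptions, and some model names differ between client versions (e.g. V1EndpointPort vs CoreV1EndpointPort).

from kubernetes import client

def make_route_objects(name, namespace, pod_ip, port, routespec):
    """Illustrative only: the Endpoints, Service and Ingress described above."""
    meta = client.V1ObjectMeta(name=name, namespace=namespace)

    # Manual Endpoints pointing straight at the pod IP -- no label selector is
    # possible, because all JupyterHub gives the proxy is an IP and a port.
    endpoints = client.V1Endpoints(
        metadata=meta,
        subsets=[client.V1EndpointSubset(
            addresses=[client.V1EndpointAddress(ip=pod_ip)],
            ports=[client.CoreV1EndpointPort(port=port)],  # V1EndpointPort in older clients
        )],
    )

    # Selector-less Service; it is backed by the manual Endpoints object above.
    service = client.V1Service(
        metadata=meta,
        spec=client.V1ServiceSpec(
            ports=[client.V1ServicePort(port=port, target_port=port)],
        ),
    )

    # Ingress routing the routespec's path prefix to that Service.
    ingress = client.V1Ingress(
        metadata=meta,
        spec=client.V1IngressSpec(rules=[client.V1IngressRule(
            http=client.V1HTTPIngressRuleValue(paths=[client.V1HTTPIngressPath(
                path=routespec,
                path_type='Prefix',
                backend=client.V1IngressBackend(
                    service=client.V1IngressServiceBackend(
                        name=name,
                        port=client.V1ServiceBackendPort(number=port),
                    ),
                ),
            )]),
        )]),
    )
    return endpoints, service, ingress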
1. If something has already been deleted when we try to delete it, just let it go!
2. If something has already been created when we try to create it, try to delete it first & create it. But only do this once!
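A minimal sketch of that policy, assuming the kubernetes client's ApiException and zero-argument create/delete callables (e.g. functools.partial around the API methods); the helper names are hypothetical.

from kubernetes.client.rest import ApiException

def delete_if_exists(delete):
    """Rule 1: a 404 on delete means it is already gone -- just let it go."""
    try:
        delete()
    except ApiException as e:
        if e.status != 404:
            raise

def create_or_recreate(create, delete):
    """Rule 2: on a 409 Conflict, delete the existing object and retry the
    create exactly once; a second failure propagates."""
    try:
        create()
    except ApiException as e:
        if e.status != 409:
            raise
        delete_if_exists(delete)
        create()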
This makes things slower, but it's better than having an ingress that points to a not-yet-created service: that might delay how long ingress controllers take to pick up the ingress, causing "too many redirects" errors.
And change the factor to 1.2 instead of 2, and use full jitter
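A sketch of what that backoff could look like: only the 1.2 growth factor and the full-jitter idea come from the commit above; everything else (base delay, retry count, function name) is an illustrative assumption.

import random
import time

def retry_with_backoff(fn, max_tries=10, base=0.2, factor=1.2):
    """Retry fn(), sleeping a uniformly random ("full jitter") delay between 0
    and an exponentially growing cap."""
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception:
            if attempt == max_tries - 1:
                raise
            time.sleep(random.uniform(0, base * factor ** attempt))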
I need to set aside like, a month for writing tests.
Deleting a service object deletes the endpoint associated with it. This causes problems when you have to 'recreate' service objects by deleting & creating them - you have to go back and recreate the endpoint too. Patch is also probably much faster for everyone involved
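For illustration, patching in place rather than delete-and-recreate might look like this (patch_namespaced_service is a real CoreV1Api method; the helper function and the patch body shown are assumptions):

from kubernetes import client

def update_service_port(name, namespace, port):
    """Patch the existing Service instead of deleting and recreating it, so the
    Endpoints object tied to it is not deleted along with it."""
    v1 = client.CoreV1Api()
    patch = {'spec': {'ports': [{'port': port, 'targetPort': port}]}}
    v1.patch_namespaced_service(name, namespace, patch)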
Lost in a rebase!
Actually no, the proxy doesn't have to know about named servers at all. Yay!
Thanks @yuvipanda! I made a few mostly minor comments throughout.
kubespawner/objects.py
Outdated
from kubernetes.client.models.v1_persistent_volume_claim_spec import V1PersistentVolumeClaimSpec

from kubernetes.client.models import \
    V1Pod, V1PodSpec, V1PodSecurityContext, \
General style point - Python generally does multi-line imports like
from kubernetes.client.models import (
V1Pod, V1PodSpec,
V1ObjectMeta,
V1Local...
)
which eliminates the need for line continuations
kubespawner/reflector.py
Outdated
list_method_name = Unicode(
    None,
    allow_none=True,
From the usage below, list_method_name is not allowed to be None, as getattr(obj, None) will always fail. Probably best to leave this without setting None or allow_none.
kubespawner/reflector.py
Outdated
api_group_name = Unicode(
    'CoreV1Api',
    allow_none=False,
No need to specify allow_none=False. In general, I don't think we should use allow_none on Unicode traits unless there's a specific, odd case for empty strings meaning something specific.
Yeah, I think this is one of the cases where code I've written (default to None for the default) has differed from code you've written (default to "", don't allow None). Since your style is more prevalent in the Jupyter community and I can't make a compelling case for mine, I'll switch to yours from now on!
kubespawner/proxy.py
Outdated
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

    # We want 3x concurrent spawn limit as our threadpool. This means that if
Maybe use 1x instead of 3x? It seems okay when hitting capacity to allow a single spawn's requests to serialize. It doesn't seem important that all possible requests are in flight at once.
kubespawner/spawner.py
Outdated
self.__class__.executor = ThreadPoolExecutor(
    max_workers=self.k8s_api_threadpool_workers
)
self.executor = self.__class__.executor
This extra assignment is not necessary. self.executor resolves to the class attribute already.
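A stripped-down sketch of the shared-executor pattern being discussed here (the class name and default worker count are illustrative):

from concurrent.futures import ThreadPoolExecutor

class SpawnerLike:
    # Shared across all instances; created lazily by the first one.
    executor = None

    def __init__(self, k8s_api_threadpool_workers=1):
        if SpawnerLike.executor is None:
            SpawnerLike.executor = ThreadPoolExecutor(
                max_workers=k8s_api_threadpool_workers
            )
        # No extra `self.executor = ...` assignment is needed: attribute lookup
        # on the instance already falls through to the class attribute.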
kubespawner/spawner.py
Outdated
    parent=self, namespace=self.namespace,
    on_failure=on_reflector_failure
)
self.pod_reflector = self.__class__.pod_reflector
same here
Thanks @yuvipanda. A few questions and comments.
namespace = Unicode(
    config=True,
    help="""
    Kubernetes namespace to spawn ingresses for single-user servers in.
Perhaps: Kubernetes namespace, specified by ingress, to spawn single-user servers in.
Each single-user server has an ingress associated with it, and both of these belong to a namespace. This is just for creating ingress objects that correspond to spawned servers. I'm confused by the alternate suggestion and am unsure what it means :)
Is this accurate? What's confusing me is that I thought ingresses are essentially rules.
Kubernetes namespace for both single-user servers to be spawned in and their ingresses.
Everything in kubernetes is an object that represents something. In this case, Ingresses are objects that represent HTTP routing rules.
This only sets the namespace for the ingresses to be created, though. The namespace for singleuser servers is specified in KubeSpawner.namespace, rather than KubeProxy.namespace.
How about:
"Kubernetes namespace to create Ingress objects in. This corresponds to KubeSpawner.namespace, which sets the namespace single-user servers are spawned in. For most use cases, these two should be the same."
Love the suggestion. Thanks for the explanation @yuvipanda.
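A hypothetical jupyterhub_config.py snippet matching that docstring (the namespace value is just an example; KubeProxy is the proxy class name as referred to in this thread):

c.KubeSpawner.namespace = 'jhub'
c.KubeProxy.namespace = 'jhub'  # for most deployments, the same as the spawner's namespace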
kubespawner/proxy.py
Outdated
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)

    # We want 3x concurrent spawn limit as our threadpool. This means that if
I agree with @minrk on 1x. It will be simpler to test initially and can be increased in a future refactor if needed.
    grace_period_seconds=0
)

# This seems like cleanest way to parallelize all three of these while
Question for my own understanding: does the order of deletion matter at all (i.e., is there any dependence between them)?
yup! Added a line here!
It does! I've added a comment!
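As a sketch, parallelizing the deletions while still waiting for all of them could look like the following; the delete_* arguments are hypothetical zero-argument coroutines, and if a particular ordering is required (as noted above), that deletion has to be awaited separately before the rest.

import asyncio

async def delete_route(delete_ingress, delete_service, delete_endpoint):
    # Run all three concurrently and wait for every one of them to finish.
    await asyncio.gather(delete_ingress(), delete_service(), delete_endpoint())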
kubespawner/proxy.py
Outdated
# FIXME: Validate that this shallow copy *is* thread safe
ingress_copy = dict(self.ingress_reflector.ingresses)
routes = {
    i.metadata.annotations['hub.jupyter.org/proxy-routespec']:
What is i? Each route? Original before copy? Define in a comment or docstring. Thanks 👍
I've renamed it to 'ingress' which I think clarifies what they are (ingress objects). lmk if you think an additional doc line is still warranted!
I've renamed it to 'ingress' now, since these are all ingress objects. Let me know if that's good enough!
+1 to renaming. no doc line needed.
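After the rename, the comprehension might read roughly like this (wrapped in a standalone function for clarity; the proxy-target and proxy-data annotation keys are assumptions, since only proxy-routespec appears in the diff above):

import json

def routes_from_ingresses(ingresses):
    """ingresses: a dict of Ingress objects, e.g. a copy of ingress_reflector.ingresses."""
    return {
        ingress.metadata.annotations['hub.jupyter.org/proxy-routespec']: {
            'routespec': ingress.metadata.annotations['hub.jupyter.org/proxy-routespec'],
            'target': ingress.metadata.annotations['hub.jupyter.org/proxy-target'],
            'data': json.loads(ingress.metadata.annotations['hub.jupyter.org/proxy-data']),
        }
        for ingress in ingresses.values()
    }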
kubespawner/reflector.py
Outdated
""" | ||
Local up-to-date copy of a set of kubernetes pods | ||
Local up-to-date copy of a set of kubernetes resources. |
Is this more accurate: "Base class for a local up-to-date copy of a type of kubernetes resource"?
Yup! updated!
@@ -48,7 +81,7 @@ def __init__(self, *args, **kwargs):
         config.load_incluster_config()
     except config.ConfigException:
         config.load_kube_config()
-    self.api = client.CoreV1Api()
+    self.api = getattr(client, self.api_group_name)()

# FIXME: Protect against malicious labels?
Do we also need to protect against malicious targets (earlier in this PR)?
I don't think so right now, since these can only be specified by users editing jupyterhub_config.py files. I think we should take a pass in a separate PR doing more validation here, but for now I think this is fine.
# We want to have one threadpool executor that is shared across all spawner objects
# This is initialized by the first spawner that is created
executor = None
Question on naming: would it be better to name it shared_threadpool_executor? Either way, we should move this comment into the general class docstring.
This just follows Tornado's convention for executors: http://www.tornadoweb.org/en/stable/concurrent.html#tornado.concurrent.run_on_executor
kubespawner/spawner.py
Outdated
@@ -727,7 +747,8 @@ def is_pod_running(self, pod):
     pod must be a dictionary representing a Pod kubernetes API object.
     """
     # FIXME: Validate if this is really the best way
-    is_running = pod.status.phase == 'Running' and \
+    is_running = pod is not None and \
This should be converted to the Pythonic ( ) instead of \
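Rewritten with parentheses, the check might look like this (the pod_ip clause is a guess at the elided remainder of the condition, kept only for illustration):

def is_pod_running(pod):
    # Grouping parentheses replace the backslash line continuations.
    return (
        pod is not None
        and pod.status.phase == 'Running'
        and pod.status.pod_ip is not None
    )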
kubespawner/utils.py
Outdated
def make_request(url, **kwargs):
    """
    Make & return a HTTPRequest object suited to making requests to a Kubernetes cluster

# If not, we pick the first limit - hash_length chars from slug & hash the rest.
Put the approach we are using in the docstring for the > 63 char case.
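A sketch of the approach described in the comment above, keeping the first (limit - hash_length) characters and hashing the rest (the function name, hash choice, and default lengths are illustrative):

import hashlib

def safe_slug(slug, limit=63, hash_length=6):
    """Keep names within Kubernetes' 63-character limit while staying unique."""
    if len(slug) <= limit:
        return slug
    head, rest = slug[:limit - hash_length], slug[limit - hash_length:]
    return head + hashlib.sha256(rest.encode('utf-8')).hexdigest()[:hash_length]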
Done.
Done
We serialize creates, so 1x is enough
Yeah, I agree! Upon further thought, this will also never use more than 1x, since we serialize these creates anyway. I'll update!
Great job with the changes @yuvipanda. I have one additional question re: namespace, but this is good to merge from my perspective. Thanks ☀️
I definitely want to run a 5k user stress test before merging this though.
@yuvipanda feel free to merge if/when you get that stress test done.
Missed in the merging
Am going to merge this now to prevent more merge conflicts, and do a perf test on this after.
jupyterhub/kubespawner#64 got merged!
Hello. Is there any documentation on how to use Traefik as the ingress for JupyterHub? We heavily use Traefik on our Kubernetes cluster, and it rocks! So if we could somehow "remove" the "proxy" pod and let Traefik do the routing for started servers, that would be very nice!
Hi @gsemet, we think Traefik rocks too :-) I'm not aware of any documentation that currently exists for using Traefik. @yuvipanda @minrk are you aware of anyone using Traefik in production?
Just to be clear: we already have Traefik in front of our JupyterHub. It's just that there is still this proxy pod, which appears to be a single point of failure.
You need an ingress controller set up to be able to use it. The simplest is the Traefik one, which you can install with: