feat: support proxied Determined tasks on remote k8s clusters (#9469)
- Enables Determined tasks running on remote Kubernetes clusters to be exposed to the Determined master and proxies.
- Facilitates multi-resource manager setups by configuring a Gateway controller in the external Kubernetes cluster.

Co-authored-by: Hamid Zare <[email protected]>
NicholasBlaskey and hamidzr authored Jun 18, 2024
1 parent 44f446c commit 3641bfc
Showing 38 changed files with 2,352 additions and 260 deletions.
8 changes: 8 additions & 0 deletions .circleci/devcluster/multi-k8s.devcluster.yaml
Original file line number Diff line number Diff line change
@@ -38,6 +38,14 @@ stages:
kubeconfig_path: /tmp/defaultrm-kubeconf
determined_master_ip: $DOCKER_LOCALHOST
determined_master_port: 8080
internal_task_gateway:
  gateway_name: contour
  gateway_namespace: projectcontour
  gateway_ip: $GATEWAY_IP
  gateway_port_range_start: 49152
  gateway_port_range_end: 65535


additional_resource_managers:
- resource_manager:
type: kubernetes
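
The `gateway_port_range_start` and `gateway_port_range_end` settings bound the ports that can be opened on the gateway for proxied tasks. The sketch below is not Determined's allocator. It is a hypothetical illustration, assuming the range is inclusive on both ends, of how a free port might be chosen from such a range; `pick_free_port` is an invented helper name.

```python
from typing import Optional, Set

# Values copied from the devcluster config above.
GATEWAY_PORT_RANGE_START = 49152
GATEWAY_PORT_RANGE_END = 65535


def pick_free_port(in_use: Set[int],
                   start: int = GATEWAY_PORT_RANGE_START,
                   end: int = GATEWAY_PORT_RANGE_END) -> Optional[int]:
    """Return the lowest port in [start, end] that is not in use.

    Hypothetical helper: assumes an inclusive range. Returns None when
    every port in the range is already taken.
    """
    for port in range(start, end + 1):
        if port not in in_use:
            return port
    return None
```

With these assumptions the range holds 16,384 ports, which caps the number of proxied ports that could be exposed through one gateway at a time.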
9 changes: 7 additions & 2 deletions .circleci/real_config.yml
@@ -3273,6 +3273,9 @@ jobs:
collect-det-job-logs:
  type: boolean
  default: true
k8s-version:
  type: string
  default: "1.29.5"
machine:
image: <<pipeline.parameters.machine-image>>
resource_class: <<parameters.resource-class>>
@@ -3304,10 +3307,12 @@ jobs:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo install minikube-linux-amd64 /usr/local/bin/minikube
- run:
name: Start defaultrm minikube
command: minikube start --profile defaultrm
command: |
  K8S_VERSION=<<parameters.k8s-version>> source tools/k8s/launch-minikube-with-gateway.sh defaultrm
  echo "export GATEWAY_IP=\"${GATEWAY_IP}\"" >> $BASH_ENV
- run:
name: Start additionalrm minikube
command: minikube start --profile additionalrm
command: minikube start --profile additionalrm --kubernetes-version <<parameters.k8s-version>>

- install-devcluster
- unless:
20 changes: 20 additions & 0 deletions docs/release-notes/gateway.rst
@@ -0,0 +1,20 @@
:orphan:

**New Features**

- Kubernetes: The :ref:`Internal Task Gateway <internal-task-gateway>` feature enables Determined
  tasks running on remote Kubernetes clusters to be exposed to the Determined master and proxies.
  This feature facilitates multi-resource manager setups by configuring a Gateway controller in the
  external Kubernetes cluster.

.. important::

   Enabling this feature exposes Determined tasks to the outside world. It is crucial to implement
   appropriate security measures to restrict access to exposed tasks and secure communication
   between the external cluster and the main cluster. Recommended measures include:

   - Setting up a firewall
   - Using a VPN
   - Implementing IP whitelisting
   - Configuring Kubernetes Network Policies
   - Employing other security measures as needed
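
As one illustration of the Network Policies measure, the sketch below builds a Kubernetes ``NetworkPolicy`` manifest as a Python dictionary that admits ingress only from an allowlist of CIDR blocks. It is a hedged example, not a policy shipped with Determined: the helper name, policy name, namespace, labels, and CIDR are all hypothetical, and whether the original client IP is visible to the policy depends on how the gateway forwards traffic.

```python
from typing import Any, Dict, List


def build_ingress_allowlist_policy(name: str, namespace: str,
                                   pod_labels: Dict[str, str],
                                   allowed_cidrs: List[str]) -> Dict[str, Any]:
    """Build a NetworkPolicy that only admits ingress from `allowed_cidrs`.

    Hypothetical helper for illustration only.
    """
    if not allowed_cidrs:
        # An empty `from` list would match every source, so refuse it.
        raise ValueError("allowed_cidrs must not be empty")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {"from": [{"ipBlock": {"cidr": cidr}} for cidr in allowed_cidrs]}
            ],
        },
    }


# Example values are placeholders, not names used by Determined.
policy = build_ingress_allowlist_policy(
    "allow-master-only", "example-tasks", {"app": "example-task"}, ["203.0.113.10/32"]
)
```

The resulting dictionary can be serialized to YAML or JSON and applied with the usual Kubernetes tooling.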
2 changes: 2 additions & 0 deletions docs/setup-cluster/k8s/_index.rst
@@ -151,4 +151,6 @@ for diagnosing any issues that arise during installation.
custom-pod-specs
helm-commands
setup-multiple-resource-managers
internal-task-gateway
controller-reviews
troubleshooting
192 changes: 192 additions & 0 deletions docs/setup-cluster/k8s/controller-reviews.rst
@@ -0,0 +1,192 @@
.. _controller-reviews:

#############################
Gateway API Implementations
#############################

This document surveys the Gateway API controllers listed on the `Gateway API SIG implementations page
<https://gateway-api.sigs.k8s.io/implementations/#haproxy-kubernetes-ingress-controller>`_.

Based on the documentation provided by the projects, we've categorized the implementations into
three groups:

- **Supported**: The project has implemented the TCPRoute resource and we have tested it.
- **Support Not Tested**: The project has indicated that it implements the TCPRoute resource, but
  we have not tested it.
- **Not Yet Supported**: The project has not implemented the TCPRoute resource, has not indicated
  support for it, or we could not find documentation of its support.
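
The grouping rule above can be expressed as a small decision function. This is only a sketch of the classification logic described in this document; the enum and function names are hypothetical.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "Supported"
    SUPPORT_NOT_TESTED = "Support Not Tested"
    NOT_YET_SUPPORTED = "Not Yet Supported"


def categorize(documents_tcproute: bool, tested: bool) -> SupportLevel:
    """Map a controller to one of the three groups used in this survey.

    `documents_tcproute` is True when the project's documentation indicates
    TCPRoute support; `tested` is True when that support has been verified.
    """
    if not documents_tcproute:
        return SupportLevel.NOT_YET_SUPPORTED
    return SupportLevel.SUPPORTED if tested else SupportLevel.SUPPORT_NOT_TESTED
```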

###########
Supported
###########

*********
Contour
*********

   Contour v1.29.0 implements Gateway API v1.0.0. All Standard channel v1 API group resources
   (GatewayClass, Gateway, HTTPRoute, ReferenceGrant), plus most v1alpha2 API group resources
   (TLSRoute, TCPRoute, GRPCRoute, ReferenceGrant, and BackendTLSPolicy) are supported.

`Contour Gateway API Guide <https://projectcontour.io/docs/1.29/guides/gateway-api/>`_

***************
Envoy Gateway
***************

`Envoy Gateway TCP Routing <https://gateway.envoyproxy.io/latest/tasks/traffic/tcp-routing/>`_

####################
Support Not Tested
####################

********
Cilium
********

`Cilium Gateway API
<https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#gs-gateway-api>`_
Based on Envoy.

********************************
HAProxy K8s Ingress Controller
********************************

`HAProxy Kubernetes TCPRoute
<https://www.haproxy.com/documentation/kubernetes-ingress/gateway-api/tcproute/>`_ HAProxy
Enterprise Kubernetes Ingress Controller.

******************
HashiCorp Consul
******************

`Consul TCPRoute Reference
<https://developer.hashicorp.com/consul/docs/k8s/multiport/reference/tcproute>`_

*********
Traefik
*********

`Traefik Kubernetes Gateway <https://doc.traefik.io/traefik/routing/providers/kubernetes-gateway/>`_
and `Traefik Gateway Provider <https://doc.traefik.io/traefik/providers/kubernetes-gateway/>`_. The
Traefik documentation notes, under "Enabling The Experimental Kubernetes Gateway Provider":

   Since this provider is still experimental, it needs to be activated in the experimental section
   of the static configuration.

*******************************************
Kong Operator and Kong Ingress Controller
*******************************************

`Kong Gateway API <https://docs.konghq.com/gateway-operator/latest/concepts/gateway-api/#main>`_

******
Kuma
******

Based on Envoy. `Kuma Mesh TCPRoute
<https://kuma.io/docs/2.7.x/policies/meshtcproute/#meshtcproute>`_

*********
Flomesh
*********

`Flomesh Gateway API Compatibility
<https://github.com/flomesh-io/fsm/blob/main/docs/gateway-api-compatibility.md>`_ Partial TCPRoute
support.

*******
Istio
*******

`Istio Gateway API Differences
<https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#differences-from-istio-apis>`_

###################
Not Yet Supported
###################

**************
Acnodal Epic
**************

Supports k8s v0.5

***************
Apache Apisix
***************

`Apisix Ingress Controller <https://apisix.apache.org/docs/ingress-controller/getting-started/>`_
Mainly ingress focused.

*******
Azure
*******

`Azure Application Gateway
<https://learn.microsoft.com/en-us/azure/application-gateway/for-containers/overview>`_ No TCPRoute
support.

************
VMware Avi
************

Advertises Layer 4 load balancing but has no TCPRoute support yet. Supports k8v1. `VMware Avi Kubernetes
Guide
<https://docs.vmware.com/en/VMware-Avi-Load-Balancer/1.12/Avi-Kubernetes-Operator-Guide/GUID-84BD68AB-B96F-425C-8323-3A249D6AC8B2.html>`_

***********
Easegress
***********

No TCPRoute support.

*******************************
Emissary Ingress - Ambassador
*******************************

No TCPRoute support. `Ambassador Gateway API
<https://www.getambassador.io/docs/edge-stack/latest/topics/using/gateway-api#gateway-api>`_

***********
Gloo Solo
***********

Based on Envoy but no TCPRoute support.

*****************
HAProxy Ingress
*****************

No TCPRoute support. `HAProxy Ingress Gateway API
<https://haproxy-ingress.github.io/docs/configuration/gateway-api/>`_

*********
Linkerd
*********

No TCPRoute support. `Linkerd HTTPRoute Reference <https://linkerd.io/2.15/reference/httproute/>`_

***********
Litespeed
***********

No TCPRoute support. `Litespeed Kubernetes Gateway
<https://docs.litespeedtech.com/cloud/kubernetes/gateway/>`_

*****************
Nginx GW Fabric
*****************

No TCPRoute support yet. `Nginx Gateway API Compatibility
<https://docs.nginx.com/nginx-gateway-fabric/overview/gateway-api-compatibility/>`_

*******
Ngrok
*******

No TCPRoute support. Only HTTPRoutes are stable; the others are in an experimental channel. ngrok
supports edges for HTTP/S, TLS, and TCP, but the ngrok Operator currently supports only HTTPRoute.
TLSRoute and TCPRoute will be added after they become stable. `Ngrok Kubernetes Gateway API
<https://ngrok.com/docs/k8s/?k8s-install=gatewayAPI>`_

**********
WSO2 APK
**********

No TCPRoute support. `WSO2 Kubernetes CRDs
<https://apk.docs.wso2.com/en/latest/catalogs/kubernetes-crds/>`_
20 changes: 19 additions & 1 deletion docs/setup-cluster/k8s/install-on-kubernetes.rst
@@ -18,16 +18,29 @@ using the `Determined Helm Chart <https://helm.determined.ai/>`__.
When the Determined Helm chart is installed, the following entities will be created:

- Deployment of the Determined master.

- ConfigMap containing configurations for the Determined master.

- LoadBalancer service to make the Determined master accessible. Later in this guide, we describe
how it is possible to replace this with a NodePort service.
- ServiceAcccount which will be used by the Determined master.

- ServiceAccount which will be used by the Determined master.

- Deployment of a Postgres database. Later in this guide, we describe how an external database can
be used instead.

- PersistentVolumeClaim for the Postgres database. Omitted if using an external database.

- Service to allow the Determined master to communicate with the Postgres database. Omitted if
using an external database.

- When using multiple Kubernetes clusters, in each external-to-master cluster:

  - Gateway service to allow north-south access to Determined proxied tasks in external-to-master
    clusters.
  - Service to expose proxied ports on Determined jobs.
  - TCPRoute to attach the gateway service to the proxied ports service.
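
To make the relationship between the gateway, the proxied-ports Service, and the TCPRoute concrete, here is a minimal sketch of a TCPRoute manifest built as a Python dictionary. It is illustrative only and is not the manifest Determined generates: the function name, route name, namespace, listener name, Service name, and port are hypothetical, and the ``gateway.networking.k8s.io/v1alpha2`` API version is an assumption based on the Gateway API experimental channel.

```python
from typing import Any, Dict


def build_tcproute(route_name: str, namespace: str,
                   gateway_name: str, gateway_namespace: str,
                   listener_name: str,
                   service_name: str, service_port: int) -> Dict[str, Any]:
    """Build a TCPRoute that attaches one gateway listener to one Service port.

    Hypothetical helper for illustration only.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1alpha2",  # assumed version
        "kind": "TCPRoute",
        "metadata": {"name": route_name, "namespace": namespace},
        "spec": {
            # parentRefs selects the Gateway (and listener) the route binds to.
            "parentRefs": [{
                "name": gateway_name,
                "namespace": gateway_namespace,
                "sectionName": listener_name,
            }],
            # backendRefs points at the Service exposing the proxied port.
            "rules": [{
                "backendRefs": [{"name": service_name, "port": service_port}],
            }],
        },
    }


# Gateway name and namespace echo the devcluster config in this commit;
# every other value is a placeholder.
route = build_tcproute(
    "example-task-route", "default",
    "contour", "projectcontour", "tcp-49152",
    "example-task-svc", 8888,
)
```

In this shape, the gateway listener named by ``sectionName`` receives external traffic on a port from the configured range, and the route forwards it to the Service port of the proxied task.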

***************
Prerequisites
***************
@@ -450,6 +463,11 @@ To set up multiple resource pools for Determined on your Kubernetes cluster:
#. Add the appropriate resource pool name to namespace mappings in the ``resourcePools`` section of
the ``values.yaml`` file in the Helm chart.

.. note::

   To enable north-south access to Determined proxied tasks in external-to-master clusters, set up a
   gateway as described in :doc:`Internal Task Gateway <internal-task-gateway>`.

********************
Install Determined
********************