add helm, master_config and release note changes
revert PR #9144 & #9145
carolinaecalderon committed Apr 17, 2024
1 parent 44f9678 commit 8ee794f
Showing 4 changed files with 122 additions and 1 deletion.
22 changes: 22 additions & 0 deletions docs/reference/deploy/helm-config-reference.rst
@@ -256,4 +256,26 @@
namespaces. Maps to the ``resource_pools`` section from the :ref:`master configuration
<master-config-reference>`.

- ``additional_resource_managers``: This section includes additional resource managers for
  launching jobs across multiple Kubernetes clusters. Maps to :ref:`additional_resource_managers
  <master-config-additional-resource-managers>` in the master configuration. An example
  configuration is provided in the ``values.yaml`` file.

  - ``resource_manager``: Describes the configuration settings for the resource manager. Maps to
    :ref:`resource_manager <master-config-resource-manager>` in the master configuration.

    - ``kubeconfig_secret_name``: Specifies the name of the secret containing the kubeconfig for
      the resource manager. This kubeconfig is used to connect to the Kubernetes cluster and
      launch tasks. Note that some kubeconfigs may require additional adjustments or
      modifications. For example, some kubeconfigs reference file paths, which may need to be
      bind-mounted into the container or have the referenced data embedded directly in the
      kubeconfig. Other kubeconfigs, like those for GKE, may require installing plugins into the
      Determined master container and binding certain credential files. (*Required*)

    - ``kubeconfig_secret_value``: Specifies the key within that secret under which the
      kubeconfig is stored. (*Required*)

  - ``resource_pools``: The resource pool configuration. See :ref:`resource_pools
    <cluster-resource-pools>` for available configuration options.
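
As a rough illustration of the two kubeconfig settings above, the following sketch shows a
Kubernetes ``Secret`` manifest that the commented example in ``values.yaml`` could refer to. It
assumes that ``kubeconfig_secret_name`` matches ``metadata.name`` and that
``kubeconfig_secret_value`` names the key under ``stringData``. The secret name ``additionalrm``,
the key ``config``, the namespace, and the placeholder kubeconfig contents are illustrative only
and are not taken from the Determined Helm chart.

.. code:: yaml

   # Hypothetical Secret holding the kubeconfig for an additional resource manager.
   apiVersion: v1
   kind: Secret
   metadata:
     name: additionalrm     # assumed to match kubeconfig_secret_name
     namespace: default     # assumption: the namespace of the Determined master
   type: Opaque
   stringData:
     config: |              # assumed to match kubeconfig_secret_value
       apiVersion: v1
       kind: Config
       clusters: []         # placeholder: add the remote cluster's entries here
       contexts: []
       users: []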

.. include:: ../../_shared/note-dtrain-learn-more.txt
68 changes: 68 additions & 0 deletions docs/reference/deploy/master-config-reference.rst
@@ -241,6 +241,30 @@ otherwise active (as defined by the ``notebook_idle_type`` option in the :ref:`t

The resource manager used to acquire resources. Defaults to ``agent``.

For Kubernetes installations, if you define additional resource managers, the resource manager
specified under the primary ``resource_manager`` key here is considered the default.

``name``
========

Optional. Specifies the resource manager's name. Defaults to ``default`` if not specified. For
Kubernetes installations with additional resource managers, ensure unique names for all resource
managers in the cluster.
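
As a minimal sketch, a default Kubernetes resource manager with an explicit name might be declared
as follows. The name ``default-rm`` is a hypothetical value chosen for illustration.

.. code:: yaml

   resource_manager:
     type: kubernetes
     name: default-rm   # hypothetical; must be unique among all resource managers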

``metadata``
============

Optional. Stores additional information about the resource manager in a YAML map, such as the zone,
region, or location.

For example:

.. code:: yaml

   metadata:
     region: us-west1
     zone: us-west1-a

``type: agent``
===============

@@ -1172,6 +1196,50 @@ those partitions/queues.
the HPC partition named ``defq_GPU`` with the ``gpu_type`` property set, and Slurm constraint
associated with the feature ``XL675d`` used to identify the model type of the compute node.

.. _master-config-additional-resource-managers:

**********************************
``additional_resource_managers``
**********************************

Cluster administrators for Kubernetes installations can define additional resource managers for
connecting the Determined master service with remote clusters. Support for notebooks and other
workloads that require proxying on remote clusters is under development.

To define a single resource manager or designate the default resource manager, do not define it
under ``additional_resource_managers``; instead, use the primary ``resource_manager`` key.

Resource manager names must be unique among all defined resource managers.

Any additional resource managers must have at least one resource pool assigned to them. These
resource pool names must be defined and must be distinct among all resource pools across all
resource managers. You define resource pools for any additional resource managers within their
respective elements in the resource manager list (not at the root level).

For example, to define three resource managers (one default, two additional):

.. code:: yaml

   resource_manager:   # the default resource manager
   resource_pools:     # resource pools for the resource manager defined above
     - pool_name: "foo"
   additional_resource_managers:
     - resource_manager:
         type: kubernetes   # required; this feature is only for Kubernetes
         name: "bar"        # required
       resource_pools:
         - pool_name: "abc"
     - resource_manager:
         type: kubernetes   # required; this feature is only for Kubernetes
         name: "baz"        # required
       resource_pools:
         - pool_name: "def"

``resource_manager``
====================

20 changes: 19 additions & 1 deletion docs/release-notes/multirm-for-k8s.rst
@@ -1,3 +1,21 @@
:orphan:

ignore
**New Features**

- Kubernetes: Add ability to set up the Determined master service on one Kubernetes cluster and
  manage workloads across different Kubernetes clusters. Additional non-default resource managers
  and resource pools are configured under the master configuration options
  ``additional_resource_managers`` and ``resource_pools`` (additional resource managers are
  required to have at least one resource pool defined). Additional resource managers and their
  resource pools must have unique names. For more information, visit :ref:`master configuration
  <master-config-reference>`. Support for notebooks and other workloads that require proxying is
  under development.

- WebUI: Add ability to view the resource manager name for resource pools.

- API/CLI/WebUI: Route requests for resource pools that are not defined in the master
  configuration to the default resource manager rather than to any additional resource manager.

- Configuration: Add ``name`` and ``metadata`` fields to the resource manager section in the master
  configuration. Add an ``additional_resource_managers`` section that follows the
  ``resource_manager`` and ``resource_pools`` configuration pattern.
13 changes: 13 additions & 0 deletions helm/charts/determined/values.yaml
@@ -347,3 +347,16 @@ resourcePools:

## Configure the initial user password for the cluster
# initialUserPassword

# additional_resource_managers:
#   - resource_manager:
#       type: kubernetes
#       max_slots_per_pod: 1
#       name: carolina-multirm-1
#       namespace: default
#       kubeconfig_secret_name: additionalrm
#       kubeconfig_secret_value: config
#       determined_master_ip: 10.11.12.13
#       determined_master_port: 8080
#     resource_pools:
#       - pool_name: additional_pool
