docs: add release notes for 0.35.0 (#9786)
Co-authored-by: Tara <[email protected]>
Co-authored-by: Saloni Gupta <[email protected]>
Co-authored-by: Amanda Vialva <[email protected]>
4 people authored Aug 8, 2024
1 parent 7f03a87 commit e770ee5
Showing 25 changed files with 190 additions and 246 deletions.
191 changes: 190 additions & 1 deletion docs/release-notes.rst
@@ -6,6 +6,195 @@
Release Notes
###############

**************
Version 0.35
**************

Version 0.35.0
==============

**Release Date:** August 08, 2024

**Breaking Changes**

- Master Configuration: Replace ``resource_manager.name`` with ``resource_manager.cluster_name``
for better clarity and to support multiple resource managers.

- Resource managers operate relative to the specific cluster providing resources for Determined
tasks, so changing the cluster will affect the resource manager's responses.

- The ``cluster_name`` must be unique for all resource managers when deploying multiple resource
managers in Determined.

- When upgrading, specify ``resourceManager.clusterName`` in your ``values.yaml`` to override
``resource_manager.name`` and/or remove the ``name`` field from your ``resource_manager``
config altogether.

- For additional resource managers, you must change ``additional_resource_manager[i].name`` to
``additional_resource_manager[i].cluster_name`` in your ``values.yaml``.

- Master Configuration: Replace ``resource_manager.namespace`` with
``resource_manager.default_namespace``.

- The namespace field in the Kubernetes Resource Manager configuration is no longer supported
and is replaced by ``default_namespace``.

- This field serves as the default namespace for deploying namespaced resources when the
workspace associated with a workload is not bound to a specific namespace.

- If unset, workloads are sent to the release namespace during Determined Helm installs or
upgrades, and to the default Kubernetes namespace, "default", during non-Helm Determined
deployments.

- Tasks: The :ref:`historical usage <historical-cluster-usage-data>` CSV file has been updated. The
header row for slot-hours is now named ``slot_hours`` instead of ``gpu_hours`` to accurately
reflect the allocation time for resource pools including those without GPUs. In addition, a new
column, ``resource_pool``, has been added to provide the resource pool for each allocation.

- Cluster: The ``kubernetes_namespace`` field in the resource pool configuration is no longer
supported. Users can now submit workloads to specific namespaces by binding workspaces to
namespaces using the CLI, WebUI, or API.

- Cluster: The ``resources.agent_label`` task option and the ``label`` option in the agent config
have been removed. These options have been ignored since the 0.20.0 release. Please remove any
remaining references from configuration files and use ``resource_pool`` instead.
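
The renames listed above can be sketched in Python. This is only an illustration, not part of Determined: the helper names ``migrate_resource_manager`` and ``slot_hours`` are hypothetical, and the code assumes the resource manager config has already been loaded into a plain dict and that the usage CSV is available as text. Only the key and header names come from the notes above.

```python
import csv
import io

# Old key -> new key, taken from the breaking changes listed above.
RENAMED_KEYS = {"name": "cluster_name", "namespace": "default_namespace"}


def migrate_resource_manager(rm):
    """Return a copy of a resource manager config dict with the renamed keys."""
    migrated = dict(rm)
    for old, new in RENAMED_KEYS.items():
        if old in migrated:
            value = migrated.pop(old)
            # Keep an explicitly set new key; otherwise carry the old value over.
            migrated.setdefault(new, value)
    return migrated


def slot_hours(csv_text):
    """Read slot-hours from usage CSV text, accepting the old gpu_hours header."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [float(row.get("slot_hours") or row.get("gpu_hours")) for row in rows]
```

For example, ``migrate_resource_manager({"type": "kubernetes", "name": "k8s-a"})`` returns a new dict that carries ``cluster_name`` instead of ``name`` and leaves the input untouched. Falling back to ``gpu_hours`` lets the same script read exports produced before and after the upgrade.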

**New Features**

- WebUI/CLI/API: Allow admins to bind namespaces to workspaces and manage resource quotas for
auto-created namespaces directly.

- WebUI: Add a "Namespace Bindings" section to the Create and Edit Workspace modals.

- Users can input a namespace for a Kubernetes cluster. If no namespace is specified, the
workspace will be bound to the ``resource_manager.default_namespace`` field in the master
configuration YAML or the "default" Kubernetes namespace.

- In the enterprise edition, users can auto-create namespaces and set resource quotas,
limiting GPU requests for that workspace. The Edit Workspace modal displays the lowest GPU
limit resource quota within the bound namespace.

- Once saved, all workloads in the workspace will be sent to the bound namespace. Changing
the binding will affect future workloads, while in-progress workloads remain in their
original namespace.

- For help with workspace-namespace bindings, visit :ref:`Manage Workspace-Namespace Bindings
<k8s-resource-caps>`.

- CLI: Add new commands for creating and managing workspace namespace bindings.

- Allow creating namespace bindings during workspace creation with ``det w create
<workspace-name> --namespace <namespace-name>`` or later with ``det w bindings set
<workspace-id> --namespace <namespace-name>``.

- In the enterprise edition, users can use additional arguments ``--auto-create-namespace``
and ``--auto-create-namespace-all-clusters`` to bind workspaces to auto-created namespaces.
Users can set resource quotas during workspace creation with ``det w create
<workspace-name> --cluster-name <cluster-name> --auto-create-namespace --resource-quota
<resource-quota>``, or later with ``det w resource-quota set <workspace-id> <quota>
--cluster-name <cluster-name>`` if their workspace is bound to an auto-created namespace.

- Add a command to delete namespace bindings with ``det w bindings delete <workspace-id>
--cluster-name <cluster-name>``.

- Add a command to list bindings for a workspace with ``det w bindings list
<workspace-name>``.

- The ``--cluster-name`` field is required only for MultiRM setups when
``--auto-create-namespace-all-clusters`` is omitted.

- API: Add new endpoints for creating and managing workspace namespace bindings.

- Add POST and DELETE endpoints to ``/api/v1/workspaces/{workspace_id}/namespace-bindings``
for setting and deleting workspace namespace bindings.
- Add a GET endpoint ``/api/v1/workspaces/{id}/list-namespace-bindings`` to list namespace
bindings for a workspace.
- Add a POST endpoint ``/api/v1/workspaces/{id}/set-resource-quota`` to set resource quotas
on workspaces bound to auto-created namespaces.
- Add a GET endpoint ``/api/v1/workspaces/{id}/get-k8s-resource-quotas`` to retrieve enforced
Kubernetes GPU resource quotas for workspace bound namespaces.

- WebUI: Enable users to add or remove hyperparameters during hyperparameter searches.

- WebUI: Experiments with configured Pachyderm data integration now display a link to the Pachyderm
repo on the trial view page. The link is also available when viewing checkpoints derived from the
Pachyderm data. For a preview of data lineage, visit :ref:`Pachyderm <pachyderm-integration>`.

- WebUI: Flat Runs View, listed under Experimental features, is now "on" by default in the
:ref:`WebUI <web-ui-if>`. Users can still toggle this feature "off". This update makes it easier
to compare model performance between different trials and responds to user feedback that most
Determined users run single-trial experiments.

- "Experiments" are now called "searches" and "trials" are now called "runs" for better clarity.
- The "experiment list" is now called the "run list", showing all trials from experiments in the
project. It functions similarly to the previous new experiment list.
- Multi-trial experiments can be viewed in the new searches view, which allows for sorting,
filtering and navigating multi-trial experiments.
- When viewing a multi-trial experiment, a list of trials is displayed, allowing for sorting,
filtering and arbitrary comparison between trials.

- WebUI: Add resource allocation information to the trial details page.

- WebUI: Allow users to continue a canceled or errored multi-trial experiment for searcher type
``random`` or ``grid``.

- Master Configuration: Add an ``always_redirect`` option to OIDC and SAML configurations. When
enabled, this option bypasses the standard Determined sign-in page and routes users directly to
the configured SSO provider. This redirection persists unless the user explicitly signs out
within the WebUI.

- Experiments: Obfuscate subfields of ``data.secrets`` in the :ref:`experiment configuration
<experiment-config-data>`.

- CLI: Add a new command, ``det cmd describe COMMAND_ID``, to allow users to fetch the metadata of a
single command.
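
The workspace namespace binding endpoints listed above can be collected into a small lookup table. The following is a minimal sketch: the operation names, the ``build_request`` helper, and the master address used in the example are hypothetical, and request bodies are left out because the notes do not describe them. Only the HTTP methods and paths come from the notes.

```python
# (HTTP method, path template) for each endpoint named in the notes above.
ENDPOINTS = {
    "set_bindings": ("POST", "/api/v1/workspaces/{id}/namespace-bindings"),
    "delete_bindings": ("DELETE", "/api/v1/workspaces/{id}/namespace-bindings"),
    "list_bindings": ("GET", "/api/v1/workspaces/{id}/list-namespace-bindings"),
    "set_quota": ("POST", "/api/v1/workspaces/{id}/set-resource-quota"),
    "get_quotas": ("GET", "/api/v1/workspaces/{id}/get-k8s-resource-quotas"),
}


def build_request(master_url, operation, workspace_id):
    """Return the (method, URL) pair for one workspace binding operation."""
    method, template = ENDPOINTS[operation]
    # Strip a trailing slash so the joined URL never contains "//api".
    return method, master_url.rstrip("/") + template.format(id=workspace_id)
```

Calling ``build_request("http://localhost:8080", "list_bindings", 7)`` yields the ``GET`` method together with the list-namespace-bindings URL for workspace 7. The pair can then be passed to whichever HTTP client is already in use, along with the authentication the master requires.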

**Improvements**

- Switch the default AWS instance type from ``m5.large`` to ``m6i.large``. This change enhances
performance without affecting the cost.
- WebUI: In the enterprise edition, redirect SSO users to the SSO provider's authentication URIs
when their session token has expired, instead of displaying the Determined sign-in page.

**Bug Fixes**

- WebUI: Fix a bug where the Compare view on the Project Details page did not allow comparison of
experiments selected from different pages.
- WebUI: Fix endless metrics fetching in the "Visualization" tab on the experiment details page for
canceled experiments that do not have metrics.
- Fix two places where aggregated queued stats could have shown inflated values. The total queued
aggregated time and today's queued aggregated time calculations were both affected.
- CLI: Fix an error related to ``det cmd list --csv``.
- WebUI: Fix missing data in Historic Usage Charts due to erroneous date parsing.

**Deprecations**

- Detached mode: The ``defaults`` and ``unmanaged`` parameters of the ``init`` function for
unmanaged experiments have been deprecated and will be removed in a future version. Please use
``config`` instead.

- Agent and Kubernetes Resource Manager: Jobs can no longer be moved within the same priority
group. To reposition a job, update its priority using the CLI or WebUI. For detailed
instructions, visit :ref:`modify-job-queue-cli`. This change was announced in version 0.33.0.

- AgentRM: Support for Singularity, Podman, and Apptainer was deprecated in 0.33.0 and is now
removed. Docker is the only container runtime supported by the Agent resource manager (AgentRM).
It is still possible to use Podman with AgentRM by using the Podman emulation layer. For detailed
instructions, visit `Emulating Docker CLI with Podman
<https://podman-desktop.io/docs/migrating-from-docker/emulating-docker-cli-with-podman>`_. You
might also need to configure ``checkpoint_storage`` in experiment or master configurations. In
the enterprise edition, the Slurm resource manager still supports Singularity, Podman, and
Apptainer.

- Kubernetes Scheduling: Support for the priority scheduler for Kubernetes Resource Managers is
discontinued and may be removed in a future release due to limited usage. Users should transition
to the default scheduler. Visit :ref:`Kubernetes Default Scheduler
<kubernetes-default-scheduler>` for details.

- API: The ``model_hub`` library is now deprecated. Users of MMDetTrial and BaseTransformerTrial
should switch to :ref:`Core API <api-core-ug>` or the :ref:`PyTorch Trainer <pytorch_trainer_ug>`
for integrations with ``mmcv`` and ``huggingface``.

**************
Version 0.34
**************
@@ -94,7 +283,7 @@ Version 0.34.0
**Deprecations**

- Agent Resource Manager: Round robin scheduler is removed for Agent Resource Managers. Deprecation
was announced in release 0.33.0. Users should transition to priority scheduler.
- Machine Architectures: Support for PPC64/POWER builds for all environments has been deprecated
and is now being removed. Users should transition to ARM64/AMD64.

9 changes: 0 additions & 9 deletions docs/release-notes/basic-lineage.rst

This file was deleted.

10 changes: 0 additions & 10 deletions docs/release-notes/cli-cmd-describe.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/compare-selected-experiments.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/continue-trial.rst

This file was deleted.

12 changes: 0 additions & 12 deletions docs/release-notes/delete-namespace-bindings.rst

This file was deleted.

15 changes: 0 additions & 15 deletions docs/release-notes/deprecate-containerRuntime.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/release-notes/deprecate-job-move.rst

This file was deleted.

14 changes: 0 additions & 14 deletions docs/release-notes/deprecate-name-use-cluster-name.rst

This file was deleted.

8 changes: 0 additions & 8 deletions docs/release-notes/deprecate-priority-scheduler.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/release-notes/detached-mode-config-improvement.rst

This file was deleted.

5 changes: 0 additions & 5 deletions docs/release-notes/edit-HP-search.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/fetching-metrics-fix.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/fix-taskstats.rst

This file was deleted.

5 changes: 0 additions & 5 deletions docs/release-notes/historic-usage-chart-timezones.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/m6i.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/release-notes/model-hub-deprecation.rst

This file was deleted.

8 changes: 0 additions & 8 deletions docs/release-notes/relabel-allocation-csv.rst

This file was deleted.

7 changes: 0 additions & 7 deletions docs/release-notes/remove-agent-labels.rst

This file was deleted.

5 changes: 0 additions & 5 deletions docs/release-notes/resource-allocation.rst

This file was deleted.

16 changes: 0 additions & 16 deletions docs/release-notes/run-centric-ux.rst

This file was deleted.

6 changes: 0 additions & 6 deletions docs/release-notes/secret-experiment-config.rst

This file was deleted.
