From 3656cb8d585ddc93b6ff15d7c5421096d81c030b Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Mon, 24 Jun 2024 15:31:29 -0700
Subject: [PATCH 01/10] docs: add release notes for 0.34.0

---
 docs/release-notes.rst                        | 92 +++++++++++++++++++
 docs/release-notes/cluster-message.rst        |  9 --
 .../deploy-local-auto-password.rst            |  9 --
 docs/release-notes/deprecate-ppc64.rst        |  6 --
 docs/release-notes/deprecate-roundrobin.rst   |  6 --
 docs/release-notes/feature-checkpointGC.rst   | 11 ---
 docs/release-notes/feature-node-selectors.rst | 14 ---
 docs/release-notes/framework-splitting.rst    | 11 ---
 docs/release-notes/gateway.rst                | 20 ----
 docs/release-notes/job-state.rst              |  6 --
 docs/release-notes/pods-to-jobs.rst           | 10 --
 11 files changed, 92 insertions(+), 102 deletions(-)
 delete mode 100644 docs/release-notes/cluster-message.rst
 delete mode 100644 docs/release-notes/deploy-local-auto-password.rst
 delete mode 100644 docs/release-notes/deprecate-ppc64.rst
 delete mode 100644 docs/release-notes/deprecate-roundrobin.rst
 delete mode 100644 docs/release-notes/feature-checkpointGC.rst
 delete mode 100644 docs/release-notes/feature-node-selectors.rst
 delete mode 100644 docs/release-notes/framework-splitting.rst
 delete mode 100644 docs/release-notes/gateway.rst
 delete mode 100644 docs/release-notes/job-state.rst
 delete mode 100644 docs/release-notes/pods-to-jobs.rst

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 8a5999f8485..45e7ca95884 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -6,6 +6,98 @@
  Release Notes
 ###############

+**************
+ Version 0.34
+**************
+
+Version 0.34.0
+==============
+
+**Release Date:** June 24, 2024
+
+**Breaking Changes**
+
+- Images: The default environment includes images that support PyTorch. TensorFlow users must
+  configure their experiments to target our non-default TensorFlow images. Details on this process
+  can be found at :ref:`set-environment-images`
+
+- Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
+  version, users have the flexibility to build their own images using any NGC version that meets
+  their specific requirements. For more information, visit :ref:`ngc-version`
+
+**New Features**
+
+- Kubernetes: The system now launches Kubernetes jobs on behalf of users when they submit workloads
+  to Determined, instead of launching Kubernetes pods. This change allows Determined to work
+  properly with other Kubernetes features like resource quotas.
+
+  As a result, permissions are now required to create, get, list, delete, and watch Kubernetes job
+  resources.
+
+- WebUI: Add the ability for administrators to use the CLI to set a message to be displayed on all
+  pages of the WebUI (for example, ``det master cluster-message set -m "Your message"``). Optional
+  flags are available for scheduling the message with a start time and an end time. Administrators
+  can clear the message anytime using ``det master cluster-message clear``. Only one message can be
+  active at a time, so setting a new message will replace the previous one.
+
+- Kubernetes: Add the ability for users to provide a custom checkpoint garbage collection (GC)
+  pod spec. This configuration is done using the ``task_container_defaults.checkpointGcPodSpec``
+  field within your ``values.yaml`` file. By default, Determined uses the experiment's pod spec
+  for checkpoint GC. A custom checkpoint GC pod spec overrides the experiment's pod spec
+  settings, so users have the flexibility to customize and configure the GC pod spec directly in
+  this field. Users can tailor the garbage collection settings according to their specific GC
+  needs.
+
+- Kubernetes: The :ref:`Internal Task Gateway ` feature enables Determined
+  tasks running on remote Kubernetes clusters to be exposed to the Determined master and proxies.
+  This feature facilitates multi-resource manager setups by configuring a Gateway controller in the
+  external Kubernetes cluster.
+
+.. important::
+
+   Enabling this feature exposes Determined tasks to the outside world. It is crucial to implement
+   appropriate security measures to restrict access to exposed tasks and secure communication
+   between the external cluster and the main cluster. Recommended measures include:
+
+   - Setting up a firewall
+   - Using a VPN
+   - Implementing IP whitelisting
+   - Configuring Kubernetes Network Policies
+   - Employing other security measures as needed
+
+- Kubernetes Configuration: Allow cluster administrators to define Determined resource pools on
+  Kubernetes using node selectors and/or affinities. Configure these settings at the default pod
+  spec level under ``task_container_defaults.cpu_pod_spec`` or
+  ``task_container_defaults.gpu_pod_spec``. This allows a single cluster to be divided into
+  multiple resource pools using node labels.
+
+- WebUI: Allow resource pool slot counts to reflect the state of the entire cluster. Allow slot
+  counts and scheduling to respect node selectors and affinities. This impacts Determined clusters
+  deployed on Kubernetes with multiple resource pools defined in terms of node selectors and/or
+  affinities.
+
+**Bug Fixes**
+
+- Kubernetes: Fix an issue where where jobs would remain in "QUEUED" state until all pods were
+  running. Jobs will now correctly show as "SCHEDULED" once all pods have been assigned to nodes.
+
+**ZZZ put these into an appropriate section!**
+
+Security Fixes
+
+   - CLI: When deploying locally using ``det deploy local`` with ``master-up`` or ``cluster-up``
+     commands and no user accounts have been created yet, an initial password will be automatically
+     generated and shown to the user (with the option to change it) if neither
+     ``security.initial_user_password`` in ``master.yaml`` nor the ``--initial-user-password`` CLI
+     flag is present.
+
+**Deprecations**
+
+- Agent Resource Manager: The round-robin scheduler has been removed. Its deprecation was announced
+  in release 0.33.0. Users should transition to the priority scheduler.
+- Machine Architectures: Support for PPC64/POWER builds for all environments has been deprecated
+  and is now being removed. Users should transition to ARM64/AMD64.
+
 **************
  Version 0.33
 **************
diff --git a/docs/release-notes/cluster-message.rst b/docs/release-notes/cluster-message.rst
deleted file mode 100644
index 5d06740dbc2..00000000000
--- a/docs/release-notes/cluster-message.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-:orphan:
-
-**New Features**
-
-- WebUI: Add the ability for administrators to use the CLI to set a message to be displayed on all
-  pages of the WebUI (for example, ``det master cluster-message set -m "Your message"``). Optional
-  flags are available for scheduling the message with a start time and an end time. Administrators
-  can clear the message anytime using ``det master cluster-message clear``. Only one message can be
-  active at a time, so setting a new message will replace the previous one.
diff --git a/docs/release-notes/deploy-local-auto-password.rst b/docs/release-notes/deploy-local-auto-password.rst
deleted file mode 100644
index dff9d31bda2..00000000000
--- a/docs/release-notes/deploy-local-auto-password.rst
+++ /dev/null
@@ -1,9 +0,0 @@
-:orphan:
-
-Security Fixes
-
-   - CLI: When deploying locally using ``det deploy local`` with ``master-up`` or ``cluster-up``
-     commands and no user accounts have been created yet, an initial password will be automatically
-     generated and shown to the user (with the option to change it) if neither
-     ``security.initial_user_password`` in ``master.yaml`` nor the ``--initial-user-password`` CLI
-     flag is present.
diff --git a/docs/release-notes/deprecate-ppc64.rst b/docs/release-notes/deprecate-ppc64.rst
deleted file mode 100644
index 4c64839526e..00000000000
--- a/docs/release-notes/deprecate-ppc64.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-:orphan:
-
-**Deprecations**
-
-- Machine Architectures: Support for PPC64/POWER builds for all environments has been deprecated
-  and is now being removed. Users should transition to ARM64/AMD64.
diff --git a/docs/release-notes/deprecate-roundrobin.rst b/docs/release-notes/deprecate-roundrobin.rst
deleted file mode 100644
index 63cf3b5afad..00000000000
--- a/docs/release-notes/deprecate-roundrobin.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-:orphan:
-
-**Deprecations**
-
-- Agent Resource Manager: Round robin scheduler is removed for Agent Resource Managers. Deprecation
-  was announded in release 0.33.0. Users should transition to priority scheduler.
diff --git a/docs/release-notes/feature-checkpointGC.rst b/docs/release-notes/feature-checkpointGC.rst
deleted file mode 100644
index 4c44a3bc4f0..00000000000
--- a/docs/release-notes/feature-checkpointGC.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-:orphan:
-
-**New Features**
-
-- Kubernetes: Add a feature where Determined offers the users to provide custom Checkpoint GC pod spec.
-  This configuration is done using the ``task_container_defaults.checkpointGcPodSpec`` field
-  within your ``value.yaml`` file. User can create a custom pod specification for CheckpointGC,
-  it will override the default experiment's pod spec settings. Determined by default uses the
-  experiment's pod spec, but by providing custom pod spec users have the flexibility to
-  customize and configure the pod spec directly in this field. User can tailor the garbage
-  collection settings according to the specific GC needs.
diff --git a/docs/release-notes/feature-node-selectors.rst b/docs/release-notes/feature-node-selectors.rst
deleted file mode 100644
index 984e3ffe868..00000000000
--- a/docs/release-notes/feature-node-selectors.rst
+++ /dev/null
@@ -1,14 +0,0 @@
-:orphan:
-
-**New Features**
-
-- Kubernetes Configuration: Allow Cluster administrators to define Determined resource pools on
-  Kubernetes using node selectors and/or affinities. Configure these settings at the default pod
-  spec level under ``task_container_defaults.cpu_pod_spec`` or
-  ``task_container_defaults.gpu_pod_spec``. This allows a single cluster to be divided into
-  multiple resource pools using node labels.
-
-- WebUI: Allow resource pool slot counts to reflect the state of the entire cluster. Allow slot
-  counts and scheduling to respect node selectors and affinities. This impacts Determined clusters
-  deployed on Kubernetes with multiple resource pools defined in terms of node selectors and/or
-  affinities.
diff --git a/docs/release-notes/framework-splitting.rst b/docs/release-notes/framework-splitting.rst
deleted file mode 100644
index f573d3f6d37..00000000000
--- a/docs/release-notes/framework-splitting.rst
+++ /dev/null
@@ -1,11 +0,0 @@
-:orphan:
-
-**Breaking Change**
-
-- Images: The default environment includes images that support PyTorch. TensorFlow users must
-  configure their experiments to target our non-default TensorFlow images. Details on this process
-  can be found at :ref:`set-environment-images`
-
-- Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
-  version, users have the flexibility to build their own images using any NGC version that meets
-  their specific requirements. For more information, visit :ref:`ngc-version`
diff --git a/docs/release-notes/gateway.rst b/docs/release-notes/gateway.rst
deleted file mode 100644
index 3eff08b5a8e..00000000000
--- a/docs/release-notes/gateway.rst
+++ /dev/null
@@ -1,20 +0,0 @@
-:orphan:
-
-**New Features**
-
-- Kubernetes: The :ref:`Internal Task Gateway ` feature enables Determined
-  tasks running on remote Kubernetes clusters to be exposed to the Determined master and proxies.
-  This feature facilitates multi-resource manager setups by configuring a Gateway controller in the
-  external Kubernetes cluster.
-
-.. important::
-
-   Enabling this feature exposes Determined tasks to the outside world. It is crucial to implement
-   appropriate security measures to restrict access to exposed tasks and secure communication
-   between the external cluster and the main cluster. Recommended measures include:
-
-   - Setting up a firewall
-   - Using a VPN
-   - Implementing IP whitelisting
-   - Configuring Kubernetes Network Policies
-   - Employing other security measures as needed
diff --git a/docs/release-notes/job-state.rst b/docs/release-notes/job-state.rst
deleted file mode 100644
index 1ba3deb025a..00000000000
--- a/docs/release-notes/job-state.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-:orphan:
-
-**Bug Fixes**
-
-- Kubernetes: Fix an issue where where jobs would remain in "QUEUED" state until all pods were
-  running. Jobs will now correctly show as "SCHEDULED" once all pods have been assigned to nodes.
diff --git a/docs/release-notes/pods-to-jobs.rst b/docs/release-notes/pods-to-jobs.rst
deleted file mode 100644
index ef6a026cc74..00000000000
--- a/docs/release-notes/pods-to-jobs.rst
+++ /dev/null
@@ -1,10 +0,0 @@
-:orphan:
-
-**New Features**
-
-- Kubernetes: The system now launches Kubernetes jobs on behalf of users when they submit workloads
-  to Determined, instead of launching Kubernetes pods. This change allows Determined to work
-  properly with other Kubernetes features like resource quotas.
-
-  As a result, permissions are now required to create, get, list, delete, and watch Kubernetes job
-  resources.

From b00026cd48c234c30a1229dfdb94893277ccc0a8 Mon Sep 17 00:00:00 2001
From: Keita Fish
Date: Mon, 24 Jun 2024 16:18:50 -0700
Subject: [PATCH 02/10] Update docs/release-notes.rst

Co-authored-by: Tara
---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 45e7ca95884..080fbb791eb 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -17,7 +17,7 @@ Version 0.34.0

 **Breaking Changes**

-- Images: The default environment includes images that support PyTorch. TensorFlow users must
+- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users must
   configure their experiments to target our non-default TensorFlow images. Details on this process
   can be found at :ref:`set-environment-images`


From 50d97d98d73c1a5afd00771cb437e42d52e84af9 Mon Sep 17 00:00:00 2001
From: Keita Fish
Date: Mon, 24 Jun 2024 16:18:58 -0700
Subject: [PATCH 03/10] Update docs/release-notes.rst

Co-authored-by: Tara
---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 080fbb791eb..41e7284d3e7 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -22,7 +22,7 @@ Version 0.34.0
   can be found at :ref:`set-environment-images`

 - Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
-  version, users have the flexibility to build their own images using any NGC version that meets
+version, users can build their own images using any NGC version that meets
   their specific requirements. For more information, visit :ref:`ngc-version`

 **New Features**

From e1cb0ffaae5bf6998775b4199e0e277bace2d1f7 Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Tue, 25 Jun 2024 09:41:53 -0700
Subject: [PATCH 04/10] chore: fmt

---
 docs/release-notes.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 41e7284d3e7..bf3c4dd0118 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -17,13 +17,13 @@ Version 0.34.0

 **Breaking Changes**

-- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users must
-  configure their experiments to target our non-default TensorFlow images. Details on this process
-  can be found at :ref:`set-environment-images`
+- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users
+  must configure their experiments to target our non-default TensorFlow images. Details on this
+  process can be found at :ref:`set-environment-images`

 - Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
-version, users can build their own images using any NGC version that meets
-  their specific requirements. For more information, visit :ref:`ngc-version`
+  version, users can build their own images using any NGC version that meets their specific
+  requirements. For more information, visit :ref:`ngc-version`

 **New Features**


From 19ddb04cf25006c35643e15b96f883ea43b985b4 Mon Sep 17 00:00:00 2001
From: Keita Fish
Date: Wed, 26 Jun 2024 11:53:07 -0700
Subject: [PATCH 05/10] Update docs/release-notes.rst

Co-authored-by: Tara
---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index bf3c4dd0118..b28326af90d 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -17,7 +17,7 @@ Version 0.34.0

 **Breaking Changes**

-- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users
+- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users.
   must configure their experiments to target our non-default TensorFlow images. Details on this
   process can be found at :ref:`set-environment-images`


From a0e501ab31fdee9ad8c2ef8b624744b2de523ddf Mon Sep 17 00:00:00 2001
From: Keita Fish
Date: Wed, 26 Jun 2024 11:53:12 -0700
Subject: [PATCH 06/10] Update docs/release-notes.rst

Co-authored-by: Tara
---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index b28326af90d..30f905987dd 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -19,7 +19,7 @@ Version 0.34.0

 - Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users.
   must configure their experiments to target our non-default TensorFlow images. Details on this
-  process can be found at :ref:`set-environment-images`
+  process can be found at :ref:`set-environment-images`.

 - Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
   version, users can build their own images using any NGC version that meets their specific

From 4735222f0f196cc6858cea8c94ac99f65d13f542 Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Wed, 26 Jun 2024 14:50:54 -0700
Subject: [PATCH 07/10] docs: add more notes

---
 docs/release-notes.rst      | 10 ++++++----
 docs/release-notes/idle.rst |  6 ------
 2 files changed, 6 insertions(+), 10 deletions(-)
 delete mode 100644 docs/release-notes/idle.rst

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 30f905987dd..17f56efc272 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -17,9 +17,9 @@ Version 0.34.0

 **Breaking Changes**

-- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users.
-  must configure their experiments to target our non-default TensorFlow images. Details on this
-  process can be found at :ref:`set-environment-images`.
+- Images: The default environment includes images that support PyTorch. Therefore, TensorFlow users
+  must configure their experiments to target our non-default TensorFlow images. Details on this
+  process can be found at :ref:`set-environment-images`.

 - Images: Our new default images are based on Nvidia NGC. While we provide a recommended NGC
   version, users can build their own images using any NGC version that meets their specific
@@ -80,6 +80,8 @@ Version 0.34.0

-- Kubernetes: Fix an issue where where jobs would remain in "QUEUED" state until all pods were
+- Kubernetes: Fix an issue where jobs would remain in "QUEUED" state until all pods were
   running. Jobs will now correctly show as "SCHEDULED" once all pods have been assigned to nodes.
+- Notebooks: Fix an issue introduced in 0.30.0 where idle notebooks were not terminated as
+  expected.

 **ZZZ put these into an appropriate section!**

diff --git a/docs/release-notes/idle.rst b/docs/release-notes/idle.rst
deleted file mode 100644
index 56ea5be25ad..00000000000
--- a/docs/release-notes/idle.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-:orphan:
-
-**Bug Fixes**
-
-- Notebooks: Fix an issue introduced in 0.30.0 where idle notebooks were not terminated as
-  expected.

From 65daf435fb233c7b58042a20f82087a9000fd074 Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Wed, 26 Jun 2024 15:54:14 -0700
Subject: [PATCH 08/10] docs: minor fixes

---
 docs/release-notes.rst | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 17f56efc272..db976bfa77b 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -83,9 +83,7 @@ Version 0.34.0
 - Notebooks: Fix an issue introduced in 0.30.0 where idle notebooks were not terminated as
   expected.

-**ZZZ put these into an appropriate section!**
-
-Security Fixes
+**Security Fixes**

    - CLI: When deploying locally using ``det deploy local`` with ``master-up`` or ``cluster-up``
      commands and no user accounts have been created yet, an initial password will be automatically

From e6b260b4f33e67ec501fb49cad96115e00599939 Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Wed, 26 Jun 2024 15:55:29 -0700
Subject: [PATCH 09/10] docs: modify release date

---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index db976bfa77b..689d172194a 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -13,7 +13,7 @@
 Version 0.34.0
 ==============

-**Release Date:** June 24, 2024
+**Release Date:** June 27, 2024

 **Breaking Changes**


From f508a4dec4078f1ebd8e774401106c1a4d97e18c Mon Sep 17 00:00:00 2001
From: Keita Nonaka
Date: Thu, 27 Jun 2024 16:34:47 -0700
Subject: [PATCH 10/10] docs: modify release date

---
 docs/release-notes.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 689d172194a..63330539f92 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -13,7 +13,7 @@
 Version 0.34.0
 ==============

-**Release Date:** June 27, 2024
+**Release Date:** June 28, 2024

 **Breaking Changes**

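
The 0.34.0 Kubernetes entry in this series states that Determined now launches Kubernetes Jobs instead of bare pods. As a result, it needs permission to create, get, list, delete, and watch Job resources. The sketch below shows how those verbs could map onto a standard Kubernetes RBAC Role and RoleBinding. It assumes the Determined master runs under a namespaced service account. Every name in it (``determined-job-manager``, ``determined-master``, and the ``default`` namespace) is a hypothetical placeholder, not taken from the Determined chart. If jobs are launched in several namespaces, you would need a ClusterRole or one Role per namespace instead.

```yaml
# Minimal sketch: grant the Job verbs listed in the 0.34.0 notes.
# All names and the namespace are hypothetical placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: determined-job-manager        # hypothetical Role name
  namespace: default                  # assumed namespace for Determined workloads
rules:
  - apiGroups: ["batch"]              # Kubernetes Jobs are served by the batch API group
    resources: ["jobs"]
    verbs: ["create", "get", "list", "delete", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: determined-job-manager-binding   # hypothetical
  namespace: default
subjects:
  - kind: ServiceAccount
    name: determined-master              # hypothetical service account for the master
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: determined-job-manager
```

After applying the manifest with ``kubectl apply -f``, you can check the grant with ``kubectl auth can-i watch jobs --as=system:serviceaccount:default:determined-master -n default``.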