chore: bumpenvs for efs-utils #9309

Merged: 1 commit, May 6, 2024
16 changes: 8 additions & 8 deletions .circleci/real_config.yml
@@ -37,7 +37,7 @@ parameters:
# be referenced by --ee testing.
default-pt-gpu-image:
type: string
default: determinedai/pytorch-tensorflow-cuda-dev:f17151a
default: determinedai/pytorch-tensorflow-cuda-dev:8b3bea3
# Some python, go, and react dependencies are cached by circleci via `save_cache`/`restore_cache`.
# If the dependencies stay the same, but the circleci code that would produce them is changed,
# it may be necessary to invalidate the cache by incrementing this value.
@@ -234,7 +234,7 @@ commands:
- when:
condition: <<parameters.tf2>>
steps:
- run: docker pull determinedai/pytorch-tensorflow-cpu-dev:f17151a
- run: docker pull determinedai/pytorch-tensorflow-cpu-dev:8b3bea3

login-docker:
parameters:
@@ -2298,7 +2298,7 @@ jobs:

test-unit-harness-gpu:
docker:
- image: determinedai/pytorch-tensorflow-cuda-dev:f17151a
- image: determinedai/pytorch-tensorflow-cuda-dev:8b3bea3
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2320,7 +2320,7 @@ jobs:

test-unit-harness-pytorch2-gpu:
docker:
- image: determinedai/pytorch-cuda-dev:f17151a
- image: determinedai/pytorch-cuda-dev:8b3bea3
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2342,7 +2342,7 @@ jobs:

test-unit-harness-pytorch2-cpu:
docker:
- image: determinedai/pytorch-cpu-dev:f17151a
- image: determinedai/pytorch-cpu-dev:8b3bea3
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
- checkout
@@ -2363,7 +2363,7 @@ jobs:

test-unit-harness-gpu-parallel:
docker:
- image: determinedai/pytorch-tensorflow-cuda-dev:f17151a
- image: determinedai/pytorch-tensorflow-cuda-dev:8b3bea3
resource_class: determined-ai/container-runner-multi-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -2385,7 +2385,7 @@ jobs:

test-unit-harness-gpu-deepspeed:
docker:
- image: determinedai/pytorch-ngc-dev:f17151a
- image: determinedai/pytorch-ngc-dev:8b3bea3
resource_class: determined-ai/container-runner-gpu
steps:
- run: mkdir -p ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
@@ -3619,7 +3619,7 @@ jobs:
type: string
default: "1"
environment-image:
default: determinedai/pytorch-tensorflow-cuda-dev:f17151a
default: determinedai/pytorch-tensorflow-cuda-dev:8b3bea3
type: string
accel-node-taints:
type: string
4 changes: 2 additions & 2 deletions docs/model-dev-guide/api-guides/apis-howto/_index.rst
@@ -76,13 +76,13 @@ experiment is controlled by the container image that has been configured for tha
Determined provides prebuilt Docker images that include TensorFlow 2.11, 1.15, and 2.8,
respectively:

- ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` (default)
- ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` (default)
- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2``
- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1``

We also provide lightweight CPU-only counterparts:

- ``determinedai/pytorch-tensorflow-cpu-dev:f17151a``
- ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3``
- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1``

To change the container image used for an experiment, specify :ref:`environment.image
8 changes: 4 additions & 4 deletions docs/model-dev-guide/prepare-container/custom-env.rst
@@ -101,9 +101,9 @@ Default Images
+-------------+-------------------------------------------------------------------------------+
| Environment | File Name |
+=============+===============================================================================+
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` |
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` |
+-------------+-------------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` |
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` |
+-------------+-------------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` |
+-------------+-------------------------------------------------------------------------------+
@@ -132,7 +132,7 @@ Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based
.. code:: bash

# Determined Image
FROM determinedai/pytorch-tensorflow-cuda-dev:f17151a
FROM determinedai/pytorch-tensorflow-cuda-dev:8b3bea3

# Custom Configuration
RUN apt-get update && \
@@ -195,7 +195,7 @@ environments using :ref:`custom images <custom-docker-images>`:
.. code:: bash

# Determined Image
FROM determinedai/pytorch-tensorflow-cpu-dev:f17151a
FROM determinedai/pytorch-tensorflow-cpu-dev:8b3bea3

# Create a virtual environment
RUN conda create -n myenv python=3.8
4 changes: 2 additions & 2 deletions docs/reference/deploy/helm-config-reference.rst
@@ -194,13 +194,13 @@

- ``cpuImage``: Sets the default Docker image for all non-GPU tasks. If a Docker image is
specified in the :ref:`experiment config <exp-environment-image>` this default is overridden.
Defaults to: ``determinedai/pytorch-tensorflow-cpu-dev:f17151a``.
Defaults to: ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3``.

- ``startupHook``: An optional inline script that will be executed as part of task set up.

- ``gpuImage``: Sets the default Docker image for all GPU tasks. If a Docker image is specified
in the :ref:`experiment config <exp-environment-image>` this default is overridden. Defaults
to: ``determinedai/pytorch-tensorflow-cuda-dev:f17151a``.
to: ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3``.

- ``logPolicies``: Sets log policies for trials. For details, visit :ref:`log_policies
<experiment-config-min-validation-period>`.
4 changes: 2 additions & 2 deletions docs/reference/deploy/master-config-reference.rst
@@ -89,9 +89,9 @@ configure different container images for NVIDIA GPU tasks using the ``cuda`` key
Determined 0.17.6), CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using the ``rocm`` key.
Default values:

- ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` for NVIDIA GPUs.
- ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` for NVIDIA GPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.
- ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` for CPUs.
- ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` for CPUs.

``environment_variables``
=========================
4 changes: 2 additions & 2 deletions docs/reference/experiment-config-reference.rst
@@ -1333,8 +1333,8 @@ Optional. The Docker image to use when executing the workload. This image must b
container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6), CPU tasks using
``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` for NVIDIA GPUs.
- ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` for CPUs.
- ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` for NVIDIA GPUs.
- ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` for CPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.

When the cluster is configured with :ref:`resource_manager.type: slurm
4 changes: 2 additions & 2 deletions docs/reference/job-config-reference.rst
@@ -45,9 +45,9 @@ The following configuration settings are supported:
different container images for NVIDIA GPU tasks using ``cuda`` key (``gpu`` prior to 0.17.6),
CPU tasks using ``cpu`` key, and ROCm (AMD GPU) tasks using ``rocm`` key. Default values:

- ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` for NVIDIA GPUs.
- ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` for NVIDIA GPUs.
- ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4`` for ROCm.
- ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` for CPUs.
- ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` for CPUs.

- ``force_pull_image``: Forcibly pull the image from the Docker registry and bypass the Docker
cache. Defaults to ``false``.
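The key-based image selection described above (``cuda``, ``cpu``, ``rocm``, with ``gpu`` as the older spelling of ``cuda``) can be pictured with a small Python sketch. This is only an illustration, not Determined's actual resolution code: the helper name ``resolve_image``, the ``DEFAULT_IMAGES`` table, and the assumption that ``image`` may be unset, a single string, or a per-key map are all introduced here for the example.

```python
from typing import Mapping, Union

# Default images as listed in the docs above (post-bump tags).
DEFAULT_IMAGES = {
    "cuda": "determinedai/pytorch-tensorflow-cuda-dev:8b3bea3",
    "cpu": "determinedai/pytorch-tensorflow-cpu-dev:8b3bea3",
    "rocm": "determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4",
}


def resolve_image(image: Union[None, str, Mapping[str, str]], slot_type: str) -> str:
    """Hypothetical helper: pick the image for a task of the given slot type.

    Assumes `image` is either unset (None), one string used for every
    slot type, or a mapping keyed by "cuda" / "cpu" / "rocm".
    """
    if slot_type not in DEFAULT_IMAGES:
        raise ValueError(f"unknown slot type: {slot_type!r}")
    if image is None:
        return DEFAULT_IMAGES[slot_type]
    if isinstance(image, str):
        return image
    # Treat "gpu" as the legacy spelling of "cuda" (assumption for this sketch).
    if slot_type == "cuda" and "cuda" not in image and "gpu" in image:
        return image["gpu"]
    # Fall back to the documented default when the key is missing.
    return image.get(slot_type, DEFAULT_IMAGES[slot_type])
```

For example, ``resolve_image({"gpu": "determinedai/pytorch-ngc-dev:8b3bea3"}, "cuda")`` returns the NGC image, while the same map queried with ``"cpu"`` falls back to the CPU default.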
4 changes: 2 additions & 2 deletions docs/setup-cluster/deploy-cluster/slurm/singularity.rst
@@ -26,9 +26,9 @@ by default in this version of Determined are described below.
+-------------+--------------------------------------------------------------------------+
| Environment | File Name |
+=============+==========================================================================+
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` |
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` |
+-------------+--------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` |
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` |
+-------------+--------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512`` |
+-------------+--------------------------------------------------------------------------+
4 changes: 2 additions & 2 deletions docs/setup-cluster/gcp/install-gcp.rst
@@ -406,5 +406,5 @@ This command line will spin up a cluster of up to 2 A100s in the ``us-central1-c
--compute-agent-instance-type a2-highgpu-1g --gpu-num 1 \
--gpu-type nvidia-tesla-a100 \
--region us-central1 --zone us-central1-c \
--gpu-env-image determinedai/pytorch-tensorflow-cuda-dev:f17151a \
--cpu-env-image determinedai/pytorch-tensorflow-cpu-dev:f17151a
--gpu-env-image determinedai/pytorch-tensorflow-cuda-dev:8b3bea3 \
--cpu-env-image determinedai/pytorch-tensorflow-cpu-dev:8b3bea3
4 changes: 2 additions & 2 deletions docs/setup-cluster/slurm/singularity.rst
@@ -26,9 +26,9 @@ by default in this version of Determined are described below.
+-------------+--------------------------------------------------------------------------+
| Environment | File Name |
+=============+==========================================================================+
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:f17151a`` |
| CPUs | ``determinedai/pytorch-tensorflow-cpu-dev:8b3bea3`` |
+-------------+--------------------------------------------------------------------------+
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:f17151a`` |
| NVIDIA GPUs | ``determinedai/pytorch-tensorflow-cuda-dev:8b3bea3`` |
+-------------+--------------------------------------------------------------------------+
| AMD GPUs | ``determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-622d512`` |
+-------------+--------------------------------------------------------------------------+
2 changes: 1 addition & 1 deletion docs/setup-cluster/slurm/slurm-requirements.rst
@@ -510,7 +510,7 @@ platform. There may be additional per-user configuration that is required.

.. code:: bash

image=determinedai/pytorch-tensorflow-cuda-dev:f17151a
image=determinedai/pytorch-tensorflow-cuda-dev:8b3bea3
cd /shared/enroot/images
enroot import docker://$image
enroot create /shared/enroot/images/${image//[\/:]/\+}.sqsh
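The ``${image//[\/:]/\+}`` expansion in the snippet above replaces every ``/`` and ``:`` in the image reference with ``+`` to form the ``.sqsh`` file name. A minimal Python equivalent, assuming that reading of the bash expansion, is sketched below; the function name ``enroot_squashfs_name`` is made up for this example.

```python
import re


def enroot_squashfs_name(image: str) -> str:
    """Mirror the bash expansion ${image//[\/:]/\+} and append ".sqsh".

    Every "/" and ":" in the Docker image reference becomes "+".
    """
    return re.sub(r"[/:]", "+", image) + ".sqsh"


if __name__ == "__main__":
    print(enroot_squashfs_name("determinedai/pytorch-tensorflow-cuda-dev:8b3bea3"))
    # determinedai+pytorch-tensorflow-cuda-dev+8b3bea3.sqsh
```

Because the tag is part of the file name, a bump such as this one produces a new ``.sqsh`` name, so the image has to be imported again under the new tag.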
12 changes: 6 additions & 6 deletions e2e_tests/tests/config.py
@@ -14,12 +14,12 @@
MAX_TRIAL_BUILD_SECS = 90


DEFAULT_TF2_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:f17151a"
DEFAULT_TF2_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda-dev:f17151a"
DEFAULT_PT_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:f17151a"
DEFAULT_PT_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda-dev:f17151a"
DEFAULT_PT2_CPU_IMAGE = "determinedai/pytorch-cpu-dev:f17151a"
DEFAULT_PT2_GPU_IMAGE = "determinedai/pytorch-cuda-dev:f17151a"
DEFAULT_TF2_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:8b3bea3"
DEFAULT_TF2_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda-dev:8b3bea3"
DEFAULT_PT_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:8b3bea3"
DEFAULT_PT_GPU_IMAGE = "determinedai/pytorch-tensorflow-cuda-dev:8b3bea3"
DEFAULT_PT2_CPU_IMAGE = "determinedai/pytorch-cpu-dev:8b3bea3"
DEFAULT_PT2_GPU_IMAGE = "determinedai/pytorch-cuda-dev:8b3bea3"

TF2_CPU_IMAGE = os.environ.get("TF2_CPU_IMAGE") or DEFAULT_TF2_CPU_IMAGE
TF2_GPU_IMAGE = os.environ.get("TF2_GPU_IMAGE") or DEFAULT_TF2_GPU_IMAGE
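The ``os.environ.get(...) or DEFAULT_...`` pattern above means an environment variable that is unset or set to the empty string both fall back to the default image. The sketch below isolates that behavior; the helper ``image_from_env`` and its ``environ`` parameter are hypothetical and exist only so the fallback can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

DEFAULT_TF2_CPU_IMAGE = "determinedai/pytorch-tensorflow-cpu-dev:8b3bea3"


def image_from_env(
    var: str, default: str, environ: Optional[Mapping[str, str]] = None
) -> str:
    """Return the override from the environment, else the default.

    Using `or` (rather than a default argument to .get) means an empty
    string is treated the same as an unset variable.
    """
    env = os.environ if environ is None else environ
    return env.get(var) or default


if __name__ == "__main__":
    # Unset and empty both yield the default; a non-empty value wins.
    print(image_from_env("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE, {}))
    print(image_from_env("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE, {"TF2_CPU_IMAGE": ""}))
    print(image_from_env("TF2_CPU_IMAGE", DEFAULT_TF2_CPU_IMAGE, {"TF2_CPU_IMAGE": "my/image:dev"}))
```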
@@ -2,7 +2,7 @@ name: torchvision dsat core_api
max_restarts: 0
environment:
image:
gpu: determinedai/pytorch-ngc-dev:f17151a
gpu: determinedai/pytorch-ngc-dev:8b3bea3
resources:
slots_per_trial: 2
shm_size: 4294967296 # 4 GiB.
@@ -2,7 +2,7 @@ name: torchvision dsat deepspeed_trial
max_restarts: 0
environment:
image:
gpu: determinedai/pytorch-ngc-dev:f17151a
gpu: determinedai/pytorch-ngc-dev:8b3bea3
resources:
slots_per_trial: 2
shm_size: 4294967296 # 4 GiB.
@@ -6,7 +6,7 @@ environment:
# You may need to modify this to match your network configuration.
- NCCL_SOCKET_IFNAME=ens,eth,ib
image:
gpu: determinedai/pytorch-ngc-dev:f17151a
gpu: determinedai/pytorch-ngc-dev:8b3bea3
resources:
slots_per_trial: 2
searcher:
@@ -6,7 +6,7 @@ environment:
# You may need to modify this to match your network configuration.
- NCCL_SOCKET_IFNAME=ens,eth,ib
image:
gpu: determinedai/pytorch-ngc-dev:f17151a
gpu: determinedai/pytorch-ngc-dev:8b3bea3
resources:
slots_per_trial: 2
searcher:
20 changes: 10 additions & 10 deletions harness/determined/deploy/aws/templates/efs.yaml
@@ -3,35 +3,35 @@ Mappings:
RegionMap:
ap-northeast-1:
Master: ami-00910ef9457f0df47
Agent: ami-0d281621f7b2c8ff5
Agent: ami-0d17768c113bc7053
# TODO(DET-4258) Uncomment these when we fully support all P3 regions.
# ap-northeast-2:
# Master: ami-035e3e44dc41db6a2
# Agent: ami-001dc7a236d6004e5
# Agent: ami-06542874b4ab6003d
# ap-southeast-1:
# Master: ami-0fd1ee6c8b656f020
# Agent: ami-0b1a9b3e77f9bf206
# Agent: ami-03d1fba6f5c61a233
# ap-southeast-2:
# Master: ami-0b62ecd3babd1c548
# Agent: ami-0148e5b87c7f1d6fa
# Agent: ami-0a662e555f59bc56f
eu-central-1:
Master: ami-0abbe417ed83c0b29
Agent: ami-094e3b0134303c6ae
Agent: ami-0a510a21bcbb901a1
eu-west-1:
Master: ami-0e3f7dd2dc743e48a
Agent: ami-0d39b1b19dc21c921
Agent: ami-0287a5461ed8dbf38
# eu-west-2:
# Master: ami-0d78429fb6af30994
# Agent: ami-0c250aa7e5393105d
# Agent: ami-08b973d39f7eee569
us-east-1:
Master: ami-0172070f66a8ebe63
Agent: ami-0797876208dd69d72
Agent: ami-0f49364aa77da45f2
us-east-2:
Master: ami-0bafa3699418551cd
Agent: ami-0d83d216812c079f0
Agent: ami-01168798b1af4c386
us-west-2:
Master: ami-0ceeab680f529cc36
Agent: ami-08525450ea29d0306
Agent: ami-03081ca9dd5286010

Parameters:
VpcCIDR:
20 changes: 10 additions & 10 deletions harness/determined/deploy/aws/templates/fsx.yaml
@@ -3,35 +3,35 @@ Mappings:
RegionMap:
ap-northeast-1:
Master: ami-00910ef9457f0df47
Agent: ami-0d281621f7b2c8ff5
Agent: ami-0d17768c113bc7053
# TODO(DET-4258) Uncomment these when we fully support all P3 regions.
# ap-northeast-2:
# Master: ami-035e3e44dc41db6a2
# Agent: ami-001dc7a236d6004e5
# Agent: ami-06542874b4ab6003d
# ap-southeast-1:
# Master: ami-0fd1ee6c8b656f020
# Agent: ami-0b1a9b3e77f9bf206
# Agent: ami-03d1fba6f5c61a233
# ap-southeast-2:
# Master: ami-0b62ecd3babd1c548
# Agent: ami-0148e5b87c7f1d6fa
# Agent: ami-0a662e555f59bc56f
eu-central-1:
Master: ami-0abbe417ed83c0b29
Agent: ami-094e3b0134303c6ae
Agent: ami-0a510a21bcbb901a1
eu-west-1:
Master: ami-0e3f7dd2dc743e48a
Agent: ami-0d39b1b19dc21c921
Agent: ami-0287a5461ed8dbf38
# eu-west-2:
# Master: ami-0d78429fb6af30994
# Agent: ami-0c250aa7e5393105d
# Agent: ami-08b973d39f7eee569
us-east-1:
Master: ami-0172070f66a8ebe63
Agent: ami-0797876208dd69d72
Agent: ami-0f49364aa77da45f2
us-east-2:
Master: ami-0bafa3699418551cd
Agent: ami-0d83d216812c079f0
Agent: ami-01168798b1af4c386
us-west-2:
Master: ami-0ceeab680f529cc36
Agent: ami-08525450ea29d0306
Agent: ami-03081ca9dd5286010

Parameters:
VpcCIDR:
4 changes: 2 additions & 2 deletions harness/determined/deploy/aws/templates/govcloud.yaml
@@ -5,10 +5,10 @@ Mappings:
RegionMap:
us-gov-east-1:
Master: ami-04ef693ebcf519dc3
Agent: ami-04b72bb3e6fc6c248
Agent: ami-0f703fc4950620f0d
us-gov-west-1:
Master: ami-08bd15d820a3c087e
Agent: ami-0920e23d7acae3b90
Agent: ami-0c6d1e5673cb0e483
Parameters:
Keypair:
Type: AWS::EC2::KeyPair::KeyName
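Most of the image changes in this diff are the same mechanical substitution: the tag ``f17151a`` becomes ``8b3bea3`` on every ``determinedai/...-dev`` image reference. The Python sketch below shows one way such a substitution could be expressed. It is not the repository's bumpenvs tooling, whose internals are not shown in this diff; ``bump_image_tag`` is a hypothetical helper, and it deliberately leaves the AMI ID changes and the ``determinedai/environments:...`` tags alone.

```python
import re
from typing import Tuple


def bump_image_tag(text: str, old_tag: str, new_tag: str) -> Tuple[str, int]:
    """Replace `old_tag` with `new_tag` on determinedai/<name>:<tag> references.

    Returns the rewritten text and the number of references changed.
    Only exact tag matches are touched, so other tags stay as they are.
    """
    pattern = re.compile(
        r"(determinedai/[A-Za-z0-9._-]+:)" + re.escape(old_tag) + r"\b"
    )
    # A function replacement avoids any backslash handling in new_tag.
    return pattern.subn(lambda m: m.group(1) + new_tag, text)


if __name__ == "__main__":
    sample = (
        "default: determinedai/pytorch-tensorflow-cuda-dev:f17151a\n"
        "rocm: determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4\n"
    )
    bumped, count = bump_image_tag(sample, "f17151a", "8b3bea3")
    print(count)
    print(bumped)
```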