diff --git a/docs/model-dev-guide/api-guides/apis-howto/_index.rst b/docs/model-dev-guide/api-guides/apis-howto/_index.rst index f8769e88c07..1d876479e02 100644 --- a/docs/model-dev-guide/api-guides/apis-howto/_index.rst +++ b/docs/model-dev-guide/api-guides/apis-howto/_index.rst @@ -56,38 +56,6 @@ Prefer to use an Example Model? If you'd like to build off of an existing model that already runs on Determined, visit our :ref:`example-solutions` to see if the model you'd like to train is already available. -******************** - TensorFlow Support -******************** - -TensorFlow Core Models -====================== - -Determined has support for TensorFlow models that use the :ref:`Keras ` API. For -models that use the low-level TensorFlow Core APIs, we recommend wrapping your model in Keras, as -recommended by the official `TensorFlow `_ -documentation. - -TensorFlow 1 vs 2 -================= - -Determined supports both TensorFlow 1 and 2. The version of TensorFlow that is used for a particular -experiment is controlled by the container image that has been configured for that experiment. -Determined provides prebuilt Docker images that include TensorFlow 2+, 1.15, and 2.8, respectively: - -- ``determinedai/tensorflow-ngc-dev:e960eae`` -- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2`` -- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1`` - -We also provide lightweight CPU-only counterparts: - -- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1`` - -To change the container image used for an experiment, specify :ref:`environment.image -` in the experiment configuration file. Please see :ref:`container-images` -for more details about configuring training environments and a more complete list of prebuilt Docker -images. 
- ****************** AMD ROCm Support ****************** diff --git a/docs/model-dev-guide/prepare-container/_index.rst b/docs/model-dev-guide/prepare-container/_index.rst index cb5708f6568..fa984156841 100644 --- a/docs/model-dev-guide/prepare-container/_index.rst +++ b/docs/model-dev-guide/prepare-container/_index.rst @@ -13,6 +13,9 @@ Find resources and operations for preparing your container environment. | :ref:`custom-env` | How to set environment variables, use a startup hook, and use custom | | | and default Docker images. | +-------------------------------+----------------------------------------------------------------------+ +| :ref:`tensorflow-support` | How to use TensorFlow Core models with Keras, support for TensorFlow | +| | 1 and 2, and how to configure container images. | ++-------------------------------+----------------------------------------------------------------------+ .. toctree:: :maxdepth: 1 @@ -20,3 +23,4 @@ Find resources and operations for preparing your container environment. Set Environment Images Customize Your Environment + TensorFlow Support diff --git a/docs/model-dev-guide/prepare-container/custom-env.rst b/docs/model-dev-guide/prepare-container/custom-env.rst index dbf7e928709..65d01bce147 100644 --- a/docs/model-dev-guide/prepare-container/custom-env.rst +++ b/docs/model-dev-guide/prepare-container/custom-env.rst @@ -4,7 +4,7 @@ Customize Your Environment ############################ -Determined launches workloads using Docker containers. By default, workloads execute inside a +Determined launches workloads using Docker containers. By default, workloads run inside a Determined-provided container that includes common deep learning libraries and frameworks. If your model code has additional dependencies, the easiest way to install them is to specify a @@ -34,10 +34,10 @@ format is a list of ``NAME=VALUE`` strings. 
For example: - C=${B} Variables are set sequentially, which affect variables that depend on the expansion of other -variables. In the example, names ``A``, ``B``, and ``C`` each have the value ``hello_world`` in the +variables. In the example, ``A``, ``B``, and ``C`` each have the value ``hello_world`` in the container. -Proxy variables set in this way take precedent over variables set in the :ref:`agent configuration +Proxy variables set in this way take precedence over variables set in the :ref:`agent configuration `. You can also set variables for each accelerator type, separately: @@ -59,16 +59,25 @@ You can also set variables for each accelerator type, separately: Startup Hooks *************** -If a ``startup-hook.sh`` file exists in the top level of your model definition directory, this file -is automatically run with every Docker container startup. This occurs before any Python interpreters -are launched or deep learning operations are performed. The startup hook can be used to customize -the container environment, install additional dependencies, and download data sets among other shell -script commands. +If a ``startup-hook.sh`` file exists in the top level of your model definition directory (for +experiments), or context directory (for shells, notebooks, and TensorBoards), it is automatically +run with every Docker container startup before any Python interpreters are launched or deep learning +operations are performed. The startup hook can customize the container environment, install +additional dependencies, and download datasets, among other shell script commands. + +.. note:: + + ``startup-hook.sh`` does not apply to ``det cmd``. It applies to experiments, notebooks, shells, + and TensorBoards, but not commands. + +For shells, notebooks, and TensorBoards, make sure to supply the context directory using the +``--context`` or ``-c`` option. You can also use the ``--include`` option, though it may require +more directory management. 
Startup hooks are not cached and run before the start of every workload, so expensive or long-running operations in a startup hook can result in poor performance. -This example startup hook installs the ``wget`` utility and the ``pandas`` Python package: +Example startup hook to install the ``wget`` utility and the ``pandas`` Python package: .. code:: bash @@ -116,13 +125,12 @@ Default Images NGC Version =========== -By default, a suitable NGC container version is used in our images. Users can select a different +By default, a suitable NGC container version is used in our images. You can select a different version of NGC containers to build images from. Versions are listed on the `NVIDIA Frameworks site -`__. Once a suitable -version is selected, users can rebuild these images by cloning the `MLDE environments repo -`__ and modifying either NGC_PYTORCH_VERSION or -NGC_TENSORFLOW_VERSION variables in the MakeFile, then running `make build-pytorch-ngc` or `make -build-tensorflow-ngc` respectively. +`__. To build custom +images, clone the `MLDE environments repo `__, +modify the ``NGC_PYTORCH_VERSION`` or ``NGC_TENSORFLOW_VERSION`` variables in the Makefile, and run +``make build-pytorch-ngc`` or ``make build-tensorflow-ngc``, respectively. .. _custom-docker-images: @@ -130,18 +138,17 @@ Custom Images ============= While the official images contain all the dependencies needed for basic deep learning workloads, -many workloads have additional dependencies. If the extra dependencies are quick to install, you -might consider using a :ref:`startup hook `. Where installing dependencies using -``startup-hook.sh`` takes too long, it is recommended that you build your own Docker image and -publish to a Docker registry, such as `Docker Hub `__. +many workloads have additional dependencies. If the extra dependencies are quick to install, use a +:ref:`startup hook `. 
If installing dependencies using ``startup-hook.sh`` takes too +long, build your own Docker image and publish it to a Docker registry, such as `Docker Hub +`__. .. warning:: Do NOT install TensorFlow, PyTorch, Horovod, or Apex packages, which conflict with Determined-installed packages. -It is recommended that custom images use one of the official Determined images as a base image, -using the ``FROM`` instruction. +Use one of the official Determined images as a base image in the ``FROM`` instruction. Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based dependencies: @@ -162,8 +169,8 @@ Example Dockerfile that installs custom ``conda``-, ``pip``-, and ``apt``-based conda activate base && \ pip install --requirement /tmp/pip_requirements.txt -Assuming that this image is published to a public repository on Docker Hub, use the following -declaration format to configure an experiment, command, or notebook: +Assuming this image is published to a public repository on Docker Hub, configure an experiment, +command, or notebook with: .. code:: yaml @@ -173,8 +180,7 @@ declaration format to configure an experiment, command, or notebook: where ``my-user-name`` is your Docker Hub user, ``my-repo-name`` is the name of the Docker Hub repository, and ``my-tag`` is the image tag to use, such as ``latest``. -If you publish your image to a private Docker Hub repository, you can specify the credentials needed -to access the repository: +For a private Docker Hub repository, specify the credentials: .. code:: yaml @@ -184,8 +190,7 @@ to access the repository: username: my-user-name password: my-password -If you publish the image to a private `Docker Registry `__, -specify the registry path as part of the ``image`` field: +For a private `Docker Registry `__, specify the registry path: .. code:: yaml @@ -195,9 +200,9 @@ specify the registry path as part of the ``image`` field: Images are fetched using HTTPS by default. 
An HTTPS proxy can be configured using the ``https_proxy`` field in the :ref:`agent configuration `. -The custom image and credentials can be set as the defaults for all tasks launched in Determined, -using the ``image`` and ``registry_auth`` fields in the :ref:`master configuration -`. Make sure to restart the master for this to take effect. +Set the custom image and credentials as the defaults for all tasks launched in Determined using the +``image`` and ``registry_auth`` fields in the :ref:`master configuration `. +Restart the master for these changes to take effect. .. _virtual-env: @@ -226,7 +231,7 @@ To ensure that a virtual environment is activated every time a new interactive t created, in JupyterLab or using Determined Shell, update ``~/.bashrc`` with the scripts to activate the virtual environment you want. -This example switches to a virtual environment using a :ref:`startup hook `: +Example using a :ref:`startup hook ` to switch to a virtual environment: .. code:: bash @@ -236,3 +241,8 @@ This example switches to a virtual environment using a :ref:`startup hook > ~/.bashrc + +.. note:: + + ``startup-hook.sh`` does not apply to ``det cmd``. It applies to experiments, notebooks, shells, + and TensorBoards, but not commands. diff --git a/docs/model-dev-guide/prepare-container/tensorflow-support.rst b/docs/model-dev-guide/prepare-container/tensorflow-support.rst new file mode 100644 index 00000000000..22b31016297 --- /dev/null +++ b/docs/model-dev-guide/prepare-container/tensorflow-support.rst @@ -0,0 +1,34 @@ +.. _tensorflow-support: + +#################### + TensorFlow Support +#################### + +************************ + TensorFlow Core Models +************************ + +Determined supports TensorFlow models using the :ref:`Keras ` API. For models that +use low-level TensorFlow Core APIs, we recommend wrapping your model in Keras as suggested by the +official `TensorFlow `_ documentation. 
+ +******************* + TensorFlow 1 vs 2 +******************* + +Determined supports both TensorFlow 1 and 2. The version of TensorFlow used for a particular +experiment is controlled by the configured container image. Determined provides prebuilt Docker +images that include TensorFlow 2+, 1.15, and 2.8, respectively: + +- ``determinedai/tensorflow-ngc-dev:e960eae`` +- ``determinedai/environments:cuda-10.2-pytorch-1.7-tf-1.15-gpu-0.21.2`` +- ``determinedai/environments:cuda-11.2-tf-2.8-gpu-0.29.1`` + +Lightweight CPU-only counterparts are also available: + +- ``determinedai/environments:py-3.8-tf-2.8-cpu-0.29.1`` + +To change the container image used for an experiment, specify :ref:`environment.image +` in the experiment configuration file. Please see :ref:`container-images` +for more details about configuring training environments and a more complete list of prebuilt Docker +images.