diff --git a/site/en/guide/tpu.ipynb b/site/en/guide/tpu.ipynb index f64450ba04c..c17af68516e 100644 --- a/site/en/guide/tpu.ipynb +++ b/site/en/guide/tpu.ipynb @@ -61,7 +61,9 @@ "id": "Ys81cOhXOWUP" }, "source": [ - "Before you run this Colab notebook, make sure that your hardware accelerator is a TPU by checking your notebook settings: **Runtime** > **Change runtime type** > **Hardware accelerator** > **TPU**." + "This guide demonstrates how to perform basic training on [Tensor Processing Units (TPUs)](https://cloud.google.com/tpu/) and TPU Pods (collections of TPU devices connected by dedicated high-speed network interfaces) using `tf.keras` and custom training loops.\n", + "\n", + "TPUs are Google's custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. They are available through [Google Colab](https://colab.research.google.com/), the [TPU Research Cloud](https://sites.research.google/trc/), and [Cloud TPU](https://cloud.google.com/tpu)." ] }, { @@ -73,6 +75,17 @@ "## Setup" ] }, + { + "cell_type": "markdown", + "metadata": { + "id": "ebf7f8489bb7" + }, + "source": [ + "Before you run this Colab notebook, make sure that your hardware accelerator is a TPU by checking your notebook settings: **Runtime** > **Change runtime type** > **Hardware accelerator** > **TPU**.\n", + "\n", + "Import the necessary libraries, including TensorFlow Datasets:" + ] + }, { "cell_type": "code", "execution_count": null, @@ -95,7 +108,7 @@ "source": [ "## TPU initialization\n", "\n", - "TPUs are typically Cloud TPU workers, which are different from the local process running the user's Python program. Thus, you need to do some initialization work to connect to the remote cluster and initialize the TPUs. Note that the `tpu` argument to `tf.distribute.cluster_resolver.TPUClusterResolver` is a special address just for Colab. 
If you are running your code on Google Compute Engine (GCE), you should instead pass in the name of your Cloud TPU." + "TPUs are typically [Cloud TPU](https://cloud.google.com/tpu/docs/) workers, which are different from the local process running the user's Python program. Thus, you need to do some initialization work to connect to the remote cluster and initialize the TPUs. Note that the `tpu` argument to `tf.distribute.cluster_resolver.TPUClusterResolver` is a special address just for Colab. If you are running your code on Google Compute Engine (GCE), you should instead pass in the name of your Cloud TPU." ] }, { @@ -159,7 +172,7 @@ "source": [ "## Distribution strategies\n", "\n", - "Usually you run your model on multiple TPUs in a data-parallel way. To distribute your model on multiple TPUs (or other accelerators), TensorFlow offers several distribution strategies. You can replace your distribution strategy and the model will run on any given (TPU) device. Check the [distribution strategy guide](./distributed_training.ipynb) for more information." + "Usually, you run your model on multiple TPUs in a data-parallel way. To distribute your model on multiple TPUs (as well as multiple GPUs or multiple machines), TensorFlow offers the `tf.distribute.Strategy` API. You can replace your distribution strategy and the model will run on any given (TPU) device. Learn more in the [Distributed training with TensorFlow](./distributed_training.ipynb) guide." ] }, { @@ -168,6 +181,8 @@ "id": "DcDPMZs-9uLJ" }, "source": [ + "Using the `tf.distribute.TPUStrategy` option implements synchronous distributed training. 
TPUs provide their own implementation of efficient all-reduce and other collective operations across multiple TPU cores, which are used in `TPUStrategy`.\n", + "\n", "To demonstrate this, create a `tf.distribute.TPUStrategy` object:" ] }, @@ -188,7 +203,7 @@ "id": "JlaAmswWPsU6" }, "source": [ - "To replicate a computation so it can run in all TPU cores, you can pass it into the `strategy.run` API. Below is an example that shows all cores receiving the same inputs `(a, b)` and performing matrix multiplication on each core independently. The outputs will be the values from all the replicas." + "To replicate a computation so it can run in all TPU cores, you can pass it into the `Strategy.run` API. Below is an example that shows all cores receiving the same inputs `(a, b)` and performing matrix multiplication on each core independently. The outputs will be the values from all the replicas." ] }, { @@ -216,7 +231,7 @@ "source": [ "## Classification on TPUs\n", "\n", - "Having covered the basic concepts, consider a more concrete example. This section demonstrates how to use the distribution strategy—`tf.distribute.TPUStrategy`—to train a Keras model on a Cloud TPU.\n" + "Having covered the basic concepts, consider a more concrete example. This section demonstrates how to use the distribution strategy—`tf.distribute.TPUStrategy`—to train a Keras model on a Cloud TPU." ] }, { @@ -227,7 +242,7 @@ "source": [ "### Define a Keras model\n", "\n", - "Start with a definition of a `Sequential` Keras model for image classification on the MNIST dataset using Keras. It's no different than what you would use if you were training on CPUs or GPUs. Note that Keras model creation needs to be inside `strategy.scope`, so the variables can be created on each TPU device. Other parts of the code are not necessary to be inside the strategy scope." + "Start with a definition of a [`Sequential` Keras model](./sequential_model.ipynb) for image classification on the MNIST dataset. 
It's no different than what you would use if you were training on CPUs or GPUs. Note that Keras model creation needs to be inside the `Strategy.scope`, so the variables can be created on each TPU device. Other parts of the code do not need to be inside the `Strategy` scope." ] }, { @@ -256,9 +271,9 @@ "source": [ "### Load the dataset\n", "\n", - "Efficient use of the `tf.data.Dataset` API is critical when using a Cloud TPU, as it is impossible to use the Cloud TPUs unless you can feed them data quickly enough. You can learn more about dataset performance in the [Input pipeline performance guide](./data_performance.ipynb).\n", + "Efficient use of the `tf.data.Dataset` API is critical when using a Cloud TPU. You can learn more about dataset performance in the [Input pipeline performance guide](./data_performance.ipynb).\n", "\n", - "For all but the simplest experiments (using `tf.data.Dataset.from_tensor_slices` or other in-graph data), you need to store all data files read by the Dataset in Google Cloud Storage (GCS) buckets.\n", + "If you are using [TPU Nodes](https://cloud.google.com/tpu/docs/managing-tpus-tpu-vm), you need to store all data files read by the TensorFlow `Dataset` in [Google Cloud Storage (GCS) buckets](https://cloud.google.com/tpu/docs/storage-buckets). If you are using [TPU VMs](https://cloud.google.com/tpu/docs/users-guide-tpu-vm), you can store data wherever you like. For more information on TPU Nodes and TPU VMs, refer to the [TPU System Architecture](https://cloud.google.com/tpu/docs/system-architecture-tpu-vm) documentation.\n", "\n", "For most use cases, it is recommended to convert your data into the `TFRecord` format and use a `tf.data.TFRecordDataset` to read it. Check the [TFRecord and tf.Example tutorial](../tutorials/load_data/tfrecord.ipynb) for details on how to do this. 
It is not a hard requirement and you can use other dataset readers, such as `tf.data.FixedLengthRecordDataset` or `tf.data.TextLineDataset`.\n", "\n", @@ -266,7 +281,7 @@ "\n", "Regardless of the data format used, it is strongly recommended that you use large files on the order of 100MB. This is especially important in this networked setting, as the overhead of opening a file is significantly higher.\n", "\n", - "As shown in the code below, you should use the `tensorflow_datasets` module to get a copy of the MNIST training and test data. Note that `try_gcs` is specified to use a copy that is available in a public GCS bucket. If you don't specify this, the TPU will not be able to access the downloaded data. " + "As shown in the code below, you should use the TensorFlow Datasets `tfds.load` method to get a copy of the MNIST training and test data. Note that `try_gcs` is specified to use a copy that is available in a public GCS bucket. If you don't specify this, the TPU will not be able to access the downloaded data." ] }, { @@ -311,7 +326,7 @@ "source": [ "### Train the model using Keras high-level APIs\n", "\n", - "You can train your model with Keras `fit` and `compile` APIs. There is nothing TPU-specific in this step—you write the code as if you were using mutliple GPUs and a `MirroredStrategy` instead of the `TPUStrategy`. You can learn more in the [Distributed training with Keras](https://www.tensorflow.org/tutorials/distribute/keras) tutorial." + "You can train your model with Keras `Model.fit` and `Model.compile` APIs. There is nothing TPU-specific in this step—you write the code as if you were using multiple GPUs and a `MirroredStrategy` instead of the `TPUStrategy`. You can learn more in the [Distributed training with Keras](../tutorials/distribute/keras.ipynb) tutorial." 
] }, { @@ -338,7 +353,7 @@ "model.fit(train_dataset,\n", " epochs=5,\n", " steps_per_epoch=steps_per_epoch,\n", - " validation_data=test_dataset, \n", + " validation_data=test_dataset,\n", " validation_steps=validation_steps)" ] }, @@ -348,7 +363,7 @@ "id": "8hSGBIYtUugJ" }, "source": [ - "To reduce Python overhead and maximize the performance of your TPU, pass in the argument—`steps_per_execution`—to `Model.compile`. In this example, it increases throughput by about 50%:" + "To reduce Python overhead and maximize the performance of your TPU, pass in the `steps_per_execution` argument to Keras `Model.compile`. In this example, it increases throughput by about 50%:" ] }, { @@ -382,7 +397,7 @@ "source": [ "### Train the model using a custom training loop\n", "\n", - "You can also create and train your model using `tf.function` and `tf.distribute` APIs directly. You can use the `strategy.experimental_distribute_datasets_from_function` API to distribute the dataset given a dataset function. Note that in the example below the batch size passed into the dataset is the per-replica batch size instead of the global batch size. To learn more, check out the [Custom training with tf.distribute.Strategy](https://www.tensorflow.org/tutorials/distribute/custom_training) tutorial.\n" + "You can also create and train your model using `tf.function` and `tf.distribute` APIs directly. You can use the `Strategy.experimental_distribute_datasets_from_function` API to distribute the `tf.data.Dataset` given a dataset function. Note that in the example below the batch size passed into the `Dataset` is the per-replica batch size instead of the global batch size. 
To learn more, check out the [Custom training with `tf.distribute.Strategy`](../tutorials/distribute/custom_training.ipynb) tutorial.\n" ] }, { @@ -391,7 +406,7 @@ "id": "DxdgXPAL6iFE" }, "source": [ - "First, create the model, datasets and tf.functions:" + "First, create the model, datasets and `tf.function`s:" ] }, { @@ -402,8 +417,8 @@ }, "outputs": [], "source": [ - "# Create the model, optimizer and metrics inside the strategy scope, so that the\n", - "# variables can be mirrored on each device.\n", + "# Create the model, optimizer and metrics inside the `tf.distribute.Strategy`\n", + "# scope, so that the variables can be mirrored on each device.\n", "with strategy.scope():\n", " model = create_model()\n", " optimizer = tf.keras.optimizers.Adam()\n", @@ -411,8 +426,8 @@ " training_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(\n", " 'training_accuracy', dtype=tf.float32)\n", "\n", - "# Calculate per replica batch size, and distribute the datasets on each TPU\n", - "# worker.\n", + "# Calculate per replica batch size, and distribute the `tf.data.Dataset`s\n", + "# on each TPU worker.\n", "per_replica_batch_size = batch_size // strategy.num_replicas_in_sync\n", "\n", "train_dataset = strategy.experimental_distribute_datasets_from_function(\n", @@ -479,9 +494,9 @@ "source": [ "### Improving performance with multiple steps inside `tf.function`\n", "\n", - "You can improve the performance by running multiple steps within a `tf.function`. This is achieved by wrapping the `strategy.run` call with a `tf.range` inside `tf.function`, and AutoGraph will convert it to a `tf.while_loop` on the TPU worker.\n", + "You can improve the performance by running multiple steps within a `tf.function`. This is achieved by wrapping the `Strategy.run` call with a `tf.range` inside `tf.function`, and AutoGraph will convert it to a `tf.while_loop` on the TPU worker. 
You can learn more about `tf.function`s in the [Better performance with `tf.function`](./function.ipynb) guide.\n", "\n", - "Despite the improved performance, there are tradeoffs with this method compared to running a single step inside `tf.function`. Running multiple steps in a `tf.function` is less flexible—you cannot run things eagerly or arbitrary Python code within the steps.\n" + "Despite the improved performance, there are tradeoffs with this method compared to running a single step inside a `tf.function`. Running multiple steps in a `tf.function` is less flexible—you cannot run things eagerly or arbitrary Python code within the steps.\n" ] }, { @@ -512,7 +527,7 @@ " for _ in tf.range(steps):\n", " strategy.run(step_fn, args=(next(iterator),))\n", "\n", - "# Convert `steps_per_epoch` to `tf.Tensor` so the `tf.function` won't get \n", + "# Convert `steps_per_epoch` to `tf.Tensor` so the `tf.function` won't get\n", "# retraced if the value changes.\n", "train_multiple_steps(train_iterator, tf.convert_to_tensor(steps_per_epoch))\n", "\n", @@ -530,10 +545,17 @@ "source": [ "## Next steps\n", "\n", - "- [Google Cloud TPU documentation](https://cloud.google.com/tpu/docs/): How to set up and run a Google Cloud TPU.\n", + "To learn more about Cloud TPUs and how to use them:\n", + "\n", + "- [Google Cloud TPU](https://cloud.google.com/tpu): The Google Cloud TPU homepage.\n", + "- [Google Cloud TPU documentation](https://cloud.google.com/tpu/docs/): Google Cloud TPU documentation, which includes:\n", + " - [Introduction to Cloud TPU](https://cloud.google.com/tpu/docs/intro-to-tpu): An overview of working with Cloud TPUs.\n", + " - [Cloud TPU quickstarts](https://cloud.google.com/tpu/docs/quick-starts): Quickstart introductions to working with Cloud TPU VMs using TensorFlow and other main machine learning frameworks.\n", "- [Google Cloud TPU Colab notebooks](https://cloud.google.com/tpu/docs/colabs): End-to-end training examples.\n", "- [Google Cloud TPU performance 
guide](https://cloud.google.com/tpu/docs/performance-guide): Enhance Cloud TPU performance further by adjusting Cloud TPU configuration parameters for your application\n", - "- [Distributed training with TensorFlow](./distributed_training.ipynb): How to use distribution strategies—including `tf.distribute.TPUStrategy`—with examples showing best practices." + "- [Distributed training with TensorFlow](./distributed_training.ipynb): How to use distribution strategies—including `tf.distribute.TPUStrategy`—with examples showing best practices.\n", + "- TPU embeddings: TensorFlow includes specialized support for training embeddings on TPUs via `tf.tpu.experimental.embedding`. In addition, [TensorFlow Recommenders](https://www.tensorflow.org/recommenders) has `tfrs.layers.embedding.TPUEmbedding`. Embeddings provide efficient and dense representations, capturing complex similarities and relationships between features. TensorFlow's TPU-specific embedding support allows you to train embeddings that are larger than the memory of a single TPU device, and to use sparse and ragged inputs on TPUs.\n", + "- [TPU Research Cloud (TRC)](https://sites.research.google/trc/about/): TRC enables researchers to apply for access to a cluster of more than 1,000 Cloud TPU devices.\n" ] } ],