diff --git a/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/bert-base-cased/bert-base-cased-single-node-single-gpu.ipynb b/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/bert-base-cased/bert-base-cased-single-node-single-gpu.ipynb index 3a4497c9d5..6c6656560b 100644 --- a/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/bert-base-cased/bert-base-cased-single-node-single-gpu.ipynb +++ b/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/bert-base-cased/bert-base-cased-single-node-single-gpu.ipynb @@ -50,7 +50,7 @@ "source": [ "## Introduction\n", "\n", - "This notebooks is an end-to-end binary text classification example. In this demo, we use the Hugging Face's `transformers` and `datasets` libraries with SageMaker Training Compiler to compile and fine-tune a pre-trained transformer for binary text classification. In particular, the pre-trained model will be fine-tuned using the Stanford Sentiment Treebank (SST) dataset. To get started, you need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on. \n", + "This notebook is an end-to-end binary text classification example. In this demo, we use Hugging Face's `transformers` and `datasets` libraries with SageMaker Training Compiler to compile and fine-tune a pre-trained transformer for binary text classification. In particular, the pre-trained model will be fine-tuned using the `Stanford Sentiment Treebank (SST)` dataset. To get started, you need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on. 
\n", "\n", "![image.png](attachment:image.png)\n", "\n", @@ -81,7 +81,7 @@ "metadata": {}, "outputs": [], "source": [ - "!pip install \"sagemaker>=2.108.0\" botocore boto3 awscli s3fs typing-extensions --upgrade" + "!pip install \"sagemaker>=2.108.0\" botocore boto3 awscli s3fs typing-extensions \"torch==1.11.0\" --upgrade" ] }, { @@ -112,7 +112,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Copy and run the following code if you need to upgrade ipywidgets for `datasets` library and restart kernel. This is only needed when preprocessing is done in the notebook.\n", + "Copy and run the following code if you need to upgrade `ipywidgets` for `datasets` library and restart kernel. This is only needed when preprocessing is done in the notebook.\n", "\n", "```python\n", "%%capture\n", @@ -134,7 +134,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**Note:** If you are going to use Sagemaker in a local environment. You need access to an IAM Role with the required permissions for SageMaker. To learn more, see [SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)." + "**Note:** If you are going to use SageMaker in a local environment, you need access to an IAM Role with the required permissions for SageMaker. To learn more, see [SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html)." ] }, { @@ -176,7 +176,7 @@ "\n", "If you'd like to try other training datasets later, you can simply use this method.\n", "\n", - "For this example notebook, we prepared the [SST2 dataset](https://www.tensorflow.org/datasets/catalog/glue#gluesst2) in the public SageMaker sample S3 bucket. The following code cells show how you can directly load the dataset and convert to a HuggingFace DatasetDict." + "For this example notebook, we prepared the [SST2 dataset](https://www.tensorflow.org/datasets/catalog/glue#gluesst2) in the public SageMaker sample S3 bucket. 
The following code cells show how you can directly load the dataset and convert to a `HuggingFace DatasetDict`." ] }, { @@ -406,7 +406,7 @@ "source": [ "from sagemaker.pytorch import PyTorch\n", "\n", - "hyperparameters = {\"epochs\": 5, \"train_batch_size\": 14, \"model_name\": \"bert-base-cased\"}\n", + "hyperparameters = {\"epochs\": 5, \"train_batch_size\": 16, \"model_name\": \"bert-base-cased\"}\n", "\n", "# Scale the learning rate by batch size, as original LR was using batch size of 32\n", "hyperparameters[\"learning_rate\"] = float(\"5e-5\") / 32 * hyperparameters[\"train_batch_size\"]\n", @@ -712,7 +712,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Plot and compare throughputs of compiled training and native training" + "### Plot and compare throughput of compiled training and native training" ] }, { @@ -765,7 +765,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Example output for SageMaker Training Compiler traing job\n", + "#### Example output for SageMaker Training Compiler training job\n", "\n", "{'train_runtime': 3742.9028,\n", " 'train_samples_per_second': 89.969,\n", @@ -801,27 +801,6 @@ "plt.xticks(ticks=[1, 1.5], labels=[\"Baseline PT\", \"SM-Training-Compiler-enhanced PT\"])" ] }, - { - "attachments": { - "throughput.png": { - "image/png": "" - } - }, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training Throughput Example Plot\n", - "\n", - "![throughput.png](attachment:throughput.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note:** For this example, the compiler delivers higher throughput for an ML model as measured by samples per second. However, you might not see an improvement in the total training time for your model. The total training time depends on several other factors, such as key components of the Trainer and TFTrainer APIs." 
- ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/roberta-base/roberta-base.ipynb b/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/roberta-base/roberta-base.ipynb index 84c5335500..ae6f16b5c6 100644 --- a/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/roberta-base/roberta-base.ipynb +++ b/sagemaker-training-compiler/huggingface/pytorch_single_gpu_single_node/roberta-base/roberta-base.ipynb @@ -67,7 +67,7 @@ "metadata": {}, "outputs": [], "source": [ - "!pip install \"sagemaker>=2.108.0\" botocore boto3 awscli --upgrade" + "!pip install \"sagemaker>=2.108.0\" botocore boto3 awscli \"torch==1.11.0\" --upgrade" ] }, { @@ -99,7 +99,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Copy and run the following code if you need to upgrade ipywidgets for `datasets` library and restart kernel. This is only needed when prerpocessing is done in the notebook.\n", + "Copy and run the following code if you need to upgrade `ipywidgets` for `datasets` library and restart kernel. This is only needed when prepocessing is done in the notebook.\n", "\n", "```python\n", "%%capture\n", @@ -164,7 +164,7 @@ "\n", "If you'd like to try other training datasets later, you can simply use this method.\n", "\n", - "For this example notebook, we prepared the `SST2` dataset in the public SageMaker sample file S3 bucket. The following code cells show how you can directly load the dataset and convert to a HuggingFace DatasetDict." + "For this example notebook, we prepared the `SST2` dataset in the public SageMaker sample file S3 bucket. The following code cells show how you can directly load the dataset and convert to a `HuggingFace DatasetDict`." ] }, { @@ -302,7 +302,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Set up an option for fine-tuning or full training. 
Set `FINE_TUNING = 1` for fine-tuning and using `fine_tune_with_huggingface.py`. Set `FINE_TUNING = 0` for full training and using `full_train_roberta_with_huggingface.py`." + "Set up an option for fine-tuning or full training. `FINE_TUNING = 1` is for fine-tuning and it will use `fine_tune_with_huggingface.py`. `FINE_TUNING = 0` is for full training and it will use `full_train_roberta_with_huggingface.py`." ] }, { @@ -318,7 +318,7 @@ "FULL_TRAINING = not FINE_TUNING\n", "\n", "# Fine tuning is typically faster and is done for fewer epochs\n", - "EPOCHS = 4 if FINE_TUNING else 100\n", + "EPOCHS = 7 if FINE_TUNING else 100\n", "\n", "TRAINING_SCRIPT = (\n", " \"fine_tune_with_huggingface.py\" if FINE_TUNING else \"full_train_roberta_with_huggingface.py\"\n", @@ -340,7 +340,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `train_batch_size` in the following code cell is the maximum batch that can fit into the memory of an `ml.p3.2xlarge` instance. If you change the model, instance type, sequence length, and other parameters, you need to do some experiments to find the largest batch size that will fit into GPU memory." + "The `train_batch_size` in the following code cell is the maximum batch that can fit into the memory of the `ml.p3.2xlarge` instance. If you change the model, instance type, sequence length, and other parameters, you need to do some experiments to find the largest batch size that will fit into GPU memory." ] }, { @@ -628,9 +628,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Plot and compare throughputs of compiled training and native training\n", + "### Plot and compare throughput of compiled training and native training\n", "\n", - "Visualize average throughputs as reported by HuggingFace and see potential savings." + "Visualize average throughput as reported by HuggingFace and see potential savings." ] }, {