Commit

edit code format
Bruce Zhang committed Sep 22, 2022
1 parent 618b49d commit 5c64ca5
Showing 3 changed files with 36 additions and 43 deletions.
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Compile and Train a Hugging Face Transformers Trainer Model for Question and Answering with the `SQuAD` dataset"
"# Compile and Train a Hugging Face Transformers Trainer Model for Question and Answering with the SQuAD dataset"
]
},
{
@@ -15,7 +15,7 @@
"2. [Introduction](#Introduction) \n",
"3. [SageMaker Environment and Permissions](#SageMaker-Environment-and-Permissions)\n",
" 1. [Installation](#Installation)\n",
"4. [Loading the `SQuAD` dataset](#Loading-the-SQuAD-dataset)\n",
"4. [Loading the SQuAD dataset](#Loading-the-SQuAD-dataset)\n",
"5. [Preprocessing](#Preprocessing) \n",
"6. [SageMaker Training Job](#SageMaker-Training-Job) \n",
" 1. [Training with Native PyTorch](#Training-with-Native-PyTorch) \n",
@@ -38,11 +38,11 @@
"\n",
"## Introduction\n",
"\n",
"This example notebook demonstrates how to compile and fine-tune a question and answering NLP task. We use Hugging Face's `transformers` and `datasets` libraries with Amazon SageMaker Training Compiler to accelerate fine-tuning of a pre-trained transformer model on question and answering. In particular, the pre-trained model will be fine-tuned using the `SQuAD` dataset. To get started, we need to set up the environment with a few prerequisite steps to add permissions, configurations, and so on. \n",
"This example notebook demonstrates how to compile and fine-tune a question and answering NLP task. We use HuggingFace's transformers and datasets libraries with Amazon SageMaker Training Compiler to accelerate fine-tuning of a pre-trained transformer model on question and answering. In particular, the pre-trained model will be fine-tuned using the SQuAD dataset. To get started, we need to set up the environment with a few prerequisite steps to add permissions, configurations, and so on. \n",
"\n",
"**NOTE:** You can run this demo in SageMaker Studio, SageMaker notebook instances, or your local machine with AWS CLI set up. If using SageMaker Studio or SageMaker notebook instances, make sure you choose one of the PyTorch-based kernels, `Python 3 (PyTorch x.y Python 3.x CPU Optimized)` or `conda_pytorch_p36` respectively.\n",
"**NOTE:** You can run this demo in SageMaker Studio, SageMaker notebook instances, or your local machine with AWS CLI set up. If using SageMaker Studio or SageMaker notebook instances, make sure you choose one of the PyTorch-based kernels, Python 3 (PyTorch x.y Python 3.x CPU Optimized) or conda_pytorch_p36 respectively.\n",
"\n",
"**NOTE:** This notebook uses two `ml.p3.2xlarge` instances that have single GPU. If you don't have enough quota, see [Request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure). "
"**NOTE:** This notebook uses two ml.p3.2xlarge instances that have single GPU. If you don't have enough quota, see [Request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure). "
]
},
{
@@ -153,7 +153,7 @@
"id": "whPRbBNbIrIl"
},
"source": [
"## Loading the `SQuAD` dataset"
"## Loading the SQuAD dataset"
]
},
{
@@ -169,10 +169,10 @@
"\n",
"If you'd like to try other training datasets later, you can simply use this method.\n",
"\n",
"For this example notebook, we prepared the `SQuAD v1.1 dataset` in the public SageMaker sample file S3 bucket. The following code cells show how you can directly load the dataset and convert to a `HuggingFace DatasetDict`.\n",
"For this example notebook, we prepared the SQuAD v1.1 dataset in the public SageMaker sample file S3 bucket. The following code cells show how you can directly load the dataset and convert to a HuggingFace DatasetDict.\n",
"\n",
"\n",
"**NOTE:** The [`SQuAD` dataset](https://rajpurkar.github.io/SQuAD-explorer/) is under the [CC BY-SA 4.0 license terms](https://creativecommons.org/licenses/by-sa/4.0/)."
"**NOTE:** The [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/) is under the [CC BY-SA 4.0 license terms](https://creativecommons.org/licenses/by-sa/4.0/)."
]
},
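For reference, here is a minimal sketch of what such loading cells might look like; the bucket name, object keys, and local file names are placeholders rather than the notebook's actual values:

```python
# Sketch only: download SQuAD v1.1 from a public S3 bucket and wrap it in a
# Hugging Face DatasetDict. Bucket and key names are illustrative placeholders.
import boto3
from datasets import load_dataset

s3 = boto3.client("s3")
s3.download_file("sagemaker-sample-files", "datasets/text/squad/train-v1.1.json", "train-v1.1.json")
s3.download_file("sagemaker-sample-files", "datasets/text/squad/dev-v1.1.json", "dev-v1.1.json")

# SQuAD ships as nested JSON under a top-level "data" field; the json builder
# returns a DatasetDict with one split per entry in data_files. The notebook's
# Preprocessing section then turns these nested records into
# question/context/answer examples before tokenization.
datasets = load_dataset(
    "json",
    data_files={"train": "train-v1.1.json", "validation": "dev-v1.1.json"},
    field="data",
)
print(datasets)
```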
{
@@ -563,11 +563,11 @@
"source": [
"## SageMaker Training Job\n",
"\n",
"To create a SageMaker training job, we use a `HuggingFace`/`PyTorch` estimator. Using the estimator, you can define which fine-tuning script should SageMaker use through `entry_point`, which `instance_type` to use for training, which `hyperparameters` to pass, and so on.\n",
"To create a SageMaker training job, we use a HuggingFace/PyTorch estimator. Using the estimator, you can define which fine-tuning script should SageMaker use through entry_point, which instance_type to use for training, which hyperparameters to pass, and so on.\n",
"\n",
"When a SageMaker training job starts, SageMaker takes care of starting and managing all the required machine learning instances, picks up the `HuggingFace` Deep Learning Container, uploads your training script, and downloads the data from `sagemaker_session_bucket` into the container at `/opt/ml/input/data`.\n",
"When a SageMaker training job starts, SageMaker takes care of starting and managing all the required machine learning instances, picks up the HuggingFace Deep Learning Container, uploads your training script, and downloads the data from sagemaker_session_bucket into the container at /opt/ml/input/data.\n",
"\n",
"In the following section, you learn how to set up two versions of the SageMaker `HuggingFace`/`PyTorch` estimator, a native one without the compiler and an optimized one with the compiler."
"In the following section, you learn how to set up two versions of the SageMaker HuggingFace/PyTorch estimator, a native one without the compiler and an optimized one with the compiler."
]
},
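As a rough sketch of the compiler-enabled variant (the script name, framework versions, and hyperparameter values below are assumptions, not the notebook's exact arguments):

```python
# Sketch only: a Training Compiler-enabled HuggingFace estimator. Script name,
# DLC versions, and hyperparameter values are assumptions; the batch size of 52
# and the albert-base-v2 model name come from the surrounding text.
import sagemaker
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

optimized_estimator = HuggingFace(
    entry_point="qa_trainer.py",               # your fine-tuning script
    instance_type="ml.p3.2xlarge",             # single-GPU instance used in this notebook
    instance_count=1,
    role=sagemaker.get_execution_role(),
    transformers_version="4.21",               # pick a combination supported by Training Compiler
    pytorch_version="1.11",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),  # turns the compiler on
    hyperparameters={"epochs": 2, "train_batch_size": 52, "model_name": "albert-base-v2"},
)
```

The native baseline is essentially the same call without compiler_config and, in this notebook, with a smaller batch size.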
{
@@ -581,7 +581,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below, we run a native PyTorch training job with the `PyTorch` estimator on a `ml.p3.2xlarge` instance. \n",
"Below, we run a native PyTorch training job with the PyTorch estimator on a ml.p3.2xlarge instance. \n",
"\n",
"We run a batch size of 28 on our native training job and 52 on our Training Compiler training job to make an apple to apple comparison. These batch sizes along with the max_length variable get us close to 100% GPU memory utilization.\n",
"\n",
@@ -1037,7 +1037,7 @@
"source": [
"## Conclusion\n",
"\n",
"In this example, we fine-tuned an [ALBERT model](https://huggingface.co/albert-base-v2) (`albert-base-v2`) with the `SQuAD` dataset and compared a native training job with a SageMaker Training Compiler training job. The Training Compiler job has `93% higher throughput` and `38% quicker training` time while training loss was equal with the native PyTorch training job."
"In this example, we fine-tuned an [ALBERT model](https://huggingface.co/albert-base-v2) (albert-base-v2) with the SQuAD dataset and compared a native training job with a SageMaker Training Compiler training job. The Training Compiler job has 93% higher throughput and 38% quicker training time while training loss was equal with the native PyTorch training job."
]
},
{
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Compile and Train a Hugging Face Transformer ``BERT`` Model with the SST Dataset using SageMaker Training Compiler"
"# Compile and Train a Hugging Face Transformer BERT Model with the SST Dataset using SageMaker Training Compiler"
]
},
{
@@ -50,13 +50,13 @@
"source": [
"## Introduction\n",
"\n",
"This notebook is an end-to-end binary text classification example. In this demo, we use the Hugging Face's `transformers` and `datasets` libraries with SageMaker Training Compiler to compile and fine-tune a pre-trained transformer for binary text classification. In particular, the pre-trained model will be fine-tuned using the `Stanford Sentiment Treebank (SST)` dataset. To get started, you need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on. \n",
"This notebook is an end-to-end binary text classification example. In this demo, we use the Hugging Face's transformers and datasets libraries with SageMaker Training Compiler to compile and fine-tune a pre-trained transformer for binary text classification. In particular, the pre-trained model will be fine-tuned using the Stanford Sentiment Treebank (SST) dataset. To get started, you need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on. \n",
"\n",
"![image.png](attachment:image.png)\n",
"\n",
"**NOTE:** You can run this demo in SageMaker Studio, SageMaker notebook instances, or your local machine with AWS CLI set up. If using SageMaker Studio or SageMaker notebook instances, make sure you choose one of the PyTorch-based kernels, `Python 3 (PyTorch x.y Python 3.x CPU Optimized)` or `conda_pytorch_p36` respectively.\n",
"**NOTE:** You can run this demo in SageMaker Studio, SageMaker notebook instances, or your local machine with AWS CLI set up. If using SageMaker Studio or SageMaker notebook instances, make sure you choose one of the PyTorch-based kernels, Python 3 (PyTorch x.y Python 3.x CPU Optimized) or conda_pytorch_p36 respectively.\n",
"\n",
"**NOTE:** This notebook uses two `ml.p3.2xlarge` instances that have single GPU. If you don't have enough quota, see [Request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure). "
"**NOTE:** This notebook uses two ml.p3.2xlarge instances that have single GPU. If you don't have enough quota, see [Request a service quota increase for SageMaker resources](https://docs.aws.amazon.com/sagemaker/latest/dg/regions-quotas.html#service-limit-increase-request-procedure). "
]
},
{
@@ -112,7 +112,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copy and run the following code if you need to upgrade `ipywidgets` for `datasets` library and restart kernel. This is only needed when preprocessing is done in the notebook.\n",
"Copy and run the following code if you need to upgrade ipywidgets for datasets library and restart kernel. This is only needed when preprocessing is done in the notebook.\n",
"\n",
"```python\n",
"%%capture\n",
@@ -176,7 +176,7 @@
"\n",
"If you'd like to try other training datasets later, you can simply use this method.\n",
"\n",
"For this example notebook, we prepared the [SST2 dataset](https://www.tensorflow.org/datasets/catalog/glue#gluesst2) in the public SageMaker sample S3 bucket. The following code cells show how you can directly load the dataset and convert to a `HuggingFace DatasetDict`."
"For this example notebook, we prepared the [SST2 dataset](https://www.tensorflow.org/datasets/catalog/glue#gluesst2) in the public SageMaker sample S3 bucket. The following code cells show how you can directly load the dataset and convert to a HuggingFace DatasetDict."
]
},
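For reference, a minimal sketch of such loading cells; the S3 prefix, file names, separator, and column names are assumptions for illustration:

```python
# Sketch only: read the prepared SST2 files from S3 with pandas (s3fs must be
# installed for s3:// paths) and convert them into a Hugging Face DatasetDict.
import pandas as pd
from datasets import Dataset, DatasetDict

base = "s3://sagemaker-sample-files/datasets/text/SST2"   # assumed prefix
train_df = pd.read_csv(f"{base}/sst2.train", sep="\t", names=["label", "text"])
test_df = pd.read_csv(f"{base}/sst2.test", sep="\t", names=["label", "text"])

datasets = DatasetDict({
    "train": Dataset.from_pandas(train_df),
    "test": Dataset.from_pandas(test_df),
})
print(datasets)
```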
{
@@ -343,7 +343,7 @@
"source": [
"### Uploading data to `sagemaker_session_bucket`\n",
"\n",
"After we processed the `datasets` we are going to use the new `FileSystem` [integration](https://huggingface.co/docs/datasets/filesystems.html) to upload our dataset to S3."
"After we processed the datasets we are going to use the new FileSystem [integration](https://huggingface.co/docs/datasets/filesystems.html) to upload our dataset to S3."
]
},
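A sketch of that upload step, following the pattern in the linked datasets FileSystem documentation. The S3 prefixes are placeholders, train_dataset and test_dataset stand for the processed splits produced by the preprocessing cells, and newer datasets releases pass storage_options instead of fs:

```python
# Sketch only: save the processed splits directly to the SageMaker session bucket.
import sagemaker
from datasets.filesystems import S3FileSystem

sess = sagemaker.Session()
bucket = sess.default_bucket()                            # the sagemaker_session_bucket
s3 = S3FileSystem()

training_input_path = f"s3://{bucket}/samples/datasets/sst2/train"
test_input_path = f"s3://{bucket}/samples/datasets/sst2/test"

train_dataset.save_to_disk(training_input_path, fs=s3)    # processed train split
test_dataset.save_to_disk(test_input_path, fs=s3)         # processed test split
```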
{
@@ -375,11 +375,11 @@
"source": [
"## SageMaker Training Job\n",
"\n",
"To create a SageMaker training job, we use a `HuggingFace/PyTorch` estimator. Using the estimator, you can define which fine-tuning script should SageMaker use through `entry_point`, which `instance_type` to use for training, which `hyperparameters` to pass, and so on.\n",
"To create a SageMaker training job, we use a HuggingFace/PyTorch estimator. Using the estimator, you can define which fine-tuning script should SageMaker use through entry_point, which instance_type to use for training, which hyperparameters to pass, and so on.\n",
"\n",
"When a SageMaker training job starts, SageMaker takes care of starting and managing all the required machine learning instances, picks up the `HuggingFace` Deep Learning Container, uploads your training script, and downloads the data from `sagemaker_session_bucket` into the container at `/opt/ml/input/data`.\n",
"When a SageMaker training job starts, SageMaker takes care of starting and managing all the required machine learning instances, picks up the HuggingFace Deep Learning Container, uploads your training script, and downloads the data from sagemaker_session_bucket into the container at /opt/ml/input/data.\n",
"\n",
"In the following section, you learn how to set up two versions of the SageMaker `HuggingFace/PyTorch` estimator, a native one without the compiler and an optimized one with the compiler."
"In the following section, you learn how to set up two versions of the SageMaker HuggingFace/PyTorch estimator, a native one without the compiler and an optimized one with the compiler."
]
},
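To make the flow concrete, a sketch of launching both jobs on the uploaded data. It assumes estimators configured as in the earlier sketches and the S3 paths from the upload step; the channel names are illustrative:

```python
# Sketch only: start the native and compiler-enabled jobs. Each channel is
# downloaded into the container under /opt/ml/input/data/<channel> and exposed
# to the training script via SM_CHANNEL_TRAIN / SM_CHANNEL_TEST.
data_channels = {
    "train": training_input_path,   # s3://.../sst2/train from the upload step
    "test": test_input_path,        # s3://.../sst2/test
}

native_estimator.fit(data_channels, wait=False)      # baseline without the compiler
optimized_estimator.fit(data_channels, wait=False)   # SageMaker Training Compiler job
```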
{