Integrate whisper large into the existing notebook #441

Merged
merged 16 commits on Jul 5, 2023
157 changes: 130 additions & 27 deletions notebooks/whisper-example.ipynb
@@ -10,10 +10,10 @@
"This notebook demonstrates speech transcription on the IPU using the [Whisper implementation in the Hugging Face Transformers library](https://huggingface.co/spaces/openai/whisper) alongside [Optimum Graphcore](https://github.com/huggingface/optimum-graphcore).\n",
"\n",
"Whisper is a versatile speech recognition model that can transcribe speech as well as perform multi-lingual translation and recognition tasks.\n",
"It was trained on diverse datasets to give human-level speech recognition performance without the need for fine tuning. \n",
"It was trained on diverse datasets to give human-level speech recognition performance without the need for fine-tuning. \n",
"\n",
"[🤗 Optimum Graphcore](https://github.com/huggingface/optimum-graphcore) is the interface between the [🤗 Transformers library](https://huggingface.co/docs/transformers/index) and [Graphcore IPUs](https://www.graphcore.ai/products/ipu).\n",
"It provides a set of tools enabling model parallelization and loading on IPUs, training and fine-tuning on all the tasks already supported by Transformers while being compatible with the Hugging Face Hub and every model available on it out of the box.\n",
"It provides a set of tools enabling model parallelization and loading on IPUs, training and fine-tuning on all the tasks already supported by 🤗 Transformers while being compatible with the 🤗 Hub and every model available on it out of the box.\n",
"\n",
"> **Hardware requirements:** The Whisper models `whisper-tiny`, `whisper-base` and `whisper-small` can run two replicas on the smallest IPU-POD4 machine. The most capable model, `whisper-large`, will need to use either an IPU-POD16 or a Bow Pod16 machine. Please contact Graphcore if you'd like assistance running model sizes that don't work in this simple example notebook.\n",
"\n",
@@ -42,30 +42,29 @@
},
{
"cell_type": "markdown",
"id": "77c2229d-d6f5-4776-841a-7cf328050d30",
"id": "d93bb153",
"metadata": {},
"source": [
"IPU Whisper runs faster with the latest features available in SDK > 3.3 - check whether those features can be enabled. "
"IPU Whisper runs faster with the latest features available in the Poplar SDK version 3.3 or later. This code checks whether these features can be enabled."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20c8a58c-73dc-4b7c-8f58-e843cb6f32ea",
"id": "ec9aa981",
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"import warnings\n",
"from transformers.utils.versions import require_version\n",
"\n",
"sdk_version = !popc --version\n",
"if sdk_version and (version := re.search(r'\\d+\\.\\d+\\.\\d+', sdk_version[0]).group()) >= '3.3':\n",
" print(f\"SDK check passed.\")\n",
"try:\n",
" require_version(\"poptorch>=3.3\")\n",
" enable_sdk_features=True\n",
"else:\n",
" warnings.warn(\"SDK versions lower than 3.3 do not support all the functionality in this notebook so performance will be reduced. We recommend you relaunch the Paperspace Notebook with the Pytorch SDK 3.3 image. You can use https://hub.docker.com/r/graphcore/pytorch-early-access\", \n",
" category=Warning, stacklevel=2)\n",
" enable_sdk_features=False"
" print(f\"SDK check passed.\")\n",
"except Exception:\n",
" enable_sdk_features=False\n",
" warnings.warn(\"SDK versions earlier than 3.3 do not support all the functionality in this notebook so performance will be reduced. We recommend that you relaunch the Paperspace Notebook with the PyTorch SDK 3.3 image. You can use https://hub.docker.com/r/graphcore/pytorch-early-access\")\n"
]
},
{
@@ -105,12 +104,14 @@
"outputs": [],
"source": [
"# Generic imports\n",
"import os\n",
"from datasets import load_dataset\n",
"import matplotlib.pyplot as plt\n",
"import librosa\n",
"import IPython\n",
"import random\n",
"\n",
"\n",
"# IPU-specific imports\n",
"from optimum.graphcore import IPUConfig\n",
"from optimum.graphcore.modeling_utils import to_pipelined\n",
@@ -120,40 +121,138 @@
"from transformers import WhisperForConditionalGeneration"
]
},
{
"cell_type": "markdown",
"id": "c7a7484f",
"metadata": {},
"source": [
"This notebook demonstrates how to run all sizes of Whisper, assuming you meet the IPU hardware requirements:\n",
"\n",
"- `whisper-tiny`, `base` and `small` only requires 2 IPUs (IPU-POD4)\n",
"- `whisper-medium` requires 4 IPUs (IPU-POD4)\n",
"- `whisper-large` requires 8 IPUs (IPU-POD16 or a Bow Pod16)"
]
},
{
"cell_type": "markdown",
"id": "734d8d54",
"metadata": {},
"source": [
"The Whisper model is available on Hugging Face in several sizes, from `whisper-tiny` with 39M parameters to `whisper-large` with 1550M parameters.\n",
"\n",
"We download `whisper-tiny` which we will run using two IPUs.\n",
"The [Whisper architecture](https://openai.com/research/whisper) is an encoder-decoder Transformer, with the audio split into 30-second chunks.\n",
"For simplicity one IPU is used for the encoder part of the graph and another for the decoder part.\n",
"The `IPUConfig` object helps to configure the model to be pipelined across the IPUs."
"- For `whisper-tiny`, `small` and `base`, one IPU is used for the encoder part of the graph and another for the decoder part.\n",
"- For `whisper-medium `, two IPUs are used to place the encoder part and two others for the decoder part.\n",
"- For `whisper-large `, four IPUs are used to place the encoder part and four others for the decoder part.\n",
"\n",
"The `IPUConfig` object helps to configure the model to be pipelined across the IPUs.\n",
"The number of transformer layers per IPU can be adjusted by using `layers_per_ipu`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c5d72f3-cbd6-462f-9741-1726d412c4eb",
"id": "1cffea71",
"metadata": {},
"outputs": [],
"source": [
"model_spec = \"openai/whisper-tiny.en\"\n",
"num_available_ipus=int(os.getenv(\"NUM_AVAILABLE_IPU\", 4))\n",
"cache_dir = os.getenv(\"POPLAR_EXECUTABLE_CACHE_DIR\", \"./exe_cache\")\n",
"\n",
"default_ipu_config = IPUConfig(executable_cache_dir=cache_dir,\n",
" ipus_per_replica=2)\n",
"\n",
"medium_ipu_config = IPUConfig(executable_cache_dir=cache_dir,\n",
" ipus_per_replica=4,\n",
" layers_per_ipu=[12, 12, 13, 11])\n",
"\n",
"large_ipu_config = IPUConfig(executable_cache_dir=cache_dir,\n",
" ipus_per_replica=8,\n",
" layers_per_ipu=[8, 8, 8, 8, 6, 9, 9, 8])\n",
"\n",
"configs = {\n",
" \"tiny\": (\"openai/whisper-tiny.en\", \n",
" default_ipu_config),\n",
" \n",
" \"base\": (\"openai/whisper-base.en\", \n",
" default_ipu_config),\n",
"\n",
" \"small\": (\"openai/whisper-small.en\", \n",
" default_ipu_config),\n",
" \n",
" \"medium\": (\"openai/whisper-medium.en\",\n",
" medium_ipu_config),\n",
"\n",
" \"large\": (\"openai/whisper-large-v2\", \n",
" large_ipu_config),\n",
"}\n",
"\n",
"\n",
"def select_whisper_config(size: str, custom_checkpoint: str):\n",
" auto_sizes = {4: \"tiny\", 16: \"large\"}\n",
" if size == \"auto\":\n",
" selected_size = auto_sizes[num_available_ipus]\n",
" elif size in configs.keys():\n",
" if size == \"large\" and num_available_ipus < 8:\n",
" raise ValueError(\"Error: You need at least 8 IPUs to run whisper-large \"\n",
" f\"but your current environment has {num_available_ipus} IPUs available.\")\n",
" selected_size = size\n",
" else:\n",
" raise ValueError(f\"{size} is not a valid size for Whisper\")\n",
" \n",
" model_checkpoint, ipu_config = configs[selected_size]\n",
" if custom_checkpoint is not None:\n",
" model_checkpoint = custom_checkpoint\n",
"\n",
" print(f\"Using whisper-{selected_size} config with the checkpoint '{model_checkpoint}'.\")\n",
" return model_checkpoint, ipu_config "
]
},
{
"cell_type": "markdown",
"id": "f0d22c55",
"metadata": {},
"source": [
"Select the Whisper size bellow, try `\"tiny\"`,`\"base\"`, `\"small\"`, `\"medium\"`, `\"large\"` or let the `\"auto\"` mode choose for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f919498b",
"metadata": {},
"outputs": [],
"source": [
"model_checkpoint, ipu_config = select_whisper_config(\"auto\", custom_checkpoint=None) "
]
},
{
"cell_type": "markdown",
"id": "eff8bcf6",
"metadata": {},
"source": [
"You can also use a custom checkpoint from Hugging Face Hub using the argument `custom_checkpoint` above. In this case, you have to make sure that `size` matches the checkpoint model size."
]
},
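For example, a minimal sketch of pairing a size with a matching Hub checkpoint (the checkpoint name is illustrative; `openai/whisper-small` is the multilingual variant at the same model size):

```python
# Sketch: pair the "small" size with a matching checkpoint from the Hub.
# Assumption: "openai/whisper-small" (multilingual) matches the "small" config.
model_checkpoint, ipu_config = select_whisper_config(
    "small", custom_checkpoint="openai/whisper-small"
)
```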
{
"cell_type": "code",
"execution_count": null,
"id": "8c5d72f3-cbd6-462f-9741-1726d412c4eb",
"metadata": {},
"outputs": [],
"source": [
"# Instantiate processor and model\n",
"processor = WhisperProcessorTorch.from_pretrained(model_spec)\n",
"model = WhisperForConditionalGeneration.from_pretrained(model_spec)\n",
"processor = WhisperProcessorTorch.from_pretrained(model_checkpoint)\n",
"model = WhisperForConditionalGeneration.from_pretrained(model_checkpoint)\n",
"\n",
"# Adapt whisper-tiny to run on the IPU\n",
"ipu_config = IPUConfig(ipus_per_replica=2)\n",
"\n",
"pipelined_model = to_pipelined(model, ipu_config)\n",
"pipelined_model = pipelined_model.parallelize(\n",
" for_generation=True, \n",
" use_cache=True, \n",
" batch_size=1, \n",
" max_length=250,\n",
" max_length=448,\n",
" on_device_generation_steps=16, \n",
" use_encoder_output_buffer=enable_sdk_features).half()"
]
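The notebook's inference cells are collapsed in this diff. As a rough sketch of how the parallelized model could then be driven (the dataset choice and generation arguments below are illustrative assumptions, not the notebook's exact code):

```python
# Sketch only: dataset choice and generation arguments are assumptions.
from datasets import load_dataset

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# The processor converts raw 16 kHz audio into log-Mel input features;
# cast to half precision to match the .half() model above.
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features.half()

generated_ids = pipelined_model.generate(input_features, max_length=448)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```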
@@ -278,13 +377,17 @@
"## Next Steps\n",
"\n",
"The `whisper-tiny` model used here is very fast for inference and so cheap to run, but its accuracy can be improved.\n",
"The `whisper-base` and `whisper-small` models have 74M and 244M parameters respectively (compared to just 39M for `whisper-tiny`). You can try out `whisper-base` and `whisper-small` by changing `model_spec = \"openai/whisper-tiny.en\"` (at the beginning of this notebook) to `model_spec = \"openai/whisper-base.en\"` or `model_spec = \"openai/whisper-small.en\"` respectively.\n",
"The `whisper-base`, `whisper-small` and `whisper-medium` models have 74M, 244M and 769 M parameters respectively (compared to just 39M for `whisper-tiny`). You can try out `whisper-base`, `whisper-small` and `whisper-medium` by changing `select_whisper_config(\"auto\")` (at the beginning of this notebook) to:\n",
"- `select_whisper_config(\"base\")`\n",
"- `select_whisper_config(\"small\")`\n",
"- `select_whisper_config(\"medium\")` respectively.\n",
"\n",
"Larger models and multilingual models are also available.\n",
"To access the multilingual models, remove the `.en` from the checkpoint name. Note however that the multilingual models are slightly less accurate for this English transcription task but they can be used for transcribing other languages or for translating to English.\n",
"\n",
"The largest models have 1550M parameters and won't fit with our simple two-IPU pipeline.\n",
"To run these you will need more than the IPU-POD4. On Paperspace, this is available using either an IPU-POD16 or a Bow Pod16 machine. Please contact Graphcore if you need assistance running these larger models.\n"
"The largest model `whisper-large` has 1550M parameters and requires a 8-IPUs pipeline.\n",
"You can try it by setting `select_whisper_config(\"large\")`\n",
"To run it you will need more than the IPU-POD4. On Paperspace, this is available using either an IPU-POD16 or a Bow Pod16 machine. Please contact Graphcore if you need assistance running these larger models.\n"
]
},
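As a sketch of the multilingual case (an illustrative assumption, not part of the original notebook): 🤗 Transformers steers the source language and task through forced decoder prompts.

```python
# Sketch: multilingual checkpoints drop the ".en" suffix; the task and source
# language can be forced through the decoder prompt (illustrative example).
from transformers import WhisperProcessor

ml_processor = WhisperProcessor.from_pretrained("openai/whisper-small")
forced_ids = ml_processor.get_decoder_prompt_ids(language="french", task="translate")
# e.g. pipelined_model.generate(input_features, forced_decoder_ids=forced_ids)
```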
{
@@ -294,14 +397,14 @@
"source": [
"## Conclusion\n",
"\n",
"In this notebook we demonstrated using Whisper for speech recognition and transcription on the IPU.\n",
"In this notebook, we demonstrated using Whisper for speech recognition and transcription on the IPU.\n",
"We used the Optimum Graphcore package to interface between the IPU and the Hugging Face Transformers library. This meant that only a few lines of code were needed to get this state-of-the-art automated speech recognition model running on IPUs."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},