From 08b93ae2f2ec86a3696061ba4280f3434282320d Mon Sep 17 00:00:00 2001
From: "Adam J. Stewart"
Date: Mon, 2 Dec 2024 16:59:08 +0100
Subject: [PATCH] Add Introduction to PyTorch tutorial

---
 docs/index.rst               |   1 +
 docs/tutorials/pytorch.ipynb | 507 +++++++++++++++++++++++++++++++++++
 2 files changed, 508 insertions(+)
 create mode 100644 docs/tutorials/pytorch.ipynb

diff --git a/docs/index.rst b/docs/index.rst
index ced959493a8..1a8f0058e6b 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -29,6 +29,7 @@ torchgeo
    :caption: Tutorials
 
    tutorials/getting_started
+   tutorials/pytorch
    tutorials/custom_raster_dataset
    tutorials/transforms
    tutorials/indices

diff --git a/docs/tutorials/pytorch.ipynb b/docs/tutorials/pytorch.ipynb
new file mode 100644
index 00000000000..db7bcdb4c88
--- /dev/null
+++ b/docs/tutorials/pytorch.ipynb
@@ -0,0 +1,507 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "45973fd5-6259-4e03-9501-02ee96f3f870",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "# Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9478ed9a",
   "metadata": {
    "id": "NdrXRgjU7Zih"
   },
   "source": [
    "# Introduction to PyTorch\n",
    "\n",
    "_Written by: Adam J. Stewart_\n",
    "\n",
    "In this tutorial, we introduce the basics of deep learning with PyTorch. Understanding deep learning terminology and the training and evaluation pipeline in PyTorch is essential to using TorchGeo."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "34f10e9f",
   "metadata": {
    "id": "lCqHTGRYBZcz"
   },
   "source": [
    "## Setup\n",
    "\n",
    "First, we install TorchGeo and all of its dependencies, including PyTorch."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "019092f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install torchgeo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4db9f791",
   "metadata": {
    "id": "dV0NLHfGBMWl"
   },
   "source": [
    "## Imports\n",
    "\n",
    "Next, we import PyTorch, TorchGeo, and any other libraries we need. We also manually set the random seed to ensure the reproducibility of our experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d92b0f1",
   "metadata": {
    "id": "entire-albania"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import tempfile\n",
    "\n",
    "import kornia.augmentation as K\n",
    "import torch\n",
    "from torch import nn, optim\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "from torchgeo.datasets import EuroSAT100\n",
    "from torchgeo.models import ResNet18_Weights, resnet18\n",
    "\n",
    "torch.manual_seed(0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9d13c2db-e5d4-4d83-846b-a2c32774bb44",
   "metadata": {},
   "source": [
    "## Definitions\n",
    "\n",
    "If this is your first introduction to deep learning (DL), a natural question might be \"what _is_ deep learning?\". You may also be curious how it relates to other similar buzzwords, including artificial intelligence (AI) and machine learning (ML). We can define these terms as follows:\n",
    "\n",
    "* AI: when machines exhibit human intelligence\n",
    "* ML: when machines learn from example\n",
    "* DL: when machines learn using neural networks\n",
    "\n",
    "In this definition, DL is a subset of ML, and ML is a subset of AI. Some common examples of models and applications of these include:\n",
    "\n",
    "* AI: Minimax, A*, Deep Blue, video game AI\n",
    "* ML: OLS, SVM, $k$-means, spam filtering\n",
    "* DL: MLP, CNN, ChatGPT, self-driving cars\n",
    "\n",
    "In this tutorial, we will specifically focus on deep learning, but many of the same concepts are shared with machine learning."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f26e4b8",
   "metadata": {
    "id": "5rLknZxrBEMz"
   },
   "source": [
    "## Datasets\n",
    "\n",
    "In order to learn by example, we first need examples. In machine learning, we construct datasets of the form:\n",
    "\n",
    "$$\mathcal{D} = \left\{\left(x^{(i)}, y^{(i)}\right)\right\}_{i=1}^N$$\n",
    "\n",
    "In plain English: dataset $\mathcal{D}$ is composed of $N$ pairs of inputs $x$ and expected outputs $y$. Both $x$ and $y$ can be tabular data, images, text, or any other object that can be represented mathematically.\n",
    "\n",
    "![EuroSAT](https://github.com/phelber/EuroSAT/blob/master/eurosat-overview.png?raw=true)\n",
    "\n",
    "In this tutorial (and many later tutorials), we will use EuroSAT100, a toy dataset composed of 100 images from the [EuroSAT](https://github.com/phelber/EuroSAT) dataset. EuroSAT is a popular image classification dataset with multispectral images from the Sentinel-2 satellites. Each image is classified into one of ten categories or \"classes\":\n",
    "\n",
    "0. Annual Crop\n",
    "1. Forest\n",
    "2. Herbaceous Vegetation\n",
    "3. Highway\n",
    "4. Industrial Buildings\n",
    "5. Pasture\n",
    "6. Permanent Crop\n",
    "7. Residential Buildings\n",
    "8. River\n",
    "9. Sea & Lake\n",
    "\n",
    "We can load this dataset and visualize the RGB bands of some example $(x, y)$ pairs like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa0c5a0c-ac4c-44c5-9fb7-fe4be07a0f01",
   "metadata": {},
   "outputs": [],
   "source": [
    "root = os.path.join(tempfile.gettempdir(), 'eurosat100')\n",
    "dataset = EuroSAT100(root, download=True)\n",
    "\n",
    "for i in torch.randint(len(dataset), (10,)):\n",
    "    sample = dataset[i]\n",
    "    dataset.plot(sample)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f89e20ae-d3b6-4f05-a83f-f7034dd9862f",
   "metadata": {},
   "source": [
    "In machine learning, we not only want to train a model, but also evaluate its performance on unseen data. Oftentimes, our dataset is split into three separate subsets:\n",
    "\n",
    "* train: for training the model *parameters*\n",
    "* val: for validating the model *hyperparameters*\n",
    "* test: for testing the model *performance*\n",
    "\n",
    "Parameters are the actual model weights, while hyperparameters are settings like model width or learning rate that are chosen by the user. We can initialize datasets for all three splits like so:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4785cddb-9821-4a2a-aa86-c08ffb6f2ebc",
   "metadata": {},
   "outputs": [],
   "source": [
    "train_dataset = EuroSAT100(root, split='train')\n",
    "val_dataset = EuroSAT100(root, split='val')\n",
    "test_dataset = EuroSAT100(root, split='test')"
   ]
  },
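  {
   "cell_type": "markdown",
   "id": "1c2a9e4b-7f35-4d21-9b6a-5e8d0c3f17a2",
   "metadata": {},
   "source": [
    "As a quick sanity check, we can inspect a single $(x, y)$ pair from the training split. Each TorchGeo sample is a dictionary, with the image tensor stored under the `'image'` key and the class index stored under the `'label'` key. The exact shape printed below assumes EuroSAT's 13-band, 64 x 64 pixel images:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d3b8f5c-8a46-4e32-8c7b-6f9e1d4a28b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "sample = train_dataset[0]\n",
    "x, y = sample['image'], sample['label']\n",
    "print(x.shape, x.dtype)  # one image tensor: (channels, height, width)\n",
    "print(y)  # the class index, an integer from 0-9"
   ]
  },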
  {
   "cell_type": "markdown",
   "id": "3e92d5be-8400-4c8a-83b0-314a672f22d1",
   "metadata": {},
   "source": [
    "## Data Loaders\n",
    "\n",
    "While our dataset objects know how to load a single $(x, y)$ pair, machine learning often operates on what are called *mini-batches* of data. We can pass our above datasets to a PyTorch DataLoader object to construct these mini-batches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8909c035-cbe9-49b6-8380-360914093f9a",
   "metadata": {},
   "outputs": [],
   "source": [
    "batch_size = 10\n",
    "\n",
    "train_dataloader = DataLoader(train_dataset, batch_size, shuffle=True)\n",
    "val_dataloader = DataLoader(val_dataset, batch_size, shuffle=False)\n",
    "test_dataloader = DataLoader(test_dataset, batch_size, shuffle=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7162d06-8814-4680-8192-aff279e70049",
   "metadata": {},
   "source": [
    "## Transforms\n",
    "\n",
    "There are two categories of transforms a user may want to apply to their data:\n",
    "\n",
    "* Preprocessing: required to make data \"ML-ready\"\n",
    "* Data augmentation: designed to artificially inflate the size of the dataset\n",
    "\n",
    "Preprocessing transforms such as normalization and one-hot encodings are applied to both training and evaluation data. Data augmentation transforms such as random flip and rotation are typically only performed during training. Below, we initialize transforms for both using the [Kornia](https://kornia.readthedocs.io/en/latest/augmentation.html) library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efc0152c-e7e4-4f06-9418-9d2c5dd803c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "preprocess = K.Normalize(0, 10000)\n",
    "augment = K.ImageSequential(K.RandomHorizontalFlip(), K.RandomVerticalFlip())"
   ]
  },
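  {
   "cell_type": "markdown",
   "id": "4f5c0a7d-9b58-4f43-8d9c-7a0f2e5b39c4",
   "metadata": {},
   "source": [
    "Kornia transforms expect mini-batches of images in $(B, C, H, W)$ layout. As a small illustration (on a random tensor rather than real data), `preprocess` rescales raw Sentinel-2 reflectances from roughly $[0, 10000]$ down to $[0, 1]$, while `augment` randomly flips each image in the batch:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a6d1b8e-0c69-4054-9e0d-8b1a3f6c40d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.rand(2, 13, 64, 64) * 10000  # a fake mini-batch of 13-band images\n",
    "x = preprocess(x)  # (x - 0) / 10000\n",
    "x = augment(x)  # random horizontal/vertical flips\n",
    "print(x.shape, x.min().item(), x.max().item())"
   ]
  },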
  {
   "cell_type": "markdown",
   "id": "e80cda68-19ea-4cd5-a3bc-cf8fcf8147af",
   "metadata": {},
   "source": [
    "## Model\n",
    "\n",
    "Our goal is to learn some function $f$ that can map between input $x$ and expected output $y$. Mathematically, this can be expressed as:\n",
    "\n",
    "$$x \overset{f}{\mapsto} y, \quad y = f(x)$$\n",
    "\n",
    "Since our $x$ in this case is an image, we choose to use ResNet-18, a popular *convolutional neural network* (CNN). We also initialize our model with weights that have been pre-trained on Sentinel-2 imagery so we don't have to start from scratch. This process is known as *transfer learning*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7cda7a8-2cd6-46a0-a1c2-bafc751a23f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = resnet18(ResNet18_Weights.SENTINEL2_ALL_MOCO)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a12b9e9b-26cc-43f4-a517-31b805862df5",
   "metadata": {},
   "source": [
    "## Loss Function\n",
    "\n",
    "If $y$ is our expected output (also called \"ground truth\") and $\hat{y}$ is our predicted output, our goal is to minimize the difference between $y$ and $\hat{y}$. This difference is referred to as *error* or *loss*, and the loss function tells us how big of a mistake we made. For regression tasks, a simple mean squared error is sufficient:\n",
    "\n",
    "$$\mathcal{L}(y, \hat{y}) = \left(y - \hat{y}\right)^2$$\n",
    "\n",
    "For classification tasks, such as EuroSAT, we instead use a negative log-likelihood:\n",
    "\n",
    "$$\mathcal{L}(y, \hat{y}) = -\sum_{c=1}^C \mathbb{1}_{y=c}\log{p_c}$$\n",
    "\n",
    "where $C$ is the number of classes, $\mathbb{1}$ is the indicator function, and $p_c$ is the probability with which the model predicts class $c$. When the probabilities $p_c$ are computed by taking a softmax over the raw model outputs, this is known as the cross-entropy loss."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3a0a699e-9bb3-4a06-91d2-401dd048ba66",
   "metadata": {},
   "outputs": [],
   "source": [
    "loss_fn = nn.CrossEntropyLoss()"
   ]
  },
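  {
   "cell_type": "markdown",
   "id": "6b7e2c9f-1d70-4165-8f1e-9c2b4a7d51e6",
   "metadata": {},
   "source": [
    "To make this concrete, here is a toy example with made-up logits for $C = 3$ classes. `nn.CrossEntropyLoss` applies a softmax to the raw outputs and then computes the negative log-likelihood of the true class, so it matches the manual calculation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c8f3d0a-2e81-4276-9a2f-0d3c5b8e62f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "logits = torch.tensor([[2.0, 0.5, 0.1]])  # made-up raw outputs for C = 3 classes\n",
    "y = torch.tensor([0])  # the true class index\n",
    "p = logits.softmax(dim=1)  # class probabilities p_c\n",
    "print(-p[0, y].log())  # manual negative log-likelihood\n",
    "print(loss_fn(logits, y))  # the same value from nn.CrossEntropyLoss"
   ]
  },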
  {
   "cell_type": "markdown",
   "id": "7743edf6-5fec-494d-8842-6cf8b45a2289",
   "metadata": {},
   "source": [
    "## Optimizer\n",
    "\n",
    "In order to minimize our loss, we compute the gradient of the loss function with respect to model parameters $\theta$. We then take a small step $\alpha$ (also called the *learning rate*) in the direction of the negative gradient to update our model parameters in a process called *backpropagation*:\n",
    "\n",
    "$$\theta \leftarrow \theta - \alpha \nabla_\theta \mathcal{L}(y, \hat{y})$$\n",
    "\n",
    "When done one image or one mini-batch at a time, this is known as *stochastic gradient descent* (SGD)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15b0c17b-db53-41b2-96aa-3b732684b4cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = optim.SGD(model.parameters(), lr=1e-2)"
   ]
  },
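  {
   "cell_type": "markdown",
   "id": "8d9a4e1b-3f92-4387-8b30-1e4d6c9f73a8",
   "metadata": {},
   "source": [
    "The optimizer applies this update rule for us during training, but we can also mimic a single step by hand on a toy problem (one parameter $\theta$ with loss $\mathcal{L} = \theta^2$) to see that it is nothing more than the formula above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e0b5f2c-4a03-4498-9c41-2f5e7d0a84b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "theta = torch.tensor(1.0, requires_grad=True)  # a single toy model parameter\n",
    "loss = theta**2  # a toy loss with gradient 2 * theta\n",
    "loss.backward()  # compute the gradient\n",
    "with torch.no_grad():\n",
    "    theta -= 1e-2 * theta.grad  # theta <- theta - alpha * grad\n",
    "print(theta)  # 1 - 0.01 * 2 = 0.98"
   ]
  },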
  {
   "cell_type": "markdown",
   "id": "7efbe79d-a9a0-4a23-b2f3-21b4ea0af7bd",
   "metadata": {},
   "source": [
    "## Device\n",
    "\n",
    "If you peek into the internals of deep learning models, you'll notice that most of it is actually linear algebra. This linear algebra is extremely easy to parallelize, and therefore can run very quickly on a GPU. We now transfer our model and all data to the GPU (if one is available):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a006a71f-0802-49b3-bd45-ddd524ae36a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
    "model = model.to(device)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7af95903-79c9-4a61-a7a4-2d41c884fba0",
   "metadata": {},
   "source": [
    "## Training\n",
    "\n",
    "We finally have all the basic components we need to train our ResNet-18 model on the EuroSAT100 dataset. During training, we set the model to train mode, then iterate over all mini-batches in the dataset. During the forward pass, we ask the model $f$ to predict $\hat{y}$ given $x$. We then calculate the loss accrued by these predictions. During the backward pass, we backpropagate our gradients to update all model weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d235772d-475e-42e7-bc7c-f50729ee0e22",
   "metadata": {},
   "outputs": [],
   "source": [
    "def train(dataloader):\n",
    "    model.train()\n",
    "    total_loss = 0\n",
    "    for batch in dataloader:\n",
    "        x = batch['image'].to(device)\n",
    "        y = batch['label'].to(device)\n",
    "\n",
    "        # Forward pass\n",
    "        y_hat = model(x)\n",
    "        loss = loss_fn(y_hat, y)\n",
    "        total_loss += loss.item()\n",
    "\n",
    "        # Backward pass\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        optimizer.zero_grad()\n",
    "\n",
    "    print(f'Loss: {total_loss:.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fd82312-cd17-4886-bcb6-8e42633e5009",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "Once the model is trained, we need to evaluate its performance on unseen data. To do this, we set the model to evaluation mode, then iterate over all mini-batches in the dataset. Note that we also disable the computation of gradients, since we do not need to backpropagate them. Finally, we compute the number of correctly classified images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bddce3b-ed2f-4a5c-b3c6-2f1a3a51c2d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(dataloader):\n",
    "    model.eval()\n",
    "    correct = 0\n",
    "    with torch.no_grad():\n",
    "        for batch in dataloader:\n",
    "            x = batch['image'].to(device)\n",
    "            y = batch['label'].to(device)\n",
    "\n",
    "            # Forward pass\n",
    "            y_hat = model(x)\n",
    "            correct += (y_hat.argmax(1) == y).type(torch.float).sum().item()\n",
    "\n",
    "    correct /= len(dataloader.dataset)\n",
    "    print(f'Accuracy: {correct:.0%}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f62a54e5-897d-476c-8d84-381993dbabbd",
   "metadata": {},
   "source": [
    "## Putting It All Together\n",
    "\n",
    "In machine learning, we typically iterate over our datasets multiple times. Each full pass through the dataset is called an *epoch*. The following hyperparameter controls the number of epochs for which we train our model, and can be modified to train the model for longer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb5dc7e8-6cb3-4457-83ad-7fa5aef8ea0c",
   "metadata": {
    "nbmake": {
     "mock": {
      "epochs": 1
     }
    }
   },
   "outputs": [],
   "source": [
    "epochs = 100"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f53526e6-54a3-43f7-a377-dca298730387",
   "metadata": {},
   "source": [
    "During each epoch, we train the model on our training dataset, then evaluate its performance on the validation dataset. The goal is for training loss to decrease and validation accuracy to increase, although you should expect noise in the training process. Generally, you want to train the model until the validation accuracy starts to plateau or even decrease."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97601568-ba75-443d-81cf-494956b2924c",
   "metadata": {},
   "outputs": [],
   "source": [
    "for epoch in range(epochs):\n",
    "    print(f'Epoch: {epoch}')\n",
    "    train(train_dataloader)\n",
    "    evaluate(val_dataloader)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e130fc89-0823-4814-85f8-d4416d6df395",
   "metadata": {},
   "source": [
    "Finally, we evaluate our performance on the test dataset. Note that we are only training our model on a toy dataset consisting of 100 images. If we instead trained on the full dataset (replace `EuroSAT100` with `EuroSAT` in the above code), we would likely get much higher performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cd0bd25-e19a-4b26-94a1-fe9a544e8afd",
   "metadata": {},
   "outputs": [],
   "source": [
    "evaluate(test_dataloader)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3acc64e-8dc0-46b4-a677-ecb9723d4f56",
   "metadata": {},
   "source": [
    "## Additional Reading\n",
    "\n",
    "If you are new to machine learning and overwhelmed by all of the above terminology, or would like to gain a better understanding of some of the math that goes into machine learning, I would highly recommend a formal machine learning or deep learning course. The following official PyTorch tutorials are also worth exploring:\n",
    "\n",
    "* [PyTorch: Learn the Basics](https://pytorch.org/tutorials/beginner/basics/intro.html)\n",
    "* [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)\n",
    "* [Transfer Learning for Computer Vision](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html)"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "pytorch.ipynb",
   "provenance": []
  },
  "execution": {
   "timeout": 1200
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}