update

nitrain · May 29, 2024 · 0b138c6
1 parent 52dbfff
commit 0b138c6
Showing 1 changed file with 168 additions and 3 deletions.
diff --git a/book/02-02.ipynb b/book/02-02.ipynb
@@ -4,25 +4,190 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# 6. Using datasets\n"
+    "# 6. Using datasets\n",
+    "\n",
+    "The `nitrain.Dataset` class provides everything you need to map collections of images and related metadata. This chapter introduces the basic functionality and structure of the class so you can get started. Once you know the basics, you can build on them with the features covered in later chapters."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## "
+    "## Prerequisites\n",
+    "\n",
+    "Besides nitrain, this chapter uses ants and numpy to create images, plus some basic operating system tools to create directories that mimic what your data looks like when it is not loaded into memory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import nitrain as nt\n",
+    "import ants\n",
+    "import numpy as np\n",
+    "import os\n",
+    "from tempfile import TemporaryDirectory"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic example\n",
+    "\n",
+    "To create a dataset, you need to pass in `inputs` and `outputs` arguments. In the most basic example of image classification, you would pass in a list of images as inputs and a list of class labels as outputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "images = [ants.from_numpy(np.ones((100,100))) * i for i in range(10)]\n",
+    "labels = [i for i in range(10)]\n",
+    "\n",
+    "dataset = nt.Dataset(inputs=images,\n",
+    "                     outputs=labels)"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
+   "source": [
+    "Now our dataset is mapped! We can retrieve one or multiple records from the dataset via indexing."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "x, y = dataset[0]\n",
+    "print(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      ", ANTsImage\n",
+      "\t Pixel Type : float (float32)\n",
+      "\t Components : 1\n",
+      "\t Dimensions : (100, 100)\n",
+      "\t Spacing    : (1.0, 1.0)\n",
+      "\t Origin     : (0.0, 0.0)\n",
+      "\t Direction  : [1. 0. 0. 1.]\n",
+      "]\n",
+      "[3, 4]\n"
+     ]
+    }
+   ],
+   "source": [
+    "x_list, y_list = dataset[3:5]\n",
+    "print(x_list)\n",
+    "print(y_list)"
+   ]
+  },
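The two indexing cells above follow the usual Python convention: an integer index returns a single record, and a slice returns parallel lists. The sketch below mimics that behavior with a toy class. `ToyDataset` is a hypothetical stand-in written for illustration and is not how nitrain implements `Dataset`; plain strings take the place of images.

```python
class ToyDataset:
    """Hypothetical stand-in for illustration; not nitrain's implementation."""

    def __init__(self, inputs, outputs):
        if len(inputs) != len(outputs):
            raise ValueError("inputs and outputs must have the same length")
        self.inputs = list(inputs)
        self.outputs = list(outputs)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        # An integer or a slice is forwarded to the underlying lists:
        # an integer yields one (x, y) record, a slice yields two parallel lists.
        return self.inputs[idx], self.outputs[idx]


# Strings stand in for images so the sketch has no dependencies.
toy = ToyDataset(inputs=[f"image_{i}" for i in range(10)], outputs=list(range(10)))

x, y = toy[0]              # single record
x_list, y_list = toy[3:5]  # records 3 and 4

print(x, y)
print(x_list, y_list)
```

Delegating to list indexing keeps the single-record and multi-record cases in one code path, which is one simple way to get the behavior shown in the notebook output.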
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also print the dataset to understand a bit more of its structure."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Dataset (n=10)\n",
+      "     Inputs     : <nitrain.readers.memory.MemoryReader object at 0x1326f5690>\n",
+      "     Outputs    : <nitrain.readers.memory.MemoryReader object at 0x1326f5dd0>\n",
+      "     Transforms : {}\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(dataset)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As you can see, our dataset has a `MemoryReader` in both the inputs and the outputs slot. You will learn more about readers in a later chapter, but the basic idea is that readers are what the dataset uses to feed records to you from a variety of sources. Since our images and labels already exist in memory, a `MemoryReader` is inferred."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading from file\n",
+    "\n",
+    "What about when our data does not already exist in memory? "
+   ]
+  },
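Before loading anything from disk, you need files on disk. The sketch below shows only that setup step, using the `os` and `TemporaryDirectory` tools imported in the prerequisites. The folder layout and file names are hypothetical, and plain text files stand in for images so the example has no dependencies. In the chapter's setting the files would be images written with ants. How the resulting paths are handed to nitrain is not shown here.

```python
import os
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp_dir:
    # Hypothetical layout: one sub-folder per record, each holding one file.
    for i in range(3):
        record_dir = os.path.join(tmp_dir, f"sub-{i:02d}")
        os.makedirs(record_dir)
        with open(os.path.join(record_dir, "image.txt"), "w") as f:
            f.write(f"placeholder for image {i}\n")

    # Collect the file paths in a stable, sorted order.
    paths = sorted(
        os.path.join(root, name)
        for root, _, names in os.walk(tmp_dir)
        for name in names
    )
    relative_paths = [os.path.relpath(p, tmp_dir) for p in paths]

# The temporary directory and its contents are deleted when the block exits.
print(relative_paths)
```

Using a temporary directory keeps the example from leaving files behind, at the cost that anything reading the files must run inside the `with` block.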
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
    "source": []
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.5"
   }
  },
  "nbformat": 4,