From 6214a252e0971b7885bb2d733b31be987d8f5c01 Mon Sep 17 00:00:00 2001 From: Sheilah Kirui <71867292+skirui-source@users.noreply.github.com> Date: Tue, 11 Apr 2023 09:08:14 -0700 Subject: [PATCH] Migrate dask/notebooks/HPO_demo.ipynb to deployment (#174) * create new pr * rename notebook folder, few style fixes in notebook * Update source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb Co-authored-by: Jacob Tomlinson * Update source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb Co-authored-by: Jacob Tomlinson * Update source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb * add docs for localcudacluster * minor format fix * style issue * updated notebook with deployment links * addressed review, ready for review --------- Co-authored-by: Jacob Tomlinson --- source/cloud/aws/ec2-multi.md | 2 +- source/examples/index.md | 1 + .../notebook.ipynb | 753 ++++++++++++++++++ source/guides/cuda/localcluster.md | 57 ++ 4 files changed, 812 insertions(+), 1 deletion(-) create mode 100644 source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb create mode 100644 source/guides/cuda/localcluster.md diff --git a/source/cloud/aws/ec2-multi.md b/source/cloud/aws/ec2-multi.md index 1b4e56c3..67ab80ce 100644 --- a/source/cloud/aws/ec2-multi.md +++ b/source/cloud/aws/ec2-multi.md @@ -10,7 +10,7 @@ Before running these instructions, ensure you have installed RAPIDS. This method of deploying RAPIDS effectively allows you to burst beyond the node you are on into a cluster of EC2 VMs. This does come with the caveat that you are on a RAPIDS capable environment with GPUs. ``` -If you are using a machine with an NVIDIA GPU then follow the [local install instructions](../../local). Alternatively if you do not have a GPU locally consider using s remote environment like a [SageMaker Notebook Instance](./sagemaker). +If you are using a machine with an NVIDIA GPU then follow the [local install instructions](https://docs.rapids.ai/install). Alternatively if you do not have a GPU locally consider using a remote environment like a [SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html). 
 ### Install the AWS CLI
diff --git a/source/examples/index.md b/source/examples/index.md
index 27c6532c..6720ab88 100644
--- a/source/examples/index.md
+++ b/source/examples/index.md
@@ -11,4 +11,5 @@ rapids-sagemaker-higgs/notebook
 rapids-sagemaker-hpo/notebook
 rapids-ec2-mnmg/notebook
 rapids-autoscaling-multi-tenant-kubernetes/notebook
+xgboost-randomforest-gpu-hpo-dask/notebook
 ```
diff --git a/source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb b/source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb
new file mode 100644
index 00000000..c9023f6b
--- /dev/null
+++ b/source/examples/xgboost-randomforest-gpu-hpo-dask/notebook.ipynb
@@ -0,0 +1,753 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "tags": [
+     "dataset/airline",
+     "library/numpy",
+     "library/pandas",
+     "library/xgboost",
+     "library/dask",
+     "library/dask-cuda",
+     "library/dask-ml",
+     "storage/s3",
+     "workflows/hpo",
+     "library/cuml",
+     "cloud/aws/ec2",
+     "cloud/azure/azure-vm",
+     "cloud/gcp/compute-engine",
+     "cloud/ibm/virtual-server",
+     "library/sklearn"
+    ]
+   },
+   "source": [
+    "# HPO with dask-ml and cuml\n",
+    "\n",
+    "## Introduction\n",
+    "\n",
+    "[Hyperparameter optimization](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview) is the task of picking hyperparameter values for a model that provide the optimal results for the problem, as measured on a specific test dataset. This is often a crucial step and can help boost the model accuracy when done correctly. Cross-validation is often used to more accurately estimate the performance of the models in the search process: the training set is split into complementary subsets, the model is trained on one of the subsets, and its performance is evaluated on the other. This gives an indication of how the model will generalise to data it has not seen before.\n",
+    "\n",
+    "Despite its theoretical importance, HPO has been difficult to implement in practical applications because of the resources needed to run so many distinct training jobs.\n",
+    "\n",
+    "The two approaches that we will be exploring in this notebook are:\n",
+    "\n",
+    "#### 1. GridSearch\n",
+    "\n",
+    "As the name suggests, the \"search\" is done over each possible combination in a grid of parameters that the user provides. The user must manually define this grid. For each parameter that needs to be tuned, a set of values is given, and the final grid search is performed over tuples having one element from each set, resulting in the Cartesian product of the elements.\n",
+    "\n",
+    "For example, assume we want to perform HPO on XGBoost. For simplicity, let's tune only `n_estimators` and `max_depth`:\n",
+    "\n",
+    "`n_estimators: [50, 100, 150]`\n",
+    "\n",
+    "`max_depth: [6, 7, 8]`\n",
+    "\n",
+    "The grid search will take place over |n_estimators| x |max_depth|, which is 3 x 3 = 9 combinations. As you have probably guessed, the grid size grows rapidly as the number of parameters and their search space increases. A short sketch of this enumeration is shown below.\n",
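+    "\n",
+    "For instance, here is one way to enumerate that 3 x 3 grid (a minimal sketch for illustration, using scikit-learn's `ParameterGrid`):\n",
+    "\n",
+    "```python\n",
+    "from sklearn.model_selection import ParameterGrid\n",
+    "\n",
+    "# The Cartesian product of the two value sets: 3 x 3 = 9 combinations\n",
+    "grid = ParameterGrid({\"n_estimators\": [50, 100, 150], \"max_depth\": [6, 7, 8]})\n",
+    "print(len(grid))  # 9\n",
+    "for params in grid:\n",
+    "    print(params)  # e.g. {'max_depth': 6, 'n_estimators': 50}\n",
+    "```\n",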
+    "\n",
+    "#### 2. RandomSearch\n",
+    "\n",
+    "[Random Search](http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf) replaces the exhaustive nature of the search from before with a random selection of parameters over the specified space. This method can outperform GridSearch in cases where the number of parameters affecting the model's performance is small (low-dimension optimization problems). Since this does not pick every tuple from the Cartesian product, it tends to yield results faster, and the performance can be comparable to that of the grid search approach. It's worth keeping in mind that the random nature of this search means the results of each run might differ.\n",
+    "\n",
+    "Some of the other methods used for HPO include:\n",
+    "\n",
+    "1. Bayesian Optimization\n",
+    "\n",
+    "2. Gradient-based Optimization\n",
+    "\n",
+    "3. Evolutionary Optimization\n",
+    "\n",
+    "To learn more about HPO, some papers are linked at the end of the notebook for further reading.\n",
+    "\n",
+    "Now that we have a basic understanding of what HPO is, let's discuss what we wish to achieve with this demo. The aim of this notebook is to show the importance of hyperparameter optimization and the performance of GPU-accelerated HPO with dask-ml for XGBoost and the cuML Random Forest.\n",
+    "\n",
+    "For this demo, we will be using the [Airline dataset](http://kt.ijs.si/elena_ikonomovska/data.html). The aim of the problem is to predict the arrival delay. It has about 116 million entries, with 13 attributes that are used to determine the delay for a given airline. We have modified this problem to serve as a binary classification problem to determine if the airline will be delayed (True) or not.\n",
+    "\n",
+    "Let's get started!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "\n",
+    "warnings.filterwarnings(\"ignore\")  # Reduce number of messages/warnings displayed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "workflow/randomforest",
+     "library/sklearn",
+     "library/cuml",
+     "library/dask-ml",
+     "workflows/hpo",
+     "cloud/aws/ec2",
+     "workflow/xgboost",
+     "library/dask",
+     "dataset/airline"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "import cudf\n",
+    "import cuml\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import xgboost as xgb\n",
+    "\n",
+    "import dask_ml.model_selection as dcv\n",
+    "from dask.distributed import Client, wait\n",
+    "from dask_cuda import LocalCUDACluster\n",
+    "\n",
+    "from sklearn import datasets\n",
+    "from sklearn.metrics import make_scorer\n",
+    "from cuml.ensemble import RandomForestClassifier\n",
+    "from cuml.model_selection import train_test_split\n",
+    "from cuml.metrics.accuracy import accuracy_score\n",
+    "\n",
+    "import os\n",
+    "from urllib.request import urlretrieve\n",
+    "import gzip"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "### Spinning up a CUDA Cluster\n",
+    "\n",
+    "This notebook is designed to run on a single node with multiple GPUs; you can get multi-GPU VMs from [AWS](https://docs.rapids.ai/deployment/stable/cloud/aws/ec2-multi.html), [GCP](https://docs.rapids.ai/deployment/stable/cloud/gcp/dataproc.html), [Azure](https://docs.rapids.ai/deployment/stable/cloud/azure/azure-vm-multi.html), [IBM](https://docs.rapids.ai/deployment/stable/cloud/ibm/virtual-server.html) and more.\n",
+    "\n",
+    "We start a [local cluster](../../guides/cuda/localcluster.md) and keep it ready for running distributed tasks with Dask.\n",
+    "\n",
+    "Below, [LocalCUDACluster](https://github.com/rapidsai/dask-cuda) launches one Dask worker for each GPU in the current system. 
It's developed as a part of the RAPIDS project.\n",
+    "\n",
+    "Learn More:\n",
+    "\n",
+    "- [Setting up Dask](https://docs.dask.org/en/latest/setup.html)\n",
+    "- [Dask Client](https://distributed.dask.org/en/latest/client.html)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cluster = LocalCUDACluster()\n",
+    "client = Client(cluster)\n",
+    "\n",
+    "client"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data Preparation\n",
+    "\n",
+    "We download the Airline [dataset](https://s3.console.aws.amazon.com/s3/buckets/rapidsai-cloud-ml-sample-data?region=us-west-2&tab=objects) and save it to the local directory specified by `data_dir` and `file_name`. In this step, we also convert the input data into the appropriate dtypes. For this, we will use the `prepare_dataset` function.\n",
+    "\n",
+    "Note: To ensure that this example runs quickly on a modest machine, we default to using a small subset of the airline dataset. To use the full dataset, pass the argument `use_full_dataset=True` to the `prepare_dataset` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_dir = \"~/rapids_hpo/data/\"\n",
+    "file_name = \"airlines.parquet\"\n",
+    "parquet_name = os.path.join(data_dir, file_name)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "parquet_name"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "library/cudf"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "def prepare_dataset(use_full_dataset=False):\n",
+    "    global file_path, data_dir\n",
+    "\n",
+    "    if use_full_dataset:\n",
+    "        url = \"https://rapidsai-cloud-ml-sample-data.s3-us-west-2.amazonaws.com/airline_20000000.parquet\"\n",
+    "    else:\n",
+    "        url = \"https://rapidsai-cloud-ml-sample-data.s3-us-west-2.amazonaws.com/airline_small.parquet\"\n",
+    "\n",
+    "    if os.path.isfile(parquet_name):\n",
+    "        print(f\" > File already exists. 
Ready to load at {parquet_name}\")\n",
+    "    else:\n",
+    "        # Ensure folder exists\n",
+    "        os.makedirs(data_dir, exist_ok=True)\n",
+    "\n",
+    "        def data_progress_hook(block_number, read_size, total_filesize):\n",
+    "            if (block_number % 1000) == 0:\n",
+    "                print(\n",
+    "                    f\" > percent complete: { 100 * ( block_number * read_size ) / total_filesize:.2f}\\r\",\n",
+    "                    end=\"\",\n",
+    "                )\n",
+    "            return\n",
+    "\n",
+    "        urlretrieve(\n",
+    "            url=url,\n",
+    "            filename=parquet_name,\n",
+    "            reporthook=data_progress_hook,\n",
+    "        )\n",
+    "\n",
+    "        print(f\" > Download complete {file_name}\")\n",
+    "\n",
+    "    input_cols = [\n",
+    "        \"Year\",\n",
+    "        \"Month\",\n",
+    "        \"DayofMonth\",\n",
+    "        \"DayofWeek\",\n",
+    "        \"CRSDepTime\",\n",
+    "        \"CRSArrTime\",\n",
+    "        \"UniqueCarrier\",\n",
+    "        \"FlightNum\",\n",
+    "        \"ActualElapsedTime\",\n",
+    "        \"Origin\",\n",
+    "        \"Dest\",\n",
+    "        \"Distance\",\n",
+    "        \"Diverted\",\n",
+    "    ]\n",
+    "\n",
+    "    dataset = cudf.read_parquet(parquet_name)\n",
+    "\n",
+    "    # encode categoricals as numeric\n",
+    "    for col in dataset.select_dtypes([\"object\"]).columns:\n",
+    "        dataset[col] = dataset[col].astype(\"category\").cat.codes.astype(np.int32)\n",
+    "\n",
+    "    # cast all columns to float32\n",
+    "    for col in dataset.columns:\n",
+    "        dataset[col] = dataset[col].astype(np.float32)  # needed for random forest\n",
+    "\n",
+    "    # put target/label column first [ classic XGBoost standard ]\n",
+    "    output_cols = [\"ArrDelayBinary\"] + input_cols\n",
+    "\n",
+    "    dataset = dataset.reindex(columns=output_cols)\n",
+    "    return dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = prepare_dataset()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "from contextlib import contextmanager\n",
+    "\n",
+    "\n",
+    "# Helper to time blocks of code\n",
+    "@contextmanager\n",
+    "def timed(txt):\n",
+    "    t0 = time.time()\n",
+    "    yield\n",
+    "    t1 = time.time()\n",
+    "    print(\"%32s time: %8.5f\" % (txt, t1 - t0))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define some default values to make use of across the notebook for a fair comparison\n",
+    "N_FOLDS = 5\n",
+    "N_ITER = 25"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "label = \"ArrDelayBinary\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Splitting Data\n",
+    "\n",
+    "We split the data randomly into train and test sets using the [cuml train_test_split](https://docs.rapids.ai/api/cuml/stable/api.html#cuml.model_selection.train_test_split) and create CPU copies of the data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X_train, X_test, y_train, y_test = train_test_split(df, label, test_size=0.2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X_cpu = X_train.to_pandas()\n",
+    "y_cpu = y_train.to_numpy()\n",
+    "\n",
+    "X_test_cpu = X_test.to_pandas()\n",
+    "y_test_cpu = y_test.to_numpy()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Setup Custom cuML scorers\n",
+    "\n",
+    "The search functions (such as GridSearchCV) for scikit-learn and dask-ml expect the metric functions (such as accuracy_score) to match the “scorer” API. 
This can be achieved using scikit-learn's [make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html) function.\n",
+    "\n",
+    "We will generate a `cuml_scorer` with the cuML `accuracy_score` function. You'll also notice an `accuracy_score_wrapper`, which primarily converts the y label into the `float32` type; some cuML models currently only accept this type, so we perform the conversion for compatibility.\n",
+    "\n",
+    "We also create helper functions for performing HPO in 2 different modes:\n",
+    "\n",
+    "1. `gpu-grid`: Perform GPU based GridSearchCV\n",
+    "2. `gpu-random`: Perform GPU based RandomizedSearchCV"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def accuracy_score_wrapper(y, y_hat):\n",
+    "    \"\"\"\n",
+    "    A wrapper function to convert labels to float32,\n",
+    "    and pass it to accuracy_score.\n",
+    "\n",
+    "    Params:\n",
+    "    - y: The y labels that need to be converted\n",
+    "    - y_hat: The predictions made by the model\n",
+    "    \"\"\"\n",
+    "    y = y.astype(\"float32\")  # cuML RandomForest needs the y labels to be float32\n",
+    "    return accuracy_score(y, y_hat, convert_dtype=True)\n",
+    "\n",
+    "\n",
+    "accuracy_wrapper_scorer = make_scorer(accuracy_score_wrapper)\n",
+    "cuml_accuracy_scorer = make_scorer(accuracy_score, convert_dtype=True)"
+   ]
+  },
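+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick aside (a sketch, not part of the workflow below): a scorer produced by `make_scorer(metric)` is called as `scorer(estimator, X, y)` and internally computes `metric(y, estimator.predict(X))`, so higher values are better. The fitted estimator `trained_model` here is hypothetical:\n",
+    "\n",
+    "```python\n",
+    "# Equivalent to: accuracy_score(y_test, trained_model.predict(X_test), convert_dtype=True)\n",
+    "score = cuml_accuracy_scorer(trained_model, X_test, y_test)\n",
+    "```\n",
+    "\n",
+    "The search functions below call the scorer this way on each cross-validation fold."
+   ]
+  },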
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def do_HPO(model, gridsearch_params, scorer, X, y, mode=\"gpu-Grid\", n_iter=10):\n",
+    "    \"\"\"\n",
+    "    Perform HPO based on the mode specified\n",
+    "\n",
+    "    mode: default gpu-Grid. The possible options are:\n",
+    "    1. gpu-grid: Perform GPU based GridSearchCV\n",
+    "    2. gpu-random: Perform GPU based RandomizedSearchCV\n",
+    "\n",
+    "    n_iter: number of parameter settings sampled when using the random option\n",
+    "\n",
+    "    Returns the best estimator and the results of the search\n",
+    "    \"\"\"\n",
+    "    if mode == \"gpu-grid\":\n",
+    "        print(\"gpu-grid selected\")\n",
+    "        clf = dcv.GridSearchCV(model, gridsearch_params, cv=N_FOLDS, scoring=scorer)\n",
+    "    elif mode == \"gpu-random\":\n",
+    "        print(\"gpu-random selected\")\n",
+    "        clf = dcv.RandomizedSearchCV(\n",
+    "            model, gridsearch_params, cv=N_FOLDS, scoring=scorer, n_iter=n_iter\n",
+    "        )\n",
+    "    else:\n",
+    "        print(\"Unknown option, please choose one of [gpu-grid, gpu-random]\")\n",
+    "        return None, None\n",
+    "    res = clf.fit(X, y)\n",
+    "    print(\n",
+    "        \"Best clf and score {} {}\\n---\\n\".format(res.best_estimator_, res.best_score_)\n",
+    "    )\n",
+    "    return res.best_estimator_, res"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def print_acc(model, X_train, y_train, X_test, y_test, mode_str=\"Default\"):\n",
+    "    \"\"\"\n",
+    "    Trains a model on the train data provided, and prints the accuracy of the trained model.\n",
+    "    mode_str: Label used to identify the model when printing its accuracy\n",
+    "    \"\"\"\n",
+    "    y_pred = model.fit(X_train, y_train).predict(X_test)\n",
+    "    score = accuracy_score(y_pred, y_test.astype(\"float32\"), convert_dtype=True)\n",
+    "    print(\"{} model accuracy: {}\".format(mode_str, score))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X_train.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Launch HPO\n",
+    "\n",
+    "We will first look at the model's performance without the grid search and then compare it with the performance after searching.\n",
+    "\n",
+    "### XGBoost\n",
+    "\n",
+    "To perform the hyperparameter optimization, we make use of the sklearn version of the [XGBClassifier](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn). We're making use of this version to make it compatible with, and easily comparable to, the scikit-learn version. The model takes a set of parameters that can be found in the documentation. We're primarily interested in `max_depth`, `learning_rate`, `min_child_weight`, `reg_alpha` and `num_round`, as these affect the performance of XGBoost the most.\n",
+    "\n",
+    "Read more about what these parameters are useful for [here](https://xgboost.readthedocs.io/en/latest/parameter.html).\n",
+    "\n",
+    "#### Default Performance\n",
+    "\n",
+    "We first use the model with its default parameters and check the accuracy of the model. In this case, it is 84%."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "model_gpu_xgb_ = xgb.XGBClassifier(tree_method=\"gpu_hist\")\n",
+    "\n",
+    "print_acc(model_gpu_xgb_, X_train, y_cpu, X_test, y_test_cpu)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Parameter Distributions\n",
+    "\n",
+    "The way we define the grid to perform the search is by including ranges of parameters that need to be used for the search. In this example we make use of [np.arange](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html), which returns an ndarray of evenly spaced values, and [np.logspace](https://docs.scipy.org/doc/numpy/reference/generated/numpy.logspace.html#numpy.logspace), which returns a specified number of samples that are equally spaced on the log scale. Parameters can also be specified as lists or NumPy arrays, or as any random variate object that yields a sample when called; SciPy provides various distributions for this too, as sketched below."
+   ]
+  },
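+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For example, a randomized search can draw from continuous distributions instead of fixed value lists (a minimal sketch for illustration only; the distribution choices and ranges below are arbitrary assumptions, not values used later in this notebook):\n",
+    "\n",
+    "```python\n",
+    "from scipy import stats\n",
+    "\n",
+    "# Objects with an rvs() method can be sampled by RandomizedSearchCV\n",
+    "params_dist = {\n",
+    "    \"max_depth\": stats.randint(3, 12),  # uniform over the integers 3..11\n",
+    "    \"learning_rate\": stats.loguniform(0.01, 0.3),  # log-uniform over [0.01, 0.3]\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "In this notebook we stick to explicit lists and arrays, defined in the next cell."
+   ]
+  },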
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# For xgb_model\n",
+    "model_gpu_xgb = xgb.XGBClassifier(tree_method=\"gpu_hist\")\n",
+    "\n",
+    "# More range\n",
+    "params_xgb = {\n",
+    "    \"max_depth\": np.arange(start=3, stop=12, step=3),  # Default = 6\n",
+    "    \"alpha\": np.logspace(-3, -1, 5),  # default = 0\n",
+    "    \"learning_rate\": [0.05, 0.1, 0.15],  # default = 0.3\n",
+    "    \"min_child_weight\": np.arange(start=2, stop=10, step=3),  # default = 1\n",
+    "    \"n_estimators\": [100, 200, 1000],\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### RandomizedSearchCV\n",
+    "\n",
+    "We'll now try [RandomizedSearchCV](https://ml.dask.org/modules/generated/dask_ml.model_selection.RandomizedSearchCV.html). `n_iter` specifies the number of parameter settings that the search samples. Here we will search `N_ITER` (defined earlier) points for the best performance."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mode = \"gpu-random\"\n",
+    "\n",
+    "with timed(\"XGB-\" + mode):\n",
+    "    res, results = do_HPO(\n",
+    "        model_gpu_xgb,\n",
+    "        params_xgb,\n",
+    "        cuml_accuracy_scorer,\n",
+    "        X_train,\n",
+    "        y_cpu,\n",
+    "        mode=mode,\n",
+    "        n_iter=N_ITER,\n",
+    "    )\n",
+    "print(\"Searched over {} parameter settings\".format(len(results.cv_results_[\"mean_test_score\"])))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print_acc(res, X_train, y_cpu, X_test, y_test_cpu, mode_str=mode)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mode = \"gpu-grid\"\n",
+    "\n",
+    "with timed(\"XGB-\" + mode):\n",
+    "    res, results = do_HPO(\n",
+    "        model_gpu_xgb, params_xgb, cuml_accuracy_scorer, X_train, y_cpu, mode=mode\n",
+    "    )\n",
+    "print(\"Searched over {} parameter settings\".format(len(results.cv_results_[\"mean_test_score\"])))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print_acc(res, X_train, y_cpu, X_test, y_test_cpu, mode_str=mode)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Improved performance\n",
+    "\n",
+    "There's a 5% improvement in the performance.\n",
+    "\n",
+    "We notice that performing grid search and random search yields similar performance improvements, even though the random search used just 25 combinations of parameters. We will stick to performing random search for the rest of the notebook with RF, under the assumption that there will not be a major difference in performance if the ranges are large enough."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Visualizing the Search\n",
+    "\n",
+    "Let's plot some graphs to get an understanding of how the parameters affect the accuracy. 
The code for these plots is included in `cuml/experimental/hyperopt_utils/plotting_utils.py`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Mean/Std of test scores\n",
+    "\n",
+    "We fix all parameters except one for each of these graphs and plot the effect that parameter has on the mean test score, with the error bars indicating the standard deviation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from cuml.experimental.hyperopt_utils import plotting_utils"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "plotting_utils.plot_search_results(results)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Heatmaps\n",
+    "\n",
+    "- Between parameter pairs (we could plot all possible pairs, but only one is shown in this notebook)\n",
+    "- This gives a visual representation of how the pair affects the test score"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_gridsearch = pd.DataFrame(results.cv_results_)\n",
+    "plotting_utils.plot_heatmap(df_gridsearch, \"param_max_depth\", \"param_n_estimators\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## RandomForest\n",
+    "\n",
+    "Let's now use the Random Forest classifier to perform a hyperparameter search. We'll make use of the cuML `RandomForestClassifier` and visualize the results using a heatmap."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "## Random Forest\n",
+    "model_rf_ = RandomForestClassifier()\n",
+    "\n",
+    "params_rf = {\n",
+    "    \"max_depth\": np.arange(start=3, stop=15, step=2),  # Default = 6\n",
+    "    \"max_features\": [0.1, 0.50, 0.75, \"auto\"],  # default = 0.3\n",
+    "    \"n_estimators\": [100, 200, 500, 1000],\n",
+    "}\n",
+    "\n",
+    "for col in X_train.columns:\n",
+    "    X_train[col] = X_train[col].astype(\"float32\")\n",
+    "y_train = y_train.astype(\"int32\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\n",
+    "    \"Default acc: \",\n",
+    "    accuracy_score(model_rf_.fit(X_train, y_train).predict(X_test), y_test),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "mode = \"gpu-random\"\n",
+    "model_rf = RandomForestClassifier()\n",
+    "\n",
+    "\n",
+    "with timed(\"RF-\" + mode):\n",
+    "    res, results = do_HPO(\n",
+    "        model_rf,\n",
+    "        params_rf,\n",
+    "        cuml_accuracy_scorer,\n",
+    "        X_train,\n",
+    "        y_cpu,\n",
+    "        mode=mode,\n",
+    "        n_iter=N_ITER,\n",
+    "    )\n",
+    "print(\"Searched over {} parameter settings\".format(len(results.cv_results_[\"mean_test_score\"])))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(\"Improved acc: \", accuracy_score(res.predict(X_test), y_test))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df_gridsearch = pd.DataFrame(results.cv_results_)\n",
+    "\n",
+    "plotting_utils.plot_heatmap(df_gridsearch, \"param_max_depth\", \"param_n_estimators\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Conclusion and Next Steps\n",
+    "\n",
+    "We notice improvements in the performance for even a basic version of grid search and randomized search. 
Generally, the more data we use, the better the model performs, so you are encouraged to try a larger dataset and a broader range of parameters.\n",
+    "\n",
+    "This experiment can also be repeated with different classifiers and different ranges of parameters to see how HPO can help improve the performance metric. In this example, we have chosen a basic metric, accuracy, but you can use other, more informative metrics that help determine the usefulness of a model. You can even pass a list of parameters to the scoring function. This makes HPO really powerful, and it can add a significant boost to the model that we generate.\n",
+    "\n",
+    "#### Further Reading\n",
+    "\n",
+    "- [The 5 Classification Evaluation Metrics You Must Know](https://towardsdatascience.com/the-5-classification-evaluation-metrics-you-must-know-aa97784ff226)\n",
+    "- [11 Important Model Evaluation Metrics for Machine Learning Everyone should know](https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/)\n",
+    "- [Algorithms for Hyper-Parameter Optimisation](http://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf)\n",
+    "- [Forward and Reverse Gradient-Based Hyperparameter Optimization](http://proceedings.mlr.press/v70/franceschi17a/franceschi17a-supp.pdf)\n",
+    "- [Practical Bayesian Optimization of Machine Learning Algorithms](http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf)\n",
+    "- [Random Search for Hyper-Parameter Optimization](http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "rapids-23.02",
+   "language": "python",
+   "name": "rapids-23.02"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/source/guides/cuda/localcluster.md b/source/guides/cuda/localcluster.md
new file mode 100644
index 00000000..492b55ea
--- /dev/null
+++ b/source/guides/cuda/localcluster.md
@@ -0,0 +1,57 @@
+# LocalCUDACluster
+
+Create a cluster of one or more GPUs on your local machine. You can launch a Dask scheduler on a `LocalCUDACluster` to parallelize and distribute your RAPIDS workflows across multiple GPUs on a single node.
+
+In addition to enabling multi-GPU computation, `LocalCUDACluster` also provides a simple interface for managing the cluster, such as starting and stopping the cluster, querying the status of the nodes, and monitoring the workload distribution.
+
+## Pre-requisites
+
+Before running these instructions, ensure you have installed the [`dask`](https://docs.dask.org/en/stable/install.html) and [`dask_cuda`](https://docs.rapids.ai/api/dask-cuda/nightly/install.html) packages in your local environment.
+
+## Cluster setup
+
+### Instantiate a LocalCUDACluster object
+
+In this example, we create a cluster with two workers, each of which is responsible for executing tasks on a separate GPU.
+
+```python
+from dask_cuda import LocalCUDACluster
+
+cluster = LocalCUDACluster(n_workers=2)
+```
+
+### Connecting a Dask client
+
+The Dask scheduler coordinates the execution of tasks, while the Dask client is the user-facing interface that submits tasks to the scheduler and monitors their progress.
+
+```python
+from dask.distributed import Client
+
+client = Client(cluster)
+```
+
+## Test RAPIDS
+
+To test RAPIDS, create a distributed client for the cluster and query for the GPU model.
+
+```python
+from dask_cuda import LocalCUDACluster
+from dask.distributed import Client
+
+cluster = LocalCUDACluster()
+client = Client(cluster)
+
+
+def get_gpu_model():
+    import pynvml
+
+    pynvml.nvmlInit()
+    return pynvml.nvmlDeviceGetName(pynvml.nvmlDeviceGetHandleByIndex(0))
+
+
+client.submit(get_gpu_model).result()
+```
+
+## Clean up
+
+Be sure to shut down and clean up the LocalCUDACluster when you are done:
+
+```python
+client.close()
+cluster.close()
+```
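+
+## Selecting GPUs
+
+By default, `LocalCUDACluster` starts one worker per GPU on the machine. If you only want to use a subset of the GPUs, you can restrict the cluster with the `CUDA_VISIBLE_DEVICES` argument (a minimal sketch; the device indices below are illustrative):
+
+```python
+from dask_cuda import LocalCUDACluster
+
+# Use only the first two GPUs on the machine (indices are illustrative)
+cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1")
+```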