diff --git a/sagemaker-triton/fil_ensemble/.gitignore b/sagemaker-triton/fil_ensemble/.gitignore
new file mode 100644
index 0000000000..66dfc26e87
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/.gitignore
@@ -0,0 +1,3 @@
+**/*.ipynb_checkpoints/
+preprocessing_env.tar.gz
+**/*__pycache__/
diff --git a/sagemaker-triton/fil_ensemble/1_prep_rapids_train_xgb.ipynb b/sagemaker-triton/fil_ensemble/1_prep_rapids_train_xgb.ipynb
new file mode 100644
index 0000000000..fb39bca30d
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/1_prep_rapids_train_xgb.ipynb
@@ -0,0 +1,715 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d8c854cb",
+ "metadata": {},
+ "source": [
+ "# Data Preprocessing using RAPIDS and Training XGBoost for Fraud Detection"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1c585524",
+ "metadata": {},
+ "source": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "In this notebook we will walk through using [RAPIDS](https://rapids.ai/about.html) for GPU-accelerated data preprocessing and training of XGBoost model for a Fraud Detection use-case. This is the first notebook in a two notebook series. In the [second notebook](2_triton_xgb_fil_ensemble.ipynb) we will show how to deploy the trained XGBoost model in Triton on SageMaker. The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a15d9b9e",
+ "metadata": {},
+ "source": [
+ "**Note:** Since the primary goal of this example is to get a trained XGBoost model to illustrate deployment of Tree-based ML models on Triton in SageMaker we don't perform any in-depth feature engineering or hyperparameter optimization. Although RAPIDS on SageMaker is excellent for [running cost-effective HPO in minimal amount of time](https://aws.amazon.com/blogs/machine-learning/rapids-and-amazon-sagemaker-scale-up-and-scale-out-to-tackle-ml-challenges/) to get to the best accuracy model configuration. \n",
+ "\n",
+ "## To Run This Notebook Please Select RAPIDS 2106 Kernel from the Kernel Dropdown menu"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bdf4884f",
+ "metadata": {},
+ "source": [
+ "This notebook was tested with the `rapids-2106` kernel on an Amazon SageMaker notebook instance of type `g4dn`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0f87fb40",
+ "metadata": {},
+ "source": [
+ "## Get Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b379efd7",
+ "metadata": {},
+ "source": [
+ "For this example, we use the Tabformer [synthetic credit card transactions dataset](https://arxiv.org/abs/1910.03033) from IBM available on [Kaggle](https://www.kaggle.com/datasets/ealtman2019/credit-card-transactions). The origin of this dataset along with its licensing terms can be found at: [Kaggle link](https://www.kaggle.com/datasets/ealtman2019/credit-card-transactions).\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d82b7ea5",
+ "metadata": {},
+ "source": [
+ "### Download Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4588fb76",
+ "metadata": {},
+ "source": [
+ "First we download the dataset from our Amazon S3 bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4f4f19b0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!python -m pip install --upgrade pip --quiet\n",
+ "!pip install -U awscli --quiet"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "58a250bc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/synthetic_credit_card_transactions/credit_card_transactions-ibm_v2.csv ./"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f00abe8b",
+ "metadata": {},
+ "source": [
+ "## Check on our GPU"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "96e7edd8",
+ "metadata": {},
+ "source": [
+ "Next, let's check the GPU resources we have by using the terminal command `nvidia-smi`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7c3aeafc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!nvidia-smi\n",
+ "!nvidia-smi -L"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f5e8810",
+ "metadata": {},
+ "source": [
+ "Awesome, we have powerful NVIDIA GPU at our disposal. Let's get started with using it for Data Preprocessing."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0fb68f21",
+ "metadata": {},
+ "source": [
+ "## Data Preprocessing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9894de9e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import cudf\n",
+ "import cuml\n",
+ "import numpy as np\n",
+ "import pickle\n",
+ "import os"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef59ddd2",
+ "metadata": {},
+ "source": [
+ "We read in the data and begin our data preprocessing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7adb6815",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data_path = \"./\"\n",
+ "data_csv = \"credit_card_transactions-ibm_v2.csv\"\n",
+ "full_data = cudf.read_csv(os.path.join(data_path, data_csv))\n",
+ "full_data.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4ffea2c4",
+ "metadata": {},
+ "source": [
+ "Each row here is a credit card transaction with attributes like time and amount of transaction along with merchant attributes like Name, City, State, Zipcode and Merchant Category Code (MCC) and finally whether the transaction was fraudulent or legitimate (`Is Fraud?`). \n",
+ "\n",
+ "**Note:** `Merchant Name` is hashed so that's why we see integers instead of strings.\n",
+ "\n",
+ "The full dataset has about 24 million rows but in this example we use random subset of about ~5 million transactions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aaa12ce5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "SEED = 42\n",
+ "data = full_data.sample(frac=0.2, random_state=SEED)\n",
+ "data = data.reset_index(drop=True)\n",
+ "print(data.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aff03966",
+ "metadata": {},
+ "source": [
+ "We convert some categorical features to dtype objects."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "937f07a6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data[\"Zip\"] = data[\"Zip\"].astype(\"object\")\n",
+ "data[\"MCC\"] = data[\"MCC\"].astype(\"object\")\n",
+ "data[\"Merchant Name\"] = data[\"Merchant Name\"].astype(\"object\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e60bbfcd",
+ "metadata": {},
+ "source": [
+ "### Encode labels\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1a5b1c65",
+ "metadata": {},
+ "source": [
+ "Next we perform encoding on our binary labels `Is Fraud?` which indicate whether a transaction is fraudulent or not. After encoding, `1` will denote fraud and `0` will denote legitimate transaction."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "89dd9c48",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y = data[\"Is Fraud?\"]\n",
+ "data.drop(columns=[\"Is Fraud?\"], inplace=True)\n",
+ "y = (y == \"Yes\").astype(int)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91889ae3",
+ "metadata": {},
+ "source": [
+ "### Save subset for inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8fdac51b",
+ "metadata": {},
+ "source": [
+ "We will also save a small subset of the data to submit Triton inference requests for later on in the [second notebook](2_triton_xgb_fil_ensemble.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "588dcba3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data_infer = data.iloc[625:630]\n",
+ "data_infer.to_csv(\"data_infer.csv\", index=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "89151f4b",
+ "metadata": {},
+ "source": [
+ "### Handle Missing Values"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d57baac4",
+ "metadata": {},
+ "source": [
+ "Next let's handle the missing values in our data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "626f87dc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data.isna().sum() / len(data) * 100"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "66b2518f",
+ "metadata": {},
+ "source": [
+ "We have some missing values in `Merchant State` and `Zip` columns. Turns out these correspond to ONLINE transactions so we will set those missing values to `ONLINE`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "48e9d1b1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data.loc[data[\"Merchant City\"] == \"ONLINE\", \"Merchant State\"] = \"ONLINE\"\n",
+ "data.loc[data[\"Merchant City\"] == \"ONLINE\", \"Zip\"] = \"ONLINE\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aa13e333",
+ "metadata": {},
+ "source": [
+ "We also have some foreign transactions where `Merchant City` and `Merchant State` is a foreign city and country and the Zipcode is missing. For those transactions we will set the Zipcode to `FOREIGN`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "926ac124",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "us_states_plus_online = [\n",
+ " \"AK\",\n",
+ " \"AL\",\n",
+ " \"AR\",\n",
+ " \"AZ\",\n",
+ " \"CA\",\n",
+ " \"CO\",\n",
+ " \"CT\",\n",
+ " \"DC\",\n",
+ " \"DE\",\n",
+ " \"FL\",\n",
+ " \"GA\",\n",
+ " \"HI\",\n",
+ " \"IA\",\n",
+ " \"ID\",\n",
+ " \"IL\",\n",
+ " \"IN\",\n",
+ " \"KS\",\n",
+ " \"KY\",\n",
+ " \"LA\",\n",
+ " \"MA\",\n",
+ " \"MD\",\n",
+ " \"ME\",\n",
+ " \"MI\",\n",
+ " \"MN\",\n",
+ " \"MO\",\n",
+ " \"MS\",\n",
+ " \"MT\",\n",
+ " \"NC\",\n",
+ " \"ND\",\n",
+ " \"NE\",\n",
+ " \"NH\",\n",
+ " \"NJ\",\n",
+ " \"NM\",\n",
+ " \"NV\",\n",
+ " \"NY\",\n",
+ " \"OH\",\n",
+ " \"OK\",\n",
+ " \"OR\",\n",
+ " \"PA\",\n",
+ " \"RI\",\n",
+ " \"SC\",\n",
+ " \"SD\",\n",
+ " \"TN\",\n",
+ " \"TX\",\n",
+ " \"UT\",\n",
+ " \"VA\",\n",
+ " \"VT\",\n",
+ " \"WA\",\n",
+ " \"WI\",\n",
+ " \"WV\",\n",
+ " \"WY\",\n",
+ " \"ONLINE\",\n",
+ "]\n",
+ "\n",
+ "# set zip of all transactions that are not in US States or Online to Foreign\n",
+ "data.loc[~data[\"Merchant State\"].isin(us_states_plus_online), \"Zip\"] = \"FOREIGN\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9ff624ef",
+ "metadata": {},
+ "source": [
+ "The `Errors?` column indicates whether or not the transaction had any errors like an Incorrect Pin associated with it. We make this a boolean indicator feature."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "068800f0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data[\"Errors?\"] = data[\"Errors?\"].notna()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0ae0b332",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data.isna().sum() / len(data) * 100"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5162efe1",
+ "metadata": {},
+ "source": [
+ "So now we have handled all the missing values in our data."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e80a3fb2",
+ "metadata": {},
+ "source": [
+ "### Handle Amount and Time"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9f8ea1d0",
+ "metadata": {},
+ "source": [
+ "Next, for the `Amount` column we remove the dollar symbol prefix and for `Time` column we extract out the Hour and Minute."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8350762d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data[\"Amount\"] = data[\"Amount\"].str.slice(1)\n",
+ "data[\"Hour\"] = data[\"Time\"].str.slice(stop=2)\n",
+ "data[\"Minute\"] = data[\"Time\"].str.slice(start=3)\n",
+ "data.drop(columns=[\"Time\"], inplace=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2934011",
+ "metadata": {},
+ "source": [
+ "### Train-Test Split"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "25acf723",
+ "metadata": {},
+ "source": [
+ "Before doing any further preprocessing let's perform the train-test split. Here we use 70-30 train-test split."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1aa71566",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from cuml.model_selection import train_test_split\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(\n",
+ " data, y, test_size=0.3, random_state=SEED, stratify=y\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5c957d39",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Free up some room on the GPU by explicitly deleting dataframes\n",
+ "import gc\n",
+ "\n",
+ "del data\n",
+ "del y\n",
+ "gc.collect()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8dfa111b",
+ "metadata": {},
+ "source": [
+ "### Encoding Categorical Columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "08c0cb62",
+ "metadata": {},
+ "source": [
+ "Next, we handle categorical columns in our dataset by performing [label encoding](https://docs.rapids.ai/api/cuml/stable/api.html?highlight=label%20encoder#feature-and-label-encoding-single-gpu) on them which convert categorical values into numerical values. For some of these columns we have some unseen values which are present in test data but not train data. We handle those values by setting them to `UNKNOWN` before doing the label encoding so that at test time we have an encoding for these unseen values.\n",
+ "\n",
+ "We also serialize the encodings for all categorical columns so that we can later use them for doing data preprocessing at inference time in the [second notebook](2_triton_xgb_fil_ensemble.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "27bcaf30",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from cuml.preprocessing import LabelEncoder\n",
+ "\n",
+ "categorial_columns = [\"Zip\", \"MCC\", \"Merchant Name\", \"Use Chip\", \"Merchant City\", \"Merchant State\"]\n",
+ "encoders = {}\n",
+ "\n",
+ "# handle unknown values present in test data but not in training data\n",
+ "for col in categorial_columns:\n",
+ " # convert cudf series to numpy array with .values_host\n",
+ " unique_values = X_train[col].unique().values_host\n",
+ " X_test.loc[~X_test[col].isin(unique_values), col] = \"UNKNOWN\"\n",
+ " unique_values = np.append(unique_values, [\"UNKNOWN\"])\n",
+ " # convert numpy array to cudf series\n",
+ " unique_values = cudf.Series(unique_values)\n",
+ " le = LabelEncoder().fit(unique_values)\n",
+ " X_train[col] = le.transform(X_train[col])\n",
+ " X_test[col] = le.transform(X_test[col])\n",
+ " encoders[col] = le.classes_.values_host\n",
+ "\n",
+ "with open(\"label_encoders.pkl\", \"wb\") as f:\n",
+ " pickle.dump(encoders, f)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "97eb4855",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# convert all dtypes to fp32 for xgboost training\n",
+ "X_train = X_train.astype(\"float32\")\n",
+ "X_test = X_test.astype(\"float32\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28e78093",
+ "metadata": {},
+ "source": [
+ "Let's look at our preprocessed data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "46c42639",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fbf7bb71",
+ "metadata": {},
+ "source": [
+ "## Train XGBoost"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8df4e196",
+ "metadata": {},
+ "source": [
+ "Now we train the XGBoost fraud detection model on our GPU. This will take about 2-3 minutes on `g4dn.xlarge` instance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e6be5d5e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import xgboost as xgb\n",
+ "import time\n",
+ "\n",
+ "dtrain = xgb.DMatrix(X_train, y_train)\n",
+ "\n",
+ "dtest = xgb.DMatrix(X_test, y_test)\n",
+ "\n",
+ "max_depth = 8\n",
+ "num_trees = 2000\n",
+ "xgb_params = {\n",
+ " \"max_depth\": max_depth,\n",
+ " \"tree_method\": \"gpu_hist\",\n",
+ " \"objective\": \"binary:logistic\",\n",
+ " \"eval_metric\": \"aucpr\",\n",
+ " \"predictor\": \"gpu_predictor\",\n",
+ "}\n",
+ "model = xgb.train(params=xgb_params, dtrain=dtrain, num_boost_round=num_trees)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e26dbd34",
+ "metadata": {},
+ "source": [
+ "We quickly evaluate our trained model's predictions on the test set using F1-score."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0121077c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.metrics import f1_score\n",
+ "\n",
+ "y_score = model.predict(dtest)\n",
+ "threshold = 0.5\n",
+ "y_pred = (y_score >= 0.5).astype(int)\n",
+ "y_true = y_test.values_host\n",
+ "f1 = f1_score(y_true, y_pred)\n",
+ "print(f\"Test F1-Score: {f1: 0.4f}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2672cf3a",
+ "metadata": {},
+ "source": [
+ "We can do further Hyperparameter tuning/Feature Engineering to improve the model accuracy but since the primary goal of this example is to walkthrough deployment of decision tree-based ML models like XGBoost on Triton in SageMaker we save our trained model and move on to the [second notebook](2_triton_xgb_fil_ensemble.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0962df8",
+ "metadata": {},
+ "source": [
+ "### Save Trained Model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c5f2f476",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_path = \"./xgboost.json\"\n",
+ "model.save_model(model_path)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3236c5c1",
+ "metadata": {},
+ "source": [
+ "## Next Step"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fdb0297e",
+ "metadata": {},
+ "source": [
+ "Please open the [second notebook](2_triton_xgb_fil_ensemble.ipynb) to learn how to deploy this XGBoost model and other similar decision tree-based ML models on Triton in SageMaker."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3.9.13 64-bit",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.13"
+ },
+ "vscode": {
+ "interpreter": {
+ "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/sagemaker-triton/fil_ensemble/2_triton_xgb_fil_ensemble.ipynb b/sagemaker-triton/fil_ensemble/2_triton_xgb_fil_ensemble.ipynb
new file mode 100644
index 0000000000..aab439bd60
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/2_triton_xgb_fil_ensemble.ipynb
@@ -0,0 +1,908 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7e386477",
+ "metadata": {},
+ "source": [
+ "# Pre-processing + XGBoost model inference pipeline with NVIDIA Triton Inference Server on Amazon SageMaker"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9ea6713c",
+ "metadata": {},
+ "source": [
+ "This is the second notebook included in a two part series. The [first notebook](1_prep_rapids_train_xgb.ipynb) creates artifacts which are dependencies for this notebook. \n",
+ "\n",
+ "With the 22.05 version release of [NVIDIA Triton](https://github.com/triton-inference-server/server/) container image on SageMaker you can now use Triton's Forest Inference Library (FIL) backend to easily serve tree based ML models like XGBoost for high-performance CPU and GPU inference in SageMaker. Using Triton's FIL backend allows you to benefit from performance optimizations like dynamic batching and concurrent execution which help maximize the utilization of GPU and CPU, further lowering the cost of inference. The multi-framework support provided by NVIDIA Triton allows you to seamlessly deploy tree-based ML models alongside deep learning models for fast, unified inference pipelines.\n",
+ "\n",
+ "Machine Learning applications are complex and can often require data pre-processing. In this notebook, we will not only deep dive into how to deploy a tree-based ML model like XGBoost using the FIL Backend in Triton on SageMaker endpoint but also cover how to implement python-based data pre-processing inference pipeline for your model using the ensemble feature in Triton. This will allow us to send in the raw data from client side and have both data pre-processing and model inference happen in Triton SageMaker endpoint for the optimal inference performance.\n",
+ "\n",
+ "## To Run This Notebook Please Select conda_python3 Kernel from the Kernel Dropdown menu\n",
+ "\n",
+ "**Note:** This notebook was tested with the `conda_python3` kernel on an Amazon SageMaker notebook instance of type `g4dn`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8f9fc77d",
+ "metadata": {},
+ "source": [
+ "## Forest Inference Library (FIL)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f9686e34",
+ "metadata": {},
+ "source": [
+ "RAPIDS Forest Inference Library (FIL) is a library to provide high-performance inference for tree-based models. Here are some important FIL features:\n",
+ "\n",
+ "* Supports XGBoost, LightGBM, cuML RandomForest, and Scikit Learn Random Forest\n",
+ "* No conversion needed for XGBoost and LightGBM. SKLearn or cuML pickle models need to be converted to Treelite's binary checkpoint format \n",
+ "* SKLearn Random Forest is supported for single-output regression and multi-class classification\n",
+ "* Both CPU and GPU are supported\n",
+ "\n",
+ "Below we show benchmark highlighting FIL's throughput performance against CPU XGBoost.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7df7b53e",
+ "metadata": {},
+ "source": [
+ "## Triton FIL Backend\n",
+ "FIL is available as a backend in Triton with features to allow for serving XGBoost, LightGBM and RandomForest models both on CPU and GPU with high performance. Here are some important features of the FIL Backend:\n",
+ "\n",
+ "* **Shapley Value Support (GPU)**: GPU Shapley Values are supported for Model Explainability\n",
+ "* **Categorical Feature Support**: Models trained on categorical features fully supported.\n",
+ "* **CPU Optimizations**: Optimized CPU mode offers faster execution than native XGBoost.\n",
+ "\n",
+ "To learn more about FIL Backend's features please see the [FAQ Notebook](https://github.com/triton-inference-server/fil_backend/blob/fea-faq_nb/notebooks/faq/FAQs.ipynb) and [Triton FIL Backend GitHub.](https://github.com/triton-inference-server/fil_backend/tree/main)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3907aeb3",
+ "metadata": {},
+ "source": [
+ "## Triton Model Ensemble Feature\n",
+ "Triton Inference Server greatly simplifies the deployment of AI models at scale in production. Triton Server comes with a convenient solution that simplifies building pre-processing and post-processing pipelines. Triton Server platform provides the ensemble scheduler, which is responsible for pipelining models participating in the inference process while ensuring efficiency and optimizing throughput. Using ensemble models can avoid the overhead of transferring intermediate tensors and minimize the number of requests that must be sent to Triton.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ade5d7a",
+ "metadata": {},
+ "source": [
+ "In this notebook we will be show how to use the ensemble feature for building a pipeline of data preprocessing with XGBoost model inference and you can extrapolate from it to add custom postprocessing to the pipeline."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3d32d93c",
+ "metadata": {},
+ "source": [
+ "## Set up Environment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "da60ff17",
+ "metadata": {},
+ "source": [
+ "We begin by setting up the required environment. We will install the dependencies required to package our model pipeline and run inferences using Triton server. Also define the IAM role that will give SageMaker access to the model artifacts and the NVIDIA Triton ECR image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "36a83ed3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install nvidia-pyindex\n",
+ "!pip install tritonclient[http]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "81049583",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import boto3\n",
+ "import json\n",
+ "import sagemaker\n",
+ "import time\n",
+ "import os\n",
+ "from sagemaker import get_execution_role\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "\n",
+ "sess = boto3.Session()\n",
+ "sm = sess.client(\"sagemaker\")\n",
+ "sagemaker_session = sagemaker.Session(boto_session=sess)\n",
+ "role = get_execution_role()\n",
+ "client = boto3.client(\"sagemaker-runtime\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e96f98b1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "account_id_map = {\n",
+ " \"us-east-1\": \"785573368785\",\n",
+ " \"us-east-2\": \"007439368137\",\n",
+ " \"us-west-1\": \"710691900526\",\n",
+ " \"us-west-2\": \"301217895009\",\n",
+ " \"eu-west-1\": \"802834080501\",\n",
+ " \"eu-west-2\": \"205493899709\",\n",
+ " \"eu-west-3\": \"254080097072\",\n",
+ " \"eu-north-1\": \"601324751636\",\n",
+ " \"eu-south-1\": \"966458181534\",\n",
+ " \"eu-central-1\": \"746233611703\",\n",
+ " \"ap-east-1\": \"110948597952\",\n",
+ " \"ap-south-1\": \"763008648453\",\n",
+ " \"ap-northeast-1\": \"941853720454\",\n",
+ " \"ap-northeast-2\": \"151534178276\",\n",
+ " \"ap-southeast-1\": \"324986816169\",\n",
+ " \"ap-southeast-2\": \"355873309152\",\n",
+ " \"cn-northwest-1\": \"474822919863\",\n",
+ " \"cn-north-1\": \"472730292857\",\n",
+ " \"sa-east-1\": \"756306329178\",\n",
+ " \"ca-central-1\": \"464438896020\",\n",
+ " \"me-south-1\": \"836785723513\",\n",
+ " \"af-south-1\": \"774647643957\",\n",
+ "}\n",
+ "\n",
+ "region = boto3.Session().region_name\n",
+ "if region not in account_id_map.keys():\n",
+ " raise (\"UNSUPPORTED REGION\")\n",
+ "\n",
+ "base = \"amazonaws.com.cn\" if region.startswith(\"cn-\") else \"amazonaws.com\"\n",
+ "triton_image_uri = \"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:22.05-py3\".format(\n",
+ " account_id=account_id_map[region], region=region, base=base\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e796fbd9",
+ "metadata": {},
+ "source": [
+ "## Set up pre-processing with Triton Python Backend"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b30cc9c",
+ "metadata": {},
+ "source": [
+ "We will be using Triton's [Python Backend](https://github.com/triton-inference-server/python_backend) to perform the same tabular data preprocessing that we did in [first notebook](1_prep_rapids_train_xgb.ipynb) but now during inference time for raw data requests coming into the server. The Python backend enables pre-process, post-processing and any other custom logic to be implemented in Python and served with Triton."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9f95bf6b",
+ "metadata": {},
+ "source": [
+ "Using Triton on SageMaker requires us to first set up a model repository folder containing the models we want to serve. We have already set up model for python data preprocessing called `preprocessing` in the `model_repository`.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f40198e8",
+ "metadata": {},
+ "source": [
+ "Now Triton has specific requirements for model repository layout. Within the top-level model repository directory each model has its own sub-directory containing the information for the corresponding model. Each model directory in Triton must have at least one numeric sub-directory representing a version of the model. Here that is `1` representing version 1 of our python preprocessing model. Each model is executed by a specific backend so within each version sub-directory there must be the model artifact required by that backend. Here, we are using the Python backend and it requires the python file you are serving to be called `model.py` and the file needs to implement [certain functions](https://github.com/triton-inference-server/python_backend#usage). If we were using a PyTorch backend a `model.pt` file would be required and so on. For more details on naming conventions for model files please see the [model files doc](https://github.com/triton-inference-server/server/blob/185253ce225a0b012e73cade5c9a948ef9e75abd/docs/model_repository.md#model-files).\n",
+ "\n",
+ "\n",
+ "[Our model.py](model_repository/preprocessing/1/model.py) python file we are using here implements all the tabular data preprocessing logic to convert raw data into features that can be fed into our XGBoost model.\n",
+ "\n",
+ "Every Triton model must also provide a `config.pbtxt` file describing the model configuration. To learn more about the config settings please see [model configuration](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md) doc. [Our config.pbtxt](model_repository/preprocessing/config.pbtxt) specifies the backend as `python` and specifies all the input columns for raw data along with preprocessed output that consists of 15 features. We also specify we want to run this python preprocessing model on the CPU."
+ ]
+ },
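+ {
+ "cell_type": "markdown",
+ "id": "b7f3a2c9",
+ "metadata": {},
+ "source": [
+ "To make the required interface concrete, below is a minimal, hypothetical skeleton of a Python backend `model.py`. It is not the actual [model.py](model_repository/preprocessing/1/model.py) used in this example (which implements the full preprocessing logic), and the `INPUT0`/`OUTPUT0` tensor names are placeholders:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import triton_python_backend_utils as pb_utils\n",
+ "\n",
+ "\n",
+ "class TritonPythonModel:\n",
+ "    def initialize(self, args):\n",
+ "        # called once when the model is loaded; load artifacts (e.g. encoders) here\n",
+ "        pass\n",
+ "\n",
+ "    def execute(self, requests):\n",
+ "        # called for each batch of inference requests Triton schedules\n",
+ "        responses = []\n",
+ "        for request in requests:\n",
+ "            in_tensor = pb_utils.get_input_tensor_by_name(request, \"INPUT0\")\n",
+ "            batch = in_tensor.as_numpy()\n",
+ "            out_tensor = pb_utils.Tensor(\"OUTPUT0\", batch.astype(np.float32))\n",
+ "            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))\n",
+ "        return responses\n",
+ "\n",
+ "    def finalize(self):\n",
+ "        # optional cleanup when the model is unloaded\n",
+ "        pass\n",
+ "```"
+ ]
+ },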
+ {
+ "cell_type": "markdown",
+ "id": "1b42213e",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Create Conda Env for Preprocessing Dependencies"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3be5e211",
+ "metadata": {},
+ "source": [
+ "The Python backend in Triton requires us to use conda environment for any additional dependencies. In this case we are using the Python backend to do preprocessing of the raw data before feeding it into the XGBoost model being run in FIL Backend. Even though we originally used RAPIDS cuDF and cuML to do the data preprocessing here we use Pandas and Scikit-learn as preprocessing dependencies for inference time. We do this for three reasons. \n",
+ "* Firstly, to show how to create conda environment for your dependencies and how to package it in [format expected](https://github.com/triton-inference-server/python_backend#2-packaging-the-conda-environment) by Triton's Python backend. \n",
+ "* Secondly, by showing the preprocessing model running in Python backend on the CPU while the XGBoost runs on the GPU in FIL Backend we illustrate how each model in Triton's ensemble pipeline can run on different framework backend as well as different hardware configurations\n",
+ "* Thirdly, it highlights how the RAPIDS libraries (cuDF, cuML) are compatible with their CPU counterparts (Pandas, Scikit-learn). For example this way we get to show how LabelEncoders created in cuML can be used in Scikit-learn and vice-versa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eedfc6dc",
+ "metadata": {},
+ "source": [
+ "We follow the instructions [here](https://github.com/triton-inference-server/python_backend#2-packaging-the-conda-environment) for packaging preprocessing dependencies (here scikit-learn and pandas) to be used in the python backend as conda env tar file. The bash script [create_prep_env.sh](./create_prep_env.sh) creates the conda environment tar file and then we move it into the preprocessing model directory."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aefd7687",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!bash create_prep_env.sh\n",
+ "!cp preprocessing_env.tar.gz model_repository/preprocessing/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bc8232dc",
+ "metadata": {},
+ "source": [
+ "After creating the tar file from the conda environment and placing it in model folder, you need to tell Python backend to use that environment for your model. We do this by including the lines below in the model `config.pbtxt` file:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "606f2be2",
+ "metadata": {},
+ "source": [
+ "```\n",
+ "parameters: {\n",
+ " key: \"EXECUTION_ENV_PATH\",\n",
+ " value: {string_value: \"$$TRITON_MODEL_DIRECTORY/preprocessing_env.tar.gz\"}\n",
+ "}\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "338d5bcf",
+ "metadata": {},
+ "source": [
+ "Here, `$$TRITON_MODEL_DIRECTORY` helps provide environment path relative to the model folder in model repository and is resolved to `$pwd/model_repository/preprocessing`. Finally `preprocessing_env.tar.gz` is the name we gave to our conda env file. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7e429da7",
+ "metadata": {},
+ "source": [
+ "### Set up Label Encoders"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bc987681",
+ "metadata": {},
+ "source": [
+ "We also move the label encoders we had serialized eariler into `preprocessing` model folder so that we can use them to encode raw data categorical features at inference time."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "40c9c454",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!cp label_encoders.pkl model_repository/preprocessing/1/"
+ ]
+ },
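+ {
+ "cell_type": "markdown",
+ "id": "c4e8d7a1",
+ "metadata": {},
+ "source": [
+ "As a rough sketch of what the preprocessing model does with these encoders at inference time: the pickled dictionary maps each categorical column to its array of classes, and each raw value can be encoded as its index in that array. The `encode_column` helper below is hypothetical, for illustration only, and assumes each classes array is sorted, as `LabelEncoder` produces it:\n",
+ "\n",
+ "```python\n",
+ "import pickle\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "with open(\"label_encoders.pkl\", \"rb\") as f:\n",
+ "    encoders = pickle.load(f)  # dict: column name -> array of classes\n",
+ "\n",
+ "\n",
+ "def encode_column(values, classes):\n",
+ "    # map values unseen at training time to the reserved UNKNOWN category,\n",
+ "    # then encode each value as its index in the sorted classes array\n",
+ "    values = np.where(np.isin(values, classes), values, \"UNKNOWN\")\n",
+ "    return np.searchsorted(classes, values).astype(\"float32\")\n",
+ "\n",
+ "\n",
+ "# e.g. encode_column(raw_df[\"Merchant State\"].to_numpy(), encoders[\"Merchant State\"])\n",
+ "```"
+ ]
+ },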
+ {
+ "cell_type": "markdown",
+ "id": "07b7307a",
+ "metadata": {},
+ "source": [
+ "## Set up Tree-based ML Model for FIL Backend"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e86978d1",
+ "metadata": {},
+ "source": [
+ "Next, we set up the model directory for tree-based ML model like XGBoost which will be using FIL Backend.\n",
+ "\n",
+ "The expected layout for model directory is similar to the one we showed above:\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "facb89d4",
+ "metadata": {},
+ "source": [
+ "Here, `fil` is the name of the model. We can give it a different name like xgboost if we want to. `1` is the version sub-directory which contains the model artifact, in this case it's the `xgboost.json` model that we saved at the end of [first notebook](1_prep_rapids_train_xgb.ipynb). Let's create this expected layout."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "af0a4ac2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# move saved xgboost model into fil model directory\n",
+ "!mkdir -p model_repository/fil/1\n",
+ "!cp xgboost.json model_repository/fil/1/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3413e6eb",
+ "metadata": {},
+ "source": [
+ "And then finally we need to have configuration file `config.pbtxt` describing the model configuration for tree-based ML model so that FIL Backend in Triton can understand how to serve it."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "754f8c5d",
+ "metadata": {},
+ "source": [
+ "### Create Config File for FIL Backend Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e4c957cd",
+ "metadata": {},
+ "source": [
+ "You can read about all generic Triton configuration options [here](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md) and about configuration options specific to the FIL backend [here](https://github.com/triton-inference-server/fil_backend#configuration), but we will focus on just a few of the most common and relevant options in this example. Below are general descriptions of these options:\n",
+ "\n",
+ "* **max_batch_size:** The maximum batch size that can be passed to this model. In general, the only limit on the size of batches passed to a FIL backend is the memory available with which to process them. \n",
+ "* **input:** Options in this section tell Triton the number of features to expect for each input sample.\n",
+ "* **output:** Options in this section tell Triton how many output values there will be for each sample. If the \"predict_proba\" option (described further on) is set to true, then a probability value will be returned for each class. Otherwise, a single value will be returned indicating the class predicted for the given sample.\n",
+ "* **instance_group:** This determines how many instances of this model will be created and whether they will use the GPU or CPU.\n",
+ "* **model_type:** A string indicating what format the model is in (\"xgboost_json\" in this example, but \"xgboost\", \"lightgbm\", and \"tl_checkpoint\" are valid formats as well).\n",
+ "* **predict_proba:** If set to true, probability values will be returned for each class rather than just a class prediction.\n",
+ "* **output_class:** True for classification models, false for regression models.\n",
+ "* **threshold:** A score threshold for determining classification. When output_class is set to true, this must be provided, although it will not be used if predict_proba is also set to true.\n",
+ "* **storage_type:** In general, using \"AUTO\" for this setting should meet most usecases. If \"AUTO\" storage is selected, FIL will load the model using either a sparse or dense representation based on the approximate size of the model. In some cases, you may want to explicitly set this to \"SPARSE\" in order to reduce the memory footprint of large models.\n",
+ "\n",
+ "Here we have 15 input features and 2 classes (FRAUD, NOT FRAUD) that we are doing classification for in our XGBoost Model. Based on this information, let's set up FIL Backend configuration file for our tree-based model for serving on GPU."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "af7e054c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "USE_GPU = True\n",
+ "FIL_MODEL_DIR = \"./model_repository/fil\"\n",
+ "\n",
+ "# Maximum size in bytes for input and output arrays. If you are\n",
+ "# using Triton 21.11 or higher, all memory allocations will make\n",
+ "# use of Triton's memory pool, which has a default size of\n",
+ "# 67_108_864 bytes\n",
+ "MAX_MEMORY_BYTES = 60_000_000\n",
+ "NUM_FEATURES = 15\n",
+ "NUM_CLASSES = 2\n",
+ "bytes_per_sample = (NUM_FEATURES + NUM_CLASSES) * 4\n",
+ "max_batch_size = MAX_MEMORY_BYTES // bytes_per_sample\n",
+ "\n",
+ "IS_CLASSIFIER = True\n",
+ "model_format = \"xgboost_json\"\n",
+ "\n",
+ "# Select deployment hardware (GPU or CPU)\n",
+ "if USE_GPU:\n",
+ " instance_kind = \"KIND_GPU\"\n",
+ "else:\n",
+ " instance_kind = \"KIND_CPU\"\n",
+ "\n",
+ "# whether the model is doing classification or regression\n",
+ "if IS_CLASSIFIER:\n",
+ " classifier_string = \"true\"\n",
+ "else:\n",
+ " classifier_string = \"false\"\n",
+ "\n",
+ "# whether to predict probabilites or not\n",
+ "predict_proba = False\n",
+ "\n",
+ "if predict_proba:\n",
+ " predict_proba_string = \"true\"\n",
+ "else:\n",
+ " predict_proba_string = \"false\"\n",
+ "\n",
+ "config_text = f\"\"\"backend: \"fil\"\n",
+ "max_batch_size: {max_batch_size}\n",
+ "input [ \n",
+ " {{ \n",
+ " name: \"input__0\"\n",
+ " data_type: TYPE_FP32\n",
+ " dims: [ {NUM_FEATURES} ] \n",
+ " }} \n",
+ "]\n",
+ "output [\n",
+ " {{\n",
+ " name: \"output__0\"\n",
+ " data_type: TYPE_FP32\n",
+ " dims: [ 1 ]\n",
+ " }}\n",
+ "]\n",
+ "instance_group [{{ kind: {instance_kind} }}]\n",
+ "parameters [\n",
+ " {{\n",
+ " key: \"model_type\"\n",
+ " value: {{ string_value: \"{model_format}\" }}\n",
+ " }},\n",
+ " {{\n",
+ " key: \"predict_proba\"\n",
+ " value: {{ string_value: \"{predict_proba_string}\" }}\n",
+ " }},\n",
+ " {{\n",
+ " key: \"output_class\"\n",
+ " value: {{ string_value: \"{classifier_string}\" }}\n",
+ " }},\n",
+ " {{\n",
+ " key: \"threshold\"\n",
+ " value: {{ string_value: \"0.5\" }}\n",
+ " }},\n",
+ " {{\n",
+ " key: \"storage_type\"\n",
+ " value: {{ string_value: \"AUTO\" }}\n",
+ " }}\n",
+ "]\n",
+ "\n",
+ "dynamic_batching {{}}\"\"\"\n",
+ "\n",
+ "config_path = os.path.join(FIL_MODEL_DIR, \"config.pbtxt\")\n",
+ "with open(config_path, \"w\") as file_:\n",
+ " file_.write(config_text)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9beceae2",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## Set up Inference Pipeline of Data Preprocessing Python Backend and FIL Backend using Ensemble"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1855b520",
+ "metadata": {},
+ "source": [
+ "Now we are ready to set up the inference pipeline for data preprocessing and tree-based model inference using an [ensemble model](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models). An ensemble model represents a pipeline of one or more models and the connection of input and output tensors between those models. Here we use the ensemble model to build a pipeline of Data Preprocessing in Python backend followed by XGBoost in FIL Backend. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "34c04454",
+ "metadata": {},
+ "source": [
+ "The expected layout for `ensemble` model directory is similar to the ones we showed above:\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6274e1ed",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# create model version directory for ensemble model\n",
+ "!mkdir -p model_repository/ensemble/1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5061b0c4",
+ "metadata": {},
+ "source": [
+ "We created the ensemble model's [config.pbtxt](model_repository/ensemble/config.pbtxt) following the guidance on [ensemble doc](https://github.com/triton-inference-server/server/blob/main/docs/architecture.md#ensemble-models). Importantly, we need to set up the ensemble scheduler in config.pbtxt which specifies the dataflow between models within the ensemble. The ensemble scheduler collects the output tensors in each step, provides them as input tensors for other steps according to the specification."
+ ]
+ },
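+ {
+ "cell_type": "markdown",
+ "id": "d9a5b3e7",
+ "metadata": {},
+ "source": [
+ "For illustration, here is the beginning of the `ensemble_scheduling` block from our [config.pbtxt](model_repository/ensemble/config.pbtxt) (abridged): each `input_map` entry routes one ensemble input tensor into the preprocessing step, and the remaining mappings, along with the FIL step that consumes the preprocessed features, follow the same pattern:\n",
+ "\n",
+ "```\n",
+ "ensemble_scheduling {\n",
+ "  step [\n",
+ "    {\n",
+ "      model_name: \"preprocessing\"\n",
+ "      model_version: 1\n",
+ "      input_map {\n",
+ "        key: \"User\"\n",
+ "        value: \"User\"\n",
+ "      }\n",
+ "      ...\n",
+ "    },\n",
+ "    ...\n",
+ "  ]\n",
+ "}\n",
+ "```"
+ ]
+ },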
+ {
+ "cell_type": "markdown",
+ "id": "125319fc",
+ "metadata": {},
+ "source": [
+ "## Package model repository and upload to S3"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d770f25d",
+ "metadata": {},
+ "source": [
+ "Finally, we end up with the following model repository directory structure, containing a Python preprocessing model and its dependencies along with XGBoost FIL model, and the model ensemble.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c286c251",
+ "metadata": {},
+ "source": [
+ "We will package this up as `model.tar.gz` for uploading it to S3."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b3b29ea0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!tar --exclude='.ipynb_checkpoints' -czvf model.tar.gz -C model_repository ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7eacc699",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_uri = sagemaker_session.upload_data(path=\"model.tar.gz\", key_prefix=\"triton-fil-ensemble\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "71c2ff3c",
+ "metadata": {},
+ "source": [
+ "## Create SageMaker Endpoint"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "294a69cb",
+ "metadata": {},
+ "source": [
+ "We start off by creating a SageMaker model from the model repository we uploaded to S3 in the previous step.\n",
+ "\n",
+ "In this step we also provide an additional Environment Variable `SAGEMAKER_TRITON_DEFAULT_MODEL_NAME` which specifies the name of the model to be loaded by Triton. **The value of this key should match the folder name in the model package uploaded to S3.** This variable is optional in case of a single model. In case of ensemble models, this **key has to be specified** for Triton to startup in SageMaker.\n",
+ "\n",
+ "Additionally, customers can set `SAGEMAKER_TRITON_BUFFER_MANAGER_THREAD_COUNT` and `SAGEMAKER_TRITON_THREAD_COUNT` for optimizing the thread counts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb3c04ff",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sm_model_name = \"triton-fil-ensemble-\" + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
+ "\n",
+ "container = {\n",
+ " \"Image\": triton_image_uri,\n",
+ " \"ModelDataUrl\": model_uri,\n",
+ " \"Environment\": {\"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME\": \"ensemble\"},\n",
+ "}\n",
+ "\n",
+ "create_model_response = sm.create_model(\n",
+ " ModelName=sm_model_name, ExecutionRoleArn=role, PrimaryContainer=container\n",
+ ")\n",
+ "\n",
+ "print(\"Model Arn: \" + create_model_response[\"ModelArn\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ee9ac31",
+ "metadata": {},
+ "source": [
+ "Using the model above, we create an endpoint configuration where we can specify the type and number of instances we want in the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0be165b2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint_config_name = \"triton-fil-ensemble-\" + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
+ "\n",
+ "create_endpoint_config_response = sm.create_endpoint_config(\n",
+ " EndpointConfigName=endpoint_config_name,\n",
+ " ProductionVariants=[\n",
+ " {\n",
+ " \"InstanceType\": \"ml.g4dn.4xlarge\",\n",
+ " \"InitialVariantWeight\": 1,\n",
+ " \"InitialInstanceCount\": 1,\n",
+ " \"ModelName\": sm_model_name,\n",
+ " \"VariantName\": \"AllTraffic\",\n",
+ " }\n",
+ " ],\n",
+ ")\n",
+ "\n",
+ "print(\"Endpoint Config Arn: \" + create_endpoint_config_response[\"EndpointConfigArn\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5e132d08",
+ "metadata": {},
+ "source": [
+ "Using the above endpoint configuration we create a new SageMaker endpoint and wait for the deployment to finish. The status will change to InService once the deployment is successful."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c8e48705",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "endpoint_name = \"triton-fil-ensemble-\" + time.strftime(\"%Y-%m-%d-%H-%M-%S\", time.gmtime())\n",
+ "\n",
+ "create_endpoint_response = sm.create_endpoint(\n",
+ " EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
+ ")\n",
+ "\n",
+ "print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "635fd26f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "resp = sm.describe_endpoint(EndpointName=endpoint_name)\n",
+ "status = resp[\"EndpointStatus\"]\n",
+ "print(\"Status: \" + status)\n",
+ "\n",
+ "while status == \"Creating\":\n",
+ " time.sleep(60)\n",
+ " resp = sm.describe_endpoint(EndpointName=endpoint_name)\n",
+ " status = resp[\"EndpointStatus\"]\n",
+ " print(\"Status: \" + status)\n",
+ "\n",
+ "print(\"Arn: \" + resp[\"EndpointArn\"])\n",
+ "print(\"Status: \" + status)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2d9c2761",
+ "metadata": {},
+ "source": [
+ "## Run Inference"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9c4be74a",
+ "metadata": {},
+ "source": [
+ "Once we have the endpoint running we can use some sample raw data to do an inference using json as the payload format. For the inference request format, Triton uses the KFServing community standard [inference protocols.](https://github.com/triton-inference-server/server/blob/main/docs/protocol/README.md)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "96fb73bf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data_infer = pd.read_csv(\"data_infer.csv\")\n",
+ "data_infer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0749cedb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "STR_COLUMNS = [\n",
+ " \"Time\",\n",
+ " \"Amount\",\n",
+ " \"Zip\",\n",
+ " \"MCC\",\n",
+ " \"Merchant Name\",\n",
+ " \"Use Chip\",\n",
+ " \"Merchant City\",\n",
+ " \"Merchant State\",\n",
+ " \"Errors?\",\n",
+ "]\n",
+ "\n",
+ "batch_size = len(data_infer)\n",
+ "\n",
+ "payload = {}\n",
+ "payload[\"inputs\"] = []\n",
+ "data_dict = {}\n",
+ "for col_name in data_infer.columns:\n",
+ " data_dict[col_name] = {}\n",
+ " data_dict[col_name][\"name\"] = col_name\n",
+ " if col_name in STR_COLUMNS:\n",
+ " data_dict[col_name][\"data\"] = data_infer[col_name].astype(str).tolist()\n",
+ " data_dict[col_name][\"datatype\"] = \"BYTES\"\n",
+ " else:\n",
+ " data_dict[col_name][\"data\"] = data_infer[col_name].astype(\"float32\").tolist()\n",
+ " data_dict[col_name][\"datatype\"] = \"FP32\"\n",
+ " data_dict[col_name][\"shape\"] = [batch_size, 1]\n",
+ " payload[\"inputs\"].append(data_dict[col_name])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "94ca03cf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "response = client.invoke_endpoint(\n",
+ " EndpointName=endpoint_name, ContentType=\"application/octet-stream\", Body=json.dumps(payload)\n",
+ ")\n",
+ "\n",
+ "response_body = json.loads(response[\"Body\"].read().decode(\"utf8\"))\n",
+ "predictions = response_body[\"outputs\"][0][\"data\"]\n",
+ "\n",
+ "CLASS_LABELS = [\"NOT FRAUD\", \"FRAUD\"]\n",
+ "predictions = [CLASS_LABELS[int(idx)] for idx in predictions]\n",
+ "print(predictions)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a40cfcae",
+ "metadata": {},
+ "source": [
+ "### Binary + Json Payload"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f69eaf44",
+ "metadata": {},
+ "source": [
+ "We can also use binary+json as the payload format to get better performance for the inference call. The specification of this format is provided [here](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_binary_data.md).\n",
+ "\n",
+ "**Note:** With the `binary+json` format, we have to specify the length of the request metadata in the header to allow Triton to correctly parse the binary payload. This is done using a custom Content-Type header `application/vnd.sagemaker-triton.binary+json;json-header-size={}`.\n",
+ "\n",
+ "Please note, this is different from using `Inference-Header-Content-Length` header on a stand-alone Triton server since custom headers are not allowed in SageMaker.\n",
+ "\n",
+ "The [tritonclient](https://github.com/triton-inference-server/client) package provides utility methods to generate the payload without having to know the details of the specification. We'll use the following methods to convert our inference request into a binary format which provides lower latencies for inference."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f38ef326",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import tritonclient.http as httpclient\n",
+ "\n",
+ "\n",
+ "def get_sample_data_binary(data, output_name):\n",
+ " inputs = []\n",
+ " outputs = []\n",
+ " batch_size = len(data)\n",
+ " for col_name in data.columns:\n",
+ " if col_name in STR_COLUMNS:\n",
+ " np_data = np.expand_dims(data[col_name], axis=1).astype(\"object\")\n",
+ " infer_input = httpclient.InferInput(col_name, [batch_size, 1], \"BYTES\")\n",
+ " else:\n",
+ " np_data = np.expand_dims(data[col_name], axis=1).astype(\"float32\")\n",
+ " infer_input = httpclient.InferInput(col_name, [batch_size, 1], \"FP32\")\n",
+ " infer_input.set_data_from_numpy(np_data, binary_data=True)\n",
+ " inputs.append(infer_input)\n",
+ " outputs.append(httpclient.InferRequestedOutput(output_name, binary_data=True))\n",
+ " request_body, header_length = httpclient.InferenceServerClient.generate_request_body(\n",
+ " inputs, outputs=outputs\n",
+ " )\n",
+ " return request_body, header_length"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1547f43c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "output_name = \"predictions\"\n",
+ "request_body, header_length = get_sample_data_binary(data_infer, output_name)\n",
+ "\n",
+ "response = client.invoke_endpoint(\n",
+ " EndpointName=endpoint_name,\n",
+ " ContentType=\"application/vnd.sagemaker-triton.binary+json;json-header-size={}\".format(\n",
+ " header_length\n",
+ " ),\n",
+ " Body=request_body,\n",
+ ")\n",
+ "\n",
+ "# Parse json header size length from the response\n",
+ "header_length_prefix = \"application/vnd.sagemaker-triton.binary+json;json-header-size=\"\n",
+ "header_length_str = response[\"ContentType\"][len(header_length_prefix) :]\n",
+ "\n",
+ "# Read response body\n",
+ "result = httpclient.InferenceServerClient.parse_response_body(\n",
+ " response[\"Body\"].read(), header_length=int(header_length_str)\n",
+ ")\n",
+ "predictions = result.as_numpy(output_name)\n",
+ "CLASS_LABELS = [\"NOT FRAUD\", \"FRAUD\"]\n",
+ "predictions = [CLASS_LABELS[int(idx)] for idx in predictions]\n",
+ "print(predictions)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7154ecdd",
+ "metadata": {},
+ "source": [
+ "## Terminate endpoint and clean up artifacts"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "03f60fd4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sm.delete_endpoint(EndpointName=endpoint_name)\n",
+ "sm.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
+ "sm.delete_model(ModelName=sm_model_name)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "conda_python3",
+ "language": "python",
+ "name": "conda_python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/sagemaker-triton/fil_ensemble/README.md b/sagemaker-triton/fil_ensemble/README.md
new file mode 100644
index 0000000000..6a4d2217f8
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/README.md
@@ -0,0 +1,19 @@
+# XGBoost model inference pipeline with NVIDIA Triton Inference Server on Amazon SageMaker
+
+In this example we show an end-to-end GPU-accelerated fraud detection example making use of tree-based models like XGBoost. In the first notebook, [1_prep_rapids_train_xgb.ipynb](1_prep_rapids_train_xgb.ipynb), we demonstrate GPU-accelerated tabular data preprocessing using RAPIDS and training of an XGBoost model for fraud detection on the GPU in SageMaker. Then in the second notebook, [2_triton_xgb_fil_ensemble.ipynb](2_triton_xgb_fil_ensemble.ipynb), we walk through deploying the data preprocessing + XGBoost model inference pipeline for high-throughput, low-latency inference on Triton in SageMaker.
+
+## Steps to run the notebooks
+1. Launch a SageMaker **notebook instance** of type `g4dn.xlarge`.
+ - In **Additional Configuration** select `Create a new lifecycle configuration`. Specify `rapids-2106` as the name in the Configuration Setting and copy-paste the [on_start.sh](on_start.sh) script as the lifecycle configuration's start-notebook script. This creates the RAPIDS kernel for us to use inside the SageMaker notebook.
+ * If you are using AWS on a Windows machine: because of the incompatibility between Windows- and Unix-formatted text, especially in end-of-line characters, you will run into this [error](https://stackoverflow.com/questions/63361229/how-do-you-write-lifecycle-configurations-for-sagemaker-on-windows) if you copy-paste the [on_start.sh](on_start.sh) script. To prevent that, use Notepad++ (or another text editor) to change the end-of-line characters (CRLF to LF) in the [on_start.sh](on_start.sh) script:
+ 1. Click on Search > Replace (or Ctrl + H).
+ 2. Find what: `\r\n`.
+ 3. Replace with: `\n`.
+ 4. Search Mode: select Extended.
+ 5. Replace All, then copy-paste the result into the AWS Lifecycle Configuration Start Notebook UI.
+ - **IMPORTANT:** In Additional Configuration, for **Volume Size in GB** specify at least **50 GB**.
+ - For git repositories, select the option `Clone a public git repository to this notebook instance only` and specify the Git repository URL https://github.com/kshitizgupta21/fil_triton_sagemaker
+
+2. Once JupyterLab is ready, launch the [1_prep_rapids_train_xgb.ipynb](1_prep_rapids_train_xgb.ipynb) notebook with the `rapids-2106` conda kernel and run through it to perform GPU-accelerated data preprocessing and XGBoost training on the credit card transactions dataset for the fraud detection use case. **Make sure to use the `rapids-2106` kernel for this notebook.**
+
+3. Launch the [2_triton_xgb_fil_ensemble.ipynb](2_triton_xgb_fil_ensemble.ipynb) notebook using the `conda_python3` kernel (we don't use RAPIDS in this notebook). **Make sure to use the `conda_python3` kernel for this notebook.** Run through this notebook to learn how to deploy the ensemble data preprocessing + XGBoost model inference pipeline using Triton's Python and FIL backends on a SageMaker `g4dn.xlarge` endpoint.
diff --git a/sagemaker-triton/fil_ensemble/create_prep_env.sh b/sagemaker-triton/fil_ensemble/create_prep_env.sh
new file mode 100644
index 0000000000..e17353d49a
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/create_prep_env.sh
@@ -0,0 +1,9 @@
+#!/bin/bash
+
+# Create a fresh conda environment with the packages the Triton Python
+# backend preprocessing model needs at inference time
+conda create -y -n preprocessing_env python=3.8
+source ~/anaconda3/etc/profile.d/conda.sh
+conda activate preprocessing_env
+# Ignore user site-packages so the packed environment stays self-contained
+export PYTHONNOUSERSITE=True
+conda install -y -c conda-forge pandas scikit-learn
+pip install conda-pack
+# Pack the environment into preprocessing_env.tar.gz, which the preprocessing
+# model's config.pbtxt references via EXECUTION_ENV_PATH
+conda-pack
diff --git a/sagemaker-triton/fil_ensemble/images/ensemble_model.png b/sagemaker-triton/fil_ensemble/images/ensemble_model.png
new file mode 100644
index 0000000000..43788e10b5
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/ensemble_model.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/fil_benchmark.png b/sagemaker-triton/fil_ensemble/images/fil_benchmark.png
new file mode 100644
index 0000000000..60713e74e1
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/fil_benchmark.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/fil_model.png b/sagemaker-triton/fil_ensemble/images/fil_model.png
new file mode 100644
index 0000000000..db595b8250
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/fil_model.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/model_repo.png b/sagemaker-triton/fil_ensemble/images/model_repo.png
new file mode 100644
index 0000000000..00f98ff4fc
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/model_repo.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/preprocessing_model.png b/sagemaker-triton/fil_ensemble/images/preprocessing_model.png
new file mode 100644
index 0000000000..2b0946be36
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/preprocessing_model.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/rapids.png b/sagemaker-triton/fil_ensemble/images/rapids.png
new file mode 100644
index 0000000000..b91131b3c4
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/rapids.png differ
diff --git a/sagemaker-triton/fil_ensemble/images/triton-ensemble.png b/sagemaker-triton/fil_ensemble/images/triton-ensemble.png
new file mode 100644
index 0000000000..32f5c1a6aa
Binary files /dev/null and b/sagemaker-triton/fil_ensemble/images/triton-ensemble.png differ
diff --git a/sagemaker-triton/fil_ensemble/model_repository/ensemble/config.pbtxt b/sagemaker-triton/fil_ensemble/model_repository/ensemble/config.pbtxt
new file mode 100644
index 0000000000..397db68b6c
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/model_repository/ensemble/config.pbtxt
@@ -0,0 +1,162 @@
+name: "ensemble"
+platform: "ensemble"
+max_batch_size: 882352
+input [
+ {
+ name: "User"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Card"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Year"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Month"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Day"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Time"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Amount"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Use Chip"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant Name"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant City"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant State"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Zip"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "MCC"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Errors?"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ }
+]
+output [
+ {
+ name: "predictions"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ }
+]
+ensemble_scheduling {
+ step [
+ {
+ model_name: "preprocessing"
+ model_version: 1
+ input_map {
+ key: "User"
+ value: "User"
+ }
+ input_map {
+ key: "Card"
+ value: "Card"
+ }
+ input_map {
+ key: "Year"
+ value: "Year"
+ }
+ input_map {
+ key: "Month"
+ value: "Month"
+ }
+ input_map {
+ key: "Day"
+ value: "Day"
+ }
+ input_map {
+ key: "Time"
+ value: "Time"
+ }
+ input_map {
+ key: "Amount"
+ value: "Amount"
+ }
+ input_map {
+ key: "Use Chip"
+ value: "Use Chip"
+ }
+ input_map {
+ key: "Merchant Name"
+ value: "Merchant Name"
+ }
+ input_map {
+ key: "Merchant City"
+ value: "Merchant City"
+ }
+ input_map {
+ key: "Merchant State"
+ value: "Merchant State"
+ }
+ input_map {
+ key: "Zip"
+ value: "Zip"
+ }
+ input_map {
+ key: "MCC"
+ value: "MCC"
+ }
+ input_map {
+ key: "Errors?"
+ value: "Errors?"
+ }
+ output_map {
+ key: "OUTPUT"
+ value: "preprocessed_data"
+ }
+ },
+ {
+ model_name: "fil"
+ model_version: 1
+ input_map {
+ key: "input__0"
+ value: "preprocessed_data"
+ }
+ output_map {
+ key: "output__0"
+ value: "predictions"
+ }
+ }
+ ]
+}
\ No newline at end of file
diff --git a/sagemaker-triton/fil_ensemble/model_repository/fil/config.pbtxt b/sagemaker-triton/fil_ensemble/model_repository/fil/config.pbtxt
new file mode 100644
index 0000000000..c831b6b4c0
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/model_repository/fil/config.pbtxt
@@ -0,0 +1,41 @@
+backend: "fil"
+max_batch_size: 882352
+input [
+ {
+ name: "input__0"
+ data_type: TYPE_FP32
+ dims: [ 15 ]
+ }
+]
+output [
+ {
+ name: "output__0"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ }
+]
+instance_group [{ kind: KIND_GPU }]
+parameters [
+ {
+ key: "model_type"
+ value: { string_value: "xgboost_json" }
+ },
+ {
+ key: "predict_proba"
+ value: { string_value: "false" }
+ },
+ {
+ key: "output_class"
+ value: { string_value: "true" }
+ },
+ {
+ key: "threshold"
+ value: { string_value: "0.5" }
+ },
+ {
+ key: "storage_type"
+ value: { string_value: "AUTO" }
+ }
+]
+
+dynamic_batching {}
\ No newline at end of file
diff --git a/sagemaker-triton/fil_ensemble/model_repository/preprocessing/1/model.py b/sagemaker-triton/fil_ensemble/model_repository/preprocessing/1/model.py
new file mode 100644
index 0000000000..2b573394b6
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/model_repository/preprocessing/1/model.py
@@ -0,0 +1,209 @@
+import pandas as pd
+import os
+import sklearn
+import triton_python_backend_utils as pb_utils
+from sklearn.preprocessing import LabelEncoder
+import numpy as np
+import json
+import pickle
+from pathlib import Path
+
+COLUMNS = [
+ "User",
+ "Card",
+ "Year",
+ "Month",
+ "Day",
+ "Time",
+ "Amount",
+ "Use Chip",
+ "Merchant Name",
+ "Merchant City",
+ "Merchant State",
+ "Zip",
+ "MCC",
+ "Errors?",
+]
+
+STR_COLUMNS = [
+ "Time",
+ "Amount",
+ "Zip",
+ "MCC",
+ "Merchant Name",
+ "Use Chip",
+ "Merchant City",
+ "Merchant State",
+ "Errors?",
+]
+
+ENCODE_COLUMNS = ["Zip", "MCC", "Merchant Name", "Use Chip", "Merchant City", "Merchant State"]
+
+us_states_plus_online = [
+ "AK",
+ "AL",
+ "AR",
+ "AZ",
+ "CA",
+ "CO",
+ "CT",
+ "DC",
+ "DE",
+ "FL",
+ "GA",
+ "HI",
+ "IA",
+ "ID",
+ "IL",
+ "IN",
+ "KS",
+ "KY",
+ "LA",
+ "MA",
+ "MD",
+ "ME",
+ "MI",
+ "MN",
+ "MO",
+ "MS",
+ "MT",
+ "NC",
+ "ND",
+ "NE",
+ "NH",
+ "NJ",
+ "NM",
+ "NV",
+ "NY",
+ "OH",
+ "OK",
+ "OR",
+ "PA",
+ "RI",
+ "SC",
+ "SD",
+ "TN",
+ "TX",
+ "UT",
+ "VA",
+ "VT",
+ "WA",
+ "WI",
+ "WV",
+ "WY",
+ "ONLINE",
+]
+
+LABEL_ENCODERS_FILE = "label_encoders.pkl"
+
+
+class TritonPythonModel:
+ """Your Python model must use the same class name. Every Python model
+ that is created must have "TritonPythonModel" as the class name.
+ """
+
+ def initialize(self, args):
+ """`initialize` is called only once when the model is being loaded.
+ Implementing `initialize` function is optional. This function allows
+        the model to initialize any state associated with this model.
+ Parameters
+ ----------
+ args : dict
+ Both keys and values are strings. The dictionary keys and values are:
+ * model_config: A JSON string containing the model configuration
+ * model_instance_kind: A string containing model instance kind
+ * model_instance_device_id: A string containing model instance device ID
+ * model_repository: Model repository path
+ * model_version: Model version
+ * model_name: Model name
+ """
+ # Parse model config
+
+ self.model_config = json.loads(args["model_config"])
+
+ output_config = pb_utils.get_output_config_by_name(self.model_config, "OUTPUT")
+
+ # Convert Triton types to numpy types
+ self.output_dtype = pb_utils.triton_string_to_numpy(output_config["data_type"])
+
+ cur_folder = Path(__file__).parent
+ with open(str(cur_folder / LABEL_ENCODERS_FILE), "rb") as f:
+ self.encoders = pickle.load(f)
+
+ def execute(self, requests):
+ """`execute` must be implemented in every Python model. `execute`
+ function receives a list of pb_utils.InferenceRequest as the only
+ argument. This function is called when an inference is requested
+ for this model. Depending on the batching configuration (e.g. Dynamic
+ Batching) used, `requests` may contain multiple requests. Every
+ Python model, must create one pb_utils.InferenceResponse for every
+ pb_utils.InferenceRequest in `requests`. If there is an error, you can
+ set the error argument when creating a pb_utils.InferenceResponse.
+ Parameters
+ ----------
+ requests : list
+ A list of pb_utils.InferenceRequest
+ Returns
+ -------
+ list
+ A list of pb_utils.InferenceResponse. The length of this list must
+ be the same as `requests`
+ """
+
+ responses = []
+
+        # Every Python backend must iterate over every one of the requests
+ # and create a pb_utils.InferenceResponse for each of them.
+
+ for request in requests:
+ # Get input tensors
+ data_dict = {}
+ for col in COLUMNS:
+ data_dict[col] = (
+ pb_utils.get_input_tensor_by_name(request, col).as_numpy().squeeze(1)
+ )
+ if col in STR_COLUMNS:
+ data_dict[col] = data_dict[col].astype(str)
+            data = pd.DataFrame(data_dict)
+            # Online transactions carry no physical state/zip; tag them explicitly
+            data.loc[data["Merchant City"] == "ONLINE", "Merchant State"] = "ONLINE"
+            data.loc[data["Merchant City"] == "ONLINE", "Zip"] = "ONLINE"
+            # Collapse the error description to a binary flag (1.0 if any error)
+            data["Errors?"] = (data["Errors?"] != "nan").astype("float32")
+
+            # Non-US (and non-online) transactions get a single FOREIGN zip bucket
+            data.loc[~data["Merchant State"].isin(us_states_plus_online), "Zip"] = "FOREIGN"
+            # Strip the leading "$" from the amount string
+            data["Amount"] = data["Amount"].str.slice(1)
+            # Split the "HH:MM" time string into separate hour and minute features
+            data["Hour"] = data["Time"].str.slice(stop=2)
+            data["Minute"] = data["Time"].str.slice(start=3)
+            data.drop(columns=["Time"], inplace=True)
+
+            # Apply the label encoders fitted at training time to the categoricals
+            for col in ENCODE_COLUMNS:
+                le = LabelEncoder()
+                le.classes_ = self.encoders[col]
+                data[col] = le.transform(data[col])
+
+ # Create output tensors. You need pb_utils.Tensor
+ # objects to create pb_utils.InferenceResponse.
+
+            # The FIL backend expects FP32 input for the XGBoost model
+ data_np = data.values.astype(self.output_dtype)
+ data_tensor = pb_utils.Tensor("OUTPUT", data_np)
+
+ # Create InferenceResponse. You can set an error here in case
+ # there was a problem with handling this inference request.
+ # Below is an example of how you can set errors in inference
+ # response:
+ #
+ # pb_utils.InferenceResponse(
+            #    output_tensors=..., TritonError("An error occurred"))
+ inference_response = pb_utils.InferenceResponse(output_tensors=[data_tensor])
+ responses.append(inference_response)
+
+ # You should return a list of pb_utils.InferenceResponse. Length
+ # of this list must match the length of `requests` list.
+ return responses
+
+ def finalize(self):
+ """`finalize` is called only once when the model is being unloaded.
+ Implementing `finalize` function is optional. This function allows
+ the model to perform any necessary clean ups before exit.
+ """
+ print("Cleaning up...")
diff --git a/sagemaker-triton/fil_ensemble/model_repository/preprocessing/config.pbtxt b/sagemaker-triton/fil_ensemble/model_repository/preprocessing/config.pbtxt
new file mode 100644
index 0000000000..b81ad52965
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/model_repository/preprocessing/config.pbtxt
@@ -0,0 +1,94 @@
+name: "preprocessing"
+backend: "python"
+max_batch_size: 882352
+input [
+ {
+ name: "User"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Card"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Year"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Month"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Day"
+ data_type: TYPE_FP32
+ dims: [ 1 ]
+ },
+ {
+ name: "Time"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Amount"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Use Chip"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant Name"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant City"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Merchant State"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Zip"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "MCC"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ },
+ {
+ name: "Errors?"
+ data_type: TYPE_STRING
+ dims: [ 1 ]
+ }
+]
+output [
+ {
+ name: "OUTPUT"
+ data_type: TYPE_FP32
+ dims: [ 15 ]
+ }
+]
+
+instance_group [
+ {
+ count: 1
+ kind: KIND_CPU
+ }
+]
+parameters: {
+ key: "EXECUTION_ENV_PATH",
+ value: {string_value: "$$TRITON_MODEL_DIRECTORY/preprocessing_env.tar.gz"}
+}
diff --git a/sagemaker-triton/fil_ensemble/on_start.sh b/sagemaker-triton/fil_ensemble/on_start.sh
new file mode 100644
index 0000000000..26d35a3ac5
--- /dev/null
+++ b/sagemaker-triton/fil_ensemble/on_start.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+
+set -e
+
+sudo -u ec2-user -i <<'EOF'
+
+# Download the pre-built RAPIDS 21.06 conda-pack environment, unpack it, and
+# register it as a Jupyter kernel named rapids-2106
+mkdir -p rapids_kernel
+cd rapids_kernel
+wget -q https://rapidsai-data.s3.us-east-2.amazonaws.com/conda-pack/rapidsai/rapids21.06_cuda11.0_py3.8.tar.gz
+echo "wget completed"
+tar -xzf *.gz
+echo "unzip completed"
+source /home/ec2-user/rapids_kernel/bin/activate
+conda-unpack
+echo "unpack completed"
+python -m ipykernel install --user --name rapids-2106
+echo "kernel install completed"
+
+EOF