refactor text
atqy committed May 5, 2022
1 parent 41540dc commit 233c48f
Showing 1 changed file with 25 additions and 13 deletions.
38 changes: 25 additions & 13 deletions prep_data/tabular_data/train_featurize_train_tabular_data.ipynb
@@ -4,20 +4,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preprocessing Tabular Data\n",
"# Preprocessing Tabular Data\n",
"\n",
"The purpose of this notebook is to demonstrate how to preprocess tabular data for training a machine learning model via Amazon SageMaker. In this notebook we focus on preprocessing our tabular data. In a sequel notebook, [02_feature_selection_tabular_data.ipynb](02_feature_selection_tabular_data.ipynb) we use our preprocessed tabular data to select important features and prune unimportant ones out. In our final sequel notebook, [03_training_model_on_tabular_data.ipynb](03_training_model_on_tabular_data.ipynb) we use our selected features to train a machine learning model. We showcase how to preprocess 2 different tabular data sets. \n",
"In this notebook, we focus on preprocessing tabular data. Then, we use our preprocessed tabular data to select important features and prune unimportant ones out. Finally, we use our selected features to train a machine learning model. We showcase how to preprocess 2 different tabular data sets. \n",
"\n",
"## Contents\n",
"1. [Part 1: Download and Process the Dataset](#Part-1:-Download-and-Process-the-Dataset)\n",
"1. [Part 2: Feature Selection for Tabular Data](#Part-2:-Feature-Selection-for-Tabular-Data)\n",
"1. [Part 3: Training a Model on Tabular Data using Amazon SageMaker](#Part-3:-Training-a-Model-on-Tabular-Data-using-Amazon-SageMaker)\n",
"\n",
"#### Notes\n",
"In this notebook, we use the sklearn framework for data partitionining and `storemagic` to share dataframes in [02_feature_selection_tabular_data.ipynb](02_feature_selection_tabular_data.ipynb) and [03_training_model_on_tabular_data.ipynb](03_training_model_on_tabular_data.ipynb). While we load data into memory here we do note that is it possible to skip this and load your partitioned data directly to an S3 bucket.\n",
"## Dataset and Package Dependencies\n",
"\n",
"#### Tabular Data Sets\n",
"* [california house data](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html)\n",
"* [diabetes data ](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)\n",
"### Tabular Data Sets\n",
"* [California House Dataset](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html)\n",
"* [Diabetes Dataset](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)\n",
"\n",
"\n",
"#### Library Dependencies:\n",
"### Library Dependencies:\n",
"* sagemaker>=2.15.0\n",
"* numpy \n",
"* pandas\n",
@@ -31,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setting up the notebook"
"## Setting up the notebook"
]
},
{
@@ -96,6 +99,15 @@
"print(role)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Part 1: Download and Process the Dataset\n",
"\n",
"This section demonstrates how to preprocess tabular data for training a machine learning model via Amazon SageMaker"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -248,9 +260,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Selection for Tabular Data\n",
"## Part 2: Feature Selection for Tabular Data\n",
"\n",
"The purpose of this notebook is to demonstrate how to select important features and prune unimportant ones prior to training our machine learning model. This is an important step that yields better prediction performance. "
"This section demonstrates how to select important features and prune unimportant ones prior to training our machine learning model. This is an important step that yields better prediction performance. "
]
},
{
@@ -462,9 +474,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training a Model on Tabular Data using Amazon SageMaker\n",
"## Part 3: Training a Model on Tabular Data using Amazon SageMaker\n",
"\n",
"The purpose of this notebook is to demonstrate how to train a machine learning model via Amazon SageMaker using tabular data. In this notebook you can train either an XGBoost or Linear Learner (regression) model on tabular data in Amazon SageMaker. \n"
"This section demonstrates how to train a machine learning model via Amazon SageMaker using tabular data. You can train either an XGBoost or Linear Learner (regression) model on tabular data in Amazon SageMaker. \n"
]
},
{
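
The Notes cell touched by this diff mentions partitioning the data with sklearn before feature selection and training. The notebook's own partitioning code is not part of the diff, so the following is only a rough sketch of the idea, using the Python standard library as a stand-in for sklearn's splitting utilities. The function name `partition_rows`, the 70/15/15 fractions, the seed, and the toy rows are all assumptions made for illustration.

```python
import random


def partition_rows(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults,
    not values taken from the notebook.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test share")

    # Copy before shuffling so the caller's data is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains after train and val
    return train, val, test


# Toy tabular data: (feature, target) pairs standing in for a real dataset.
rows = [(i, i * 2.0) for i in range(100)]
train, val, test = partition_rows(rows)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three partitions identical across runs, which matters when later sections reuse the same split for feature selection and training.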