Add FeatureMetadata related APIs (aws#3486)
imingtsou authored Jul 20, 2022
1 parent 8167b4d commit 45a9c96
Showing 1 changed file with 100 additions and 41 deletions.
sagemaker-featurestore/feature_store_introduction.ipynb: 141 changes (100 additions and 41 deletions)
@@ -24,21 +24,21 @@
"This notebook uses both `boto3` and Python SDK libraries, and the `Python 3 (Data Science)` kernel. This notebook works with Studio, Jupyter, and JupyterLab. \n",
"\n",
"#### Library dependencies:\n",
"* sagemaker>=2.0.0\n",
"* numpy\n",
"* sagemaker>=2.100.0\n",
"* NumPy\n",
"* pandas\n",
"\n",
"#### Role requirements:\n",
"**IMPORTANT**: You must attach the following policies to your execution role:\n",
"* AmazonS3FullAccess\n",
"* AmazonSageMakerFeatureStoreAccess "
"* `AmazonS3FullAccess`\n",
"* `AmazonSageMakerFeatureStoreAccess`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Feature Store Policy](images/feature-store-policy.png)"
"![policy](images/feature-store-policy.png)"
]
},
{
@@ -54,12 +54,14 @@
"metadata": {},
"outputs": [],
"source": [
"# SageMaker Python SDK version 2.x is required\n",
"# SageMaker Python SDK version 2.100.0 is required\n",
"# boto3 version 1.24.20 is required\n",
"import sagemaker\n",
"import boto3\n",
"import sys\n",
"\n",
"original_version = sagemaker.__version__\n",
"%pip install 'sagemaker>=2.0.0'"
"!pip install 'sagemaker>=2.100.0'\n",
"!pip install 'boto3>=1.24.20'"
]
},
{
@@ -68,7 +70,6 @@
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import pandas as pd\n",
"import numpy as np\n",
"import io\n",
@@ -130,7 +131,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Feature Store Policy](images/feature_store_data_ingest.svg)"
"![data flow](images/feature_store_data_ingest.svg)"
]
},
{
@@ -194,7 +195,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Append EventTime feature to your data frame. This parameter is required, and time stamps each data point."
"Append an `EventTime` feature to your data frame. This feature is required and timestamps each data point."
]
},
{
@@ -290,15 +291,6 @@
").list_feature_groups() # We use the boto client to list FeatureGroups"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ingest data into a feature group\n",
"\n",
"After the FeatureGroups have been created, we can put data into the FeatureGroups by using the `PutRecord` API. It will take < 1min to ingest data both of these FeatureGroups."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -318,6 +310,83 @@
"check_feature_group_status(orders_feature_group)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add metadata to a feature\n",
"\n",
"We can add searchable metadata to the features of the FeatureGroup by using the `UpdateFeatureMetadata` API. The currently supported metadata fields are `description` and `parameters`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker.feature_store.inputs import FeatureParameter\n",
"\n",
"customers_feature_group.update_feature_metadata(\n",
" feature_name=\"customer_id\",\n",
" description=\"The ID of a customer. It is also used in orders_feature_group.\",\n",
" parameter_additions=[FeatureParameter(\"idType\", \"primaryKey\")],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm that the feature has been updated with the new metadata, we use `DescribeFeatureMetadata` to display that feature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"customers_feature_group.describe_feature_metadata(feature_name=\"customer_id\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Feature metadata fields are searchable. We use the `search` API with filters to find the specific feature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sagemaker_session.boto_session.client(\"sagemaker\", region_name=region).search(\n",
" Resource=\"FeatureMetadata\",\n",
" SearchExpression={\n",
" \"Filters\": [\n",
" {\n",
" \"Name\": \"FeatureGroupName\",\n",
" \"Operator\": \"Contains\",\n",
" \"Value\": \"customers-feature-group-\",\n",
" },\n",
" {\"Name\": \"Parameters.idType\", \"Operator\": \"Equals\", \"Value\": \"primaryKey\"},\n",
" ]\n",
" },\n",
") # We use the boto client to search"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ingest data into a feature group\n",
"\n",
"We can put data into the FeatureGroup by using the `PutRecord` API. It will take < 1 minute to ingest data."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -340,7 +409,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Using an arbirary customer record id, 573291 we use `get_record` to check that the data has been ingested into the feature group."
"Using an arbitrary customer record ID, 573291, we use `get_record` to check that the data has been ingested into the feature group."
]
},
{
@@ -370,7 +439,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We use `batch_get_record` to check that all data has been ingested into two feature groups by providing customer ids."
"We use `batch_get_record` to check that all data has been ingested into both feature groups by providing customer IDs."
]
},
{
@@ -422,19 +491,6 @@
"orders_feature_group.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%bash -s \"$original_version\"\n",
"\n",
"# preserve original sagemaker version\n",
"\n",
"pip install sagemaker==$1"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -446,7 +502,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook you learnt how to quickly get started with Feature Store and now know how to create feature groups, and ingest data into them.\n",
"In this notebook you learned how to quickly get started with Feature Store and now know how to create feature groups and ingest data into them.\n",
"\n",
"For an advanced example on how to use Feature Store for a Fraud Detection use-case, see [Fraud Detection with Feature Store](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-featurestore/sagemaker_featurestore_fraud_detection_python_sdk.html).\n",
"\n",
@@ -464,7 +520,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook we used a variety of different API calls. Most of them are accessible through the Python SDK, however some only exist within `boto3`. You can invoke the Python SDK API calls directly on your Feature Store objects, whereas to invoke API calls that exist within `boto3`, you must first access a boto client through your boto and sagemaker sessions: e.g.,`sagemaker_session.boto_session.client()`.\n",
"In this notebook we used a variety of different API calls. Most of them are accessible through the Python SDK; however, some only exist within `boto3`. You can invoke the Python SDK API calls directly on your Feature Store objects, whereas to invoke API calls that exist within `boto3`, you must first access a boto client through your boto and sagemaker sessions: e.g. `sagemaker_session.boto_session.client()`.\n",
"\n",
"Below we list API calls used in this notebook that exist within the Python SDK and ones that exist in `boto3` for your reference. \n",
"\n",
@@ -474,20 +530,23 @@
"* `delete()`\n",
"* `create()`\n",
"* `load_feature_definitions()`\n",
"* `update_feature_metadata()`\n",
"* `describe_feature_metadata()`\n",
"\n",
"#### Boto3 API Calls\n",
"* `list_feature_groups()`\n",
"* `get_record()`\n",
"* `batch_get_record()`\n"
"* `batch_get_record()`\n",
"* `search()`\n"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Environment (conda_anaconda3)",
"display_name": "Python 3",
"language": "python",
"name": "conda_anaconda3"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -499,7 +558,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
"version": "3.9.13"
}
},
"nbformat": 4,

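The commit raises the notebook's minimum versions to `sagemaker>=2.100.0` and `boto3>=1.24.20` alongside the new feature-metadata calls. Because the install happens inside the running notebook, it can help to confirm what is actually installed before running the metadata cells. A minimal sketch using only the standard library's `importlib.metadata`; the helper names (`parse_release`, `meets_minimum`, `check_requirements`) are hypothetical, and the version comparison is a simplification of PEP 440 that ignores pre-release and dev suffixes:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Minimum versions taken from the updated notebook's requirements.
MINIMUM_VERSIONS = {"sagemaker": "2.100.0", "boto3": "1.24.20"}


def parse_release(ver: str) -> tuple:
    """Return the leading numeric release parts, e.g. '2.100.0' -> (2, 100, 0).

    Simplification of PEP 440: parsing stops at the first non-numeric text,
    so suffixes such as 'rc1' or '.dev1' are ignored.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '20rc1': keep 20, then stop
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after zero-padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: Optional[dict] = None) -> dict:
    """Map each package to (installed version or None, requirement satisfied?)."""
    report = {}
    for package, minimum in (minimums or MINIMUM_VERSIONS).items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            report[package] = (None, False)
            continue
        report[package] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for package, (installed, ok) in check_requirements().items():
        status = "ok" if ok else "upgrade needed"
        print(f"{package}: {installed or 'not installed'} ({status})")
```

This reads installed package metadata from disk, so it may report an upgraded version before the kernel has been restarted; a restart is typically still needed before `import sagemaker` picks up the new release.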

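The new `describe_feature_metadata` cell displays the description and parameters of the `customer_id` feature. A minimal sketch for checking programmatically that the `idType` parameter was applied, assuming the response follows the `DescribeFeatureMetadata` API shape in which `Parameters` is a list of `{"Key": ..., "Value": ...}` dictionaries. `parameters_as_dict` and `has_parameter` are hypothetical helpers, and `sample` is a hand-written dictionary shaped like a response, not real output:

```python
def parameters_as_dict(feature_metadata: dict) -> dict:
    """Flatten the 'Parameters' list of a DescribeFeatureMetadata-style response.

    Assumption: 'Parameters' is a list of {'Key': str, 'Value': str} entries
    and may be absent when no parameters have been set.
    """
    return {p["Key"]: p["Value"] for p in feature_metadata.get("Parameters", [])}


def has_parameter(feature_metadata: dict, key: str, value: str) -> bool:
    """True if the feature carries the parameter `key` with exactly `value`."""
    return parameters_as_dict(feature_metadata).get(key) == value


# Hand-written sample shaped like a response (not real API output).
sample = {
    "FeatureName": "customer_id",
    "Description": "The ID of a customer. It is also used in orders_feature_group.",
    "Parameters": [{"Key": "idType", "Value": "primaryKey"}],
}
print(parameters_as_dict(sample))  # {'idType': 'primaryKey'}

# Inside the notebook (requires AWS access, so left commented out):
# metadata = customers_feature_group.describe_feature_metadata(feature_name="customer_id")
# has_parameter(metadata, "idType", "primaryKey")
```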
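The `search` cell combines a `Contains` filter on `FeatureGroupName` with an `Equals` filter on the custom `Parameters.idType` key. A minimal sketch that builds the same `SearchExpression` for arbitrary parameter keys and follows `NextToken` when results span several pages. `feature_metadata_search_expression` and `iter_feature_metadata` are hypothetical names, not part of `boto3` or the SageMaker Python SDK, and the result layout (`Results[*].FeatureMetadata`) is an assumption based on the `Search` API response:

```python
from typing import Iterator, Optional


def feature_metadata_search_expression(
    group_name_fragment: str, parameters: Optional[dict] = None
) -> dict:
    """Build a SearchExpression like the one in the notebook's search() cell.

    One 'Contains' filter on FeatureGroupName, plus one 'Equals' filter per
    custom parameter key. Filters in a SearchExpression are ANDed by default.
    """
    filters = [
        {"Name": "FeatureGroupName", "Operator": "Contains", "Value": group_name_fragment}
    ]
    for key, value in (parameters or {}).items():
        filters.append({"Name": f"Parameters.{key}", "Operator": "Equals", "Value": value})
    return {"Filters": filters}


def iter_feature_metadata(client, expression: dict) -> Iterator[dict]:
    """Yield matching FeatureMetadata records, following NextToken across pages.

    Assumption: `client` is a boto3 SageMaker client (or any object with a
    compatible search() method) and each result wraps its record under the
    'FeatureMetadata' key.
    """
    kwargs = {"Resource": "FeatureMetadata", "SearchExpression": expression}
    while True:
        response = client.search(**kwargs)
        for result in response.get("Results", []):
            yield result.get("FeatureMetadata", {})
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


# Inside the notebook (requires AWS access, so left commented out):
# expression = feature_metadata_search_expression(
#     "customers-feature-group-", {"idType": "primaryKey"}
# )
# client = sagemaker_session.boto_session.client("sagemaker", region_name=region)
# for record in iter_feature_metadata(client, expression):
#     print(record.get("FeatureGroupName"), record.get("FeatureName"))
```

Returning a generator keeps paging lazy: a caller that only needs the first match stops issuing `search` requests as soon as it stops iterating.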