This lab is provided as part of AWS Innovate Data Edition; click here to explore the full list of hands-on labs.
ℹ️ You will run this lab in your own AWS account. Please follow directions at the end of the lab to remove resources to avoid future costs.
In this lab you will learn the basics of using Amazon Personalize to create a recommendation system. Be aware that the data import and training steps take a significant amount of time to complete.
Amazon Personalize is a service based on the same recommendation technology used at Amazon.com. It is designed for users who would like a managed recommendation engine but may not have the experience required to build their own.
Because importing data, training a model, and creating a campaign each take time, be prepared to wait for the service to finish each step.
In this lab, we will be creating and associating resources for Amazon Personalize in the Sydney region. You can easily use Amazon Personalize in other supported regions.
For pricing, please refer to the pricing page.
Estimated Time | Estimated Cost |
---|---|
1.5 hrs | Free (Free Trial) |
In order to understand how Amazon Personalize works, we need to refer to some terminology:
- **Dataset Group** – a domain-specific container for your datasets and recommendation resources
- **Dataset** – the data used to create solutions, which in turn generate recommendations
- **Schema** – each dataset you use in Personalize needs a schema, provided as a JSON string, defined before import
- **Solution** – a custom model trained on your datasets to provide recommendations
- **Campaign** – a deployed solution that allows an application to retrieve recommendations; analytics on a campaign's usage are also available
At a high level, the process is as follows:
- Import datasets and associate their appropriate schemas
- Train a model (solution) by selecting a recipe
- Launch a campaign to serve recommendations
In order to use Personalize, you need a CSV dataset for each of these types:
- Users
- Items
- User-Item Interactions
Ideally, you would use all three to achieve the best results (only the user-item interactions dataset is strictly required).
Please download all the files below:
Name | Schema | Data File |
---|---|---|
item.csv | items_schema.json | item.csv |
users.csv | users_schema.json | users.csv |
user-interactions.csv | user-interactions.json | user-interactions.csv |
In this lab, we will be creating resources in the Sydney Region; you can easily substitute another supported region if you are operating elsewhere.
Start by creating an S3 bucket in the Sydney Region using the web console. Leave the public access settings at their defaults. Remember the name of the bucket, as we will use it in the next couple of steps.
Change the bucket policy to allow the Amazon Personalize service access to S3 files: Permissions > Bucket Policy
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<s3_bucket>"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<s3_bucket>/*"
        }
    ]
}
```
We do this because the Amazon Personalize service needs access to our S3 bucket, and it is best practice to assign least-privilege permissions.
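If you prefer to script this step, the bucket policy above could also be applied with the Python SDK (boto3). This is a minimal sketch; the bucket name and the `apply_policy` helper are hypothetical, and the policy document mirrors the JSON shown above:

```python
import json

def bucket_policy(bucket: str) -> dict:
    """Build the bucket policy that grants Amazon Personalize read access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }

def apply_policy(bucket: str) -> None:
    """Attach the policy to the bucket (requires AWS credentials)."""
    import boto3
    s3 = boto3.client("s3")
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy(bucket)))
```

Calling `apply_policy("your-bucket-name")` replaces the Permissions > Bucket Policy console step.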
Once you have created the S3 Bucket and assigned the permissions, you may upload the data files into the bucket. We will reference the S3 Path shortly in a later step.
We now need to create an IAM policy for Amazon Personalize to use.
You'll need two policies attached to the role we create shortly: the AWS managed policy AmazonPersonalizeFullAccess, and this inline policy.
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<s3_bucket>"
            ]
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::<s3_bucket>/*"
            ]
        }
    ]
}
```
Remember the name you give the policy; in this example I will use AmazonPersonalize-ExecutionPolicy.
On the web console, navigate to IAM and click on Roles > Create Role.
Give the Role a name, such as AmazonPersonalize-ExecutionRole.
Attach the policy you have just created, and also attach the managed IAM policy - AmazonPersonalizeFullAccess. Attached IAM Policies should look like the following - your policy name may be different:
On the Role Summary view, click on the Trust Relationships tab and edit the trust relationship to look like the following:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```
The trust relationship tab should look like the following:
We perform this step so that the Amazon Personalize service can assume the role and act on other AWS services on our behalf. If we skip this step, we will encounter an IAM permission error.
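The console steps above could also be scripted with boto3. This sketch uses the role and policy names from this lab's examples; the bucket name is a placeholder, and the inline policy mirrors the JSON shown earlier:

```python
import json

# Trust policy allowing the Personalize service to assume the role
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

def create_execution_role(bucket: str,
                          role_name: str = "AmazonPersonalize-ExecutionRole") -> None:
    """Create the role, attach the managed policy, and add the inline S3 policy."""
    import boto3
    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    # AWS managed policy for Personalize
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess",
    )
    # Inline least-privilege access to the lab bucket
    inline = {
        "Version": "2012-10-17",
        "Statement": [
            {"Action": ["s3:ListBucket"], "Effect": "Allow",
             "Resource": [f"arn:aws:s3:::{bucket}"]},
            {"Action": ["s3:GetObject", "s3:PutObject"], "Effect": "Allow",
             "Resource": [f"arn:aws:s3:::{bucket}/*"]},
        ],
    }
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="AmazonPersonalize-ExecutionPolicy",
        PolicyDocument=json.dumps(inline),
    )
```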
What we need to do now is import 3 sets of data:
- Items
- Users
- User-Interactions
Because each import takes some time to finish, you don't need to wait for one import to complete before starting another; instead, aim to run all three imports in parallel.
Start by navigating to the Amazon Personalize using the web console, and click on View Dataset Groups.
Then click on Create Dataset Group
Provide a memorable name for your dataset group; it will contain the items, users, and user-interactions datasets.
You will now have to input your dataset name, the schema name, and the schema JSON. Make sure the schema name is relevant to the dataset you are about to upload.
Start by inputting a memorable dataset name, and click on Create new Schema
Then input the schema JSON string from the downloaded file(s) and hit next.
Be sure to upload the correct schema with the correct dataset type!
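For reference, a Personalize schema is an Avro-style JSON document. The sketch below shows what a typical user-item interactions schema looks like; it is illustrative only, so use the schema files you downloaded earlier in this lab:

```json
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    { "name": "USER_ID", "type": "string" },
    { "name": "ITEM_ID", "type": "string" },
    { "name": "TIMESTAMP", "type": "long" }
  ],
  "version": "1.0"
}
```

The field names must match the column headers in the corresponding CSV file.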
Fill out the import job name, and if you haven't created an IAM service role, select the Create a new role option.
Then fill in the S3 location, noting the required URL format: s3://<bucket_name>/<file_name>.
The fastest way to copy the correct format for a file is to use the copy path option when a S3 item is selected.
Once you have successfully imported all three data types, you may move onto creating a solution.
As of 04-FEB-2021, some older algorithms have been deprecated in favour of newer models, so the latest available list of Amazon Personalize algorithms is as follows:
Algorithm | Explanation |
---|---|
aws-sims | Computes items similar to a given item based on co-occurrence of items in the user-item interactions dataset |
aws-personalized-ranking | Reranks an input list of items for a given user. Trains on user-item interactions dataset, item metadata and user metadata |
aws-user-personalization | Predicts items a user will interact with and performs exploration on cold items. Based on Hierarchical Recurrent Neural Networks, which model the temporal order of user-item interactions |
aws-popularity-count | Calculates popularity of items based on total number of events for each item in the user-item interactions dataset. |
aws-hrnn (legacy) | Predicts items a user will interact with. A Hierarchical Recurrent Neural Network which models the temporal order of user-item interactions. |
aws-hrnn-coldstart (legacy) | Predicts items a user will interact with. HRNN - metadata with personalized exploration of new items. |
aws-hrnn-metadata (legacy) | Predicts items a user will interact with. HRNN with additional features derived from contextual metadata (user-item interactions metadata), user metadata (user dataset) and item metadata (item dataset). |
Click on the Create solution button on the personalize dashboard.
Specify a solution name and select a recipe. Let's use aws-user-personalization for this example. You can leave the optional fields blank.
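The same step could be done with boto3. In this sketch the solution name and dataset group ARN are placeholders, while the recipe ARN is the standard ARN form for the aws-user-personalization recipe:

```python
# Parameters for the solution; substitute your own dataset group ARN
SOLUTION_PARAMS = {
    "name": "lab-user-personalization",  # hypothetical name
    "datasetGroupArn": "arn:aws:personalize:ap-southeast-2:123456789012:dataset-group/lab",  # placeholder
    "recipeArn": "arn:aws:personalize:::recipe/aws-user-personalization",
}

def create_solution_and_version() -> dict:
    """Create the solution, then a solution version (which triggers training)."""
    import boto3
    personalize = boto3.client("personalize")
    solution = personalize.create_solution(**SOLUTION_PARAMS)
    # Training only happens when a solution *version* is created
    return personalize.create_solution_version(solutionArn=solution["solutionArn"])
```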
Note
If you are seeing an IAM role permission error, check that:
- the Amazon Personalize service is allowed access to the bucket (S3 > Permissions > Bucket Policy)
- the IAM role has policies granting access to the S3 bucket (IAM Role > Policy)
You'll need to wait for the solution to finish, and then you can proceed to the next step.
When Amazon Personalize has finished training the solution, you need to create a campaign to interact with the trained solution.
6.1 Click on the "Create a new Campaign" button under Launch campaigns
6.2 Fill in the campaign details and Create the campaign
6.3 Wait for the campaign to finish creating, and then you can begin to retrieve recommendations.
Once the campaign has been created, you can quickly test a recommendation either by using the SDK or the web console.
For simplicity we will be using the web console.
Web Console:
Start by selecting your campaign which you have created:
Then you can put a user id into the field, and click Get Recommendations.
The examples below show two different user ids returning two different results.
The Personalize Score is a value which may be used to apply additional business logic on recommendations.
Please refer to this recommendation score blog for more information.
To query a Personalize campaign using the CLI, you can use the command below:

```
aws personalize-runtime get-recommendations --campaign-arn <arn> --user-id <userid>
```
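The equivalent call with the Python SDK (boto3) might look like the following sketch; the campaign ARN and user id are placeholders:

```python
# Request parameters; substitute your own campaign ARN and user id
REQUEST = {
    "campaignArn": "arn:aws:personalize:ap-southeast-2:123456789012:campaign/lab-campaign",  # placeholder
    "userId": "42",
    "numResults": 10,
}

def get_recommendations() -> list:
    """Return (itemId, score) pairs for the configured user."""
    import boto3
    runtime = boto3.client("personalize-runtime")
    response = runtime.get_recommendations(**REQUEST)
    # Each entry in itemList carries an itemId and a Personalize score
    return [(item["itemId"], item.get("score")) for item in response["itemList"]]
```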
Please see the documentation here for running Personalize using Python, or another language like NodeJS.
You can stream events directly into Personalize by configuring event tracking and including the SDK in your application, and you can further improve your models by tuning hyperparameters with HPO.
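As a sketch of event streaming, an interaction event can be sent with the boto3 `put_events` call. The tracking id comes from creating an event tracker for your dataset group; all ids below are placeholders:

```python
import json
import time

def build_event(item_id: str, event_type: str = "click") -> dict:
    """One entry for the eventList parameter of put_events."""
    return {
        "eventType": event_type,
        # Event properties are passed as a JSON-encoded string
        "properties": json.dumps({"itemId": item_id}),
        "sentAt": int(time.time()),
    }

def record_event(tracking_id: str, user_id: str, session_id: str, item_id: str) -> None:
    """Stream a single interaction event to Amazon Personalize."""
    import boto3
    events = boto3.client("personalize-events")
    events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[build_event(item_id)],
    )
```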
With hyperparameter tuning, the general approach is to create two models, one with and one without hyperparameter tuning, in order to compare the models and their results. As data and usage patterns drift over time, it is generally recommended that you don't retrain with hyperparameter tuning on every retraining; instead, run hyperparameter tuning at designated intervals, for example every 6-12 months.
The usage patterns of your data may also change depending on your business usecase, so please reach out to your AWS Account team for additional support.
ML Ops is rapidly gaining traction and is outside the scope of this lab, but this link showcases how you may use AWS Step Functions to construct an automation pipeline. The benefit of using Step Functions is that it provides a scalable state machine for orchestrating and automating business processes, and it supports business logic with try/catch, error handling, and rollback capabilities.
Amazon Personalize AI/ML Ops - Automation
So you have trained your first Amazon Personalize model, and you want to apply business rules to your recommendations on the fly, without additional cost. You can use dynamic filters without needing to define every permutation of your business rules in advance. In an eCommerce example, these rules could involve brand names, shipping speeds, or ratings. For video use cases, they could involve directors, actors, or even premium subscription status.
Applying business rules to Amazon Personalize by using Dynamic Filtering
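As a sketch of how a dynamic filter is defined and used, the snippet below assumes the items dataset has a GENRE column; the filter name and all ARNs are placeholders:

```python
import json

# $GENRE is a runtime placeholder, supplied per request via filterValues
FILTER_EXPRESSION = 'INCLUDE ItemID WHERE Items.GENRE IN ($GENRE)'

def create_genre_filter(dataset_group_arn: str) -> dict:
    """Create a reusable dynamic filter on the dataset group."""
    import boto3
    personalize = boto3.client("personalize")
    return personalize.create_filter(
        name="genre-filter",  # hypothetical name
        datasetGroupArn=dataset_group_arn,
        filterExpression=FILTER_EXPRESSION,
    )

def recommend_genre(campaign_arn: str, filter_arn: str,
                    user_id: str, genre: str) -> dict:
    """Get recommendations restricted to one genre at request time."""
    import boto3
    runtime = boto3.client("personalize-runtime")
    # Filter values are passed as JSON-encoded strings
    return runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        filterArn=filter_arn,
        filterValues={"GENRE": json.dumps(genre)},
    )
```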
With your application, you might have a business workflow where context matters, such as device type, location, time of day, or other information that you provide. Users may interact with your application differently from a phone vs. a computer, or even on rainy vs. sunny days. Leveraging this contextual information lets you provide an even higher level of personalization for your users, and can reduce the cold-start phase for new users.
- Amazon Personalize Product Page
- Amazon Personalize Developer Guide
- Amazon Personalize Scores Blog
- Tuning Hyperparameters & HPO
- Recording Events
- SDK
- Customers
If you have any feedback, concerns or would like to have a chat, please send me an email.
Steven Tseng ([email protected])
Solutions Architect - Digital Natives MEL/SYD