forked from aws/amazon-sagemaker-examples
[FEATURE] Add SageMaker Pipeline local mode example with BYOC and FrameworkProcessor (aws#3614)

* added framework-processor-local-pipelines
* black-np on notebook
* updated README.md
* solving problems for commit id fc80e0d
* solved formatting problem in notebook
* reviewed notebook content, added dataset description, download dataset from public sagemaker s3 bucket
* grammar check
* changed dataset to synthetic transactions dataset
* removed reference to dataset origin
* updated to main branch
* fixing grammar spell

Co-authored-by: Aaron Markham <[email protected]>
Showing 10 changed files with 1,526 additions and 0 deletions.
**sagemaker-pipelines/tabular/local-mode/framework-processor-byoc/code/README.md** (44 additions)
## buildspec.sh

This script creates the source artifact (`sourcedir.tar.gz`) to use for SageMaker Training Jobs or SageMaker Processing Jobs.

Parameters:
* ALGORITHM_NAME: *Mandatory* - Name of the algorithm you want to package
* S3_BUCKET_NAME: *Optional* - S3 bucket to which the package is uploaded under the path s3://<S3_BUCKET_NAME>/artifact/<ALGORITHM_NAME>

```
./buildspec.sh <ALGORITHM_NAME> <S3_BUCKET_NAME>
```

### Example:

#### Processing

```
./buildspec.sh processing test-bucket
```

#### Training

```
./buildspec.sh training test-bucket
```
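Because the upload destination follows a fixed layout, downstream pipeline code can compute it deterministically. A minimal sketch of that layout (the helper name `artifact_s3_uri` is hypothetical, not part of this repository):

```python
def artifact_s3_uri(bucket: str, algorithm_name: str, name: str = "sourcedir") -> str:
    """Mirror the upload path used by buildspec.sh:
    s3://<S3_BUCKET_NAME>/artifact/<ALGORITHM_NAME>/sourcedir.tar.gz
    """
    return f"s3://{bucket}/artifact/{algorithm_name}/{name}.tar.gz"


# For example, `./buildspec.sh processing test-bucket` uploads to:
print(artifact_s3_uri("test-bucket", "processing"))
```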
## build_image.sh

This script builds a custom Docker image and pushes it to Amazon ECR.

Parameters:
* IMAGE_NAME: *Mandatory* - Name of the image you want to build
* REGISTRY_NAME: *Mandatory* - Name of the ECR repository you want to use for pushing the image
* IMAGE_TAG: *Mandatory* - Tag to apply to the ECR image
* DOCKER_FILE: *Mandatory* - Dockerfile to build
* PLATFORM: *Optional* - Target architecture for the image (e.g. linux/amd64)

```
./build_image.sh <IMAGE_NAME> <REGISTRY_NAME> <IMAGE_TAG> <DOCKER_FILE> <PLATFORM>
```

Example:

```
./build_image.sh training torch-1.12.1 latest Dockerfile linux/amd64
```
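build_image.sh composes the full ECR image URI from the account, region, repository, and tag; that URI is what is later passed to SageMaker as the container image. A pure-Python sketch of the same composition (the `ecr_image_uri` helper is hypothetical, for illustration only):

```python
def ecr_image_uri(account: str, region: str, registry_name: str, image_tag: str) -> str:
    """Mirror the `fullname` variable composed in build_image.sh."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{registry_name}:{image_tag}"


# Example with placeholder account/region values:
print(ecr_image_uri("123456789012", "us-east-1", "torch-1.12.1", "latest"))
```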
**sagemaker-pipelines/tabular/local-mode/framework-processor-byoc/code/build_image.sh** (48 additions)
```
#!/bin/sh

# The name of our algorithm
repo=$1
registry_name=$2
image_tag=$3
docker_file=$4
platforms=$5

echo "[INFO]: registry_name=${registry_name}"
echo "[INFO]: image_tag=${image_tag}"
echo "[INFO]: docker_file=${docker_file}"
echo "[INFO]: platforms=${platforms}"

cd $repo

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

echo "[INFO]: Region ${region}"

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${registry_name}:${image_tag}"

echo "[INFO]: Image name: ${fullname}"

# If the repository doesn't exist in ECR, create it.
if ! aws ecr describe-repositories --repository-names "${registry_name}" > /dev/null 2>&1
then
    aws ecr create-repository --repository-name "${registry_name}" > /dev/null
fi

# Get the login password from ECR and log in to the registry.
# If you are extending Amazon SageMaker images, you also need to log in to the SageMaker account.
password=$(aws ecr --region ${region} get-login-password)

docker login -u AWS -p ${password} "${account}.dkr.ecr.${region}.amazonaws.com"

if [ -z "${platforms}" ]
then
    docker build -t ${fullname} -f ${docker_file} .
else
    echo "Provided platform = ${platforms}"
    docker build -t ${fullname} -f ${docker_file} . --platform=${platforms}
fi

docker push ${fullname}
```
**sagemaker-pipelines/tabular/local-mode/framework-processor-byoc/code/buildspec.sh** (47 additions)
```
#!/bin/sh

# © 2021 Amazon Web Services, Inc. or its affiliates. All Rights Reserved.
#
# This AWS Content is provided subject to the terms of the AWS Customer Agreement
# available at http://aws.amazon.com/agreement or other written agreement between
# Customer and either Amazon Web Services, Inc. or Amazon Web Services EMEA SARL or both.

REPO=$1
S3_BUCKET_NAME=$2

NAME=sourcedir
PUSH=true

if [ -z "${REPO}" ]
then
    echo "Repository not specified"
    exit 1
fi

SCRIPT=$(readlink -f "$0")
# Absolute path this script is in, e.g. /home/user/bin
SCRIPTPATH=$(dirname "$SCRIPT")

cd $SCRIPTPATH/$REPO/src
tar --exclude='data' -czvf ${NAME}.tar.gz *

rm -rf ../dist/$REPO
mkdir -p ../dist/$REPO

mv ${NAME}.tar.gz ../dist/$REPO

if [ -z "${S3_BUCKET_NAME}" ]
then
    echo "S3 Bucket not specified, no upload"
    PUSH=false
fi

if $PUSH
then
    echo "Uploading s3://${S3_BUCKET_NAME}/artifact/${REPO}/${NAME}.tar.gz"

    aws s3 cp ../dist/${REPO}/${NAME}.tar.gz s3://${S3_BUCKET_NAME}/artifact/${REPO}/${NAME}.tar.gz
else
    exit 1
fi
```
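The packaging step in buildspec.sh (`tar --exclude='data' -czvf sourcedir.tar.gz *`) can be reproduced with Python's standard `tarfile` module. This is a sketch under the assumption of a local source tree; `make_sourcedir` is a hypothetical helper, not code from the repository:

```python
import os
import tarfile


def make_sourcedir(src_dir: str, out_path: str = "sourcedir.tar.gz") -> str:
    """Create sourcedir.tar.gz from src_dir, skipping any 'data' directory,
    mirroring: tar --exclude='data' -czvf sourcedir.tar.gz *
    """
    def exclude_data(tarinfo):
        # Returning None drops the member (and, for directories, its contents)
        return None if "data" in tarinfo.name.split("/") else tarinfo

    with tarfile.open(out_path, "w:gz") as tar:
        for entry in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, entry), arcname=entry, filter=exclude_data)
    return out_path
```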
**...maker-pipelines/tabular/local-mode/framework-processor-byoc/code/processing/processing.py** (137 additions)
```
import argparse
import csv
import logging
import os
import traceback
from os import listdir
from os.path import isfile, join

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

logging.basicConfig(level=logging.INFO)
LOGGER = logging.getLogger(__name__)

BASE_PATH = os.path.join("/", "opt", "ml")
PROCESSING_PATH = os.path.join(BASE_PATH, "processing")
PROCESSING_PATH_INPUT = os.path.join(PROCESSING_PATH, "input")
PROCESSING_PATH_OUTPUT = os.path.join(PROCESSING_PATH, "output")


def extract_data(file_path, percentage=100):
    try:
        files = [f for f in listdir(file_path) if isfile(join(file_path, f)) and f.endswith(".csv")]
        LOGGER.info("{}".format(files))

        frames = []

        for file in files:
            df = pd.read_csv(
                os.path.join(file_path, file),
                sep=",",
                quotechar='"',
                quoting=csv.QUOTE_ALL,
                escapechar="\\",
                encoding="utf-8",
                on_bad_lines="skip",  # replaces error_bad_lines=False, removed in pandas 2.0
            )

            # Keep only the requested percentage of each file
            df = df.head(int(len(df) * (percentage / 100)))

            frames.append(df)

        df = pd.concat(frames)

        return df
    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e


def load_data(df, file_path, file_name):
    try:
        if not os.path.exists(file_path):
            os.makedirs(file_path)

        path = os.path.join(file_path, file_name + ".csv")

        LOGGER.info("Saving file in {}".format(path))

        df.to_csv(
            path,
            index=False,
            header=True,
            quoting=csv.QUOTE_ALL,
            encoding="utf-8",
            escapechar="\\",
            sep=",",
        )
    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e


def transform_data(df):
    try:
        df = df[df["Is Fraud?"].notna()]

        df.insert(0, "ID", range(1, len(df) + 1))

        df["Errors?"] = df["Errors?"].fillna("")
        df["Errors?"] = df["Errors?"].map(lambda x: x.strip())
        df["Errors?"] = df["Errors?"].map({
            "Insufficient Balance": 0,
            "Technical Glitch": 1,
            "Bad PIN": 2,
            "Bad Expiration": 3,
            "Bad Card Number": 4,
            "Bad CVV": 5,
            "Bad PIN,Insufficient Balance": 6,
            "Bad PIN,Technical Glitch": 7,
            "": 8,
        })

        df["Use Chip"] = df["Use Chip"].fillna("")
        df["Use Chip"] = df["Use Chip"].map(lambda x: x.strip())
        df["Use Chip"] = df["Use Chip"].map({
            "Swipe Transaction": 0,
            "Chip Transaction": 1,
            "Online Transaction": 2,
        })

        df["Is Fraud?"] = df["Is Fraud?"].map(lambda x: x.replace("'", ""))
        df["Is Fraud?"] = df["Is Fraud?"].map(lambda x: x.strip())
        df["Is Fraud?"] = df["Is Fraud?"].replace("", np.nan)
        df["Is Fraud?"] = df["Is Fraud?"].replace(" ", np.nan)

        df["Is Fraud?"] = df["Is Fraud?"].map({"No": 0, "Yes": 1})

        df = df.rename(
            columns={"Card": "card", "MCC": "mcc", "Errors?": "errors", "Use Chip": "use_chip", "Is Fraud?": "labels"}
        )

        df = df[["card", "mcc", "errors", "use_chip", "labels"]]

        return df

    except Exception as e:
        stacktrace = traceback.format_exc()
        LOGGER.error("{}".format(stacktrace))

        raise e


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset-percentage", type=int, required=False, default=100)
    args = parser.parse_args()

    LOGGER.info("Arguments: {}".format(args))

    df = extract_data(PROCESSING_PATH_INPUT, args.dataset_percentage)

    df = transform_data(df)

    data_train, data_test = train_test_split(df, test_size=0.2, shuffle=True)

    load_data(data_train, os.path.join(PROCESSING_PATH_OUTPUT, "train"), "train")
    load_data(data_test, os.path.join(PROCESSING_PATH_OUTPUT, "test"), "test")
```
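transform_data encodes the `Errors?`, `Use Chip`, and `Is Fraud?` columns with fixed dictionaries, so the mappings can be sanity-checked without pandas. A pure-Python sketch of the same encodings (the `encode_row` helper is hypothetical, added for illustration):

```python
# Restatement of the categorical encodings applied in transform_data.
ERRORS_MAP = {
    "Insufficient Balance": 0,
    "Technical Glitch": 1,
    "Bad PIN": 2,
    "Bad Expiration": 3,
    "Bad Card Number": 4,
    "Bad CVV": 5,
    "Bad PIN,Insufficient Balance": 6,
    "Bad PIN,Technical Glitch": 7,
    "": 8,  # no error recorded
}
USE_CHIP_MAP = {"Swipe Transaction": 0, "Chip Transaction": 1, "Online Transaction": 2}
IS_FRAUD_MAP = {"No": 0, "Yes": 1}


def encode_row(errors, use_chip, is_fraud):
    """Encode one raw record the same way transform_data does:
    fill missing errors with "", strip whitespace, drop quotes from the label."""
    return (
        ERRORS_MAP[(errors or "").strip()],
        USE_CHIP_MAP[use_chip.strip()],
        IS_FRAUD_MAP[is_fraud.replace("'", "").strip()],
    )
```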
**...er-pipelines/tabular/local-mode/framework-processor-byoc/code/processing/requirements.txt** (1 addition)
```
pandas
```
**sagemaker-pipelines/tabular/local-mode/framework-processor-byoc/code/training/.dockerignore** (54 additions)
```
.idea/
.vscode/
*/**/data/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Logs
**/logs
*.log

# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
```
**sagemaker-pipelines/tabular/local-mode/framework-processor-byoc/code/training/Dockerfile** (25 additions)
```
FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime

ARG PYTHON=python3
ARG PYTHON_PIP=python3-pip
ARG PIP=pip3

RUN apt-get update && apt-get install gcc -y

RUN ${PIP} --no-cache-dir install --upgrade pip

RUN ${PIP} install \
    pandas \
    scikit-learn

WORKDIR /

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/lib" \
    PYTHONIOENCODING=UTF-8 \
    LANG=C.UTF-8 \
    LC_ALL=C.UTF-8

RUN ${PIP} install --no-cache --upgrade \
    sagemaker-pytorch-training
```