From ab01615e1b5101db476ca7c6f9d35fbcd62f48f3 Mon Sep 17 00:00:00 2001 From: Ajay Karpur Date: Wed, 11 Nov 2020 17:30:51 -0700 Subject: [PATCH] merge changes from public repository (#92) * GluonCV YoloV3 Darknet53 example training and inference with Neo (#1266) * upgrade MNIST experiment notebook to SDK v2 (#1576) * GluonCV YoloV3 Darknet53 example minor fixes (#1582) * Code cell type corrected. Removed empty cell * Unzip datasets if not available in the notebook's folder * fix invalid json in MNIST notetook (#1594) * Kkoppolu inference examples (#1587) * Compilation examples changes for new inference containers Update examples for PyTorch - to use the new inference containers - Use SageMaker 2.x * Clear outputs Clear outputs in the notebook * Fix typo Fix typo in text box * Undo change to iterations in old way Undo change to iterations in old way * Code Review feedback Organize imports Code Review feedback * CR Use new inference containers for both uncompiled and compiled flows. * CR Remove incorrect code comments * Update versions of torch and torchvision Co-authored-by: EC2 Default User * add template notebook (#1570) * add template notebook * resolve comments * Bump tensorflow (#1574) Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 1.13.1 to 1.15.4. 
- [Release notes](https://github.com/tensorflow/tensorflow/releases) - [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md) - [Commits](https://github.com/tensorflow/tensorflow/compare/v1.13.1...v1.15.4) Signed-off-by: dependabot[bot] Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * mxnet_mnist.ipynb fix (#1597) * Update mxnet_mnist.ipynb Set notebook to default to CPU training * Update mxnet_mnist.ipynb * updated birds dataset download source (#1593) * fix pandas errors in notebooks (#1490) * Refactor the Debugger detect_stalled_training_job_and_stop.ipynb notebook (#1592) * publish BYOC with Debugger notebook * some test change * revert the kernel names in the metadata * fix typos * incorporate feedback * incorporate comments * pin to pysdk v1 * remove installation output logs * refactor the stalled training job notebook * remove unnecessary module imports / minor fix * incorporate feedback * minor fix * fix typo * minor fix * fix unfinished sentence * incorporate feedback * minor fix Co-authored-by: Miyoung Choi * Make RL training compatible with PyTorch (#1520) * Make RLEstimator() PyTorch compatible & modify cartpole notebook * set use_pytorch to False by default * minor refactor; check in first unit test * indent correction * Verify sagemaker SDK version (#1606) * updating mxnet_mnist notebook (#1588) * updating mxnet_mnist notebook * typo fix * refactoring * refactored mnist.py * updated bucket paths in the notebook for better organization * notebook updated to handle sdk upgrade Co-authored-by: EC2 Default User Co-authored-by: EC2 Default User * fixing Model Package ARNs and removing region specific dependency (#1611) * fixing Model Package ARNs and removing region specific dependency * Adding a disclaimer on reference notebooks Co-authored-by: kwwaikar * Fix: add 'import tensorflow as tf' required by _save_tf_model (#1560) Co-authored-by: Felipe Antunes * Update xgboost churn neo example for 
sagemaker v2 (#1591) * Update xgboost churn neo example for sagemaker v2 * Remove use of latest version * Add sagemaker installation command and remove duplicate import * Use sagemaker pysdk v2 * Add setup and cleanup steps * clear output * Revert kernel metadata Co-authored-by: Nikhil Kulkarni * Add integration tests using Papermill library for RL notebooks. List of notebooks covered in the tests: (#1580) 1. rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb 2. rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb Co-authored-by: Akash Goel * Delete KernelExplainerWrapper and remove importing LogitLink and IdentityLink (#1603) * update-neo-mxnet-notebooks (#1625) * update-neo-mxnet-notebooks * refactoring and typo fixes * Add Ground Truth Streaming notebooks (#1617) * Add Ground Truth Streaming notebooks * Made below changes * Replace .format with f-strings * Added pip sagemaker isntall * Download image from public url * Minor comments * Minor f-string updates to chained notebook Co-authored-by: Gopalakrishna, Priyanka * Added downgrade to SDK 1.72 and edited the text. Verified notebook runs through with no errors. (#1633) * Add SDK version rollback code. (#1634) * Running tests in parallel for RL notebooks. (#1624) Co-authored-by: Akash Goel * fix: resolve breaking changes of neo container, adding `softmax_label` to `compile_model` (#1635) * Fixes #902 (#1632) * fix probability out of bound * fixed probability out of bound * cleared the notebook output * fix of probabilities out of bound * adding an example for Linear Learner regression use case with abalone dataset and input csv format (#1622) * infra: add PR buildspec (#1642) * add notebook instance buildspec * Update HPO_Analyze_TuningJob_Results.ipynb on where to retrieve a HP job (#1637) * Update HPO_Analyze_TuningJob_Results.ipynb Adding instructions on where to find the hyperparameter jobs needed as input. 
* Update hyperparameter_tuning/analyze_results/HPO_Analyze_TuningJob_Results.ipynb Co-authored-by: Aaron Markham * infra: update buildspec (#1649) * update buildspec * terminate early if no notebooks in PR * reformat command * move conditional to build phase as one command * removing object2vec_multilabel_genre_classification.ipynb (#1648) * adding preprocessing tabular data notebooks * incorporating changes * incorporating changes * incorporating changes * incorporating few changes * minor fix to persist sagemaker version * minor fix to persist sagemaker version * removing notebook Co-authored-by: Ajay Karpur * fix: move the Tensorflow import in coach_launcher.py inside the _save_tf_model fn (#1652) Co-authored-by: Akash Goel * delete extra common folder inside rl_game_server_autopilot/sagemaker directory (#1653) Co-authored-by: Akash Goel * Removed pip install, edited for clarity, tested on JupyterLab (#1660) * doc: fix typos in PyTorch CIFAR-10 notebook (#1650) * fix typos in PyTorch CIFAR-10 notebook * deliberately raise error to test PR build * Revert "deliberately raise error to test PR build" This reverts commit 7c2bac339570cc566af7a39f85cb99f835e72508. * Update mm byo (#1663) * Added note that nb won't run in studio, add note about kernel and sdk version testing details * changed kernel metadata back to conda_mxnet_p36 * Removed conda command to install s3fs. (#1659) * change: updated for sagemaker python sdk 2.x (#1667) * min_df was larger than max_df and outside of the acceptable range of 0.0-1.0 (#1601) * min_df was larger than max_df and outside of the acceptable range of 0.0 to 1.0. This gave me an error but changing the min_df to 0.2 or 0.02 resolved the error. It is unclear if the author intended min_df to be 0.2 or 0.02. 
* Update ntm_20newsgroups_topic_model.ipynb remove output and changed min_df to a likely better default of 0.2 Co-authored-by: Aaron Markham * Neo pytorch inf1 notebook (#1583) * Add Neo notebook for PT model on Inf1 * Change target to inf1 * resolve comments * Add revert sm version * Add multiple cores instruction and fix revert sagemaker version * polish instructions * one more polish * make sm version at least 2.11.0 * change to upgrade only * remove fixed pytorch version Co-authored-by: EC2 Default User Co-authored-by: EC2 Default User Co-authored-by: Aaron Markham * Update generate_example_data.py (#1077) Added code solution for Bug in the Multinomial lines: theta = np.asarray(theta).astype('float64') theta = theta / np.sum(theta) and lines: topic_word_distribution = np.asarray(topic_word_distribution).astype('float64') topic_word_distribution = topic_word_distribution / np.sum(topic_word_distribution) Co-authored-by: Aaron Markham * Fix boolean argument parsing (#1681) * Fixed predictions showing as array of False instead of a single True or False value (#1679) * Fixed predictions matched showing as array of False instead of showing whether prediction is correct (True or False). * Fixed predictions matched showing as array of False * Fixed predictions showing as array of False instead of a single True or False * Dev branch (#1688) * Adding new project gpt-2 * Reviewed. Reset Kernel. 
* made fix to reflect region names in model_package_arns * Minor notebook content rearrangement * fixed region-specific arns * Update README.md Added description for new project 'creative-writing-using-gpt-2-text-generation' under 'using_model_packages' * Update README.md added description for new project 'creative-writing-using-gpt-2-text-generation' under 'aws_marketplace/using_model_packages' Co-authored-by: Alex Ignatov * fix: use image_uris module for retrieval (#1698) * added autogluon v0.0.14 support, changed the build method (#1640) * added autogluon v0.0.14 support, changed the build method * changed the bash execution Co-authored-by: Eric Johnson <65414824+metrizable@users.noreply.github.com> * added data ingestion notebooks (#1602) * added data ingestion notebooks data ingestion notebooks v1 * Added image for Athena and Redshift notebook Added images displayed in two data ingestion notebooks -- Athena and Redshift * Text Data Pre-processing Notebook New notebook added for text data pre-processing, feedback incorporated * Include Data Aggregation to text data ingestion (S3) include the text data aggregation content to the text data ingestion notebook * Modified Data Ingestion Notebooks and Text preprocessing Notebooks Modified all seven (7) data ingestion and text preprocessing notebooks to incorporate feedback * Modified the image data ingestion notebook Added some note to downloading COCO dataset from online resources * updated all the links in the notebooks links to notebooks are changed to relative links; links to videos are removed for now and can be added later. Citations to data sources and existing aws notebooks are added. 
* modified some links that were not working modified links that's not working (refer to another folder) * Modified 012 for running error Removed a typo in 012 * updated SageMaker SDK, clear output, added data downloading added data downloading to the beginning of each notebook; update SageMaker SDK at the beginning of each notebook; output cleared. * Modified packages used in notebooks modified packages used in 011, 012, 02, 04 and text data pre-processing. Co-authored-by: ZoeMa Co-authored-by: Talia <31782251+TEChopra1000@users.noreply.github.com> Co-authored-by: Aaron Markham Co-authored-by: Ajay Karpur * * Add framework_version to SKLearn estimator (#1716) Co-authored-by: Sean Morgan * Fix autopilot_customer_churn.ipynb notebook for Sagemaker V2 SDK (#1699) * Fix notebook for Sagemaker V2 SDK * revert account change Co-authored-by: Michele Ricciardi * Notebook fixed and cleaned (#1726) * Notebook fixed and cleaned * Comment reformatted * Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723) * Revert "Fixed notebooks for errors due to syntax change and cleaned notebooks (#1723)" (#1730) This reverts commit e691349f3b8f0bfc7479b43db54ce7093b8f87d3. * Revert "Notebook fixed and cleaned (#1726)" (#1732) This reverts commit b68acb49a2b17c0fc0606feb7400da50ee556b5c. * Sample notebook fix 2 (#1675) * Reducing the random hpo resource values We've specified the total number of training jobs to be only 20 and the maximum number of parallel jobs to be 2. * Edited the text to be consistent with the new parameter values. With the new parameter values, this notebook now runs without error. * fixed typo fixed a typo * Updated Neo compilation notebook for GluonCV Yolo example (#1638) * Updated Neo compilation notebook for GluonCV Yolo example * Minor fixes to comments and logging Co-authored-by: Eric Johnson <65414824+metrizable@users.noreply.github.com> Co-authored-by: Ajay Karpur * Fixed malformed TensorFlow estimator declaration. 
(#1628) * Fixed malformed TensorFlow estimator declaration. * Removed extraneous output. Co-authored-by: Eric Johnson <65414824+metrizable@users.noreply.github.com> * logx=False plots data as User_Score is <=10 (#1265) logx=True doesn't seem appropriate since User_Score is <=10 the plot shows nothing Co-authored-by: Aaron Markham Co-authored-by: Ajay Karpur * Update detect_stalled_training_job_and_stop.ipynb (#1735) * Updated sagemaker attribute configurations for V2 SDK support (#1636) Co-authored-by: Aaron Markham * Update Batch Transform - breast cancer prediction with high level SDK.ipynb (#1138) Fix a small bug. Before specifying content_type='text/csv' in sm_transformer.transform, I get error that "Loading libsvm data failed with Exception, please ensure data is in libsvm format: " Co-authored-by: Aaron Markham * Edit xgboost_customer_churn_studio.ipynb (#1060) Co-authored-by: Aaron Markham * added a feature selection notebook (#1664) * added a feature selection notebook * addressed comments and renamed files for CI * used model.model_data to index last trained model in s3 * added pip sagemaker>=2.15.0 * add lineage example notebooks (#90) * add example notebook skeleton for fairness and explainability (#91) Co-authored-by: Xinyu Liu Co-authored-by: Bartek Pawlik Co-authored-by: Dana Benson <31262102+danabens@users.noreply.github.com> Co-authored-by: Krishna Chaitanya Koppolu <71738025+kkoppolu1@users.noreply.github.com> Co-authored-by: EC2 Default User Co-authored-by: Aaron Markham Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: IvyBazan <45951687+IvyBazan@users.noreply.github.com> Co-authored-by: chenonit <72450093+chenonit@users.noreply.github.com> Co-authored-by: Valentin Flunkert Co-authored-by: Miyoung Co-authored-by: Miyoung Choi Co-authored-by: Anna Luo <45078924+annaluo676@users.noreply.github.com> Co-authored-by: Pratyush Bagaria Co-authored-by: EC2 Default User Co-authored-by: EC2 Default User 
Co-authored-by: Kanchan Waikar <36546813+kwwaikar@users.noreply.github.com> Co-authored-by: kwwaikar Co-authored-by: Felipe Antunes Co-authored-by: Felipe Antunes Co-authored-by: Nikhil Kulkarni Co-authored-by: Nikhil Kulkarni Co-authored-by: Akash Goel Co-authored-by: Akash Goel Co-authored-by: Somnath Sarkar Co-authored-by: gopalakp <72235203+gopalakp@users.noreply.github.com> Co-authored-by: Gopalakrishna, Priyanka Co-authored-by: Laren-AWS <57545972+Laren-AWS@users.noreply.github.com> Co-authored-by: Chuyang Co-authored-by: Hongshan Li Co-authored-by: moagaber <47145559+moagaber@users.noreply.github.com> Co-authored-by: Roald Bradley Severtson Co-authored-by: Paul B Co-authored-by: Eric Slesar <34587362+eslesar-aws@users.noreply.github.com> Co-authored-by: PaulC-AWS Co-authored-by: Corvus LEE <51771215+corvuslee@users.noreply.github.com> Co-authored-by: aserfass <65733011+aserfass@users.noreply.github.com> Co-authored-by: minlu1021 Co-authored-by: EC2 Default User Co-authored-by: EC2 Default User Co-authored-by: hbono2019 Co-authored-by: H. 
Furkan Bozkurt Co-authored-by: Eitan Sela Co-authored-by: awsmrud <71855151+awsmrud@users.noreply.github.com> Co-authored-by: Alex Ignatov Co-authored-by: Eric Johnson <65414824+metrizable@users.noreply.github.com> Co-authored-by: Yohei Nakayama <25813762+yoheigon@users.noreply.github.com> Co-authored-by: ZoeMa Co-authored-by: ZoeMa Co-authored-by: Talia <31782251+TEChopra1000@users.noreply.github.com> Co-authored-by: Sean Morgan Co-authored-by: Sean Morgan Co-authored-by: Michele Ricciardi Co-authored-by: Michele Ricciardi Co-authored-by: vivekmadan2 <53404938+vivekmadan2@users.noreply.github.com> Co-authored-by: playphil <66652335+playphil@users.noreply.github.com> Co-authored-by: Gili Nachum Co-authored-by: sdoyle Co-authored-by: fyang1234 <33530337+fyang1234@users.noreply.github.com> Co-authored-by: annbech <19807786+annbech@users.noreply.github.com> Co-authored-by: Xinyu <59369929+xinyu7030@users.noreply.github.com> Co-authored-by: Xinyu Liu --- .../AutoGluon_Tabular_SageMaker.ipynb | 4 +- .../container-training/Dockerfile.training | 9 +- .../container-training/train.py | 6 - .../011_Ingest_tabular_data_v1.ipynb | 251 ++++ data_ingestion/012_Ingest_text_data_v2.ipynb | 1033 ++++++++++++++ data_ingestion/013_Ingest_image_data_v1.ipynb | 245 ++++ .../02_Ingest_data_with_Athena_v1.ipynb | 593 ++++++++ .../03_Ingest_data_with_Redshift_v3.ipynb | 1269 +++++++++++++++++ data_ingestion/04_Ingest_data_with_EMR.ipynb | 196 +++ data_ingestion/image/athena-iam-1.png | Bin 0 -> 12749 bytes data_ingestion/image/athena-iam-2.PNG | Bin 0 -> 15537 bytes data_ingestion/image/athena-iam-3.PNG | Bin 0 -> 35435 bytes data_ingestion/image/redshift-sg-1.PNG | Bin 0 -> 20599 bytes data_ingestion/image/redshift-sg-1.jpg | Bin 0 -> 30800 bytes ...Linear_Learner_Regression_csv_format.ipynb | 507 +++---- .../pca_mnist/pca_mnist.ipynb | 5 +- .../04_preprocessing_text_data_v3.ipynb | 42 +- .../rl_cartpole_coach_gymEnv.ipynb | 3 +- .../rl_cartpole_ray_gymEnv.ipynb | 8 +- template.ipynb | 
333 +++++ 20 files changed, 4169 insertions(+), 335 deletions(-) create mode 100644 data_ingestion/011_Ingest_tabular_data_v1.ipynb create mode 100644 data_ingestion/012_Ingest_text_data_v2.ipynb create mode 100644 data_ingestion/013_Ingest_image_data_v1.ipynb create mode 100644 data_ingestion/02_Ingest_data_with_Athena_v1.ipynb create mode 100644 data_ingestion/03_Ingest_data_with_Redshift_v3.ipynb create mode 100644 data_ingestion/04_Ingest_data_with_EMR.ipynb create mode 100644 data_ingestion/image/athena-iam-1.png create mode 100644 data_ingestion/image/athena-iam-2.PNG create mode 100644 data_ingestion/image/athena-iam-3.PNG create mode 100644 data_ingestion/image/redshift-sg-1.PNG create mode 100644 data_ingestion/image/redshift-sg-1.jpg create mode 100644 template.ipynb diff --git a/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb b/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb index 76fa3f357e..1c6e22522a 100644 --- a/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb +++ b/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb @@ -119,8 +119,8 @@ }, "outputs": [], "source": [ - "!/bin/bash ./container-training/build_push_training.sh {account} {region} {training_algorithm_name} {ecr_uri_prefix} {registry_uri_training.split('/')[0].split('.')[0]} {registry_uri_training}\n", - "!/bin/bash ./container-inference/build_push_inference.sh {account} {region} {inference_algorithm_name} {ecr_uri_prefix} {registry_uri_training.split('/')[0].split('.')[0]} {registry_uri_inference}" + "!/bin/bash ./container-training/build_push_training.sh {account} {region} {training_algorithm_name} {ecr_uri_prefix} {registry_id} {registry_uri}\n", + "!/bin/bash ./container-inference/build_push_inference.sh {account} {region} {inference_algorithm_name} {ecr_uri_prefix} {registry_id} {registry_uri}" ] }, { diff --git 
a/advanced_functionality/autogluon-tabular/container-training/Dockerfile.training b/advanced_functionality/autogluon-tabular/container-training/Dockerfile.training index cf7b134b12..990df4bfed 100644 --- a/advanced_functionality/autogluon-tabular/container-training/Dockerfile.training +++ b/advanced_functionality/autogluon-tabular/container-training/Dockerfile.training @@ -3,13 +3,6 @@ FROM ${REGISTRY_URI} RUN pip install autogluon RUN pip install PrettyTable -RUN pip install bokeh - -RUN apt-get update \ - && apt-get install -y --no-install-recommends graphviz libgraphviz-dev pkg-config \ - && rm -rf /var/lib/apt/lists/* \ - && pip install pygraphviz - ENV PATH="/opt/ml/code:${PATH}" # Copies the training code inside the container @@ -23,4 +16,4 @@ RUN pip install seaborn ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code # Defines train.py as script entrypoint -ENV SAGEMAKER_PROGRAM train.py +ENV SAGEMAKER_PROGRAM train.py \ No newline at end of file diff --git a/advanced_functionality/autogluon-tabular/container-training/train.py b/advanced_functionality/autogluon-tabular/container-training/train.py index 40886a81b6..996c9c90d1 100644 --- a/advanced_functionality/autogluon-tabular/container-training/train.py +++ b/advanced_functionality/autogluon-tabular/container-training/train.py @@ -13,12 +13,6 @@ from collections import Counter from timeit import default_timer as timer -import numpy as np -import seaborn as sns -import matplotlib.pyplot as plt -import shutil -import networkx as nx - with warnings.catch_warnings(): warnings.filterwarnings('ignore', category=DeprecationWarning) from prettytable import PrettyTable diff --git a/data_ingestion/011_Ingest_tabular_data_v1.ipynb b/data_ingestion/011_Ingest_tabular_data_v1.ipynb new file mode 100644 index 0000000000..fd68a9ca4b --- /dev/null +++ b/data_ingestion/011_Ingest_tabular_data_v1.ipynb @@ -0,0 +1,251 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest Tabular Data\n", + 
"\n", + "When ingesting structured data from an existing S3 bucket into a SageMaker Notebook, there are multiple ways to handle it. We will introduce the following methods to access your data from the notebook:\n", + "\n", + "* Copying your data to your instance. If you are working with a moderately sized dataset or are simply experimenting, you can copy the files to the SageMaker instance and work with them as you would on a local file system. \n", + "* Using Python packages to directly access your data without copying it. One downside of copying data to your instance is that if you delete the notebook instance when you are done, the data is deleted with it unless you store it elsewhere. This notebook introduces several methods that avoid this problem, and using Python packages is one of them. Also, if you have large datasets (for example, with millions of rows), you can read data directly from S3 using S3-compatible Python libraries.\n", + "* Using AWS native methods to directly access your data. You can also use packages such as `s3fs` and AWS Data Wrangler (`awswrangler`) to access your data directly. \n", + "\n", + "We will demonstrate how to ingest the following tabular (structured) data into a notebook for further analysis:\n", + "## Tabular data: Boston Housing Data\n", + "The [Boston Housing dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) contains information collected by the U.S. Census Service concerning housing in the area of Boston, Massachusetts. We will use the dataset to show how to ingest tabular data into S3 and prepare it for further preprocessing and feature engineering. 
The dataset contains the following columns (506 rows):\n", + "* `CRIM` - per capita crime rate by town\n", + "* `ZN` - proportion of residential land zoned for lots over 25,000 sq.ft.\n", + "* `INDUS` - proportion of non-retail business acres per town\n", + "* `CHAS` - Charles River dummy variable (1 if tract bounds river; 0 otherwise)\n", + "* `NOX` - nitric oxides concentration (parts per 10 million)\n", + "* `RM` - average number of rooms per dwelling\n", + "* `AGE` - proportion of owner-occupied units built prior to 1940\n", + "* `DIS` - weighted distances to five Boston employment centres\n", + "* `RAD` - index of accessibility to radial highways\n", + "* `TAX` - full-value property-tax rate per \\$10,000\n", + "* `PTRATIO` - pupil-teacher ratio by town\n", + "* `B` - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n", + "* `LSTAT` - \\% lower status of the population\n", + "* `target` - median value of owner-occupied homes in \\$1000s (`MEDV`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load data and write it to S3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 's3fs==0.4.2' 'awswrangler==1.2.0'\n", + "# awswrangler needs s3fs version > 0.4.0 to work correctly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import awswrangler as wr\n", + "import pandas as pd\n", + "import s3fs\n", + "import sagemaker\n", + "# load_boston loads the Boston Housing dataset; it was removed in scikit-learn 1.2, so use an earlier version\n", + "from sklearn.datasets import load_boston" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get SageMaker session & default S3 bucket\n", + "sagemaker_session = sagemaker.Session()\n", + "s3 = sagemaker_session.boto_session.resource('s3')\n", + "bucket = sagemaker_session.default_bucket() # replace with your own bucket name if you have one\n", + "prefix = 
'data/tabular/boston_house'\n", + "filename = 'boston_house.csv'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# helper functions to upload data to S3\n", + "def write_to_s3(filename, bucket, prefix):\n", + " # put each file in its own folder; this is helpful if you read and prepare data with Athena\n", + " filename_key = filename.split('.')[0]\n", + " key = \"{}/{}/{}\".format(prefix,filename_key,filename)\n", + " return s3.Bucket(bucket).upload_file(filename,key)\n", + "\n", + "def upload_to_s3(bucket, prefix, filename):\n", + " # write_to_s3 stores the file under a folder named after it, so include that folder in the printed URL\n", + " url = 's3://{}/{}/{}/{}'.format(bucket, prefix, filename.split('.')[0], filename)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(filename, bucket, prefix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load the Boston Housing data bundled with scikit-learn and save it as a local CSV file\n", + "tabular_data = load_boston()\n", + "tabular_data_full = pd.DataFrame(tabular_data.data, columns=tabular_data.feature_names)\n", + "tabular_data_full['target'] = tabular_data.target\n", + "tabular_data_full.to_csv(filename, index=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "upload_to_s3(bucket, 'data/tabular', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ingest Tabular Data from S3 bucket\n", + "### Method 1: Copying data to the Instance\n", + "You can use the AWS Command Line Interface (CLI) to copy your data from S3 to your SageMaker instance, or to copy files between S3 buckets. This is a quick and easy approach when you are dealing with medium-sized data files or are experimenting and doing exploratory analysis. The documentation can be found [here](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# copy data to your SageMaker instance using the AWS CLI\n", + "!aws s3 cp s3://$bucket/$prefix/ $prefix/ --recursive" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_location = \"{}/{}\".format(prefix, filename)\n", + "tabular_data = pd.read_csv(data_location, nrows = 5)\n", + "tabular_data.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Method 2: Use AWS compatible Python Packages\n", + "When you are dealing with large datasets, or do not want to lose any data when you delete your SageMaker Notebook Instance, you can use pre-built packages to access your files in S3 without copying them to your instance. Packages such as `pandas` accept a path string for reading data: you use a local path (or a `file://` URL) for your local file system and an `s3://` URL to read data from S3 (pandas uses the `s3fs` package for S3 access). For `pandas`, any valid string path is acceptable, including a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. You can find additional documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_s3_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "s3_tabular_data = pd.read_csv(data_s3_location, nrows = 5)\n", + "s3_tabular_data.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Method 3: Use AWS native methods\n", + "#### 3.1 s3fs \n", + "\n", + "[S3Fs](https://s3fs.readthedocs.io/en/latest/) is a Pythonic file interface to S3. It builds on top of botocore. 
The top-level class S3FileSystem holds connection information and allows typical file-system style operations like cp, mv, ls, du, glob, etc., as well as put/get of local files to/from S3. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fs = s3fs.S3FileSystem()\n", + "data_s3fs_location = \"s3://{}/{}/\".format(bucket, prefix)\n", + "# list all files under this prefix\n", + "fs.ls(data_s3fs_location)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# open the file directly with s3fs\n", + "data_s3fs_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "with fs.open(data_s3fs_location) as f:\n", + " print(pd.read_csv(f, nrows = 5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 3.2 AWS Data Wrangler\n", + "[AWS Data Wrangler](https://github.com/awslabs/aws-data-wrangler) is an open-source Python library that extends the power of the pandas library to AWS, connecting DataFrames and AWS data-related services (Amazon Redshift, AWS Glue, Amazon Athena, Amazon EMR, Amazon QuickSight, etc.), some of which we will cover in later sections. It is built on top of other open-source projects like pandas, Apache Arrow, Boto3, s3fs, SQLAlchemy, Psycopg2 and PyMySQL, and offers abstracted functions to execute common ETL tasks such as loading and unloading data from data lakes, data warehouses, and databases. Note that you need `s3fs` version > 0.4.0 for the `awswrangler` CSV reader to work." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_wr_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "wr_data = wr.s3.read_csv(path=data_wr_location, nrows = 5)\n", + "wr_data.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Boston Housing data, Harrison, D. 
and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, pp. 81-102, 1978." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/012_Ingest_text_data_v2.ipynb b/data_ingestion/012_Ingest_text_data_v2.ipynb new file mode 100644 index 0000000000..cbe9cd31a7 --- /dev/null +++ b/data_ingestion/012_Ingest_text_data_v2.ipynb @@ -0,0 +1,1033 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest Text Data\n", + "Labeled text data can be in a structured data format, such as reviews for sentiment analysis, news headlines for topic modeling, or documents for text classification. In these cases, you may have one column for the label, one column for the text, and sometimes other columns for attributes. You can treat this structured data like tabular data, and ingest it in one of the ways discussed in the previous notebook [011_Ingest_tabular_data_v1.ipynb](011_Ingest_tabular_data_v1.ipynb). 
Raw text data, however, often comes unstructured, typically in .json or .txt format. This section discusses how to ingest these types of files into a SageMaker Notebook.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set Up Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.\n", + "You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 's3fs==0.4.2'" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import json\n", + "import glob\n", + "import s3fs\n", + "import sagemaker" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# Get SageMaker session & default S3 bucket\n", + "sagemaker_session = sagemaker.Session()\n", + "bucket = sagemaker_session.default_bucket() # replace with your own bucket if you have one \n", + "s3 = sagemaker_session.boto_session.resource('s3')\n", + "\n", + "prefix = 'text_spam/spam'\n", + "prefix_json = 'json_jeo'\n", + "filename = 'SMSSpamCollection.txt'\n", + "filename_json = 'JEOPARDY_QUESTIONS1.json'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Downloading data from Online Sources\n", + "\n", + "### Text data (in structured .csv format): Twitter -- sentiment140\n", + "**Sentiment140**: this dataset contains 1.6M tweets extracted using the Twitter API. 
The tweets have been annotated with sentiment (0 = negative, 4 = positive) and topics (hashtags used to retrieve tweets). The dataset contains the following columns:\n", + "* `target`: the polarity of the tweet (0 = negative, 4 = positive)\n", + "* `ids`: the id of the tweet (2087)\n", + "* `date`: the date of the tweet (Sat May 16 23:58:44 UTC 2009)\n", + "* `flag`: the query (lyx). If there is no query, this value is NO_QUERY.\n", + "* `user`: the user that tweeted (robotickilldozr)\n", + "* `text`: the text of the tweet (Lyx is cool)\n", + "\n", + "The [second Twitter data set](https://github.com/guyz/twitter-sentiment-dataset) was collected as an extension to the Sanders Analytics Twitter sentiment corpus, originally designed for training and testing Twitter sentiment analysis algorithms. We will use it to show how to aggregate two data sets when you want to enhance your current data set with additional data." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "#helper functions to upload data to s3\n", + "def write_to_s3(filename, bucket, prefix):\n", + " #put one file in a separate folder. 
This is helpful if you read and prepare data with Athena\n", + " filename_key = filename.split('.')[0]\n", + " key = \"{}/{}/{}\".format(prefix,filename_key,filename)\n", + " return s3.Bucket(bucket).upload_file(filename,key)\n", + "\n", + "def upload_to_s3(bucket, prefix, filename):\n", + " url = 's3://{}/{}/{}'.format(bucket, prefix, filename)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(filename, bucket, prefix)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "#run this cell if you are in SageMaker Studio notebook\n", + "#!apt-get install unzip" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "URL transformed to HTTPS due to an HSTS policy\n", + "--2020-11-02 21:16:07-- https://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip\n", + "Resolving cs.stanford.edu (cs.stanford.edu)... 171.64.64.64\n", + "Connecting to cs.stanford.edu (cs.stanford.edu)|171.64.64.64|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 81363704 (78M) [application/zip]\n", + "Saving to: ‘sentimen140.zip’\n", + "\n", + "sentimen140.zip 100%[===================>] 77.59M 18.9MB/s in 6.4s \n", + "\n", + "2020-11-02 21:16:14 (12.1 MB/s) - ‘sentimen140.zip’ saved [81363704/81363704]\n", + "\n", + "Archive: sentimen140.zip\n", + " inflating: sentiment140/testdata.manual.2009.06.14.csv \n", + " inflating: sentiment140/training.1600000.processed.noemoticon.csv \n" + ] + } + ], + "source": [ + "#download first twitter dataset\n", + "!wget http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip -O sentimen140.zip\n", + "# Uncompressing\n", + "!unzip -o sentimen140.zip -d sentiment140" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing to s3://sagemaker-us-east-2-060356833389/text_sentiment140/sentiment140/training.1600000.processed.noemoticon.csv\n", + "Writing to s3://sagemaker-us-east-2-060356833389/text_sentiment140/sentiment140/testdata.manual.2009.06.14.csv\n" + ] + } + ], + "source": [ + "#upload the files to the S3 bucket\n", + "csv_files = glob.glob(\"sentiment140/*.csv\")\n", + "for filename in csv_files:\n", + " upload_to_s3(bucket, 'text_sentiment140', filename)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2020-11-02 21:16:18-- https://raw.githubusercontent.com/zfz/twitter_corpus/master/full-corpus.csv\n", + "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.200.133\n", + "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.200.133|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 910195 (889K) [text/plain]\n", + "Saving to: ‘full-corpus.csv.2’\n", + "\n", + "full-corpus.csv.2 100%[===================>] 888.86K --.-KB/s in 0.08s \n", + "\n", + "2020-11-02 21:16:19 (10.2 MB/s) - ‘full-corpus.csv.2’ saved [910195/910195]\n", + "\n" + ] + } + ], + "source": [ + "#download second twitter dataset\n", + "!wget https://raw.githubusercontent.com/zfz/twitter_corpus/master/full-corpus.csv" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing to s3://sagemaker-us-east-2-060356833389/text_twitter_sentiment_2/full-corpus.csv\n" + ] + } + ], + "source": [ + "filename = 'full-corpus.csv'\n", + "upload_to_s3(bucket, 'text_twitter_sentiment_2', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Text data (in .txt format): SMS Spam data \n", + "[SMS Spam Data](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection) was manually extracted from the Grumbletext Web site. This is a UK forum in which cell phone users make public claims about SMS spam messages, most of them without reporting the very spam message received. Each line in the text file has the correct class followed by the raw message. We will use this data to showcase how to ingest text data in .txt format." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2020-11-02 21:16:19-- http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/smsspamcollection.zip\n", + "Resolving www.dt.fee.unicamp.br (www.dt.fee.unicamp.br)... 143.106.12.20\n", + "Connecting to www.dt.fee.unicamp.br (www.dt.fee.unicamp.br)|143.106.12.20|:80... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 210521 (206K) [application/zip]\n", + "Saving to: ‘spam.zip’\n", + "\n", + "spam.zip 100%[===================>] 205.59K 112KB/s in 1.8s \n", + "\n", + "2020-11-02 21:16:21 (112 KB/s) - ‘spam.zip’ saved [210521/210521]\n", + "\n", + "Archive: spam.zip\n", + " inflating: spam/readme \n", + " inflating: spam/SMSSpamCollection.txt \n" + ] + } + ], + "source": [ + "!wget http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/smsspamcollection.zip -O spam.zip\n", + "!unzip -o spam.zip -d spam" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing to s3://sagemaker-us-east-2-060356833389/text_spam/spam/SMSSpamCollection.txt\n" + ] + } + ], + "source": [ + "txt_files = glob.glob(\"spam/*.txt\")\n", + "for filename in txt_files:\n", + " upload_to_s3(bucket, 'text_spam', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Text Data (in .json format): Jeopardy Question data\n", + "The [Jeopardy Question](https://j-archive.com/) data set was obtained by crawling the Jeopardy question archive website. It is an unordered list of questions where each question has the following key-value pairs:\n", + "\n", + "* `category` : the question category, e.g. \"HISTORY\"\n", + "* `value`: dollar value of the question as string, e.g. 
\"\\$200\"\n", + "* `question`: text of question\n", + "* `answer` : text of answer\n", + "* `round`: one of \"Jeopardy!\",\"Double Jeopardy!\",\"Final Jeopardy!\" or \"Tiebreaker\"\n", + "* `show_number` : string of show number, e.g. '4680'\n", + "* `air_date` : the show air date in format YYYY-MM-DD" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "--2020-11-02 21:16:22-- http://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz\n", + "Resolving skeeto.s3.amazonaws.com (skeeto.s3.amazonaws.com)... 52.216.241.76\n", + "Connecting to skeeto.s3.amazonaws.com (skeeto.s3.amazonaws.com)|52.216.241.76|:80... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 12721082 (12M) [application/json]\n", + "Saving to: ‘JEOPARDY_QUESTIONS1.json.gz’\n", + "\n", + "JEOPARDY_QUESTIONS1 100%[===================>] 12.13M 15.0MB/s in 0.8s \n", + "\n", + "2020-11-02 21:16:23 (15.0 MB/s) - ‘JEOPARDY_QUESTIONS1.json.gz’ saved [12721082/12721082]\n", + "\n", + "Writing to s3://sagemaker-us-east-2-060356833389/json_jeo/JEOPARDY_QUESTIONS1.json\n" + ] + } + ], + "source": [ + "#json file format\n", + "!wget http://skeeto.s3.amazonaws.com/share/JEOPARDY_QUESTIONS1.json.gz\n", + "# Uncompressing\n", + "!gunzip -f JEOPARDY_QUESTIONS1.json.gz\n", + "filename = 'JEOPARDY_QUESTIONS1.json'\n", + "upload_to_s3(bucket, 'json_jeo', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Ingest Data into SageMaker Notebook\n", + "## Method 1: Copying data to the Instance\n", + "You can use the AWS Command Line Interface (CLI) to copy your data from S3 to your SageMaker instance. This is a quick and easy approach when you are dealing with medium-sized data files, or when you are experimenting and doing exploratory analysis. The documentation can be found [here](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "#Specify file names\n", + "prefix = 'text_spam/spam'\n", + "prefix_json = 'json_jeo'\n", + "filename = 'SMSSpamCollection.txt'\n", + "filename_json = 'JEOPARDY_QUESTIONS1.json'\n", + "prefix_spam_2 = 'text_spam/spam_2'" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "download failed: s3://sagemaker-us-east-2-060356833389/json_jeo/JEOPARDY_QUESTIONS1.json to text/json_jeo/JEOPARDY_QUESTIONS1.json [Errno 28] No space left on device\n", + "download failed: s3://sagemaker-us-east-2-060356833389/json_jeo/JEOPARDY_QUESTIONS1/JEOPARDY_QUESTIONS1.json to text/json_jeo/JEOPARDY_QUESTIONS1/JEOPARDY_QUESTIONS1.json [Errno 28] No space left on device\n" + ] + } + ], + "source": [ + "#copy data to your SageMaker instance using the AWS CLI\n", + "!aws s3 cp s3://$bucket/$prefix_json/ text/$prefix_json/ --recursive" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'category': 'HISTORY', 'air_date': '2004-12-31', 'question': \"'For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory'\", 'value': '$200', 'answer': 'Copernicus', 'round': 'Jeopardy!', 'show_number': '4680'}\n" + ] + } + ], + "source": [ + "data_location = \"text/{}/{}\".format(prefix_json, filename_json)\n", + "with open(data_location) as f:\n", + " data = json.load(f)\n", + " print(data[0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 2: Use AWS-compatible Python Packages\n", + "When you are dealing with large data sets, or do not want to lose any data when you delete your SageMaker notebook instance, you can use pre-built packages to access your files in S3 without copying them into your instance. 
These packages, such as `pandas`, have implemented options to access data with a specified path string: while you use `file://` on your local file system, you use `s3://` to access data in S3 (pandas reads `s3://` paths through the `s3fs` library). For `pandas`, any valid string path is acceptable, including a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. You can find additional documentation [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). \n", + "\n", + "For text data, most of the time you can read it line by line, or use pandas to read it into a DataFrame by specifying a delimiter." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01
0hamGo until jurong point, crazy.. Available only ...
1hamOk lar... Joking wif u oni...
2spamFree entry in 2 a wkly comp to win FA Cup fina...
3hamU dun say so early hor... U c already then say...
4hamNah I don't think he goes to usf, he lives aro...
\n", + "
" + ], + "text/plain": [ + " 0 1\n", + "0 ham Go until jurong point, crazy.. Available only ...\n", + "1 ham Ok lar... Joking wif u oni...\n", + "2 spam Free entry in 2 a wkly comp to win FA Cup fina...\n", + "3 ham U dun say so early hor... U c already then say...\n", + "4 ham Nah I don't think he goes to usf, he lives aro..." + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_s3_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "s3_tabular_data = pd.read_csv(data_s3_location, sep=\"\\t\", header=None)\n", + "s3_tabular_data.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For JSON files, you can also use the pandas `read_json` function if the file has a flat structure, such as a list of records that share the same keys." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
categoryair_datequestionvalueanswerroundshow_number
0HISTORY2004-12-31'For the last 8 years of his life, Galileo was...$200CopernicusJeopardy!4680
1ESPN's TOP 10 ALL-TIME ATHLETES2004-12-31'No. 2: 1912 Olympian; football star at Carlis...$200Jim ThorpeJeopardy!4680
2EVERYBODY TALKS ABOUT IT...2004-12-31'The city of Yuma in this state has a record a...$200ArizonaJeopardy!4680
3THE COMPANY LINE2004-12-31'In 1963, live on \"The Art Linkletter Show\", t...$200McDonald\\'sJeopardy!4680
4EPITAPHS & TRIBUTES2004-12-31'Signer of the Dec. of Indep., framer of the C...$200John AdamsJeopardy!4680
\n", + "
" + ], + "text/plain": [ + " category air_date \\\n", + "0 HISTORY 2004-12-31 \n", + "1 ESPN's TOP 10 ALL-TIME ATHLETES 2004-12-31 \n", + "2 EVERYBODY TALKS ABOUT IT... 2004-12-31 \n", + "3 THE COMPANY LINE 2004-12-31 \n", + "4 EPITAPHS & TRIBUTES 2004-12-31 \n", + "\n", + " question value answer \\\n", + "0 'For the last 8 years of his life, Galileo was... $200 Copernicus \n", + "1 'No. 2: 1912 Olympian; football star at Carlis... $200 Jim Thorpe \n", + "2 'The city of Yuma in this state has a record a... $200 Arizona \n", + "3 'In 1963, live on \"The Art Linkletter Show\", t... $200 McDonald\\'s \n", + "4 'Signer of the Dec. of Indep., framer of the C... $200 John Adams \n", + "\n", + " round show_number \n", + "0 Jeopardy! 4680 \n", + "1 Jeopardy! 4680 \n", + "2 Jeopardy! 4680 \n", + "3 Jeopardy! 4680 \n", + "4 Jeopardy! 4680 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "data_json_location = \"s3://{}/{}/{}\".format(bucket, prefix_json, filename_json)\n", + "s3_tabular_data_json = pd.read_json(data_json_location, orient='records')\n", + "s3_tabular_data_json.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 3: Use AWS native methods\n", + "#### s3fs\n", + "[S3Fs](https://s3fs.readthedocs.io/en/latest/) is a Pythonic file interface to S3. It builds on top of botocore. The top-level class S3FileSystem holds connection information and allows typical file-system style operations like cp, mv, ls, du, glob, etc., as well as put/get of local files to/from S3. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['sagemaker-us-east-2-060356833389/text_spam/spam/SMSSpamCollection',\n", + " 'sagemaker-us-east-2-060356833389/text_spam/spam/SMSSpamCollection.txt']" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fs = s3fs.S3FileSystem()\n", + "data_s3fs_location = \"s3://{}/{}/\".format(bucket, prefix)\n", + "# List all objects under this prefix\n", + "fs.ls(data_s3fs_location)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " ham \\\n", + "0 ham \n", + "1 spam \n", + "\n", + " Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... \n", + "0 Ok lar... Joking wif u oni... \n", + "1 Free entry in 2 a wkly comp to win FA Cup fina... \n" + ] + } + ], + "source": [ + "# open it directly with s3fs\n", + "data_s3fs_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "with fs.open(data_s3fs_location) as f:\n", + " print(pd.read_csv(f, sep = '\\t', nrows = 2))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Aggregating Data Sets\n", + "If you would like to enhance your data set with more data collected for your use case, you can aggregate the newly collected data with your current data set. We will use two data sets, Sentiment140 and Sanders Twitter Sentiment, to show how to aggregate data."
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "prefix_tw1 = 'text_sentiment140/sentiment140'\n", + "filename_tw1 = 'training.1600000.processed.noemoticon.csv'\n", + "prefix_added = 'text_twitter_sentiment_2'\n", + "filename_added = 'full-corpus.csv'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's read in our original data and take a look at its format and schema:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "data_s3_location_base = \"s3://{}/{}/{}\".format(bucket, prefix_tw1, filename_tw1) # S3 URL\n", + "# read only a small subset of the data for demonstration purposes\n", + "text_data = pd.read_csv(data_s3_location_base, header = None,\n", + " encoding = \"ISO-8859-1\", low_memory=False,\n", + " nrows = 10000)\n", + "text_data.columns = ['target', 'tw_id', 'date', 'flag', 'user', 'text']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have 6 columns: `date`, `text`, `flag` (the query used to retrieve the tweet), `tw_id` (the tweet's id), `user` (the user account name), and `target` (0 = negative, 4 = positive)." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
targettw_iddateflagusertext
001467810369Mon Apr 06 22:19:45 PDT 2009NO_QUERY_TheSpecialOne_@switchfoot http://twitpic.com/2y1zl - Awww, t...
\n", + "
" + ], + "text/plain": [ + " target tw_id date flag \\\n", + "0 0 1467810369 Mon Apr 06 22:19:45 PDT 2009 NO_QUERY \n", + "\n", + " user text \n", + "0 _TheSpecialOne_ @switchfoot http://twitpic.com/2y1zl - Awww, t... " + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text_data.head(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's read in the data we want to add to our original data and take a look at it. \n", + "\n", + "We start by comparing the columns of the two data sets. The new data set has 5 columns: `TweetDate`, which maps to `date`; `TweetText`, which maps to `text`; `Topic`, which maps to `flag`; `TweetId`, which maps to `tw_id`; and `Sentiment`, which maps to `target`. The new data set has no `user` (user account name) column, so when we aggregate the two data sets, we add this column to the new data set and leave it empty. Alternatively, you can drop this column from the original data if it does not provide valuable information for your use case. " + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "data_s3_location_added = \"s3://{}/{}/{}\".format(bucket, prefix_added, filename_added) # S3 URL\n", + "# read only a small subset of the data for demonstration purposes\n", + "text_data_added = pd.read_csv(data_s3_location_added,\n", + " encoding = \"ISO-8859-1\", low_memory=False,\n", + " nrows = 10000)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
TopicSentimentTweetIdTweetDateTweetText
0applepositive126415614616154112Tue Oct 18 21:53:25 +0000 2011Now all @Apple has to do is get swype on the i...
\n", + "
" + ], + "text/plain": [ + " Topic Sentiment TweetId TweetDate \\\n", + "0 apple positive 126415614616154112 Tue Oct 18 21:53:25 +0000 2011 \n", + "\n", + " TweetText \n", + "0 Now all @Apple has to do is get swype on the i... " + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text_data_added.head(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Add the missing column to the new data set and leave it empty" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "text_data_added['user'] = \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Rename the new data set's columns to match the original data set" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
flagtargettw_iddatetextuser
0applepositive126415614616154112Tue Oct 18 21:53:25 +0000 2011Now all @Apple has to do is get swype on the i...
\n", + "
" + ], + "text/plain": [ + " flag target tw_id date \\\n", + "0 apple positive 126415614616154112 Tue Oct 18 21:53:25 +0000 2011 \n", + "\n", + " text user \n", + "0 Now all @Apple has to do is get swype on the i... " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "text_data_added.columns = ['flag', 'target', 'tw_id', 'date', 'text', 'user']\n", + "text_data_added.head(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Change the `target` column to the same format as the `target` in the original data set\n", + "Note that the `target` column in the new data set is labeled \"positive\", \"negative\", \"neutral\", or \"irrelevant\", whereas the `target` in the original data set is labeled 0 or 4. So let's map \"positive\" to 4, \"neutral\" to 2, and \"negative\" to 0 in the new data set so that the labels are consistent. Tweets labeled \"irrelevant\" are either not in English or are spam; you can remove them if they are not valuable for your use case, or map them to -1. For our sentiment analysis use case, we remove them, since this text provides no value for predicting sentiment. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "#remove tweets labeled as irrelevant\n", + "text_data_added = text_data_added[text_data_added['target'] != 'irrelevant'].copy()\n", + "# map the string labels to numeric targets\n", + "target_map = {'positive': 4, 'negative': 0, 'neutral': 2}\n", + "text_data_added['target'] = text_data_added['target'].map(target_map)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Combine the two data sets and save as one new file" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing to s3://sagemaker-us-east-2-060356833389/text_twitter_sentiment_full/sentiment_full.csv\n" + ] + } + ], + "source": [ + "text_data_new = pd.concat([text_data, text_data_added])\n", + "filename = 'sentiment_full.csv'\n", + "text_data_new.to_csv(filename, index = False)\n", + "upload_to_s3(bucket, 'text_twitter_sentiment_full', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Sentiment140 data: Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford, 1(2009), p.12.\n", + "\n", + "SMS Spam Collection: Almeida, T.A., Gómez Hidalgo, J.M. and Yamakami, A., 2011. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA.\n", + "\n", + "J! Archive: J! Archive is created by fans, for fans. The Jeopardy! game show and all elements thereof, including but not limited to copyright and trademark thereto, are the property of Jeopardy Productions, Inc. and are protected under law. This website is not affiliated with, sponsored by, or operated by Jeopardy Productions, Inc."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/013_Ingest_image_data_v1.ipynb b/data_ingestion/013_Ingest_image_data_v1.ipynb new file mode 100644 index 0000000000..9a6e856068 --- /dev/null +++ b/data_ingestion/013_Ingest_image_data_v1.ipynb @@ -0,0 +1,245 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest Image Data\n", + "When working on computer vision tasks, you may use a common library such as OpenCV, matplotlib, or pandas. Once you move to the cloud and start your machine learning journey in Amazon SageMaker, you will encounter new challenges in loading, reading, and writing files between S3 and a SageMaker notebook; this section discusses several approaches. Because image data sets are often large, copying the data into the instance is not recommended, and you do not need to download data to the SageMaker instance to train a model either. However, if you want to look at a few samples from the image dataset to decide whether any transformation or pre-processing is needed, here are some ways to do it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Image data: COCO (Common Objects in Context)\n", + " **COCO** is a large-scale object detection, segmentation, and captioning dataset. 
COCO has several features:\n", + "\n", + "* Object segmentation\n", + "* Recognition in context\n", + "* Superpixel stuff segmentation\n", + "* 330K images (>200K labeled)\n", + "* 1.5 million object instances\n", + "* 80 object categories\n", + "* 91 stuff categories\n", + "* 5 captions per image\n", + "* 250,000 people with keypoints" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set Up Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 's3fs==0.4.2'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import io\n", + "import boto3\n", + "import sagemaker\n", + "import glob\n", + "import tempfile\n", + "\n", + "# Get SageMaker session & default S3 bucket\n", + "sagemaker_session = sagemaker.Session()\n", + "bucket = sagemaker_session.default_bucket() \n", + "\n", + "prefix = 'image_coco/coco_val/val2017'\n", + "filename = '000000086956.jpg'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download image data and write to S3\n", + "**Note**: The COCO dataset is large, so this could take a minute or two. You can download partial files by using [COCOAPI](https://github.com/cocodataset/cocoapi). If you are experimenting with the full dataset, we recommend choosing a larger storage volume when you create your notebook instance."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#helper function to upload data to s3\n", + "def write_to_s3(bucket, prefix, filename):\n", + " key = \"{}/{}\".format(prefix,filename)\n", + " return boto3.Session().resource('s3').Bucket(bucket).upload_file(filename,key)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#run this cell if you are in SageMaker Studio notebook\n", + "#!apt-get install unzip" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!wget http://images.cocodataset.org/zips/val2017.zip -O coco_val.zip\n", + "# Uncompressing\n", + "!unzip -qU -o coco_val.zip -d coco_val " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#upload the files to the S3 bucket; we only upload 20 images to showcase how ingestion works\n", + "image_files = glob.glob(\"coco_val/val2017/*.jpg\")\n", + "for filename in image_files[:20]:\n", + " write_to_s3(bucket, prefix, filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 1: Streaming data from S3 into the SageMaker instance's memory \n", + "\n", + " **Use AWS compatible Python Packages with io Module** \n", + " \n", + "The easiest way to access your files in S3 without copying them to your instance storage is to use pre-built packages that already implement options to access data with a specified path string. Streaming means reading the object directly into memory instead of writing it to a file. As an example, the `matplotlib` library has a pre-built function `imread` that usually takes a URL or path to an image, but here we read the bytes of the S3 object and wrap them in an `io.BytesIO` buffer. You can also use the `PIL` package."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.image as mpimage\n", + "import matplotlib.pyplot as plt\n", + "\n", + "key = \"{}/{}\".format(prefix,filename)\n", + "image_object = boto3.resource('s3').Bucket(bucket).Object(key)\n", + "image = mpimage.imread(io.BytesIO(image_object.get()['Body'].read()), 'jpg')\n", + "\n", + "plt.figure(0)\n", + "plt.imshow(image)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "im = Image.open(image_object.get()['Body'])\n", + "plt.figure(0)\n", + "plt.imshow(im)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 2: Using temporary files on the SageMaker instance\n", + "Another way to keep using your usual methods is to create temporary files on your SageMaker instance and pass them to the standard methods as file paths. The `tempfile` module provides automatic cleanup: a `NamedTemporaryFile` is deleted as soon as it is closed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tmp = tempfile.NamedTemporaryFile()\n", + "with open(tmp.name, 'wb') as f:\n", + " image_object.download_fileobj(f)\n", + " f.flush() # flush buffered writes so that plt.imread can read the complete file by name\n", + " img = plt.imread(tmp.name)\n", + " print(img.shape)\n", + " plt.imshow(img)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 3: Use AWS native methods\n", + "#### s3fs \n", + "[S3Fs](https://s3fs.readthedocs.io/en/latest/) is a Pythonic file interface to S3. It builds on top of botocore. The top-level class S3FileSystem holds connection information and allows typical file-system style operations like cp, mv, ls, du, glob, etc., as well as put/get of local files to/from S3."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import s3fs\n", + "fs = s3fs.S3FileSystem()\n", + "data_s3fs_location = \"s3://{}/{}/\".format(bucket, prefix)\n", + "# List the first entry under the prefix in your bucket\n", + "fs.ls(data_s3fs_location)[0]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# open it directly with s3fs\n", + "data_s3fs_location = \"s3://{}/{}/{}\".format(bucket, prefix, filename) # S3 URL\n", + "with fs.open(data_s3fs_location) as f:\n", + " display(Image.open(f))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir; Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva; Zitnick, C. Lawrence; Dollár, Piotr. Microsoft COCO: Common Objects in Context (2014). arXiv:1405.0312." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/02_Ingest_data_with_Athena_v1.ipynb b/data_ingestion/02_Ingest_data_with_Athena_v1.ipynb new file mode 100644 index 0000000000..8b67001e0c --- /dev/null +++ b/data_ingestion/02_Ingest_data_with_Athena_v1.ipynb @@ -0,0 +1,593 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest data with Athena\n", + "This notebook demonstrates how to set up a database with Athena and query data with it.
We are going to use the data we loaded into S3 in the previous notebook [011_Ingest_tabular_data.ipynb](011_Ingest_tabular_data_v1.ipynb).\n", + "\n", + "Amazon Athena is a serverless interactive query service that makes it easy to analyze your S3 data with standard SQL. It uses S3 as its underlying data store, uses Presto with ANSI SQL support, and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena is ideal for quick, ad-hoc querying, but it can also handle complex analysis, including large joins, window functions, and arrays. \n", + "\n", + "To get started, you can point to your data in Amazon S3, define the schema, and start querying using the built-in query editor. Amazon Athena allows you to tap into all your data in S3 without the need to set up complex processes to extract, transform, and load the data (ETL).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up Athena\n", + "First, we are going to make sure that the role we used to create this notebook has the policies it needs to access Athena. You can attach them through an IAM client as shown below, or through the AWS console. \n", + "\n", + "**Note: You need IAMFullAccess to attach policies to the role.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Attach IAMFullAccess Policy from Console\n", + "\n", + "**1.** Go to **SageMaker Console**, choose **Notebook instances** in the navigation panel, then select your notebook instance to view the details. Then under **Permissions and Encryption**, click on the **IAM role ARN** link and it will take you to your role summary in the **IAM Console**. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**2.** Click on **Create Policy** under **Permissions**." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**3.** In the **Attach Permissions** page, search for **IAMFullAccess**. It will show up in the policy search results if it has not been attached to your role yet. Select the checkbox for the **IAMFullAccess** Policy, then click **Attach Policy**. You now have the policy successfully attached to your role." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 'PyAthena==1.10.7' 'awswrangler==1.2.0'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import io\n", + "import boto3\n", + "import sagemaker\n", + "import json\n", + "from sagemaker import get_execution_role\n", + "import os\n", + "import sys\n", + "from sklearn.datasets import *\n", + "import pandas as pd\n", + "from botocore.exceptions import ClientError\n", + "\n", + "# Get region \n", + "session = boto3.session.Session()\n", + "region_name = session.region_name\n", + "\n", + "# Get SageMaker session & default S3 bucket\n", + "sagemaker_session = sagemaker.Session()\n", + "bucket = sagemaker_session.default_bucket() #replace with your own bucket name if you have one\n", + "iam = boto3.client('iam')\n", + "s3 = sagemaker_session.boto_session.resource('s3')\n", + "role = sagemaker.get_execution_role()\n", + "role_name = role.split('/')[-1]\n", + "prefix = 'data/tabular/boston_house'\n", + "filename = 'boston_house.csv'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download data from online resources and write data to S3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#helper functions to upload data to s3\n", + "def write_to_s3(filename, bucket, prefix):\n", + " #put one file in a separate folder. 
This is helpful when reading and preparing data with Athena, which reads every file under a table's S3 location\n", + " filename_key = filename.split('.')[0]\n", + " key = \"{}/{}/{}\".format(prefix,filename_key,filename)\n", + " return s3.Bucket(bucket).upload_file(filename,key)\n", + "\n", + "def upload_to_s3(bucket, prefix, filename):\n", + " url = 's3://{}/{}/{}'.format(bucket, prefix, filename)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(filename, bucket, prefix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_data = load_boston()\n", + "tabular_data_full = pd.DataFrame(tabular_data.data, columns=tabular_data.feature_names)\n", + "tabular_data_full['target'] = pd.DataFrame(tabular_data.target)\n", + "tabular_data_full.to_csv('boston_house.csv', index = False)\n", + "\n", + "upload_to_s3(bucket, 'data/tabular', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set up IAM roles and policies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If the `IAMFullAccess` policy is not attached to your role, the following command fails with an error saying that you cannot list policies. If you see this error, follow the steps above to attach the IAMFullAccess policy to your role."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#check if IAM policy is attached\n", + "try:\n", + " existing_policies = iam.list_attached_role_policies(RoleName=role_name)['AttachedPolicies']\n", + " if 'IAMFullAccess' not in [po['PolicyName'] for po in existing_policies]:\n", + " print('ERROR: You need to attach the IAMFullAccess policy in order to attach policy to the role')\n", + " else:\n", + " print('IAMFullAccessPolicy Already Attached')\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'AccessDenied':\n", + " print(\"You need to attach the IAMFullAccess policy in order to attach policy to the role.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Policy Document\n", + "We will create policies we used to access S3 and Athena. The two policies we will create here are: \n", + "* S3FullAccess: `arn:aws:iam::aws:policy/AmazonS3FullAccess`\n", + "* AthenaFullAccess: `arn:aws:iam::aws:policy/AmazonAthenaFullAccess`\n", + "\n", + "You can check the policy document in the IAM console and copy the policy file here." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "athena_access_role_policy_doc = {\n", + " \"Version\": \"2012-10-17\",\n", + " \"Statement\": [\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"athena:*\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"glue:CreateDatabase\",\n", + " \"glue:DeleteDatabase\",\n", + " \"glue:GetDatabase\",\n", + " \"glue:GetDatabases\",\n", + " \"glue:UpdateDatabase\",\n", + " \"glue:CreateTable\",\n", + " \"glue:DeleteTable\",\n", + " \"glue:BatchDeleteTable\",\n", + " \"glue:UpdateTable\",\n", + " \"glue:GetTable\",\n", + " \"glue:GetTables\",\n", + " \"glue:BatchCreatePartition\",\n", + " \"glue:CreatePartition\",\n", + " \"glue:DeletePartition\",\n", + " \"glue:BatchDeletePartition\",\n", + " \"glue:UpdatePartition\",\n", + " \"glue:GetPartition\",\n", + " \"glue:GetPartitions\",\n", + " \"glue:BatchGetPartition\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " \n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"lakeformation:GetDataAccess\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " }\n", + " ]\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#create IAM client\n", + "iam = boto3.client('iam')\n", + "#create a policy\n", + "try:\n", + " response = iam.create_policy(\n", + " PolicyName='myAthenaPolicy',\n", + " PolicyDocument=json.dumps(athena_access_role_policy_doc)\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy already created.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#get policy 
ARN\n", + "sts = boto3.client('sts')\n", + "account_id = sts.get_caller_identity()['Account']\n", + "policy_athena_arn = f'arn:aws:iam::{account_id}:policy/myAthenaPolicy'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attach Policy to Role" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Attach a role policy\n", + "try:\n", + " response = iam.attach_role_policy(\n", + " PolicyArn=policy_athena_arn,\n", + " RoleName= role_name\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy is already attached to your role.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Intro to PyAthena \n", + "\n", + "We are going to leverage [PyAthena](https://pypi.org/project/PyAthena/) to connect and run Athena queries. PyAthena is a Python DB API 2.0 (PEP 249) compliant client for Amazon Athena. 
**Note that you need to specify the region in which you created the database/table in Athena, and make sure that the data catalog in that region contains the database.** " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from pyathena import connect\n", + "from pyathena.pandas_cursor import PandasCursor\n", + "from pyathena.util import as_pandas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set Athena database name\n", + "database_name = 'tabularbh'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set S3 staging directory -- this is a temporary directory used for Athena queries\n", + "s3_staging_dir = 's3://{0}/athena/staging'.format(bucket)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# write the SQL statement to execute\n", + "statement = 'CREATE DATABASE IF NOT EXISTS {}'.format(database_name)\n", + "print(statement)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# connect to Athena using PyAthena\n", + "cursor = connect(region_name=region_name, s3_staging_dir=s3_staging_dir).cursor()\n", + "cursor.execute(statement)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Table with Athena \n", + "When you run a CREATE TABLE query in Athena, you register your table with the AWS Glue Data Catalog. \n", + "\n", + "To specify the path to your data in Amazon S3, use the LOCATION property, as shown in the following example: `LOCATION 's3://bucketname/folder/'`\n", + "\n", + "The LOCATION in Amazon S3 specifies all of the files representing your table. Athena reads all data stored in `s3://bucketname/folder/`.
If you have data that you do not want Athena to read, do not store that data in the same Amazon S3 folder as the data you want Athena to read. If you are leveraging partitioning, to ensure Athena scans data within a partition, your WHERE filter must include the partition. For more information, see [Table Location and Partitions](https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prefix = 'data/tabular'\n", + "filename_key = 'boston_house'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_s3_location = \"s3://{}/{}/{}/\".format(bucket, prefix, filename_key)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_name_csv = 'boston_house_athena'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SQL statement to execute\n", + "statement = \"\"\"CREATE EXTERNAL TABLE IF NOT EXISTS {}.{}(\n", + " CRIM double,\n", + " ZN double,\n", + " INDUS double,\n", + " CHAS double,\n", + " NOX double,\n", + " RM double,\n", + " AGE double,\n", + " DIS double, \n", + " RAD double, \n", + " TAX double,\n", + " PTRATIO double, \n", + " B double, \n", + " LSTAT double,\n", + " target double\n", + "\n", + ") ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\\\n' LOCATION '{}'\n", + "TBLPROPERTIES ('skip.header.line.count'='1')\"\"\".format(database_name, table_name_csv, data_s3_location)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Execute statement using connection cursor\n", + "cursor = connect(region_name=region_name, s3_staging_dir=s3_staging_dir).cursor()\n", + "cursor.execute(statement)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [ + "#verify the table has been created\n", + "statement = 'SHOW TABLES in {}'.format(database_name)\n", + "cursor.execute(statement)\n", + "\n", + "df_show = as_pandas(cursor)\n", + "df_show.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#run a sample query\n", + "statement = \"\"\"SELECT * FROM {}.{}\n", + "LIMIT 100\"\"\".format(database_name, table_name_csv)\n", + "# Execute statement using connection cursor\n", + "cursor = connect(region_name=region_name, s3_staging_dir=s3_staging_dir).cursor()\n", + "cursor.execute(statement)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = as_pandas(cursor)\n", + "df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Alternatives: Use AWS Data Wrangler to query data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import awswrangler as wr" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Glue Catalog" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for table in wr.catalog.get_tables(database=database_name):\n", + " print(table['Name'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Athena" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "df = wr.athena.read_sql_query(\n", + " sql='SELECT * FROM {} LIMIT 100'.format(table_name_csv),\n", + " database=database_name\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Boston Housing data, Harrison, D. 
and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978.\n", + "\n", + "Data Science On AWS workshops, Chris Fregly, Antje Barth, https://www.datascienceonaws.com/" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/03_Ingest_data_with_Redshift_v3.ipynb b/data_ingestion/03_Ingest_data_with_Redshift_v3.ipynb new file mode 100644 index 0000000000..20f4881136 --- /dev/null +++ b/data_ingestion/03_Ingest_data_with_Redshift_v3.ipynb @@ -0,0 +1,1269 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest data with Redshift\n", + "This notebook demonstrates how to set up a database with Redshift and query data with it. We are going to use the data we loaded into S3 in the previous notebook [011_Ingest_tabular_data.ipynb](011_Ingest_tabular_data_v1.ipynb) and the database and schema we created in [02_Ingest_data_with_Athena.ipynb](02_Ingest_data_with_Athena_v1.ipynb).\n", + "\n", + "Amazon Redshift is a fully managed data warehouse that allows you to run complex analytic queries against petabytes of structured data.
Your queries are distributed and parallelized across multiple physical resources, and you can easily scale your Amazon Redshift environment up and down depending on your business needs.\n", + "\n", + "You can also check the [existing notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/advanced_functionality/working_with_redshift_data/working_with_redshift_data.ipynb) for more information on how to load data from and save data to Redshift." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## When should you use Redshift?\n", + "\n", + "While Athena is mostly used to run ad-hoc queries on an Amazon S3 data lake, Redshift is usually recommended for large structured datasets or traditional relational database workloads; it performs well on aggregations, complex joins, and inner queries. You need to set up a cluster before using it, and you need to load data into the tables you create. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up Redshift\n", + "First, we are going to make sure that a policy granting access to Redshift is attached to our role (the role we will create specifically for the Redshift task). You can do this through the IAM client as shown below, or through the AWS console.\n", + "\n", + "**Note: You need IAMFullAccess to attach policies to the role.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Attach IAMFullAccess Policy from Console\n", + "\n", + "**1.** Go to **SageMaker Console**, choose **Notebook instances** in the navigation panel, then select your notebook instance to view the details. Then under **Permissions and Encryption**, click on the **IAM role ARN** link and it will take you to your role summary in the **IAM Console**. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**2.** Click on **Create Policy** under **Permissions**." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**3.** In the **Attach Permissions** page, search for **IAMFullAccess**. It will show up in the policies search results if it has not been attached to your role yet. Select the checkbox for the **IAMFullAccess** Policy, then click **Attach Policy**. You now have the policy successfully attached to your role." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[33mWARNING: You are using pip version 20.0.2; however, version 20.2.4 is available.\n", + "You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 'PyAthena==1.10.7' 'awswrangler==1.2.0' 'SQLAlchemy==1.3.13'" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "import io\n", + "import boto3\n", + "import sagemaker\n", + "import json\n", + "from sagemaker import get_execution_role\n", + "import os\n", + "from sklearn.datasets import *\n", + "import pandas as pd\n", + "from botocore.exceptions import ClientError\n", + "import awswrangler as wr\n", + "from datetime import date\n", + "\n", + "# Get region \n", + "session = boto3.session.Session()\n", + "region_name = session.region_name\n", + "\n", + "# Get SageMaker session & default S3 bucket\n", + "sagemaker_session = sagemaker.Session()\n", + "bucket = sagemaker_session.default_bucket() #replace with your own bucket name if you have one\n", + "role = sagemaker.get_execution_role()\n", + "prefix = 'data/tabular/boston_house'\n", + "filename = 'boston_house.csv'" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "iam = boto3.client('iam')\n", + "sts = boto3.client('sts')\n", + "redshift = boto3.client('redshift')\n", + "sm = boto3.client('sagemaker')\n", + "s3 = sagemaker_session.boto_session.resource('s3')" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your Role name used to create this notebook is: 
AmazonSageMaker-ExecutionRole-20201006T125078\n" + ] + } + ], + "source": [ + "role_name = role.split('/')[-1]\n", + "print('Your Role name used to create this notebook is: {}'.format(role_name))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download data from online resources and write data to S3" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "#helper functions to upload data to s3\n", + "def write_to_s3(filename, bucket, prefix):\n", + " #put one file in a separate folder. This is helpful if you read and prepare data with Athena\n", + " filename_key = filename.split('.')[0]\n", + " key = \"{}/{}/{}\".format(prefix,filename_key,filename)\n", + " return s3.Bucket(bucket).upload_file(filename,key)\n", + "\n", + "def upload_to_s3(bucket, prefix, filename):\n", + " url = 's3://{}/{}/{}'.format(bucket, prefix, filename)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(filename, bucket, prefix)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Writing to s3://sagemaker-us-east-2-060356833389/data/tabular/boston_house.csv\n" + ] + } + ], + "source": [ + "tabular_data = load_boston()\n", + "tabular_data_full = pd.DataFrame(tabular_data.data, columns=tabular_data.feature_names)\n", + "tabular_data_full['target'] = pd.DataFrame(tabular_data.target)\n", + "tabular_data_full.to_csv('boston_house.csv', index = False)\n", + "\n", + "upload_to_s3(bucket, 'data/tabular', filename)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Redshift Role\n", + "The policy enables Redshift to assume the role. The services can then perform any tasks granted by the permissions policy assigned to the role (which we will attach to it later). 
" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "assume_role_policy_doc = {\n", + " \"Version\": \"2012-10-17\",\n", + " \"Statement\": [\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Principal\": {\n", + " \"Service\": \"redshift.amazonaws.com\"\n", + " },\n", + " \"Action\": \"sts:AssumeRole\"\n", + " }\n", + " ]\n", + "} " + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Role already exists\n" + ] + } + ], + "source": [ + "# Create Role\n", + "iam_redshift_role_name = 'Tabular_Redshift'\n", + "try:\n", + " iam_role_redshift = iam.create_role(\n", + " RoleName=iam_redshift_role_name,\n", + " AssumeRolePolicyDocument=json.dumps(assume_role_policy_doc),\n", + " Description='Tabular data Redshift Role'\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Role already exists\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Your Role arn used to create a Redshift Cluster is: arn:aws:iam::060356833389:role/Tabular_Redshift\n", + "arn:aws:iam::060356833389:role/Tabular_Redshift\n" + ] + } + ], + "source": [ + "#get role arn\n", + "role_rs = iam.get_role(RoleName='Tabular_Redshift')\n", + "iam_role_redshift_arn = role_rs['Role']['Arn']\n", + "print('Your Role arn used to create a Redshift Cluster is: {}'.format(iam_role_redshift_arn))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Policy Document\n", + "We will create policies we used to access S3 and Athena. 
The two policies we will create here are: \n", + "* S3FullAccess: `arn:aws:iam::aws:policy/AmazonS3FullAccess`\n", + "* AthenaFullAccess: `arn:aws:iam::aws:policy/AmazonAthenaFullAccess`\n", + "\n", + "You can check the policy document in the IAM console and copy the policy file here." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "#s3FullAccess\n", + "my_redshift_to_s3 = {\n", + " \"Version\": \"2012-10-17\",\n", + " \"Statement\": [\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": \"s3:*\",\n", + " \"Resource\": \"*\"\n", + " }\n", + " ]\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "#Athena Full Access\n", + "my_redshift_to_athena = {\n", + " \"Version\": \"2012-10-17\",\n", + " \"Statement\": [\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"athena:*\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"glue:CreateDatabase\",\n", + " \"glue:DeleteDatabase\",\n", + " \"glue:GetDatabase\",\n", + " \"glue:GetDatabases\",\n", + " \"glue:UpdateDatabase\",\n", + " \"glue:CreateTable\",\n", + " \"glue:DeleteTable\",\n", + " \"glue:BatchDeleteTable\",\n", + " \"glue:UpdateTable\",\n", + " \"glue:GetTable\",\n", + " \"glue:GetTables\",\n", + " \"glue:BatchCreatePartition\",\n", + " \"glue:CreatePartition\",\n", + " \"glue:DeletePartition\",\n", + " \"glue:BatchDeletePartition\",\n", + " \"glue:UpdatePartition\",\n", + " \"glue:GetPartition\",\n", + " \"glue:GetPartitions\",\n", + " \"glue:BatchGetPartition\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"s3:GetBucketLocation\",\n", + " \"s3:GetObject\",\n", + " \"s3:ListBucket\",\n", + " \"s3:ListBucketMultipartUploads\",\n", + " 
\"s3:ListMultipartUploadParts\",\n", + " \"s3:AbortMultipartUpload\",\n", + " \"s3:CreateBucket\",\n", + " \"s3:PutObject\"\n", + " ],\n", + " \"Resource\": [\n", + " \"arn:aws:s3:::aws-athena-query-results-*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"s3:GetObject\",\n", + " \"s3:ListBucket\"\n", + " ],\n", + " \"Resource\": [\n", + " \"arn:aws:s3:::athena-examples*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"s3:ListBucket\",\n", + " \"s3:GetBucketLocation\",\n", + " \"s3:ListAllMyBuckets\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"sns:ListTopics\",\n", + " \"sns:GetTopicAttributes\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"cloudwatch:PutMetricAlarm\",\n", + " \"cloudwatch:DescribeAlarms\",\n", + " \"cloudwatch:DeleteAlarms\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " },\n", + " {\n", + " \"Effect\": \"Allow\",\n", + " \"Action\": [\n", + " \"lakeformation:GetDataAccess\"\n", + " ],\n", + " \"Resource\": [\n", + " \"*\"\n", + " ]\n", + " }\n", + " ]\n", + "}" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Policy already exists\n" + ] + } + ], + "source": [ + "try:\n", + " policy_redshift_s3 = iam.create_policy(\n", + " PolicyName='Tabular_RedshiftPolicyToS3',\n", + " PolicyDocument=json.dumps(my_redshift_to_s3)\n", + " )\n", + " print ('Policy created.')\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print (\"Policy already exists\")\n", + " else:\n", + " print (\"Unexpected error: %s\" % e)\n", + "\n", + "account_id = sts.get_caller_identity()['Account']\n", + 
"policy_redshift_s3_arn = f'arn:aws:iam::{account_id}:policy/Tabular_RedshiftPolicyToS3'" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Policy already exists\n" + ] + } + ], + "source": [ + "try:\n", + " policy_redshift_athena = iam.create_policy(\n", + " PolicyName='Tabular_RedshiftPolicyToAthena',\n", + " PolicyDocument=json.dumps(my_redshift_to_athena)\n", + " )\n", + " print ('Policy created.')\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print (\"Policy already exists\")\n", + " else:\n", + " print (\"Unexpected error: %s\" % e)\n", + " \n", + "account_id = sts.get_caller_identity()['Account']\n", + "policy_redshift_athena_arn = f'arn:aws:iam::{account_id}:policy/Tabular_RedshiftPolicyToAthena'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Attach Policy to Role" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [], + "source": [ + "# Attach RedshiftPolicyToAthena policy\n", + "try:\n", + " response = iam.attach_role_policy(\n", + " PolicyArn=policy_redshift_athena_arn,\n", + " RoleName=iam_redshift_role_name\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy is already attached. This is ok.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# Attach RedshiftPolicyToS3 policy\n", + "try:\n", + " response = iam.attach_role_policy(\n", + " PolicyArn=policy_redshift_s3_arn,\n", + " RoleName=iam_redshift_role_name\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy is already attached. 
This is ok.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Make Sure the Role **Used to Run this Notebook** Has the Following Policies Attached:\n", + "\n", + "* `SecretsManagerReadWrite`: we will use Secrets Manager to store and retrieve our Redshift credentials.\n", + "* `AmazonRedshiftFullAccess`: we will use this policy to create a Redshift cluster from the notebook." + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Policy SecretsManagerReadWrite has been successfully attached to role: AmazonSageMaker-ExecutionRole-20201006T125078\n" + ] + } + ], + "source": [ + "# Make sure the SecretsManagerReadWrite policy is attached to the role\n", + "try:\n", + " policy='SecretsManagerReadWrite'\n", + " response = iam.attach_role_policy(\n", + " PolicyArn='arn:aws:iam::aws:policy/{}'.format(policy),\n", + " RoleName=role_name\n", + " )\n", + " print(\"Policy %s has been successfully attached to role: %s\" % (policy, role_name))\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy is already attached.\")\n", + " else:\n", + " print(\"Unexpected error: %s \" % e)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Policy AmazonRedshiftFullAccess has been successfully attached to role: AmazonSageMaker-ExecutionRole-20201006T125078\n" + ] + } + ], + "source": [ + "# Make sure the AmazonRedshiftFullAccess policy is attached to the role\n", + "from botocore.exceptions import ClientError\n", + "try:\n", + " policy='AmazonRedshiftFullAccess'\n", + " response = iam.attach_role_policy(\n", + " PolicyArn='arn:aws:iam::aws:policy/{}'.format(policy),\n", + " RoleName=role_name\n", + " )\n", + " print(\"Policy %s has been successfully 
attached to role: %s\" % (policy, role_name))\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'EntityAlreadyExists':\n", + " print(\"Policy is already attached. \")\n", + " else:\n", + " print(\"Unexpected error: %s \" % e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Optional: Create Redshift Cluster\n", + "\n", + "In most cases you will already have a Redshift cluster up and running that you want to connect to. If you want to create a new cluster instead, follow the steps below.\n", + "*Note that only some node types support the [Redshift Query Editor](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor.html), so choose the node type of your cluster carefully.*" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_instance_name = sm.list_notebook_instances()['NotebookInstances'][0]['NotebookInstanceName']  # assumes the first notebook instance listed is the one running this notebook" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Secret in Secrets Manager\n", + "\n", + "AWS Secrets Manager is a service that enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. Using Secrets Manager, you can secure and manage secrets used to access resources in the AWS Cloud, on third-party services, and on-premises.\n", + "\n", + "*Note that `MasterUserPassword` must contain at least one uppercase letter and at least one decimal digit.*"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "secretsmanager = boto3.client('secretsmanager')\n", + "\n", + "try:\n", + " response = secretsmanager.create_secret(\n", + " Name='tabular_redshift_login',\n", + " Description='Boston House data New Cluster Redshift Login',\n", + " SecretString='[{\"username\":\"awsuser\"},{\"password\":\"Bostonhouse1\"}]',  # replace with your own credentials\n", + " Tags=[\n", + " {\n", + " 'Key': 'name',\n", + " 'Value': 'tabular_redshift_login'\n", + " },\n", + " ]\n", + " )\n", + "except ClientError as e:\n", + " if e.response['Error']['Code'] == 'ResourceExistsException':\n", + " print(\"Secret already exists. This is ok.\")\n", + " else:\n", + " print(\"Unexpected error: %s\" % e)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Retrieve the secret again\n", + "secretsmanager = boto3.client('secretsmanager')\n", + "import json\n", + "\n", + "secret = secretsmanager.get_secret_value(SecretId='tabular_redshift_login')\n", + "cred = json.loads(secret['SecretString'])\n", + "\n", + "master_user_name = cred[0]['username']\n", + "master_user_pw = cred[1]['password']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set up parameters \n", + "# Redshift configuration parameters\n", + "redshift_cluster_identifier = 'redshiftdemo'\n", + "database_name = 'bostonhouse'\n", + "cluster_type = 'multi-node'\n", + "\n", + "node_type = 'dc2.large'\n", + "number_nodes = '2' " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When creating a new cluster, make sure it is in the same VPC that your notebook instance runs in. This VPC should have the following two attributes set to **yes**: **DNS resolution** and **DNS hostnames**. 
You can either specify a **security group** or the name of an existing **cluster subnet group** (which you create in the Redshift console).\n", + "\n", + "If you are not using the default VPC and specifying a **security group** returns a VPC error, try creating a subnet group in the Redshift console: choose **Configurations** -> **Subnet groups** -> **Create cluster subnet group**, then specify the **VPC** and **subnet** that you created this notebook in. Pass the name of that subnet group as `ClusterSubnetGroupName` in the following command." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Optional: Get Security Group ID\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "notebook_instance = sm.describe_notebook_instance(NotebookInstanceName=notebook_instance_name)\n", + "security_group_id = notebook_instance['SecurityGroups'][0]\n", + "print(security_group_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Redshift Cluster using Subnet Group" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "response = redshift.create_cluster(\n", + " DBName=database_name,\n", + " ClusterIdentifier=redshift_cluster_identifier,\n", + " ClusterType=cluster_type,\n", + " NodeType=node_type,\n", + " NumberOfNodes=int(number_nodes), \n", + " MasterUsername=master_user_name,\n", + " MasterUserPassword=master_user_pw,\n", + " ClusterSubnetGroupName='cluster-subnet-group-1', # either specify an existing subnet group (replace with your subnet group name) or the security group below\n", + " IamRoles=[iam_role_redshift_arn],\n", + " VpcSecurityGroupIds=[security_group_id],\n", + " Port=5439,\n", + " PubliclyAccessible=False\n", + ")\n", + "\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wait until 
the status of your Redshift cluster becomes **available**." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#check cluster status\n", + "response = redshift.describe_clusters(ClusterIdentifier=redshift_cluster_identifier)\n", + "cluster_status = response['Clusters'][0]['ClusterStatus']\n", + "print('Your Redshift Cluster Status is: ' + cluster_status)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Existing Redshift Cluster\n", + "### Prerequisites\n", + "Your existing Redshift cluster has to be in the **same VPC** as your notebook instance.\n", + "\n", + "Also note that this notebook instance needs to resolve the Redshift cluster's DNS name to a private IP address when connecting to it. There are two ways this can work:\n", + "\n", + "* The Redshift cluster is not publicly accessible, so by default its DNS name resolves to a private IP.\n", + "* The Redshift cluster is publicly accessible and has an Elastic IP (EIP) associated with it, but when accessed from within the VPC its DNS name should still resolve to the cluster's private IP. This requires setting the following two VPC attributes to **yes**: **DNS resolution** and **DNS hostnames**. For setup instructions, see the Redshift documentation on [Managing Clusters in an Amazon Virtual Private Cloud (VPC)](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-vpc.html).\n", + "\n", + "We will use [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) to connect to the Redshift database engine."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sqlalchemy import create_engine\n", + "from sqlalchemy.orm import sessionmaker" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Retrieve Redshift credentials from Secrets Manager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "secretsmanager = boto3.client('secretsmanager')\n", + "secret = secretsmanager.get_secret_value(SecretId='tabular_redshift_login')\n", + "cred = json.loads(secret['SecretString'])\n", + "\n", + "master_user_name = cred[0]['username']\n", + "master_user_pw = cred[1]['password']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Set up parameters for connection: replace with your own parameters\n", + "We are going to use the data and schema created in the previous notebook, Ingest_data_with_Athena.ipynb. If you see an error below, make sure you have run through the [02_Ingest_data_with_Athena.ipynb](02_Ingest_data_with_Athena_v1.ipynb) notebook before continuing."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "redshift_cluster_identifier = 'redshiftdemo'\n", + "\n", + "database_name_redshift = 'bostonhouse'\n", + "database_name_athena = 'tabularbh'\n", + "\n", + "redshift_port = '5439'\n", + "\n", + "schema_redshift = 'redshift'\n", + "schema_spectrum = 'spectrum'\n", + "\n", + "table_name_csv = 'boston_house_athena'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Check cluster status to see if it is available" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#check cluster status\n", + "response = redshift.describe_clusters(ClusterIdentifier=redshift_cluster_identifier)\n", + "cluster_status = response['Clusters'][0]['ClusterStatus']\n", + "print(cluster_status)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get Redshift Endpoint Address & IAM Role\n", + "redshift_endpoint_address = response['Clusters'][0]['Endpoint']['Address']\n", + "iam_role = response['Clusters'][0]['IamRoles'][0]['IamRoleArn']\n", + "\n", + "print('Redshift endpoint: {}'.format(redshift_endpoint_address))\n", + "print('IAM Role: {}'.format(iam_role))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create Engine\n", + "https://docs.sqlalchemy.org/en/13/core/engines.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Connect to Redshift Database Engine\n", + "engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(master_user_name, master_user_pw, redshift_endpoint_address, redshift_port, database_name_redshift))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create Session: we will use this session to run SQL commands" + ] + }, + { + "cell_type": "code", + "execution_count": null,
+ "metadata": {}, + "outputs": [], + "source": [ + "# Configure a session bound to the engine\n", + "session = sessionmaker()\n", + "session.configure(bind=engine)\n", + "s = session()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 1: Access Data without Moving it to Redshift: Amazon Redshift Spectrum\n", + "[Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html) lets you query data directly from files in Amazon S3. You will need to create external tables in an external schema. The external schema references a database in the external data catalog and provides the IAM role ARN that authorizes your cluster to access Amazon S3 on your behalf.\n", + "#### Get table and schema information from the AWS Glue Data Catalog: read the metadata from the data catalog and connect to the Athena database" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "statement = \"\"\"\n", + "rollback;\n", + "create external schema if not exists {} from data catalog \n", + " database '{}' \n", + " iam_role '{}'\n", + " create external database if not exists\n", + "\"\"\".format(schema_spectrum, database_name_athena, iam_role)\n", + "\n", + "s.execute(statement)\n", + "s.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Run a sample query through Redshift Spectrum" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "statement = \"\"\"\n", + "select *\n", + " from {}.{} limit 10\n", + "\"\"\".format(schema_spectrum, table_name_csv)\n", + "\n", + "df = pd.read_sql_query(statement, engine)\n", + "df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 2: Loading Data into Redshift from Athena\n", + "To load data into Redshift, use either the `COPY` command, which loads data from files, or the `INSERT INTO` command, which 
inserts the results of a query into a table. 
Files loaded with `COPY` can reside in an Amazon S3 bucket, an Amazon EMR cluster, or on a remote host accessed over SSH.\n", + "#### Create Schema in Redshift" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#create schema\n", + "statement = \"\"\"create schema if not exists {}\"\"\".format(schema_redshift)\n", + "\n", + "s = session()\n", + "s.execute(statement)\n", + "s.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create Redshift Table" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_name_redshift = table_name_csv+'_'+'redshift_insert'\n", + "statement = \"\"\"\n", + "rollback;\n", + "create table if not exists redshift.{}(\n", + " CRIM float,\n", + " ZN float,\n", + " INDUS float,\n", + " CHAS float,\n", + " NOX float,\n", + " RM float,\n", + " AGE float,\n", + " DIS float, \n", + " RAD float, \n", + " TAX float,\n", + " PTRATIO float, \n", + " B float, \n", + " LSTAT float,\n", + " target float)\"\"\".format(table_name_redshift)\n", + "\n", + "s.execute(statement)\n", + "s.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use `INSERT INTO` to load data into the table we created\n", + "https://docs.aws.amazon.com/redshift/latest/dg/c_Examples_of_INSERT_30.html" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_name_redshift = table_name_csv+'_'+'redshift_insert'\n", + "\n", + "statement = \"\"\"\n", + " insert into redshift.{}\n", + " select * from {}.{} \n", + " \"\"\".format(table_name_redshift, schema_spectrum, table_name_csv)\n", + "s.execute(statement)\n", + "s.commit() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Query data in Redshift" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + 
"outputs": [], + "source": [ + "statement = \"\"\"\n", +
" select * from redshift.{} limit 10\n", + "\"\"\".format(table_name_redshift)\n", + "df = pd.read_sql_query(statement, engine)\n", + "df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 3: Copy data directly from S3\n", + "You can also use the `COPY` command to load data from S3 directly into a new table.\n", + "https://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-run-copy.html\n", + "\n", + "#### Create a new table in Redshift" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#create a new sample table\n", + "table_name_redshift = table_name_csv+'_'+'redshift_copy'\n", + "statement = \"\"\"\n", + "rollback;\n", + "create table if not exists redshift.{}(\n", + " CRIM float,\n", + " ZN float,\n", + " INDUS float,\n", + " CHAS float,\n", + " NOX float,\n", + " RM float,\n", + " AGE float,\n", + " DIS float, \n", + " RAD float, \n", + " TAX float,\n", + " PTRATIO float, \n", + " B float, \n", + " LSTAT float,\n", + " target float)\"\"\".format(table_name_redshift)\n", + "\n", + "s.execute(statement)\n", + "s.commit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Copy data into Redshift table\n", + "By default, Redshift expects pipe-delimited (`|`) data, so if you are loading CSV or TXT files, be sure to specify the `delimiter`. To load data in CSV format, add `csv` to your `COPY` command. Also, since we are reading directly from S3, if your data has a header row, add `ignoreheader 1` to your command so that the header is skipped."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table_name_redshift = table_name_csv+'_'+'redshift_copy'\n", + "data_s3_path = 's3://sagemaker-us-east-2-060356833389/data/tabular/boston_house/boston_house.csv'  # replace with the S3 path of your own data\n", + "statement = \"\"\"\n", + "rollback;\n", + "copy redshift.{} \n", + " from '{}'\n", + " iam_role '{}'\n", + " csv\n", + " ignoreheader 1\n", + " \"\"\".format(table_name_redshift, data_s3_path, iam_role)\n", + "s.execute(statement)\n", + "s.commit() " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "statement = \"\"\"\n", + " select * from redshift.{} limit 10\n", + "\"\"\".format(table_name_redshift)\n", + "df_copy = pd.read_sql_query(statement, engine)\n", + "df_copy.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Error Handling\n", + "\n", + "Sometimes you might see an error stating \"Load into table 'part' failed. Check 'stl_load_errors' system table for details.\" The query below helps you find where the load went wrong. You can find more information in the [Redshift Load Error documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_STL_LOAD_ERRORS.html)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "statement = \"\"\"\n", + "select query, substring(filename,22,25) as filename,line_number as line, \n", + "substring(colname,0,12) as column, type, position as pos, substring(raw_line,0,30) as line_text,\n", + "substring(raw_field_value,0,15) as field_text, \n", + "substring(err_reason,0,45) as reason\n", + "from stl_load_errors \n", + "order by query desc\n", + "limit 10\"\"\"\n", + "error = pd.read_sql_query(statement, engine)\n", + "error.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 4: AWS Data Wrangler\n", + "\n", + "You can find more information on how AWS Data Wrangler works at [this tutorial](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/008%20-%20Redshift%20-%20Copy%20%26%20Unload.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### AWS Data Wrangler Get Engine Function\n", + "Run this command within a private subnet. 
To find your host address, open the Redshift console and choose **Clusters** -> **Property** -> **Connection details** -> **View all connection details** -> **Node IP address** -> **Private IP address**.\n", + "https://aws-data-wrangler.readthedocs.io/en/latest/stubs/awswrangler.db.get_engine.html#awswrangler.db.get_engine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engine = wr.db.get_engine(\n", + " db_type=\"postgresql\",\n", + " host='10.0.14.121', # replace with the private IP address of your Redshift cluster\n", + " port=redshift_port,\n", + " database=database_name_redshift,\n", + " user = master_user_name,\n", + " password=master_user_pw\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = wr.db.read_sql_query(\"SELECT * FROM redshift.{}\".format(table_name_redshift), con=engine)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Boston Housing data, Harrison, D. and Rubinfeld, D.L. `Hedonic prices and the demand for clean air', J. Environ. 
Economics & Management, vol.5, 81-102, 1978.\n", + "\n", + "Data Science On AWS workshops, Chris Fregly, Antje Barth, https://www.datascienceonaws.com/" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/04_Ingest_data_with_EMR.ipynb b/data_ingestion/04_Ingest_data_with_EMR.ipynb new file mode 100644 index 0000000000..c1d3df4476 --- /dev/null +++ b/data_ingestion/04_Ingest_data_with_EMR.ipynb @@ -0,0 +1,196 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Ingest Data with EMR\n", + "\n", + "This notebook demonstrates how to read data with an EMR cluster.\n", + "We are going to use the data we loaded into S3 in the previous notebook [011_Ingest_tabular_data.ipynb](011_Ingest_tabular_data_v1.ipynb).\n", + "\n", + "Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto. With EMR you can run petabyte-scale analysis at less than half of the cost of traditional on-premises solutions and over 3x faster than standard Apache Spark. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up Notebook\n", + "First, we are going to make sure that the EMR cluster is set up and that the connection between EMR and the SageMaker notebook is configured correctly. 
You can follow the [documentation](https://aws.amazon.com/blogs/machine-learning/build-amazon-sagemaker-notebooks-backed-by-spark-in-amazon-emr/) and [procedure](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi-lifecycle-config-emr.html) to set up this notebook. Once you are done with the setup, restart the kernel and run the following command to check that the connection between EMR and SageMaker is set up correctly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%info" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install -qU 'sagemaker>=2.15.0' 'scikit-learn'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%local\n", + "import sagemaker\n", + "from sklearn.datasets import *\n", + "import pandas as pd\n", + "\n", + "sagemaker_session = sagemaker.Session()\n", + "s3 = sagemaker_session.boto_session.resource('s3')\n", + "bucket = sagemaker_session.default_bucket() # replace with your own bucket name if you have one\n", + "prefix = 'data/tabular/boston_house'\n", + "filename = 'boston_house.csv'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load the data and write it to S3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%local\n", + "# Helper functions to upload data to S3\n", + "def write_to_s3(filename, bucket, prefix):\n", + " # Put each file in its own folder. 
This is helpful if you read and prepare data with Athena\n", + " filename_key = filename.split('.')[0]\n", + " key = \"{}/{}/{}\".format(prefix,filename_key,filename)\n", + " return s3.Bucket(bucket).upload_file(filename,key)\n", + "\n", + "def upload_to_s3(bucket, prefix, filename):\n", + " url = 's3://{}/{}/{}/{}'.format(bucket, prefix, filename.split('.')[0], filename)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(filename, bucket, prefix)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%local\n", + "tabular_data = load_boston()\n", + "tabular_data_full = pd.DataFrame(tabular_data.data, columns=tabular_data.feature_names)\n", + "tabular_data_full['target'] = pd.DataFrame(tabular_data.target)\n", + "tabular_data_full.to_csv('boston_house.csv', index = False)\n", + "\n", + "upload_to_s3(bucket, 'data/tabular', filename)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%local\n", + "data_s3_path = 's3://{}/{}/{}'.format(bucket, prefix, filename)\n", + "print('This is the path to your S3 file: ' + data_s3_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Copy the S3 bucket file path\n", + "The S3 bucket file path is required to read the data on EMR Spark. Copy and paste the path string shown above into the next cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Replace this path string with the path shown in the previous step\n", + "data_s3_path = 's3://sagemaker-us-east-2-060356833389/data/tabular/boston_house/boston_house.csv'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read the Data in the EMR Spark Cluster\n", + "\n", + "Once we have the path to our data in S3, we can read it with Spark using the following command. 
You can specify a data format; a schema is optional but recommended, and in the options you can specify `compression`, `delimiter`, `header`, etc. The cell below uses Spark's standard `csv` reader; to read the data through S3 Select instead, use the `s3selectCSV` format. For more details, see the [documentation on using S3 Select with Spark](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-s3select.html)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# EMR cell\n", + "schema = ' CRIM double, ZN double, INDUS double,\\\n", + "CHAS double, NOX double, RM double, AGE double, DIS double, RAD double, TAX double, PTRATIO double, \\\n", + "B double, LSTAT double, target double'\n", + "df = spark.read.format('csv').schema(schema).options(header='true').load(data_s3_path)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df.show(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conclusion\n", + "Now that you have read in the data, you can pre-process it with Spark in an EMR cluster, build an ML pipeline, and train models at scale." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Citation\n", + "Boston Housing data, Harrison, D. and Rubinfeld, D.L. `Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978."
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Sparkmagic (PySpark)", + "language": "", + "name": "pysparkkernel" + }, + "language_info": { + "codemirror_mode": { + "name": "python", + "version": 3 + }, + "mimetype": "text/x-python", + "name": "pyspark", + "pygments_lexer": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/data_ingestion/image/athena-iam-1.png b/data_ingestion/image/athena-iam-1.png new file mode 100644 index 0000000000000000000000000000000000000000..fc30b7eed4499271d77fa813969516c412024640 GIT binary patch literal 12749 zcmdsdXH*mG*KbscAVm)%ND-BTNbk}?q$y2$F9OmD#SoAZP^wZ@dT0WI5D1}#5}JU> zQA%hDEi@5A3mpQ4gqw5Ucik`d>s|N$*ZnYS);u%M%rm?E_I`eQzc)70WoG1J1ONcc z&z@?T0sxm|0Dwz$SLvuDcSuFAsi#XIQ{5+knqi)G>IIF5hM@)kP?vV?+?keo%@FX^ z1_S{774+|MY2Kah-x>FUw5@|QU%5F4dHMU@HTMF#0i-3RC6y!i)5pAanh0{u0waa602Fo+&qJwP;L=Elq5-AvgBe zlMuEQC6;@iZ^F!l8{a*o6~1h#p~b>(9Q*LH34=*0;E94&O{~b@-;^(=r-!!P<>_f? zR*5SzeC{3|A;In*GH41y?BmDWP*dl8l~7Z9X=#f#_v!{3>XJF2pN21-d6iy$&%FWw ze2aVN!YSJqR~~fz-+MI?`gE57fB3%}H%4x)Yt3k=u<6eg~p=eRSsY7%Ay zVMMlc*%PL0MlP}NF4!a~q^@aNo0t+|Mi#o=P5Z53wHGImQn^&MqDZ`S7Xhk;Dim!X ztC`B$FWR?5Uo}NUZ78SMe|gsNJKyeGYqa<82C)jpllSX-d^w8+^ngE`=ZrCr(%jV| zQtsp~UsUV^Mb{wjOd5veVP;>B3$>@v@2ZJjkPR-NbFZyeLkH~mq#p59!HM8g%eEK|a{_i_Uhdvb>vH>B7_cQKDm+|UjI64lbmZ+ebH{p=S{#}eGQLO zA;zS29M#xH(a$*E-dAJ{n4wzPdGv6vdo}wEciAC`1^AhN&;^brQnenT>nC(H_@l#q@7ZkMV8f z?lmCQzC02>ZnBbVEjsMJ7B$-PWdsIOIEr*d>CAK1sw=x!Enw1oYCxdf!6WP@$j>@E zJ42ptR8>O!I6Iaukyl0k=at_b@+@WExNr3L4{e#f9`d!IH%cbExJnDb`FTxDa2{d? 
zMbh^zJLA6h2wAa1YTpkT6y0F8ND6NLA<=GO=NPVc0nNO%KjVt=+vPOm`fxqqhImh; zPDXKADc>kR;_o;J3xCr=I#dPaKeEXZ=2TQr6ZL&~G6`Wc(lc~iV$ZxYa&p0B*8Jhg zTCzat#7YHHLm;tt!ZXH#pVKQ#2&Z-6E+1M&omu>Ul zaa(damHchifkG!dT&Ft8?rH_ZDd8nQ!!WWUxPIYU+AXzdeMtH0xQ^tmzm^8R6TIco zsbsFXz!Y-F2ETo!S9X7z#d}e7PpJUJ2Z#*CJLh$a@~W;4u70e^MdR24Zlsvy9l!Su zsRP|O{@Y8dYsr?ga``9e!n~{VZ5peDo0j^VSNz+b;m8koazFgq(z0{(=Z`g&&(XW| zvC%IhlvG?EORMxl1bA-So@77O7^%(T3vs`e;TA`rpWuITUAAfPHms(rXt8SwFQ-jb zIvPs~6_1bC)z?kNkS;k_?UHhfL_;H!d1#o^kOk^Sdq;l{iLCTFZYW3^T$g=>9{CXR zsTxjG`vk2TlE5Hx>-ppBqR?@b1VL8g{@DaPb98*@)`B`ysFNyGxk6WMCj?dBQ>e`6 zeHcTY7~r6}(zj3%uB>p7MNHRB3x!#B?hM+7g>=q1K?{8n=~Ou>Q^KK%XDXj&9ToWb zFVH?qp=o4(=a|pV1#SMvW@#R`W(9sMu@k6B59$u(plcX&_bfqZb&u79*>>b3^?#h# zh5doxp9c`}jh;vMhoI8W#&20)s4v}Y4lVb@onPf(-E?UkiEW8z2u@&;!3g7V9A9G2 z(h6(bs?0Jd<6&jP5vB$C;l8|g#!p(4GN35ObuSUQp=MxjgN|K@!olgUk;Fc;bB+un ziJ^w(mAtXRPH=EyrL?8j(#r)X@8MVv?r3b>Gy@fcQo;mpi$C4C*|Dov57P9OdC{Jw zNRskDobz55wb~Cz`A1N;|G_clpIoezHD2YoOng{1p8Y(+qD6Zeefj$XmhqF^gN&_q ziFq=s8bwGStHsp%;5C?A+A+!UHDA#&=JgTl!w-kP6vewVFFm(sdqG#6w#2v;L&_S4 zZi+PwzxNw<&<>oz*P4aN`K;@e+S-g|J*yXGW{d*FixF#X%QDNoY2kgTDZ%K3&%S+fWcpj zDZ&%W`F8q~zVCL2)J!CXkm71Z&nIO; zmzv76FINST{|Yioqmc`kq3in_S263?=+QCrH_R>~znNg^wdsOU`<(M-q&|aP5^%IV z*B0XlGEWBINDXnc!%7i|!#2Ton{?GSN_tX(4BofRnR#q}l`ynk5~Muq@O7Z!Rv?Ym z_os&)s_i)Z=wVI2Pgm8{L4t)ugP8{JNv9ETa!(Y5dm>i{7&VBvVm}6K_7x|RB>&-w z1Otm^_x6Y;S-B#2{GkNQh<@&-&}eAq7>^T4R(~hE$ZF>`KDHq>G$t17z0zSp(-+7h zSV6oJQel`seyhf|9ksy8sdL%Qb3MPVwZF7ivO~EEp(OX-!oVO|t`BDOgahfdhT|)l z8?mum4C;a9R*x))e|4E98vA=VR`Pj!Jzje8>(?m-_=Baz2zX__=J63?>jfbJ^i?q- zOkI34=rB)dM`Sa!~@i^_J_Fa=;nI1z3zvr~o(2jE~?vtTwDyW}h-MeT>ofEuk|y)qfmb6a*() zsajy)8oHC;co{x^_G^CdYScd>m3o+H3GPsvGzQ>E{AN(Hq%9LtcN&P6tELOI87iAPCU95C2m zYRu4=Iz4k>aSNx8;V4UqV}ah+szD%^d(qJ6`L;;bhcHN#rdnCMHZ=<<^LuZ3J!rK9 zS^;$Vmi>vC&d-|63c%+cR70oCpMggqnm;_}qn~l%)E3dm{!;L8b;g%b*1S*O->fgy z*>2#fTDan1kIu^E){gRj&@<&N+bG-d(;b^WIY`}*536hXqYz@l@ZaYlP=SYov-hH=fem?#f$8#<7SGcHmAMGQeq=!1do^$wA(lgBYY&L4z 
z!qoN9^g`VGw$gV^a~q35cSA2T88g*CJ_9$U!9L~`wiKxF+9^FvaJ(=H_l5HJk+mkb-Bp$KxmTs(-G9Cvo^-ScF`Mgrj8Af+2oPLZKAe zf_2wR0DSF1&6Q_bQgRmKMfaf_-AhKw<&xtk^M?6H!>n}-&7O;Ep?XGzU!Q5BPOA1* z4s^X%iNnmLLDe&dudNXh*5wm%%SXN;PC|*gAl-B@LFNqWY6g66A&^_j;i2WU|!GzROM50^hgAV5`6&Muk~?z{v3z zc;R@>0qYHfACUwFh>j%6-GMu9lX$mTHgc^GO*_T7+j{}ievQNG9>(w?WpM3zxfS;K_-= z8KElAUiby(`~Pb!xWbJ6g;QIbL7*siV;3fh#VB9h5(FXaqZVMp+G2I88DG-ADjj5H zS;@V}{g9eH?HYkr3;~C$1y;JAbvqS4Y2ppH7{8B1(7XXn)83?Wov0!8Qv5fCO6e=m z>1ML>LN=SKw`W`;Q|@`aE$@^z=f9uYA0oczxhU`Z}G!s|N`-WRSZqWX9m5()<2OM5;4duV>DkASCN^{{()KTcJc_!x2*DdAZ zkNW-U%GHQ1bYj8V^{GJ@^Py3vWowDGMW>yN%d2_|ja_^Dd>x4I>`|V@*CIO>)Uxa>@{}x9{kT4Je6@4BJQz7xR2to{@RD)8G}8GK$;@_u+=du|zZ8C}qdU>TKJu0H5dE0;NRSrC*R?waKd^2dLp zOE5`f5G^ZI(u;A-{T#SF*w}J?h^b_v2A|%lDOkob8N>R#HnoG8lXi!*ozUG(*T_fn z<^V?(BwQW;o^{jkBInf|u>vJ?^7ckqDsXlU>Q{b)0$}D+)!sy#vyQuDAtzCsc9*K3 zqigg6l(g)}K^5*w9A#6pbZd`DVL4?E|F|F+IXio0G$6KeW;pGk zi9Z{TVdWnP0!cp@S&1r$LW-7afxkbC3E%kW@_fJC{jJZvv{XR`m_Kpq;89>%U8Uuc z_k7db`zyxsDWAON?YKL&D*2pXl$iWt*c{yNecVITT)Q<}f(`Rx;F>2wCr?4 zw5XwPKW$}E2qoiU+P zbxAoOLRdR_@Lx5E&QRifdN;301Q+yDGu~_e+wZey@cUaHksG)Yt8$g<;Cmv_OK+T< z6dcgEu0MGGtUx>r^7aTvY06EFcR1W?8;~Wu@8LVJr@B^f6H(`BKYzlsWlU|15r?C> z3;i75?nK=cRE5L6`WMZ`uCnzZehLk_2cd^AJI46^!+SXoKrx#BCJi6vJU?7tYHmKL zP3?QL2fed2)$bUQbksRKw&~o&48S*lcAMWPJ-5_BVG+}?E7*vYGqLI<=#K6=TH-Cq znefh5E|9K~*)1|Q$^&ZYkDP2Iw&wcpE}3Pt5njQj8{cz%FPuME`8q<%<=f=+wQ1I~ zW6hc5-r;+EmaLqyl<`?xsQe!ZrTPX4i-+6GKTT$$7WT7xlY{ii8y`>I)hw?pIDF}T zzm#ecyC8tBPlX?pl(;a4P)?8eg=cIpGJXJ%YWjCJr~IejrLb!1(m-9D3swMyZ2J-C z>)#abf}_KNK#?d3%Cn)oS@|dFaaXD2Re+Ae3(R-GFsSvopuPRmycWXRR&+A_J+kR-ZV`JmAPLh9%8w#*VdW zD0Ir510WvMc$+4`J-^sU4)Er;_RqU-WJ;eTvtJ(4x=zKb+yAMYnf3eR!NK(5h!1Rn z&%H^*4lt7r=zlYUmwOk>hM-FNR2na!xERX$pGx}u$FhwMNMsDF-pNFe&;3-nb< z_OC8>YaT1TO}BC1zGg8(S_)SKwfCyt0RV0~RDO+T3Z2hobrDDAV&mkS!sK?MuZ?WT zse6u$ORSA7^?IiVP{VUJg8;uy3MTI&QkNtnJ#n@_U|#={0q-cmcKPCb>}rP$`3fc( zUL4PX3E5`IDky+U`5QErH}m@;THt5L;bp+HXOm5Bwn!gLP}Z%q9-chc{?v73-Ll*^ z{O8&3GA+dmt9u~3lddgnRg=$U(_i)S*-pq`3#7W2)05s1Z4nEY`KaUea(L%7h+_@v 
zghbxqTftFTXCrpZ_tU2fon~I#@>Q=41R%{y7Hz8P%EUk5-`^RAa*374+p{~O3{Alh zXs6979JG0EmEmfJ^kye76<>@N-HG~Mct#|Y=Hw)X`HY!EP@eA_ENhC0TNwHV{4MJ- zD$8%g=PY^IjQJF^pK0k?`DMN%(lr}E*W(k6Ssbq> zhWVBZWRI`FZ7YDA1B6OY|1sBL)D`pASIM`c>LzN=n9V!FAcw!I-@x0yI#JB2AKeX#ZrFlbDJT_- zqNN<;)8juOgbPD%6f1sk(#Q43Y;y{~ZkqS%R%_VrO8ik9i4V(H_}W%xz6dBG!Nj&| zAhaT*nXoxePY=7`qnn;pei4W2dAL!>@Kxss7R+APuVhI&<_pLQBRBOM!py45=SeSu z3rPULlcXh+rY_S{#AtjLv} z#TPT!A<79Grup(X#tK=N&Hi#{o_zM2;t>{wh4RVb+M9j$FR=xUP1XD)#um73L_|A)|?)xaQBfBQ!}qTgLJi<9%n7c^9cu zq5Q@sX@&~r4ex|ZWJ$(#F0>w%R=z1*D6+i#?5y=+pQ%z36_Ds|(Df=rp2u27k~-M+ zU{%kSq0Tk{1EQM>&=TE+?56KWS{*nOYHL*fCEX z8g{s&O700zRa&zNM z0y!fGY@KVcTwn(Z$P#PV`V^KzW4-XDX+ktWn)}!<-xsg7udN!>v2c@q$M)|8U5i_jydTIA0MEb6P^^ zrG3wMJwLK%E9V#VaX#9aru90o29Oj^V2VQ&9n=dyfaqHtmNM-i=TtGN9Y0?c~+68{H;rcNK@Xs8}DzzEL&hp?<_X0On&tE*DNd6RC4w$(dq<3*%r znq3Zcx6sEl%f@Y2@QGTIOuX$Z`34a*#1vwR)MNMXs%=4mg8~m81pl0A;#Dpv^GK^K zft-J&P=do*nR$FK{Cg(Z($v1J8geW9=)Ldq_Mz{hrD6c9X#q&3zb2Tx{4Jpx;_G)I z{Of`_(vp?wb+NNB92YyBRA>5a za-UyDX3OFv792R=3Cb1+Vkc~(LGKYBwG43&?&gj?SumD~bEC^q3m&wK_Rw1x9Ff+B zuuxSR`QK$#u;2fQ8tmTecLjeQn_+p>WS%iJ-n zd3iEO0Gj*5BxTGX3mV*iY5!eD``N zxeT`XAt~v?c1_j7TDojnmlm1sGyTp-`ZV`LgO!^X-OFFrM+Ajps_Z5U=>a$SdhQNC zhmzcVh+)jWN*Vvg`Dcp?`% z6(y^S5CDSxR=_5iuTT=jsj6MC$JHR6p3L9jtPRHz5O(ci^qNy}*YNSkU-&ch=XoCF< zqbzg+Wti8z?F52Lh+UwWj;g1>PegljbF{6L1@FM-YS!Q+d-( zLfOvXx!&i8gD2_nm16T@ixnzekH_OnrZOTanJ<42N-mfCNtAyg+;F0NQ5Tz9#C@X7 z#5GYKa#8c_!(`F6CE#Y1xD~uh+$7VvsA?`x(fX|2%?am{(m5CkYu3shi%0-&&W;g@kRS%aVS3=X_T1m_wft+Nv-Uw$10) zk|uUBWnqWQU=0G}(AjF>0d{Bfmd||q_K(u6Y0}ZRLB1_;2(OB}l?KTXZJ6F=0Y#=Y zUGval$QKYa%CA-C6+89nf_7znVeyBL8Ex{&40>!{RHI?13;7{Y4Qgy%c^_RC9vKKK z9@un6{uaLQ*co0IvTgnv;G{Q*m-5zz)8#GBAay>W7QOF*sbD2+P0B6tbY^I$6dB z=6>PDQQ`~b)9PVCZXJ5T#3(adm*=<|sTE6!u+A*i^IG}=30|8Gk(~|AI6Pg2oNaz8 zlJ%axbLUkBQOSKBrsIpgufQHsb{dpb;wN?pyVy1hF|fBih9$w> z;b28j{lQhJgz;nta*X2QMv2Tp$FIM9)pk_&Gh&np>3X4&DRg5ug{3CsP)jJ^9<>MWa>ro0Y@0qO%<~ZW4u*@Pp+|FW4w4iXv9^j=ifu}AyPb9NPSx}? z6Ae9?#MCmouvEtX(>94NAwd91F6^O5ZSneoN2UCa?8jwn@! 
z)OXM5pwwEMk&)~iuIl9oa2Er3RZ;z%?1t*B3%TxSW~&g~8ZwvYraxVF2U1f}+GUHF zZ6133KgCYOZehO?k=PZa$P7nkI#wN2cO1;Fp)Af3(fyvxOP_vbuk(R+pb~-^HoDb3wEGxw4JbB%<02iytSt zJq*+ z`;HNaDZV#YVcD6?TVjwI@X+j)2iIo&HH(z0Orn!#a!byu#1wI#?CVfTw;WPc#sIT} zPRrzVj%8q#hhvP*WQIb{h`Sy+8Dja)7O85Mc24E@tCK!f!lVb!+$q1pq)@VtY~A35 zOr&_~0%36C(&m%fMlF*!f8jVdX;NNcX`9{Sq?KG~fA&syYjT)36-w07=P~rG=!bv( z_SaL;QtEqQH>dB0>xm{c_KKva&yv5iSDJj}@L3X9={7tao(|nSInDQ=)dme80UhMm z@@1J1+T;<%J8+3dm2LbNms3ycJv#s7nC^e;@9(#)rRZc#vKB7(LnQ*76pPFjgV@Wp zMv}!mA4K0}?8P{b#LAj!%SQZkZv%cjebf}N?C8d;P7cg9u$I=SzFSU%ZPb*VeT3+4 z&q6eX&0agp8KVb%lqwy}`>QpDlsZ%9EpNvm#&&K1Pzl;{PE3G8j^rg%qADYA_`6H6=o?B&ndxQ$)UT3<*Hl5F`-oy1I%Fp zceMKCn;W*m*4JV2fe%`aVLswl%ZJcq7f;73-0YB~MX9z%0us@*DzvU-X5h8(`ILL> zkzRUrlHLy!f5DsMsZz!%YxStkXL4ZVUzcO=s{|%ksgHe^;dz%|#R?hGsLx?|1j?gEKXgc^#Mqm$a&mg*Y6oP77_3R|?KheNazX?? z^$7_J>mglS3+x`hG;tdcuJLogTQ2c7bZ!w?J)^jPI&WUH{JZUbL}bZ8))r*SvDZe2W-uSmtNO6m$%oD7R6O3asR=e_OVwXp2qoXcV>R z{AF-ieY$p%(iY(5`)5OWylF(1_Vo2!VT8JJyq9{5@4zKwS4AS%1bkUEZZ=cYLJyqu zS}mum8?Sjwqtv-VxcuT!=#v8!_%VIx^=pDA6l#F`+dA<6hbyvOW%Xp#DU5gvs<=v>$V9&6p z&_L|qL7GjWA7N99JcE+d=JJrt@VegBuk^sRYu;Tw;dUgxcv^i1Gu$T@(B87V;xbWd z!Vgq5c)FzqxSH&KoH5m8XnB|2SA zMS!UxNxFzeEJYLwn?IA9H|1tyO377!y=CkHb+KpXD7};YeWvT=#msMLKNOvFTAf?` zm6C^=jZEGvl=CNccC|MNTfVb|LuF&$m>28GWj0-QHpzYb^tx6c2*~Mc+?wx5zz*KA z>|^4x3;B~B93ZXUyGmMi!cq0^VZ3}=}Jo^D(a^bM&2ZwcG1woogoR(^9g zE6Han&EKLb+QD$eH69@~pjq{9AWOWmnQFZ;-jjFgzDGvNJX-wV!ws{#=Fe`|5=f^> z-my43oIQaCXEueV#zsW~i8PTgh^9x%Z`Uv@`cw|9aswN(d`GMw<-`M0WM;u3SAhrv zZ+Om`Y?4XapW_j~cAxv72WW-sS(?txfVaT=dD43QQ)$y5EW|xev%~f1R6W)Xk5Qt1 z=2f>3ITvoOD-Jczhjk#}YBvw<`9H}Vb!ROn!tD7y9S+V|vnqZ4t3DsGTF=Uocv}i{ zDGf_GshL%O9ky%RGzY_>FiSYfROX9B2#=Z3&-!5WoOGEX`rur@4jpTls|Y<9H$h&! 
zNTBSlKMFs8ER!qoKxZw~>^i8kYx=+x>}6{h<=A3ks3QjY!GSB~P?E!b4(6CHQd~~W zLT>GP#Ai$Kwi_JP3bCJ0!n4;pMK(OgLX@}uOOG9grw=ZCb~9z0U>2hnv^ADPQ0;Bm zH`fNN+CWr_JP~y5WTiDMU!yl8HCNcx%+P0MvEb)sex(s3?>Jf#m}o#M+jko@7U{Ed z1H!%yzQA!Wm4j5Xr2?@!)ZL&Qb3yuG# zn4N>pxSd043GL1rA>KFF*m1ZE8XUa4rlWy3F~O$%IoVtOH9TAwWjbIWP`whZN4G3rj4H=mL1u_ijNhf(Fu*H?ZdwJxe~PP$p|gab~q5&k9WNLKM7 zsV>7dpAbKW`{EJq_6BCs_N8HzE|OzTAl)_7-{_+03#Z9inCf??4}6me8UO{hh?w4} z_HB;~z;^NA^L0WkW!T;3LT2hLuF4=+TG**3e**(|MRWWu$y6xbbC)rO+QaSf=5|5@ zI%;x?*Tk`{vef~ZU*tMHtrW4gYr`B@{%iJp`!wr{P1xG6FpZ(-0cXFm#9S{1-Pgyo zMIaKxlAe)<1BU_h`e)9jlg7Y8J=>GrD!v0qQGM!q0{EGJwBfFQ8yIQF< z>YHo&ozA->ANgmRK-nx*s++nepqhpK(=QkFcAzxp3P|*yf9>R%wvkrNlNZtd1FDm+ AG5`Po literal 0 HcmV?d00001 diff --git a/data_ingestion/image/athena-iam-2.PNG b/data_ingestion/image/athena-iam-2.PNG new file mode 100644 index 0000000000000000000000000000000000000000..9c2867ce787da4f6a36b1d3a5988455a5c4181a0 GIT binary patch literal 15537 zcmeIYWmuct)-GCiN(-ep)F8OKmSV*z?m>!MaZ8|3T8g(tf>Yc{(V(Gtakn68v0#Cc z5F|jd)AwE1de=JN`FYN@_m4fV$V{Fo&m41%F~@yBBT`#and}zbtqT_}kg2>>(7ABo zGLTd+y>X57sk4u>Abnl*(oueKp>mjEll0??y_|;Jg$q^jx6iDul78QO^U~Ps!UYQN zKkti6cFg`4E=VbVY_4yPoF_lOc=6NIY!D_AM}6T@ z%ro%Sb55C$f;2C`wqCYmh1L_U?=7BQzappNeUS6`U~fe0tCup&WTfcW&YD0Ascv`)ICjxuu(fzZq^08 zo_n8J+*4tG^zn^bkUt$ZgCpgMs#+ffg+7HJ-j%a^kBb+2dlIHI@wu|opvLn)#JwAO zqp62XPFpBa<{ADA^8wWRy_o!E6^Ffhuaiqc@-JF6T%^GE?P-qf$x6LsZi7Y2fNWXO zvsIn=!jJxMUM_-89dBB3RvOItu{L{ZOB>(Cl^~(t2(M=z3ZzEYqfHeXPn3H zXtUdBkBh*dj-_?ylUHm!iA<6miOxYsjLNMKM*|n@!KJv`?>x--f=Rscz=?kWfF$g9 z>O&B1G={4}L&p+c{b)w6Wd{&(*eB5!DU+kar*e3uyGDA@(D(*PoNILU`k|4V@1;v3 zpU#LGtMXm^a5Bz0bI8;Oy1C}R6!R>Fx6KkBsJy=J{3zwF_A}C0gMf%u$uWSQyM~T< zI#+^qDwM!Y5`Us|2G!L&*UFFwo$71NPuQqJ%+Lw^jm&JM!RA0!{r#)oL1cP=_A}oa z9%cwj-4BJ_>FppRIqEh#I!9b^o@M>GxcGv ze4%-q1ui)$o4*2h6sqF>F<@L{M+$Ji^R?-^Zb#x(?DQGMdUR0;{o)l1>fYx~qo8yC z8Db#X^)h_LS!_&+!?d4cgus%Y&XB(RON}%BoujpG6Vx^b9$AABp&q@YAvwjagG{*9 zqvO8wBt}qYEI!t4XKZi&k@%kAV&I_eCev?}q@C7ex!b_?DSuWA{y 
zzw6wZ)@CKO73=A`#}e{WcV7RS^i^wgWCR6_f**HS!UmRC$y=>R?rqcXH}0e=(>Osi zISzizvaf}^W=N_iMwSp$j_b#!Q ze%OF}%IPR)9V!Ye?YsR__+yUYxdLs4cgz^*L_JSgMxMrm=dmz2rn*1_scynW3y5e6 zRVeoF5Wf4U>CBrE`8nv)jK*Lcz~7Y6K9MY49F8_*93=yYfCISZPWck$)@`J$oLDKM zq0xu9-}> z)GfM)YDf_)a2H?jGfmXIp93;VjMm5IagPI-`H&5h^0VGNq|K)vz4Nqv_7@cBurP^@ zKbZ|YwUFqqYvttUi%oJ<&+AsuK)G_dWf2H#<>bzsS%(ENUWXp~FdW0SgRL;TX1YL; z#w` z41FEskPgsUrPVuo_mPml=a#n$m9E+Z&>)&pG-E#Cws;+P1AU(BV-6?BxNAhnJF6y> z=RQH3EKEfvuJnIfVpqSV%N8vx1y6LjkXYl z^vo=Fn)3!WZp^Ta74=mcUC#yW`52XUP$$fAT$u)t(NyeXfVsgaX+5Py?d8t2%a1+0 zxwx;T+gN`@fi^)}Xz))l09ZnNGCX0TwZ`(M9s|YsRJ|(FZdvt|dF2w_j=HMii%lG^ z06Ezh!QEtaQ-Q)I-`(|!0r?4HxN3dTlG=>y_@iEX_=B3Cf*0c2J!0OsXS__@Q#SRT+OnO~F$g&1&7M5IA?^`3ZLyQmv+3-}Nfd)Pl!bZ>JQ!FG)RN_66m7KV zGZep8(3AS3S4p~(V8f)}gmFfwIRvU9f_L+It2C*xVmRILz+U#n#uw6u1b%k#8e(ZG zCFv7AfQ1RZg65%8Dqd+i1{yZn`oBf+K9iqD8*LZ6`%U%WnG1piqP*o5lH7cxxh z7)nGMl?FhX(z+c64N01yK}dM0*rN%(la>~$Ecy0^BgX6cmFzRh3tEU9m)}?O5+jYz zhuMONJ(MjJ@813;rV7zrHMWIXn)vuaCX17}&)ONFsf%Wgw|hjs2HO|K>((dgXPpn6 znm^af@agR|_}wf9=5}3Uve$L7sdH_85-)GnJ8H0Flb1M=w~+wQMG94rmq7`Vu$V)! 
zhZ|U(rmClb=P%;re~P683XDk}-i<*PjOki>V6X3%v0!Anod_;HXLDH~XX2Hom0G)_ zf_gQ}AB!lQ6tUKctd6MH%@$gT;R)WiE|q28o+u3nqRI%#=lo-dcZMKHBL#k7{VzUR zE;&bR!BRE#B8Wf72Q#hQ+OY~DB)2=PbDlXFIAU2on$*|!fyZ+>ekpW|8PQ+_$u7Lu zdi0fEDn=lA@ig<*;r9Yz1wtsyNXzx4cAkDZeO%S?+Eha?pnPfTIU^_V+;WdW7kTqL zHWML}o6K#SWJ|>KzUVQbMtut|Ch|A|)l36=#mlqer1)<~7Z`r-&G{zcsy1UUBoNCe zcNO4Z!H&E08Nt-K<%}PmCoX~m=Toe5X{v&AdLQ#8r~NG$COY2adTC|H>v(L61HIc$ zD_TRce*RXB-?{+Jh@?`|M)YSy;<(a_)kJPL&Zn|Bc~o}rz@W7O`ACwEEG0aQS5J`l;hVg(Q`p{t zKKr0qky+jl5T507E|A>Xr( zbOgw;^e0#MLqkdc^1{ANViAle3fJdzFwUVqX_hs5docYiRYqmiF11&y9&E#6c^sIk zF|CX+n}{c_OLXjD!E$)dUdH)vJnNeK$RWN;{Ef8Mm)~9V{pO&-v84kY z*W|FdlRz##n<^O-6Yaq8En#sP$MLlOU(=?>?g)r4_{UGETK4Z_|H*9-&UU(|BPBwv zaJHZrvkfd1`PwyP%sg(9*rL`gZqTM$Y_?K9-5Q>IAU&5nZVdWY6v59FF z>S~%z>7q=T4fR6Ew-^tq3g%mz_GMNrqueqQC5pGnY(Kiemy@_%JsG>s1$^MbSU%USlIAOWRB#RA{ z+%vLOGhUl%J8+$Npal+k6_@Zy%7FjWq`#kccs;O5XIYmcd8=V9Uw>tL_vvgsu2)BT zxBI+HOeEB`qV_T68*{qF2JV6(AP^{A;r61aN0-frr(ns~#b#>c9?(@<__SZUF^)5* z*CO__9|O0z@0I0Mdc)fDGH=td-Lfu^D4>C&2O<$s=wlnTNG?2YG9KHEuM*Q#A3`t0 z<}4bj?+c)1H%=H@3Tr%@naJ z@$t04+ih>vP}G4#M=SW6_({@BXS&&_TUoyuX$RccUD~$ZQqrh*2a*g(g&QJq{E!7Z zwqB+M?bn4E^#Y`TTA2y!Wve-w3UvZxhZ45lm+o;45DLm?I8=9bQ?}~ePmQ`ao!L2I zVr4#)u^P!?N-Au|$lEWaWp4J+W=`5rF2PzTrS!M1OwU90OKj4EbBiE$T!)h?q6Pq3 zNf7RKYeH^0DEs*#OgudzJk!{j_Xzap}f|esgx=Or1wA$XHmctqpi~h-0c{~(At9jFO{B*hN_urMp^#& zEoKHkd4%$hl(i&xb?d7Gc651XDDuU?^CPRMiNY|)pn38$ge#(x2i<+NHm1|F`=8B8Q)sgn%T7&m7P3Vbnl z*Xn9%Acq2*%}CQVO+&?KLw-9=LZlNd|WM7)iJlQqCX*upZM`Nm%jge zRyR#uTsr!ke%Pc;$h#s#ZAw5%k#=atIBqTWiAJbXHro4i_r%TVT+XLTYs6*!^Vd6L zX}qJb129r3mA7aw26ad=?ne^$0rR!Olj?8o?xm%rMbn;e{hMYN?pR{$%0BmaECRTv zfH(=i@>see%oW|9qUY}^4B?uU<(!4Bu=hoUGQ)naCSF&ai#aXWR&WCTEW=X-7_%6n zY9>r#SVv`PM%SZ<-M(H680Q{+GNfJ{gtacsHgTn`<>1hp(LUF8MWDJSP4$`#g-4&A zFsO!>Za;iAe^b*^tEo^Fdni5Gm@yFpW6hY+*Igt5GU-!O2dm6h;OE{KPSQGO!z`-? zEXW!@46W#JUgP4uc-_oN%!)*s14Mbp%SoEsVo_0=(DY#u(S_SG$>g&0rCmVzvC_gQ zuf)ZImWom(tTnQBR3Irgu>8i2pSDVqY;?9xa#k28%X@~Ryas|7ee}++@E%MTVizZ? 
zG6saKGA17OXcXtY=adBZ7Ie8f6Ar)ol%>|BAk?cU(7Ih6b5JmYX$j|H`~yJAn#E^j z#=0vRU%toP-m~d2b6P@f9spZUwsNNN%SU9}!^Yb!-o#v`VA)+jbA|9P|G4&>_sdkK zxnmgi`WDDVB4VEM&x3O3@8;FlP{0^_Xh)uT^t`{}4HMPPL)^_2*g)u4!&7&RSU1xN zGS*w^7OVUzF0P%u9^+?rTdSQkPVnOB$%bANz&#kRv{4IK zrC3|KYE_WsFfwp?&?I0xaX~2~#oZ-;arw0iN&Z~uFU3k-C#LX#V zdDnj7N|n1`uAz@PqcFX-L5=18DPuc*x1yfc9n02d41xZ+*39%lvXoSL#_}51EAm;Y z0On-18_mUpgZ=j5o{4tjpC$(`+!|=(nfA z#DYHj+;iT&9#J&wUiT8(v!axB!AT=A^yz@Zm)W@a`r$884MM|uAvAAJp-UhSv)9H> z4yl`sJk7CD%o64LR${$BtCZX&z45WX3a;NofXuKgc&tJZcNF#2l3wUfSJ6Uvs-;x4 zG*5W$p|Ky)3G{AciFG@@FV1!WP8+;%Ti1V(T;!p~opRTP-DoDRO3}=krmRf1HmE6W z5%%JR3v9}7=5xeO_HSC8D|#Ghl_xyY5;C3YyQgK1V1lx{2{uMHJLs-GR1HLvMSDY8wilcmRtk^Z_kVV5p| zyVWI*_{f8iI-S;X-5(VlChu&NTq6iF$eOl3 zr--X=5l&^Q)9W^=&;&EXX6_Q#1dmhfGZQ-?m1qj|c*Qu(Q-;` zSwn0~ZjY!jQr(wpVk*PK)E}xJS~rnry;pO{O!vV zpiFe(;s8pbjm=PYseOX%w!NpJ^ReyXaNBb-`o;OIjD+A&fc{qgloCI%9muPtgB@J< zlUNz;1h}1KqPoh`zs)6ld_Vb+?Xp&}e>GchfvZc+fGaAeE$R{L@G>N|{lZ~S{j1nC$+_6>nD#XQ2Y0@hQFOj-l$ z>UvyvjPht%vpJ^vRkuzD(s8W^w*gU`tGgLpm?@=iJDw-u|<`#COBblcQ1@?MLzl zv(6qkIgD*-au|1OlN|kCz3$bbnEE+Mq|)s*5z4xR<|l_9+Spi`_L>#r-|9-8qqSG< zs~>v4-w2;#?IyoA&4ChI6oVw!`HhJ$lHdINYKto?C?g35Ccz$L|Khc`bt@xdGs`X zu|E<-$6H_8?@T$C=>mzP+Zo77nt1+hVQ{8DiF2)wC9Y5--OiOU;>Nn#Ss-xcxklRc z-b|ns8pYsUoz1L0_$l~Oj<)rSl(>`SI_NZ2$XwkspaOqeq{-n1U>qeuw`i{#-moO> zMOP9~(?09K>i>&8+ElEKMzmU(T}Qw!4ICK=Ti9MIe9Bt1EO+#Qg}YYr@Yz_F_xrQ7 zop^A5WUFL=k9%s2OndYLE%LN_OQ7`1@$87}Ra@DjTo=FkcXXNdL*?kAY26h}{SCV% zWpzru{JlZQQN2ZyW$Ksu0PvhQ*cfT?V(b8tzhb=O@%S1_6Q_ODe%Hx#PT}?x^&5YD zKp4F5nRxQpX`%C%43wi!b#`CP_C!_j?1Dtc|(s}BcpDu6d#tvz5T@n zCmRXoOewvjk%n9bA1VvBM+@DbnrwSLD0KQ|)}2n5c^0Ne15}HCN}O>C8mAIaIqH>J z?vkNN$QXi{@4m#;J>KUU)y3`3i&0UzIetz4o-Rt(B_yZlonZ5x4F;S$)vw%mpCftA zBFt75v5w;(!>xG_Ld=I>V$$D)aLv|gOFLR+of5fHa|)1Poe>Z2>EtIz6A0V}KZS*VyNmf2FV zO1WJMLI+)?(gEA7E;3!R2}={ILt6fT@j@c(*?6|ptQ@x*om7t!n3c^XjGE5NKkU3dZ z9Z5GC={0F+p&XUOytr!geuXXG3-2yDDT89Ynau8(J~9ttVH}ZCprwBC2Kn8|ydW6F 
zFAR0di+i^?*-5+68+rwtpyJ7;QFZrFV@+IjWI#^U5_pnUJvhtcv+|ZX43EJNUWBLFF{m&WzGtU6*DDl{jl%n+KS6<}s4npE93k+zVPFL7tJC;{p zao5;|p(dqQPpDDMD_dD70^RC0Q0?Jn_YX0?J7^Wi$q9FJ%<$?|fZemzR4BGJXng># zKX^Q7__O;-TK(4+hsT)J)(1rhha8-pL@Ph$bL7c5tts!TtbteTbHSN5!&S9e=4y!b zymTm!QFLM*$Zs4xn-PD?ET)3DYYzx4qyHj&_J zybLGC_4f3sjO1rGxHo%g;tLZznbw_Wtg}8D>b4)0`h>^;OdNKHC>=A1`dcDR0^4_9 z#fY|*;5caohv2@%7(n{ex@b}Nsn2YdxBl3(MZqVUjZ>oS-Tw1As3FK0z<*btGppQj z_p3L?FRsQ7<3{wdkzY*lVnBLybZNNd-!;C!wElR1P*r#n_|&4G=45O}JY)Ryje2Zl zretqStb3PXjwJeAp{PzuPG2*OC02oHUCrYD z{p~1Xb}u%JEUSE$SX6>&d}7&l$@pw#OhjT+WsiX5+8@czWPj%tR-}G~@;sYYmt^^# z1bgkAwTE8&rW1!I%kpcr0-WVWaeZm1f7$PXv#>;vJC>tO@ z(4W@gVW(|5lm55yheVM0%BVj}h$~WEAd7{DT3yUAtsz#n+vl5sn&z%Afj2p^XrT{p zS8YRE^{v}3H~AxY<-Rt0y2Bxt#dxmP1);c`RF8IwT9$}js2&jE?h9< zJ19A!C^xS85PXowY@9v*J+p;H5O^nhAw5ZSD-y`xZCaiI^Ndu1?S(8P3Q+{4@D>Ku zNvB%(nNBFoEuwsG>C!pEJa=Vb_EokI!xuzX#7DID!z=W`^-7ZMHi5pJM6TLnV%Oxk zfw`gb5ZvBdwMV^~H_M-$QUq0DGhD-VwbV-RSL!M)DSoFc@hue2sl|A0k0dY-Igd9R zvU}7cpP{4-4ZlXl`yKb+vk&V%coT@KcsK2l$~PQU)E4)lPjccs3UjE|oU!F4dE}X{ z1DLV*o3LKXh9g{&74-Ct`eUa_uIq#D+1Qms1LONf$A+=ml~Egh+hXLsO;bU z{+zC=Y0R9h@?j#(10eq80Cv(G1YcHGOmm`6D&8(^pX;#UUf4%y71=}9@s3{lD8mFP zqUV#p0>+>j(db@tcIp|&S3zIv{Fk-MqZbYQTYkbT79p`<;e~@+;6I1a2N_Y|=8EPL zMLKW$iPvOn0RNPhK0tTbXCS(@&DTT>a4sVP49n`QsP)&3?Y;By`XY7Z0U>brP0PEq z5HnO}qTyfgcPX;*yHGpj57<4CKx>2IhtSZO)y~A2R1bkUdGoUFs#gFh7oqKkI??!n zV3kCb8GfEq%r#W^v-kT62RK9VB_$24xZCh+e+G@T2?sX$!JAoc0Yi)fQ!ojPf~V7E zZs(&SVc6C7Mw{gdpMK6(TNzxfe@#xv>?w}srq?`qH(Z(*lf(uiCb!hRg2#?jKvG7h z&l|df3ivK%`ThtHk>I8NC6?r*OM;>#FjR#v@;d9PN@=e&drgXgs?^6twZ`4{d-PG)KRcjDPN$WbAOO)HWuJ3St#?ed z7emZRpy+Fx=fyPM3%R-3-l0NO*?y7uYyD^Y>a%jGpUT}WKJ|_VhiXm2VlI*3ld~oK z`-N-koYduz2gM`FNtArRwHt$w*z=6CePB3gTQ&?y`8F5t3>wN9@!n&f0Pb4B`|qbf zpG+s&;JP_T4e-KzK+6l4u}SpH#zkQ#bw}WvgNEXluq>a5cXLahlWqSlrzY7E+rNg% z@@~^nd;j93n{K^~A@ro02r(9(xjIc^>v$L};iU2Ylt*mv7d1QV#0YzLqstx@mp?-kIArH$k~WM~K3z{u zfwGhxjen9fadbJCJZNP0owN*thnFXy$!*1Gd4JkNX1 zaBH)Me0N1<_vQ@_=SW`;M41EZUu{%KZN{IfIp_`D!k^^j?F5w 
z6@#eI5C3ZVmP=(3q;GNb==_)b^#JQ2hA{klh5x>4ejvMBICB2;@`*ZGve6-=;GbDo z*SBRuh?l4N@fmep_8`r@nhI~(P7q0HNxWZqUl-`ys`J#-lM_NVO_9Cd81o<3dO$<6 zx8%-M`TnFEl61sbfFDTQ{QOiZM>>op`P}a2zqYTMl^%LNA||;jk5#3CCCUGZafMBs@pGF za0Sq#k1~Me=e_6UiGEA%%P2kU9@^gg#9z8sI1qH69WYmh#{TzIn~hSbC>7U-16nmI zMu8oO)So+!PX3n6AtT7E<24t59?Z1h?=ym6rf%Dyk*4)fqS|$>vkx=Ozg;2Im`7lI z5MX1d#?82m*m~{>u0_{aYz_F44aDvn<4-OJiMR5=n+iUbrJ+(kt?0_A{fYWW<&!We z{;`OP;UYNfuF)w6MiNmA;->!qh99^z%_y%CzwGsvSHq9>2;qgo9sD-Tiq-2 zfD^X`4V;&qKmG8kPeVOUlWmUKa=AB1GImn=k)}R4BaEcHrks*}d0HTLT zxl?tR|GQnT=|gN1pQ$h2nQ2zpCS7ZcowKjReA{7X(R6cye8=kY(OzY!WOe3Hhh6H@ z9O2>;Z+a}LsWbPSzmZ{h*qY>bBsLJx`qGf=Xv*twe3tgQW2X6P*7?DY7-6&cMa026 z9;=g|u-Q%MLb1(9dq$!aPAURFVoZ?R{i1EYWU;BO?>|@&FE{P*wrRYBdsD||Eve27 z&aEuQ>oR?YevJMBGpgiw`FV}F$go7PYkZJxE7900Zn0bK-y*$M4^cg7-3&US{kH-s zt(S@2r*1-|nH}K;wPvABN8C;lcIb~wdEo06vYS_0`A#*&#UYyk;=bu%WIAL5Fg)(K z?M~h%({2|YJD0IIJS9NBe7d*$>nxh}1r5CwQ%~i_&DEJkG`FLRLU5k5B$Zc9JX2O+ zqC`bJ1t%~y+XSd)7%jnmDhB>GU1MIe6-Q$c5iV?uRd-s ze0s$jWo1k#>q=6n19gBjoRd8KcZ=)d;wG?u@W$L6nbZ}*!1d}R@)e>@{R`lCeI+-v4gx9Rxy%Eq!Yw5VvQ&R`t6d4iZy)WuZ?sXb*haV!9L4nZ`> zZ`X*76-yok4bA)ljiiW7Ef>7hR-2N+?6V}MDhjwj&JQp{rphf1gjr36UpsoRy`&!wq&SrcV@#|bs()g(yO^45U4`}k-P(r;)c?jfCrsbFl{l4@t0&-ct9 zeyu7=3BcS}E@DcW;LOT1%%aOMT;p9E{st=K$B}0|a=a&7bYf%9Rmgor+<8&Cb{A-` zyH+vaaNJe2*^J)d8{Wyk4i_?A7xmn-=mzhr`7*U7mPdj#7&D=Y^1L;?; z&2fxW-&;!dOH6aa)26p_3lMXFM5NG$JLjy z`a@LnYCvO%mC#~S&E_gzuNddQZ99x5OEpNe8P&Qgj$||=;CQ^@P5MmmOcBW8hD>616=k zfo&S=;EHNk#1#rzb&s?#XXm8*&3n?-uZvDc-3yf1%txsdldbI&Mou`_g%KwvA@oJ;i}szGMZZX$;aI#PnvgsORl24X@eC zcdapck*9wP>Q1U<)Y}vk#pa(_>-N}nJHxYb#%f?5t?yq|ZxemLF*Q$DMJ{k1Jsl?x z(iFBs7Fi)>HcI9{ZaU9i*-hM3*3`AA^so~hTJbbgpIhN(PS5^4VGy+Kx2-0$JPbU! 
zGv_H1?eau1K#^cp#DC;@9w5GaH~VC{R^Fg_-5D7jB2R4|ByE279y+Etjdh?7*}v9= z;l#NtEd4|qj#Gu`YC+EW$PP*%i9z=mW%5z>E^NCqNDB%ctG_YE^Xb>t@bLY635ZwI z)uP$Tk(~OnqAfYvy{FBNtL36|>aGPjQT+GUov&!kzCS5#m~w_IeVhAaw}FnYNRqB4 z6d{YE=7FE>zPB#hq(hS|@g&rSRo`h4!e_#S72Z^A90sC3yBOJumrJ;V0&de+qxFne$^_G4Ia=%^NHi#ugSAb8 zN|-z2P?)E)teb^7;?=URY9;pLW|^Z{dS^TEl;0dBJQAtv2-t$>hMR z@*k0Q7f5B|4SCs&nuuVH2q*8Sjhs3p(Nz3?+Xj~SoZ5U~xlI<=m_1I18Vc=rQ~qV1 z?9`yhStQP-=>9S-z=OpRBiUc0b?V4O&&9v*H*}9a+)t$MfGU0GX4_kh-Wn|(7L@D; zk-?X2OnGlYzK(u&p~P@wc&rC9UDdvyn$FHG^d%PY5#eKj#VzQLtKYngsvtfT`Mg{4 zKJ2o)CzH!02hcnFU7n!(#KKLjnw)@Go)C?Ik+bX9#nJ2544JSShoen^U0eA&4kH zdKi1!bE1wo$W9M$F9|FzbUDub3|vkE{ev(lvBDTPkAnBH^6n6y`MC`SFW@5&I{?`I z>7MH;ahmn10c(K+K zN?iAmPeL`R@0h-l?-YO7F-b<~S?PCPdzXR0bCI=f+GqH_6H?8>crYdTm19z&z;9J% zBR;{UuQMZ%I^{Jppw<^uu+|q49)x?P@(2@tNtB?-G{Q+30!My~(?ySX(fda#E!62F z-6**JF`3u!S)s|BlsRmND6~|gTb>oY*^#%?>4@X8>rhMZ?O0MA>u6L1tCn-dY0k<@ zsACK3F$x6zrk!jsB_Yq~n_zOs%e#*j$-u@{g0?cb zT@*xg)+Y76*NB1PJSM+=zq5TL-uDfdI@fzu^JU=Myx$rrMk77&tXtv)j>xbN&Ajwb z%cUfbYcCg-Uf-Tf25b;SrY_({5zOFs*W!uxrLP5rzmP@F?yUjg5T3wEponJk&+Y1k zB;61eXx7!NlP~K7r|Rhq7-8#myErVgWBZIsKC^qOc5~a|hiHf@H4##`)NuQ5Mg9OG zh?^M#X}p`UTiEU0m+d@0w?^Co@eHIb@B1O5>(r0`0R*h;&f!8zE)4$S9VR`*1_Cw&TYNLWEO!2w3;Aulvzr$T}gHmZr@={$r9`ZZ7gA!nE3`XAL~O$22ZCDu_EBF=!$*DIksqBx#D-Zi6_l`c08 zisto9-CaL}8XTLVPX7#)~=~1IJidTK~r~8>&70*l*wJAu0AbtD+@VfE63gx;% zVy`Zoom;QH2NTs6q+Qv304*B-W(rLH;1^zwrOyJ%`k=*(QJh4=V-YQvh1us(Qaq@J zNP(#1$dU1ls$UEOzZ~OEU(@>0^4>q?U6k*1H|JSli79z0s;6-jpjpJ(R@8btuhla0 z4@)2jOzIFbQav+PpLL!Y`#o>!K$D*OYry9lrL*F+MCL+JolE!#cuQr zc9keh*VFTS-2(-6=u#H}xa@qW+P}9Qf$Z6y)6@IC3f$Gw;LW<84u_{49vU{@={R#; zJFnisM4m?-z*=<{h?!=ZRm8b7XwTwXDOI5}k)SF6Z})%U zNCEu2fp)v%2HA&ure^yTC*u3})U3WYxGV=*MeWpDXX*xjZmR^fpx=mCT4xsf6*O$w z7C=sarSuz_`Y`cqeP17`teJ)V7#P+C@wjnr)l@kISGd}?6cS4PTUKStS2ju6PndaJ z>}=Kxng1kN^am*@U9+A?t5h2uZjtcn9G8An84s6Npw+W-?oa zelpSM$#duNk=p)}QwvZ%42GpbM&krp&6%sDLwb9hMkjh7U~v*;HZ{-gd@ceF6I$ja zPX@f<=K}{#m=fI7+TN*!8#GzgB?w9A(lHHrb$2aDvenGJ0Gq$J_z#$*O@%Ip({EMO 
zkN)1=u3J~5p*R=OrfB~UYKRQd^EUQfgG6V@ir4sv`}l817r0ws8qdR#n)A(A?`|L2 ze5+;KCmAB==VS~2C_ptoP;iz;%=&WZNT2MCTdtK~Wll~)h>EIj1K+NmBh1l!2W5v*wa?X#X}-z8nm?s)ClFlk(wCHcCF%sWlvP`->5~ z@Z-zm&`iE^@+cJ&^yM0|N=eg=#PBUZa$BQ(|4wjxo}BLnuiukYh3GG8L68RwWTnO$ z@BBTZS(4Y99-#I9dn|}$KRKlTCYp-akW!hl(2=lU(4=!;X2lB_@(`_4t2L&n81y#@ z`WL-+$%931p`RtI;6AV5+pnHq93Z*i<*0u#Jkf*y5*dl^q5c;qKq7uF{_m1dX` literal 0 HcmV?d00001 diff --git a/data_ingestion/image/athena-iam-3.PNG b/data_ingestion/image/athena-iam-3.PNG new file mode 100644 index 0000000000000000000000000000000000000000..681333b5e24e6aaa71311970f7138b930b3ce5e8 GIT binary patch literal 35435 zcmd43cT`hbw>IuM9>LDB(m_QRxZn7VZ;WsKfgAVUYpprgEYEz_+JrpRRy}o`?f9WX zhfb+Ixc~Ujq2G{)4*jxqlmYmkFFFPuz(2pZKUTeWsHBTy4*2Ca+q+tK4;?CvIpRy!Z~Xf6`VY;QB)S;;E%~>p6MtTd9YZnyd2Mwz zT=>uH)~hVwKd;}awTmOWyxP{#!L4sEN3xB}5xQD$1ryqCd8f8C_7b&=L{PHF?FR8Y z)>h3{$t3mQHMnOYMuao!`N;T9&ZP?b-mUgq)$&tTXZ6ij( zE`3(0*}?m#(nMPMIyZ*JO;uDPxXH|1SEPKRB=5Pu{huUb{v)OCFi(8^MVx zvCXHsep~R9>>nP8I@R1OF2Us4+K_+hb6wD>#7EwpZ>7qeY<;t})|*T-#<+-sofBM& z4+1`mA|pN9wp-kzYtJemJ%VR^=1)JDd`_gk6&8}Qy0o#AtAf^qnNj>eNwl|pmN`?$RT|6~Y zVYrUlZXpI*t4e67_63GGli&Dee*=vV8I~K$ZsDX~2q9u+CEo?HZc!dqSELYYtM0P( zt+7e*(X~>_|c3E z0!_kk%_DiDN3fAvRy*_M>+2uxdNgbG7K>6h-aw zi?lYnfZ`-jIML*Q+-mqMkWl-Jgil2wY@C`9|HA5wwdyl;^si3cu_rCU5y}O-`{rUHblf(Wd3*MYRd)&DsO$=U1b4xr=5U0m^Ihti7DJyK9wdv9=p3T|-v%;P~|9$?}?U2X*j_NB%nI+Z)<>d7LG&)xE?x z;|0{4=F8l&cz(>>YAbHlA+DM02 zv{tOwl<~-YJ&pfn96UxG20;q2{(ldU!(}D8+Eb;o5>R}druLmHSQES_3uPszgMm2~rUy1V3{T;jVAlpK0JK@zYKdOIg zsxjWPeBqU(Xtt7K@eAvsnpj2L_nK3fIZwh=s%W%lQkqljb%dt+6k1|zekqyPyrnla zf-~wj`SN*ghjPYw9gP@L_VrXl+s|-4dxcsjZX4=~z`B_@ChN?4dxK!hv`6X8?Vpp3 zcpL{fevG>RTWR0d*HXE6tYPU@E$;TQ1BN!O(?NCLDNWtb@Vnp74P+!T|gk(20|oZDC`i^casK{p-F&Es-ayP8u=C&VLa>#$d8e+i4qt zV@TKhP{hKLAae!v>)Ppy;UV!D#u4!m?u;&QRmt7!it;+cUQ~i)SWHzd@z}EJrPYNK z8L!;S7Tg6l?eni~B}%4RQB%!(W4&_^W`3woN|^@$&G6#-N02O&v%kN-&wq95Av&hD=O5rVmtoW0fB$lP40|*6LN-Rvs*is?tOG4{k=56;vX64l>re zIcjB>ry;@+F!**`>C8_hNItNiDkQotR~G*eqkXF}T>4^PixJ>fZ1pon;aWI9BLoZ1 
zh`I)KEYBv4$NEJZ_Qo(&FUQOsYwAu6J)UwVgavKTwNaq6Es1)`vdFrEGJJ*glkcSv z7FG%o2`cg#at6sQb4p{q)OL^L=ZnuELo0{Y)JEh)K%Sc4E0Aa96mc9K(tjxYX-v^w z8?p{8%9~+6OVbS=42KI^Ro zwFlfl6c+#T)u_sBN<+4N!&8kZOTN+u_IeO^b^&KfxVHNe!r@n#y6p}fgnUkkt*XuY zyx7SdDKf9XJ2O1QWk)(`ITqBsM2Fcx`-6=)p5zI|TTR?*TMzTM=n(zv##k%dvT+>R z8y=J3xAgTEwUXs-zl-x+~(${&?F^I;ZL!gVx}!2h3p33Zyj+Gi&C4E zO0^mII?%AyXghGDI`WN-F`;&#{8^cVQH$FK6PdjsKa%}ms&^uyzifyFWS9QYb3Sv1 z_T553w$(2abeqQYHPB~3$^UT_0_c2)!Tn}dE3^ZAh{K{2%~?Lm4{3O8M>P?^hCg^+ z>Ntc-RK}h2)HYWNYO66G%M>kubu313Ib_Kn#%_rDl8lMk&}{OZ?kR_jbK^30Yfm+0 znQv|mzvKd+V`g|AQ$4h~AOW3uS2+O=myXzy&DCfO5YkQ*!oK4tn~D3TUlR(blAa~S zSE=&;ob;)OgBxPd;`8#mQMHa%Z@1_`h;damxuHgNQ=6)mA-cC#*0uzbw69wr)o*T& z!bLLSW}dw(9k97!+s^9twWWo(G>m3kbV;;`W0I}hG@D~1BH{$fY15)nGoV}p>LJhT zshKy265j4tb63Sfe|-32RCN|*IM`Q4MbBPOt{!G(E?$0E?eVJ0`%Rzxz~xN;XDYr7 zIV9`M#{N<3pp)!;>k1R|8O(JNP6&x3BYA|R9wg)huS(bJ({l>bLiGp%5A7jBl4}WJ zjL_}x-I6sj312o@o}!=zEe3q5geL3aeM{yX+6kKh{a1>|H+87$>+?Rn62l@hAc0l6 zpWvG1(|HNxOT!&?5tk0ot%Q4lqp8kScT=6%t`XU{H)pR<9Eg#O>+`k-dEnowyT`NiD#d>NMqB|Ek`A+H1sI|O!Md|{Cv6?~? z9tL+{r+g}XmQKdj6|#Sri$6W5HF#ec!?%;I&c;j!`dsTisn z#&XBHEjp82*;QcZ%ZBQA(9L&WbVPW}B6K6Aa2b;|hY!rNPb8nAn(^TJ(4jm@5ug-5 zXqRIDuavL4Uij?Cwyh2 zjq>wm;^AN@e2zw^Hh>nGYC^s89L0GHa0P9+9N!6Mp*SQ`Ve9+lzKmY7CPrF*`QX7P z&tfjnd!N_s-WaZN%~}|))KJ*lc7N3svIfJFPaizRz-zn;bXP2NBmyxoU_hJAidt}n z9!)@B2qLWO_GT`4I?l+pgb3 zW^S%#AG|H{JOp|MZZA2*S`GamhSbhbaN@psg=D6T{83OGeqt>~}7O{9QJ2Qscn+o~-m{ zfWE#s%s5?KJsy*r;cw!wsp;nd78>aC%)iJ? 
zYB;JhUT&FB8*yne?_LSDpYO|e+uK>+sx=0e5`XGKVS7bs>m5a76njOQl{XkLxp<+ML7K zRK`PYzooRz+wJbAR&Ad19d7Y8wJN&3n8QpR`0BOt*8=9pF#}moY>#ya8_1+^W2D+8 zN)RyxyGA(~CXiu3A3+ag2nK_{>36TWNqXXgOVJr8Ui+#qR_YAYxE)p=Mn)O3qHqOK zi+1)NbHeIrGVUnx+1c55Y_0kihRT)5QwdF5OZ7_V z=j!|!*Zg}BD-Y+d z?)*`rUrKoPC(!j7YcMQfSRsBZol-YT?};&mO+@2{?^PZ<{m6|O0m2E1S5SL zbjyj+6xKJRRZRXg_H5qr5zXvHVMTkQPe3NyW$aDPz>+klG+H*waLlYzXn{hTNj-09NlrVzgE6^Z4Kuo0 z#WP#E{;_CxmZxdFW>HA$Zzs)v#Y~!x)!D?97mU?=&j>boFONT@<;5guob#dkbvPoH zS7*345Wiz?UlPi89VFv|gtOQO`Q#*a>BgcYkX#`5VSm46td3%(R?<7^KCRW&UGb$t zKZhObirfUztLFh>Psr4AP8`H)u2&FEWh~HqRKE z(ufd7n7{Cm9vHW+!ogzFdrv4tIvCdDSm=dmd@b${AA8 z>-9<|^l1q%kw*$k$3(sx7@=RS<21@z?CYD5aONFqerYc8C@WY%(W6}@Tv(7DIztP%oKX!6S$-CZz#^j~2jLS#p&3k=wa>gd3g!s!K;7}s% zsZJQ!09@E-cbBO?*gD=+%Gp=r!e2ouQQh}U|HYq5UKkXmCpm=M>POew<CVf3i#j?39;igf?{+?j z9gwN#Frz?&;Wd}CmQ}Ht*9_sf(t6FndD@s4D?(x9{u;>J!?!}2F%7pU(kbq8d$Ow2 zA&^&?sn`~9h+*oJj?#RmOh~4>yW9!<-$t>P+zr!SgFgyu3s|q8XzP@o;1Eb>7Ub^C zLaWX&$CATMrbip!O4;;q`#EGk`X^<&Q|E7o%%#)*+yIr4dv zRT{ZR03n@y@L$MjT;NB6?M31ZIvnC6Cf=i7SDjJ~UG#oHr1TBg(DqT1WFtNGzh|#0 zd}XsCMT%S6Ic|+K5YvE`;DULY6zS_@x_tB>$@nbj zehf*{Z*%P>dcqTD?oYF54jO9ONJ79eTg9Cz51e3}GHzcDWombeZFEQaMq`Dw;g8(6 zH&)-0vU#HnbGw+Jj;Eq~FizDt_qACupH@=>YQ`!|vUB_OS5bEk?A+HfzFi5-Mpip4@dAjIRQw}sfUz662lzJ>Yhr+-BhbY*yM+!?0uY7l_@ATs2 z_5O_^O@Q+TnAk%br}~T-IsXlAzLBrA1B{mEw+n`??ePK-w6B8`<+R(4R69}(qQ#fB%XD!tc(cNwqlRUJXF+Z$sydT|dbIc+c^ zlE?}{S$T{n3$U4NiE!orY?<=dfHkaKWv7z?I)E8#QH?QL?pbuv;W~EGBFm;?5BIZTWi%4 z!%Kx>jAXnVjaxTXwXc*b>|Fa|rZjJ1ywjsv<(g$Jk<#?|se^qgY*j7Qzo{%&o6<@tG?2O%jKBweHODW!TP}m)G&aIA!RI^?z>>n3a z2tq@NVFSgeL}#VJjd?1ua{YCsOAIwR#SUZ~L3j>PLN(kNiJ;-(Og(9ThuRtCI3Daq z-7<1(y9ql#qqyhe*qHt1087ex%)WKf)Dwd;f;%rr7J1xx&iNFSMUXY0@x{w$X-2Sd z-U6LUeCk&vL;lR@#agV9?dd~EHtZ6IHHg)P<4&hLmzbjGJp^ZpO=s+U`<|`NKm1mC zo6QqWv(Juf&K|KIu&6Rj)E#0-(D(Kjsqh=$)ElqLUvnAbt$H!zoLV&`K}uP$DX8EC4l+91c-L%Z^RhYZtpzl~E_z8fcbmzj0+#Lecku!z z#a|%NlL9N4@ifQB&ZCyJM>@~_(jT_;^p-YygbSt{KCQJ^maJ(YVf19VlBg!w#RuiE ze^1*M2K!F!1oMF|H?Fs$J18FvmDH@e$ONVeGc6%R<9M(Xn5O;@^zt5bzc}o`;D(z5 
znw%a0RTux?fUd{(o}Ypq_Qr{y>vwO~>>p^`7B)w=xVT@w#FHvrCKR@w*sa~y9?y{Z zJ8S1LKO)j!e$>5T@8nb&Fi@VG%b4kYMSfzf;h^^a<8z2D%o^Z@`?KDWDqW{khCW&Z zIz3nzt51Beu%0)Q3jev)DRQgrF)xR#w_f5)6s|3n7r$6NQ5A7i^19sOPzy(S}FIfX4 zEW&L?|@8T_-9ewUQ~ z%xhuLf?dr{*5Kr;yv1zymQR-||Jqr9CY0Z=eamBs<<|dmG{)Z>=ZF61ZT|mL#^M98 zhR$QMe=;LSFY5)plLbolNN&Z*pKJ+0nW+DcGTcu-x4k+P9^Sta{ov}^ke~m2oxB3v zyiwPZ=-u7g_HXUP3}7dLT;P9wWrsx+S)gh^b>c5Z=I90CaK!uz((Av*?2G!cQNYCl z{+qSgxdA)^RPD0RwLHIfZ~nE*Z_4zW!y95uY%>4WeCjx+3ZdOoVgH!){VS#1JoIT7 zv@joVLiBO?wSn2vi-PIa=k^G05rRPRG&PO-}B?Y$=SxJ8%ffa51l6||E(GN z3~vfq(%(Tb-@9QLe3~#=?%hZX? zvy}mR{&dHpr&2l-vVM;4eRe&>T#zaST7c`XAqoOL#2vAK=`)23zfym@Yir`ttCM!? zhd(_zztdX-p3;Z<)4MGY0n`ofO)Ec|sV}aa4bkqyiLEL20gpXoU8^g=;UsVt`)leT zxIbHYo3}DTsc$PCqP>F33Nfo(`0m(EyI4 zyjy*~o*rKZirsHaacQ=#^N%WkHJ#kQtN0yx?sA`^$rYy4tXGuo)jfhQwhPvO=t7?zW(pe z(Ru+^C{J$G`)tTp^_CK2={g&g2IsW?3;ZrA++mRULhVrwF{D&3sDwMq1)9(GG2*=* z`qcNDK}G71#UVMYOt5oWt<3W3>noVI{XCvghziEo2OKA(g$02Pj&=a-Mn5QWG`qsI zLC}mzgM-K2)Ci0{VL``c94;ll%5E{jx|7S@vg*?x#a0Zqid#_tPBGf(-c5=|VPT=h z#hXT#nsydl@Cl~Q(Q2{R+sGJ2;X;Grm?*{FK|z255xMmP8T;U>frJqBlcd;9Nugt% zOq!9;bl@LpXGDEQaVOme&e<|tD;Dm9pv{(4e7xKrD$M3zv zatD6*`6xNtb6)Zn_o>MwLlylw%`e?hlfz1U6|fI}YVxC#&{h0etGYyS?}AQYn$Io^ zf8`7_-JzXemtXxFznV??dQPX6f&*vd0qZ>?g#y-_L15JT3cy`9*F1mn$YXc%owU)E zWdQVv|J7DW^=;@$_iKhsgM`?G6Uhpxjra&DqvOLYk~VG;Y=VX!rsu-b?%0e>p06+* zv1d~r_af+iWV+(E9&S4(SSfT<4R9@Ego&e103GA}o%TQV` zJsDa?lw57x-6Sr>0Fwzkv2^>)y?FgWBXvmkE6sQ#-LDcBCl7y1b55~=ur_L_v((Rr%-6wI=9xi3fpW%$uC@{!)1%ToeTe9R7qPz$QqKSCuhq9u`f zA)e=dhMkAU_z4;gUtS%PK_yTZdJWmv7e{!*E(*rDPN81618AwkobAw|Co7-)c<<#+ zw&T!k3!KyZlrYDv^}zvKcW{&x-bUz zHxCI)sy_|Mn2C!xK6Szjuwx(`lrib?k%|eg-Nq?2eQEDpUH`V*yA@y|SK?5E!Tfvt zJxUXg3!6`;3S?y2d{SGwdWU8{H!lhq-PJ7TD1D;2l}#a}x&)mp#-kJK-M^&CdaC9VQDDF$Ki-%kfs}DDH-m;}IFwsezeLreKjK z)M{(h2>G0*qRtU=&5m6cuds{`Ih#)2Ote1s^zf*0kFk6DbOp^!`DUawg*IQOc*|dN zsjgNVsamUIrk{^Xnyo~{;2hlC+*WHwC7!^tn$x*${Rcx?jAaGeG|MiCnaeKPy(>jT_a1MFyT)$n555cg*}cmK!m8h^tc zIiCR5kwjIvvCYzmnT%79n_7{m+yrDGMnm{h%Ubf`z#Fr3=Oq=lSI>y6B}NhX;t>r)aK5Rc7E_n* 
zEdi0@FY<95S=X1T-&nvj%{o|4Qf^Jd_uYu;wk-kwt@r2|0x2lj#qC;(feNDZ9x(M! zCkK}M6NOrUB&s0?}{BdMH%UFx6C#oaNd_doMguIY(_k#r3 zM7VJ+#>Fz5^P~S3b#YfLA|hg`@0UnhSp2lU1#lAb&Yjxz^rM%p_(e>94dh93NxA;w zN30d>Te6y3&Kv%GX8ngv0@dc3Rl{PqjFRA(>d3(=lveqLChp!Ft~LSu(ZR;hvdLe2pu zSmyhu#KAwPyou#&t=F^U1BG*Fy(wbWp)QvBnei80SoQ5EL7v;&agOF2HJWwgi>y=q zGF5l4?<_eDQKC&zli#_QO9^8BpWjnE3K@Vl|+#eI$e#_ua#}G*Oxg z7HO2UdWK)B{xCx_UD6$cc+Jx?~7-qmJBldn})?RTSCvU?$wO2#RWz z?-cABG(}Im&1D-C{hE@{%2)jXqB3@+dQ?`0jV+@bjg_t+|?kkohQv7YjJYhqE z*R%k$yz}K)0qrB}xpqjI)lPMYyd52B-}076##m7~_}jKcJ9w~qp=%FTwQe-cB-_DW zcb0#BT7i)Wlps{Kh{TTKj_$0w<~+8?VxCspvDINa;pVPe;-!r6CjaUklt0EUfYo)W z%-P4Lr|KmhZZ{?*T)ZG;RAK_C&%o9;7b0d1osL!=&8LZWmTfIJO{Ay1%p>(zKjv!k z3?E(9%#Gbji>Y45ZgMkYOx!y1i1>t-k*l*uy;KaOZZFo@!tKDEafZufITIB7C-(S; z`8V|VmHlXm4y>-CT0Z7aqebK!R+UrLWYD#t)V0+tn!QDg+PTFu`XO9k=ab3cF6b)t zYx{mK!4~nZ$77zy-mFf`9^^76>>AklDKBC2YK<*1^ehg5e|~*CD2#^J%CafFEr{Aa zh9j~_cC9@VE_v>Qk0>iSHr=0|S3|{jxI-`BgoI z#hNEl%G(7~K@)a6wJ1Z6XHeVC`#zLEZhy{RB(1E{&tX60Yj0e>57bu;zbX~ZBjog;xfUR}*HCE&kPd;Da}S4zut6^%mvNVD#W z5^oI6-!sG8HC1%{vZTMM5#>4?aOk=Yd?B$O*fboI;ZqYsDk$=BH#TX=QYhb;Ghzr^ zY@?zLmf?s8ST`;H;5^qL`r5^RkwEghHeRRK0N&(o7R#K^{cPbDc&dLv>t?IF8+{q- zS`70UrA3g#N{t1U=v>Kwo1IZ>8*cLgsE%QldCv#UL~U?8WUVbrxMo*OdlGO zgp2plYLJ}#_Db`g)MEpUkSYf7#oe&}3Onp*w(ePR6b|`XH~X_xEk{Ic<=yL&ioBin z>h#R=>CjgRq+UGIpI)ypjl&>aM*5HD1k&2jv5BelQtv$!7Js&r3+#o-f#G#i6f>a_ zT;Njq{K)}`Di-FZS+?P(A~9^6V$9|~66+vF!%u0Jvgitq%W2qG*DExC4I9ZIZ51aO zeen+q$)QoUUXNI6AtWez0v7?`2JE^I6CR2}w@EEy*}A_=BCEDPbEWMjv>3D_k>Xtv ziC+ylc|f{DC|tT=89IHklz4=bgPVTrf0FMY!i3Nu+fSBd2?llekZjdIUBDkx6tY-# z=7J$A(V{Pp2GJ4k%#a!LAJTGv31;pyWOmO@#Y;2TL`;K}Vz!A~0r9$6(D zRV;)w2L)x~jf%y)F49?!X6%#m;Z6>kso`#fy4Yd8(7x}h9m4Uu#n{RA(-J`8kd4+% zyVWBVGAUu*9@mYKEr9zFSiR?dy&o<-(GeOxaxHgwnq*qBq+&d|TLi%ayLBB`eYR?D z-<84g1=Q&0Lg@Kj4nGc*8o%E=tWW2Zo=&0NZ(<3A!blsYn>v|ts64fK(gm?+j^EZQ z+M~DF-ZvtjPYF0}OcuyGT6D9X43EG`DI`7d*-FW;uBKa8xVTMJ`np+n( zaDj)BE3)q??Cqi*mhBg77YkmFuo&hFluM^%t?X(D+43#!s`E)U*k|V^=I-wHhElir zpZb40=^S7W36t1$3^#3Isl_kZ*bP0q-&LccPqv@A&%GLD 
zYa+}laX(2tKL|#z82+RNKD{_heZjU?>Y44bn-?7im9ZpD)^79;SJ=fu*o^BIt42oi zfl6%X^*)}sY9slRDCp(t&}@0gRB#?2k{mHJub~k@d(^T^C&O<8@#0ebv;DH-Ymo|I zo_C`VmRIDFZ=9zO*n+?A!7mrm*&yTNtYA@c&$}~V6JV(L;4N*yDvU1xwRzT!DPrUL z_zBKiZmXYD$k{;G4}I1)P1^p}tWQu8*V@Ltyx&L%cWdauHctO1X8_j)s+;uI#EXXx z-NEmd-Dg9LBjl9m=Ohw+0N#qxx9i|$57FTo0yn+i4quyLErXu72FhfP|Jtnx@^}8b zz+ONJY_*MdDVP2MPrEXmC5fnCqT6$&5Zxu)H|Rv|0bfz=JjO3CFaOVHwcI(c)2X!5 z0eWuVFN7k}U>|3o?nsMf1{v4MS6gbp;g&xh_Ym_zhLnF1889YBK%HODPL-I~Z>M^I ze1SRK;A-@T?rHWC?Nf-t`#fFcKHK(49{Kw{i@DBCfC*m^Ra+~62~cgBYrO}A5}S9r zGQdZbU1?mp>j~^KEsj(RdwlvYbRHH@C&&PzM4i(SsHFbcRzJu644G&CWCjlyy09?% zJC*;S_Fje3*$3c+{wjyYKf9n`uYQpM?g2!~U##3i#p86UNesEhn?t$t-**0!x%n67 zjphEn#R>28T7S;ytAF@^KN-5nWA(WIi`*t)2!Dz6`@QTF1@r@88>A^ozF|7in(6kH z!jgFh#`XKXJdZi=8+yTdz?V#rmW9Lm4n6g^as?INS_&?E?@LpDwwCK$$lYo39YEr2 z)t%F?=idTZcQ{Z6cm|y2;L~)DE^tzOD<$c{$txL*z*{%v=YdWcp|DfkM(C|{Iu-i0 z=zdAdeaqiza=(Y%x14*_q?9T9K?-UAsDhKu%i`Y^A+z@14CMlzFBC&-l0qzoSjJ21|o0ILH;zH!v=ME-qcmo5*~~32fvO?iR+6 zI@TBTQr2xtoDf!i2kt)oZN@CBtt~JOa~O!%K6ADH!S{f3g1AEwl?@a7Ce)!m4b{BL z5REwucvb=z<@WgkCBU)lwf<+7H3u_|{4Z6uZ^rI`T9LPKtNy(7uplZBO%#o0xv#(k zDB1|xx4#a)FrU@aCW6z1Z!*rice)yPK>3?LT{R3-%M}~TD3ifR_oZHUx_TZ~kn}a) zFYXiEs%BAVj(?r`1)%g3>Cypq-?+ECMSW3FIcuJVP2he|ny`L*KWrF>m_>zry|n6c z2{4u3to^6QrUVv0a{$i>cl{@o%mcheL!iNlS|8WQRK>UME=`GM6Ym-6r|En=vgGP( z!#AZB1*Gdd$8Nco-wr63GhH44?-e;pq3f?4(978k4a8Dc^P5-K>yn!k+*aoa)_z#j5Oj}QHYHQgAnNO?ZAwJ-$x)!dfz$w-kn1C91gQg(8`KK8Fsk= zvOheP-#~Po1Bm#;kAS``mpQdgz1nJhPTz#!zx<~e0eR~mW^~iHDOUoyhI%iq<=4Yf z=QF3-#Idg%$@A!XF7J!huy_x^@G%xQiR5k>0V^0jL0myte+xKl4W3Kj0rd9ez^j%> z*^Fr2;_03+cMtlCRq=KTC$Xavsq@D%(n9k~0#TVXnVg{ru|llbI6t>L`Mvbb@@@76 zW9q3e@OWstxQoQRp8Tb-G1}*bBriV#hL-weYahbD61Fhg&#JHRj zqINm0%*zk8`_PB@_y|#)8uduX)py@M3gL-<@~82sN;tK}p0fe+ow^EceoQ5g_-E?a zZY~!qx-GgPhn$ZD@AQz-!H7IeZq*blJsi^+Q7u zMroAh*cG({J6{71OVVt!&f5DLkpdFse-@Cir_d`>x$18d81Snq5wJWAsDvIOLV+l# z7Ff4<43(JmEJ}D!^ysjCiQ-d- zk4zoxdrwRQ_fmqL9k+L4M#yNmGn<+J&?~2v5q2c2($UKj&-8|TL+;_)Rw$RepkFG5 zcG3_tVR* 
zUoXcg|4b2vE-r|FHP~S1aB^$iQ5UyFhsKH2@u%RTz)ce+=`N}V@um5)cHv(0_3F`_ z?r&rH((5mm`-qyju*N3%>j>@I%M_|sxVT@b1fi%!kUL^`H5xiKoKX(@$fXM9!FMoG z!-j;&gc}xh)s3jmi19Z^h^MHAHH~vqIZHcbt}VNIt|Lq9SDhN^DZ#_idO2{iBvw|w zMmrgIcyAK|Qj8$_uGJ;Eb4E3m1mT#GLT7;b#W*AQ%}}hV5P@tl>Jqff;|^JvQO??v zSy^(4RR}hazlif(VDPqrxStt7nF&Y$^rY%7 zv>#ruo%F?6Gxv&BCkmdY<#B37Ae{TZXlgf zGoQUU315S2=vu{&b)BK@_2#==l8tFwQ{TmH{T0ZmZo@f)JIdPH@LDrE208Bp>yKgZ z9D!)Q+YW8&z8zZoNXHI_M@Xyq*Hcs!jRv-(Jl>zJmZ?5PX#T@kDrrgX>8u16`0&W< z8#}mk>)k9Q8YF(vIZU$leI2QdQi|pQwm=c>;~HPhdF5Xtd`LJJ4O$6bro+>1$y1qb zdgC*h-SQThG!jFMdsNbL`2~F><-C1hmgN$URx+B#KL~pD(BQ*9blB%=Dwy<&m6s>e zsEk)Pi_8Jgr}dUq-q?<}YAh9nVk8pH z)Us1Z2PcP=(wC>ZR!2m9BU5O-HUl}3sP4Dw*<6EU#7(rWj#KzabpR^4h~AQ8S%)OV zU?wEiHFEA&-Y5e?w-g&0b~<9q6fs@{4Xup|Hw8g9srwl4Cwxrgx~Wqcqbl6hLpv<9 zJ`9&=dcx&i&Yr@Z$c~=KZN6kYU%o@=j4NllXOpClu$@dMVs|EQ zX4r16PDeG2UvPl#G-oYd>)*$SOVifiC?%h5#lUS^P(Hr3cIh{?DgIoEzAJ8}n}T^> zBZYYJgJ_*ejb)-XjA8=qobj=HIwReUG2Tsfd>-8Mqe`z86p?n;qp55exhsC}>S43~ zosZ#Lm_fW!26)*YBeHdTT$r#V41gp@{61a!ro@7^1lI`Ow<@ zPSNy}w4C^ah}s_ykZ)3@IcwYPz_U5XH`Er#a&g6KAk{TR)*jbH*6g+B|2icgLOaWC zubCxW(7ozYg}AfZEs&~|V>s6AWPKnUXfEJ=^4_K6d)Ss_?pi61m(URYj)POHDU{Vapx;%WwY3h!H(Lp`)T%atpr$OUiLeJF7XzVpkut&J5(- z1h>CX9$TMNl<14c0xcgzg~mm#HG!{J{W{p8@wAE3uTZCCY2_b#UNic@r*kO7XXj$e z)2R*bimJ_lPr9nEb#1gaF>Gp6`mfAP&I!1K?hFZMRG#m&>+N zw}90u@cEz1>j=c4*ilL>19^uJC>#WS>I^TFQ2EwjM8J5WwqqYi+g}FX8fqL`_h{x` zz}<;y-w%Gk7byPPB`rb8_*NfDfhyQblK#METbnK+khB{IB!Gs;f7J^5g4KqMF_C%S!d&p4VTwKjef52_iziXWeY>9Dt-RP6=03c7m~Y@e zA%6hfo_A8Eqgk0rX!yfR3QRybRJTSiLwZn&Ge!gdfLz)0KM$}7E6hMOwYjEYTyzkq zfuUC|hvktBdG@vRqTr#ye^Ytx?rbgoFDuV|^?Lz=9*yTIX*qgilKcly-)j7fSh5S)*Cgt1n9S)83%dwE6L~1(ahfL zpXtbVeF@mT{3{*(r=kq_i20)O|5!l(8#i%xiueRfl6^8CC}!Pvp<0N(ymVten)yN~ z@AL(6WK4L;db?hQf~Lq{xB64w$;LUszOjOGTr2nbn5(GDo?`D?VJBC8)Ve?Tl4l`lX)n3tU##^Wf=3rI>j8` zR@u_OG*y7HPRxarjGE2Zu_795gOs&C-a34e+k~F7x^TN!GzJ#J-wczz=04?HmVPXcv!2G%et`iltt42xm10ng>U6#Y4tUWV_JP-8S)ix?Np$ zdI|vqht+So0rTwO&1!9|kBJI;UaZd14A|iRISYOeuo&VL-4I3_}vd_k2H25R}y 
zYk0OYH<3*ro4}E{tTiN9KCIpT16l6uG}mM8VO*?4?3G=Fg9l_i%g);$;NFQhfW^p!qQ@PU=OjzY%}ydb)euppy_;1 zqHG5UJj~#vvF52to@Ub&1DjW9eKelm9~0wl2uF{K7KK|+8f?SY}gw4f^|wy!1?E+jW_Yf=hV^^AAw|_ zXioHB#|8{t;t^se+`Mg-WCl`vlmc2zNFQ)AsX47zC;Q>i`f^-klFyB|^%<7l?za=!+4Lnsz%O?XP&h=TL7XIXEV^45QtYyV+js?84E z_fY$r?$RjaaNK#nC7#oN=I* z8Gbwkbro1WUE+Y}Y+b|u*WR1PC7HJU<1@E2)l}-;w7FYTS!qh{si{q+R&JRoxiaMz z;*uzuf;iQrEiRP{xwe>^B9aRtE>Kyhppq*JDixwBqA3a@3jeF-o@SokyXW(H{_pPV z1ux*@y3XY|&*M11$MUTvKk$z;ajkmw?D<6=|F+z}yL+V+wb1qfUXgeQ;{WHIzcGMy z_^$u5;O{USy@~_dgAj1ZyWQ}m%A)pFtz4P%VC$f5XAa`YRE|sRsbWIsyC1PnW-l$9@?{ki>*;RMrAgZ;pVvP3A$$_>oitBU`ZPFFtwq{&fj}PC%WkJ zJvvV0ACXvpVgi#mL+$9ZN`-DH*>)a)_CP|W#*)X0wd}WDDc!69BsAmDnw`C70r|YA z;K=ShXkpce@WGPvrN@5>r#llD{y7yS{8Liy7~`JyGIzEtjA2u623GCCO~`|v6vQPq zb!Oy-@LLQ>q6GcglUixANp@dFER3Avb%2v!4I-!F#JrxLTu4H6E&CC9iWekXxNSr$ z%XXhLitn`?;4VC`r&RZPG5<*G)iTxw464C_J+2uwTA$Gt5dRgz(iOyTjF*;SQI4(I zNWv$z{l?KQ#B0LLyy?kbWobA3`jTBbk1GR<|`{^);OB2N`Fc5pZZkMi);VCKQ-$=*xAX`0f;Tsvz*`jE zuP^GWxg;*zrw*YLH8Gpo>Q<89EQb&qqBXdVf65nAlc31T0U|&+TvQi&KBKx1i#yV% zTXF3Yioo#tWpZ>xHwjERi#|HJHe^&fj~MIX17)^SL)rF{YOn!yd1Zc$AG9#{^P=bL zM&a`#T)H8TY)gjiJl*XPI(>iN}f1+%Bx1}#p07Iy+#3fB?#Ay zn1lpP@7Z|Z(*?TOkFr?T!b&Sv!IB~($kDJin*b-HY&yDNK*>YkUF5hzP?Dv2I?^uSf zt5>FqS^fOYp_GIY%!iyOO-L>0{WD;8Dox zr)q-$FY&cb^rz`e%xt@<(hldl&HtJrt62obIC|lYBMPj#8#$``jjbAkCO{4y8Fk&e zTgMmXjRPgSv7OhHa%Ht{M(4XWCzdQ=a29-a!`0*wPupIh^} zKj%4apY z_?I6Co|phB)?#=XC<%rClBzz|R}&L5PX!lDmE<-4mf`c-kX#qxmr_MnpLAqD4zFI{ zVB1E;Of!{D-sY6&e00J-U-!p6A+`nO+Vyy_uO?(lO_IJO{-fl+9EJ7eVrPfb(xKa= zjl}^^6FIuk8AVnKKy?CN??=7S$n0kLVqNo79mtv*f9|pcAQrjV(#JtIyDY4I)B?NF zwUyG}O=JB(ADa|)HwqAwzxKtyH(g&HhukyLqo#HvSQ3EfR^SG4$9FP1F?RV0C1vk! 
zM4aqyOs4a|N|{^YI`3IfADW+N8`IlbImJAB*cI4^bzM2uz38))k;4MB`qWxveOg06 zV`l+JXVVS_W@=EOVeMEFZ?l&GM^98Cub$ywl=}RY$Y94G zO*mnP?PMw4G*n5EU;`POGzxVYR(J|aRP23*$bIw^mq5e;I|ln|*J!PPkw$W%q?}Ei zCsdcbvcasHO~IFDEyO1=9B%t+m=Z3s#^D;&;>oTX$5(mY;I!V3mN!X3C3@1v(O z4t^2@$d7o@IRk1Mb0$iq_3l0#BSgJ^Qt>VG($;1T}<5QdDzqK0@ z&(Frq+za;FmWvU(uK(lQhKdvOKvI+nm;XMbCYEAMWO74 zXi)l(W{)YfpkfBOz$ePRN z`L~%gS_h{dYH}F(IX*@>Ym|l`;aS@e_YvlK5$rgv;e$w%I7JGKv%JfVKarL(H+K&#LhY!+h7e6gDJNy6V+X~f z^LjxGy`QccJfe%zT~s4Svru4u&7wK+ZB}2bHQ^jNvJM*&8q)B^xV*Godbl2ES5-f| z`{|7ltKcg7Vv}&lPkHS zTy^_n(hgtE(CnX1q|k4b`9&Y`Cc^Gbiuz|Ve-!g2nqiTvA$<>|XAn^W=X*p~d_S`? zZEEa7m<)TJ`tW+Ng`;{1%qor9xJeB%Yseg{e$p7D-QST>6$&qH;@tSacq$#;CRW&; zTrf8@5;Z=ftzAtn+poV#92`wuu9;3VBo{C{)8<0H3f7krsEbtKytMLSwUC`NY>#p9 z?sM#bp}fM&OXo{M6Jip=hNHh~?h18?LmaDxNDfeO!Wl0fy>QT&b2n=VsK^%qBmg(O z4k}=34E$UDDiy;7;tf5De$8qdw?dpOS}mz+V~ufQXd^tUpb29=o;1`qEfV9|gc)k{ z^eJF;is$(Eay@fNAfn^wWvtU$&HLR?o80ZE;Y*I=tdRN+Jv#ApU*byRr{k)Uh5Nr1 z@|kpwSD!4RHz6MW!KG|uZ<(xT)6wWzgvJyc@h}mf0AstyB!pVVIJueo}?%inGtwuJ^tE8quiyGRoSpnd#KB zK0NYh-KS-1UFO5j)br8xjKlk^3&S!`d#|U0zmL~BUz1!>{UO)TPD6c8<4)cKnPss~ zmid~@EFf>6IOUhsgr^f(unWp9RbKwAm_tu?mLOzP$`>wY)Ocyu7(gG(PK~;_?rrTi zDvvD`m17UrMfBG5EvXl3;Ccc=^om*dHVLhk4wgvyw98mW?uYjHQ7$vj7z=u zlQ^{9kh*bd+|bKL>@W2w`~1F}w4ha`-Zh`II+Yy73+>r9S~QZx%1)bfT zyJ#F)`NWs2$E>>*uSoNdS!QDBxs#$>s`I~-?aS~kd3gd2E4IdIxBb9GlY;w#xPiTO z!BCT>RDzu8;NRw)7*<(xd)*&R-38nUmkh1i=RMoIq$`1vt+QA{eN1Olhtb+V)fEs< zb1*S~t)j}^rEOaiF23l{r!!hyy{Ty4fswIijNsW=z+5SSJ5?NWaor#1Uj(;!lwVMD zUt%G(^6IwPknqZaLVH*zKgyG8_*nMqM{8}M8aJU46E7BbKY%UVTu+Kadhn#YGmid@2GV-& zBpaSqAuOG};*U2WVBM!v%`9{Z2uWz-2~W|R)f6|L=$%#Ms&0JDfV5`vq?0OOwp7FY zAF1w3v$S#PZACtJxGVjwv~MKk~N)?G+jJ#*_N<`u42?SX+_v2!CoRTt^KyWMthfu#F-c|lhGe(^ILIY?E!s(p^k zhVh^5vvdp$4f^ z-4>^Gb)zC&+lYKpR998A@VMy^pBG^fcGsaGYb)7gSssh#pmH1kLgIq`m zc+>K9yhC-8n$X@Ike&^M#YJYb7qdI7YXQ{lMHHsCS$eoY`dKCZl(Sz*5m>W@xxdA7 zlsN^5)OKbE?3hSb9GBBHfc3H+irO9()=ioZI-^*{+|Ra7x&#pm>nM(SODIwIlT)K5 z&C?GXS|r_X>jVffbWv)zeMNS}0dF}iQ%&${Ioi^ZKS8d4-U1m-RAmmncmP4fppWdC 
z5DL`7s_~FUb}Kghxyo0>)lzbECN>R^OJUGT#~6WY$dJILvA)5yQ$ViIxG`m8uKyhX zH^R?agyS8V<^0)t*5mLVGY|H1V->n?=Ezmuzh);?4KfBpi<`HY25xqz&XjL6BsViP z3&HA|wFjEQu*U~-fV62M%<+FFgd>LsH9VkB&zph zoAeoR3j4Q5(0lguu2AiBugHxgR+{ck^a0TQ4%nrLcL3xn!M_?;kEW^~wmwJok9g|T z0K0Jg^!tbAOL0zhG@ja7(FN1c+0B~!`>ZZ${Xirfq2zhK{2jo};o2Lbo$h8@FVwAl zj$EzC^QnJ4?{Tv5v_g+m!9&WP&CBfhzL`)-Ak%S{RYe2glVjBzm8T=K`_{v8Te;ZZ zf#Se-j+l{>@08KhAU=>shX8bIUzaA;NSC=9)!k)~Yg6HpJhoj{GVt7QGPB_^eqr+;BPeJ12O=K&M4Op?bBbY%1lEli#Z z@s@dh#k>HJ_m=64^UyTVSD>^D@yw&@G&lEGLrUjmdSZ5u#<)!=Dn6*cp&G(AZlhJ) zo6i>BdCaW(B^Ce7kbac3&i`45d>bzzrflZ4N~RtmUF$x{ zeYE=JFSdLC4Pg6Zaw|IgoEVG(_Bpi?I2w9bcC7;=egU(27vHk?(7vViD+$_OK|iBk zg@s6CNcMaJZF9@hU(pvcN&+`ufxLMyN-I%;jKwPAFh5zM`#zfG3OfVKDPa#cYfjgV z;+csm2A5IW-OCKDJj>!0JczW;_0u>w0!{)-U=ad}0)CJ^_SfkOCrlG)X0BOVEcF8$ zuPVRtEEKCFyd`J#@6LN1d#eV?>D9`Q4n)PbC=skFR~WA_6E2u2%L6O#Zp-j{4U#%7 zTq1cQ%lyePAYP$i0kHV>9a)b(Z>NL zOIxZJQbFMigmZUnb7&&|6P9`c$DquwJ+Xh5=(Nf;f#l7-16SdKB5Or0;?bPwf~QnW z$tOc|(W^fs_x&r!;;vUsGAH!7@qvU_tKZ*0HXBEMcOTB)+K6usZ#R8sJ z5#cxL?bvmk!{p9PZ0{X4$R50G|8Rq`Y}33ie@G-*cvS(Rl?i1KN!ToeK4}cf0Tr$E z-FuSFSndTpMg~~KWk2sSQhFWDkUZF=P`f|(*PLd^uZwg5N*Lp()5nWAhweVS%u|Cz z;IW^M1OTQ>-C7<50J9CBJGmR^#=S=trwT)XBGwlolp^G$I^yNg*^KWSZm=gen@~_` z)@v;DMU0Byji}~y=D!v;NvNksjEf-(pr`=gH;rv2jT3t z)uNoW2W?hvoXd|JgJl9{@xS7g1Ajz&b7p_MB>z`|MSF|XGTpRG>%{&L)k%VO%lZx} zKV*GlkV~xv`Ci^{OM@r;u$6boHmb!nsi1*+&s-C7H+p#>T+-vA*lR*@h@HNHGwqt= zP1tPS2_(uu`R5B~vVZJSoP8(LAJ})iiOI_AA0+XtU#W#anUoj z+g(KMj@0*}D5=f8KB35=>n8fz&re=F{t|msP8tWCAFw;@3>7YFil1&>#zlzPc7X4N0H8hQWK5M0`>ZpUPul|SXvEFS({QB;uBZbJark@6gCqMNbxV_G zihLf~3Oytr%RxZ!{6G9vDPAGffR**gk?(%9bC=`TvFReowg8kQ`L5cL-CJ>hU8*v( z))a>$Lfi_Z@TUGW{{uOmKv#1Ftr?>h*Xu7&W>^u$TD7_5mvQhmOtE?82TpoZ2`?M8L&udk<#FQp2BiXG2A zvwpj}>yOuGOrEXJ3FGQ5Yz*8zOy9Qb!M+~K1ot}3=k3w;{_2u#hQyk*N#H<@wuHpd z!)4%LLWz0(#wQ49p?6#gL21s^!H;N7#tiHT?<1L!4JaS?*97-<4;d+(ZrN{pe~;91 z{6@W$x2Pw#Xj|_6(}NW~4Buu%K6*lLlBrR=kZ_!IW28+L(Bs!Ebi5^0<|}t?8V$p2 z&uFs@U!GmgHX3exP|sd^X8i3hqT3ejYky)Ot4m2CGe4_Zhv37^k%3Tk&Sea2AhlO8 
z@cmR$?Fm*frM@Ag%tVzNl^9Z)!+((dD*%G*Vsboc(O6tbH$+ z;&qS{$42-kVo^!wnmR|eP5qb*D1F86pcxOf{Y)(;Jq#jK@z-RBGtLfpY@#xEmx(>e zQQHE7lrKO|f~qwvtPv?=J>KuZd$-WGpHBe}^fXh-_3vj3nVLjx3q3y;pa0mneQlFw z;zG;8rXa_;vIzZ>;;-A(AddeIF=g4p)L`&liYf>`B9_81{sLuRtj$+PZ^hF^QzQPh zA#J_WZ3@WOD(bp#cMWY&3|0J%3*&zr?tXX^Jf-53$s@EsUV#Q?v`#3$gFKsIo}Uuj zJhK&WQl{cR@zaCx_&8;>&KLJH)G?i%oGlupiCFhv!$qPq30>8cG!S6?1Cg&U zc&XVYMk>*#+LA*f?yg?aI)1=Yfz$?;#vnJ>Jpr6W6Qbp6%EswHjESFxuNruVls}%h zzyhW4RvYz>Lje#LsA`NJHSFipRr%Ojv9T0!(zK!ulS6*`oEts$e(~)o%vYx(9sRaP zh5g0Wt4AU)0?9jHUJra+{tn2x05*`I{4-UOAXy`~iu=e>>^2F220}+<$Q1Urr2$6i z44%^7060QwP|UKLn8~u5y}uNGb=ePm@w>LpPVw^mzqmfr1-vEE#+Hf89bv|QJ|B3= zSpv|^BU~Vm01}G_^{ra&Fr+9Y24}RDD2(wzh84F52BW<_=dAUyhssGoXa4e_3wR~K z%#}_Bkv~2Ue@y(_Bf75Ut7UYv6Q9}5PcD^|;D4Du;%m8~CH!}&@(YE~>lW4xivw@J z1pxfzk%j`X75F0+siH55NMAd@LGcH`TBa?Z%WWYa1-k1WKY<}%f6k4ZdM{j|-wpMK zM&i0sG`5Dh;{!sWu2uIc|KlO18a5YPn<+GG`REMdEZ?cy{`Yw90$@z&7r`t5o|txE zj^LmPBq0U=xbC5F<1;m{qAff2N0M^}f{wO2e%&&qOZHBG3d5IH4cz;4zdg#;Yg?sF zt??GW6FMDE^lKP*fu>g20o@*6J~uo8wDYl(!lBUX68ncVQv!S( z$ct@yuo?7^CMm`MH`ABT*>!>XzM~tya*C(sT)dKdC(lneTJxU|{Sv{#US8N1@_V@x zw_=+ezxDIqGJpfz*l$xo?&P0J?-=9n+;;VI1)>VM5R_1_l&8G&D}|GR2k|DA(B zNA$b@Lt?c5is^sG^uJ;Xq!s^p{qO1Z{wqcQl_FpS{wGP}xa${6hhP8J<_BOanG74L z)89(}-?Tj0IFJVXB^&!cO8@NO;(m4Wh;4tF0KoS(I)Afm{LfeX|Ev+J!-k^IR3%lo z8??mV2J)Vcf1|Ugq`dr#n(}vzBA;dtazX$W1^XB*!BTH5j2?8**WCTH-i~kOI#(3F zu{3U|Mzhh9g`~m*a#4|BfvK#OGiMX?COt`D4_`Pl@rC%JbkeWno2B#nzl+gp1-|Fx z4iarG6hy{)VDCCUzILDg|K7G!_E}33M`0b5w<@HJt4dmk_k?473>`S;M$J=0O@r-?Oi+D$9i0M-{`5n6M>4Zoom|Jtt{aS_R%WA!G z2jFwmo%Ev|-dpHNkC~MplXc-71>w14jf_Ie7gth2Z71E<@{u zZ0m5;r>zom@yw}1a!lOZy+NFIdw^rc(7Fs`h;yyx!q9`(n)W<#A+l*L=nl;F+K4x& z_^vWNw-z!Ko)}VwSWT^;$g*XU2%`+Vbp+8l{^Dqq#C%ZJyLPGGOYAYTpm5-?G5`J7 zYfz=PrzfGg-p_!TB1*G^qK;$ih)4A*D5=_6D<13Mk`z1Wmw$@WRu;5c;O9B4d4FNl zXkkig$~xY=8zmwgt@(|(%JcJ?nc~Ra#>SUPvszZ~=+DuK==u)t`lk0Qcj-+$P}0~w zWqDLnfep7V&TbO9Ab1p>MQLVxgNK(e`96aDp@R&4P7u)4h+!9rs8AiwTE+b|=S}xa zke@gE*4R$>_Qi$s%#q#NiE0n`6OId1AfxsDhvsDlww#OVrh1J^B-q9a)l0Wf(n&wh 
zcx%BquUGp=hbK1vf8*ZobrB5?7f`X-4KZ*{kRB7NvLDEGFikAT%PwheL>BpRH!V_?WGyGMkWrv(q<394+rT zV$zc(oiK_AV}$T3{xgcb1(4|gb7N)*fe!H(zun=b+E>r6tl`v8_lqD2lk4ImI?Up# z+QayBk)tGkEwAU)_miT!Y<7kZn>1{IrjQ_L9=%uUk5*eTJc*bD6G z`BhecZ2pe*nBBx*e7yIdPB#vgJ7CmKhia5x8{ZP*lxw6|+7Q+1WXHtv@9LEqwW-sY z?)v2;znhW!XU=9myYAM7N!d9ga(-sp+^g%zG8<0f!{RMn({`G=lZW5wutl2HlewR2 z@?&`-75AJ-7Kj@%caQ8^u)@7S;{H*qgmng66MO)0a=QDc;!-!^(;EvoE2}_O^ULKk zmm=8G`l+0z(a<8q5PsK0tIwl|CXo)EX%-a0zKBc=MRocATmT70B?{|3j08)#YhED& zj3WWVVi4PLj5)^+wRu_FZZxl%!?eW_a?90fY2Y9&as3p zk1Y12^-a<+lsKGq%X|lB21R##h?VYE2s)YO0G`5bVYx_z! zSzKw)9TU&1B7N&3$H->qiL$jWAMj09^Xh2JLRa#@pp0`F*J`2MwN|`@tQ>-$ibx2% zpNPs5NHgdqL_8+adE^lH(db1S(~w&~YqM)7=d%N;KNOpXxS5gk;oCl+OP%p6Ny3}~ zpfad^JQV}AKN!V#3C)nrxNzTRv;X#H3mb-wr4*~r|MRx>Yy!WX9vw1oAC*zXs!&J~xIm-rcwqOSQG#a5fN zGz^-@)^*x38JUg|{<=G5+%O4~)8s;4)pqD$PNVf0`kU6F!Ae>d2b1JOI;!*RE)hSQ zc1@f?3yLGQ-()riI&=~Cv%o2~dpR$r3s$CelKgz)W;5}AvKPKF0y=yc@4v)Ge{MfW zS|Aq2vRm1l>;M;iC_hUmM>qOj9o1VP&`Vhe-Nwj>YLQ!AxIFx$6RBMnE9JimnUIEi zRWH3j1*}$RaQW(7aC?9JulY+$PNuELrAbV*g`C#Ld+nxglt9y~?^#L#W7;JM%Kc69 z_^LzUb2ap(#PPb*iX>>#$OoncjEp!G;UBEc{483h`?oG|$_z9qdF5+X0faf5RAn_F zu!be^{d?~|h#W0y_75RkLit%%I~sWOQPX(5d|rNBYe~X-k|wIhLJF1 zuEG7Sfw1c*%~;o`ks~YVhmdg@)_#hZa(qNqq)8py{sGh;n(G+h1=X*0Ga!Yh9VOPZ z6=}7-YTZ9hb6^1zXf}-JAM_Dx+`C&Di{gr6NuQ|ZJP=vFNizNUAbftv&NDH5_g3iC ziXgU$5-+tZ&#(7Z(y)neCGx}lBMFJ_fl_n!Qr<*=C2^3xF935v8rb4%68wY!8XW3F zX!iMYbo$0NY%VQ<5xv8aUEZ7o7@OJdO!50Drn2hip8KNQu^!?bR^S+GOe&=?h)mL%{&I&h}vK zyN|-tEyhX!56Yj%3}6RUC|7Ym5PGz{BWrQxA|VxoR|E#N{E3^P^QSt+y@_2f{tYA- zp@rSf@Q!7VEnHY8s`w%frxDPmEA6r}rxgseh)gnYO`?t$u-OL+DFySd`ya1#)R`L1 zPA!q=3E`1=-!OGIb)I@$O#41_{5{7AKYdO^?VdP4&8F~7XZnD3^I(;0&joS2cT@Uj zkrmsfX~3|oPD_9bj^zFTQ;+o9!!TflG2DYe!@J!f!=Ug zS~VOy{%jyQmJ(qnCofL~Sx6m-k<-{${EV^Kp`|hMEAL$tfW z{}T1uUe!GM?i$N?w@Xs#jgh&i&ClmqfTyX}u}i}=q^``Z51VZ9jN09$H%{ZkcDxGc z2HBuLe{g`0GV*!Na(I)k2pbPnm{X{qCwNnB2!lXuSchFGs(dP7DnE{_$9b{n+_c^c z`Q{{ibFpIt1cb<_V=o<4GiAo65IIvfMU_w&{o&bsr=6Te`(UCae*3g#X>)^Tp8wdr 
zTAT0;wJ2_I%nyRGK0QD*@+i$u`K0BE5)0|%(|qq}Vk?WnIa}NviXDxJlOn8l4l*z$ z`~v^bgppv7K!h0>OV~MDH>Ce)vSFvH;eQwjcL}djvvfnzxuy3*H1?_bp4Ra;1QH6L zY&YfHF{}l)yxuX_il6eYQygjLw0(*&9*O6|z?po~LjVFBAUUc&j zaxO?81(r?8Idk0T?2jdvBTMLLYlwlyT25+WXkp}zVJfqKU?DRlM;iQaGHWo`)$>&a(Q0xVAF`qR&v9=wEP_Rzk5e zHEK)Qc6N~68bWpuT#^{ME+9qIdLP(krK-fB1I32rPO|%HGRpFaj91Trl7Vo9iy6g8Ggc)AJFq;b&$K5eYhAoC*Z zy1*4^!g-NqbU;vk6R@`O^Q^QVfYz3jd5)a5``b0Yq$L6)iRhYRP0%Zeg%HH~y0ZSa z#O6Ji5mrvXM_-dt53~^?E%Idf#bTWvJ8q7IvbX#ymH8u6?v_)y(#fE6*FiXE{kX~ACA@&H@O#rH&A2@ zefTQ^9!M362eju)# zjvZj4S0veNf42G%6`bOWY7+gf3-YgQ$x4MN^h36;PIGJz?(`Ms`Ce&NuO~;Wtr51k zvV=!tjxB9!GsrxsFOb#Hav3%l=%$46JL@r$0KfP{a|w`OSyXoZ+v1kys>|N`JIDHU zZHtchC~0g=M`U?z8e|Dz38Xz?zohx;?&J&_fK}}C?E%$!;i_%ey}J?U{edoTpDfuP ziGJdHgw^Zp7%cS2Dfd2@kyxUz!cg z0!H;I?b>O6q2>})l0f>D;~@i5o{c#d_Us&4&hfnp%lC*a z^hL2PNvE81R?G$QT5OSatzrRwZBe=A8LqEz`Z3qJFsVy#;ia)KG1_6ce(u()L|Hp) zAIz-LU85*|DqNCC|CcmU-*zd3&3l~>rZI7ritwcv^q$De`wgguoGiWe;(c(`e@i6i}3h9h&b$up}fm?}LlTRbf+nKVdc+!>CP}1j|xbaok zq!%v<3X+2;(tt7pm?lKh!G&KFksISnP=e`U;6-f&(L$18^JxQ>rcPG9nPf##2#j6a z4b#LwG)*a=21UO~1%lf8F4p?edDCcg(<)YO;zC}umA~=k1Ofk_x0Fn~%HQ`R+1??! 
z^RO8$aHGv=GAJ?b>RC}m$oTdAoT2cB&l7RudP(wUtH2rm24PP2N+!K-=vZlH3nNI> zkV&j?Euyc18fP|4_#87%m-Cd6YjPuRe2$Ry8v#kX%SUlx9V~s3taoW0dolv{Bm5uW zP5;K)Wb<4J(Dt`8>(P4+u7TX-eDFoFlst{w81OUw_w=&Ib*M)cwP z!3Jts-Vsq}Wh`d60{>rF{aCAkLp3xu-_0wiP{eOGU07Nn7K*&xq6@Xglszr5RbHDE zxUj+3&TyHj@cPmK<>1a#ALsy#&~{<=mpGsh@gzi==ROu8w~COCt_2H}LnzQD+!~hd zC7s}VdKFf&@w`>h1@`KaIxg0StpULgDB)Tw9Fb!OL($MVkhI~lW)By9?b|WpyS4NH z6lua`-?NF-yyNwTfoqyzYK~j?A(%m=OGb`(!j8@+8FoR&JX|D~Rin|p0O8uNs;2nY zRAw;yn7e|myrz5M3F1wTDQ#KU$y>-W4+61uY$eW^#iA?dANx+UL8LT}ZIF1wra?Ea zF9mwPUdJfK>)!lGtn2uZs|f||S~w>(g}-{J!i6pV9OD0q$B07HL!08*gDoi|gLgEJ z#GFJQmPJYOL`Ah-jT@^NK*jGrp98EVH-(RUVr4brntF$07nm*y$e}%wk4H&*Ijj9& z^5Mx5UD(wXT4$vHK1kCCeE_0dBdt1WDmb;tS8Utb!Bs_jCNumlW?s1c*c$&1 z&ig0O2mp|I;qRJH>@BMO=cI+BM(d7~L#JQ?`p~`m=O+Ir21&LICZU#e98Z3Rqr=)q+_?-C}@{%sgy$p+k>(x z4N*VcADVOfLP@Nv_zkoC;{qj|f{Jfvj6)!%bTb{8$0tuSnPH=|=4G6ujXa^@ME(24 zGjV;izuOnz8nOOvUp;dq{kwht;@=y{*-a*COQyr4O?a+$S8+%Vb%~{yMsO~_=GlkaB1EifkYgJaIvT5BZ-qN>5p2HDxm0`LaTN< z?Nnm1Hu3UipJgrIe|(+5VC7FRl>prP$g@l}Zw(-;Qs?SmH>^5#w zt(~0QKwAC%{b7%du-*X#Uxu8V{0#rxf5}pL>dX2DJ^{v4u?!qOdT(!$P|40H;1<1V zDB7A!8=Hj8lu*e)nVH)qD5ANg8&{V-@nSnlcQ$Ja5sOyZbv_3tliTdt&Ano;qBB`+ zKw$8yr_X2KD{khu-zowgzY-KQTP*k9C`2&uA_Og}eBZwi699Ti`Mm{JKV5u+!;;&4 zBkZK|^cmUo*9$(9ZOo!2=Sn#@pD<=ApXmD6#L=&@=0JKA0mzv7gj zdX9n9^m#8;Jo_~-FNv5(KRf}GcqFCGz9--~eNm1+w!gn<`7TQJ+4jyuZ1P-V{Orcgx;xdkq~Buho-;ycdpp@wWZNQQeY0Shg@=KvQ^ z_m-|Rn}!nUEGgCJxG* zrT6aN`xxgWa%)U{O5vWPgRStJ#fw21-Ad{9e(9Qax3{z2XAG(+X`;1H;)P4O5lS`p zLQWjiyo0@yknJOpY?2^K4>#Qb$BFU-0^)7cwl}2L2AZsdQCE2@%bom+rnYQW+;aN) zC5f_m39MT0meTgCrAVPL?H8ol-v?`>>ujQB-jx@T7>tvr$Y;c`t)eZ8BL)A@r$pU68gRi%n! 
zN?<#Ih%(45C@3H!C*40d(9=%FCZ7|a&%2F=?}C!bst*~@-8^*E(3G}PBs8H25$JDi5O;z-eDg@jGAIbmFt1##C14-;K} z2k~4c_f%WQf#7%Y_JP=KRjAKoA`5jGyP=l z2^F6&jq9E4x?&<2*bqZ^!X)BKhIr8^@YR(XTI(7k%O(Nz7KW)>0d}<08C(@-j;K#IPW`k`GD;}!Ygq>Y`W0ZjV&rDWVlE5qq379G49PoOFB6iNo`ML3@y9zGqia@75{-|FpkP4$+Hm|De5-d6aULdbKC zRVfGi;&#&=zQiy%xU|omV_1068ls?CF6Mfblj(~h@)MptFY%&D|HG(k4Wo!hV-MR* ziNKGmuJH|v$V?#_J@Fm91OAg=9{2odlD?UZA{VOnvxwK0JBKU2moPotD~dg@!!DW3 z2t8y0@)bWSADT5@EjdI{I7C?;heGg!Le-V#3%8djD5eUAuDz-S^46Cz5zVwmzOtsa zeHz_CtP(3#p+EYw3vA~E=TD;OBbY4Z@^s~@!WIweVmBH)BN8r?Sgm4hmneOLzcKef zr}e$vzSm;s9T7RQS-nZudu}IpDjZg$dv4m3j$NzFQKb&Dcx@MKJEh@RC1eX3W)Tw^ zkF^fRVz0a8sO>hoNbuS!?8BJt0@4y$mdKETC!8%tf+Oru-%Ro2{vul~s?fkHv3%lYEWLDseG;a*yQBo?4(J8bpc~99IkI>sr`0KOOr*CF`?dW`j^CF zqHV3(y^x6v&hc^$W?9N~G>9R5^v-5KSR%vI5g<>Cbu6VLtfywS zO#AhphK$2?`|TA^pDYp!8xy6#dd1#Zb94q6&kh$Dqg1~N0i+9eTU|lhQ-@107Lkx| z!M8VXHS0VM(l>e3N(=R$ZYOSeShZM3xeyYsUyq$k)(pko^6;1Gh>toVifE>qft+c% zIx+RFyL(8>K`kx|c})uLb1-Q4nz@v7tDmve0sY5wrRp(@iOg=idWGR-+}BKNQ{M8) z!?*>5_AZQ;j$H2O5iOfNm05BUx6tcpWD56GT-xvK#Ju0kF-@q~4WC`@gJ0m5FG{$L z<%Zi_79@Nhs>j<)Ij^fNPJU(9fyp|C6)SCG#Z&aE`hM0`m1Z3;qPDO#Udb{4;!(_V zWa#JmytXFArcAubMoqE71Krvdd4UZ^Ngq(JjI~_jNtYvWLMyd({-9KF;(Dx-(=R6J zmCDxwW=1DNFP^Hp2@2xJBjgF(*^f6nq%r3gH6iM4`YE8)Pg5%+!TU{7<3owXnoWMK4Tq$rq_Bu2#?iHl z42NquFxQijk=0tw4-Zd$d#qoTBjw`aax>7$+`y5_3Y~RhdCuCK%wUyO~4CU6g{-xpg5zwyO-mc~3 zB-gg;w}-+TD_vb0nopTtO%KYhHq%*1K8U)w1!Jmb)~iEWAl)Z=-D-%5fMc?Y>8#`f zxCd3(meLHw3u?<4G#EG_^>YPSTC$N^(q8KrtBRx#A5ywqH@2yC>7;kC%h8|RD}9%p zA7?PL;v*wyn5~x6rk(0BD7#K{m>*7c?ib4pY#KpUGpaSyp^5)c+4ujil`qlS%mxo1s}_)urqdZ)URr}Na_{}X!*+I|e_{D!~F8)N~M^!oDBhDP>x zfJ`-X^Bg$KA4)D~4QNa7bZOtrKHVA#j#uKCC%irhDldeP*ZwGJRi(K!Aeno#z=mt^#?~f?GWeE%@(`hcqc}BR+JA;XOx6Mt@#(Vkjk{ zr6TFY?JPP_2S2+`f6Za%%>jOpezN-g;4AxVWZ?8BVpO7z_ga3{hVxq?96lXP805Ilqv7+%QHP2k@maW&{WS*p`aqa zs};!-XPoe%_>R0gh-IVZ`nS-vwz(7}3+u~>_wUFR8SvH;aL4uh(liz-I_gCjvu1*# z;%n6~BqfI>sY=MoTfd<6Q2WOnSM-3Xv^4t|5H_7W>G(O7FD_h^-Z(lEc#yQoE3nas`h(sQ zic54cr-OeF4t91f4R^e6m;H2%*awp7j6WYO?Znmt9qZhJb``Icb@4*0t}b<}_q?@4 
z&AglhJ5dtno@CJJ_kJ6J<4TUuC>2%o%TgdGO-^r(tp5_bRkO^E$i^P|EP^3bEbi*? zf(ML`iOIvN-|{tNnfzNerLv8zqN1@;LA)$IAzOSIIQ$OheU5dkb@>UENXyvkV`TUP z_1>^QMy;OZP7$NiFjNV1ejfDub|v?q;V0_jKnKCq3;1qVK`gZo@gtP?Zd>QxN%*u) z!lh5!@JMy#(bHd~I2(n;o*WG=XDR6!EjX7nevyEACkf&~64jGN!?_KtlK9rE%Zg-a zK#pH&+{yW3l3Ny1NxBjkKypJmv8nf#XtyyhAzY)53gcwVWsH)JJY$o0h{in!*HWzO z#FO2;AJ~NRiE%KoynKsP_@*CX35&B6k5Yo;E3FYx5e@k;~zm_*r>*bW0kLHeY3T;tT}5<-xv z!6E+c@V10Q_`3SeIsb#4$K~mx(k`@~-*zomVjk?0n|55UO)a$U2Y&2^wx}y z_4;t;Y^N6{MLRBxJ4$FH0PZ#8>1KN26UY=vDV-wPrAK)p`1x|%ju?jRmz$Hbr#6>9 zw9C?;2Xb2#ie2+g;?ZXs_e!mO6F)a~@TsAq0HqZn({L?c!gj+tl~~`pqX5X6~IX0Xv_6 zi(zNOhwN)Q5^XOrfChO9v{{DS4wfR7%z#KFTx%mI7ay^2X`_&j;6RviTU8vVV-Q!H z(4=ke-Pz2NO|#ZOlp}FWWUz7|?9bIS3rIa&Pv-JI8C;PQ^`wXK*{NsNM?+O>WwTlE zW#&uDWz5qF2FlYdFh!s<9BJd6Iou+YVdMC{SS8vINCQpF zv^3sg8I_R1S}3>O@rtvn4N`JG(sdxKu)}`B*2GM1bgF+W+Ks&|j5Sa~oW*XoT&Go` zk6#S*!gis^u&FwKx2fF=cC`}71lvcbYq+`-)B(jOSlCvzA+aoiFI`e8<#n~~UMzz5 z7&BGuffFEtYzNj#hE~H?QjH(ife2|$^va#Z$skV0NlG9>f;T+LA2RZOJTIW<;I7>? zzSc5{636{ys_^Ms#c+A4?T>veWl8B7qzCPQt;KO0Vwu$Y884Zxx< zI$8;dE?c`YsG<@#!Xs0kGb&0Y>upg*Mc=ux$CTmv`Z@zB6wXhaiid~4#d3~C3?Q8? 
zj3C-BSVQ3?g+)S%zOk@NwK{M6=8 ztnI)r|JGc9<0!BaYbByEv8;V&MxpZWefsDOzMlIHROvV8uYEjZ5J5) z2vtV;lrB>PqVCh*A=(%X@M3$}0#qmj>ig;8<70{>5~^w&H&xEI?`(r0 zDc8jK1}M$}vhepd`3p;X^Wwr>I{43Iseyrk(tr9xwL`?oSF<*{H6|t-{+I{;FJ|@> zlpvp<7)!18;eVB({rBhe|Cr+a&l;eYS61E*|9dZFIXNcv$kx^A`z8=%vjpWFJw{f& zOupy$y=NK;`I&~`rvD;1r1Pp*TG}|~`}C{7xAeYOvvgU1sY-hX$BrJ)ivs<s+1D zF(o*POU)EPIij@{ed$PA_IG%5Z|^@)=#ZW>Stm}tg@m4OG(G3~d)oeF5JXZl*ejgO zV#Ni&)8_AY{}ZbE-|u6qq7iX;8@}R=KK9yp>MeDqir105ipHQTMmeQ>YU+r%{WK^F~1#@8PohYfjFL3MfvPcZ0p~Z>=TX$%mcO&>>wa7& z@&@9IO~uD@|M{qZM+{sL3kLo2_5y>!bhNZU=+?;>kAcs0T~r-R`gMHNj57?zd~AyI zj{wldBebwUxmd@lRiGZ8NuI#D*Uf5TVj^I7=Q%J4vsZcQOCs3y^bvQ`fukS;Ly+CyzPavpHME3NfGqXNG)ZusNKi|0q zyQZ~yZRx^YCjzCVq^h7m*}7>{S6ej9YE|s9wJ*=lbnkt6Gf;ojV!358YVbWY!mP#$ z7=vx@$lu##i&hynvkvBORe~oDDLgo7XlQV$M6X~QqEtEZ{2I;|MbT@{0~AO23p^=@ zv^gt_ykifSm+L9RN24-^yxI~A%dJ=yWtal&j)+^ zgmvXe1!ArujRPkB;*9z6D(&=zx)GyB5>nD0#+Wy`GCxT9#Ep>1h_nLxrC+Y28TVC= zj$R3giP`?Yv;6O+BPhx<;$m07$Cwzk`bccK+Gx7$&t*(!zWwSk+$g6$w)D|;>(UC0 zF~3;0Sa}OgYgk%h6`L+<^O~n!u=UYb2+uh>6vX5*Y#xp($slNbZ1l@ZYD=`$bL8Pb z#lXOTp9LFsK`CN=6a4{Iv)oXD35|D4(D70@Z0y zBNV@Yo!LipES=|cf;+jRg-zj_z1`g){ero{*#|w4)A?B(z=5~MBqwjG)_#T+CvX`w zefX-~+W$M3=|s%SIIBXyX%0o0JZ*=|Xu0u7oDTwlF!oBLp`ih|d_x+CpPSp1O?^_o zva&V3xxhT4xJ;K<#Pw=@*%LQ6$HemNQy^TkST%BmEiN`zvwU%V|6yrlroj7vD5+xg z{OW9adfIBfRBk8I&ZV|)K2=Uub}nY4kXqC+9H?9XDAuf7v z!>Lt3Kxo+dwZW{<5Sq$mj|G(o2oSG75j{n0*+3*gW!4OI5`jar_G!7aT2G+zDXS&r;=`x zCNo-Vl?yizuHze-JPy&_y>mY)o|-H_)H$R z-VtIvCC|@KEYq$#Tm29{wwvlou2rgk5n=HdlbC~&&pg0flab)F%Fnm?E5SuYMYouV zmgZ*SDq&_@r7fS=CbDSgv^7UQ#0R<`FVxuHvMw;W5UG`zM_6l=sEQoTNrfkg`91_d z-HzZ9mpUo^Vq!`_O#Bre8(Zyou-d1RS$ORc)=n)7M>wx5rs6sPk>wusF>0X1=FfP{ zq>7d$-)>Rjd=!~D@vcs8M6$Z`2WV=EfPGZvy9-3VP3>Ea;BUcp%S~K>I0x-|@&wIp zzA33bWJYmOTJOuBQ%6_UEV$SXXWQ4lpr_<;1)DTc&zI>`m89$Vc)R|VvdYSz_|#>= zaS0LGq~=`u#}9{3MR?P-mOM|B&9)RPr<Y}%p#r-1d8-!vFC zC+Q7a%aZ5kD+>42{`$51#T`-VmFA_Um3(IzehDJAP9H>fU1OX{ZZOPhYFt$m?^TMwt=Yy!L z_gJ+Q&$Km=W1NBjnPG!*R2^5qp`{WIOxVA`IjFhLccI#Tz8qQtpys53Iys2?Op`>~ 
zar&)FR2rYvYuEMN5BEOP2uXN2fFbmFZGQP|bw-pBX;tWkS_epoqXs?@CPn10h{RfX1(yRUe9=RI!Cq!M=Jq@@MP=y^_6K3Wtg&1zT2!NwGGga-%-nyN99 z=c?uz{&EV`R1h{*Z8!9FFpy{|a_nj@-fHXEiOVdUl@Zn>_Y~H0KmCRlhP_$|25{ZcFtfzV3yvgQgpo(Yzm`0hhFL6%lEY+{QR1Bx(|vDN%1%$E1u)@gVhczt%c&P z&3xojWp>8KioQ0po1I;HxT9m0&+Kp6hfGZX%xKuQE($N#Zc~g4hA|gTa5}j#4;Gk5 zy6qw6eM=2QeWBG~brk3Jdb+!t#kNGV3S%!{mlWZI(1;9@uiJNxJi+Pdb`TJQ&inzYJB|!jWz>pjX#uDQVdLkG3*u0@CHE(LKg2n; zc6WExi#pt$blZ$wD^~86Gvx35iGkwOz6b>AoC?jImY5zSgmAO4gu;ZfY^@8LKPQT% zq^6RPd^T_eIJaE~?JrMSII1LKItf-G?|s7s39`jqToffuBbRGA-*LxMg<>QUZ41I2 zUFwSj{!Gn>ADZ}!J?Q>{G2v3`l2z0pi>2`fsc|Yi$HhOt05Hd@j7Z&4lRG z+~X`7TO__+Fj?b;ej`j44?5KOP_CL+u)WT;Z&r8{;NSGmzOYUA-K8o;*Ndzxm3BEgaxZO3;P% zYbXGtjT;*LvJDDDjl7NW8C>k1OU@5+E7L30;a73~C~*C% zCyeOUW}#cjCb9M*=%D)ixOH}NM-bs0_;v5@`;epK*-B=M2!W=-y8)$`oPIn!^`zW+DR-#!~~A87Jj3zP9-!l>_DN0ZxSc;sfsY|rA!YWz*E(USDUHhH$9-L6^ruHGc(^j1j>{iZf^D;BMbNR+7L-->C591mDLvp zNzu{1-e39A^PO7dpEzvCqCtljiXtPG-=H5qRUFA^iS99%-yuPT!>kqy2op>;*1a?e zdw|YrTHB^f!t#C6Vx17C+|zhcZqVqJUP3MCI1^wNzp0M-yc&p2m9Gms#h6CjPJ~?m zP_>fTc=xakNUxPu?MOkF;mt=~y+&;wan}3Ow-6V(WBVt;!Dy5Vb-)l7ZSyhl&3xm8 zpq@16Fclcg>hJf#>zSMTZcrqI7=@_e7r?0b?E!xfZz@M^u(=Z1m~~uiTvGYMYum&4 zwm|7g@HUeLdU{H7iX(CV`ufHWr~Vs{br%oA!wh=sg@pz76Ew7^xa2&ye~X((WqJrj z^7o@Iol`pMg7)W%aIt+J%gdozhpqE?*EIJr#&P8=xRo!*GYp*Ucbd%v4M}a_TVBp~ zk9+eKgj8Q-<}UyqwcUq2r1B-fa_yFiK7_6>9T#lRHJgQZ1~tZ|g(q>2rq=wqN^jvE zHfo&PTJbJs*U9-3Zmu++v@;AHE-fu9dS7q9_;vqMNB0OoUz!y8IcBP~0l?5oW6>Rda``r0d;@wr|$F`1Smn%fb!d z)k;Mz85!1uYA0o@nOYjD>D^tDuwvt?MxX+?zq?fR^@V6g6vy@A2=Kj8VYo4u-`CV`FMk^hI%47bjrPpCiJb|hom+xNA~ZBvxUXN!Gtxw?#|z@xTS zpFX_dgla*HH31*dEnXt>yMFmxy~%7U-@&9WT?Fmny3~bGt;5i|lf3+3^;QYLz4Bd) zq*OT3b6m{A9=-;eE^NAei^5w8@7fqlBF^Qyv^$XuEVDDMdjK^Db01{=rkiqa_z1wd zOBn24Lm+bo5!8uE%!NyOg`~bPiHnzKNGC_Uu``DoNY0%g$HA&d+K*(w47_)S6J$rA z15}40=Y6r!3M;tG< z1{%0N5Shs*r6TUWz*6PoCp_HT!>y%?%-pHT zB$&(H5q2oh_rB9=k`dO-5|dL7tKwJh_wc{EjUVTyE2h!MCp=#~aL)>Aw?n|0kx>??g7n!H{)rFiO5r5g>gzi~lZ-DUWy{ko-DOp7(ai}3x!E4)xXqsLA?M!H(t18sdW6$ 
zpTqKh51>3S1qCNHQ-Jk^NqdlJ;oBKs4=qLQ4)_{km8$In|4QS7Jd8Hy&3PzXW;Ga zouWP%<)sB+&{^9hfCdO^O+OkI>jfC>3;b6*S->U7dD>-!Muy(+9?>PS71`RG>lQ^* zrivh5Q#FdXGTS=Xn$kb~)OBX(l#!Ad)%&#oxQD4#(ABa~23t2^BO>S(fO#OQXzW(I ziJJla^2p4>2CC&<){*rQ>Gw)PB5+-~hkjij`2@{T5Z9|v=?w$JR#^x!WqvvZrJ~Z< z>sRDX70|vurbCf0mytsfP)t+2P}7CHH(P^=>wS;fEcP~lr{94afV0G9*lJMgW%vH$ z3nLi7H#XOMK?~w6I0wO-z*I=w`Ut+n#nNrCLa&ns(ybR^hi{njn%kl1=b>BCI%i5d zJ0Ye#e+-Nu@$!yG_&(Z#L`f`xqp6~flc_zt?CD~Yy$fNiK$_HHP|E*0H;f|K*+2Gq zd}$Be&2Fj1+aS}>pvTzJQu_w*!QKoQF|Y^n#*5)%Q2sm;d3qXy^MhdRi@oo1JSJ^q6r~j8#n2%3sPG53_n>wNZ4iW!o7;LNR^yTXZw*!r z5&@^vkSNi`hZToU_WVwkNC4`Y6ppB&9?TZDMAp06>M!j%CpoOPswjPzE>SNZYjyU# zYHG^#{d3fEZNJh+j8Co*2ly%W__*_YPq_0H#-@P!(u8SL#Lb&5Gy1(<#i#p)Bi)AA zJbY{XvOzOmYZEZ^sAYiVV&`B7OeZtVJ)?l&q0#;Ywmk>A$S|Y^c)Gg%D?8n(3NV;4 zZXC4xBL@AUtDq>i%ixRW&D_z=9_FEkGXQ;?T`Y4&OIyF*iF)hZ*U1#_a2Za%Jnt!K z;EifZfdD1+UTlvJc>!4+7BRBSU0}fva0!Tbo`t19=;-M1+ALL?UG$btLz2jZDsd?t zMva=47a8N^f#kW;=8DOWTc@XWy>o{{i3t>)ZY$YGToA}0+pL3)JsM8`ttMyPtyaY< zhi+A5stbTb(O`EPyLYlF6dXVq?0zDAKGQWhIN=45HJ1z7gLHi=VqZ23$_FYnYBXiS zSDQDSHWD%U6eJ{2!|*8=>hp-7m3_0up%%VByk+xH^(ak*rWKRFdtaY-@`ENt(3z2K zD=RNAZ*kTnAYX)bp3uE%dexq}v6Rtv3g49o89{?tSBa)iePn*r*V{Y4pgsEl(fusj-@_)5 zV`7XN7=TM9BIm25q*Np-#AtpN28+sQtYMTy_ZCB%I30EQVVZ(EaDoyv{EGF8`iR%ia1GLyGkt2E!~e zEQ<}T8sD;M0I&xoP=VeCZg1+=-8m!uvVeMZ$Q0y}g&w7Eg~?~c6zk`;aK= zw@$Q-jEQ6@fLE^oc?Lwob2EkT9%E!j$E97J_&oRGgXo{z-)SQ9;>NUP(RFLQea%?J z6}pi13{;sDF%NOf>UqF5y4zS>SfcPrt5qGaMo}iAUsJs^UzAolI}%p$)Fa2Z0;Or?vF_f;mCg3eW`eBN#IHO&nh~|5++w<{ zn_+q**MQ+QS~d^+VA_~muHuY|*C@ZrVmH!YrLotmp`WOwViiwPjgnxT|0_R+u)7C0i&%iLG_Szj& z3@R$x#ijNxdT$Xzz|xqa9wsaZ#5(7@UP50*G6|;dF05bIho$66TzCs-qZ*>P@^|=# zwkhZ8906-A9PR=5;mr^HABQjW-piDK_?@lJ37X3ofogjj8`DzGLkP$Bh$ssZbiMF% zeoW*l+yV4v6~A=f(njGZxT6dNL42pVpJRwN#*ylcpg`O;zLW|lP?nPubliSyhAA)A zg^sMZf*J#Ea`}$JQNtX_X5Rh+O<9xK1!GClNPa>Twa7QBcvh9nmiW|E-Mr}gfceRp z3jMiSH4$B;$!uY*T`}|p}oq*)PwNeBJMH1!ent z2UjPPfqaA$`6lGnVSwTnKgiv7`#!aY0raNlI;f`5&Y;~EIfCud!LHF6YVLV|B@$kb 
zRkvs?g*lr3CMDGc2+&2aiD0$CY`ORlAoVDR80qdTrGT$&^%Tlx0h=I)?C1BK$)%<2 zHuvfg7%on(4-<7&?W4)9pP!~9Cz;dy$uzcQ6!`Tt^Thm&nmsO{i`TrN_5Q9f#6eW( z6#F!;ZkXM2=<@rnQL$)4 zvF7GxJ{$FE#QVa^R)aNhH@JrfKs!Oy%DAV;QW&75_~h4@^9p2k>g|z$FNAP+xd`Aj zE`@G)FGBV3Fi0>OCm(cg4*X(T(IXd}|1h^sPWspRq^BZG2{9hoB#jzXSYv(sZ znoyn1R0((zAe$a$5D?aB)%E9?QCa>{nL>Oe*T>`Y3xlzm%2DFZZ5|hqIO|jAEpFBa zyQ*B*IxKf5syHaGnM0dPbaPY9 z_4s&S7pa!J029m>G_W2yMZj`y?o>hq?ZLi;L0BeOs9U_5C|}P%-MB1@Nn$>h#jf*e zny4i(D#@2Zw|1Dm#yprU)98dGtFzV>Dm7G_f}i8`KgF->pfXsPyd`J&NS0slFw*4= zXdeWrIt`gzR2-0K*VHmy&Lkj%1+yw7v5OWNaScANJqL23wr+9_dVZHJ&|Tv;5er@e zc*^Q3xH23mWf*_-HVyO{^A!?4t5PavwtbPxC7|0)&|DhNj_xJIRGM1Xq5d&9*I2Z> z`@J7n5M@aw;<+}MXXR{}Ea*NEg4_7x(Yvrho&x2pYg?apl&<*IHFcW4e%hs$FX0MM z?zifvMPyVjKX622NMRBie+_yNDkAz385;592M%_4mZDApD@b*6cm9ZUE z1{+95dgR9~l>m7qC#PRXck{a_%@_pYZkbTDFe2JEw=K?Z0$kv;c7ZGX&L>`g!XfH= z>v6Ywx5KI1bp1zczI9yzI|JvJde+jY^+d;`rjFPFQv;z|opz@+z%K@gY5!V@5Ws`wF99fn+=G0GO)%9+NGd#Q)g&VGuUR{H$>+@%IJLWJ+$wF1=w zP>YplmBqL1fD=Q9p-^T}O*iUO?5z<}aIXVs*~4shdzuHC+|gE*#DVu5$@)D?`@DzDdC63Ysha|-Czbt!|atlyS$Ta2g9 z+AphNer@8c_WbWDlo($Y0DI>~LQxT_)d@$tVTD^bs87vrEi45Mr% zOd%mp1ClwJQzl`({k&~Yj!r6Si{m2#-gsebELcGULmUW%}cU~#oA_Q7^?1$nuQkKA>h~F4^T#&8o+op%U||+iaUxM zrJo5hCgJh=Q!pJaP@ywXDu?Q$JCmH2GVYu${{EwCDkJodAJ(@KM3^w4331fkrV(e7 z@0sNQYkR<>eDvuaK8&sSUpfq3RPmjcer+OrY3o3M#F)20obkP{-*naI1oZ|1Vb(LdJ|(=#yS zqB5HX`DcGnn=WfJZ1vgM=z)QB4gmyF0S7OqHO{rV^Og)1rV#lU3 z5CT4sW%Y;}An?s3CK7lZso%35uXx_TBhbT%&S$L{E~6Gm6RVJn+2Zrli1TASihg4 zqBcJ6ek=;;!UdCKJUbQo?R#GPOAN(%^YE*E`NE3YQb_1oU`Y8|OpI3fYMbNTab4O7 z(}@0Cr$d(%P>-Fu*pJ7z@`fv|Y!O+1UhH_{LR6cKZ}EC4b@CuX>89>U|vn4}WZ z-1Yw&Zv}%Fw0Ew`*b=!Ueg~w^14{rs?h4^G#N`yit|R?pS-{#9U@a#wyz!}gqogjU-;0f)~;?W?-A!Gu9!A%biT@5_fx*!bP~ zv|jwXV=*srbPu6pz<;c!qM}9n((~8#Is6YLF|qH%1fM+3K6dcWb83DmYp2V0Y6?6! 
zaGDZ<<1yc}q1mOf_!7+K)uG{t_>_zrLbfvB+{pGc381a;2nsA z1zgJ2QOF>#1Te#0T(Ba@)~+AZ*vP*nOyjkGo1Uk3fPsx!*u_~_1pv|oAS7HB zN6X~{YrQlxd&VE>Y9)CN-9kJP&T<>cyVQ)co&rHz2A5hp9}k%(9G?g*m>48uGBGxm z?DN*`a%gR-vsn(nxXk!)l*pLCVGujhYhR`kwODE;arK>GX%8B#N8&@^^1x$liS%%_ zp77zGpT6$c|5IFUx$bgv`6MN^9+eoRrhv&5W-h(>DMv*D|i?z09BA_daF@+`Atj zd=qCgy}e;dPj94~&P#k!a*2KcY=nTrY6Gx-POBkufAjc$zcSARWDed+b$$Z%YZDV)<7QQ%bV*couWL&+6VciH{t2^=Lte zenf=lcD8^nk{%+;bJiI=@FLv&i5MdQTtN77jmJaHLM~(p z5plHE%8qACJxzyhE(QEQf5s|87vCF+dFw$rVHJcRPEw&tu9m$=zaI(u4o>r{$kZe! zYa5CA0-C%0C@FvCa8zQc23rCI4u=~#j^SG`?gLVln~^1o^`7rES49P2&G0$xhoy%k zDKeI5-Xg{9(y1t&BOeYp#x%SVs~wVDS`d`VUzp?Nw?Ey zfb<8bhrj>{7PvaAk!E^AXfqVE&hF-HEM^d>7ePTz`SNd{^Sv$}-n1Oh#ba zeQ>1kp!gm;p^#0I24G+OrnE zK}>0P1W2kssQ6Kmq90{t4IH!$J%1i*Ttz}1L-69Z{)C%=*D}l~!d0n2JkrH4_#MC# zo=!j9wGDZA?0Vp$oGxd5x3&CuI!&4N2*n5GE9Jl5V9-YQpD}{}POJV8uQd2R6ditw zUmC6Qm+JffmYw`xxR!rt-2e3b46+0iM7{svtgdf3X{Gp=_e+XalMck1@ZSx^f?{4j z898%#{;`PveLciKjsFiXfmkc?0S-hzvaqmdd-7D}9uB(&{SK^v|6|-X{hz->BAQ5u z6SatNsgXTtNX+}()!mnjlCn$dpMOEO-=O|6+OdtbaX`2R=`#!ORQ=%*|f^G z`kszq@n7{&iFyo26<)w$X-C4ys|(yeeHu2$Q~4GVx-eg>xA z@I~Ew>kvd~F+V90kuvtiLtx0mqer#uB`WQBFnT{*IlwZy7>!oE$y6vIBW0cMQa&OL ziQcXp$P@emQn-S_oE@7~I0WCmebT`57^U9?pf-UeHo;LO-~ulV*dO(3uMp+s^mK!V z#X#ZK$)E4MuSQ90RDQneu_nOisoG!y7F*Zb&7bZ!9R$0U0Dz%L^xZqe?=G3Ds@2qU zn(M|Uv1wZqD>Q5WDI6<)c^W2_bgPA5u1#Un3$DBJN0rtvA-e_Fhtsp%4*B9?qe^;i zZV>DhijM0sqQ}J!!m4o|hlF-*c+XO`3cK&!jxUBxtNKe%vMYBbmI=>ceLFwPTSXQb z))*F+i$EA=A*){osuKV?4FSAkq~BX`i8NPZh$sa2SmbP!hHR$XFeW*k*M5CdCMxqZ z4v7hfo+fwo&K9G#vju!5azFu$jm@D|uNVk-UkWPOVi{ zSEW8j71LA`CtCkb@Hw*NfBCFjNlpBnVhP|uRzY=NmGRY~d zN)CH}_Rf2~vC2d`z5Vm}{T|qd`#sSY&%3#W`Kc1EkU9o_0#5z!nFvy;#SJE9gC>tw zucm6R1Plg~!5LeKkgIi2aPYEgQ-ke*I0n=<7%)VnzjO8I&w6AaBN7V z4^Ir~r{lamYcp2s+M>*>XKBeS_92KvA_75 zMEUN^H`WIGMN$fiPNI;LXZMAf!oI&pM#@(myqUvK>x^h)WQ*0yftR#|AYJ-70Ud$t z=H(-mOcA3x^ZQyzdxO<-Yz16E2y24Ia{heUS2=A0@C}detj54E2a-}5(n*VLe*C~9 zpQ^aqJ&RLm^C<=5m6P6a?^a{u7oizFJ-wQSxa$nVdvqKT_XlIZw+NW!UQ$wum@jn$ 
z4y@`FmVO*6q4lvf-~E}gdxZ=0@^bt5!dCAaWuF)|HDATduQ#XTaM=FU;e44Y&E1ez zxs?x%d1p6dqZg^{pf848k}OwMwHNgXI4MPpFM2)%FaUU8z0I}nc6j~*h%+=a+1JNn zpWWQD-MJkyRa&kOo!x!q+46>!5-tU3X#6?;TOsHE&vf6%@yb=M6uVKva){ze;#SV1 zwIr8vNX}t8pd6OW5VP$n*SIP~kwcnem>H4G*+hhEWXv(9Ok$h!ag6)ZeLe1-~AY-V$s{fhS%lK&3I?h#k3%C29w!MVcN^HUkl-Q;cW{J&IGQ|5&jv z*Qv44K2RPbn|IqGbIi0hjQc?l?fWGDdjyAa_|*6Fgau#OUFS*dAFDDlG9q~`WN?G~ z3XzVsw%XuLXuL@kO^g(#AC^{+RWJzc`)8WG8p9hd^B#Im z)J+z1)2GrZ%YFQpBIiPct&Hj3e5;+?i>Y?1+vz$e;c&8#k57O`nsEq|3QZy7hx^zA zRM)SZAL}3@k(YNIzj!A~bd_C;#rCSj6*-YBZgX3kMb}E)8>*0_#tuK2)wyOpIl#T- zFZMIIjf3GLi`hR?Q{TS5fA*|B3MJ>6ycaa6lu~zES(%nyj-lt?`4bqcr+>v_f(iYE z@wd#$B3=I;pZ<)E%yS!yFC01Ojcip5Kq-OyS5R8oJ+M8}Z1GIpHqFP+N99F3$rPKt zk?oqOLIg^aR8>O>%+GG^+S$EJ8N#{m-hiKQV&kK*%QGWXi%jE{SH&K*n;+J76krv( zmIy=>NCfx|=am^JrlzMl6kjEcc>w!=;ey#5{DDhB=xMRf0Y!jND=+Ws?e)&Rmexn7 zkMl>u(&tX3^Y{TQXYacRHyHHE_`1FoLK+I~j2MlSe9sDj>i4`wvxbrNn{y+C`=7>l z`_{bNEy`JF1NRL(2EB0e`#7OM)mR*6QU?J;96&15=!ur}Qt~vUkhHLpR zp2-plB5Z6#bxlooh91pvBp$aZevS53Y_cj!7Pl{8)S*|5#R0Ye!bUFhYWQ%aHOHP8?OA~w~XYmKj+QP=;~t7oH>nzYC`<$)W#e8#Q{cC z@&3J06~VweeK|8qXwl&-itwP(>R?E%)_Kuxc)nFmj3IXRLMD4(ydq5Jn9gH~ zBHebn@`OJND3J-Yp_T!Pd3SD>?|vuxG?(H?LCj2^d#eTeyQ{WCv-FZ)ssNBc{x_k@6@3C4WE zxX%2LJ`6c-p?6oq<#wr${@V@{D_k0)sR@G_t_XK?z+jR!l9k}0AwV~Xyq*H)1DWDBW}_LmAu`RxtUTa5eC1A#bWh;Vv&_! 
zYHD~c#e-S9h4DvCL#G!Hhsq7o>2&6Y9Itk%TnmXiM{W>?!}UzB?6qLp&$ z5PARp{Ya&Ab-2xYgY!MFYgWCQ8-$`{G-&XhPQ@MbVYYe$bY~%y$ZLB>+H%N^NW{=6 zBD?-@f9@8PCLv&?>!H{sjChaQAzQMmo2!wL5hz>pILDI~b334zdUDbbR{LNx@*Bm$ z!1*$xu#E`lUbu=p#4&zv3;iW0gq_}|cQP0cTG<#9or z+a}~FSTw;L7zku79tl*+Y_~7SG{Ta63?n&%Kgy&%8}ZHkq1@HixjRSK7B*TU%#~pH zYuQ(mv^<_zGbEQ~%ptTe_Cr`#SRgwn|1?LP9`*)bZ#>Z!*YIHa^2n-3zRjzkt(KF1 zCT^9pvM~JfS{R=A3;ORD>i00QIs00TuLBhs`f0ug(*iZjQc{?0xuv5Z!q~P3RGOxg zoV*+;`j}9D9|XEGv%;0&k1!7|{|g-a(s%81gs)Nx0uz%l`?Dky@9V_H$$hIk)+&i_ z89?oc%i;Gp-I%=PuEr{Inr2q3?!CI|xFEwIAZsItZP;!b59~{d3l%6%)k|@p-R9B6 z^+Q%VyqgeBQ8Dp(a$3lmt<9F85N^K{)$)qU%Yea(m4adMj-=A-^*;5TDWzSnmYloB z+OmRjqZ?OwHi5r;)UF}xFy>ZP2jROLPF*8o`u@6U0ht5i6W-kDW?@{&+|XL-&=Sd% z&~A*|!}+zN<>5m_AKk^`>CI*fdwJwLwCn3(y$z#=mKLxXB`95wH&$&eq~Onxb-@;s zzpEZ&eo)wjL(Iujgd#iK*GJV(PA>xhFXgU+Q7wODnY4z3aM0la?ipv2xoLNR$ax{; zZdV0p)LY8R=GB{D0{bX1R8*^mQMVZFf&lvog$_<74z(WAsVHGov8oPOdCp4 z!HDRpV!4!H+CN5$9wzSmkNJE6!oEK(Rf72cGoe{;R^XwE4`l-pnJz!Sur#;1SaRXU GqyGTS_dFZ` literal 0 HcmV?d00001 diff --git a/data_ingestion/image/redshift-sg-1.jpg b/data_ingestion/image/redshift-sg-1.jpg new file mode 100644 index 0000000000000000000000000000000000000000..9c2227f8a0d57b0c7b76face7316dea6d7a739bd GIT binary patch literal 30800 zcmeFZ2Ut_vx+oe8C?X&TC`FVKdMKeIMY;)1LI6V*NJ2AogaC?5TIdo&AXKGyq<2ti zAQYvk6qR07ItqyJaP580Ui<9x_I>-jd+vANeQV^)oFilWeU$l+IsTF3cgLRqm!R4Z zZNRBh0Kln}58!y_0@l&n+g(vy+|^6W7U^b(60>*1i2K>Pi%W`0hy#>V{oHNsT~OXP z>`)F)uF3-IO>F`?SrRb;3^E-9Lll-q@aUK;VPb69qrm6i`h?8H+Jb#tN*Vk9fS4zyy(?MKP zK|w)W0xS*&gH9koURYOeTR)Ji7w=yQXrjFAJ)PXWo!ne+{32*;=jP+B%;V$Zgj7V^ zqGjxmNEwiwtrQX@DT%ZN*&?MRK}abnq@0YTJQyu$$MaYA`cC$qZeDI^?>mle-fsUc z$%$1k|5o$6W!!&N|8AQT0sbCI5#o))82i}4vF<1pJs*sdt(czE11DDp2*%a{r6m4O zpq0db$^Hjj{$ia!6#TzZ?xZdLsL?;R^NB7eFp3(UC|hro=E+wDECH4PNhpA%B;b+~ zic&I)GBQfy{{iMdk@|N%P8wC^1o+>&z)63&@xLJb$>d)$@}K7V(_H_O1^y-AKYQ1o z=K7Z`@Gk-X*}MKfX0E@vkto*_-_iHPNj#neXaG*1`jviNPyb41&-}VlojrTzEHxE1 
z_3z(vH0P<$(VU~EK2Lj|=EARZlDkNI;o`3!zli(=^)%I)GgKGOQJ?!A;7mS~!GApt{w(WFAEbPHuX)$$gT)r-LArIOuH{yOud%O1U3F&amz9wfAH}Ab zn8wt`)Y0^>hdN~dP)ga>WF${DezGqkd%ns)LYt%{z>5uhM&42dotMtW9s_QZ9m;3@ zxj5*pEc2v1UElM69ZhX(sWU#PdA;#7=yE1~tei-(53mmAI&C-sX$qoE2Ap#^u`Pf~ zY21m5kEm&o7q{vD`srtX?g9Q%x?KKTE=_HRc*bp3dj*HstCOxdE}Whep;)O0MF2n6 z;pNwF!xCL<>x|p(1KkvBCnM&|s#2a}1*;ng#U4kt>(M*@)8dT7J;lR)s2Ds&v~vAB zQD%I3w^`0~r?GO5&{#a&KVz;k4C8M(A|h^RJR?p21vBF-9O z#5D&Yv2o2LaQM`v>h{+s0wDv!kVS191=`}9sUMNIP80I+J~hnS#hO&(44Hn_YP5OPI}d#b`r)xf%%p~K*xaZ zQLlfYARc_YM0~O~Zlcv%m*`u2IZ{kP!bkdg&E~}|If4t1cD@DF@7e~3inJ#k8ho8DJmvAFp@=9+nV&!d@kh#||)?q0Auzi!#n&9E{7!dhO zu>a|a&eqA3gMGA-Ow5VB`ESw}A1(2`HByVW^T_qHVdqLznhs&P?Az_GicD#4Yu_or^m~pwqf zgK2Z$CW$qVJ`NrO99rDOTY}*B(1$J`GA|Q4m^+MKGi?Vu@rN^qu@oZ&5Qj3ZGHh;# zos!LMEOqY>c};z_R(mgGK%KXqBy}@4Ugw|Ltu~kT@Hc6Ien2|3rJ>lD&hLgi-1YxH z5iq(SbeTic^@8FxF#qf}h6WlKE2&9XB=DvS#1pZH!TrH6J#R=H1B~wGtrf=xJX96J zqst+<(O$;dKt!!Uyy#)fDlf!s57@61Tle^#zGd_F>7z_ebf~rs+ub(+iU{>GqxrUJ&J1#^{|$R?SJe zc5dYOtXDv!(d8WMDo=lOC732=v{ReM*@7^A&f}R-qjikao{a7_d>vCm>}~hNQV0B- zf~Vy|{N`Q|qys5B+tT*t{k#W*_k7j`M0~r@A8Y)7zId&dq{?n-8fbQabv~Rt6dI&; zUU?VBKLp?^W>umN%IU3Ck4QVN zuqg*K1j66b3gwv;z3M;ZdoB=kx5U7_J=GNN`OwmC7nNrNCq~I8v3oQ8WSnAzmJTO>j$hY0m^!#J63BW*RE*QXI2MNBzcS)^aPYZm z)JLUeN3XSESDY=aMU4z0pf;4W?fYI#(I zEfi3a&Fe$~Bx8&-z2qaC6-J%s|I zn^iX#^k zZvr01#v{I6vUZtGmv)%Xw!(3wm|F`sYD_P{(`?Z#vTKeHx&T=#>vCL*7Qj}+a8u$BF!FT$q(K5Kn$ z*uRfSj}*A-U3f-ZN*WTRug4xim$?YiYEVF;NCzI{F3gO~1kutovm1GikysCp&snx` zT$m|=2{BuPOs7ud%KA`>*6F|L)0%Y9Q9RqMjYW-hqgtgnVp4wi4Oy3g5yj%1sP*p; z{H7TSkK&j2+!$izcS?IK%RDI!V3LhNVyy)#t+LP1u+qOJP+~0MLf>?m$Mt|uBD0o= zO{rud^&5l`C&T%4l)!vRoD`WH2V|Xz?*?~>alY&548zbc!qaB$l}N&}sH-?;4XF}} zrv-}INnKZJQ8g(Uhl}Emd(-+`TujHxHPJw8C4apCz*I@o&#w(upKV%>%7?nkfer|gd$ z@+gNj>yH6ZH_uSHC%xUZs(ShLRnc-mzU?9nA&;$|c(M2X^Y5vst{h5oFRiK0mRIzu zl0}r;%FS|h{n8pA5xx5vzNhCo+gU7XYAa9t+ozPYxU^{8ky?psC6VMmmskGoBGn92 zz+=S4GB-!3`lThfh{u~Uv5Dk$S|4M*Dv78NXTc9dC-aD`IeZq6;VWLyJb{sMUPDQJ!*Mn?<+=3><}2p| 
zS8eWRXhh5LK8g$TSHNi%2ZHV8I~#I_(P_hu5Qh!06!$}GYr0g7)y!!~NHNAlbNEH| z=qud5=*peKDEF+Vxhc~=it(koa;y&V`Mh9?htsvXuol;!mb*z`WtMZB>$dj$_HGHL zDZEZ3h@K^8*KMC)vYpMvC}miNsCFslA`P6N_+V+E6E6>C;VQ$Et+Y<>@p)@#nab`h zVVyJ9`iPSG!u-iGh1l72h^FkyVu3e)%k+s-rD)hIi|F&yLfU*nqK4Lbrdp;W6VAFU zAkS-kD(qBADL%qbnpSR9*UEUdQAwIiX5W)3?kf57^|h>fV~H(Io!c8b`Y5HSl)>e@ zk=obBlrJ5yB0R?wy7`{gtb%9^ka);aiyDPSMwO;Vg!tLeJxZb5q~Oz9x}9*h4OTQu z<2KxA(*u6rgl|+mFG4rQqkG;sQszdbaxX?m33r$!#S3~;x><}9)v~nEy|>X}tm}wK ze5hV$Vp!z30_DGuRGBwg1|(WF+Di2G=ckL>Z$xp!XidCRy3Nsc^YzKQo4q8SinaLB+-7G*l>diO(ML7iZgw)R~u{BB<6ff zkPQ~@MWVx5_fx{51Okvc+tj>LxzqeMF0!Anad(>BC;Rx>nwShvJ}k-wc1y1?!q{6> z>Y{YNVwEg0r=O!)}WnhLYk59}Smwjsd$eqXs2zrjo9vs-K&Ih^8@ME)cv5 zWpbdBOdgKWWKmOtv|pg~Xa;=?4+nwe^k=Nc$;IUf; zd$E!UvWB|k{*he&OQk&|H__Vf*Xx{{E0Q%!)=I0=_-&LDURKVxg4|1+^$R{?sQo#i z_;R4O1-V;1!QI<>rE9FV&Th{hDqIm+TP3Dwc7B15?tHlImYN5;HsftK+q39tM7={i zmJgd!^+@Etkex5X?zQNNHpg#I*lHhE&nxBQhF3X(ElL8FMsvWgp@^m%nMRv4wM`dL%RIRXF$XWHq$e_4O!;=R6=~qfB z^owJZrz^ovDW7atSr_KfhUy;vT(We5-I#Ul<}E26vp!Kpo#5KFfIJJ1e7}m|E2iBR zR}%?kRB%ga4%$#z*n4U1k(Aycz8Hg217|W0pT6juWbpm_F`#oiXro)+xbY5Advi{B zUFW5+P?75cFDmt1gh%%qZNd1=UQ#@35Wn3}^%URRS=ov@AHpO4oL&59XbS7;tm>^F zp7ZQTiA@pdgy?>{v-zVRoQ7c0RVqI5XjwbLr``9dpS?xKKIqIA6j#t*ip&c2)Ev)k zTxCvoiXL}?@J7226P{I#Yi29+*~OAS)v?wJ=6EYh#wCX@3!SwO#HU8p-7(NCqM{um z4}uKX$uTMhm7?TU8oo-N=Eg^L2SGhM1_~8QnOBk{3N~X@YwB!RVNEK97(}mABQ@5I z-JkUtF@eNAOj68AVO{lQh4RR-Ir8rDJ`5}T@PUD{9dY5eLYXh!Q1p(HGSv9CxlgL^ z8HTGJPVS>PZyj#`5xk*VL%s~78?R)IzAVu~7nFo2#0wnF3>A8$jQDNsPTT-+PX67B znf4KCvylI&y#E;B1`q?-bA4a>3II?~Fm#~oN^&K`sn;-aPEPd~Vo{cAg!YH$fVTDI z3`wz+y&gQ?e3y+e1C3p1MvJ`BvCtE{TkoGjp1SF4%b6m~cDbk!eqU-(3kVW|IOWl* zsjOv7@qiI{AJUaRvoJ-8sP*A0_istl=gE!)uhP%$O6Cu4Ko!i0&SOEGicz zHKGDJ^eT8op4&U=78RL4udB*igdwsk;XNItownc~4Vsv@OSlb|Wn9U_A&ezE#F-H;HXF#^ z&z7`xQHNrrByUu-x^Z$h2{BR3e92UdXvfL|_a1k0vq*V08fnhtnpz=70zQHE4?6>a zLZI#wCvlzgx1}uP9V5oe@_uhQLof!~>hT_i1;dE?=;*^lFXHx8;GAWJ+LIC;JEQm0 zSJwSIm2pWUdKzwkQoIqMoZn_B|e@yt(^(YEI&Z+pJs9 
zke>B^=Vdg-jUzF6!yzeg3#O2d`#qd%O6YWng}$#}0D#5i|LHGG{&RvGVnbaf`af-= ze?IAH6J$b85er zvK~_AU?(I=M;`!|r_MKgIFh@u$z%q-B!$r{`;QU#cH9exvN@ zC2bhJrDP`ekiG_mBVPU!usd5zi(C#hN7o!q^~Slop|W@1*4ktt+n$8i%T>NP_&h28 z25j?6Ah%k?mrD&CIwv zXYzOIZ|1620jL9h-95p8-~iC|vX;u9mw|`ON)7wr|Hp|HmDht9Lhf_7cn5C%m~HRc$6zqY_}rDIBj3}(Xp;Xs=d`6> zGXQPS{USr@nsd3jE%#;na<+Bv_`{pVJK#$y(0YKy4-^cWTc9j#_1U)8L?FdHcsTb% zEqlJgmdVTioaKu#g&}+yfc5Bad@z4wUN{(-=nkm+_BTF%V=iOhHDCOgs_k!lIR3`G z$SV_Q2>|{4chz3}LEk@>{Iezh?5=-~$^Vv#r}@_;$sh)~qn}p+4Uc`kh#v)38L4va z2;;N+oAI+Y9*FfJETgOP{=ClW`QNvz{>#tf|2D+aO%9(L&3??$WjbX`QL1eJkM3$m4*=|80}XKN3EjfssihqdcT~r(PxBmY4Tw zyTWp5hP&2WT_0*r2*Bfcq@B5ulKH31HBPW!|DTH;mC^MxE7Zw|ade?C)TufpBOAXjak4xWJhCpoZdh!AsebN^eNse#y$>|H&<+TJ~@p0s;98fyzGdhO#NIW zUoH(#e}Dfz4(BLsR>{=SyqZ_f+BakiJ1`q7Z7s~o8M$B@_iXUq+!FnEbi-AH4*6O3 zu<`ftLLIu{B;@;b3hzC`P~m}%^miPgMw5~3S-E*T0Uk^T__RWVx=pBup1dO%#*EBT z7Uu28=M)O_8cL?dS`BnlESf4@TzAsn|IBNE|1<)0s=YY$xhZcaL!9Q}l@_{R2mvCI z{(9SF%F-3UC$m%MXjbld0d`qb`Cp~aH5#2Son7>b2t9NACL@TT?un*_>sf1ujm-<0 zdn(}(1tU!=jWfO>Z5Q5^y!v7)=Qz=@qQxN;8u1`+f;u8OLU?ARa4{1K&l<&($**&n zdLhFO$qlwdA@JlqfS0mLZymz|!&=7Lh4mr~Q(oF{a$ZQ|uZV_j$FYnSHiw^FQqlX; z&Rg0Pd&5xQi}n2NW8%@m8T1eD8u^4+ih!>c33FZheqB#P7+dFw!xEOIpKWi|FlHdU z7UUNub?e_hV;_rIW42&>OtElt9eA5n)SJ^fZ@`^fdDq{ciKXZHG&WX2*e?Tz5rPPK zr0DBpN?w=W-t20h3(H{mQQ?LGZRdWL$EC(x__5`;>a3l-_->XEvg}c<&{ODQfrG+E zT-iwLqh5d0w-b;=PYpp@0tC zXE{we4ET&9+mJ&%P0m#^18q3}5d~T{=oIv!=le8{fmbC2U6i)Nv+V(qMI$$IL%5?r zf*K`yU-P-cWq2S32GK=Z7=Kc7R;k2;MyRny}cyUT@miMV)%qSCD zQwvu3mn|o7XqaB>jR;Kk7OlprhfZqY9ws~a7%)4R^l9q8XEd%#iuLO$UySlQJ(t?b zW#bw!syyK`i}7|I?**2TAwGr9QgfF2?;^ZY_7`RnLdyDPKQj5cU64JBExIA_?FnxJ z+-L-M7pc}IIPH-okCHtY&l;#Y3H+NvO4q$>((HIOyq6Xe^$-LeoA+_8pRLZN;ZsDT z9r`uZdWKVD%K=M?m@hcc*)zK!l!Jt=f0;~WHb1>O{H=LX-krjm{1cTP-KI53S0}Io zP}wcf#0|aZZ7%-3dlK1UQ^I%l%J1ioMK>-f&<`{}h2dxYZ51ERuj)cD&X5};8*ptb zz8Mb`?B$=+?;Zn;r=52XH}UH)5P+w9{1!FfU_ZYh7>%K%LKRNYN?;5eN9aF?B+Abk z^1{54nn-L6k!e5>aQYvghX3V~i#9RGfJH0cpvr^bLxoM!`N}B`y1AhJB&&~KTAv*Q 
z=mXJeZs?#<03Ze+b3OF?rSIkRver*C5gh52Mk6U(?1<}f@K$ypNJE~Imo44N4I#(y zWh?j0Z^xsR@AJK~8ncuyiY%Ei%UkD+wGiw4^o4v$&EX#+phU5cX!91#& z?r$0Di0>D^J5}FOcoUbwqNu$%V_y4P2On@WvEdvzEHu$_Uk}f70k&Nb=@N}e{$>e$ zHX5Ua?`6kr9mO-wFNxi!nJuOp7LGlfx~tH+I?W6#jPUI3Roc{{&J|i-DBLot*M^#+ zC&RM&kd8|I>lbTQ(>6b5d)^{PJMN~>Sf`r3TPzl3ZXX{rl5X;N%CzbPJ5V0&y45M~ zq*2~a*K_{;y-~E2QU0fOa`&sb#%v=5LM}>V3|FD>lNh zID|%RtD+=beePz|C?X-yIYuXm5HK{+v2;Err4#cpeg2hgt1-#ysXBSM~?O$Z^~Q7td{E%AoFt8cTF~vrLzc z(0TACT!D8$dC;#t<)vm}6E;x~hjniQoxvFDY|$eI0%A>nyT_VKlqE0UjY{oK_uxGj zDlj~-HT8*n{rib@h!9P*a&n~iZkc^C&q^;a2`Dn_Y_9|H;>T>rVmb#&cn zH74#`NBowJOY*{@gV6l@&A>;|HOGMbfj04{4q;Z+o-w#nPs-iS17lvuQK!@37?M)CcWwX5~FM2SM?#WO1_`VifIM-$eO z-p^RrJ-cVAy*k~cLLNxMKJAbcPyn)spH7W6{}6k3U2pkJN+5T$FzjYU$o}Vd)?rQ3 z6R8EZ2*MN&BCS_zA;G7dP+GxBNx}8w4#R1xz@n8y?#rc&jEpQ=P@dTgi`WLyaEn;^ zY#1>c!;eM>E;?*Qxl`C0?FoIW%I@qv_vP=pr$1Pe&KouwgvY{-ox}L=kX(j6c|$A| ztFQ~X9T+;n;Vc8FOok00%I~|73S$5rVy3d9ODRv^m_POTqT9bocEaW9mU;r?rufD}KA7-9RoElN)2zrZ0%E}d zc#vq9>`00wNQ{l0+1X%wsUKaUIrnYEcJ&j_Na7hQGZQE(S28aD>omA9O|S9d5DyLq za@2KH-HqNN_v50Y*OQX6JTppG&)5jBF*QpWKTyZFdit(jz_T_+!>@_l$U{a(r0VhD z{Vl`w6jD4rE`WrAf>BDi7{tk5#LJI&Id2(>nYuq1tjZybIzP9;zxCc)sQ6kc1g(-K zI1lhcMG4x-fFpaRIYO$7$0GnBlFgyMo8#5FZLw;b_YYN%IA8V$(|rowPcj$((wccP zVckQ#&)fw+2Am%rDLImGItIMxIjZ+qW>BfxdzF9Gas6PX=h?-<-8BF}nqJ2p8T8|Z z-N&Er*1vx0vDC`evpwoClH+m13Q~Z#`-BgE<0tK!h~&-KE*kTZ;ujd1!g}9EN#`8a zaP1`ZYusJvaY9dZeNzk?a^Sc=F{-miAY&i(Odhq`C;ZTk{aHfxQJKCg!s4*x)tEAj zfI^8R_?vJRqkf?YS5~V1;>BFi=?Du@u3^b&O)+dhCuJ17h`v$xF5txjA`kY8bIHT& zdQEUmVV+#i%sktBiLzMjY&jl%;z&=yfXjgHs6(lZLX6k?QEyLG%u$a>i0;{U&`XCYu24zIx)#Tf43Z$Z8VW zoQ=No!k50}k(W*%Q*S~%eosrfZm;?zPCiD+oYt`YBo14-qcXvOq4}J_@*Zt~>?@Oe z+|#}+D_ml)Wu50pNCz$|wgnMm+2`%3ta8kB?U$9?c1(Hxqpvu6qf* z5-7}OcP)nq!O%U65QMPKLKqpTRt+|0XORvxu4k`(lf`ww%K8)geoW0U556LVvt}_~t-UkD zwL=1S3Wz@)0!?+i=-&vqr_HtaqpBh}z%H^RtHH={MDrE_qUn%%y*#o%eJF@+qiSDU ztA2cNdinsYIBjLlY6vsl=HF`wAqI%K3bO(g7By?rAh`%OMk1=7BC*Zanwg$vsWH>n z!^%{l8d?H=efMD-6ML4s@% z4S7UmRlShqpBo2rgu8#q+cef|Sa@*lW;mOx8eu 
zbSl&mH7aCLy_w?-2BHmYBat>jf5Uj~%jecXC9# zeViA%`8`4|l|DJB6Pdx9*-7;@#|3v)AZA%mu{J#{xsHB@s9g=*5qz2rdQa3wa^`=N zZ%}peU_^S{{E~Y^%=G#iys{*J#O2zPZ?guQ4fJ&r!Dd$p;yl0w5*9L`DjC~6RUTFN z{RdIxn=tEwP-wK|Yah$9=I0wrYQ6b*1;ZLo$Y@dglkljdm-{xa?qX{rFj5O5Cw`_O zA)qd~)z0?$ob6O_bht);bFfCKJRWA6AR5E3Q%|PMvr&{+92`VjA~-Pl!F`1upnO;v z+%y-}d<=NfaZxw41bR2{dH~NCBST3w?_6oovs+Z1&_xzb9f9zNP&h{~kB3%HjQ+kB ziHJoaGG^Yqyewghu$A-5z;8{LYsP%p=D_c+=eIuqkt+~-3wMx_aSNMk3p#V2#uFi4e=~%lk&pJMqJbPDXB`ce=@oDqx@R z@+`2>(Cu|rjG*IrRsfVfl$Q1$0I?GLxWBzNP`VB31s3cOXUTVr>(L#F*L||rYXt5s zH=r;w7)R1L(}6W3DaYJ|nB^%%mrx9J65Tdhig(D+YWvrH&qq8#Nb_GhsB;*_v_%;UaBu z51dxu>3g`(%Gi;sB(v+Rlf|$P>KRgw@e4_KSH7`lgozO%s-?~_c>VO-_F5})v%E@e zjm)hr+^oRmKCuft{JcEm`nRm_3#Xz|5g8iO$`pgD&~* z6TzhOmB6MhApb0`j`0_jTU>#@bKZK&tR_?VGX zWxjE&em^HY4P4W-)i!JlbN2$E_fR`*2+$60=(4DiOjf1xCh8t;*TMVUxVD zAs#6*Zit6%GBJe4$BB;cQkRV&{2Z78jS=r8>T4 z4fmWHExxA?^bqM{DK7|X!ccOzZfNZlAgA=UNI?_!WDA_4N=_|QE(mxkED(Kv504w^ zN+mJwv)8yTay&g_jkekTITB1@wf1`SrS{w!@ruLU8%^8nPiBXfX!5gW9P?QKwcMfE^+y4i&){lcP&lsvYD_|mNyTA3Gb zg06Lmz?3phf=+uC3L2%chaps%ed_yf1BS7W(kHIiIqJ4Win>`IS{3z~c+-hr!}cj8G8E=2{&@5dh=1NP5;w&|JsA)d509MI-m7BUpl zLvQk>cFHh|&r^{{C}MOa;^ya&q{jOynv?5H-qI%#Tf^DB{gfrvpLHERpGrKGqe(YM z;j6mmtjXrT<>rqFS-jrxSQBHLM^pMB+@pAcX0}>}W>)ov;X_j|0^d zKNb7Sfe&`&Ioa>@+7-ECEh^tmI7{lzvHJ*=?usB>F|?u)&`~FlMzOaBd&5JZ~4gx>B0;A!`j+9{m7sjyD#;&7kD}F0ALBpWSft$zzdT%SrCh%^O~Y zkBN9*w>Vdj2^;Ay=uMD*j`uKdHoz|f#S{-jLZug#qID)xliVcV!fkqY zxp5(HzO$z$wVaWtJR8_b&z)zX{%LLkA_5WM1@nT*Ll4SD458bAbQ}BN^j$4thf)HS zw+#OgYlQ-NvWQDyUcT0k*lc@{^05Im}BM$G490P3Qe!9=S=6dL`8Ep77 zOm(*-?iO46=P3Zd@gHW%|E7Xd_dle14$M>~qH=JgBC5tBQXE0yYS?aJNM}cBf_nl9 z?esPSu}-$QKmP8mEur#fWKD$uH(z3B-e{zofVg8FBI^_PDJK-*^j&jF}^#spLhezezF z{qAzK$FJHn^LOmXFFC%+to7V{7<0XNy8Oqd{B z>i-`Cu(w#)TYZC5ScS*i>#I|Yn2Jlmfhkgph`O(r1JRZTKG`Z5$PlX!Ze%>hT?WIpocG29{*F&lS+n^*Uqn@AWcNd zBio(-5%BNzfv}T6?hV~Mb96&gM+*zM6+!IJ2VkwJT!T$raHCG?DBWJXRedZl{A+-%;-ARtcJ zCb`=n&5B}dXlPf6{W*~T4DHUetJ9;KEubfTsKlcyg|l!*2+=Wmm5}i^@|O4IuAHaU 
zK6go$%fVQ8xdHPf$4=n#UGFv#LIq5{Ojt5k0A}fwGu<&I7CE4Gl-dUYwJ8L@) ztX*YB;PF}ggp(6N6px7Ze>5-6rJw~y8%Oc?&kUJv90U4zT3_2#-Ja#U8Qhc872o$c7GS!IAkIAmD#)XBGu zJ7;FGHyD3z(7QDfSFRy*7I|>3{t62zpRYAH0gcU}@R%uF?-$^#$xD)_Rm^QO&C0AA zq5Vu}*@uRYo~>E#8~|2nTiXFaIYSz*_-Y}j^a>BqO^R%>XV%q~(^dK+xc=J#_5x){ zJ#};IO5U9{9o@^_Aesfb+8Ph@YBnMkt-jef;rak|iM5pBp=jb@T9y%IOA^g3gl;)74qk!THxWEA;^gVeI0S(cYlRFp;( zG^y>utfY*I7ocX1YORVj-92XdW|QuSaNIF~TVG8zELcA7TcGDi`4L;RM!RHG(GpE^ zVB+)-$SLlWPCa2q!LvWyN6I_IuFaPWIL%t!??1$ociE7E-D2_!i3wXtLL3)QTXOW_ zLY%pg=4lvVQ=7DqSBd|ji6JigkoBs%p9>M~x;A}=57XP$MGbzzu=^^^aCcUJ`(f## zVCxP}nND>9%VRrVHAT>!o=x4VE+Ju~`^TPRfz(yg`xK=&p{85+_ISN)A`KY*_dkN2e@|O|AV+=S8=!b#}`R z20Bkjm+WZ;Tchc z5FKN=IBY2+|B8qe5r5i^DERCz8~1=JvbK~6ol4Q>r@tFP2)NH_soqKpVj8uy58nbR zkPcsYP&kTGlg5=LSaVi?yjykEU^Ic75eZ3sGxkYTEZ~@e=PBW6pejwMd@dd~bnU ziO z%~*Oz_M%>uTw~<-w61$!aKnWR`3*QLy%KIghi)F#&VEbct6pf6j9)Zc?18xCl?Kjk z4L?nk>VqQk?VJVCb7`*W3jaP*&&M7EP+NxE@NN z3~xcMlcaHE;z))ICc(5Or~2ew9vCFBeKqf9o%1nZW#|~df3h0=S#=CRsIKIxE~j?~ zwFkD5T_K0cNA#FfHRQCZ*nkLz&u8tpE0yA+F*Pai@xSH~s{Yesc$=F7 zu<myc8d?<}p=yCYP*2VTvA~r>^Z2YRfdx49#3ria7Dj;(a{beXI zZ9l>VDs}iZc~TTJnFgOuHD4&|kRA?BY9GDi0bVTuY2em4%#bQMksxNUS!ngZ54`=svNeP{AHusslk-AelT?@@* zGWN zks7%JiGzhKqSu%=i;~ zbHBEC)D&!+5+@;iRZ>z)QP>RoaZb-4l{Ig#L!JSs;s4%+{*w8FACSSPb_^hBJ$%-N zeth{Q0N_~%NVe{L0K@>tbG{ylr#%)-!;|zA2u6~Vav}wD%!N>X{<_gh9}i*af&uOEC1zAh<^Juk zBg4|%`XXnniJS4{K3|^?4{Q)37e}bvfj5tB#DtqBSRm#R%itltq=vKHr3@hSLnY5^=Y6Sb>d(K{G}Tu3IX3yI_0EI6Q)0jkN)-i;o-%>F zvy%`3seQwf|H=~FT2@kh7iGaH5mEcVJRv_VRjzA*6L0+Dby2kTF#tG%p&3tKs^11J zZO33g?*2@mc!gCyX^)QlG^6$9T$6!t{R7{#q_y78+WPz(`gsjzwVXyNCs7_8nH&qK z?kkd9W~^7(Uwt_Su$UdGhFlPfnAHB?5csYP0sbSDKY&Nr^l8v8cbQ zZmLA3pX{>fXZf%<^dvJThqUXQNbSW~QS#&#c>tZh_&h9sQ^qBYcKB|@!JKk-XbX+7 z)$&(F-d*j*g#P`f1|B&!0xBBCW*qe2E^KJ2cTYzFA)`aLYP!^vRq}1K#yhY}FL zoeYBBfI3Dq%qJa;?;I`4Tx*K&F-Eqt>8r8{p~yTKHb*syoD>=%a;0dcXq0Fr`*wbQ z{@f1%Vo{c`vu9<_$r(Ow)h8(w-e;2!=xsrf8edd`#Y4g`p%bqx@q)RYJ3sn0j3|Ut-JLU=}??j$;~S3UZMHvpyORCokRg| 
z_SfPhlls(j`DDRKJ+Cr$hG^WIhT7_c9Eg-MAAUePYxE&wJ|?#@N0_a~3UV@>$}lQE zVnh7+6*|O_6U>{m^hVsPt}mR&a>n4`TCU*3=dw@U*7rA;e80L=V1d15 zzJFRs#6bhH^4{20sCxk~WPitf#XL?%Fx5#2!j`eER3$+vUMc*dr|fI_^AjfTMX!P7 zWFjcNeb`;@f|VElQ$o%~i!U|OuGYkz!ukgU}u0pkiUPcLGd_H)>p6^&w6$yAC` zn!F#S?R2NQITlkeX<-1=)K$adv25#*_#`sY{{E-?*P;`rbi{UQ`-flbTJ*uv3rYDi z30yToitub4pGO!vlW)irZ4u|bQ*7bnDh9oQsINVOsOnY9Na9Bab49zD`P3iL4r~^h zvCcR&Ku#iz8Kl-poGJs-O)NXzS0zXawe^NI@@BQ0aVAwl(f9Mft2&BbA`KS9B9<)W z_QpW2OqcKwHYCz!>7vxV22oiy$qErdjjrTrinuTrQ)rZ2PAOZ5iVk@cnKm({X$^4| zZNP==x|)!xvMCOTRc4bM7I~Jgol#J3H5cZG@~i|`|=MbJ!d!H>q0c$ zv9@{tS9{O>*3_~sHjv(tj+6iaLP-cp6-X!vJp?d>A|-)Df<)m6h>a3D3Isy6P(u+F z2u<*)KW(5~qk?O@|?tI%Y{a(3`5-dv#d=a+1>e4pm#yi$Wh-h z7pdGpTzeX&(v?=e6+|i8%CR2ti&XKNnR<|d`^{WhL*7(J3^))IXX(q@t3NZR%E0}a zmwE*j{OJVIPm&pEI$u%OBztGU=dSOylv1sb7jW`BHRnbvW4@~z_Px}V#ch?C6@z13 zC-o~n4N>sF@%tU)a@W8Ki%Pkf${oz%RlsfJW%jkX# zdaI3gFX?jmGWm4Hfu_8qKc}zUA2OIWdX{kbo@$^?rAdR*PMmhxqfGMZ8)U~dj>z>p z|3HTRg|fh>c0(p2ftP_RY;I|*1ALjw@B`a0TES$GE`}J$%WUP@qb5T6;NeIH13RSdDM4_`K=vGIdyX+Czb-@(doeO>OUj z#n}7US&rj#%JyN(V7;+Rw1C=<6}ZgB=<$gNOWYdjj;nL;Oxn^)EhBi*neF_JSsuv@ z8Z_&id9TUEWkh?1S@WhxXUMfgCs0sjW~4~p z+Y^q_oqj8sj6MEMKKRV8k&;M9HE2-T&16>|C-(lckebkQb&w_*VTmYHc!ETzBQgVF zN}jf|$NIvIb@S#6v=iK(6NkH7W|zn}*U|9RXDTfIrLrBzl8c~zt*m>7xR%hbr)mLd+kt@?t3t4rrUo(y!_Rl)Fv#&%NVKe)ladWVGI6dZ^Av94rM zpxg;wp54(>WfVV2kxgjec+3*sYLhFUooOzcQuOp$d2NnYfkFQWA+>=LZDw}68u0La z#`Qu|UW`qf;NSOUzv>`-WXDMPN9<3IHviuh{r|N5>-8Lz zg9ubyZ9m79#>=t)Bt*VRlZOA_w!`L%6S`|b-vhHf5Vu+ zu;agJ$NvKfPsShYK?hc*WA3o6U;W4Dg0}gt6cZ6|A1mxl=&iEtUp^dc``01MduRzL zSw=9bu8WJ?0b|iQcG0pVySMD#78Um*tr-Wz9gh9Ihp1RPsts`dh6@XcrPCo$lPmnXiqG~PXmA!6{D zH;pz&q1UTGrssaC`9(wH$fFR<^LptuB;AmjmtGRlfCW`Oi_lhtld5t~()czI`70LH zkURW59qa(CTiyc?6EWhVq;qZzVoCcb@G>+)Ph35b4j{&+LM)32-A!fc+l-Ci1jWwY zobD)9+%1BSklp35&~~fVrd-t9$@)Bv%+$$&>mO zHwo@%H=NX@3)8j_-9An3>2M<6r)ZQ#Oo>L-U7xtvrcySZf-FugIswR0@;F!rsPyqv z^c!U-gLLiA%mP*JShAV-7%)<2I1H&m;p_|8g63p5{=TVY3l4#B~m3M=aRVX}{w5k2C 
z$$}b6+GY|`%)*FfG(50{+LF^)+}Dejl?uD*@ELenTdt;P=6xI&muib3Vs)qV+pe~R z#6fX7ryNRE(6BXeXv#PXjWE>=sUO@lW)IlJao``#C2an7q%>OH==HL*Fgl^5ed5to zNK!;O#C2k0TQZLxSRPmo0QKm8=d@GOm#-M?rQ$L2)}9#3E<~1w6{nT-O~$CxfZ#_M zlWZ7rdP*FTo%x~LLKW~V=10{}j*~-xQpD=s_b1_pIjJZHr*;cWySpoy5!qDW$?JQL zxSfJT7nl{%tcEp!@k%cJiwem%a7(aynp^kXib7=<01~y8dGz58nN&^afkb=b@zYb) zQbitH5vxb+$S!sW(}t95-|7M+;$<{ry_DQ@bjKZnajGRwefjBx6Ef9@oJr!{T}=v+ z7AkpqZ8KTvkLsnz=LU(vQjPn?H1i?C)_jTI3xhc{gCg~d+(yh7=~+NINj(_}>Td2l zF=qZxng4;$NTdkSouwV7V18lb5Y2Is)jDHTa=BAZH=#N}qqVfZ5tSEh>c}`CsGmfl z2jEbf>7yZF1aysUGon*%+M9lrluRj@AqsFhg)WgswB>N}C~TBt%L?<=+^GYWasnw+;I_vAduAimJcDB2E$oLRbX z;ssQ{3l3gLmc@F;G^O~>@%JAhT%YG77`+cDuG5{{TnOsUM| zl=rq=$bHh;GAL>&?aavENMH|Fh(tG0HaJFkFQ5X>N#{OW@o#&vh>9||8J{Y)Q7Zv; zH5F335HIXYfHUDGgnmB{aiJxPhI`hDFGgvurCJT9bx$dZE~&DrFXs>pEqjp~q@w_F zPeeQ)*1i}RYT)%ftHWwI=8Zb$F4dglpTn8`0Ym}<`~VL^ZF=B<3XH|Q9*HOsl-COM zyQoLO7dQ0upkL@>zxqzX(!$E`AeuJ{fN=*{xgdH;zDO#Du6A}2TNcywZINph=PDt^-=@qbEJWl ztf*q0XC!h-Hm^zim9vLjT*7M}e14!FPTeze9402}HS=}b z+V#fpA2N{|Nd;?k+JN)mM%{Vt-4QI}zUv z!bZv(-bH}tHduzrx$vQY8H8q4k{sv!C>%YiRp9B<8?W+ZIg{q zO~QH%a}MOv=QSP$h>J_lGv-6MhjbPN2tWi%-Lp@K4OGCb;RpyoY|+uB{tguN#J!m{ zj8vWcarH{Yb6`?Z)_EJTgxZErtbs-apTq=dWQ6StN`7GqXl2FIzF;l+2Dj(~O2Z)o6mZ?kKTMD7?<5Vt9@qcK|PT}uIxuU9F!M_~P{qDfUGmuF#EK6bu zCEjd0*=aHfub{?4wpFol$$by*K*4<@=Wr#I)CBrxfXDM?@v_nCS$X$z7`=6<;d(5E zojx3>iMrwsVy)KbfF`N{Nwfh?o3c<YKExYxJ zbw0$IPhjPEozhsj&Qo@H?XW!%?_J?PQlTcgYbFBbv2k-0%G=5Q{26&A^!otzqb6#Q z%9<>zJQX`@e>1iZh_p>&&yI1>m3{U|-;}7xOWbBlz0-T?PH`04rKc!l7y8@o>GGPd zd8MdH2GKL!W;2g$XuxO|TH_|ssX6BW!s~iM&t)Qf9q7p~b)X{U7A*^6UK%+|K($Aw z264kSJCFSyKEssL&Tt&P^LK(=AF9LTKk-c}@jXQ}zA*K3oWl=_-|j;EJpN+M6SgDF z1@2p1DS~3|to`YTh?0~nZ`=1Hr}CXA+VWc_EtraVxo=_@&&+xHvF3eH;aWos0CT#Q z8c=yhtABCkLiIK@d_BZWj+_a3&>Wpkb=k-tG7bxVdC{>;&$Pnla+9ap!ky8+4$;Fj zKZnC%AMU$q-z5OlAUe%hfel$pKjbD_(xXzx%Hs5p5^&~SxA*K7KVVXoX;q4P#k^uR zG?9YIY|3dOj@nK(=9a#pS#kKlVD}yNtH|jWbUz4(d@7CxIDhGQcP-y-EC<(^PVMZI zvFAgHIm#lPD@BlDS?9nsDMugIO5UIb*v8~lcmKDuzT+-MwCsh?Sxqy&LVTn2Kypij>kkA7{u5f2gi 
z+PIccV2P+azv}?TG@?703jvdZGOOYF5eOBdV3$ly#I0;*1sv z&jrh_=g0h&$|{KXV0zDz-ubtYBVZ|sWQ0n)t#4MTMLqy~Ls`a} zL*Hn~!)=6<|5fX^l;s7=(q!&#bIaJ*1VL}Cw5sYI++*MJ8M^wrNR>Jzsp&U>RbPo< zvZhu}zGR|o_*q$H=RE%#mjr1@iOlFNv#)O2g|;hMQbkQZb0zpOD|nGnVe&OxXd#>? z)|9s0CA@qe@iC8gV9UW;wKMnVgVy+&7gSc~Yz590(FtmR-m-B}T5(lx!fx{$N=d&T z<7G$j2FCehK}NKg7!G0cKdkjnw+9^bl21x5dKZ_NxUSHWZQH-&?S zHO#pT^rD~*dCwR^)2Ddb*~TtdxPLY9!sIK2=|jEnMg258JyOfk)4sw_AL?UC+?c?D zI+?b{vRouYiuG>(zEt*}C)~eainW`PYm$OVmZf-jcm};y0|cEf$&g~MJg7pJ$RR-8 zT*{XWKchT-+-4$ek&dt*$m$wwf zzSl9qA0?dt>LKY`x8RO@%AZ0&HH{IpqvCCkPIEq)+aE$v+jB#vM(F^U-bY`XyPi`lQ zq18mYHOS97;3~Q0+=x)}d(4}{KE#M=lawv2SJcoU3DFr`trNEiv@SmXn=gNUIEF}%Fd@QNu0ty>#+6X7OV=9q1 zly?Te+E3%GN73iGNv^6%nL{t82YP!qwh^Z?tv03PHzYGil-+uHfX~F>JPR|m10fSh zQCH6Q4m@B!#50W#T)y;b{n*ffRhyq2$G0BM=@c9qzp)R)CRaZ+_x0Jjl(1uwRUsS$99RJQp zZXrI*T zo>HyPArOOd(qCu5-BI($X%U@$yzZED)K>(j`GIZ^R!>}r6Ln6 zbvo~H^JbMQWTbd}N$%+bp4tN{|Ku1ngr%REml5U1e<;)W9A0CyKlb+6k7da(ioqJH zLP>X5xrtZDmx3W5z=Mu<-PiBE;)dRPTidC}R}xQpcCxqB=4SjR`;?%lP2sD*8JL?- eIKZ0h>t~&*>qt~lQoKKwk!}7TF{JxHU;GzH&CYcI literal 0 HcmV?d00001 diff --git a/introduction_to_amazon_algorithms/linear_learner_abalone/Linear_Learner_Regression_csv_format.ipynb b/introduction_to_amazon_algorithms/linear_learner_abalone/Linear_Learner_Regression_csv_format.ipynb index d113908120..19129ca971 100644 --- a/introduction_to_amazon_algorithms/linear_learner_abalone/Linear_Learner_Regression_csv_format.ipynb +++ b/introduction_to_amazon_algorithms/linear_learner_abalone/Linear_Learner_Regression_csv_format.ipynb @@ -13,16 +13,15 @@ "## Contents\n", "1. [Introduction](#Introduction)\n", "2. [Setup](#Setup)\n", - " 1. [Exploring the dataset](#Exploring-the-dataset)\n", + " 1. [Fetching and exploring the dataset](#Fetching-and-exploring-the-dataset)\n", + " 2. [libvsm to csv convertion](#libvsm-to-csv-convertion)\n", + " 3. [Dividing the data](#Dividing-the-data)\n", + " 4. [Data Ingestion](#Data-ingestion)\n", "3. 
[Training the Linear Learner model](#Training-the-Linear-Learner-model)\n", "4. [Set up hosting for the model](#Set-up-hosting-for-the-model)\n", "5. [Inference](#Inference)\n", "6. [Delete the Endpoint](#Delete-the-Endpoint)\n", - "7. [Appendix](#Appendix)\n", - " 1. [Downloading the dataset](#Downloading-the-dataset)\n", - " 2. [libvsm to csv convertion](#libvsm-to-csv-convertion)\n", - " 3. [Dividing the data](#Dividing-the-data)\n", - " 4. [Data Ingestion](#Data-ingestion)\n", + "\n", "---\n", "## Introduction\n", "\n", @@ -43,10 +42,10 @@ "## Setup\n", "\n", "\n", - "This notebook was tested in Amazon SageMaker Studio on a ml.t3.medium instance with Python 3 (Data Science) kernel.\n", + "This notebook was created and tested on an ml.m4.4xlarge notebook instance.\n", "\n", "Let's start by specifying:\n", - "1. The S3 buckets and prefixes that you want to use for training data and model data. This should be within the same region as the Notebook Instance, training, and hosting.\n", + "1. The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.\n", "1. The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto regexp with a the appropriate full IAM role arn string(s)." 
] }, @@ -65,25 +64,21 @@ "role = sagemaker.get_execution_role()\n", "region = boto3.Session().region_name\n", "\n", - "# S3 bucket for training data.\n", - "# Feel free to specify a different bucket and prefix.\n", - "data_bucket = f\"jumpstart-cache-prod-{region}\"\n", - "data_prefix = \"1p-notebooks-datasets/abalone/text-csv\"\n", - "\n", - "\n", "# S3 bucket for saving code and model artifacts.\n", "# Feel free to specify a different bucket and prefix\n", - "output_bucket = sagemaker.Session().default_bucket()\n", - "output_prefix = \"sagemaker/DEMO-linear-learner-abalone-regression\"" + "bucket = sagemaker.Session().default_bucket()\n", + "prefix = 'sagemaker/DEMO-linear-learner-abalone-regression'\n", + "bucket_path = 'https://s3-{}.amazonaws.com/{}'.format(region, bucket)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Exploring the dataset\n", + "## Fetching and exploring the dataset\n", + "\n", + "First, we download the dataset in its original libsvm format; more information about this format can be found [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html).\n", "\n", - "We pre-processed the Abalone dataset [1] and stored in a S3 bucket. It was downloaded from the [National Taiwan University's CS department's tools for regression on the abalone dataset](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/abalone). Scripts used in downloading and pre-processing can be found in the [Appendix](#Appendix). These include downloading data, converting data from libsvm format to csv format, dividing it into train, validation and test and uploading it to S3 bucket. \n", "\n", "The dataset contains a total of 9 fields.
Throughout this notebook, they will be named as follows 'age','sex','Length','Diameter','Height','Whole.weight','Shucked.weight','Viscera.weight' and 'Shell.weight' respictively.\n", "\n", @@ -101,8 +96,7 @@ "Shucked.weight : float 0.2245 0.0995 0.2565 0.2155 0.0895 ...\n", "Viscera.weight : float 0.101 0.0485 0.1415 0.114 0.0395 ...\n", "Shell.weight : float 0.15 0.07 0.21 0.155 0.055 0.12 ...\n", - "```\n", - ">[1] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science." + "```" ] }, { @@ -111,14 +105,15 @@ "metadata": {}, "outputs": [], "source": [ - "import boto3\n", - "FILE_TRAIN = \"abalone_dataset1_train.csv\"\n", - "s3 = boto3.client(\"s3\")\n", - "s3.download_file(data_bucket, f\"{data_prefix}/train/{FILE_TRAIN}\", FILE_TRAIN)\n", + "%%time\n", + "import urllib.request\n", "\n", - "import pandas as pd # Read in csv and store in a pandas dataframe\n", + "# Load the dataset\n", + "SOURCE_DATA = 'abalone'\n", + "urllib.request.urlretrieve(\"https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/abalone\", SOURCE_DATA)\n", "\n", - "df = pd.read_csv(FILE_TRAIN, sep=\",\", encoding=\"latin1\", names=[\"age\",\"sex\",\"Length\",\"Diameter\",\"Height\",\"Whole.weight\",\"Shucked.weight\",\"Viscera.weight\",\"Shell.weight\"])\n", + "import pandas as pd# Read in csv and store in a pandas dataframe\n", + "df = pd.read_csv(SOURCE_DATA, sep=' ', encoding='latin1', names=['age','sex','Length','Diameter','Height','Whole.weight','Shucked.weight','Viscera.weight','Shell.weight'])\n", "print(df.head(1))" ] }, @@ -126,9 +121,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "## libvsm to csv convertion\n", "\n", - "---\n", - "Let us prepare the handshake between our data channels and the algorithm. 
To do this, we need to create the `sagemaker.session.s3_input` objects from our [data channels](https://sagemaker.readthedocs.io/en/v1.2.4/session.html#). These objects are then put in a simple dictionary, which the algorithm consumes. Notice that here we use a `content_type` as `text/csv` for the pre-processed file in the data_bucket. We use two channels here one for training and the second one for validation. The testing samples from above will be used on the prediction step." + "Then we convert this dataset into CSV format, one of the input formats accepted by the Linear Learner algorithm; more information [here](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html#ll-input_output).\n", + "\n", + "The age field is parsed as an integer, and each feature value is extracted from its \"feature_number\":\"feature_value\" pair so that only the value remains. The final frame is then written to the output file." ] }, { @@ -137,57 +134,41 @@ "metadata": {}, "outputs": [], "source": [ - "# creating the inputs for the fit() function with the training and validation location\n", - "s3_train_data = f\"s3://{data_bucket}/{data_prefix}/train\"\n", - "print(f\"training files will be taken from: {s3_train_data}\")\n", - "s3_validation_data = f\"s3://{data_bucket}/{data_prefix}/validation\"\n", - "print(f\"validtion files will be taken from: {s3_validation_data}\")\n", - "output_location = f\"s3://{output_bucket}/{output_prefix}/output\"\n", - "print(f\"training artifacts output location: {output_location}\")\n", - "\n", - "# generating the session.s3_input() format for fit() accepted by the sdk\n", - "train_data = sagemaker.inputs.TrainingInput(\n", - " s3_train_data,\n", - " distribution=\"FullyReplicated\",\n", - " content_type=\"text/csv\",\n", - " s3_data_type=\"S3Prefix\",\n", - " record_wrapping=None,\n", - " compression=None,\n", - ")\n", - "validation_data = sagemaker.inputs.TrainingInput(\n", - "
s3_validation_data,\n", - " distribution=\"FullyReplicated\",\n", - " content_type=\"text/csv\",\n", - " s3_data_type=\"S3Prefix\",\n", - " record_wrapping=None,\n", - " compression=None,\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Training the Linear Learner model\n", + "%%time\n", + "\n", + "# import numpy and pandas libraries for working with data\n", + "import numpy as np\n", + "import pandas as pd  # Read in csv and store in a pandas dataframe\n", + "df = pd.read_csv(SOURCE_DATA, sep=' ', encoding='latin1', names=['age','sex','Length','Diameter','Height','Whole.weight','Shucked.weight','Viscera.weight','Shell.weight'])\n", + "\n", + "# converting the age to an int value\n", + "df['age']= df['age'].astype(int)\n", + "\n", + "# drop any null values\n", + "df.dropna(inplace = True)\n", + "\n", + "# extracting the feature values from the libsvm format\n", + "features=['sex','Length','Diameter','Height','Whole.weight','Shucked.weight','Viscera.weight','Shell.weight']\n", + "for feature in features:\n", + " if feature=='sex':\n", + " df[feature]= (df[feature].str.split(\":\", n = 1, expand=True)[1]).astype(int)\n", + " else:\n", + " df[feature]= (df[feature].str.split(\":\", n = 1, expand=True)[1]).astype(float)\n", "\n", - "First, we retrieve the image for the Linear Learner Algorithm according to the region.\n", "\n", - "Then we create an [estimator from the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) using the Linear Learner container image and we setup the training parameters and hyperparameters configuration.\n" + "# writing the final data in the correct format\n", + "df.to_csv('new_data_set_float32.csv',sep=',',index=False,header=None)\n", + "\n", + "print(df.head(1))" ] }, { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], + "cell_type": "markdown", + "metadata": {}, "source": [ - "# getting the linear learner
image according to the region\n", - "from sagemaker.image_uris import retrieve\n", + "## Dividing the data\n", "\n", - "container = retrieve(\"linear-learner\", boto3.Session().region_name, version=\"1\")\n", - "print(container)" + "The following functions split the data into train, validation, and test files and upload the files to S3.\n" ] }, { @@ -197,49 +178,64 @@ "outputs": [], "source": [ "%%time\n", + "\n", + "import io\n", "import boto3\n", - "import sagemaker\n", - "from time import gmtime, strftime\n", + "import random\n", "\n", - "sess = sagemaker.Session()\n", + "def data_split(FILE_DATA, FILE_TRAIN, FILE_VALIDATION, FILE_TEST, PERCENT_TRAIN, PERCENT_VALIDATION, PERCENT_TEST):\n", + " data = [l for l in open(FILE_DATA, 'r')]\n", + " train_file = open(FILE_TRAIN, 'w')\n", + " valid_file = open(FILE_VALIDATION, 'w')\n", + " tests_file = open(FILE_TEST, 'w')\n", "\n", - "job_name = \"DEMO-linear-learner-abalone-regression-\" + strftime(\"%H-%M-%S\", gmtime())\n", - "print(\"Training job\", job_name)\n", + " num_of_data = len(data)\n", + " num_train = int((PERCENT_TRAIN/100.0)*num_of_data)\n", + " num_valid = int((PERCENT_VALIDATION/100.0)*num_of_data)\n", + " num_tests = int((PERCENT_TEST/100.0)*num_of_data)\n", + "\n", + " data_fractions = [num_train, num_valid, num_tests]\n", + " split_data = [[],[],[]]\n", "\n", - "linear = sagemaker.estimator.Estimator(\n", - " container,\n", - " role,\n", - " input_mode=\"File\",\n", - " instance_count=1,\n", - " instance_type=\"ml.m4.xlarge\",\n", - " output_path=output_location,\n", - " sagemaker_session=sess,\n", - ")\n", - "\n", - "linear.set_hyperparameters(\n", - " feature_dim=8,\n", - " epochs=16,\n", - " wd=0.01,\n", - " loss=\"absolute_loss\",\n", - " predictor_type=\"regressor\",\n", - " normalize_data=True,\n", - " optimizer=\"adam\",\n", - " mini_batch_size=100,\n", - " lr_scheduler_step=100,\n", - " lr_scheduler_factor=0.99,\n", - " lr_scheduler_minimum_lr=0.0001,\n", - " learning_rate=0.1,\n", - ")" + " rand_data_ind
= 0\n", + "\n", + " for split_ind, fraction in enumerate(data_fractions):\n", + " for i in range(fraction):\n", + " rand_data_ind = random.randint(0, len(data)-1)\n", + " split_data[split_ind].append(data[rand_data_ind])\n", + " data.pop(rand_data_ind)\n", + "\n", + " for l in split_data[0]:\n", + " train_file.write(l)\n", + "\n", + " for l in split_data[1]:\n", + " valid_file.write(l)\n", + "\n", + " for l in split_data[2]:\n", + " tests_file.write(l)\n", + "\n", + " train_file.close()\n", + " valid_file.close()\n", + " tests_file.close()\n", + "\n", + "def write_to_s3(fobj, bucket, key):\n", + " return boto3.Session(region_name=region).resource('s3').Bucket(bucket).Object(key).upload_fileobj(fobj)\n", + "\n", + "def upload_to_s3(bucket, channel, filename):\n", + " fobj=open(filename, 'rb')\n", + " key = prefix+'/' + channel + '/' + filename\n", + " url = 's3://{}/{}'.format(bucket, key)\n", + " print('Writing to {}'.format(url))\n", + " write_to_s3(fobj, bucket, key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "---\n", - "After configuring the Estimator object and setting the hyperparameters for this object. The only remaining thing to do is to train the algorithm. The following cell will train the algorithm. Training the algorithm involves a few steps. Firstly, the instances that we requested while creating the Estimator classes are provisioned and are setup with the appropriate libraries. Then, the data from our channels are downloaded into the instance. Once this is done, the training job begins. The provisioning and data downloading will take time, depending on the size of the data. Therefore it might be a few minutes before we start getting data logs for our training jobs. The data logs will also print out Mean Average Precision (mAP) on the validation data, among other losses, for every run of the dataset once or one epoch. 
This metric is a proxy for the quality of the algorithm.\n", + "### Data ingestion\n", "\n", - "Once the job has finished a \"Job complete\" message will be printed. The trained model can be found in the S3 bucket that was setup as output_path in the estimator. For this example,the training time takes between 4 and 6 minutes.\n" + "Next, we read the dataset from the existing repository into memory, for preprocessing prior to training. This processing could be done *in situ* by Glue, Amazon Athena, Apache Spark in Amazon EMR, Amazon Redshift, etc., assuming the dataset is present in the appropriate location. Then, the next step would be to transfer the data to S3 for use in training. For small datasets, such as this one, reading into memory isn't onerous, though it would be for larger datasets." ] }, { @@ -249,61 +245,76 @@ "outputs": [], "source": [ "%%time\n", - "linear.fit(inputs={\"train\": train_data, \"validation\": validation_data}, job_name=job_name)" + "# Load the dataset\n", + "FILE_DATA = 'new_data_set_float32.csv'\n", + "\n", + "#split the downloaded data into train/test/validation files\n", + "FILE_TRAIN = 'abalone_dataset1_train.csv'\n", + "FILE_VALIDATION = 'abalone_dataset1_validation.csv'\n", + "FILE_TEST = 'abalone_dataset1_test.csv'\n", + "PERCENT_TRAIN = 70\n", + "PERCENT_VALIDATION = 15\n", + "PERCENT_TEST = 15\n", + "data_split(FILE_DATA, FILE_TRAIN, FILE_VALIDATION, FILE_TEST, PERCENT_TRAIN, PERCENT_VALIDATION, PERCENT_TEST)\n", + "\n", + "#upload the files to the S3 bucket\n", + "upload_to_s3(bucket, 'train', FILE_TRAIN)\n", + "upload_to_s3(bucket, 'validation', FILE_VALIDATION)\n", + "upload_to_s3(bucket, 'test', FILE_TEST)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Set up hosting for the model\n", - "\n", - "Once the training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint. This will allow us to make predictions (or inference) from the model. 
Note that we don't have to host on the same insantance (or type of instance) that we used to train. Training is a prolonged and compute heavy job that require a different of compute and memory requirements that hosting typically do not. We can choose any type of instance we want to host the model. In our case we chose the ml.m4.xlarge instance to train, but we choose to host the model on the less expensive cpu instance, ml.c4.xlarge. The endpoint deployment can be accomplished as follows:\n" + "---\n", + "Let us prepare the handshake between our data channels and the algorithm. To do this, we create `sagemaker.inputs.TrainingInput` objects (the SageMaker Python SDK v2 successor to `sagemaker.session.s3_input`) from our [data channels](https://sagemaker.readthedocs.io/en/v1.2.4/session.html#). These objects are then put in a simple dictionary, which the algorithm consumes. Notice that we use a `content_type` of `text/csv` for the files uploaded in the previous step. We use two channels, one for training and one for validation; the test samples will be used in the prediction step."
] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "scrolled": true - }, + "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "# creating the endpoint out of the trained model\n", - "linear_predictor = linear.deploy(initial_instance_count=1, instance_type=\"ml.c4.xlarge\")\n", - "print(f\"\\ncreated endpoint: {linear_predictor.endpoint_name}\")" + "# creating the inputs for the fit() function with the training and validation locations\n", + "s3_train_data='s3://{}/{}/train'.format(bucket, prefix)\n", + "print('training files will be taken from: {}'.format(s3_train_data))\n", + "s3_validation_data='s3://{}/{}/validation'.format(bucket, prefix)\n", + "print('validation files will be taken from: {}'.format(s3_validation_data))\n", + "output_location = 's3://{}/{}/output'.format(bucket, prefix)\n", + "print('training artifacts output location: {}'.format(output_location))\n", + "\n", + "# generating the TrainingInput objects accepted by fit()\n", + "train_data = sagemaker.inputs.TrainingInput(s3_train_data, distribution='FullyReplicated', \n", + " content_type='text/csv', s3_data_type='S3Prefix',record_wrapping=None,compression=None)\n", + "validation_data = sagemaker.inputs.TrainingInput(s3_validation_data, distribution='FullyReplicated', \n", + " content_type='text/csv', s3_data_type='S3Prefix',record_wrapping=None,compression=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Inference\n", + "## Training the Linear Learner model\n", + "\n", + "First, we retrieve the image for the Linear Learner Algorithm according to the region.\n", "\n", - "Now that the trained model is deployed at an endpoint that is up-and-running, we can use this endpoint for inference.
To do this, we are going to configure the [predictor object](https://sagemaker.readthedocs.io/en/v1.2.4/predictors.html) to parse contents of type text/csv and deserialize the reply received from the endpoint to json format.\n" + "Then we create an [estimator from the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) using the Linear Learner container image, and we set up the training parameters and hyperparameter configuration.\n" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [], "source": [ - "# configure the predictor to accept to serialize csv input and parse the reposne as json\n", - "from sagemaker.serializers import CSVSerializer\n", - "from sagemaker.deserializers import JSONDeserializer\n", - "\n", - "linear_predictor.serializer = CSVSerializer()\n", - "linear_predictor.deserializer = JSONDeserializer()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---\n", - "We then use the test file containing the records of the data that we kept to test the model prediction. By running below cell multiple times we are selecting random sample from the testing samples to perform inference with."
+ "# getting the linear learner image according to the region\n", + "from sagemaker.image_uris import retrieve\n", + "container = retrieve('linear-learner', boto3.Session().region_name, version='1')\n", + "print(container)" ] }, { @@ -313,42 +324,47 @@ "outputs": [], "source": [ "%%time\n", - "import json\n", - "from itertools import islice\n", - "import math\n", - "import struct\n", "import boto3\n", - "import random\n", - "\n", - "# downloading the test file from data_bucket\n", - "FILE_TEST = \"abalone_dataset1_test.csv\"\n", - "s3 = boto3.client(\"s3\")\n", - "s3.download_file(data_bucket, f\"{data_prefix}/test/{FILE_TEST}\", FILE_TEST)\n", - "\n", - "# getting testing sample from our test file\n", - "test_data = [l for l in open(FILE_TEST, \"r\")]\n", - "sample = random.choice(test_data).split(\",\")\n", - "actual_age = sample[0]\n", - "payload = sample[1:] # removing actual age from the sample\n", - "payload = \",\".join(map(str, payload))\n", - "\n", - "# Invoke the predicor and analyise the result\n", - "result = linear_predictor.predict(payload)\n", + "import sagemaker\n", + "from time import gmtime, strftime\n", "\n", - "# extracting the prediction value\n", - "result = round(float(result[\"predictions\"][0][\"score\"]), 2)\n", + "sess = sagemaker.Session()\n", "\n", + "job_name = 'DEMO-linear-learner-abalone-regression-' + strftime(\"%H-%M-%S\", gmtime())\n", + "print(\"Training job\", job_name)\n", "\n", - "accuracy = str(round(100 - ((abs(float(result) - float(actual_age)) / float(actual_age)) * 100), 2))\n", - "print(f\"Actual age: {actual_age}\\nPrediction: {result}\\nAccuracy: {accuracy}\")" + "linear = sagemaker.estimator.Estimator(container,\n", + " role,\n", + " input_mode='File',\n", + " instance_count=1, \n", + " instance_type='ml.m4.xlarge',\n", + " output_path=output_location,\n", + " sagemaker_session=sess\n", + " )\n", + "\n", + "linear.set_hyperparameters(feature_dim=8,\n", + " epochs=16,\n", + " wd=0.01,\n", + "
loss='absolute_loss',\n", + " predictor_type='regressor',\n", + " normalize_data=True,\n", + " optimizer='adam',\n", + " mini_batch_size=100,\n", + " lr_scheduler_step=100,\n", + " lr_scheduler_factor=0.99,\n", + " lr_scheduler_minimum_lr=0.0001,\n", + " learning_rate=0.1\n", + " )\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Delete the Endpoint\n", - "Having an endpoint running will incur some costs. Therefore as a clean-up job, we should delete the endpoint." + "---\n", + "After configuring the Estimator object and setting its hyperparameters, the only remaining step is to train the algorithm, which the following cell does. Training involves a few steps. First, the instances requested when creating the Estimator are provisioned and set up with the appropriate libraries. Then, the data from our channels is downloaded onto the instances. Once this is done, the training job begins. Provisioning and data download take time, depending on the size of the data, so it might be a few minutes before the training job starts emitting logs. For this regression model, the logs report loss metrics on the validation data (such as the absolute loss configured above) for every epoch, that is, every full pass over the dataset. These metrics are a proxy for the quality of the model.\n", + "\n", + "Once the job has finished, a \"Job complete\" message will be printed. The trained model can be found in the S3 bucket that was set up as `output_path` in the estimator.
For this example, training takes between 4 and 6 minutes.\n" ] }, { @@ -357,50 +373,42 @@ "metadata": {}, "outputs": [], "source": [ - "sagemaker.Session().delete_endpoint(linear_predictor.endpoint_name)\n", - "print(f\"deleted {linear_predictor.endpoint_name} successfully!\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Appendix" + "%%time\n", + "linear.fit(inputs={'train': train_data, \n", + " 'validation': validation_data},job_name=job_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Downloading the dataset\n", + "## Set up hosting for the model\n", "\n", - "We are downloading the dataset in the original libvsm format, more info about this format can be found [here](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html)." + "Once the training is done, we can deploy the trained model as an Amazon SageMaker real-time hosted endpoint. This allows us to make predictions (inference) from the model. Note that we don't have to host the model on the same instance (or type of instance) that we used for training. Training is typically a longer, compute-heavy job whose compute and memory requirements differ from those of hosting. We can choose any type of instance to host the model. In our case, we trained on an ml.m4.xlarge instance, but we host the model on a compute-optimized ml.c4.xlarge instance. 
The endpoint deployment can be accomplished as follows:\n" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "scrolled": true + }, "outputs": [], "source": [ "%%time\n", - "import urllib.request\n", - "\n", - "# Load the dataset\n", - "SOURCE_DATA = \"abalone\"\n", - "urllib.request.urlretrieve(\n", - " \"https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/abalone\", SOURCE_DATA\n", - ")" + "#creating the endpoint out of the trained model\n", + "linear_predictor = linear.deploy(initial_instance_count=1,\n", + " instance_type='ml.c4.xlarge')\n", + "print(\"\\ncreated endpoint: \" + linear_predictor.endpoint_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## libvsm to csv convertion\n", - "Then we convert this dataset into csv format which is one of the accepted formats by the Linear Learner Algorithm, more information [here](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html#ll-input_output).\n", + "## Inference\n", "\n", - "The value of the age field is parsed as integer and the value for the features is extracted from the format of \"feature_number\":\"feature_value\" to return only the value of the corresponding feature then the final frame is written to the output file." + "Now that the trained model is deployed at an endpoint that is up-and-running, we can use this endpoint for inference. 
To do this, we configure the [predictor object](https://sagemaker.readthedocs.io/en/v1.2.4/predictors.html) to serialize requests as text/csv and to deserialize the reply received from the endpoint as JSON.\n" ] }, { @@ -409,42 +417,19 @@ "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "\n", - "# import numpy and pandas libraries for working with data\n", - "import numpy as np\n", - "import pandas as pd # Read in csv and store in a pandas dataframe\n", - "\n", - "df = pd.read_csv(SOURCE_DATA, sep=\" \", encoding=\"latin1\", names=[\"age\",\"sex\",\"Length\",\"Diameter\",\"Height\",\"Whole.weight\",\"Shucked.weight\",\"Viscera.weight\",\"Shell.weight\"])\n", - "\n", - "# converting the age to int value\n", - "df[\"age\"] = df[\"age\"].astype(int)\n", - "\n", - "# drop any null values\n", - "df.dropna(inplace=True)\n", - "\n", - "# Extracting the features values from the libvsm format\n", - "features=[\"sex\",\"Length\",\"Diameter\",\"Height\",\"Whole.weight\",\"Shucked.weight\",\"Viscera.weight\",\"Shell.weight\"]\n", - "for feature in features:\n", - " if feature == \"sex\":\n", - " df[feature] = (df[feature].str.split(\":\", n=1, expand=True)[1]).astype(int)\n", - " else:\n", - " df[feature] = (df[feature].str.split(\":\", n=1, expand=True)[1]).astype(float)\n", - "\n", + "#configure the predictor to serialize the input as CSV and parse the response as JSON\n", + "from sagemaker.predictor import csv_serializer, json_deserializer\n", "\n", - "# #writing the final data in the correct format\n", - "df.to_csv(\"new_data_set_float32.csv\", sep=\",\", index=False, header=None)\n", - "\n", - "print(df.head(1))" + "linear_predictor.serializer = csv_serializer\n", + "linear_predictor.deserializer = json_deserializer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Dividing the data\n", - "\n", - "Following methods split the data into train/test/validation datasets and upload files to S3.\n" + "---\n", + "We then use the 
test file containing the records we held out to test the model's predictions. Each time you run the cell below, it selects a random sample from the test set and performs inference on it." ] }, { @@ -453,70 +438,37 @@ "metadata": {}, "outputs": [], "source": [ - "import io\n", - "import boto3\n", - "import random\n", - "\n", - "\n", - "def data_split(\n", - " FILE_DATA, FILE_TRAIN, FILE_VALIDATION, FILE_TEST, PERCENT_TRAIN, PERCENT_VALIDATION, PERCENT_TEST\n", - "):\n", - " data = [l for l in open(FILE_DATA, \"r\")]\n", - " train_file = open(FILE_TRAIN, \"w\")\n", - " valid_file = open(FILE_VALIDATION, \"w\")\n", - " tests_file = open(FILE_TEST, \"w\")\n", - "\n", - " num_of_data = len(data)\n", - " num_train = int((PERCENT_TRAIN / 100.0) * num_of_data)\n", - " num_valid = int((PERCENT_VALIDATION / 100.0) * num_of_data)\n", - " num_tests = int((PERCENT_TEST / 100.0) * num_of_data)\n", - "\n", - " data_fractions = [num_train, num_valid, num_tests]\n", - " split_data = [[], [], []]\n", - "\n", - " rand_data_ind = 0\n", - "\n", - " for split_ind, fraction in enumerate(data_fractions):\n", - " for i in range(fraction):\n", - " rand_data_ind = random.randint(0, len(data) - 1)\n", - " split_data[split_ind].append(data[rand_data_ind])\n", - " data.pop(rand_data_ind)\n", - "\n", - " for l in split_data[0]:\n", - " train_file.write(l)\n", - "\n", - " for l in split_data[1]:\n", - " valid_file.write(l)\n", + "%%time\n", + "import json\n", + "from itertools import islice\n", + "import math\n", + "import struct\n", "\n", - " for l in split_data[2]:\n", - " tests_file.write(l)\n", + "#getting testing sample from our test file\n", + "test_data = [l for l in open(FILE_TEST, 'r')]\n", + "sample=random.choice(test_data).split(',')\n", + "actual_age=sample[0] \n", + "payload=sample[1:] #removing actual age from the sample\n", + "payload=','.join(map(str, payload))\n", "\n", - " train_file.close()\n", - " valid_file.close()\n", - " tests_file.close()\n", "\n",
+ "#Invoke the predictor and analyze the result\n", + "result = linear_predictor.predict(payload)\n", "\n", - "def write_to_s3(fobj, bucket, key):\n", - " return (\n", - " boto3.Session(region_name=region).resource(\"s3\").Bucket(bucket).Object(key).upload_fileobj(fobj)\n", - " )\n", + "#extracting the prediction value\n", + "result = round(float(result['predictions'][0]['score']),2)\n", "\n", "\n", - "def upload_to_s3(bucket, prefix, channel, filename):\n", - " fobj = open(filename, \"rb\")\n", - " key = f\"{prefix}/{channel}/{filename}\"\n", - " url = f\"s3://{bucket}/{key}\"\n", - " print(f\"Writing to {url}\")\n", - " write_to_s3(fobj, bucket, key)" + "accuracy=str(round(100-((abs(float(result)-float(actual_age))/float(actual_age))*100),2))\n", + "print ('Actual age: ',actual_age,'\\nPrediction: ', result, '\\nAccuracy: ', accuracy)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Data ingestion\n", - "\n", - "Next, we read the dataset from the existing repository into memory, for preprocessing prior to training. This processing could be done *in situ* by Glue, Amazon Athena, Apache Spark in Amazon EMR, Amazon Redshift, etc., assuming the dataset is present in the appropriate location. Then, the next step would be to transfer the data to S3 for use in training. For small datasets, such as this one, reading into memory isn't onerous, though it would be for larger datasets." + "## Delete the Endpoint\n", + "Having an endpoint running will incur costs. Therefore, as a clean-up step, we should delete the endpoint."
] }, { @@ -525,39 +477,16 @@ "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "# Load the dataset\n", - "FILE_DATA = \"new_data_set_float32.csv\"\n", - "\n", - "# split the downloaded data into train/test/validation files\n", - "FILE_TRAIN = \"abalone_dataset1_train.csv\"\n", - "FILE_VALIDATION = \"abalone_dataset1_validation.csv\"\n", - "FILE_TEST = \"abalone_dataset1_test.csv\"\n", - "PERCENT_TRAIN = 70\n", - "PERCENT_VALIDATION = 15\n", - "PERCENT_TEST = 15\n", - "data_split(\n", - " FILE_DATA, FILE_TRAIN, FILE_VALIDATION, FILE_TEST, PERCENT_TRAIN, PERCENT_VALIDATION, PERCENT_TEST\n", - ")\n", - "\n", - "# S3 bucket to store training data.\n", - "# Feel free to specify a different bucket and prefix.\n", - "bucket = sagemaker.Session().default_bucket()\n", - "prefix = \"sagemaker/DEMO-linear-learner-abalone-regression\"\n", - "\n", - "# upload the files to the S3 bucket\n", - "upload_to_s3(bucket, prefix, \"train\", FILE_TRAIN)\n", - "upload_to_s3(bucket, prefix, \"validation\", FILE_VALIDATION)\n", - "upload_to_s3(bucket, prefix, \"test\", FILE_TEST)" + "sagemaker.Session().delete_endpoint(linear_predictor.endpoint_name)\n", + "print(f\"deleted {linear_predictor.endpoint_name} successfully!\")" ] } ], "metadata": { - "instance_type": "ml.t3.medium", "kernelspec": { - "display_name": "Python 3 (Data Science)", + "display_name": "conda_python3", "language": "python", - "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/datascience-1.0" + "name": "conda_python3" }, "language_info": { "codemirror_mode": { @@ -569,7 +498,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.6" + "version": "3.6.10" } }, "nbformat": 4, diff --git a/introduction_to_amazon_algorithms/pca_mnist/pca_mnist.ipynb b/introduction_to_amazon_algorithms/pca_mnist/pca_mnist.ipynb index bdd347fadd..328dc3e8ab 100644 --- a/introduction_to_amazon_algorithms/pca_mnist/pca_mnist.ipynb +++ 
b/introduction_to_amazon_algorithms/pca_mnist/pca_mnist.ipynb @@ -320,7 +320,6 @@ "source": [ "from sagemaker.predictor import csv_serializer, json_deserializer\n", "\n", - "pca_predictor.ContentType = \"text/csv\"\n", "pca_predictor.serializer = csv_serializer\n", "pca_predictor.deserializer = json_deserializer" ] @@ -428,5 +427,5 @@ "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, "nbformat": 4, - "nbformat_minor": 4 -} + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/prep_data/text_data/04_preprocessing_text_data_v3.ipynb b/prep_data/text_data/04_preprocessing_text_data_v3.ipynb index 102f193b6f..ded13ed86d 100644 --- a/prep_data/text_data/04_preprocessing_text_data_v3.ipynb +++ b/prep_data/text_data/04_preprocessing_text_data_v3.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Preprocessing Text Data\n", + "# Pre-processing Text Data\n", "\n", - "The purpose of this notebook is to demonstrate how to preprocessing text data for next-step feature engineering and training a machine learning model via Amazon SageMaker. In this notebook we will focus on preprocessing our text data, and we will use the text data we ingested in a [sequel notebook](https://sagemaker-examples.readthedocs.io/en/latest/data_ingestion/012_Ingest_text_data_v2.html) to showcase text data preprocessing methodologies. 
We are going to discuss many possible methods to clean and enrich your text, but you do not need to run through every single step below. Usually, a rule of thumb is: if you are dealing with very noisy text, like social media text data, or nurse notes, then medium to heavy preprocessing effort might be needed, and if it's domain-specific corpus, text enrichment is helpful as well; if you are dealing with long and well-written documents such as news articles and papers, very light preprocessing is needed; you can add some enrichment to the data to better capture the sentence to sentence relationship and overall meaning. \n" + "The purpose of this notebook is to demonstrate how to pre-process text data for next-step feature engineering and for training a machine learning model with Amazon SageMaker. In this notebook we focus on pre-processing, using the text data we ingested in a [previous notebook](https://github.com/aws/amazon-sagemaker-examples/blob/master/data_ingestion/012_Ingest_text_data_v2.ipynb) to showcase text data pre-processing methods. We discuss many possible methods to clean and enrich your text, but you do not need to run through every single step below. As a rule of thumb: if you are dealing with very noisy text, such as social media text data or nurse notes, medium to heavy pre-processing might be needed, and if it's a domain-specific corpus, text enrichment is helpful as well; if you are dealing with long, well-written documents such as news articles and papers, very light pre-processing is needed, and you can add some enrichment to the data to better capture sentence-to-sentence relationships and overall meaning. \n" ] }, { @@ -20,8 +20,8 @@ "### Use Cases\n", "Text data contains rich information and it's everywhere. Applicable use cases include Voice of Customer (VOC), fraud detection, warranty analysis, chatbot and customer service routing, audience analysis, and much more.
\n", "\n", - "### What's the difference between preprocessing and feature engineering for text data?\n", - "In the preprocessing stage, you want to clean and transfer the text data from human language to standard, machine-analyzable format for further processing. For feature engineering, you extract predictive factors (features) from the text. For example, for a matching equivalent question pairs task, the features you can extract include words overlap, cosine similarity, inter-word relationships, parse tree structure similarity, TF-IDF (frequency-inverse document frequency) scores, etc.; for some language model like topic modeling, words embeddings themselves can also be features.\n", + "### What's the difference between pre-processing and feature engineering for text data?\n", + "In the pre-processing stage, you want to clean the text data and transform it from human language into a standard, machine-analyzable format for further processing. For feature engineering, you extract predictive factors (features) from the text. For example, for a task of matching equivalent question pairs, the features you can extract include word overlap, cosine similarity, inter-word relationships, parse tree structure similarity, TF-IDF (term frequency-inverse document frequency) scores, etc.; for some language models, such as topic models, word embeddings themselves can also be features.\n", "\n", "### When is my text data ready for feature engineering?\n", "When the data is ready to be vectorized and fit your specific use case." @@ -34,7 +34,7 @@ "## Set Up Notebook\n", "There are several python packages designed specifically for natural language processing (NLP) tasks.
In this notebook, you will use the following packages:\n", "\n", - "* [`nltk`(natrual language toolkit)](https://www.nltk.org/), a leading platform includes multiple text processing libraries, which covers almost all aspects of preprocessing we will discuss in this section: tokenization, stemming, lemmatization, parsing, chunking, POS tagging, stop words, etc.\n", + "* [`nltk` (natural language toolkit)](https://www.nltk.org/), a leading platform that includes multiple text processing libraries, which cover almost all aspects of pre-processing we will discuss in this section: tokenization, stemming, lemmatization, parsing, chunking, POS tagging, stop words, etc.\n", "\n", "* [`SpaCy`](https://spacy.io/), offers most functionality provided by `nltk`, and provides pre-trained word vectors and models. It is scalable and designed for production usage.\n", "\n", @@ -229,7 +229,7 @@ "metadata": {}, "source": [ "## Examine Your Text Data \n", - "Here you will explore common methods and steps for text preprocessing. Text preprocessing is highly specific to each individual corpus and different tasks, so it is important to examine your text data first and decide what steps are necessary. \n", + "Here you will explore common methods and steps for text pre-processing. Text pre-processing is highly specific to each corpus and task, so it is important to examine your text data first and decide what steps are necessary. \n", "\n", "First, look at your text data. Seems like there are whitespaces to trim, URLs, smiley faces, numbers, abbreviations, spelling, names, etc. Tweets are less than 140 characters so there is less need for document segmentation and sentence dependencies."
] @@ -264,7 +264,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Preprocessing\n", + "## Pre-Processing\n", "\n", "### Step 1: Noise Removal\n", "\n", @@ -276,7 +276,7 @@ "* Remove URLs -- reviews, web content, emails\n", "* Convert accented characters to ASCII characters -- e.g. tweets, contents that may contain foreign language\n", "\n", - "Note that preprocessing is an iterative process, so it is common to revisit any of these steps after you have cleaned and normalized your data.\n", + "Note that pre-processing is an iterative process, so it is common to revisit any of these steps after you have cleaned and normalized your data.\n", "\n", "Here you will look at tweets and decide how you are going to process URL, emojis and emoticons.\n", "\n", @@ -495,10 +495,15 @@ "\n", "**Note:** some normalization processes are better to perform at sentence and document level, and some processes are word-level and should happen after tokenization and segmentation, which we will cover right after normalization.\n", "\n", - "Here you will convert the text to lower case, remove punctuation, remove numbers, remove white spaces, and complete other word-level processing steps after tokenizing the sentences.\n", - "\n", + "Here you will convert the text to lower case, remove punctuation, remove numbers, remove white spaces, and complete other word-level processing steps after tokenizing the sentences." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ "#### Normalization - Convert all text to lower case\n", - "Usually, this is a must for all language preprocessing. Since \"Word\" and \"word\" will essentially be considered two different elements in word representation, and we want words that have the same meaning to be represented the same in numbers (vectors), we want to convert all text into the same case. " + "Usually, this is a must for all language pre-processing. 
Since \"Word\" and \"word\" will essentially be considered two different elements in word representation, and we want words that have the same meaning to be represented the same in numbers (vectors), we want to convert all text into the same case. " ] }, { @@ -1949,25 +1954,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Conclusion\n", + "# Conclusion\n", "\n", "Congratulations! You cleaned and prepared your text data and it is now ready to be vectorized or used for feature engineering. \n", "Now that your data is ready to be converted into machine-readable format (numbers), we will cover extracting features and word embeddings in the next section **text data feature engineering**." ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "conda_python3", "language": "python", - "name": "python3" + "name": "conda_python3" }, "language_info": { "codemirror_mode": { @@ -1979,7 +1977,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.6.10" } }, "nbformat": 4, diff --git a/reinforcement_learning/rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb b/reinforcement_learning/rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb index d4735123ff..b9fb1fda6c 100644 --- a/reinforcement_learning/rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb +++ b/reinforcement_learning/rl_cartpole_coach/rl_cartpole_coach_gymEnv.ipynb @@ -112,6 +112,7 @@ "cell_type": "code", "execution_count": null, "metadata": { + "collapsed": true, "tags": [ "parameters" ] @@ -564,7 +565,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.5" + "version": "3.7.9" }, "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). 
You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." }, diff --git a/reinforcement_learning/rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb b/reinforcement_learning/rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb index 3a6fcb210f..cd71ce5757 100644 --- a/reinforcement_learning/rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb +++ b/reinforcement_learning/rl_cartpole_ray/rl_cartpole_ray_gymEnv.ipynb @@ -275,8 +275,8 @@ " image_uri=custom_image_name,\n", " role=role,\n", " debugger_hook_config=False,\n", - " instance_type=instance_type,\n", - " instance_count=train_instance_count,\n", + " train_instance_type=instance_type,\n", + " train_instance_count=train_instance_count,\n", " output_path=s3_output_path,\n", " base_job_name=job_name_prefix,\n", " metric_definitions=metric_definitions,\n", @@ -573,10 +573,10 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.10" + "version": "3.7.9" }, "notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." 
}, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/template.ipynb b/template.ipynb new file mode 100644 index 0000000000..8f42b0a07c --- /dev/null +++ b/template.ipynb @@ -0,0 +1,333 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Title\n", + "The title should be similar to the filename, but the filename should be very concise and compact, so people can read what it is when displayed in a list view in JupyterLab.\n", + "\n", + "Example title - **Amazon SageMaker Processing: pre-processing images with PyTorch using a GPU instance type**\n", + "\n", + "* Bad example filename: *amazon_sagemaker-processing-images_with_pytorch_on_GPU.ipynb* (too long & mixes case, dashes, and underscores)\n", + "* Good example filename: *processing_images_pytorch_gpu.ipynb* (succinct, all lowercase, all underscores)\n", + "\n", + "**IMPORTANT:** Use only one main heading with `#`, so your next subheading is `##` or `###` and so on.\n", + "\n", + "## Overview\n", + "1. What does this notebook do?\n", + " - What will the user learn how to do?\n", + "1. Is this an end-to-end tutorial or is it a how-to (procedural) example?\n", + " - Tutorial: add conceptual information, flowcharts, images\n", + " - How to: notebook should be lean. More of a list of steps. No conceptual info, but links to resources for more info.\n", + "1. Who is the audience? \n", + " - What should the user be familiar with before running this? \n", + " - Link to other examples they should have run first.\n", + "1. How much will this cost?\n", + " - Some estimate of both time and money is recommended.\n", + " - List the instance types and other resources that are created.\n", + "\n", + "\n", + "## Prerequisites\n", + "1. Which environments does this notebook work in? Select all that apply.\n", + " - Notebook Instances: Jupyter?\n", + " - Notebook Instances: JupyterLab?\n", + " - Studio?\n", + "1. Which conda kernel is required?\n", + "1.
Is there a previous notebook that is required?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup \n", + "\n", + "### Setup Dependencies\n", + "\n", + "1. Describe any pip or conda or apt installs or setup scripts that are needed.\n", + "1. Pin sagemaker if version <2 is required.\n", + "\n", + " `%pip install \"sagemaker>=1.14.2,<2\"`\n", + " \n", + " \n", + "1. Upgrade sagemaker if version 2 is required, but roll back upgrades to packages that might taint the user's kernel and make other notebooks break. Do this at the end of the notebook in the cleanup cell.\n", + "\n", + " ```python\n", + " # setup\n", + " import sagemaker\n", + " version = sagemaker.__version__\n", + " %pip install 'sagemaker>=2.0.0'\n", + " ...\n", + " # cleanup\n", + " %pip install 'sagemaker=={version}'\n", + " ```\n", + " \n", + "\n", + "1. Use flags that facilitate automatic, end-to-end running without a user prompt, so that the notebook can run in CI without any updates or special configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SageMaker Python SDK version 1.x is required\n", + "import sys\n", + "%pip install \"sagemaker>=1.14.2,<2\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SageMaker Python SDK version 2.x is required\n", + "import sagemaker\n", + "import sys\n", + "original_version = sagemaker.__version__\n", + "%pip install 'sagemaker>=2.0.0'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup Python Modules\n", + "1. Import modules, set options, and activate extensions."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "ExecuteTime": { + "end_time": "2019-06-16T14:44:50.874881Z", + "start_time": "2019-06-16T14:44:38.616867Z" + } + }, + "outputs": [], + "source": [ + "# imports\n", + "import sagemaker\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "# options\n", + "pd.options.display.max_columns = 50\n", + "pd.options.display.max_rows = 30\n", + "\n", + "# visualizations\n", + "import plotly\n", + "import plotly.graph_objs as go\n", + "import plotly.offline as ply\n", + "plotly.offline.init_notebook_mode(connected=True)\n", + "\n", + "# extensions\n", + "if 'autoreload' not in get_ipython().extension_manager.loaded:\n", + " %load_ext autoreload\n", + " \n", + "%autoreload 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Parameters\n", + "1. Set up user-supplied parameters like custom bucket names and roles in a separate cell and call out what their options are.\n", + "1. Use defaults, so the notebook will still run end-to-end without any user modification.\n", + "\n", + "For example, the following description & code block prompts the user to select the preferred dataset.\n", + "\n", + "~~~\n", + "\n", + "To select a particular dataset, assign choosen_data_set below to be one of 'diabetes', 'california', or 'boston', where each name corresponds to its respective dataset.\n", + "\n", + "'boston' : boston house data\n", + "'california' : california house data\n", + "'diabetes' : diabetes data\n", + "\n", + "~~~\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_sets = {'diabetes': 'load_diabetes()', 'california': 'fetch_california_housing()', 'boston' : 'load_boston()'}\n", + "\n", + "# Change choosen_data_set variable to one of the data sets above.
\n", + "choosen_data_set = 'california'\n", + "assert choosen_data_set in data_sets.keys()\n", + "print(\"I selected the '{}' dataset!\".format(choosen_data_set))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Data import\n", + "1. Look for the data that was stored by a previous notebook run `%store -r variableName`\n", + "1. If that doesn't exist, look in the user's default S3 bucket\n", + "1. If that doesn't exist, download it from the [SageMaker dataset bucket](https://sagemaker-sample-files.s3.amazonaws.com/) \n", + "1. If that doesn't exist, download it from origin\n", + "\n", + "For example, the following code block will pull the training, validation, and test data that was created in a previous notebook. This allows the customer to experiment with features, re-run the notebook, and not have it pull the dataset over and over." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load relevant dataframes and variables from preprocessing_tabular_data.ipynb required for this notebook\n", + "%store -r X_train\n", + "%store -r X_test\n", + "%store -r X_val\n", + "%store -r Y_train\n", + "%store -r Y_test\n", + "%store -r Y_val\n", + "%store -r choosen_data_set" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Procedure or tutorial\n", + "1. Break up processes with Markdown blocks to explain what's going on.\n", + "1. Make use of visualizations to better demonstrate each step." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Cleanup\n", + "1. If you upgraded the user's `sagemaker` SDK, roll it back.\n", + "1.
Delete any endpoints or other resources that linger and might cost the user money.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# roll back the SageMaker Python SDK to the kernel's original version\n", + "print(\"Original version: {}\".format(original_version))\n", + "print(\"Current version: {}\".format(sagemaker.__version__))\n", + "s = 'sagemaker=={}'.format(original_version)\n", + "print(\"Rolling back to... {}\".format(s))\n", + "%pip install {s}\n", + "import sagemaker\n", + "print(\"{} installed!\".format(sagemaker.__version__))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "1. Wrap up with some conclusion or overview of what was accomplished.\n", + "1. Offer another notebook or more resources or some other call to action." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## References\n", + "1. author1, article1, journal1, year1, url1\n", + "2.
author2, article2, journal2, year2, url2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "conda_python3", + "language": "python", + "name": "conda_python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": false, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": false + }, + "varInspector": { + "cols": { + "lenName": 16, + "lenType": 16, + "lenVar": 40 + }, + "kernels_config": { + "python": { + "delete_cmd_postfix": "", + "delete_cmd_prefix": "del ", + "library": "var_list.py", + "varRefreshCmd": "print(var_dic_list())" + }, + "r": { + "delete_cmd_postfix": ") ", + "delete_cmd_prefix": "rm(", + "library": "var_list.r", + "varRefreshCmd": "cat(var_dic_list()) " + } + }, + "types_to_exclude": [ + "module", + "function", + "builtin_function_or_method", + "instance", + "_Feature" + ], + "window_display": false + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}