
Sagemaker DataWrangler Samples addition #3510

Merged: 101 commits merged on Sep 21, 2022
Changes from 93 commits
Commits
101 commits
ff29ced
Create readme.md
neelamkoshiya Jul 25, 2022
440066f
Add files via upload
neelamkoshiya Jul 25, 2022
17a481f
Add files via upload
neelamkoshiya Jul 25, 2022
b4eb82b
Add files via upload
neelamkoshiya Jul 25, 2022
3b53e2d
Add files via upload
neelamkoshiya Jul 25, 2022
788fe24
Merge branch 'main' into main
atqy Jul 27, 2022
7fddbfe
Merge branch 'main' into main
neelamkoshiya Aug 9, 2022
5659774
Delete TS-Workshop-Advanced.ipynb
neelamkoshiya Aug 9, 2022
3cc1696
Delete TS-Workshop-Cleanup.ipynb
neelamkoshiya Aug 9, 2022
32bc20f
Delete TS-Workshop.ipynb
neelamkoshiya Aug 9, 2022
6b9ba89
Add files via upload
neelamkoshiya Aug 9, 2022
b4f7570
Create test.txt
neelamkoshiya Aug 9, 2022
18d9507
Add files via upload
neelamkoshiya Aug 9, 2022
2448cf0
Delete sagemaker-datawrangler/timeseries-dataflow/pictures directory
neelamkoshiya Aug 9, 2022
a9202c0
Delete timeseries.flow
neelamkoshiya Aug 9, 2022
aca9e18
Add files via upload
neelamkoshiya Aug 9, 2022
99e4c44
Add files via upload
neelamkoshiya Aug 9, 2022
c74a77a
Add files via upload
neelamkoshiya Sep 8, 2022
bf26839
Merge branch 'aws:main' into main
neelamkoshiya Sep 8, 2022
6fe67ea
Update index.rst
neelamkoshiya Sep 8, 2022
a50f3d5
Add files via upload
neelamkoshiya Sep 9, 2022
842f196
Add files via upload
neelamkoshiya Sep 9, 2022
0a92fc5
Add files via upload
neelamkoshiya Sep 9, 2022
7e60fef
Delete sagemaker-datawrangler/tabular-dataflow/img directory
neelamkoshiya Sep 9, 2022
9c93e21
Update README.md
neelamkoshiya Sep 9, 2022
4a459fb
Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Explor…
neelamkoshiya Sep 9, 2022
f802728
Add files via upload
neelamkoshiya Sep 9, 2022
0fa060c
Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Import…
neelamkoshiya Sep 9, 2022
0be7db6
Add files via upload
neelamkoshiya Sep 9, 2022
48631e8
Update Data-Transformations.md
neelamkoshiya Sep 9, 2022
fdf7cb2
Rename sagemaker-datawrangler/tabular-dataflow/Data-Transformations.m…
neelamkoshiya Sep 9, 2022
d2ddef9
Add files via upload
neelamkoshiya Sep 9, 2022
5cc3e30
Update readme.md
neelamkoshiya Sep 9, 2022
b1e5e7a
Delete sagemaker-datawrangler/joined-dataflow/img directory
neelamkoshiya Sep 9, 2022
d9ac1e4
Update readme.md
neelamkoshiya Sep 9, 2022
81a0aac
Delete sagemaker-datawrangler/timeseries-dataflow/img directory
neelamkoshiya Sep 9, 2022
380f9e7
Update index.rst
neelamkoshiya Sep 9, 2022
a22946e
Update index.rst
neelamkoshiya Sep 9, 2022
c2277e0
Update index.rst
neelamkoshiya Sep 9, 2022
2980986
Update index.rst
neelamkoshiya Sep 9, 2022
cea144a
Update index.rst
neelamkoshiya Sep 9, 2022
ca96a7e
Update index.rst
neelamkoshiya Sep 9, 2022
f5a1457
Update index.rst
neelamkoshiya Sep 9, 2022
2932e6b
Update index.rst
neelamkoshiya Sep 9, 2022
20dd30b
Update index.rst
neelamkoshiya Sep 9, 2022
f37003f
Update index.rst
neelamkoshiya Sep 9, 2022
982e0a1
Update index.rst
neelamkoshiya Sep 9, 2022
65ffbce
Update README.md
neelamkoshiya Sep 9, 2022
0a363d1
Update README.md
neelamkoshiya Sep 9, 2022
d22f2fc
Update README.md
neelamkoshiya Sep 9, 2022
5551da0
Update README.md
neelamkoshiya Sep 9, 2022
3a449dc
Update README.md
neelamkoshiya Sep 9, 2022
553d5e8
Merge branch 'aws:main' into main
neelamkoshiya Sep 9, 2022
e78ce3e
Update README.md
neelamkoshiya Sep 9, 2022
d4f1baf
Update README.md
neelamkoshiya Sep 9, 2022
ebfe9d7
Update index.rst
neelamkoshiya Sep 9, 2022
7fb2368
Update index.rst
neelamkoshiya Sep 10, 2022
0adb299
Update index.rst
neelamkoshiya Sep 10, 2022
8811d0f
Update index.rst
neelamkoshiya Sep 10, 2022
93fe3a5
Add files via upload
neelamkoshiya Sep 10, 2022
ba911da
Add files via upload
neelamkoshiya Sep 12, 2022
dbbd236
Update index.rst
neelamkoshiya Sep 12, 2022
8c5c138
Create index.rst
neelamkoshiya Sep 12, 2022
547b9f1
Update index.rst
neelamkoshiya Sep 12, 2022
a11ed9b
Update index.rst
neelamkoshiya Sep 12, 2022
3d68b8c
Add files via upload
neelamkoshiya Sep 12, 2022
33ae956
Update index.rst
neelamkoshiya Sep 12, 2022
1169b1e
Update index.rst
neelamkoshiya Sep 12, 2022
ebbeddc
Update index.rst
neelamkoshiya Sep 12, 2022
587c3c5
Update index.rst
neelamkoshiya Sep 12, 2022
2e02901
Update index.rst
neelamkoshiya Sep 12, 2022
6aebd0b
Update index.rst
neelamkoshiya Sep 15, 2022
75b6651
Update index.rst
neelamkoshiya Sep 15, 2022
f0de6ad
Update index.rst
neelamkoshiya Sep 15, 2022
d179b50
Update index.rst
neelamkoshiya Sep 15, 2022
cc831d2
Update index.rst
neelamkoshiya Sep 15, 2022
1871e6f
Update index.rst
neelamkoshiya Sep 15, 2022
6356f6e
Update index.rst
neelamkoshiya Sep 15, 2022
790d26b
Update index.rst
neelamkoshiya Sep 15, 2022
27f42e1
Update index.rst
neelamkoshiya Sep 15, 2022
24c8809
Update index.rst
neelamkoshiya Sep 15, 2022
c77479b
Update index.rst
neelamkoshiya Sep 15, 2022
c1d6add
Merge branch 'aws:main' into main
neelamkoshiya Sep 15, 2022
fa51918
Delete sagemaker-datawrangler/import-flow directory
neelamkoshiya Sep 15, 2022
7c39733
Update index.rst
neelamkoshiya Sep 15, 2022
0479e62
Update index.rst
neelamkoshiya Sep 15, 2022
cf69ae8
Update index.rst
neelamkoshiya Sep 15, 2022
26b5a01
Update index.rst
neelamkoshiya Sep 15, 2022
6a241d5
Update index.rst
neelamkoshiya Sep 15, 2022
e9f02e2
Update index.rst
neelamkoshiya Sep 15, 2022
0f90173
Update index.rst
neelamkoshiya Sep 15, 2022
2371f8c
Update index.rst
neelamkoshiya Sep 16, 2022
41d1d44
Update index.rst
neelamkoshiya Sep 16, 2022
6663ba7
Merge branch 'main' into main
neelamkoshiya Sep 18, 2022
ecf4ab5
Add files via upload
neelamkoshiya Sep 20, 2022
af07a80
Update explore_data.ipynb
neelamkoshiya Sep 20, 2022
db8d6ed
Update index.rst
neelamkoshiya Sep 20, 2022
98c913b
Update index.rst
neelamkoshiya Sep 20, 2022
51b9ebd
Update index.rst
neelamkoshiya Sep 20, 2022
029f3f3
Merge branch 'main' into main
aaronmarkham Sep 20, 2022
ae8e734
Update import-flow.md
neelamkoshiya Sep 20, 2022
9 changes: 8 additions & 1 deletion index.rst
@@ -39,6 +39,7 @@ We recommend the following notebooks as a broad introduction to the capabilities
   :maxdepth: 1
   :caption: Prepare data

   sagemaker-datawrangler/index
   sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing_outputs
   sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing_outputs

@@ -210,10 +211,16 @@ More examples
   sagemaker-clarify/index
   scientific_details_of_algorithms/index
   aws_marketplace/index



.. toctree::
   :maxdepth: 1
   :caption: Community examples

   contrib/index
   contrib/index





41 changes: 41 additions & 0 deletions sagemaker-datawrangler/README.md
@@ -0,0 +1,41 @@
![Amazon SageMaker Data Wrangler](https://github.com/aws/amazon-sagemaker-examples/raw/main/_static/sagemaker-banner.png)

# Amazon SageMaker Data Wrangler Examples

Example flows that demonstrate how to aggregate and prepare data for machine learning (ML) using Amazon SageMaker Data Wrangler.

## :books: Background

[Amazon SageMaker Data Wrangler](https://aws.amazon.com/sagemaker/data-wrangler/) reduces the time it takes to aggregate and prepare data for ML. From a single interface in SageMaker Studio, you can import data from Amazon S3, Amazon Athena, Amazon Redshift, AWS Lake Formation, and Amazon SageMaker Feature Store, and in just a few clicks SageMaker Data Wrangler will automatically load, aggregate, and display the raw data. It will then make conversion recommendations based on the source data, transform the data into new features, validate the features, and provide visualizations with recommendations on how to remove common sources of error such as incorrect labels. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines or import that data into Amazon SageMaker Feature Store.



The [SageMaker example notebooks](https://sagemaker-examples.readthedocs.io/en/latest/) are Jupyter notebooks that demonstrate the usage of Amazon SageMaker.

## :hammer_and_wrench: Setup

Amazon SageMaker Data Wrangler is a feature of Amazon SageMaker Studio. To access Data Wrangler and get started, do the following:

* Complete each step in [Prerequisites](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-prerequisite).

* Follow the procedure in [Access Data Wrangler](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-access) to start using Data Wrangler.




## :notebook: Examples

### **[Tabular DataFlow](tabular-dataflow/README.md)**

This example provides a quick walkthrough of how to aggregate and prepare data for machine learning using Amazon SageMaker Data Wrangler with a tabular dataset.

### **[Timeseries DataFlow](timeseries-dataflow/readme.md)**

This example provides a quick walkthrough of how to aggregate and prepare data for machine learning using Amazon SageMaker Data Wrangler with a time series dataset.
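
Aggregating a time series usually means bucketing timestamped records into regular intervals. The standard-library sketch below illustrates that idea by summing hypothetical readings per calendar day; `resample_daily_sum` and the sample data are invented for illustration, and this is not how Data Wrangler performs the step.

```python
from collections import defaultdict
from datetime import datetime


def resample_daily_sum(records):
    """Sum (timestamp, value) pairs into one bucket per calendar day."""
    totals = defaultdict(float)
    for timestamp, value in records:
        # Truncating to the date assigns each reading to its daily bucket.
        totals[timestamp.date()] += value
    # Emit buckets in chronological order.
    return sorted(totals.items())


# Hypothetical readings spanning two days.
readings = [
    (datetime(2022, 9, 1, 8), 2.0),
    (datetime(2022, 9, 1, 17), 3.0),
    (datetime(2022, 9, 2, 9), 1.5),
]
daily_totals = resample_daily_sum(readings)
```

Days without readings produce no bucket here; a full resampling step would typically also emit empty intervals so that the series stays evenly spaced.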

### **[Joined DataFlow](joined-dataflow/readme.md)**

This example provides a quick walkthrough of how to aggregate and prepare data for machine learning using Amazon SageMaker Data Wrangler with a joined dataset.
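
Conceptually, a joined dataset combines rows from two sources that share a key column. The plain-Python sketch below shows an inner join only to illustrate the idea. It is not Data Wrangler's implementation, and the function name `inner_join` and the column names are hypothetical.

```python
# Hypothetical example rows; the column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Bo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 30.0},
    {"order_id": 11, "customer_id": 1, "total": 12.5},
    {"order_id": 12, "customer_id": 3, "total": 5.0},
]


def inner_join(left, right, key):
    """Return merged rows for every pair of left/right rows sharing `key`."""
    # Index the right-hand rows by key so each lookup is a dict access.
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        # Left rows with no matching key are dropped, as in an inner join.
        for right_row in by_key.get(left_row[key], []):
            # On overlapping column names, the right-hand value wins.
            joined.append({**left_row, **right_row})
    return joined


enriched_orders = inner_join(orders, customers, "customer_id")
# Order 12 is dropped because customer 3 does not appear in `customers`.
```

An outer join would instead keep unmatched rows and fill the missing columns with nulls.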



11 changes: 11 additions & 0 deletions sagemaker-datawrangler/import-flow.md
@@ -0,0 +1,11 @@
## Import Flow

Each example has a flow file that you can import directly to expedite the process or to validate your own flow.
> Review comment (Member): Spelling mistake: should be expedite


To import a flow file:

* Download the flow file for the example.

* In SageMaker Studio, drag and drop the flow file into the file browser, or use the upload button to browse to the flow file and upload it.

![uploadflow](/uploadflow.png)
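
Before uploading a downloaded flow file, a quick local check can catch a truncated or corrupted download. The sketch below assumes, without confirmation from these examples, that a `.flow` file is a JSON document with a top-level `nodes` list whose entries carry `node_id` and `operator` keys. `load_flow` and `summarize_flow` are hypothetical helpers, not part of any SageMaker SDK.

```python
import json
from collections import Counter


def load_flow(path):
    """Parse a .flow file; assumes (unverified) that it is a JSON document."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)  # raises json.JSONDecodeError on malformed files


def summarize_flow(flow):
    """Sanity-check a parsed flow and count its nodes per operator.

    Assumes a top-level "nodes" list whose entries carry "node_id" and
    "operator" keys; adjust the key names if your flow files differ.
    """
    nodes = flow.get("nodes")
    if not isinstance(nodes, list):
        raise ValueError("expected a top-level 'nodes' list")
    node_ids = [node.get("node_id") for node in nodes]
    if len(node_ids) != len(set(node_ids)):
        raise ValueError("duplicate node_id values")
    return Counter(node.get("operator", "<unknown>") for node in nodes)
```

For example, `summarize_flow(load_flow("example.flow"))` returns a count of nodes per operator, where `example.flow` is a placeholder file name.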
69 changes: 69 additions & 0 deletions sagemaker-datawrangler/index.rst
@@ -0,0 +1,69 @@


Amazon SageMaker Data Wrangler
=======================================

These example flows demonstrate how to aggregate and prepare data for
machine learning (ML) using Amazon SageMaker Data Wrangler.


Background
------------------

`Amazon SageMaker Data
Wrangler <https://aws.amazon.com/sagemaker/data-wrangler/>`__ reduces
the time it takes to aggregate and prepare data for ML. From a single
interface in SageMaker Studio, you can import data from Amazon S3,
Amazon Athena, Amazon Redshift, AWS Lake Formation, and Amazon SageMaker
Feature Store, and in just a few clicks SageMaker Data Wrangler will
automatically load, aggregate, and display the raw data. It will then
make conversion recommendations based on the source data, transform the
data into new features, validate the features, and provide
visualizations with recommendations on how to remove common sources of
error such as incorrect labels. Once your data is prepared, you can
build fully automated ML workflows with Amazon SageMaker Pipelines or
import that data into Amazon SageMaker Feature Store.

The `SageMaker example
notebooks <https://sagemaker-examples.readthedocs.io/en/latest/>`__ are
Jupyter notebooks that demonstrate the usage of Amazon SageMaker.

Setup
-------------------------

Amazon SageMaker Data Wrangler is a feature of Amazon SageMaker Studio.
To access Data Wrangler and get started, do the following:

- Complete each step in
`Prerequisites <https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-prerequisite>`__.

- Follow the procedure in `Access Data
Wrangler <https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-access>`__
to start using Data Wrangler.

Examples
-------------------

Tabular Dataflow
~~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 1

   tabular-dataflow/index

Timeseries Dataflow
~~~~~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 1

   timeseries-dataflow/index

Joined Dataflow
~~~~~~~~~~~~~~~

.. toctree::
   :maxdepth: 1

   joined-dataflow/index