Sagemaker DataWrangler Samples addition (#3510)

* Create readme.md * Add files via upload Joined flow added * Add files via upload * Add files via upload * Add files via upload * Delete TS-Workshop-Advanced.ipynb * Delete TS-Workshop-Cleanup.ipynb * Delete TS-Workshop.ipynb * Add files via upload Updated after the CI errors * Create test.txt * Add files via upload * Delete sagemaker-datawrangler/timeseries-dataflow/pictures directory * Delete timeseries.flow * Add files via upload * Add files via upload * Add files via upload * Update index.rst * Add files via upload Added rst file for joined * Add files via upload added tabular index.rst file * Add files via upload Uploaded index.rst for time series data * Delete sagemaker-datawrangler/tabular-dataflow/img directory Images are now in S3 bucket so deleting this * Update README.md updating image links with s3 links * Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Exploration.md to sagemaker-datawrangler/tabular-dataflow/data-exploration/Data-Exploration.md updating image link and folder * Add files via upload uploading index.rst * Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Import.md to sagemaker-datawrangler/tabular-dataflow/data-import/Data-Import.md updated image links * Add files via upload index.rst for data import * Update Data-Transformations.md * Rename sagemaker-datawrangler/tabular-dataflow/Data-Transformations.md to sagemaker-datawrangler/tabular-dataflow/data-transformations/Data-Transformations.md * Add files via upload * Update readme.md * Delete sagemaker-datawrangler/joined-dataflow/img directory * Update readme.md * Delete sagemaker-datawrangler/timeseries-dataflow/img directory * Update index.rst * Update index.rst Updated index.rst to link to other files * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update README.md referring to /readme.md * Update README.md * Update README.md * Update 
README.md * Update README.md * Update README.md * Update README.md * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Add files via upload * Add files via upload * Update index.rst * Create index.rst * Update index.rst * Update index.rst * Add files via upload * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Delete sagemaker-datawrangler/import-flow directory * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst * Update index.rst added data wrangler to the prep section * Update index.rst * Update index.rst * Add files via upload Updated per comments from aqyt * Update explore_data.ipynb Updated per Amelia comment - present tense * Update index.rst Grammer * Update index.rst Grammer * Update index.rst * Update import-flow.md Co-authored-by: atqy <[email protected]> Co-authored-by: Aaron Markham <[email protected]>
aws · Sep 21, 2022 · ffef369
1 parent c11efc9
commit ffef369
Showing 22 changed files with 248,274 additions and 1 deletion.
diff --git a/index.rst b/index.rst
@@ -39,6 +39,7 @@ We recommend the following notebooks as a broad introduction to the capabilities
    :maxdepth: 1
    :caption: Prepare data
 
+   sagemaker-datawrangler/index
    sagemaker_processing/spark_distributed_data_processing/sagemaker-spark-processing_outputs
    sagemaker_processing/basic_sagemaker_data_processing/basic_sagemaker_processing_outputs
 
@@ -210,10 +211,16 @@ More examples
    sagemaker-clarify/index
    scientific_details_of_algorithms/index
    aws_marketplace/index
+
 
 
 .. toctree::
    :maxdepth: 1
    :caption: Community examples
 
-   contrib/index
+   contrib/index
+
+
+
+
+
diff --git a/sagemaker-datawrangler/README.md b/sagemaker-datawrangler/README.md
@@ -0,0 +1,41 @@
+![Amazon SageMaker Data Wrangler](https://github.com/aws/amazon-sagemaker-examples/raw/main/_static/sagemaker-banner.png)
+
+# Amazon SageMaker Data Wrangler Examples
+
+Example flows that demonstrate how to aggregate and prepare data for Machine Learning using Amazon SageMaker Data Wrangler.
+
+## :books: Background
+
+[Amazon SageMaker Data Wrangler](https://aws.amazon.com/sagemaker/data-wrangler/) reduces the time it takes to aggregate and prepare data for ML. From a single interface in SageMaker Studio, you can import data from Amazon S3, Amazon Athena, Amazon Redshift, AWS Lake Formation, and Amazon SageMaker Feature Store, and in just a few clicks SageMaker Data Wrangler will automatically load, aggregate, and display the raw data. It will then make conversion recommendations based on the source data, transform the data into new features, validate the features, and provide visualizations with recommendations on how to remove common sources of error such as incorrect labels. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines or import that data into Amazon SageMaker Feature Store.
+
+
+
+The [SageMaker example notebooks](https://sagemaker-examples.readthedocs.io/en/latest/) are Jupyter notebooks that demonstrate the usage of Amazon SageMaker.
+
+## :hammer_and_wrench: Setup
+
+Amazon SageMaker Data Wrangler is a feature in Amazon SageMaker Studio. Use this section to learn how to access and get started using Data Wrangler. Do the following:
+
+* Complete each step in [Prerequisites](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-prerequisite).
+
+* Follow the procedure in [Access Data Wrangler](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-access) to start using Data Wrangler.
+
+
+
+
+## :notebook: Examples
+
+### **[Tabular DataFlow](tabular-dataflow/README.md)**
+
+This example provides a quick walkthrough of how to aggregate and prepare a tabular dataset for machine learning using Amazon SageMaker Data Wrangler.
+
+### **[Timeseries DataFlow](timeseries-dataflow/readme.md)**
+
+This example provides a quick walkthrough of how to aggregate and prepare a time series dataset for machine learning using Amazon SageMaker Data Wrangler.
+
+### **[Joined DataFlow](joined-dataflow/readme.md)**
+
+This example provides a quick walkthrough of how to aggregate and prepare a joined dataset for machine learning using Amazon SageMaker Data Wrangler.
+
+
+
diff --git a/sagemaker-datawrangler/import-flow.md b/sagemaker-datawrangler/import-flow.md
@@ -0,0 +1,11 @@
+## Import Flow
+
+Each example has a flow file that you can import directly to speed up the process or to validate your own flow.
+
+To import the flow:
+
+* Download the flow file.
+
+* In SageMaker Studio, drag and drop the flow file, or use the upload button to browse to the flow file and upload it.
+
+![uploadflow](/uploadflow.png)
diff --git a/sagemaker-datawrangler/index.rst b/sagemaker-datawrangler/index.rst
@@ -0,0 +1,69 @@
+
+
+Amazon SageMaker Data Wrangler
+=======================================
+
+These example flows demonstrate how to aggregate and prepare data for
+machine learning using Amazon SageMaker Data Wrangler.
+
+Background
+------------------
+
+`Amazon SageMaker Data
+Wrangler <https://aws.amazon.com/sagemaker/data-wrangler/>`__ reduces
+the time it takes to aggregate and prepare data for ML. From a single
+interface in SageMaker Studio, you can import data from Amazon S3,
+Amazon Athena, Amazon Redshift, AWS Lake Formation, and Amazon SageMaker
+Feature Store, and in just a few clicks SageMaker Data Wrangler will
+automatically load, aggregate, and display the raw data. It will then
+make conversion recommendations based on the source data, transform the
+data into new features, validate the features, and provide
+visualizations with recommendations on how to remove common sources of
+error such as incorrect labels. Once your data is prepared, you can
+build fully automated ML workflows with Amazon SageMaker Pipelines or
+import that data into Amazon SageMaker Feature Store.
+
+The `SageMaker example
+notebooks <https://sagemaker-examples.readthedocs.io/en/latest/>`__ are
+Jupyter notebooks that demonstrate the usage of Amazon SageMaker.
+
+Setup
+-------------------------
+
+Amazon SageMaker Data Wrangler is a feature in Amazon SageMaker Studio.
+Use this section to learn how to access and get started using Data
+Wrangler. Do the following:
+
+-  Complete each step in
+   `Prerequisites <https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-prerequisite>`__.
+
+-  Follow the procedure in `Access Data
+   Wrangler <https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-getting-started.html#data-wrangler-getting-started-access>`__
+   to start using Data Wrangler.
+
+Examples
+-------------------
+
+Tabular Dataflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   tabular-dataflow/index
+
+Timeseries Dataflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   timeseries-dataflow/index
+
+Joined Dataflow
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   joined-dataflow/index