Merge remote-tracking branch 'origin/main' into standardize-gem-name-…

…format
SimpleDataLabsInc · Nov 27, 2024 · dcff75a
2 parents d164450 + f258943
commit dcff75a
Showing 7 changed files with 131 additions and 0 deletions.
diff --git a/docs/Spark/gems/transform/bulk-column-expressions.md b/docs/Spark/gems/transform/bulk-column-expressions.md
@@ -0,0 +1,36 @@
+---
+sidebar_position: 11
+title: Bulk Column Expressions
+id: bulk-column-expressions
+description: Change the data type of multiple columns at once.
+tags:
+  - gems
+  - type
+  - columns
+---
+
+The Bulk Column Expressions Gem primarily lets you cast multiple columns to a different data type at once. It also lets you:
+
+- Add a prefix or suffix to the selected columns.
+- Apply a custom expression to the selected columns.
+
+## Parameters
+
+| Parameter                                    | Description                                                         |
+| -------------------------------------------- | ------------------------------------------------------------------- |
+| Data Type of the columns to do operations on | The data type of columns to select.                                 |
+| Selected Columns                             | The columns on which to apply transformations.                      |
+| Change output column name                    | An option to add a prefix or suffix to the selected column names.   |
+| Change output column type                    | The data type that the columns will be transformed into.            |
+| Output Expression                            | A Spark SQL expression that can be applied to the selected columns. |
+
+## Example
+
+Assume some columns in a table hold zero-based indices and are stored as the long data type. You want them to hold one-based indices and be stored as integers to reduce memory use.
+
+Using the Bulk Column Expressions Gem, you can:
+
+- Filter your columns by the long data type.
+- Select the columns you wish to transform.
+- Cast the output columns to integers.
+- Enter `column_value + 1` in the expression field to shift the indices by one.
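The steps in this example can be sketched in plain Python to show what the transformation amounts to. This is only an illustration of the semantics, not how the Gem is implemented: the function `bulk_column_expression`, the `schema` mapping, and the sample rows are all hypothetical, and Python has a single `int` type, so the cast from long to integer is shown only as a callable.

```python
def bulk_column_expression(rows, schema, source_type, selected, expression, cast):
    """Apply one expression, then one cast, to each selected column of source_type.

    Hypothetical helper: rows are dicts, schema maps column name -> type name.
    """
    # Step 1 and 2: keep only the selected columns that match the type filter.
    targets = [col for col in selected if schema[col] == source_type]
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in targets:
            # Step 4 then step 3: evaluate the expression, then cast the result.
            new_row[col] = cast(expression(row[col]))
        result.append(new_row)
    return result


# Hypothetical sample data: two zero-based index columns stored as "long".
schema = {"row_idx": "long", "col_idx": "long", "label": "string"}
rows = [
    {"row_idx": 0, "col_idx": 4, "label": "a"},
    {"row_idx": 1, "col_idx": 9, "label": "b"},
]

shifted = bulk_column_expression(
    rows,
    schema,
    source_type="long",
    selected=["row_idx", "col_idx"],
    expression=lambda column_value: column_value + 1,
    cast=int,
)
print(shifted)
```

The same expression is applied to every selected column, which is the point of the Gem: one rule, many columns.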
diff --git a/docs/Spark/gems/transform/bulk-column-rename.md b/docs/Spark/gems/transform/bulk-column-rename.md
@@ -0,0 +1,33 @@
+---
+sidebar_position: 10
+title: Bulk Column Rename
+id: bulk-column-rename
+description: Rename multiple columns in your Dataset in a systematic way.
+tags:
+  - gems
+  - rename
+  - columns
+---
+
+Use the Bulk Column Rename Gem to rename multiple columns in your Dataset in a systematic way.
+
+## Parameters
+
+| Parameter         | Description                                                                              |
+| ----------------- | ---------------------------------------------------------------------------------------- |
+| Columns to rename | Select one or more columns to rename from the dropdown.                                  |
+| Method            | Choose to add a prefix, add a suffix, or use a custom expression to change column names. |
+
+Based on the method you select, you will see an option to enter the prefix, suffix, or expression of your choice.
+
+## Examples
+
+### Add a prefix
+
+For example, you can add the prefix `meta_` to mark the columns that contain metadata.
+
+![Add prefix to multiple columns](./img/bulk-add-prefix.png)
+
+### Use a custom expression
+
+You can make the same change, or more complex ones, with a custom expression such as `concat('meta_', column_name)`.
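The three rename methods can be compared with a small plain-Python sketch. The function `bulk_rename` and the column names are hypothetical and only mirror the behavior described here; the Python lambda stands in for the Spark SQL expression `concat('meta_', column_name)`.

```python
def bulk_rename(columns, to_rename, method, value):
    """Return the column list with the chosen columns renamed.

    Hypothetical helper. method is "prefix", "suffix", or "expression";
    value is a string for prefix/suffix and a callable for expression.
    """
    def new_name(column_name):
        if method == "prefix":
            return value + column_name
        if method == "suffix":
            return column_name + value
        if method == "expression":
            return value(column_name)
        raise ValueError(f"unknown method: {method}")

    # Columns that were not picked keep their original names and positions.
    return [new_name(c) if c in to_rename else c for c in columns]


columns = ["source", "ingested_at", "order_id"]  # hypothetical column names
metadata_columns = ["source", "ingested_at"]

with_prefix = bulk_rename(columns, metadata_columns, "prefix", "meta_")
with_expression = bulk_rename(
    columns, metadata_columns, "expression", lambda column_name: "meta_" + column_name
)
print(with_prefix)
print(with_expression)
```

Both calls produce the same names here; the expression method becomes useful when the new name depends on more than a fixed prefix or suffix.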
diff --git a/docs/Spark/gems/transform/data-cleansing.md b/docs/Spark/gems/transform/data-cleansing.md
@@ -0,0 +1,27 @@
+---
+sidebar_position: 12
+title: Data Cleansing
+id: data-cleansing
+description: Standardize data formats and address missing or null values in the data.
+tags:
+  - gems
+  - clean
+  - format
+---
+
+Use the Data Cleansing Gem to standardize data formats and address missing or null values in the data.
+
+## Parameters
+
+| Parameter                        | Description                                                     |
+| -------------------------------- | --------------------------------------------------------------- |
+| Select columns you want to clean | The set of columns on which to perform cleaning transformations |
+| Remove null data                 | The method used to remove null data                             |
+| Replace null values in column    | The method used to replace null values                          |
+| Clean data                       | Different ways to standardize the format of data in columns     |
+
+## Example
+
+Assume you have a table that includes customer feedback on individual orders. In this scenario, some customers may not provide feedback, resulting in null values in the data. You can use the Data Cleansing Gem to replace null values with the string `NA`.
+
+![Replace null with string](./img/replace-null-with-string.png)
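As a rough illustration of the null replacement in this example, the plain-Python sketch below fills missing feedback with `NA`. The function `replace_nulls` and the sample orders are hypothetical, and Python's `None` stands in for a null value.

```python
def replace_nulls(rows, columns, replacement):
    """Replace None with `replacement` in the chosen columns only (hypothetical helper)."""
    return [
        {
            name: (replacement if name in columns and value is None else value)
            for name, value in row.items()
        }
        for row in rows
    ]


# Hypothetical order feedback; the second customer left no feedback.
orders = [
    {"order_id": 1, "feedback": "Arrived early"},
    {"order_id": 2, "feedback": None},
]

cleaned = replace_nulls(orders, columns=["feedback"], replacement="NA")
print(cleaned)
```

Only the selected columns are touched, so a null in any other column would be left as it is.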
diff --git a/docs/Spark/gems/transform/dynamic-select.md b/docs/Spark/gems/transform/dynamic-select.md
@@ -0,0 +1,35 @@
+---
+sidebar_position: 13
+title: Dynamic Select
+id: dynamic-select
+description: Dynamically filter columns of your Dataset based on a set of conditions.
+tags:
+  - gems
+  - filter
+  - dynamic
+---
+
+Use the Dynamic Select Gem to dynamically filter columns of your Dataset based on a set of conditions.
+
+## Configuration
+
+There are two ways to configure the Dynamic Select Gem.
+
+| Configuration         | Description                                                                                   |
+| --------------------- | --------------------------------------------------------------------------------------------- |
+| Select field types    | Choose one or more types of columns to keep in the Dataset, such as string, decimal, or date. |
+| Select via expression | Create an expression that limits the type of columns to keep in the Dataset.                  |
+
+## Examples
+
+Use Dynamic Select when you want to avoid hard-coding your choice of columns. Rather than listing each column to keep in your Pipeline, you let the system choose the columns automatically based on conditions or rules.
+
+### Remove date columns using field type
+
+Assume you would like to remove irrelevant date and timestamp columns from your Dataset. You can do so with the **Select field types** method by selecting every field type you want to keep and leaving out date and timestamp.
+
+![Keep all columns except Date and Timestamp columns using the visual interface](./img/remove-date-timestamp.png)
+
+### Remove date columns with an expression
+
+Using the same example, you can accomplish the same task with the **Select via expression** method by entering the expression `column_type NOT IN ('date', 'timestamp')`.
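The effect of that expression can be sketched in plain Python as a filter over a schema. The function `dynamic_select` and the sample schema are hypothetical; the lambda is a stand-in for the expression `column_type NOT IN ('date', 'timestamp')` and is not evaluated by Spark.

```python
def dynamic_select(schema, predicate):
    """Return the names of the columns whose type satisfies the predicate.

    Hypothetical helper: schema maps column name -> type name, in column order.
    """
    return [name for name, column_type in schema.items() if predicate(column_type)]


# Hypothetical schema for an orders Dataset.
schema = {
    "order_id": "integer",
    "order_date": "date",
    "updated_at": "timestamp",
    "amount": "decimal",
    "comment": "string",
}

kept = dynamic_select(
    schema, lambda column_type: column_type not in ("date", "timestamp")
)
print(kept)
```

Because the rule is based on type rather than name, a date column added to the schema later would be dropped without any change to the configuration.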
diff --git a/docs/Spark/gems/transform/img/bulk-add-prefix.png b/docs/Spark/gems/transform/img/bulk-add-prefix.png
diff --git a/docs/Spark/gems/transform/img/remove-date-timestamp.png b/docs/Spark/gems/transform/img/remove-date-timestamp.png
diff --git a/docs/Spark/gems/transform/img/replace-null-with-string.png b/docs/Spark/gems/transform/img/replace-null-with-string.png