Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge remote-tracking branch 'origin/main' into standardize-gem-name-…
…format
Showing 7 changed files with 131 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
--- | ||
sidebar_position: 11 | ||
title: Bulk Column Expressions | ||
id: bulk-column-expressions | ||
description: Change the data type of multiple columns at once. | ||
tags: | ||
- gems | ||
- type | ||
- columns | ||
--- | ||
|
||
The Bulk Column Expressions Gem primarily lets you cast or change the data type of multiple columns at once. It provides additional functionality, including: | ||
|
||
- Adding a prefix or suffix to selected columns. | ||
- Applying a custom expression to selected columns. | ||
|
||
## Parameters | ||
|
||
| Parameter | Description | | ||
| -------------------------------------------- | ------------------------------------------------------------------ | | ||
| Data Type of the columns to do operations on | The data type used to filter the columns available for selection | | ||
| Selected Columns | The columns on which to apply transformations | | ||
| Change output column name | An option to add a prefix or suffix to the selected column names | | ||
| Change output column type | The data type that the columns will be transformed into | | ||
| Output Expression | A Spark SQL expression that can be applied to the selected columns | | ||
|
||
## Example | ||
|
||
Assume some columns in a table hold zero-based indices and are stored as the long data type. You want them to hold one-based indices and be stored as integers to reduce memory use. | ||
|
||
Using the Bulk Column Expressions Gem, you can: | ||
|
||
- Filter your columns by the long data type. | ||
- Select the columns you wish to transform. | ||
- Cast the output columns to integers. | ||
- Include `column_value + 1` in the expression field to shift the indices. |
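The steps above can be sketched in plain Python as the Spark SQL select list they roughly correspond to. This is an illustration, not the code the Gem generates: the function `bulk_expression`, the schema dictionary, the column names, and the table name `indices` are all hypothetical, and `column_value` is treated as a simple text placeholder for the current column.

```python
def bulk_expression(schema, selected, target_type, expression):
    """Build one Spark SQL select item per column (illustrative only).

    schema: dict of column name -> data type name (hypothetical input shape).
    selected: names of the columns to transform.
    expression: template in which `column_value` stands for the current column.
    """
    items = []
    for name in schema:
        if name in selected:
            # Substitute the placeholder, cast the result, keep the column name.
            body = expression.replace("column_value", name)
            items.append(f"CAST({body} AS {target_type}) AS {name}")
        else:
            # Unselected columns pass through unchanged.
            items.append(name)
    return items


# Hypothetical table with two long index columns and one string column.
schema = {"row_idx": "long", "col_idx": "long", "label": "string"}

# Step 1 and 2: filter by the long data type, then select those columns.
long_columns = [name for name, dtype in schema.items() if dtype == "long"]

# Step 3 and 4: cast to int and shift the indices by one.
select_items = bulk_expression(schema, long_columns, "int", "column_value + 1")
query = "SELECT " + ", ".join(select_items) + " FROM indices"
print(query)
```

Casting from long to int only preserves values that fit in 32 bits, so it is worth checking the range of the indices before applying the cast.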
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
sidebar_position: 10 | ||
title: Bulk Column Rename | ||
id: bulk-column-rename | ||
description: Rename multiple columns in your Dataset in a systematic way. | ||
tags: | ||
- gems | ||
- rename | ||
- columns | ||
--- | ||
|
||
Use the Bulk Column Rename Gem to rename multiple columns in your Dataset in a systematic way. | ||
|
||
## Parameters | ||
|
||
| Parameter | Description | | ||
| ----------------- | ---------------------------------------------------------------------------------------- | | ||
| Columns to rename | Select one or more columns to rename from the dropdown. | | ||
| Method | Choose to add a prefix, add a suffix, or use a custom expression to change column names. | | ||
|
||
Based on the method you select, you will see an option to enter the prefix, suffix, or expression of your choice. | ||
|
||
## Examples | ||
|
||
### Add a prefix | ||
|
||
For example, add the prefix `meta_` to mark columns that contain metadata. | ||
|
||
![Add prefix to multiple columns](./img/bulk-add-prefix.png) | ||
|
||
### Use a custom expression | ||
|
||
You can make the same change, or more complex ones, with a custom expression such as `concat('meta_', column_name)`. |
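To make the three methods concrete, here is a minimal Python sketch of the renaming logic. It is not the Gem's implementation: `rename_columns` and the column names are hypothetical, and a Python callable stands in for the Spark SQL custom expression `concat('meta_', column_name)`.

```python
def rename_columns(columns, selected, method, value):
    """Return a mapping of old name -> new name for the selected columns.

    method: "prefix", "suffix", or "expression" (hypothetical labels).
    value: the prefix or suffix string, or a callable for "expression".
    """
    renames = {}
    for name in columns:
        if name not in selected:
            continue
        if method == "prefix":
            renames[name] = value + name
        elif method == "suffix":
            renames[name] = name + value
        elif method == "expression":
            # The callable stands in for a Spark SQL expression on column_name.
            renames[name] = value(name)
        else:
            raise ValueError(f"unsupported method: {method}")
    return renames


columns = ["id", "created_by", "source"]
selected = ["created_by", "source"]

# Prefix method.
renames = rename_columns(columns, selected, "prefix", "meta_")

# Same result through a custom expression, mirroring concat('meta_', column_name).
via_expression = rename_columns(
    columns, selected, "expression", lambda column_name: "meta_" + column_name
)

# Equivalent select list: renamed columns get an alias, others pass through.
select_items = [f"{c} AS {renames[c]}" if c in renames else c for c in columns]
print(select_items)
```

The prefix and expression calls produce the same mapping here; the expression form is the one to reach for when the new name depends on more than a fixed prefix or suffix.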
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
--- | ||
sidebar_position: 12 | ||
title: Data Cleansing | ||
id: data-cleansing | ||
description: Standardize data formats and address missing or null values in the data. | ||
tags: | ||
- gems | ||
- clean | ||
- format | ||
--- | ||
|
||
Use the Data Cleansing Gem to standardize data formats and address missing or null values in the data. | ||
|
||
## Parameters | ||
|
||
| Parameter | Description | | ||
| -------------------------------- | --------------------------------------------------------------- | | ||
| Select columns you want to clean | The set of columns on which to perform cleaning transformations | | ||
| Remove null data | The method used to remove null data | | ||
| Replace null values in column | The method used to replace null values | | ||
| Clean data | Different ways to standardize the format of data in columns | | ||
|
||
## Example | ||
|
||
Assume you have a table that includes customer feedback on individual orders. In this scenario, some customers may not provide feedback, resulting in null values in the data. You can use the Data Cleansing Gem to replace null values with the string `NA`. | ||
|
||
![Replace null with string](./img/replace-null-with-string.png) |
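In Spark SQL terms, replacing nulls with a fixed string is commonly written with `coalesce`. The Python sketch below builds such a select list for the feedback example. It is only an approximation of what the Gem does: `replace_nulls` and the column names `order_id` and `feedback` are hypothetical, and the sketch rejects replacement strings containing quotes or backslashes rather than guessing at escaping rules.

```python
def replace_nulls(columns, selected, replacement):
    """Build select items that swap nulls for a string in selected columns."""
    if "'" in replacement or "\\" in replacement:
        # Escaping is out of scope for this sketch, so refuse such input.
        raise ValueError("replacement must not contain quotes or backslashes")
    items = []
    for name in columns:
        if name in selected:
            # coalesce returns the first non-null argument.
            items.append(f"coalesce({name}, '{replacement}') AS {name}")
        else:
            items.append(name)
    return items


# Hypothetical order feedback table.
columns = ["order_id", "feedback"]
select_items = replace_nulls(columns, ["feedback"], "NA")
print("SELECT " + ", ".join(select_items) + " FROM orders")
```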
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
--- | ||
sidebar_position: 13 | ||
title: Dynamic Select | ||
id: dynamic-select | ||
description: Dynamically filter columns of your Dataset based on a set of conditions. | ||
tags: | ||
- gems | ||
- filter | ||
- dynamic | ||
--- | ||
|
||
Use the Dynamic Select Gem to dynamically filter columns of your Dataset based on a set of conditions. | ||
|
||
## Configuration | ||
|
||
There are two ways to configure the Dynamic Select Gem. | ||
|
||
| Configuration | Description | | ||
| --------------------- | --------------------------------------------------------------------------------------------- | | ||
| Select field types | Choose one or more types of columns to keep in the Dataset, such as string, decimal, or date. | | ||
| Select via expression | Create an expression that limits the type of columns to keep in the Dataset. | | ||
|
||
## Examples | ||
|
||
You’ll use Dynamic Select when you want to avoid hard-coding your choice of columns. In other words, rather than define each column to keep in your Pipeline, you let the system automatically choose the columns based on certain conditions or rules. | ||
|
||
### Remove date columns using field type | ||
|
||
Assume you would like to remove irrelevant date and timestamp columns from your Dataset. You can do so with the **Select field types** method by selecting every field type you want to keep, that is, all types except date and timestamp. | ||
|
||
![Keep all columns except Date and Timestamp column using the visual interface](./img/remove-date-timestamp.png) | ||
|
||
### Remove date columns with an expression | ||
|
||
Using the same example, you can accomplish the same task with the **Select via expression** method by entering the expression `column_type NOT IN ('date', 'timestamp')`. |
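The effect of either method can be pictured as a filter over the schema rather than over the rows. The Python sketch below mimics the expression `column_type NOT IN ('date', 'timestamp')` on a hypothetical schema; `select_by_type` and the column names are illustrative and do not come from the Gem itself.

```python
def select_by_type(schema, excluded_types):
    """Keep the columns whose type is not in excluded_types.

    schema: dict of column name -> data type name (hypothetical input shape).
    """
    return [name for name, dtype in schema.items() if dtype not in excluded_types]


# Hypothetical Dataset schema with one date and one timestamp column.
schema = {
    "order_id": "int",
    "order_date": "date",
    "updated_at": "timestamp",
    "amount": "decimal",
}

# Mirrors: column_type NOT IN ('date', 'timestamp')
kept = select_by_type(schema, {"date", "timestamp"})
print(kept)
```

Because the filter runs against whatever schema arrives, a date column added upstream later would be dropped without any change to the configuration, which is the point of avoiding a hard-coded column list.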