diff --git a/docs/SQL/gems/Transformations/_category_.json b/docs/SQL/gems/transform/_category_.json similarity index 68% rename from docs/SQL/gems/Transformations/_category_.json rename to docs/SQL/gems/transform/_category_.json index ba3a576885..b1d7ac0134 100644 --- a/docs/SQL/gems/Transformations/_category_.json +++ b/docs/SQL/gems/transform/_category_.json @@ -1,5 +1,5 @@ { - "label": "Transformations", + "label": "Transform", "position": 1, "collapsible": true, "collapsed": true diff --git a/docs/SQL/gems/Transformations/aggregate.md b/docs/SQL/gems/transform/aggregate.md similarity index 100% rename from docs/SQL/gems/Transformations/aggregate.md rename to docs/SQL/gems/transform/aggregate.md diff --git a/docs/SQL/gems/transform/deduplicate.md b/docs/SQL/gems/transform/deduplicate.md new file mode 100644 index 0000000000..c213748e2f --- /dev/null +++ b/docs/SQL/gems/transform/deduplicate.md @@ -0,0 +1,98 @@ +--- +title: Deduplicate +id: deduplicate +description: Remove rows with duplicate values of specified columns +sidebar_position: 3 +tags: + - gems + - dedupe + - distinct + - unique +--- + +Removes rows with duplicate values of specified columns. + +## Parameters + +| Parameter | Description | Required | +| :--------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- | +| Source | Input source | True | +| Row to keep | - `Distinct Rows`: Keeps all distinct rows. This is equivalent to performing a `select distinct` operation
- `Unique Only`: Keeps rows that don't have duplicates
- `First`: Keeps first occurrence of the duplicate row
- `Last`: Keeps last occurrence of the duplicate row
Default is `Distinct Rows` | True | +| Deduplicate On Columns | Columns to consider while removing duplicate rows (not required for `Distinct Rows`) | True | + +## Row to keep options + +As mentioned in the previous parameters, there are four **Row to keep** options that you can use in your deduplicate Gem. + +![Deduplicate row to keep](./img/deduplicate_row_to_keep.png) + +In the Code view, you can see that the Deduplicate Gem contains `SELECT DISTINCT *` when using the `Distinct Rows` option. + +![Deduplicate code view](./img/deduplicate_code_view.png) + +## Example + +Suppose you're deduplicating the following table. + +| First_Name | Last_Name | Type | Contact | +| :--------- | :-------- | :---- | :---------------- | +| John | Doe | phone | 123-456-7890 | +| John | Doe | phone | 123-456-7890 | +| John | Doe | phone | 123-456-7890 | +| Alice | Johnson | phone | 246-135-0987 | +| Alice | Johnson | phone | 246-135-0987 | +| Alice | Johnson | email | alice@johnson.com | +| Alice | Johnson | email | alice@johnson.com | +| Bob | Smith | email | bob@smith.com | + +For `Distinct Rows`, the interim data will show the following: + +| First_Name | Last_Name | Type | Contact | +| :--------- | :-------- | :---- | :---------------- | +| John | Doe | phone | 123-456-7890 | +| Alice | Johnson | phone | 246-135-0987 | +| Alice | Johnson | email | alice@johnson.com | +| Bob | Smith | email | bob@smith.com | + +The `First` and `Last` options work similarly to `Distinct Rows`, but they keep the first and last occurrence of the duplicate rows respectively. + +For `Unique Only`, the interim data will look like the following: + +| First_Name | Last_Name | Type | Contact | +| :--------- | :-------- | :---- | :------------ | +| Bob | Smith | email | bob@smith.com | + +You'll be left with only one unique row since the rest were all duplicates. + +--- + +You can add `First_Name` and `Last_Name` to Deduplicate On Columns if you want to further deduplicate the table. 
+
+For `Distinct Rows`, the interim data will show the following:
+
+| First_Name | Last_Name |
+| :--------- | :-------- |
+| John | Doe |
+| Alice | Johnson |
+| Bob | Smith |
+
+:::note
+
+For `First`, `Last`, and `Unique Only`, the interim data will contain all columns, irrespective of the columns that were added.
+
+For `First` and `Last`, the interim data will look like the following:
+
+| First_Name | Last_Name | Type | Contact |
+| :--------- | :-------- | :---- | :---------------- |
+| John | Doe | phone | 123-456-7890 |
+| Alice | Johnson | phone | 246-135-0987 |
+| Alice | Johnson | email | alice@johnson.com |
+| Bob | Smith | email | bob@smith.com |
+
+For `Unique Only`, the interim data will look like the following:
+
+| First_Name | Last_Name | Type | Contact |
+| :--------- | :-------- | :---- | :------------ |
+| Bob | Smith | email | bob@smith.com |
+
+:::
diff --git a/docs/SQL/gems/transform/flattenschema.md b/docs/SQL/gems/transform/flattenschema.md
new file mode 100644
index 0000000000..d6dd06b9c5
--- /dev/null
+++ b/docs/SQL/gems/transform/flattenschema.md
@@ -0,0 +1,68 @@
+---
+title: Flatten Schema
+id: flattenschema
+description: Flatten nested data
+sidebar_position: 4
+tags:
+  - gems
+  - schema
+  - explode
+  - flatten
+---
+
+When processing raw data it can be useful to flatten complex data types like `Struct`s and `Array`s into simpler, flatter schemas. This allows you to preserve all schemas, and not just the first one. You can use FlattenSchema with Snowflake Models.
+
+![The FlattenSchema gem](./img/flatten_gem.png)
+
+## The Input
+
+FlattenSchema works on Snowflake sources that have nested columns that you'd like to extract into a flat schema.
+
+For example, with an input schema like so:
+
+![Input schema](./img/flatten_input.png)
+
+And the data looks like so:
+
+![Input data](./img/flatten_input_interim.png)
+
+We want to extract the `contact`, and all of the columns from the `struct`s in `content` into a flattened schema.
+
+## The Expressions
+
+Having added a `FlattenSchema` Gem to your Model, all you need to do is click the column names you wish to extract and they'll be added to the `Expressions` section.
+
+:::tip
+
+You can click to add all columns, which would make all nested leaf level values of an object visible as columns.
+
+:::
+
+Once added you can change the `Output Column` for a given row to change the name of the Column in the output.
+
+![Adding expressions](./img/flatten_add_exp.png)
+
+## The Output
+
+If you check the `Output` tab in the Gem, you'll see the schema created from the selected columns.
+
+And here's what the output data looks like:
+
+![Output interim](./img/flatten_output_interim.png)
+
+The nested contact information has been flattened so that you have individual rows for each content type.
+
+## Advanced settings
+
+If you're familiar with Snowflake's `FLATTEN` table function, you can use the advanced settings to customize the optional column arguments.
+
+To use the advanced settings, hover over a column, and click the dropdown arrow.
+
+![Advanced settings](./img/flatten_advanced_settings.png)
+
+You can customize the following options:
+
+- Path to the element: The path to the element within the variant data structure that you want to flatten.
+- Flatten all elements recursively: If set to `false`, only the element mentioned in the path is expanded. If set to `true`, all sub-elements are expanded recursively. This is set to `false` by default.
+- Preserve rows with missing fields: If set to `false`, rows with missing fields are omitted from the output. If set to `true`, rows with missing fields are generated with `null` in the key, index, and value columns. This is set to `false` by default.
+- Datatype that needs to be flattened: The data type that you want to flatten. You can choose `Object`, `Array`, or `Both`. This is set to `Both` by default.
diff --git a/docs/SQL/gems/transformations/img/deduplicate_code_view.png b/docs/SQL/gems/transform/img/deduplicate_code_view.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/deduplicate_code_view.png
rename to docs/SQL/gems/transform/img/deduplicate_code_view.png
diff --git a/docs/SQL/gems/transformations/img/deduplicate_row_to_keep.png b/docs/SQL/gems/transform/img/deduplicate_row_to_keep.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/deduplicate_row_to_keep.png
rename to docs/SQL/gems/transform/img/deduplicate_row_to_keep.png
diff --git a/docs/SQL/gems/transformations/img/flatten_add_exp.png b/docs/SQL/gems/transform/img/flatten_add_exp.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_add_exp.png
rename to docs/SQL/gems/transform/img/flatten_add_exp.png
diff --git a/docs/SQL/gems/transformations/img/flatten_advanced_settings.png b/docs/SQL/gems/transform/img/flatten_advanced_settings.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_advanced_settings.png
rename to docs/SQL/gems/transform/img/flatten_advanced_settings.png
diff --git a/docs/SQL/gems/transformations/img/flatten_gem.png b/docs/SQL/gems/transform/img/flatten_gem.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_gem.png
rename to docs/SQL/gems/transform/img/flatten_gem.png
diff --git a/docs/SQL/gems/transformations/img/flatten_input.png b/docs/SQL/gems/transform/img/flatten_input.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_input.png
rename to docs/SQL/gems/transform/img/flatten_input.png
diff --git a/docs/SQL/gems/transformations/img/flatten_input_interim.png b/docs/SQL/gems/transform/img/flatten_input_interim.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_input_interim.png
rename to docs/SQL/gems/transform/img/flatten_input_interim.png
diff --git a/docs/SQL/gems/transformations/img/flatten_output_interim.png b/docs/SQL/gems/transform/img/flatten_output_interim.png
similarity index 100%
rename from docs/SQL/gems/transformations/img/flatten_output_interim.png
rename to docs/SQL/gems/transform/img/flatten_output_interim.png
diff --git a/docs/SQL/gems/Transformations/sql-transformations.md b/docs/SQL/gems/transform/transform.md
similarity index 97%
rename from docs/SQL/gems/Transformations/sql-transformations.md
rename to docs/SQL/gems/transform/transform.md
index a5a0890eee..5b414303f3 100644
--- a/docs/SQL/gems/Transformations/sql-transformations.md
+++ b/docs/SQL/gems/transform/transform.md
@@ -1,6 +1,6 @@
 ---
-title: SQL Transformations
-id: sql-transformations
+title: Transform
+id: transform
 description: Data transformation steps in SQL
 sidebar_position: 1
 tags:
diff --git a/docs/getting-started/getting-started-sql-snowflake.md b/docs/getting-started/getting-started-sql-snowflake.md
index 71e29e807c..badfa648fd 100644
--- a/docs/getting-started/getting-started-sql-snowflake.md
+++ b/docs/getting-started/getting-started-sql-snowflake.md
@@ -251,7 +251,7 @@ Here we create a `customers_nations` model that’s going to enrich our customer
 
 The `customers_nations` model is stored as a `.sql` file on Git. The table or view defined by the model is stored on the SQL warehouse, database, and schema defined in the attached Fabric.
 
-Suggestions are provided each step of the way. If Copilot's suggestions aren't exactly what you need, just select and configure the Gems as desired. Click [here](../SQL/gems/joins.md) for details on configuring joins or [here](../SQL/gems/transformations/sql-aggregate) for aggregations.
+Suggestions are provided each step of the way. If Copilot's suggestions aren't exactly what you need, just select and configure the Gems as desired. Click [here](../SQL/gems/joins.md) for details on configuring joins or [here](../SQL/gems/transform/aggregate.md) for aggregations.
 
 ### 4.5 Interactively Test