Commit

mior
ArnoMicrosoft committed Jun 17, 2020
1 parent 00f8c8c commit a4aafc6
Showing 1 changed file with 0 additions and 3 deletions.
3 changes: 0 additions & 3 deletions articles/synapse-analytics/how-to-analyze-complex-schema.md
@@ -73,7 +73,6 @@ When printing the schema of the data frame of that object (called **df**) with t
* the yellow color represents a nested structure
* the green color represents an array with two elements

- [!div class="mx-imgBorder"]
[![Schema origin](./media/how-to-complex-schema/schema-origin.png)](./media/how-to-complex-schema/schema-origin.png#lightbox)

_rid, _ts, and _etag were added by the system as the document was ingested into the Azure Cosmos DB transactional store.
@@ -84,7 +83,6 @@ The data frame above contains only 5 columns and 1 row. After transformation,

With Synapse Spark, it's easy to transform nested structures into columns and array elements into multiple rows. You can apply the steps below in your own implementation.

- [!div class="mx-imgBorder"]
[![Spark transformations steps](./media/how-to-complex-schema/spark-transfo-steps.png)](./media/how-to-complex-schema/spark-transfo-steps.png#lightbox)

**Step 1**: Define a function to flatten the nested schema. This function can be used without change. Create a cell in a PySpark notebook with this function:
@@ -154,7 +152,6 @@ The display function should show 13 columns and 2 rows:

The printSchema function of the data frame df_flat_explode_flat returns the following result:

- [!div class="mx-imgBorder"]
[![Schema final](./media/how-to-complex-schema/schema-final.png)](./media/how-to-complex-schema/schema-final.png#lightbox)

## Read arrays and nested structures directly with SQL serverless
