
Merge branch 'main' into 3.4.2-release-notes
alexanderahn authored Dec 12, 2024
2 parents 08193fa + ffae0c7 commit cfdf552
Showing 160 changed files with 1,296 additions and 1,179 deletions.
2 changes: 2 additions & 0 deletions docs/SQL/gems/custom/custom.md
@@ -8,6 +8,8 @@ tags:
- sql
---

<h3><span class="badge">SQL Gem</span></h3>

:::caution
This page about Custom SQL Gems is under construction. Please pardon our dust.
:::
8 changes: 5 additions & 3 deletions docs/SQL/gems/gems.md
@@ -1,5 +1,5 @@
---
title: Gems
title: SQL Gems
id: sql-gems
description: Gems are data seeds, sources, transformations, and targets
sidebar_position: 2
@@ -11,9 +11,11 @@ tags:
- cte
---

In Prophecy and dbt, Data [Models](/docs/concepts/project/models.md) are SQL statements that build a single table or view. Prophecy visualizes Data Models to illustrate the many steps needed to generate a single table or view. Gems represent the individual steps. A Gem is a unit of functionality ranging from reading, transforming, writing, and various other ad-hoc operations on data.
In Prophecy and dbt, data [models](/docs/concepts/project/models.md) are groups of SQL statements used to create a single table or view. Prophecy simplifies data modeling by visualizing the data model as a series of steps, each represented by a [Gem](/docs/concepts/project/gems.md). Gems are functional units that perform tasks such as reading, transforming, writing, or handling other data operations.

Each Gem represents a SQL statement, and allows users to construct that statement by configuring a visual interface. Prophecy is smart about whether to construct a CTE or subquery for each Gem; users just configure the visual interface, and Prophecy includes the Gem's SQL statement as part of the Model. Here is a nice [overview](/docs/concepts/project/gems.md) of all the aspects of the Gem user interface. The table below outlines each Gem category:
Each Gem corresponds to a SQL statement, which users can construct through an intuitive visual interface. Prophecy handles the underlying complexity by deciding whether each Gem should generate a CTE or a subquery. Users simply configure the Gem's interface, and Prophecy integrates the resulting SQL into the larger data model.
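
For a rough sense of what this looks like in generated code, consider a single filter step. Depending on context, Prophecy might emit it as a CTE along these lines (an illustrative sketch only; the `customers` table and its columns are hypothetical):

```sql
-- Illustrative only: how a single "filter" step might be emitted as a CTE.
-- The customers table and its columns are hypothetical.
WITH filtered_customers AS (
    SELECT
        customer_id,
        country
    FROM customers           -- output of the upstream source Gem
    WHERE country = 'US'     -- condition configured in the Gem's visual interface
)

SELECT *
FROM filtered_customers
```

The same step could just as well be inlined as a subquery; either way, you only configure the Gem, and Prophecy decides which form to generate when it compiles the model.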

The table below outlines the different SQL Gem categories.

<div class="gems-table">

8 changes: 5 additions & 3 deletions docs/SQL/gems/joins.md
@@ -1,5 +1,5 @@
---
title: Joins
title: Join
id: data-joins
description: Join data from multiple tables
sidebar_position: 3
@@ -10,7 +10,9 @@ tags:
- transformation
---

Upon opening the join Gem, you can see a pop-up which provides several helpful features.
<h3><span class="badge">SQL Gem</span></h3>

Upon opening the Join Gem, you can see a pop-up which provides several helpful features.

![Join definition](img/JoinCondition.png)

@@ -20,7 +22,7 @@ To fill-in our **(5) Join condition** within the **(4) Conditions** section, sta

When you’re writing your join conditions, you’ll see available functions and columns to speed up your development. When the autocomplete appears, press ↑, ↓ to navigate between the suggestions and press tab to accept the suggestion.

Select the **(6)Join Type** according to the provider, eg [Databricks](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-join.html) or [Snowflake.](https://docs.snowflake.com/en/user-guide/querying-joins)
Select the **(6) Join Type** according to the provider, e.g. [Databricks](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-qry-select-join.html) or [Snowflake](https://docs.snowflake.com/en/user-guide/querying-joins).

The **(7) Expressions** tab allows you to define the set of output columns that are going to be returned from the Gem. Here we leave it empty, which by default passes through all the input columns, from both of the joined sources, without any modifications.
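
Putting these pieces together, a configured Join Gem compiles down to an ordinary SQL join. A hedged sketch, assuming hypothetical `orders` and `customers` sources and a handful of columns listed for readability:

```sql
-- Illustrative sketch of the statement a Join Gem might generate.
-- The orders and customers tables, their columns, and the join key are hypothetical.
SELECT
    orders.order_id,
    orders.amount,
    customers.customer_name
FROM orders
LEFT JOIN customers                                -- the selected Join Type
    ON orders.customer_id = customers.customer_id  -- the configured Join condition
```

With the Expressions tab left empty, the generated select list would instead pass through every column from both inputs.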

4 changes: 3 additions & 1 deletion docs/SQL/gems/subgraph/subgraph.md
@@ -8,7 +8,9 @@ tags:
- SQL
---

Subgraph allows you to take multiple distinct Gems and wrap them under a single parent Gem. Doing so can help you decompose complex logic into more manageable components and simplify the Visual view of your model.
<h3><span class="badge">SQL Gem</span></h3>

Subgraph Gems let you take multiple different Gems and wrap them under a single reusable parent Gem. In other words, they allow you to decompose complex logic into reusable components and simplify the visual view of your data model.

## Basic Subgraph

2 changes: 2 additions & 0 deletions docs/SQL/gems/transform/aggregate.md
@@ -11,6 +11,8 @@ tags:
- transformation
---

<h3><span class="badge">SQL Gem</span></h3>

Together let's deconstruct a commonly used Transformation, the Aggregate Gem. Follow along in the `HelloWorld_SQL` Project.
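
Before opening the Gem, it helps to keep in mind the shape of SQL it produces: a `GROUP BY` statement. A rough sketch, using a hypothetical `orders` table rather than the actual HelloWorld_SQL data:

```sql
-- Illustrative only: the kind of GROUP BY an Aggregate Gem boils down to.
-- The orders table and its columns are hypothetical, not the HelloWorld_SQL data.
SELECT
    customer_id,                    -- group-by column configured in the Gem
    COUNT(*)    AS order_count,     -- aggregate expressions configured in the Gem
    SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
```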

## Using the Gem
2 changes: 2 additions & 0 deletions docs/SQL/gems/transform/deduplicate.md
@@ -10,6 +10,8 @@ tags:
- unique
---

<h3><span class="badge">SQL Gem</span></h3>

Removes rows with duplicate values of specified columns.
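
In SQL terms, the effect is roughly a "keep one row per combination of the chosen columns" query. A minimal sketch, assuming a hypothetical `customers` table and tie-breaking column:

```sql
-- Illustrative sketch of deduplication on a chosen set of columns.
-- The customers table, its columns, and the ORDER BY tie-breaker are hypothetical.
WITH ranked AS (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id, email    -- columns to deduplicate on
            ORDER BY updated_at DESC           -- which duplicate row to keep
        ) AS row_num
    FROM customers
)

SELECT *
FROM ranked
WHERE row_num = 1
```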

## Parameters
2 changes: 2 additions & 0 deletions docs/SQL/gems/transform/flattenschema.md
@@ -10,6 +10,8 @@ tags:
- flatten
---

<h3><span class="badge">SQL Gem</span></h3>

When processing raw data it can be useful to flatten complex data types like `Struct`s and `Array`s into simpler, flatter schemas. This allows you to preserve all schemas, and not just the first one. You can use FlattenSchema with Snowflake Models.
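
To give a sense of the underlying operation, flattening one nested array by hand in Snowflake looks roughly like this (a sketch only; the `events` table, its `payload` column, and the field names are hypothetical):

```sql
-- Illustrative sketch of flattening an array held in a variant column (Snowflake).
-- The events table, payload column, and field names are hypothetical.
SELECT
    e.event_id,
    item.value:name::string  AS item_name,    -- pull fields out of each array element
    item.value:price::number AS item_price
FROM events AS e,
    LATERAL FLATTEN(input => e.payload:items) AS item
```

FlattenSchema spares you from writing these expressions by hand for every nested field.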

![The FlattenSchema gem](./img/flatten_gem.png)
Binary file added docs/Spark/extensibility/img/add-function.png
Binary file added docs/Spark/extensibility/img/call-function.png
Binary file added docs/Spark/extensibility/img/define-function.png
56 changes: 21 additions & 35 deletions docs/Spark/extensibility/user-defined-functions.md
@@ -9,46 +9,32 @@ tags:
- udafs
---

Allows you to create user defined functions (UDF) which are then usable anywhere in the Pipeline
Prophecy lets you create user-defined functions (UDFs) which can be used anywhere in the Pipeline.

### Parameters
## Parameters

| Parameter | Description | Required |
| :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- | :------- |
| UDF Name | Name of the UDF to be used to register it. All calls to the UDF will use this name | True |
| Definition | Definition of the UDF function. <br/> Eg: `udf((value:Int)=>value*value)` | True |
| UDF initialization code | Code block that contains initialization of entities used by UDFs. This could for example contain any static mapping that a UDF might use | False |
| Parameter | Description | Required |
| :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------ | :------- |
| Function name | The name of the function as it appears in your project. | True |
| UDF Name | The name of the UDF that will register it. All calls to the UDF will use this name. | True |
| Definition | Definition of the UDF function. <br/> For example, `udf((value:Int)=>value*value)` | True |
| UDF initialization code | Code block that contains initialization of entities used by UDFs. This could, for example, contain any static mapping that a UDF might use. | False |

### Examples
## Steps

---
There are a few steps to take to create and use a new UDF.

#### Defining and Using UDF

```mdx-code-block
import App from '@site/src/components/slider';
export const ImageData = [
{
"image":"/img/udf/1.png",
"description":<h3 style={{padding:'10px'}}>Step 1 - Open UDF definition window</h3>,
},
{
"image":"/img/udf/2.1.png",
"description":<h3 style={{padding:'10px'}}>Step 2 (Python)- Define Python UDF</h3>,
},
{
"image":"/img/udf/2.2.png",
"description":<h3 style={{padding:'10px'}}> Step 2 (Scala) - Define Scala UDf</h3>
},
{
"image":"/img/udf/3.png",
"description":<h3 style={{padding:'10px'}}>Step 3 - UDFs can now be called by their defined names</h3>,
},
];
<App ImageData={ImageData}></App>
```
1. Create a new function. You can find the **Functions** section in the left sidebar of a project page.

![Add a function to the pipeline](img/add-function.png)

2. Define the function.

![Define the function](img/define-function.png)

3. Call the function.

![Call the function](img/call-function.png)

````mdx-code-block
import Tabs from '@theme/Tabs';
6 changes: 6 additions & 0 deletions docs/Spark/fabrics/dataproc/_category_.json
@@ -0,0 +1,6 @@
{
"label": "Google Cloud Dataproc",
"position": 8,
"collapsible": true,
"collapsed": true
}
44 changes: 44 additions & 0 deletions docs/Spark/fabrics/dataproc/dataproc-tips.md
@@ -0,0 +1,44 @@
---
title: "Connectivity Tips"
id: gcp-dataproc-fabric-tips
description: If your cluster doesn't connect, try these tips
sidebar_position: 1
tags:
- deployment
- configuration
- google
- gcp
- dataproc
- livy
---

:::tip
Sometimes the Livy Cluster cannot access the Scala or Python libraries.
:::

### Error

```
Creating new Livy Session...
Using prophecy libs path...repo1.maven.org...
Using python libraries...files.pythonhosted.org...
...
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)\n\nYARN Diagnostics: ","level":"error"
```

### Corrective Actions

**Option 1:**
Adjust network settings on the Livy Cluster to allow traffic from the Scala Prophecy Library URL
`repo1.maven.org` and the Python Prophecy Library URL
`files.pythonhosted.org`.

**Option 2:**
Configure the Scala and Python Library Paths as mentioned [here](./dataproc.md):
set the Scala Library Path to `gs://prophecy-public-gcp/prophecy-scala-libs/`
and the Python Library Path to `gs://prophecy-public-gcp/prophecy-python-libs/`.

**Option 3:**
Set up a GCS bucket internally. Create two folders as in the previous option, and add `prophecy-scala-libs` and `prophecy-python-libs` to those folders.
@@ -2,7 +2,7 @@
title: "Google Cloud Dataproc"
id: gcp-dataproc-fabric-guide
description: Configuring GCP Dataproc Fabric
sidebar_position: 7
sidebar_position: 8
tags:
- deployment
- configuration
@@ -26,7 +26,7 @@ Livy is required for the Fabric. Prophecy provides a script required to deploy a

1. If you don't already have a private key, create a private key for the service account that you're using.
<br/><br/>
<img src={require('./img/createkey.png').default} alt="dataproc security" width="75%" />
<img src={require('./../img/createkey.png').default} alt="dataproc security" width="75%" />
<br/><br/>
2. Ensure you have the following permissions configured.

@@ -79,35 +79,42 @@ gcloud config set account [email protected]

1. Create a Fabric and select **Dataproc**.
<br/><br/>
<img src={require('./img/selectdataproc.png').default} alt="select dataproc" width="75%" />
<img src={require('./../img/selectdataproc.png').default} alt="select dataproc" width="75%" />
<br/><br/>
2. Fill out your **Project Name** and **Region**, and upload the **Private Key**.
<br/><br/>
<img src={require('./img/configuredataproc.png').default} alt="configure dataproc" width="75%" />
<img src={require('./../img/configuredataproc.png').default} alt="configure dataproc" width="75%" />
<br/><br/>
3. Click on **Fetch environments** and select the Dataproc **cluster** that you created earlier.
<br/><br/>
<img src={require('./img/selectenv.png').default} alt="select cluster" width="75%" />
<img src={require('./../img/selectenv.png').default} alt="select cluster" width="75%" />
<br/><br/>
4. Leave everything as default and provide the **Livy URL**. Locate the **External IP** of your cluster instance. Optionally, you may configure the DNS instead of using the IP. The URL is `http://<external-ip>:8998`.
<br/><br/>
<img src={require('./img/externalip.png').default} alt="livy ip" width="75%" />
<img src={require('./../img/externalip.png').default} alt="livy ip" width="75%" />
<br/><br/>
5. Configure the bucket associated with your cluster.
<br/><br/>
<img src={require('./img/bucketloc.png').default} alt="bucket location" width="75%" />
<img src={require('./../img/bucketloc.png').default} alt="bucket location" width="75%" />
<br/><br/>
6. Add the **Job Size**.
<br/><br/>
<img src={require('./img/procjobsize.png').default} alt="Job Size" width="55%" />
<img src={require('./../img/procjobsize.png').default} alt="Job Size" width="55%" />
<br/><br/>
7. Configure Scala Library Path.
`gs://prophecy-public-gcp/prophecy-scala-libs/`.
8. Configure Python Library Path.
`gs://prophecy-public-gcp/prophecy-python-libs/`.
<br/><br/>
<img src={require('./img/proclib.png').default} alt="dependences" width="85%" />
<img src={require('./../img/proclib.png').default} alt="dependencies" width="85%" />
<br/><br/>
9. Click on **Complete**.
<br/><br/>
Run a simple Pipeline and make sure that the interim returns data properly.

```mdx-code-block
import DocCardList from '@theme/DocCardList';
import {useCurrentSidebarCategory} from '@docusaurus/theme-common';
<DocCardList items={useCurrentSidebarCategory().items}/>
```
2 changes: 1 addition & 1 deletion docs/Spark/fabrics/diagnostics.md
@@ -2,7 +2,7 @@
title: "Diagnostics"
id: fabric-diagnostics
description: Troubleshooting Fabrics using diagnostics
sidebar_position: 8
sidebar_position: 9
tags:
- diagnostics
- fabric
101 changes: 0 additions & 101 deletions docs/Spark/fabrics/emr-fabric-serverless.md

This file was deleted.

