Reorganise the documentation structure for Kedro / Databricks integration (#2442)

* Remove visualization docs from deployment section

Signed-off-by: Jannic Holzer <[email protected]>

* Add new directory and new visualization docs

Signed-off-by: Jannic Holzer <[email protected]>

* Remove old deployment docs

Signed-off-by: Jannic Holzer <[email protected]>

* Add deployment docs to new directory

Signed-off-by: Jannic Holzer <[email protected]>

* Modify spelling of visualize to British English 'visualise'

Signed-off-by: Jannic Holzer <[email protected]>

* Add new documentation to index

Signed-off-by: Jannic Holzer <[email protected]>

* Fix lint

Signed-off-by: Jannic Holzer <[email protected]>

* Modify title of deployment guide

Signed-off-by: Jannic Holzer <[email protected]>

* Remove spurious max depth to test if docs build

Signed-off-by: Jannic Holzer <[email protected]>

* Refactor index.rst to try to avoid build failing

Signed-off-by: Jannic Holzer <[email protected]>

* Modify call to sphinx-build to test if RTD will work

Signed-off-by: Jannic Holzer <[email protected]>

* Revise index.rst

Signed-off-by: Jo Stichbury <[email protected]>

* Lint and resolve

Signed-off-by: Jo Stichbury <[email protected]>
Signed-off-by: Jannic Holzer <[email protected]>

* Change title to include mention of Notebooks

Signed-off-by: Jannic Holzer <[email protected]>

* Remove verbosity from viz on Databricks intro

Signed-off-by: Jannic Holzer <[email protected]>

* Revert command modification

Signed-off-by: Jannic Holzer <[email protected]>

* Rename databricks visualisation docs

Signed-off-by: Jannic Holzer <[email protected]>

* Add a line between copy and code snippets for rendering

Co-authored-by: Jo Stichbury <[email protected]>
Signed-off-by: Jannic Holzer <[email protected]>

* Remove gerund

Co-authored-by: Jo Stichbury <[email protected]>
Signed-off-by: Jannic Holzer <[email protected]>

* Remove spurious 'i.e.'

Co-authored-by: Jo Stichbury <[email protected]>
Signed-off-by: Jannic Holzer <[email protected]>

* Rename workflow_integration to integrations

Signed-off-by: Jannic Holzer <[email protected]>

* Rename index entry to 'Integrations'

Signed-off-by: Jannic Holzer <[email protected]>

* Convert databricks.rst to MyST format

Signed-off-by: Jannic Holzer <[email protected]>

* Rename databricks.rst to databricks.md

Signed-off-by: Jannic Holzer <[email protected]>

* Remove spurious conflict messages

Signed-off-by: Jannic Holzer <[email protected]>

---------

Signed-off-by: Jannic Holzer <[email protected]>
Signed-off-by: Jo Stichbury <[email protected]>
Co-authored-by: Jo Stichbury <[email protected]>
jmholzer and stichbury authored Mar 30, 2023
1 parent cdcd665 commit c2968ba
Showing 7 changed files with 52 additions and 14 deletions.
1 change: 0 additions & 1 deletion docs/source/deployment/deployment_guide.md
@@ -15,7 +15,6 @@ We also provide information to help you deploy to the following:
* to [Prefect](prefect.md)
* to [Kubeflow Workflows](kubeflow.md)
* to [AWS Batch](aws_batch.md)
* to [Databricks](databricks.md)
* to [Dask](dask.md)

<!--- There has to be some non-link text in the bullets above, if it's just links, there's a Sphinx bug that fails the build process-->
23 changes: 21 additions & 2 deletions docs/source/index.rst
@@ -152,6 +152,13 @@ Welcome to Kedro's documentation!

logging/logging

.. toctree::
:maxdepth: 2
:caption: Integrations

integrations/databricks.rst
integrations/pyspark.rst

.. toctree::
:maxdepth: 2
:caption: Development
@@ -174,12 +181,17 @@ Welcome to Kedro's documentation!
deployment/prefect
deployment/kubeflow
deployment/aws_batch
deployment/databricks
deployment/aws_sagemaker
deployment/aws_step_functions
deployment/airflow_astronomer
deployment/dask

.. toctree::
:maxdepth: 2
:caption: Databricks integration

databricks_integration/visualisation

.. toctree::
:maxdepth: 2
:caption: PySpark integration
@@ -188,9 +200,16 @@ Welcome to Kedro's documentation!

.. toctree::
:maxdepth: 2
:caption: Resources
:caption: FAQs

faq/faq
faq/architecture_overview
faq/kedro_principles

.. toctree::
:maxdepth: 2
:caption: Resources

resources/glossary


9 changes: 9 additions & 0 deletions docs/source/integrations/databricks.md
@@ -0,0 +1,9 @@
# Databricks integration

```{toctree}
:caption: Databricks
:maxdepth: 2
databricks_workspace.md
visualisation.md
```
12 changes: 12 additions & 0 deletions docs/source/integrations/databricks_visualisation.md
@@ -0,0 +1,12 @@
# How to run Kedro-Viz on Databricks

[Kedro-Viz](../visualisation/kedro-viz_visualisation.md) is a tool that allows you to visualise your Kedro pipeline. It is a standalone web application that runs in a web browser and can be run either on a local machine or on Databricks itself.

For Kedro-Viz to run with your Kedro project, you need to ensure that both packages are installed in the same scope (notebook-scoped vs. cluster library). This means that if you `%pip install kedro` from inside your notebook, you should also `%pip install kedro-viz` from inside your notebook.
If your cluster already comes with Kedro installed as a library, you should also add Kedro-Viz as a [cluster library](https://docs.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries).
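
For example, if Kedro was installed from inside the notebook, a matching notebook-scoped install of Kedro-Viz might look like this minimal sketch (cell numbers are illustrative):

```ipython
In [1]: %pip install kedro
In [2]: %pip install kedro-viz
```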

Kedro-Viz can then be launched in a new browser tab with the `%run_viz` line magic:

```ipython
In [2]: %run_viz
```
12 changes: 1 addition & 11 deletions docs/source/deployment/databricks.md → docs/source/integrations/databricks_workspace.md
@@ -1,4 +1,4 @@
# Deployment to a Databricks cluster
# Develop a project with Databricks Workspace and Notebooks

This tutorial uses the [PySpark Iris Kedro Starter](https://github.com/kedro-org/kedro-starters/tree/main/pyspark-iris) to illustrate how to bootstrap a Kedro project using Spark and deploy it to a [Databricks cluster on AWS](https://databricks.com/aws).

@@ -252,16 +252,6 @@ You must explicitly upgrade your `pip` version as follows:

After this, you can reload Kedro by running the line magic command `%reload_kedro <project_root>`.
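
For instance, a reload might look like the following sketch, where `/dbfs/projects/my-kedro-project` is a purely illustrative project root rather than a path from this guide:

```ipython
In [1]: %reload_kedro /dbfs/projects/my-kedro-project
```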

### 10. Running Kedro-Viz on Databricks

For Kedro-Viz to run with your Kedro project, you need to ensure that both the packages are installed in the same scope (notebook-scoped vs. cluster library). i.e. if you `%pip install kedro` from inside your notebook then you should also `%pip install kedro-viz` from inside your notebook.
If your cluster comes with Kedro installed on it as a library already then you should also add Kedro-Viz as a [cluster library](https://docs.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries).

Kedro-Viz can then be launched in a new browser tab with the `%run_viz` line magic:
```ipython
In [2]: %run_viz
```

## How to use datasets stored on Databricks DBFS

DBFS is a distributed file system mounted into a Databricks workspace and accessible on a Databricks cluster. It maps cloud object storage URIs to relative paths to simplify the process of persisting files. With DBFS, libraries can read from or write to distributed storage as if it were a local file.
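
As a sketch of that "local file" behaviour, data stored in DBFS can be read on the driver with an ordinary file path; the filepath below is hypothetical:

```ipython
In [1]: import pandas as pd

In [2]: # A DBFS-backed file is visible under the /dbfs/ mount and reads like a local file
   ...: df = pd.read_csv("/dbfs/FileStore/iris.csv")
```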
9 changes: 9 additions & 0 deletions docs/source/integrations/pyspark.rst
@@ -0,0 +1,9 @@

PySpark integration
=============================================

.. toctree::
:maxdepth: 2
:caption: PySpark

pyspark_integration.md
File renamed without changes.
