
Documentation for Preview Dataset #1757

Merged
merged 26 commits into from
Feb 26, 2024
Commits
691c7ba: first draft (rashidakanchwala, Feb 21, 2024)
70fa78f: add to toctree (rashidakanchwala, Feb 21, 2024)
1b692f8: fix build errors (rashidakanchwala, Feb 21, 2024)
9e93ca8: revise structure (rashidakanchwala, Feb 21, 2024)
f57b1b6: test layout (rashidakanchwala, Feb 21, 2024)
2628396: test layout (rashidakanchwala, Feb 21, 2024)
93b79ae: typo in title (rashidakanchwala, Feb 21, 2024)
6e83fc5: doc fixes based on review (rashidakanchwala, Feb 22, 2024)
fb2d6e7: Merge branch 'main' into docs/preview-datasets (rashidakanchwala, Feb 22, 2024)
8c80933: typo (rashidakanchwala, Feb 22, 2024)
2a9c923: Merge branch 'docs/preview-datasets' of https://github.com/kedro-org/… (rashidakanchwala, Feb 22, 2024)
919ab73: edit disable preview sec (rashidakanchwala, Feb 22, 2024)
b57b3d7: Merge branch 'main' into docs/preview-datasets (ravi-kumar-pilla, Feb 22, 2024)
5d4963f: fixes based on review (rashidakanchwala, Feb 23, 2024)
401434e: fix typo (rashidakanchwala, Feb 23, 2024)
c8e8f84: fix based on reviews -2 (rashidakanchwala, Feb 23, 2024)
e296352: fixes based on reviews (rashidakanchwala, Feb 26, 2024)
ff7b87b: add link to typing file (rashidakanchwala, Feb 26, 2024)
22809be: Update docs/source/preview_pandas_datasets.md (rashidakanchwala, Feb 26, 2024)
fdd9613: typo and spacing fix (rashidakanchwala, Feb 26, 2024)
9f67c74: Merge branch 'docs/preview-datasets' of https://github.com/kedro-org/… (rashidakanchwala, Feb 26, 2024)
1d89fed: non caps (rashidakanchwala, Feb 26, 2024)
ed42972: based on reviews (rashidakanchwala, Feb 26, 2024)
cf7a2cb: fix docs lint (rashidakanchwala, Feb 26, 2024)
ebf6f30: revert changes (rashidakanchwala, Feb 26, 2024)
074c92b: Merge branch 'main' into docs/preview-datasets (ravi-kumar-pilla, Feb 26, 2024)
2 changes: 1 addition & 1 deletion docs/source/experiment_tracking.md
@@ -23,7 +23,7 @@ Kedro has always supported parameter versioning (as part of your codebase with a

Kedro-Viz version 4.1.1 introduced metadata capture, visualisation, discovery and comparison, enabling you to access, edit and [compare your experiments](#access-run-data-and-compare-runs) and additionally [track how your metrics change over time](#view-and-compare-metrics-data).

Kedro-Viz version 5.0 also supports the [display and comparison of plots, such as Plotly and Matplotlib](./visualise_charts_with_plotly.md). Support for metric plots (timeseries and parellel coords) was added to Kedro-Viz version 5.2.1.
Kedro-Viz version 5.0 also supports the [display and comparison of plots, such as Plotly and Matplotlib](./preview_plotly_datasets.md). Support for metric plots (timeseries and parallel coordinates) was added to Kedro-Viz version 5.2.1.

Kedro-Viz version 6.2 includes support for collaborative experiment tracking using a cloud storage solution. This means that multiple users can store their experiment data in a centralized remote storage, such as AWS S3, and access it through Kedro-Viz.

2 changes: 0 additions & 2 deletions docs/source/index.md
@@ -29,7 +29,5 @@ Take a look at the <a href="https://demo.kedro.org/" target="_blank" rel="noopen
kedro-viz_visualisation
share_kedro_viz
preview_datasets
visualise_charts_with_plotly
visualise_charts_with_matplotlib
experiment_tracking
```
24 changes: 24 additions & 0 deletions docs/source/preview_custom_datasets.md
@@ -0,0 +1,24 @@
# Extend Preview to Custom Datasets

When creating a custom dataset, you must implement a `preview()` method in the dataset class if you want to preview its data in Kedro-Viz. Kedro-Viz currently supports previewing tables, Plotly charts, images and JSON objects.

The `preview()` method must return one of the following types, as defined in the `kedro-datasets` source file [`_typing.py`](https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/_typing.py):

Member

Any chance to add a link to such file?

Contributor Author

is the github link fine ? - https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/_typing.py

I can't seem to find docs source code link for _typing.py

Member

Yeah this is not documented, so a link to the source code is fine for now.


```python
TablePreview = NewType("TablePreview", dict)
ImagePreview = NewType("ImagePreview", bytes)
PlotlyPreview = NewType("PlotlyPreview", dict)
JSONPreview = NewType("JSONPreview", dict)
```
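To make the four types more concrete, here is a minimal sketch that builds one illustrative value of each. It redefines the `NewType`s above locally so it runs without `kedro-datasets` installed. The payload shapes are assumptions rather than a documented contract: the table uses the `split` layout produced by pandas' `DataFrame.to_dict(orient="split")`, the Plotly value uses Plotly's figure JSON (`data` and `layout` keys), and the image is base64-encoded bytes, in line with the base64 description of image previews elsewhere in these docs. All values are placeholders.

```python
import base64
from typing import NewType

# Mirrors the definitions in kedro_datasets/_typing.py shown above,
# redefined here so the sketch runs without kedro-datasets installed.
TablePreview = NewType("TablePreview", dict)
ImagePreview = NewType("ImagePreview", bytes)
PlotlyPreview = NewType("PlotlyPreview", dict)
JSONPreview = NewType("JSONPreview", dict)

# Table (assumed shape): the "split" layout of pandas'
# DataFrame.to_dict(orient="split"): row labels, column names, row values.
table = TablePreview(
    {
        "index": [0, 1],
        "columns": ["id", "name"],
        "data": [[1, "example-a"], [2, "example-b"]],
    }
)

# JSON: any JSON-serialisable dictionary (placeholder values).
json_preview = JSONPreview({"status": "ok", "rows_processed": 2})

# Plotly (assumed shape): a figure in Plotly's JSON structure with "data"
# traces and a "layout"; a real dataset could build it from a plotly Figure
# with fig.to_plotly_json().
plotly_preview = PlotlyPreview(
    {
        "data": [{"type": "bar", "x": ["a", "b"], "y": [3, 5]}],
        "layout": {"title": {"text": "Example chart"}},
    }
)

# Image: base64-encoded bytes of an image file. The 8-byte PNG signature
# stands in for a real file's contents; it is not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
image_preview = ImagePreview(base64.b64encode(png_signature))
```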
Comment on lines +7 to +12
Contributor

Can we show an example for each type? This help users to link what are they expecting to see in the UI.

https://clear.ml/docs/latest/docs/guides/reporting/plotly_reporting


Below is an example demonstrating how to implement the preview() function for a CustomDataset class that utilizes TablePreview to enable previewing tabular data on Kedro-viz:

Contributor

Suggested change
Below is an example demonstrating how to implement the preview() function for a CustomDataset class that utilizes TablePreview to enable previewing tabular data on Kedro-viz:
Below is an example demonstrating how to implement the preview() function for a CustomDataset class that utilizes `TablePreview` to enable previewing tabular data on Kedro-viz:


```python
from kedro.io import AbstractDataset
from kedro_datasets._typing import TablePreview


class CustomDataset(AbstractDataset):
    # _load, _save and _describe are omitted for brevity
    def preview(self, nrows: int = 5) -> TablePreview:
        # Assumes load() returns a pandas DataFrame: keep the first `nrows`
        # rows in pandas' "split" layout (index, columns, data)
        data = self.load()
        return TablePreview(data.head(nrows).to_dict(orient="split"))
```

Member

Can we maybe add an example that works, to give users a bit more guidance on how they should realistically implement a preview() method?

Contributor Author

rashidakanchwala, Feb 23, 2024

I will need some help on this as I can't think of a realistic CustomDataset that is not a part of kedro-datasets. @astrojuanlu -- do you have some ideas?

Member

When I suggested adding a working example, I didn't necessarily mean anything complex, just in this case some code that would produce a working TablePreview and demonstrates how nrows, ncolumns and filters would be used.
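A self-contained sketch along the lines the review asks for: it produces a working `TablePreview` and shows how `nrows`, `ncolumns` and `filters` could be used. Assumptions: `CSVTableDataset` and the `ncolumns` and `filters` parameters are hypothetical; the class reads the file with Python's `csv` module instead of pandas and does not subclass `AbstractDataset`, so it runs on its own; and the returned dictionary follows the `split` layout of pandas' `DataFrame.to_dict(orient="split")`, which we assume Kedro-Viz renders as a table.

```python
import csv
import tempfile
from pathlib import Path
from typing import List, NewType, Optional, Tuple

# Stand-in for `from kedro_datasets._typing import TablePreview`, so the
# sketch runs without kedro-datasets installed.
TablePreview = NewType("TablePreview", dict)


class CSVTableDataset:
    """Hypothetical custom dataset that reads a CSV file with the csv module.

    A real Kedro dataset would subclass kedro.io.AbstractDataset and implement
    _load, _save and _describe; only reading is shown here.
    """

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _read(self) -> Tuple[List[str], List[dict]]:
        # Return the header names and every row as a dict keyed by them.
        with self._filepath.open(newline="") as f:
            reader = csv.DictReader(f)
            return list(reader.fieldnames or []), list(reader)

    def preview(
        self,
        nrows: int = 5,
        ncolumns: Optional[int] = None,
        filters: Optional[dict] = None,
    ) -> TablePreview:
        columns, rows = self._read()
        # Keep rows matching every filter; csv values are strings, so the
        # filter values should be strings too, e.g. {"iata_approved": "t"}.
        for column, value in (filters or {}).items():
            rows = [row for row in rows if row.get(column) == value]
        if ncolumns is not None:
            columns = columns[:ncolumns]
        rows = rows[:nrows]
        # Assumed to match the "split" layout of DataFrame.to_dict(orient="split");
        # the index here is the 0-based position within the preview.
        return TablePreview(
            {
                "index": list(range(len(rows))),
                "columns": columns,
                "data": [[row[column] for column in columns] for row in rows],
            }
        )


# Usage: preview a small, made-up CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "companies.csv"
    path.write_text("id,company_rating,iata_approved\n1,100%,t\n2,67%,f\n3,80%,t\n")
    preview = CSVTableDataset(str(path)).preview(
        nrows=1, ncolumns=2, filters={"iata_approved": "t"}
    )
```

Keeping the filtering and truncation inside `preview()` means the preview never has to hold more than the requested slice once it is built; a pandas-based dataset could do the same with boolean indexing followed by `iloc[:nrows, :ncolumns]`.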
96 changes: 37 additions & 59 deletions docs/source/preview_datasets.md
@@ -1,84 +1,62 @@
# Preview data in Kedro-Viz
# Preview datasets in Kedro-Viz

This page describes how to preview data from different datasets in a Kedro project with Kedro-Viz. Dataset preview was introduced in Kedro-Viz version 6.3.0, which offers preview for `CSVDatasets` and `ExcelDatasets`.
To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature. This feature was introduced in Kedro-Viz version 6.3.0 and expanded upon in version 8.0.0. Initially, it supported `CSVDatasets` and `ExcelDatasets`, and was later extended to encompass additional dataset types such as `PlotlyDatasets` and image datasets like `MatplotlibWriter`.

Contributor

Suggested change
To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature. This feature was introduced in Kedro-Viz version 6.3.0 and expanded upon in version 8.0.0. Initially, it supported `CSVDatasets` and `ExcelDatasets`, and was later extended to encompass additional dataset types such as `PlotlyDatasets` and image datasets like `MatplotlibWriter`.
To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature. This feature was introduced in Kedro-Viz version 6.3.0 and expanded upon in version 8.0.0. Initially, it supported `CSVDataset` and `ExcelDataset`, and was later extended to encompass additional dataset types such as `PlotlyDataset` and image datasets like `MatplotlibWriter`.

We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `customer`, `shuttle` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.

If you haven't installed Kedro {doc}`follow the documentation to get set up<kedro:get_started/install>`.
Currently, Kedro-Viz supports four types of previews:

```{important}
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
```
1. **TablePreview:** For datasets returning tables/dataframes.
2. **JSONPreview:** For datasets returning JSON objects.
3. **PlotlyPreview:** For datasets returning Plotly JSON objects.
4. **ImagePreview:** For datasets returning base64-encoded image strings.

In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):
Support for more dataset types is planned. If you use custom datasets, you can also extend the preview functionality yourself, as described in the following sections.

```{note}
From Kedro-Viz version 8.0.0, dataset previews are opt-out. In earlier versions, you had to specify `preview_args` in the Data Catalog to enable a preview.

```bash
kedro new --starter=spaceflights-pandas
Previews are enabled by default. To disable them for specific datasets, see the [Disabling Previews section](./preview_datasets.md#disabling-previews).
```

When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout.

When your project is ready, navigate to the root directory of the project.
**Preview Tabular Data**

## Configure the Data Catalog
The page titled [Preview Tabular Data in Kedro-Viz](./preview_pandas_datasets.md) uses the spaceflights tutorial to explain how to enable previews for tabular datasets such as `pandas.CSVDataset` and `pandas.ExcelDataset`.

Kedro-Viz version 6.3.0 currently supports preview of two types of datasets:
**Preview Plotly Charts**

* `pandas.CSVDataset`
* `pandas.ExcelDataset`
The page titled [Preview Plotly charts in Kedro-Viz](./preview_plotly_datasets.md) uses the spaceflights tutorial to explain how to create interactive visualisations in Kedro-Viz with `PlotlyDataset`.

**Preview Matplotlib Charts**

To enable dataset preview, add the `preview_args` attribute to the kedro-viz configuration under the `metadata` section in the Data Catalog. Within preview_args, specify `nrows` as the number of rows to preview for the dataset.
The page titled [Preview Matplotlib charts in Kedro-Viz](./preview_matplotlib_datasets.md) uses the spaceflights tutorial to explain how to create static visualisations in Kedro-Viz with `matplotlib.MatplotlibWriter`.

```yaml
companies:
type: pandas.CSVDataSet
filepath: data/01_raw/companies.csv
metadata:
kedro-viz:
layer: raw
preview_args:
nrows: 5
**Extend Preview to Custom Datasets**

reviews:
type: pandas.CSVDataSet
filepath: data/01_raw/reviews.csv
metadata:
kedro-viz:
layer: raw
preview_args:
nrows: 10
The page titled [Extend Preview to Custom Datasets](./preview_custom_datasets.md) contains information on how to set up previews for custom datasets and which types are supported by Kedro-Viz.

shuttles:
type: pandas.ExcelDataSet
filepath: data/01_raw/shuttles.xlsx
metadata:
kedro-viz:
layer: raw
preview_args:
nrows: 15
```{toctree}
:maxdepth: 1
:hidden:
preview_matplotlib_datasets
preview_plotly_datasets
preview_pandas_datasets
preview_custom_datasets
```



## Previewing Data on Kedro-viz
## Disabling Previews

Contributor Author

rashidakanchwala, Feb 22, 2024

FYI - this entire section has now moved to Preview Tabular Data on Kedro-viz


After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal:

```bash
kedro viz run
```

The previews are shown as follows:

Click on each dataset node to see a small preview in the metadata panel:

To disable previews for specific datasets, set `preview: false` under the `kedro-viz` key in the `metadata` section of each dataset's entry in your `catalog.yml` file. For example:

![](./images/preview_datasets_metadata.png)


View the larger preview of the dataset by clicking the `Expand Preview Table` button on the bottom of the metadata panel.

```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
      preview: false
```
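To spell out the opt-out rule, here is a minimal sketch that applies it to catalog entries parsed into dictionaries (for example with a YAML loader). `is_preview_enabled` is a hypothetical helper that models the documented behaviour (previews on unless `preview: false` is set under `metadata` → `kedro-viz`); it is not Kedro-Viz's own code.

```python
from typing import Any, Dict


def is_preview_enabled(entry: Dict[str, Any]) -> bool:
    """Hypothetical helper applying the opt-out rule to one catalog entry.

    Previews are on by default and off only when the entry sets
    metadata -> kedro-viz -> preview to false.
    """
    viz_config = (entry.get("metadata") or {}).get("kedro-viz") or {}
    return bool(viz_config.get("preview", True))


# Catalog entries as dictionaries, e.g. after parsing catalog.yml.
companies = {
    "type": "pandas.CSVDataset",
    "filepath": "data/01_raw/companies.csv",
    "metadata": {"kedro-viz": {"layer": "raw", "preview": False}},
}
reviews = {"type": "pandas.CSVDataset", "filepath": "data/01_raw/reviews.csv"}

for name, entry in {"companies": companies, "reviews": reviews}.items():
    print(name, "preview enabled:", is_preview_enabled(entry))
```

Treating a missing `metadata` or `kedro-viz` section as "enabled" is what makes the feature opt-out: only an explicit `false` turns a preview off.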

![](./images/preview_datasets_expanded.png)
@@ -1,10 +1,10 @@
# Visualise charts in Kedro-Viz with Matplotlib
# Preview Matplotlib charts in Kedro-Viz

This page describes how to output interactive visualisations of a Kedro project with Kedro-Viz, which supports integration with [Matplotlib](https://matplotlib.org/). You can view Matplotlib charts in Kedro-Viz when you use the MatplotLibWriter dataset.
This page describes how to output static visualisations of a Kedro project with Kedro-Viz, which supports integration with [Matplotlib](https://matplotlib.org/). You can view Matplotlib charts in Kedro-Viz when you use the `matplotlib.MatplotlibWriter` dataset.


```{note}
The MatplotlibWriter dataset converts Matplotlib objects to image files. This means that Matplotlib charts within Kedro-Viz are static and not interactive, unlike the [Plotly charts seen separately](./visualise_charts_with_plotly.md).
The MatplotlibWriter dataset converts Matplotlib objects to image files. This means that Matplotlib charts within Kedro-Viz are static and not interactive, unlike the [Plotly charts seen separately](./preview_plotly_datasets.md).
```

We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` and add a reporting pipeline. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.
74 changes: 74 additions & 0 deletions docs/source/preview_pandas_datasets.md
@@ -0,0 +1,74 @@
# Preview Tabular Data in Kedro-Viz

Contributor Author

This content in this section is not new and it is just moved to a new page. Earlier it was a part of the 'Preview Datasets' page


We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `companies`, `shuttles` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.

If you haven't installed Kedro, {doc}`follow the documentation to get set up<kedro:get_started/install>`.

```{important}
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
```

In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):

Member

Probably capitalise Spaceflights?

Suggested change
In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):
In your terminal window, navigate to the folder you want to store the project. Generate the Spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):

Contributor Author

I noticed it's mostly lower-case elsewhere in the docs so I will leave it as is.



```bash
kedro new --starter=spaceflights-pandas
```

When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout.

When your project is ready, navigate to the root directory of the project.

## Configure the Data Catalog

Kedro-Viz version 8.0.0 supports previews for two types of tabular datasets: `pandas.CSVDataset` and `pandas.ExcelDataset`. Previews are enabled by default and show the first 5 rows, unless you set a different `nrows` value under `preview_args`.

Example configuration in `catalog.yml`:

```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 10

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 15
```

If `preview_args` is not specified, as for `companies` above, the preview shows the first 5 rows.
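The 5-row default appears to come from the `preview()` method's own `nrows=5` default, and the review thread on this page notes that `preview_args` is handed to `preview()` as keyword arguments. The sketch below is a simplified model of that flow, not Kedro-Viz's actual code: `InMemoryTableDataset` and `get_preview` are hypothetical names, and the dataset's `nrows=5` default stands in for the pandas datasets' own default.

```python
from typing import Any, Dict, List, NewType

# Stand-in for kedro_datasets._typing.TablePreview.
TablePreview = NewType("TablePreview", dict)


class InMemoryTableDataset:
    """Hypothetical dataset whose preview() defaults to 5 rows."""

    def __init__(self, values: List[int]):
        self._values = values

    def preview(self, nrows: int = 5) -> TablePreview:
        head = self._values[:nrows]
        return TablePreview(
            {"index": list(range(len(head))), "columns": ["value"], "data": [[v] for v in head]}
        )


def get_preview(dataset: Any, entry: Dict[str, Any]) -> TablePreview:
    """Simplified model: pass metadata -> kedro-viz -> preview_args to preview()."""
    viz_config = (entry.get("metadata") or {}).get("kedro-viz") or {}
    preview_args = viz_config.get("preview_args") or {}
    # With no preview_args, preview() falls back to its own defaults.
    return dataset.preview(**preview_args)


dataset = InMemoryTableDataset(list(range(20)))
companies_entry = {"type": "pandas.CSVDataset", "filepath": "data/01_raw/companies.csv"}
reviews_entry = {
    "type": "pandas.CSVDataset",
    "filepath": "data/01_raw/reviews.csv",
    "metadata": {"kedro-viz": {"layer": "raw", "preview_args": {"nrows": 10}}},
}
default_preview = get_preview(dataset, companies_entry)  # no preview_args
reviews_preview = get_preview(dataset, reviews_entry)  # nrows: 10
```

Because the arguments are forwarded as-is, a custom dataset can accept any keyword arguments in its `preview()` signature, and the matching keys under `preview_args` configure them per dataset.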

Contributor

This was asked in the Slack once, I think we should make it obvious that the preview_args is the argument that get pass into the preview function directly, and user can have arbitary arguments.

def preview(self, arg1, arg2):
  ...

Contributor Author

So I think for pandas datasets .. it is specific to nrows as we wrote the preview() func.

But I have updated the custom dataset docs to include arguments. Thanks for highlighting this @noklam


## Previewing data in Kedro-Viz

After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal:

```bash
kedro viz run
```

Click each dataset node to see a small preview in the metadata panel:


![](./images/preview_datasets_metadata.png)


To see a larger preview, click the `Expand Preview Table` button at the bottom of the metadata panel.


![](./images/preview_datasets_expanded.png)
@@ -1,4 +1,4 @@
# Visualise charts in Kedro-Viz with Plotly
# Preview Plotly charts in Kedro-Viz

This page describes how to make interactive visualisations of a Kedro project with Kedro-Viz, which supports integration with [Plotly](https://plotly.com/python/).
