Documentation for Preview Dataset #1757

Merged
merged 26 commits into from
Feb 26, 2024
Commits
691c7ba
first draft
rashidakanchwala Feb 21, 2024
70fa78f
add to toctree
rashidakanchwala Feb 21, 2024
1b692f8
fix build errors
rashidakanchwala Feb 21, 2024
9e93ca8
revise structure
rashidakanchwala Feb 21, 2024
f57b1b6
test layout
rashidakanchwala Feb 21, 2024
2628396
test layout
rashidakanchwala Feb 21, 2024
93b79ae
typo in title
rashidakanchwala Feb 21, 2024
6e83fc5
doc fixes based on review
rashidakanchwala Feb 22, 2024
fb2d6e7
Merge branch 'main' into docs/preview-datasets
rashidakanchwala Feb 22, 2024
8c80933
typo
rashidakanchwala Feb 22, 2024
2a9c923
Merge branch 'docs/preview-datasets' of https://github.com/kedro-org/…
rashidakanchwala Feb 22, 2024
919ab73
edit disable preview sec
rashidakanchwala Feb 22, 2024
b57b3d7
Merge branch 'main' into docs/preview-datasets
ravi-kumar-pilla Feb 22, 2024
5d4963f
fixes based on review
rashidakanchwala Feb 23, 2024
401434e
fix typo
rashidakanchwala Feb 23, 2024
c8e8f84
fix based on reviews -2
rashidakanchwala Feb 23, 2024
e296352
fixes based on reviews
rashidakanchwala Feb 26, 2024
ff7b87b
add link to typing file
rashidakanchwala Feb 26, 2024
22809be
Update docs/source/preview_pandas_datasets.md
rashidakanchwala Feb 26, 2024
fdd9613
typo and spacing fix
rashidakanchwala Feb 26, 2024
9f67c74
Merge branch 'docs/preview-datasets' of https://github.com/kedro-org/…
rashidakanchwala Feb 26, 2024
1d89fed
non caps
rashidakanchwala Feb 26, 2024
ed42972
based on reviews
rashidakanchwala Feb 26, 2024
cf7a2cb
fix docs lint
rashidakanchwala Feb 26, 2024
ebf6f30
revert changes
rashidakanchwala Feb 26, 2024
074c92b
Merge branch 'main' into docs/preview-datasets
ravi-kumar-pilla Feb 26, 2024
6 changes: 3 additions & 3 deletions docs/source/experiment_tracking.md
@@ -23,7 +23,7 @@ Kedro has always supported parameter versioning (as part of your codebase with a

Kedro-Viz version 4.1.1 introduced metadata capture, visualisation, discovery and comparison, enabling you to access, edit and [compare your experiments](#access-run-data-and-compare-runs) and additionally [track how your metrics change over time](#view-and-compare-metrics-data).

- Kedro-Viz version 5.0 also supports the [display and comparison of plots, such as Plotly and Matplotlib](./visualise_charts_with_plotly.md). Support for metric plots (timeseries and parellel coords) was added to Kedro-Viz version 5.2.1.
+ Kedro-Viz version 5.0 also supports the [display and comparison of plots, such as Plotly and Matplotlib](./preview_plotly_datasets.md). Support for metric plots (timeseries and parallel coords) was added to Kedro-Viz version 5.2.1.

Kedro-Viz version 6.2 includes support for collaborative experiment tracking using a cloud storage solution. This means that multiple users can store their experiment data in a centralized remote storage, such as AWS S3, and access it through Kedro-Viz.

@@ -141,11 +141,11 @@ Set up two datasets to log the columns used in the companies dataset (`companies

```yaml
metrics:
- type: tracking.MetricsDataSet
+ type: tracking.MetricsDataset
  filepath: data/09_tracking/metrics.json

companies_columns:
- type: tracking.JSONDataSet
+ type: tracking.JSONDataset
  filepath: data/09_tracking/companies_columns.json
```

2 changes: 0 additions & 2 deletions docs/source/index.md
@@ -29,7 +29,5 @@ Take a look at the <a href="https://demo.kedro.org/" target="_blank" rel="noopen
kedro-viz_visualisation
share_kedro_viz
preview_datasets
- visualise_charts_with_plotly
- visualise_charts_with_matplotlib
experiment_tracking
```
18 changes: 9 additions & 9 deletions docs/source/kedro-viz_visualisation.md
@@ -80,7 +80,7 @@ Here's an example of how to use the Kedro-Viz metadata to define layers:

```yaml
companies:
- type: pandas.CSVDataSet
+ type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
@@ -91,7 +91,7 @@ In earlier versions of Kedro, layers were specified within a dataset's definitio

```diff
companies:
type: pandas.CSVDataSet
type: pandas.CSVDataset
filepath: data/01_raw/companies.csv
- layer: raw
+ metadata:
@@ -103,49 +103,49 @@ Open `catalog.yml` for the completed spaceflights tutorial and define layers in

```yaml
companies:
- type: pandas.CSVDataSet
+ type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw

reviews:
- type: pandas.CSVDataSet
+ type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      layer: raw

shuttles:
- type: pandas.ExcelDataSet
+ type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw

preprocessed_companies:
- type: pandas.ParquetDataSet
+ type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_companies.pq
  metadata:
    kedro-viz:
      layer: intermediate

preprocessed_shuttles:
- type: pandas.ParquetDataSet
+ type: pandas.ParquetDataset
  filepath: data/02_intermediate/preprocessed_shuttles.pq
  metadata:
    kedro-viz:
      layer: intermediate

model_input_table:
- type: pandas.ParquetDataSet
+ type: pandas.ParquetDataset
  filepath: data/03_primary/model_input_table.pq
  metadata:
    kedro-viz:
      layer: primary

regressor:
- type: pickle.PickleDataSet
+ type: pickle.PickleDataset
  filepath: data/06_models/regressor.pickle
  versioned: true
  metadata:
69 changes: 69 additions & 0 deletions docs/source/preview_custom_datasets.md
@@ -0,0 +1,69 @@
# Extend preview to custom datasets

When creating a custom dataset, if you wish to enable data preview for that dataset, you must implement a `preview()` function within the custom dataset class. Kedro-Viz currently supports previewing tables, Plotly charts, images, and JSON objects.

The return type of the `preview()` function should match one of the following types, as defined in the `kedro-datasets` source code ([_typing.py file](https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/_typing.py)):

```python
TablePreview = NewType("TablePreview", dict)
ImagePreview = NewType("ImagePreview", bytes)
PlotlyPreview = NewType("PlotlyPreview", dict)
JSONPreview = NewType("JSONPreview", dict)
```
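
As a rough illustration of what each type might carry, the sketch below builds one example value per preview type. The `NewType` aliases are redefined locally so that the snippet runs without `kedro-datasets` installed. The function names, the sample values, and the `index`/`columns`/`data` layout for tables (the shape pandas produces with `to_dict(orient="split")`) are assumptions for illustration, not a specification of what Kedro-Viz requires.

```python
import base64
from typing import NewType

# Local stand-ins that mirror the aliases quoted above (assumption: same names and base types).
TablePreview = NewType("TablePreview", dict)
ImagePreview = NewType("ImagePreview", bytes)
PlotlyPreview = NewType("PlotlyPreview", dict)
JSONPreview = NewType("JSONPreview", dict)


def table_preview() -> TablePreview:
    # Assumed table layout: column names plus a list of rows.
    return TablePreview(
        {
            "index": [0, 1],
            "columns": ["id", "company_rating"],
            "data": [[1, "100%"], [2, "67%"]],
        }
    )


def image_preview(raw_image: bytes) -> ImagePreview:
    # Base64-encode the raw image bytes so they can travel inside a JSON response.
    return ImagePreview(base64.b64encode(raw_image))


def plotly_preview() -> PlotlyPreview:
    # A Plotly figure expressed as a plain dictionary with `data` and `layout` keys.
    return PlotlyPreview(
        {
            "data": [{"type": "bar", "x": ["a", "b"], "y": [1, 2]}],
            "layout": {"title": {"text": "Example"}},
        }
    )


def json_preview() -> JSONPreview:
    # Any JSON-serialisable dictionary.
    return JSONPreview({"columns": ["id", "company_rating"]})
```

At runtime a `NewType` call simply returns its argument, so each function above returns an ordinary `dict` or `bytes` object; the aliases only matter to type checkers.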
Review comment on lines +7 to +12 (Contributor):

Can we show an example for each type? This would help users connect each type to what they can expect to see in the UI.

https://clear.ml/docs/latest/docs/guides/reporting/plotly_reporting


Arbitrary arguments can be included in the `preview()` function, which can be later specified in the `catalog.yml` file.

Below is an example demonstrating how to implement the `preview()` function with user-specified arguments for a `CustomDataset` class that utilizes `TablePreview` to enable previewing tabular data on Kedro-Viz:

```yaml
companies:
  type: CustomDataset
  filepath: ${_base_location}/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 5
        ncolumns: 2
        filters: {
          gender: male
        }
```

```python

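from kedro_datasets._typing import TablePreview


class CustomDataset:
    # Illustrative only: assumes `self.data` already holds a pandas DataFrame.
    def preview(self, nrows, ncolumns, filters) -> TablePreview:
        filtered_data = self.data
        # Keep only the rows that match every column/value pair in `filters`.
        for column, value in filters.items():
            filtered_data = filtered_data[filtered_data[column] == value]
        # Limit the preview to the first `nrows` rows and `ncolumns` columns.
        subset = filtered_data.iloc[:nrows, :ncolumns]
        df_dict = {}
        for column in subset.columns:
            # Convert each Series to a plain list so the result is JSON-serialisable.
            df_dict[column] = subset[column].tolist()
        return df_dict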

```
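
To make the flow of `preview_args` concrete, here is a minimal, dependency-free sketch. It assumes that the values under `preview_args` are handed to `preview()` as keyword arguments. `InMemoryTableDataset`, the sample rows, and the returned `index`/`columns`/`data` layout are hypothetical, and a plain list of dictionaries stands in for a DataFrame.

```python
class InMemoryTableDataset:
    """Hypothetical stand-in for a custom dataset; rows are plain dictionaries."""

    def __init__(self, rows):
        self._rows = rows

    def preview(self, nrows=5, ncolumns=None, filters=None):
        filters = filters or {}
        # Keep the rows that match every filter, then cap the row count.
        matching = [
            row
            for row in self._rows
            if all(row.get(column) == value for column, value in filters.items())
        ][:nrows]
        # Take the leading columns; `None` keeps all of them.
        columns = list(self._rows[0])[:ncolumns] if self._rows else []
        return {
            "index": list(range(len(matching))),
            "columns": columns,
            "data": [[row[column] for column in columns] for row in matching],
        }


# Mirrors the `preview_args` block of the catalog entry above.
preview_args = {"nrows": 5, "ncolumns": 2, "filters": {"gender": "male"}}

dataset = InMemoryTableDataset(
    [
        {"name": "Ada", "gender": "female", "age": 36},
        {"name": "Alan", "gender": "male", "age": 41},
        {"name": "Linus", "gender": "male", "age": 28},
    ]
)
result = dataset.preview(**preview_args)
print(result["columns"], result["data"])
```

Giving every argument a default keeps `preview()` callable even when the catalog entry omits `preview_args` altogether.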


## Examples of Previews
Review comment (Member):

Is there a way to show the JSONPreview as well?

Reply from rashidakanchwala (Contributor Author), Feb 26, 2024:

The JSON preview is currently oriented towards experiment tracking, so I was hesitant to show it here in case it causes confusion. In the next couple of sprints we will enable preview for `JSONDataset`, and I can add that example then.


1. TablePreview

![](./images/preview_datasets_expanded.png)


2. ImagePreview

![](./images/pipeline_visualisation_matplotlib_expand.png)


3. PlotlyPreview

![](./images/pipeline_visualisation_plotly_expand_1.png)




95 changes: 35 additions & 60 deletions docs/source/preview_datasets.md
@@ -1,84 +1,59 @@
- # Preview data in Kedro-Viz
+ # Preview datasets in Kedro-Viz

- This page describes how to preview data from different datasets in a Kedro project with Kedro-Viz. Dataset preview was introduced in Kedro-Viz version 6.3.0, which offers preview for `CSVDatasets` and `ExcelDatasets`.
+ To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature.

- We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `customer`, `shuttle` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.
+ Currently, Kedro-Viz supports four types of previews:

- If you haven't installed Kedro {doc}`follow the documentation to get set up<kedro:get_started/install>`.
+ 1. **TablePreview:** For datasets returning tables/dataframes.
+ 2. **JSONPreview:** For datasets returning JSON objects.
+ 3. **PlotlyPreview:** For datasets returning Plotly JSON objects.
+ 4. **ImagePreview:** For datasets returning base64-encoded image strings.

```{important}
- We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
+ While we currently support the above-mentioned preview types, we will soon extend this functionality to other datasets. Users with custom datasets can also extend the preview functionality, which is covered in the section [Extend Preview to Custom Datasets](./preview_custom_datasets.md).

```{note}
+ Starting from Kedro-Viz 8.0.0, previews are enabled by default. To disable the preview for a specific dataset, refer to the [Disable Preview section](./preview_datasets.md#disabling-previews) for instructions.
```

- In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):
+ **Preview tabular data**

+ See [Preview tabular data in Kedro-Viz](./preview_pandas_datasets.md) for a guide on how you can enable preview on tabular datasets such as `pandas.CSVDataset` and `pandas.ExcelDataset`.

```bash
- kedro new --starter=spaceflights-pandas
```
+ **Preview Plotly Charts**

+ See [Preview Plotly charts in Kedro-Viz](./preview_plotly_datasets.md) for a guide on how you can create interactive visualizations using `PlotlyDataset` on Kedro-Viz.

+ **Preview Matplotlib Charts**

- When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout.
+ See [Preview Matplotlib charts in Kedro-Viz](./preview_matplotlib_datasets.md) for a guide on how you can create static visualizations using `MatplotlibWriterDataset` on Kedro-Viz.

- When your project is ready, navigate to the root directory of the project.
+ **Extend Preview to custom datasets**

+ See [Extend Preview to custom datasets](./preview_custom_datasets.md) for a guide on how to set up previews for custom datasets and which types are supported by Kedro-Viz.

```{toctree}
+ :maxdepth: 1
+ :hidden:
+ preview_matplotlib_datasets
+ preview_plotly_datasets
+ preview_pandas_datasets
+ preview_custom_datasets
```

- ## Configure the Data Catalog

- Kedro-Viz version 6.3.0 currently supports preview of two types of datasets:

- * `pandas.CSVDataset`
- * `pandas.ExcelDataset`
+ ## Disabling Previews


- To enable dataset preview, add the `preview_args` attribute to the kedro-viz configuration under the `metadata` section in the Data Catalog. Within preview_args, specify `nrows` as the number of rows to preview for the dataset.
+ To disable dataset previews for specific datasets, you need to set `preview: false` under the `kedro-viz` key within the `metadata` section of your `catalog.yml` file. Here's an example configuration:

```yaml
companies:
- type: pandas.CSVDataSet
+ type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
-     preview_args:
-       nrows: 5

- reviews:
-   type: pandas.CSVDataSet
-   filepath: data/01_raw/reviews.csv
-   metadata:
-     kedro-viz:
-       layer: raw
-       preview_args:
-         nrows: 10

- shuttles:
-   type: pandas.ExcelDataSet
-   filepath: data/01_raw/shuttles.xlsx
-   metadata:
-     kedro-viz:
-       layer: raw
-       preview_args:
-         nrows: 15
```



- ## Previewing Data on Kedro-viz
Comment from rashidakanchwala (Contributor Author), Feb 22, 2024:

FYI - this entire section has now moved to Preview Tabular Data on Kedro-Viz


- After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal:

```bash
- kedro viz run
+     preview: false
```
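
For readers who want to check this setting programmatically, the following is a minimal sketch of how the `preview` flag could be read from a parsed catalog entry. It is not Kedro-Viz's own implementation: `is_preview_enabled` is a hypothetical helper, and the `catalog` dictionary stands in for the result of loading `catalog.yml`. It treats a missing `preview` key as enabled, in line with the note that previews are on by default from Kedro-Viz 8.0.0.

```python
def is_preview_enabled(entry: dict) -> bool:
    """Return False only when the entry sets `preview: false` under `kedro-viz`."""
    viz_metadata = (entry.get("metadata") or {}).get("kedro-viz") or {}
    return viz_metadata.get("preview", True) is not False


# Stand-in for a parsed catalog.yml (assumption: YAML already loaded into dictionaries).
catalog = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/companies.csv",
        "metadata": {"kedro-viz": {"layer": "raw", "preview": False}},
    },
    "reviews": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/reviews.csv",
    },
}

for name, entry in catalog.items():
    print(name, is_preview_enabled(entry))
```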

- The previews are shown as follows:

- Click on each dataset node to see a small preview in the metadata panel:


- ![](./images/preview_datasets_metadata.png)


- View the larger preview of the dataset by clicking the `Expand Preview Table` button on the bottom of the metadata panel.


- ![](./images/preview_datasets_expanded.png)
@@ -1,10 +1,10 @@
- # Visualise charts in Kedro-Viz with Matplotlib
+ # Preview Matplotlib charts in Kedro-Viz

- This page describes how to output interactive visualisations of a Kedro project with Kedro-Viz, which supports integration with [Matplotlib](https://matplotlib.org/). You can view Matplotlib charts in Kedro-Viz when you use the MatplotLibWriter dataset.
+ This page describes how to output static visualisations of a Kedro project with Kedro-Viz, which supports integration with [Matplotlib](https://matplotlib.org/). You can view Matplotlib charts in Kedro-Viz when you use the MatplotlibWriter dataset.


```{note}
- The MatplotlibWriter dataset converts Matplotlib objects to image files. This means that Matplotlib charts within Kedro-Viz are static and not interactive, unlike the [Plotly charts seen separately](./visualise_charts_with_plotly.md).
+ The `MatplotlibWriter` dataset converts Matplotlib objects to image files. This means that Matplotlib charts within Kedro-Viz are static and not interactive, unlike the [Plotly charts seen separately](./preview_plotly_datasets.md).
```

We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` and add a reporting pipeline. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.
74 changes: 74 additions & 0 deletions docs/source/preview_pandas_datasets.md
@@ -0,0 +1,74 @@
# Preview tabular data in Kedro-Viz

We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `companies`, `shuttles` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.

If you haven't installed Kedro {doc}`follow the documentation to get set up<kedro:get_started/install>`.

```{important}
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
```

In your terminal window, navigate to the folder where you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):


```bash
kedro new --starter=spaceflights-pandas
```

When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout.

When your project is ready, navigate to the root directory of the project.

## Configure the Data Catalog

Kedro-Viz version 8.0.0 supports previews for two types of tabular datasets: `pandas.CSVDataset` and `pandas.ExcelDataset`. Previews are enabled by default, showing the first 5 rows unless otherwise specified using `preview_args`.

Example configuration in `catalog.yml`:

```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 10

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 15
```

If no `preview_args` are specified, the default preview will show the first 5 rows.
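
The effect of `nrows` can be sketched with the standard library alone. The snippet below is a simplified stand-in, not the way `pandas.CSVDataset` builds its preview: `preview_csv`, the sample data, and the `index`/`columns`/`data` layout are assumptions for illustration. It reads the header and at most `nrows` data rows, defaulting to 5.

```python
import csv
import io
from itertools import islice


def preview_csv(text: str, nrows: int = 5) -> dict:
    """Return the header and the first `nrows` data rows of CSV text."""
    reader = csv.reader(io.StringIO(text))
    columns = next(reader, [])  # first line is treated as the header
    data = list(islice(reader, nrows))  # stop reading after `nrows` rows
    return {"index": list(range(len(data))), "columns": columns, "data": data}


# Eight data rows, so the default of five rows truncates the preview.
sample = "id,company_rating\n" + "\n".join(f"{i},{i * 10}%" for i in range(1, 9))

print(len(preview_csv(sample)["data"]))
print(len(preview_csv(sample, nrows=3)["data"]))
```

Because the reader stops after `nrows` rows, the cost of a preview does not grow with the size of the file.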


## Previewing data on Kedro-Viz

After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal:

```bash
kedro viz run
```

The previews are shown as follows:

Click on each dataset node to see a small preview in the metadata panel:


![](./images/preview_datasets_metadata.png)


View the larger preview of the dataset by clicking the `Expand Preview Table` button on the bottom of the metadata panel.


![](./images/preview_datasets_expanded.png)