Documentation for Preview Dataset #1757
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Extend preview to custom datasets | ||
|
||
To enable data preview for a custom dataset, implement a `preview()` function in the custom dataset class. Kedro-Viz currently supports previewing tables, Plotly charts, images, and JSON objects.
|
||
The return type of the `preview()` function should match one of the following types, as defined in the `kedro-datasets` source code ([_typing.py file](https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/_typing.py)): | ||
|
||
```python | ||
TablePreview = NewType("TablePreview", dict) | ||
ImagePreview = NewType("ImagePreview", bytes) | ||
PlotlyPreview = NewType("PlotlyPreview", dict) | ||
JSONPreview = NewType("JSONPreview", dict) | ||
``` | ||
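As a minimal sketch of what these `NewType` declarations mean at runtime: a value wrapped in `TablePreview` is still a plain `dict`, and the type name is only visible through the return annotation. The `Demo` class is hypothetical, `TablePreview` is re-declared locally so the snippet runs without `kedro-datasets`, and whether Kedro-Viz relies on exactly this annotation lookup is an assumption here.

```python
import inspect
from typing import NewType

# Re-declared locally for a standalone sketch; in a real project,
# import it from kedro_datasets._typing instead.
TablePreview = NewType("TablePreview", dict)


class Demo:
    """Hypothetical class used only to illustrate the return annotation."""

    def preview(self) -> TablePreview:
        # NewType adds no runtime wrapper: this returns an ordinary dict.
        return TablePreview({"id": [1, 2, 3]})


value = Demo().preview()
is_plain_dict = type(value) is dict

# The preview type can be read back from the function's return annotation.
annotation = inspect.signature(Demo.preview).return_annotation
annotation_name = annotation.__name__
```

Because the distinction lives only in the annotation, annotating `preview()` with the intended preview type matters more than wrapping the returned value.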
|
||
The `preview()` function can accept arbitrary arguments, whose values you then specify under `preview_args` in the `catalog.yml` file.
|
||
Below is an example demonstrating how to implement the `preview()` function with user-specified arguments for a `CustomDataset` class that utilizes `TablePreview` to enable previewing tabular data on Kedro-Viz: | ||
|
||
```yaml
companies:
  type: CustomDataset
  filepath: ${_base_location}/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
      preview_args:
        nrows: 5
        ncolumns: 2
        filters: {
          gender: male
        }
```
|
||
```python
from kedro_datasets._typing import TablePreview


class CustomDataset:
    # Assumes `self.data` holds a pandas DataFrame set elsewhere in the class.
    def preview(self, nrows, ncolumns, filters) -> TablePreview:
        filtered_data = self.data
        # Keep only the rows that match every column/value pair in `filters`.
        for column, value in filters.items():
            filtered_data = filtered_data[filtered_data[column] == value]
        # Limit the preview to the first `nrows` rows and `ncolumns` columns.
        subset = filtered_data.iloc[:nrows, :ncolumns]
        df_dict = {}
        for column in subset.columns:
            df_dict[column] = subset[column]
        return df_dict
```
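For readers who want to try the pattern without pandas or a Kedro project, here is a standalone sketch of the same idea. It assumes the rows are held in memory as a list of dictionaries, that the catalog's `preview_args` reach `preview()` as keyword arguments, and it mirrors the column-to-values dictionary returned in the example above. `InMemoryTableDataset` and its sample data are hypothetical, and `TablePreview` is re-declared locally so the snippet has no dependencies.

```python
from typing import Any, Dict, List, NewType, Optional

# Re-declared locally for a standalone sketch; in a real project,
# import it from kedro_datasets._typing instead.
TablePreview = NewType("TablePreview", dict)


class InMemoryTableDataset:
    """Hypothetical dataset that keeps its rows as a list of dicts."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self._rows = rows

    def preview(
        self,
        nrows: int = 5,
        ncolumns: Optional[int] = None,
        filters: Optional[Dict[str, Any]] = None,
    ) -> TablePreview:
        filters = filters or {}
        # Keep only rows that match every column/value pair in `filters`.
        matching = [
            row
            for row in self._rows
            if all(row.get(col) == val for col, val in filters.items())
        ]
        subset = matching[:nrows]
        # Column order follows the first row; slicing with None keeps all columns.
        columns = list(self._rows[0].keys())[:ncolumns] if self._rows else []
        return TablePreview(
            {col: [row.get(col) for row in subset] for col in columns}
        )


dataset = InMemoryTableDataset(
    [
        {"name": "Ann", "gender": "female", "age": 31},
        {"name": "Bob", "gender": "male", "age": 45},
        {"name": "Cal", "gender": "male", "age": 28},
    ]
)

# Same keys as the `preview_args` block in the catalog entry above.
preview_args = {"nrows": 5, "ncolumns": 2, "filters": {"gender": "male"}}
result = dataset.preview(**preview_args)
```

Giving every argument a default keeps the function callable even when a catalog entry omits some of the `preview_args` keys.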
|
||
|
||
## Examples of Previews | ||
Is there way to show the

So, the JSON preview is actually experiment-tracking oriented, hence I was hesitant to share it as it might create some confusion. In the next couple of sprints, we will enable preview for a JSONDataset, and I could add that example then.
||
|
||
1. TablePreview | ||
|
||
![](./images/preview_datasets_expanded.png) | ||
|
||
|
||
2. ImagePreview | ||
|
||
![](./images/pipeline_visualisation_matplotlib_expand.png) | ||
|
||
|
||
3. PlotlyPreview | ||
|
||
![](./images/pipeline_visualisation_plotly_expand_1.png) | ||
|
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,84 +1,59 @@ | ||
# Preview data in Kedro-Viz | ||
# Preview datasets in Kedro-Viz | ||
|
||
This page describes how to preview data from different datasets in a Kedro project with Kedro-Viz. Dataset preview was introduced in Kedro-Viz version 6.3.0, which offers preview for `CSVDatasets` and `ExcelDatasets`. | ||
To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature. | ||
|
||
We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `companies`, `shuttles` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.
Currently, Kedro-Viz supports four types of previews: | ||
|
||
If you haven't installed Kedro {doc}`follow the documentation to get set up<kedro:get_started/install>`. | ||
1. **TablePreview:** For datasets returning tables/dataframes. | ||
2. **JSONPreview:** For datasets returning JSON objects. | ||
3. **PlotlyPreview:** For datasets returning Plotly JSON objects. | ||
4. **ImagePreview:** For datasets returning base64-encoded image strings. | ||
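
The ImagePreview entry refers to base64-encoded image strings. As a minimal sketch using only Python's standard library, the snippet below shows how raw image bytes become such a string and back again. The byte string is a placeholder (just the PNG file signature, not a valid image), and how a particular dataset obtains its image bytes is outside this sketch.

```python
import base64

# Placeholder content: only the 8-byte PNG signature, not a real image.
raw_image_bytes = b"\x89PNG\r\n\x1a\n"

# b64encode returns ASCII-safe bytes; decode them to get a plain string.
encoded_bytes = base64.b64encode(raw_image_bytes)
encoded_string = encoded_bytes.decode("ascii")

# Decoding the string recovers the original bytes unchanged.
decoded = base64.b64decode(encoded_string)
```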
|
||
```{important} | ||
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window. | ||
We currently support the preview types listed above and plan to extend this functionality to other datasets soon. Users with custom datasets can also extend the preview functionality, as covered in [Extend Preview to Custom Datasets](./preview_custom_datasets.md).
|
||
```{note} | ||
Starting from Kedro-Viz 8.0.0, previews are enabled by default. To disable previews for a specific dataset, refer to the [Disable Preview section](./preview_datasets.md#disabling-previews) for instructions.
``` | ||
|
||
In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): | ||
**Preview tabular data** | ||
|
||
See [Preview tabular data in Kedro-Viz](./preview_pandas_datasets.md) for a guide on how you can enable preview on tabular datasets such as `pandas.CSVDataset` and `pandas.ExcelDataset`. | ||
|
||
```bash | ||
kedro new --starter=spaceflights-pandas | ||
``` | ||
**Preview Plotly Charts** | ||
|
||
See [Preview Plotly charts in Kedro-Viz](./preview_plotly_datasets.md) for a guide on how you can create interactive visualizations using `PlotlyDataset` on Kedro-Viz. | ||
|
||
**Preview Matplotlib Charts** | ||
|
||
When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout. | ||
See [Preview Matplotlib charts in Kedro-Viz](./preview_matplotlib_datasets.md) for a guide on how you can create static visualizations using `MatplotlibWriterDataset` on Kedro-Viz. | ||
|
||
When your project is ready, navigate to the root directory of the project. | ||
**Extend Preview to custom datasets**
|
||
See [Extend Preview to custom datasets](./preview_custom_datasets.md) for a guide on how to set up previews for custom datasets and which types are supported by Kedro-Viz.
|
||
```{toctree} | ||
:maxdepth: 1 | ||
:hidden: | ||
preview_matplotlib_datasets | ||
preview_plotly_datasets | ||
preview_pandas_datasets | ||
preview_custom_datasets | ||
``` | ||
|
||
## Configure the Data Catalog | ||
|
||
Kedro-Viz version 6.3.0 currently supports preview of two types of datasets: | ||
|
||
* `pandas.CSVDataset` | ||
* `pandas.ExcelDataset` | ||
## Disabling Previews | ||
|
||
|
||
To enable dataset preview, add the `preview_args` attribute to the kedro-viz configuration under the `metadata` section in the Data Catalog. Within preview_args, specify `nrows` as the number of rows to preview for the dataset. | ||
To disable dataset previews for specific datasets, you need to set `preview: false` under the `kedro-viz` key within the `metadata` section of your `catalog.yml` file. Here's an example configuration: | ||
|
||
```yaml | ||
companies: | ||
type: pandas.CSVDataSet | ||
type: pandas.CSVDataset | ||
filepath: data/01_raw/companies.csv | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview_args: | ||
nrows: 5 | ||
|
||
reviews: | ||
type: pandas.CSVDataSet | ||
filepath: data/01_raw/reviews.csv | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview_args: | ||
nrows: 10 | ||
|
||
shuttles: | ||
type: pandas.ExcelDataSet | ||
filepath: data/01_raw/shuttles.xlsx | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview_args: | ||
nrows: 15 | ||
``` | ||
|
||
|
||
|
||
## Previewing Data on Kedro-viz | ||
FYI - this entire section has now moved to
||
|
||
After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal: | ||
|
||
```bash | ||
kedro viz run | ||
preview: false | ||
``` | ||
|
||
The previews are shown as follows: | ||
|
||
Click on each dataset node to see a small preview in the metadata panel: | ||
|
||
|
||
![](./images/preview_datasets_metadata.png) | ||
|
||
|
||
View the larger preview of the dataset by clicking the `Expand Preview Table` button on the bottom of the metadata panel. | ||
|
||
|
||
![](./images/preview_datasets_expanded.png) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
# Preview tabular data in Kedro-Viz | ||
|
||
We use the {doc}`spaceflights tutorial<kedro:tutorial/spaceflights_tutorial>` to demonstrate how to add data preview for the `companies`, `shuttles` and `reviews` datasets. Even if you have not yet worked through the tutorial, you can still follow this example; you'll need to use the Kedro starter for the spaceflights tutorial to generate a copy of the project with working code in place.
|
||
If you haven't installed Kedro {doc}`follow the documentation to get set up<kedro:get_started/install>`. | ||
|
||
```{important} | ||
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window. | ||
``` | ||
|
||
In your terminal window, navigate to the folder where you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas): | ||
|
||
|
||
```bash | ||
kedro new --starter=spaceflights-pandas | ||
``` | ||
|
||
When prompted for a project name, you can enter anything, but we will assume `Spaceflights` throughout. | ||
|
||
|
||
When your project is ready, navigate to the root directory of the project. | ||
|
||
## Configure the Data Catalog | ||
|
||
Kedro-Viz version 8.0.0 supports previews for two types of tabular datasets: `pandas.CSVDataset` and `pandas.ExcelDataset`. Previews are enabled by default, showing the first 5 rows unless otherwise specified using `preview_args`. | ||
|
||
Example configuration in `catalog.yml`: | ||
|
||
```yaml | ||
companies: | ||
type: pandas.CSVDataset | ||
filepath: data/01_raw/companies.csv | ||
|
||
reviews: | ||
type: pandas.CSVDataset | ||
filepath: data/01_raw/reviews.csv | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview_args: | ||
nrows: 10 | ||
|
||
shuttles: | ||
type: pandas.ExcelDataset | ||
filepath: data/01_raw/shuttles.xlsx | ||
metadata: | ||
kedro-viz: | ||
layer: raw | ||
preview_args: | ||
nrows: 15 | ||
``` | ||
|
||
If no `preview_args` are specified, the default preview will show the first 5 rows. | ||
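
The defaulting rules described in these pages (previews on unless `preview: false` is set, and 5 rows unless `preview_args` says otherwise) can be sketched as a small lookup over a catalog entry. This is an illustration only: `resolve_preview_args` is a hypothetical helper, not a Kedro-Viz function, the catalog is written as a Python dictionary rather than loaded from `catalog.yml`, and the `hidden` entry is invented for the example.

```python
from typing import Any, Dict, Optional

DEFAULT_NROWS = 5  # default number of preview rows stated in the text


def resolve_preview_args(entry: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the preview arguments for a catalog entry, or None if disabled."""
    viz_config = entry.get("metadata", {}).get("kedro-viz", {})
    # Previews are on by default; only an explicit `preview: false` disables them.
    if viz_config.get("preview", True) is False:
        return None
    # Start from the default and let any `preview_args` override it.
    args: Dict[str, Any] = {"nrows": DEFAULT_NROWS}
    args.update(viz_config.get("preview_args", {}))
    return args


catalog = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/companies.csv",
    },
    "reviews": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/reviews.csv",
        "metadata": {"kedro-viz": {"layer": "raw", "preview_args": {"nrows": 10}}},
    },
    # Hypothetical entry showing a dataset with previews switched off.
    "hidden": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/hidden.csv",
        "metadata": {"kedro-viz": {"preview": False}},
    },
}

resolved = {name: resolve_preview_args(entry) for name, entry in catalog.items()}
```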
|
||
|
||
## Previewing data on Kedro-Viz | ||
|
||
After you've configured the Data Catalog, you can preview the datasets on Kedro-Viz. Start Kedro-Viz by running the following command in your terminal: | ||
|
||
```bash | ||
kedro viz run | ||
``` | ||
|
||
The previews are shown as follows: | ||
|
||
Click on each dataset node to see a small preview in the metadata panel: | ||
|
||
|
||
![](./images/preview_datasets_metadata.png) | ||
|
||
|
||
View the larger preview of the dataset by clicking the `Expand Preview Table` button on the bottom of the metadata panel. | ||
|
||
|
||
![](./images/preview_datasets_expanded.png) |
Can we show an example for each type? This helps users link what they are expecting to see in the UI.
https://clear.ml/docs/latest/docs/guides/reporting/plotly_reporting