Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor datasets #1698

Merged
merged 36 commits into from
Feb 21, 2024
Merged

Refactor datasets #1698

merged 36 commits into from
Feb 21, 2024

Conversation

rashidakanchwala
Copy link
Contributor

@rashidakanchwala rashidakanchwala commented Jan 5, 2024

Description

Related to: #1700

Kedro-Viz and Kedro-Datasets are tightly coupled , leading to issues where modifications in Kedro-Datasets can disrupt the functionality of Kedro-Viz. This ticket aims to refactor the codebase to enhance modularity and stability. The goal is to decouple the two components, allowing for smoother integration in the future without excessive interdependencies.

Development notes

We've moved the logic for generating previews for several datasets from kedro-viz to kedro-datasets, reducing the coupling between kedro-viz and kedro-datasets.

Previews are rendered differently in the front-end, so we've introduced aliasing using NewType. Currently, we support four types of previews in the front-end: json, image [png], plotly, and dataframe [as tables]. Users can enable previews for custom datasets as long as they fall into one of these categories.

Previously, we utilized the load function and overrode it in kedro-viz to load dataset previews. However, this logic has been migrated to kedro-datasets within the preview() function. In the kedro-viz API, we no longer include plot, image, or tracking_data as fields of DataNodeMetadata. Instead, we now only send preview and preview_type. Preview can be a preview of any dataset that has the preview function, and preview_type informs the front-end on how to render the preview.

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@SajidAlamQB SajidAlamQB marked this pull request as ready for review February 8, 2024 14:15
@merelcht
Copy link
Member

merelcht commented Feb 8, 2024

Can the PR description be updated to add some context? It will make it easier to review.

Signed-off-by: ravi-kumar-pilla <[email protected]>
@ravi-kumar-pilla
Copy link
Contributor

ravi-kumar-pilla commented Feb 9, 2024

Hi @SajidAlamQB, The reason tests are failing for python3.8 is kedro-datasets >= 2 require python >= 3.9 . I am not sure if we want to drop viz support for python3.8 or do a conditional check for the preview feature that is added to kedro-datasets. If we support python3.8, we need to conditionally add the dataset type checks again.

@merelcht @noklam what do you suggest here ?

@SajidAlamQB SajidAlamQB requested a review from noklam February 12, 2024 15:06
@noklam
Copy link
Contributor

noklam commented Feb 12, 2024

@ravi-kumar-pilla I think it's fine to drop Python 3.8 support, with the caveat that is it easy for user find out which version to install?

If an user is using Python 3.8, which combination of kedro/kedro-datasets/kedro-viz would work for them?

It's a result of kedro-plugins (at least kedro-datasets) are moving faster and we are following https://endoflife.date/python.

@ravi-kumar-pilla
Copy link
Contributor

@ravi-kumar-pilla I think it's fine to drop Python 3.8 support, with the caveat that is it easy for user find out which version to install?

If an user is using Python 3.8, which combination of kedro/kedro-datasets/kedro-viz would work for them?

We need to add/update the compatibility matrix in the ReadME which includes kedro-viz/kedro/kedro-datasets.

It's a result of kedro-plugins (at least kedro-datasets) are moving faster and we are following https://endoflife.date/python.

Got it !

Comment on lines 594 to 599
def is_preview_disabled(self):
"""Checks if the dataset has a preview disabled"""
if self.viz_metadata:
preview_value = self.viz_metadata.get("preview")
return preview_value is not None and not preview_value
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def is_preview_disabled(self):
"""Checks if the dataset has a preview disabled"""
if self.viz_metadata:
preview_value = self.viz_metadata.get("preview")
return preview_value is not None and not preview_value
return False
def is_preview_disabled(self):
"""Checks if the dataset has a preview disabled"""
if self.viz_metadata:
preview_value = self.viz_metadata.get("preview")
return preview_value is not None and not preview_value
return False

I found this complicated, can we assume it's True unless specified as False?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this:

Suggested change
def is_preview_disabled(self):
"""Checks if the dataset has a preview disabled"""
if self.viz_metadata:
preview_value = self.viz_metadata.get("preview")
return preview_value is not None and not preview_value
return False
def is_preview_disabled(self):
"""Checks if the dataset has a preview disabled"""
return self.viz_metadata and self.viz_metadata.get("preview") is False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, the preview feature is always enabled unless the user specifies viz_metadata: preview: false ? @rashidakanchwala

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that's true. by default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.viz_metadata is None, the function will return None. Can we have a return value of boolean in all cases ? Can we do something like - return viz_metadata is not None and viz_metadata.get("preview") is False . Thank you

Copy link
Contributor Author

@rashidakanchwala rashidakanchwala Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok have more clarity on what you and @merelcht meant. It returns None
so i have changed it to return bool as suggested-- now it will always return False except when preview: False

Copy link
Contributor

@jitu5 jitu5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Front end part looks good me!

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one more question and wanted to check again if the docs will also be updated before release?

Otherwise, this looks all good from a code point of view 👍

package/kedro_viz/models/flowchart.py Outdated Show resolved Hide resolved
package/kedro_viz/models/flowchart.py Show resolved Hide resolved
@rashidakanchwala
Copy link
Contributor Author

I left one more question and wanted to check again if the docs will also be updated before release?

Otherwise, this looks all good from a code point of view 👍

Yes, before the release the docs will be updated. I will create a PR tomm for the docs.

@ravi-kumar-pilla
Copy link
Contributor

LGTM. Awesome work !! @rashidakanchwala

We need to make sure to revert the kedro-datasets source in requirements.in, docker_requirements.txt, test_requirements.txt files after kedro-datasets release. Thank you !

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying my earlier question @rashidakanchwala. This looks good to be merged now ⭐ 👍

@rashidakanchwala
Copy link
Contributor Author

Thanks team. I am pushing this, I will revert the dependencies once kedro-datasets is released. I will start on the docs now.

@rashidakanchwala rashidakanchwala merged commit 5ca1550 into main Feb 21, 2024
14 checks passed
@rashidakanchwala rashidakanchwala deleted the refactor-datasets branch February 21, 2024 12:24
@ravi-kumar-pilla ravi-kumar-pilla mentioned this pull request Mar 1, 2024
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants