Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Experiment] Show/Hide Memory Datasets on the flowchart #1707

Closed

Conversation

rashidakanchwala
Copy link
Contributor

Description

Related to #1706. should we provide users with the option to display or hide Memory Datasets in the view? This feature can be particularly helpful for larger and more complex pipelines. This PR offers an experimental toggle to show/hide Memory Datasets, functioning similarly to the show/hide dataset.

This is currently not under experiment flag.

Development notes

MemoryDatasetsHidden

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

@rashidakanchwala rashidakanchwala marked this pull request as ready for review January 16, 2024 21:07
@rashidakanchwala rashidakanchwala requested review from astrojuanlu, noklam, datajoely, merelcht and NeroOkwa and removed request for tynandebold January 16, 2024 21:07
@rashidakanchwala rashidakanchwala changed the base branch from main to memorydatasetdiff January 16, 2024 21:08
@inigohidalgo
Copy link

What happens to nodes which only have memory datasets as inputs and outputs? just a line connecting them to the last dataset which isn't memory?

@ravi-kumar-pilla
Copy link
Contributor

This is a great feature to show/hide dataset types reducing the size of the flowchart. I have concerns over the placement of Memory Datasets under Element Types. It would be nice to have a hierarchy where these fall under Datasets.

@merelcht
Copy link
Member

Would this just be for MemoryDataset or any non persistent datasets?

@datajoely
Copy link
Contributor

For the purposes of this experiment I think MemoryDataSet gets us 80% of the way. There has already been questions in this PR whether we can visually differentiate datasets of different types.

@DebanjanBanerjeeQB
Copy link

DebanjanBanerjeeQB commented Jan 17, 2024

Unpopular opinion and this is already fantastic work but 2 thoughts. I dont see why this feature will be super impactful. Would love to hear your thoughts.

  • i would treat memorydatasets as any other dataset in the logical flow , why it should have any different treatment / handling ? Also talking about [Experiment] Distinctive MemoryDataset view on flowchart #1706 here.
  • Even if this feature existed , i wouldn't hide it ever , i wouldn't hide any dataset ever because that risks an incorrect data lineage or risk of discounting a dataset while demoing or explaining the pipelines to anyone.

If the goal is to signal users that these datasets use cache or inmemory storage and not persisted storage , i would create a tooltip or a separate action to explain datasets (all datasets) . Something that on hovering gives basic information about the datasets like

  • Dataset Type
  • Some stats (Optional)
  • Path

WDYT ?

@astrojuanlu
Copy link
Member

astrojuanlu commented Jan 17, 2024

If we are just discussing the idea, I like it (although I like #1706 more).


Going a bit beyond, I have the same question as @merelcht:

Would this just be for MemoryDataset or any non persistent datasets?

Currently this PR introduces some coupling between the frontend and the dataset names, which as far as I understand goes against the idea of #1698 (although this PR is more concerned with the backend). Moreover, it's not even complete, because what if the user defines another non persistent dataset?

Wondering if this use case could be solved by letting the user configure the appearance of certain types of nodes in general, some sort of "style mapping" or even custom CSS rules.

@rashidakanchwala
Copy link
Contributor Author

rashidakanchwala commented Jan 17, 2024

What happens to nodes which only have memory datasets as inputs and outputs? just a line connecting them to the last dataset which isn't memory?

Hey, there would be line connecting the two task/function nodes. Below is an example of Memory Dataset shown and then hidden (Split Data --> Train Evaluation)

Screenshot 2024-01-17 at 21 56 03 Screenshot 2024-01-17 at 21 57 11

On second thoughts, I too find this show/hide a bit misleading, I would rather do #1706 then hide them completely.

@datajoely
Copy link
Contributor

I think it might be nice to have more powerful, persistent exclusion functionality - but I agree that #1706 is better.

Again pointing to dbt, they have an ability for the user to provide custom colors - a long time ago I pitched #480 which was turned into #1148 but hasn't been prioritised.

@DebanjanBanerjeeQB
Copy link

One more solution could be

  • Identify all the different types of datasets in the pipeline
  • put them as a list and checkboxes
  • whatever user needs can select / deselect from the view to highlight.

That covers the memory dataset but also makes the approach more open to other dataset types.

@DebanjanBanerjeeQB
Copy link

Screenshot 2024-01-18 at 10 30 28

Something like this ?

@rashidakanchwala
Copy link
Contributor Author

Closing this experiment PR for now -- allowing users to differentiating datasets on kedro-viz seems like a good idea. We are going to try and do #1148 first and see how that picks up with users.

@rashidakanchwala rashidakanchwala deleted the feat/hide-show-memory-dataset branch May 30, 2024 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants