[Experiment] Show/Hide Memory Datasets on the flowchart #1707

rashidakanchwala · 2024-01-16T21:07:10Z

Description

Related to #1706. Should we give users the option to show or hide Memory Datasets in the view? This can be particularly helpful for larger and more complex pipelines. This PR offers an experimental toggle to show/hide Memory Datasets, which works in the same way as the existing show/hide toggle for datasets.

This is currently not behind an experiment flag.

Development notes

QA notes

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added new entries to the RELEASE.md file
Added tests to cover my changes

inigohidalgo · 2024-01-17T10:53:42Z

What happens to nodes which only have memory datasets as inputs and outputs? Is there just a line connecting them to the last dataset which isn't memory?

ravi-kumar-pilla · 2024-01-17T12:01:14Z

This is a great feature for showing or hiding dataset types, which reduces the size of the flowchart. I have concerns about the placement of Memory Datasets under Element Types. It would be nice to have a hierarchy where these fall under Datasets.

merelcht · 2024-01-17T14:26:25Z

Would this just be for MemoryDataset or for any non-persistent datasets?

datajoely · 2024-01-17T14:57:15Z

For the purposes of this experiment I think MemoryDataSet gets us 80% of the way. There have already been questions in this PR about whether we can visually differentiate datasets of different types.

DebanjanBanerjeeQB · 2024-01-17T16:21:22Z

Unpopular opinion, and this is already fantastic work, but I have 2 thoughts. I don't see why this feature would be super impactful. Would love to hear your thoughts.

I would treat Memory Datasets like any other dataset in the logical flow. Why should they get any different treatment or handling? This also applies to [Experiment] Distinctive MemoryDataset view on flowchart #1706.
Even if this feature existed, I wouldn't ever hide them. I wouldn't hide any dataset, because that risks showing an incorrect data lineage, or overlooking a dataset while demoing or explaining the pipelines to anyone.

If the goal is to signal to users that these datasets use cache or in-memory storage rather than persisted storage, I would create a tooltip or a separate action that explains datasets (all datasets): something that, on hover, gives basic information about the dataset, like

Dataset Type
Some stats (Optional)
Path

WDYT?

astrojuanlu · 2024-01-17T18:42:58Z

If we are just discussing the idea, I like it (although I like #1706 more).

Going a bit beyond, I have the same question as @merelcht:

Would this just be for MemoryDataset or for any non-persistent datasets?

Currently this PR introduces some coupling between the frontend and the dataset names, which as far as I understand goes against the idea of #1698 (although that one is more concerned with the backend). Moreover, it's not even complete: what if the user defines another non-persistent dataset?

Wondering if this use case could be solved by letting the user configure the appearance of certain types of nodes in general, some sort of "style mapping" or even custom CSS rules.
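To make the "style mapping" idea a bit more concrete, here is a minimal sketch. It is not Kedro-Viz code: the function `style_for`, the rule format, the style keys and the dataset type strings are all assumptions made up for illustration. The only point is that appearance is looked up from user-supplied patterns rather than from a dataset name hardcoded in the frontend.

```python
from fnmatch import fnmatchcase

# Hypothetical default appearance for a dataset node.
DEFAULT_STYLE = {"fill": "#ffffff", "border": "solid"}


def style_for(dataset_type, style_rules):
    """Return the style of the first rule whose pattern matches the type.

    `style_rules` is an ordered list of (glob pattern, style overrides)
    pairs supplied by the user; unmatched types keep the default style.
    """
    for pattern, overrides in style_rules:
        if fnmatchcase(dataset_type, pattern):
            return {**DEFAULT_STYLE, **overrides}
    return dict(DEFAULT_STYLE)


# Illustrative rules: any type ending in ".MemoryDataset" gets a dashed
# border, any pandas dataset gets a tinted fill.
rules = [
    ("*.MemoryDataset", {"border": "dashed"}),
    ("pandas.*", {"fill": "#e0f0ff"}),
]

print(style_for("io.MemoryDataset", rules))
print(style_for("pandas.CSVDataset", rules))
```

A user-defined non-persistent dataset would then only need one extra rule, with no change to the frontend.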

rashidakanchwala · 2024-01-17T21:57:32Z

What happens to nodes which only have memory datasets as inputs and outputs? Is there just a line connecting them to the last dataset which isn't memory?

Hey, there would be a line connecting the two task/function nodes. Below is an example of a Memory Dataset shown and then hidden (Split Data --> Train Evaluation)

On second thoughts, I too find this show/hide a bit misleading. I would rather do #1706 than hide them completely.
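For reference, the reconnection described above (two task nodes joined directly once the Memory Dataset between them is hidden) can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the function `hide_nodes`, the edge-list representation and the node names are assumptions.

```python
def hide_nodes(edges, hidden):
    """Drop `hidden` nodes from a directed graph and bridge their neighbours.

    `edges` is a list of (source, target) pairs. Every visible node is
    linked to the visible nodes it reaches through hidden nodes only.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    def visible_targets(node, seen):
        # Follow edges out of `node`, walking through hidden nodes until
        # a visible node is reached; `seen` guards against cycles.
        found = set()
        for nxt in successors.get(node, ()):
            if nxt not in hidden:
                found.add(nxt)
            elif nxt not in seen:
                seen.add(nxt)
                found |= visible_targets(nxt, seen)
        return found

    bridged = set()
    for source in successors:
        if source not in hidden:
            for target in visible_targets(source, set()):
                bridged.add((source, target))
    return sorted(bridged)


# Hypothetical pipeline fragment: X_train stands in for a Memory Dataset.
edges = [
    ("split_data", "X_train"),
    ("X_train", "train_model"),
    ("train_model", "regressor"),
]
print(hide_nodes(edges, hidden={"X_train"}))
```

With `X_train` hidden, `split_data` is linked straight to `train_model`. This is also why the view can be misleading: the edge no longer says that a dataset sits between the two nodes.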

datajoely · 2024-01-18T10:11:15Z

I think it might be nice to have more powerful, persistent exclusion functionality - but I agree that #1706 is better.

Again pointing to dbt, they have an ability for the user to provide custom colors - a long time ago I pitched #480 which was turned into #1148 but hasn't been prioritised.

DebanjanBanerjeeQB · 2024-01-18T10:25:45Z

One more solution could be:

Identify all the different types of datasets in the pipeline
List them with a checkbox for each type
Let the user select or deselect whichever types they want to highlight in the view

That covers the memory dataset but also makes the approach more open to other dataset types.
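The steps above could be sketched like this. It is only an illustration under assumptions: the node dictionaries, their `type` and `dataset_type` keys, and the type strings are hypothetical and not the Kedro-Viz data model.

```python
def dataset_type_options(nodes):
    """Collect the distinct dataset types, one per checkbox."""
    return sorted({n["dataset_type"] for n in nodes if n["type"] == "data"})


def visible_nodes(nodes, selected_types):
    """Keep all non-data nodes, plus data nodes whose type is ticked."""
    return [
        n
        for n in nodes
        if n["type"] != "data" or n["dataset_type"] in selected_types
    ]


# Hypothetical flowchart nodes.
nodes = [
    {"id": "companies", "type": "data", "dataset_type": "pandas.CSVDataset"},
    {"id": "split_data", "type": "task"},
    {"id": "X_train", "type": "data", "dataset_type": "io.MemoryDataset"},
    {"id": "train_model", "type": "task"},
]

options = dataset_type_options(nodes)
print(options)

# The user unticks the in-memory type and keeps everything else.
selected = {t for t in options if t != "io.MemoryDataset"}
print([n["id"] for n in visible_nodes(nodes, selected)])
```

Because the checkbox list is derived from the types actually present in the pipeline, nothing here is specific to Memory Datasets.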

DebanjanBanerjeeQB · 2024-01-18T10:30:39Z

Something like this?

rashidakanchwala · 2024-01-19T10:58:54Z

Closing this experiment PR for now. Allowing users to differentiate datasets in kedro-viz seems like a good idea. We are going to try to do #1148 first and see how that picks up with users.

done

63e90c7

rashidakanchwala marked this pull request as ready for review January 16, 2024 21:07

rashidakanchwala requested a review from tynandebold as a code owner January 16, 2024 21:07

rashidakanchwala requested review from astrojuanlu, noklam, datajoely, merelcht and NeroOkwa and removed request for tynandebold January 16, 2024 21:07

rashidakanchwala changed the base branch from main to memorydatasetdiff January 16, 2024 21:08

NeroOkwa mentioned this pull request Jan 17, 2024

[Experiment] Memory Datasets on the flowchart #1709

Closed

2 tasks

astrojuanlu mentioned this pull request Jan 17, 2024

[Experiment] Distinctive MemoryDataset view on flowchart #1706

Closed

5 tasks

rashidakanchwala closed this Jan 19, 2024

rashidakanchwala deleted the feat/hide-show-memory-dataset branch May 30, 2024 10:44
