
[KED-1951] Backend to send modular pipelines to kedro viz #394

Merged — 19 commits, Mar 22, 2021

Conversation

@merelcht (Member) commented Mar 12, 2021

Description

As part of the work on showing modular pipelines in viz.

Development notes

Added modular pipelines as a top-level key as well as to individual nodes (both task nodes and parameter/dataset nodes).

Merging this into the feature branch feature/modular-pipelines-complete, where the FE changes will also be merged. Then we'll do a full check to make sure it all works before creating a final PR to main.

QA notes

Updated the test setup and animals.mock.json to contain modular pipeline data.

FYI: JS tests will fail now that I've updated the test data. @studioswong will address these tests once this PR has been merged to the main feature branch feature/modular-pipelines-complete.

Todo:

  • Handle modular pipelines for parameter nodes. Will be done in a separate PR.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes

Legal notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@idanov idanov changed the title Backend to send modular pipelines to kedro viz [KED-1951] Backend to send modular pipelines to kedro viz Mar 12, 2021
@merelcht merelcht requested a review from idanov March 12, 2021 10:19
@idanov (Member) left a comment


Currently the code combines all modular pipelines from all registered pipelines. However for the proper functioning of the frontend, we need the modular pipelines only from the currently selected registered pipeline.

@merelcht merelcht requested a review from idanov March 15, 2021 15:17
@limdauto (Collaborator) left a comment


A few comments. I think we are nearly there. Could we add a unit test for _expand_namespace to document its behaviour, please? It would also be helpful to extract the logic that prettifies a modular pipeline name into a function, so it can be unit tested as well.

@idanov (Member) left a comment


Glad to see we're getting there and almost ready! There are two main things to address here:

  • In the current implementation, the membership of a dataset is determined by the nodes using it, which is incorrect. Modular pipeline membership should be determined entirely from the name of the dataset; otherwise it will produce incorrect results.
  • Currently the modular pipelines for the whole pipeline do not include all expanded pipelines. I'm not sure whether that was added to the requirements in the ticket, and we can probably get away without it for now, but the most correct definition of the top-level modular_pipelines field is a superset of the modular_pipelines fields of all the nodes in the pipeline. That's actually a very easy way to implement it as well, so it may be worth doing it this way. It could be in a separate PR though, since it's not critical :)
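The first point — deriving a dataset's membership from its own name rather than from the nodes that use it — could be sketched roughly as follows. The function name, signature, and the validity check against a known set of modular pipelines are hypothetical illustrations, not the PR's actual code:

```python
def dataset_modular_pipelines(dataset_name: str, valid_modular_pipelines: set) -> list:
    """Derive modular pipeline membership from the dataset's own name.

    A dataset named "pipeline1.data_engineering.model_input" belongs to every
    namespace prefix of its name that is a known modular pipeline. Checking
    against `valid_modular_pipelines` guards against datasets that merely
    contain dots (e.g. "foo.bar") without foo being a modular pipeline.
    """
    parts = dataset_name.split(".")[:-1]  # drop the dataset's own short name
    prefixes = [".".join(parts[: i + 1]) for i in range(len(parts))]
    return [p for p in prefixes if p in valid_modular_pipelines]
```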

Well done 👏 on actually prompting a refactoring for the code here and highlighting how confusing it is already.

@merelcht (Member Author) replied:

Glad to see we're getting there and almost ready! There are two main things to address here:

  • In the current implementation, the membership of a dataset is determined by the nodes using it, which is incorrect. Modular pipeline membership should be determined entirely from the name of the dataset; otherwise it will produce incorrect results.

I'm definitely missing something here. How do you get the modular pipeline from the name of the dataset? How does a user add that? 🤯

  • Currently the modular pipelines for the whole pipeline do not include all expanded pipelines. I'm not sure whether that was added to the requirements in the ticket, and we can probably get away without it for now, but the most correct definition of the top-level modular_pipelines field is a superset of the modular_pipelines fields of all the nodes in the pipeline. That's actually a very easy way to implement it as well, so it may be worth doing it this way. It could be in a separate PR though, since it's not critical :)

I might be misunderstanding something, but the modular pipelines added for the full pipeline do include all expanded pipelines. If you look at the animals.mock.json file you'll see that spread around the nodes we have ["pipeline1", "pipeline1.data_engineering"] and ["pipeline2", "pipeline2.data_science"], and for the full pipeline we have:

"modular_pipelines": [
    {
      "id": "pipeline1",
      "name": "Pipeline1"
    },
    {
      "id": "pipeline1.data_engineering",
      "name": "Data Engineering"
    },
    {
      "id": "pipeline2",
      "name": "Pipeline2"
    },
    {
      "id": "pipeline2.data_science",
      "name": "Data Science"
    }
  ],
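The "superset of all nodes" definition discussed above could be computed as the union of the per-node modular_pipelines entries. This is a hedged sketch: the helper names and the prettifying rule (last namespace segment, underscores to spaces, title-cased) are assumptions for illustration, not the PR's actual implementation:

```python
def _prettify(modular_pipeline_id: str) -> str:
    # Hypothetical prettifier: last namespace segment, title-cased,
    # e.g. "pipeline1.data_engineering" -> "Data Engineering".
    return modular_pipeline_id.split(".")[-1].replace("_", " ").title()


def collect_modular_pipelines(nodes: list) -> list:
    """Build the top-level "modular_pipelines" field as the union of every
    node's "modular_pipelines" list, in the id/name shape used by
    animals.mock.json."""
    ids = sorted({mp for node in nodes for mp in node.get("modular_pipelines", [])})
    return [{"id": mp, "name": _prettify(mp)} for mp in ids]
```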

"id": "pipeline2.data_science",
"name": "Data Science"
@merelcht (Member Author) commented:

@limdauto Added pipeline2.data_science and pipeline1.data_science to show the "nameclash" case.

def get_pipelines():
    return {
        "de": de_pipeline(),
        "ds": ds_pipeline(),
        "pre_ds": pipeline(pre_ds_pipeline(), namespace="pipeline1.data_science"),
        "ds": pipeline(ds_pipeline(), namespace="pipeline2.data_science"),
@merelcht (Member Author) commented:

Changed the setup to include modular pipelines properly.

@merelcht merelcht requested review from limdauto and idanov March 19, 2021 11:44
@limdauto (Collaborator) left a comment

Apart from one last comment around dataset namespace, this is it :D Thank you so much for pushing through!

"pipelines": [pipeline_key],
"modular_pipelines": dataset_modular_pipelines,
@limdauto (Collaborator) commented:

I think we discussed that people can technically name their dataset foo.bar with foo and bar not being modular pipelines, so we need to check whether they are in the valid modular pipelines list (which comes from the nodes).

@merelcht (Member Author) commented:

Ah okay, thanks for flagging that! I had that in mind for parameters, but will check for it for datasets as well 👍

@merelcht (Member Author) commented Mar 19, 2021:

Done in 50ca16c
Removed this again as agreed in standup.

@idanov (Member) left a comment

Well done 🎉 Super curious to see this served to Viz. I think we can start experimenting even before we have the parameters work done.

@merelcht merelcht merged commit 2b817a6 into feature/modular-pipelines-complete Mar 22, 2021
@merelcht merelcht deleted the modular-pipelines-api branch March 22, 2021 10:44
studioswong added a commit that referenced this pull request Apr 16, 2021
…oncept (#421)

* [KED-1951] Backend to send modular pipelines to kedro viz (#394)

* WIP add modular pipelines

* Expose modular pipelines and add testing data

* Lint

* undo push of package-lock

* Revert package lock

* Fix lint

* Return modular_pipelines in pipeline data endpoint for selected pipeline + update test data

* Address comments on PR

* Cleanup and lint

* Add modular pipelines to datasets and parameter nodes. Some refactoring for clarity

* Temporarily skip js tests to make sure all python stuff works

* Put back JS tests for CI

* First iteration of addressing comments on PR

* Correctly deduce modular pipeline from dataset

* Add all modular pipelines for all nodes

* Check that dataset namespace is actually a modular pipeline

* Undo check if namespace is modular pipeline

* updated FE tests to match new animals dataset (#401)

* Add modular pipelines to parameter nodes (#402)

* WIP add modular pipelines

* Expose modular pipelines and add testing data

* Lint

* undo push of package-lock

* Revert package lock

* Revert package lock

* Fix lint

* Return modular_pipelines in pipeline data endpoint for selected pipeline + update test data

* Address comments on PR

* Cleanup and lint

* Add modular pipelines to datasets and parameter nodes. Some refactoring for clarity

* Temporarily skip js tests to make sure all python stuff works

* Put back JS tests for CI

* First iteration of addressing comments on PR

* Correctly deduce modular pipeline from dataset

* Add all modular pipelines for all nodes

* Check that dataset namespace is actually a modular pipeline

* Undo check if namespace is modular pipeline

* Add modular pipelines for parameter nodes

* Verify if modular pipelines listed are actual modular pipelines

* Temporarily disable JS tests to make sure other steps pass

* Put back JS tests, all other checks pass ✅

* Update package/kedro_viz/server.py

Co-authored-by: Ivan Danov <[email protected]>

* Address review comments

* Treat dataset node modular pipelines the same as task node modular pipelines.

Co-authored-by: Ivan Danov <[email protected]>

* Send correct format for modular pipelines from /pipelines endpoint (#408)

* Modular pipeline tags concept (#410)

* set up store functions for incoming modular pipeline data

* added additional test for modular pipeline

* set up flag for modular pipeline

* set up selector to get list of modular pipelines and nodes

* add check for node modular pipeline data in selector

* set up modular pipeline on sidebar

* refactor node-list to enable change of both modular pipeline and tags

* further setup reducer and node selection

* added item label check

* hide modular pipeline features behind a flag

* fix failing tests and set up new data set

* added tests for modular pipeline actions

* further revisions

* enable indented styling for lazy list

* update readme to explain modular pipeline and tag filtering behaviour

* Fix pylint

* updates as per PR comments

* further adjustments per PR comments

* update tests to reflect latest PR changes

* refactor getItemLabel in node-list-row-list

* fix spelling in random-data

* further refactoring of getItemLabel

Co-authored-by: Merel Theisen <[email protected]>

* quick fix to ensure selector works for pipelines with no defined modular pipelines

* delete unneeded selector

* delete unneeded selector

* Bugfix: Ensure JSON backwards-compatibility

The application should still work without throwing an error, even when "modular_pipelines" is not present in the JSON dataset

Co-authored-by: Merel Theisen <[email protected]>
Co-authored-by: Ivan Danov <[email protected]>
Co-authored-by: Merel Theisen <[email protected]>
Co-authored-by: Richard Westenra <[email protected]>
@studioswong studioswong mentioned this pull request Apr 16, 2021