Refactor flowchart models for modular pipelines change #1937

ravi-kumar-pilla · 2024-06-10T19:38:52Z

Description

Partially resolves - #1899

Development notes

Removes the namespace property from GraphNode and retains it only for TaskNode, following the Kedro framework
Changes the data type of modular_pipelines to Optional[Set[str]]
Updates all the create_*_node class methods (create_task_node, create_data_node, create_parameters_node, etc.) to accept a modular_pipelines parameter, since modular pipelines are now populated before the nodes are created and each node is associated with its modular pipelines during node creation
Removes internal_inputs/outputs and external_inputs/outputs from ModularPipelineNode. A ModularPipelineNode now has only inputs/outputs (i.e., the datasets which are free inputs/outputs as per Kedro)
Updates tests
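
To make the "free inputs/outputs" wording above concrete, here is a minimal sketch. It assumes the usual Kedro notion: a free input is a dataset consumed by some node in the group but produced by none of them, and a free output is the reverse. The `free_inputs_outputs` helper and the example node names are hypothetical and are not taken from the kedro-viz code in this PR.

```python
from typing import Dict, Set, Tuple

# Hypothetical stand-in for a Kedro node: (input dataset names, output dataset names).
NodeIO = Tuple[Set[str], Set[str]]


def free_inputs_outputs(nodes: Dict[str, NodeIO]) -> Tuple[Set[str], Set[str]]:
    """Return (inputs, outputs) for a group of nodes treated as one pipeline.

    inputs: datasets consumed by some node but produced by none of them.
    outputs: datasets produced by some node but consumed by none of them.
    """
    consumed: Set[str] = set()
    produced: Set[str] = set()
    for node_inputs, node_outputs in nodes.values():
        consumed |= node_inputs
        produced |= node_outputs
    return consumed - produced, produced - consumed


# Illustrative modular pipeline with two nodes; "cleaned" is purely internal.
example = {
    "clean": ({"raw"}, {"cleaned"}),
    "train": ({"cleaned", "params:lr"}, {"model"}),
}
inputs, outputs = free_inputs_outputs(example)
```

With this definition the intermediate dataset "cleaned" appears in neither set, which is why a single inputs/outputs pair can be enough to describe a ModularPipelineNode.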

QA notes

This PR is part of a bigger refactor of modular pipelines (Refactor Namespace Pipelines #1897) and was created to ease the review process.
The CI build might fail, as this PR is not self-sufficient

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added new entries to the RELEASE.md file
Added tests to cover my changes

Signed-off-by: ravi-kumar-pilla <[email protected]>

merelcht

In the description you mention

Removes the namespace property from GraphNode and retains it only for TaskNode, following the Kedro framework

and I see a lot of methods related to namespaces have been removed. Can you explain why that change was needed and how namespaces are now set and used in Viz?

ravi-kumar-pilla · 2024-06-21T00:28:48Z

In the description you mention

Removes the namespace property from GraphNode and retains it only for TaskNode, following the Kedro framework

and I see a lot of methods related to namespaces have been removed. Can you explain why that change was needed and how namespaces are now set and used in Viz?

Earlier, we had the namespace at the GraphNode level (which includes TaskNode, DataNode and others). After a discussion with Ivan, we realized that datasets were never part of a namespace, so to keep the same schema as Kedro we removed the namespace from GraphNode and retained it only on TaskNode.

For a TaskNode, the namespace is set as before, from the Kedro object:

    @field_validator("namespace")
    @classmethod
    def set_namespace(cls, _, info: ValidationInfo):
        return info.data["kedro_obj"].namespace

Other nodes do not need a namespace. Many of the namespace-related methods in GraphNode, such as _expand_namespaces and _get_namespace, are no longer required, as they were used to calculate the modular_pipelines each node belongs to. Since we moved that logic (as you mentioned here) to before the nodes are created, we do not need them now.
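
A rough sketch of the resulting shape, using plain dataclasses rather than the actual pydantic models in flowchart.py. The field sets are trimmed down, and the `node_to_modular_pipelines` mapping and the example ids are hypothetical, purely for illustration:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class GraphNode:
    """Simplified stand-in for the base node: no namespace at this level."""

    id: str
    name: str
    modular_pipelines: Optional[Set[str]] = None


@dataclass
class DataNode(GraphNode):
    """Datasets belong to modular pipelines but never carry a namespace."""


@dataclass
class TaskNode(GraphNode):
    """Only task nodes keep the namespace taken from the Kedro node."""

    namespace: Optional[str] = None


# Hypothetical mapping computed up front, before any node object exists.
node_to_modular_pipelines: Dict[str, Set[str]] = {
    "train": {"data_science"},
    "model_input": {"data_science"},
}

# The precomputed modular pipelines are handed over at creation time,
# so no namespace-based calculation is needed inside the node classes.
task = TaskNode(
    id="train",
    name="train",
    modular_pipelines=node_to_modular_pipelines.get("train"),
    namespace="data_science",
)
data = DataNode(
    id="model_input",
    name="model_input",
    modular_pipelines=node_to_modular_pipelines.get("model_input"),
)
```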

Thank you

idanov

Nice :)

idanov · 2024-06-25T12:19:56Z

package/kedro_viz/models/flowchart.py

-    # https://kedro.readthedocs.io/en/latest/06_nodes_and_pipelines/03_modular_pipelines.html#how-to-connect-existing-pipelines
-    internal_inputs: Set[str] = Field(
-        set(), description="The dataset inputs within the modular pipeline node"
+    inputs: Set[str] = Field(


So much clearer now 👌

idanov · 2024-06-25T12:20:10Z

package/kedro_viz/models/flowchart.py

-        description="""The dataset outputs connecting the modular
-        pipeline node with other modular pipelines""",
+
+    outputs: Set[str] = Field(


So much clearer now 👌

idanov · 2024-06-25T12:22:24Z

package/kedro_viz/models/flowchart.py

            name=dataset_name,
            tags=tags,
            layer=layer,
            kedro_obj=dataset,
            is_free_input=is_free_input,
            stats=stats,
+            modular_pipelines=modular_pipelines,
        )

    @classmethod
    def create_parameters_node(


These factory methods look more and more useless to me. They seem to be mere indirections to the constructors, making the creation of specific nodes more verbose without any benefit. We should revisit them at some point.

idanov · 2024-06-25T12:25:50Z

package/kedro_viz/models/flowchart.py

        Returns:
            An instance of TaskNode.
        """
        node_name = node._name or node._func_name
        return TaskNode(
-            id=cls._hash(str(node)),
+            id=node_id,


By not deriving the node_id automatically, we make these create methods even more useless.
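
A toy illustration of this point, not the kedro-viz code itself: the dataclass, the two factory functions and the truncated SHA-1 hashing scheme below are all assumptions made for the sketch. Once the id is passed in, the factory does nothing the constructor does not already do.

```python
import hashlib
from dataclasses import dataclass


def _hash(value: str) -> str:
    # Assumed hashing scheme for illustration: short hex digest of the string.
    return hashlib.sha1(value.encode("UTF-8")).hexdigest()[:8]


@dataclass
class TaskNode:
    """Trimmed-down stand-in with only the fields needed for the comparison."""

    id: str
    name: str


def create_task_node_deriving_id(node_repr: str, name: str) -> TaskNode:
    # The factory adds value here: callers do not need to know how ids are built.
    return TaskNode(id=_hash(node_repr), name=name)


def create_task_node_with_id(node_id: str, name: str) -> TaskNode:
    # With the id supplied by the caller, this only forwards its arguments.
    return TaskNode(id=node_id, name=name)


derived = create_task_node_deriving_id("Node(train, ['x'], ['y'])", "train")
passed_in = create_task_node_with_id(derived.id, "train")
```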

split refactor modular pipelines model changes

f2c6a7b

Signed-off-by: ravi-kumar-pilla <[email protected]>

ravi-kumar-pilla mentioned this pull request Jun 10, 2024

Refactor modular pipelines #1941

Closed

9 tasks

ravi-kumar-pilla added 2 commits June 11, 2024 09:16

revert run command

9362a86

Signed-off-by: ravi-kumar-pilla <[email protected]>

revert task node namespace change

e44316a

Signed-off-by: ravi-kumar-pilla <[email protected]>

merelcht reviewed Jun 19, 2024

ravi-kumar-pilla marked this pull request as ready for review June 24, 2024 15:02

ravi-kumar-pilla requested a review from rashidakanchwala as a code owner June 24, 2024 15:02

idanov approved these changes Jun 26, 2024

merelcht approved these changes Jun 27, 2024

rashidakanchwala mentioned this pull request Jul 2, 2024

Refactor Namespace Pipelines #1897

Merged

9 tasks

rashidakanchwala closed this Jul 2, 2024
