Visualisation of Kedro Hooks #836

Closed
3 tasks
daBlesr opened this issue Apr 28, 2022 · 5 comments

Comments

@daBlesr

daBlesr commented Apr 28, 2022

Description

I would like to see which of our nodes/datasets have a hook implementation.

Context

I have written an implementation that reads a YAML file with Great Expectations configurations (not using Kedro-Great) for specific nodes and datasets. A Hook class reads this config file and executes the respective validation rules on the input/output dataframes to either make the pipeline fail or output warnings. I'd like to see in Kedro-Viz which of the input/output datasets have Great Expectations rules applied to them.

Possible Implementation

I have too little knowledge on how kedro-viz works exactly, so forgive my ignorance: Export per node and dataset which hooks have an implementation, by maybe overriding a method on a Hook class that returns a list of nodes/datasets with some metadata (the specific GE rules per node/dataset).

Alternative Implementation

A specific GE implementation reading the GE config file, and appending the results to the saved json file. This is possible for me to write for myself, but then it would not be something generalisable.

Checklist

  • Append saved json file with some additional data on hooks x datasets/nodes
  • Visualize the datasets and nodes with a mark showing that they have Hook behaviour
  • Render the metadata for the dataset/node.
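
A minimal sketch of the first checklist item, assuming the exported file is a JSON object with a top-level `nodes` list whose entries carry a `name` key. That layout, the `hooks` key, and the `annotate_export` helper are assumptions made for illustration and are not the actual kedro-viz export format.

```python
import json
from typing import Any, Dict, List


def annotate_export(
    export: Dict[str, Any], rules: Dict[str, List[str]]
) -> Dict[str, Any]:
    """Return a copy of ``export`` with validation rules attached to matching entries.

    ``rules`` maps a node or dataset name to the rule names applied to it.
    The ``nodes``/``name``/``hooks`` keys are hypothetical.
    """
    # Round-trip through JSON as a simple deep copy of JSON-compatible data.
    annotated = json.loads(json.dumps(export))
    for entry in annotated.get("nodes", []):
        entry_rules = rules.get(entry.get("name"))
        if entry_rules:
            entry["hooks"] = {"data_validation": entry_rules}
    return annotated


# Hypothetical export and rule mapping, for illustration only.
export = {"nodes": [{"name": "companies"}, {"name": "preprocess_companies"}]}
rules = {"companies": ["expect_column_values_to_not_be_null"]}

annotated = annotate_export(export, rules)
print(json.dumps(annotated, indent=2))
```

Only entries with at least one configured rule receive the extra key, so the frontend could use its presence as the "has Hook behaviour" mark.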

Possibly related: #194

@tynandebold
Member

@MerelTheisenQB do you have any thoughts here about this idea now that hooks are becoming more popular with the release of 0.18? Have you thought at all about what we could do with hooks on the Viz side?

cc @AntonyMilneQB @noklam

@limdauto
Collaborator

limdauto commented May 4, 2022

This is going to be challenging. Hooks are not declarative. Their behaviour, e.g. which nodes they apply to, changes at runtime.

To be able to do this, we need runtime information, which isn't all bad. We can use this as an excuse to go realtime.

@antonymilne
Contributor

antonymilne commented May 4, 2022

This is a really cool idea, but unfortunately I agree with @limdauto that it sounds very tricky to do. I don't even see how runtime information would help here, actually. E.g. if I have a hook

def before_node_run(node: Node):
    if node.name == "blah":
        do_stuff()

then how do I detect that it's acting on node "blah" and no others? Would be really interested in understanding how you think it might work @limdauto.
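
To make the difficulty concrete, here is a self-contained sketch. It uses a stand-in `Node` dataclass rather than Kedro's own class, and the `traced` wrapper is a hypothetical observer, not something Kedro or kedro-viz provides. An observer can record that the hook was invoked for every node, but the condition inside the hook body stays invisible to it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    """Stand-in for Kedro's Node; only the name matters here."""

    name: str


acted_on: List[str] = []     # what the hook body actually did
invoked_for: List[str] = []  # what an outside observer can record


def before_node_run(node: Node) -> None:
    # The filter lives inside the hook body, out of the observer's sight.
    if node.name == "blah":
        acted_on.append(node.name)


def traced(hook: Callable[[Node], None]) -> Callable[[Node], None]:
    """Hypothetical observer that logs every invocation of a hook."""

    def wrapper(node: Node) -> None:
        invoked_for.append(node.name)
        hook(node)

    return wrapper


hook = traced(before_node_run)
for node in [Node("blah"), Node("foo"), Node("bar")]:
    hook(node)

print("invoked for:", invoked_for)
print("acted on:   ", acted_on)
```

The two lists differ: the hook is called for all three nodes but only does work for one, and nothing outside the hook can tell which.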

My first thought here is that this sort of customisation might somehow be enabled by something like #662. Here you would add some viz_widgets attribute to relevant entries in your data catalog (or maybe a new metadata.yml file that gets picked up by viz) saying which hooks are applied to which datasets. The advantage of this is that it's not GE hook-specific: you could use it to somehow inject custom metadata for any dataset. There are disadvantages too, e.g. you need to maintain multiple yml files rather than having it work automatically in realtime, and it's unclear how to extend it to work for nodes.
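
As a rough sketch of the declarative route, the mapping below mimics what a parsed metadata.yml might contain. The file layout, the `datasets`/`nodes` keys, and the `hooks_for` helper are all assumptions for illustration; nothing like this exists in kedro-viz as far as this thread establishes.

```python
from typing import Dict, List

# Hypothetical parsed contents of a metadata.yml: which hooks touch what.
VIZ_METADATA: Dict[str, Dict[str, List[str]]] = {
    "datasets": {"companies": ["DataValidationHook"]},
    "nodes": {"preprocess_companies": ["DataValidationHook"]},
}


def hooks_for(
    kind: str,
    name: str,
    metadata: Dict[str, Dict[str, List[str]]] = VIZ_METADATA,
) -> List[str]:
    """Return the hooks declared for a dataset or node, or an empty list."""
    return metadata.get(kind, {}).get(name, [])


print(hooks_for("datasets", "companies"))
print(hooks_for("nodes", "train_model"))
```

Because the mapping is plain data, viz could read it without running the pipeline, at the cost of keeping it in sync with the hook code by hand.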

@daBlesr
Author

daBlesr commented May 4, 2022

An idea to support this feature would be to extend the Hook class with some visualisation specific logic. A draft of what this could look like:

class DataValidationHook():

    @hook_impl
    def before_node_run(self, inputs: Dict[str, Any]) -> None:
        ...

    @viz_impl
    def viz_behaviour(self) -> HookVizBehaviour:
        return DataValidationVizBehaviour()

class DataValidationVizBehaviour(HookVizBehaviour):

    @hook_impl
    def before_node_run(self, node: Node, inputs: Dict[str, Any]) -> None:
        if some_situation:
            self.add_node(node, meta=some_meta_info)
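
A runnable version of the same idea, as a sketch only. `HookVizBehaviour`, `add_node`, and the `rules` mapping are hypothetical, and `Node` is a stand-in dataclass so the example runs without Kedro installed. Here `some_situation` is taken to mean "at least one input has validation rules configured".

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Node:
    """Stand-in for Kedro's Node; only the name is used."""

    name: str


class HookVizBehaviour:
    """Hypothetical base class collecting per-node metadata for the viz."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}

    def add_node(self, node: Node, meta: Dict[str, Any]) -> None:
        self.nodes[node.name] = meta


class DataValidationVizBehaviour(HookVizBehaviour):
    def __init__(self, rules: Dict[str, List[str]]) -> None:
        super().__init__()
        self.rules = rules  # dataset name -> validation rule names

    def before_node_run(self, node: Node, inputs: Dict[str, Any]) -> None:
        # Record the node only if one of its inputs has rules configured.
        matched = {name: self.rules[name] for name in inputs if name in self.rules}
        if matched:
            self.add_node(node, meta={"validated_inputs": matched})


behaviour = DataValidationVizBehaviour({"companies": ["not_null"]})
behaviour.before_node_run(Node("preprocess_companies"), {"companies": object()})
behaviour.before_node_run(Node("train_model"), {"features": object()})
print(behaviour.nodes)
```

The recorded `nodes` mapping is what the viz side would then need to read to mark and describe each node.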

@tynandebold
Member

Hey @daBlesr, curious what your main use case is for this? I see from your example code you're doing something with data validation and perhaps want to see the outcome of that. Is there anything else you're doing that you'd like to see visualized?
