Node caching: avoid unnecessary node execution when inputs, outputs, or logic remain unchanged #4350

Open
froxec opened this issue Nov 24, 2024 · 0 comments
Labels
Community Issue/PR opened by the open-source community Issue: Feature Request New feature or improvement to existing feature

froxec commented Nov 24, 2024

Description

First of all, thank you for your efforts in developing Kedro.
I believe it would be highly beneficial if Kedro had a built-in node caching feature. By node caching, I mean a mechanism to avoid re-executing a node when its inputs, outputs, and logic remain unchanged.
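To make the request concrete, here is a minimal sketch of the kind of check I have in mind. It is not Kedro code and does not use any Kedro API: `node_fingerprint` and `CachingRunner` are hypothetical names, the cache lives in memory only, and "logic unchanged" is approximated by hashing the function's bytecode and constants, which would miss changes in helper functions that the node calls.

```python
import hashlib
import pickle


def node_fingerprint(func, inputs):
    """Hash a node's compiled logic together with its input values.

    Assumption: bytecode plus constants is a good enough proxy for
    "the logic is unchanged", and every input value can be pickled.
    """
    digest = hashlib.sha256()
    code = func.__code__
    digest.update(code.co_code)
    digest.update(repr(code.co_consts).encode("utf-8"))
    for name in sorted(inputs):
        digest.update(name.encode("utf-8"))
        digest.update(pickle.dumps(inputs[name]))
    return digest.hexdigest()


class CachingRunner:
    """Hypothetical runner that skips a node when its fingerprint is unchanged."""

    def __init__(self):
        self._cache = {}    # node name -> (fingerprint, outputs)
        self.executed = []  # names of nodes that were actually run

    def run_node(self, name, func, inputs):
        fingerprint = node_fingerprint(func, inputs)
        cached = self._cache.get(name)
        if cached is not None and cached[0] == fingerprint:
            # Same logic and same inputs: reuse the stored outputs.
            return cached[1]
        outputs = func(**inputs)
        self._cache[name] = (fingerprint, outputs)
        self.executed.append(name)
        return outputs
```

Calling `run_node` twice with the same function and inputs would execute the node only once; changing either the inputs or the function body would trigger a re-run. A real implementation would also have to confirm that the persisted outputs still exist and can be loaded, which this sketch sidesteps by keeping them in memory.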

Context

This feature is important to me because, in some scenarios, it is necessary to run the entire pipeline multiple times with different configurations. Re-executing nodes that remain unchanged between runs can significantly increase the time required for experiments.

For instance, when tracking pipeline parameters with MLflow, we need to run the entire pipeline to record the parameters of every node, because kedro-mlflow records parameters node by node.

Possible Implementation

There is already an existing plugin, kedro-cache, that implements similar functionality. The plugin is well-written and could work effectively with some adjustments. However, it is outdated and incompatible with the most recent Kedro releases. Moreover, there are compatibility issues with specific datasets, such as tracking.JSONDataset and tracking.MetricsDataset, which are write-only and cannot be loaded.

I believe that integrating node caching directly into Kedro's core design would help mitigate such compatibility issues and provide a more robust solution for users.

@froxec froxec added the Issue: Feature Request New feature or improvement to existing feature label Nov 24, 2024
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Nov 24, 2024