Node caching: avoid unnecessary node execution when inputs, outputs, or logic remain unchanged #4350

froxec · 2024-11-24T20:59:55Z

Description

First of all, thank you for your efforts in developing Kedro.
I believe it would be highly beneficial if Kedro had a built-in node caching feature. By node caching, I mean a mechanism to avoid re-executing a node when its inputs, outputs, and logic remain unchanged.

Context

This feature is important to me because, in some scenarios, it is necessary to run the entire pipeline multiple times with different configurations. Re-executing nodes that remain unchanged between runs can significantly increase the time required for experiments.

For instance, when tracking pipeline parameters with MLflow, the entire pipeline has to be run to record the parameters of every node, because kedro-mlflow records parameters node by node.

Possible Implementation

There is already an existing plugin, kedro-cache, that implements similar functionality. The plugin is well-written and could work effectively with some adjustments. However, it is outdated and incompatible with the most recent Kedro releases. Moreover, there are compatibility issues with specific datasets, such as tracking.JSONDataset and tracking.MetricsDataset, which are write-only and cannot be loaded.

I believe that integrating node caching directly into Kedro's core design would help mitigate such compatibility issues and provide a more robust solution for users.
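To make the idea concrete, here is a minimal sketch of the kind of check I have in mind. It deliberately uses no Kedro API: `NodeCache`, `logic_fingerprint`, and `node_fingerprint` are hypothetical names, the cache lives only in memory, and inputs are assumed to be small JSON-serializable values. A real implementation would need to hash dataset contents and verify that the stored outputs still exist.

```python
import hashlib
import json


def logic_fingerprint(func) -> str:
    """Hash a function's bytecode, simple constants, and referenced names.

    Assumption: this is a rough proxy for "the node logic is unchanged".
    It ignores nested functions and changes in helpers the node calls.
    """
    code = func.__code__
    simple = (int, float, str, bytes, bool, type(None))
    consts = [c for c in code.co_consts if isinstance(c, simple)]
    digest = hashlib.sha256()
    digest.update(code.co_code)
    digest.update(repr(consts).encode("utf-8"))
    digest.update(repr(code.co_names).encode("utf-8"))
    return digest.hexdigest()


def node_fingerprint(func, inputs: dict) -> str:
    """Combine the logic fingerprint with a hash of the node inputs."""
    payload = {"logic": logic_fingerprint(func), "inputs": inputs}
    # default=repr is a fallback for values json cannot encode natively.
    encoded = json.dumps(payload, sort_keys=True, default=repr).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class NodeCache:
    """In-memory cache keyed by node fingerprint (illustrative only)."""

    def __init__(self):
        self._store = {}

    def run(self, func, inputs: dict):
        """Return (result, cache_hit); execute func only on a cache miss."""
        key = node_fingerprint(func, inputs)
        if key in self._store:
            return self._store[key], True
        result = func(**inputs)
        self._store[key] = result
        return result, False
```

With something like this, calling `NodeCache.run` twice with the same function and inputs would execute the function only once, while changing either the inputs or the function body would trigger a fresh execution.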

froxec added the Issue: Feature Request New feature or improvement to existing feature label Nov 24, 2024

merelcht added the Community Issue/PR opened by the open-source community label Nov 24, 2024
