The single objective of kedro-inspect is to decouple the representation of a Kedro pipeline from its implementation and execution. This is useful for inspecting the pipeline without having access to the Kedro project or setting up dependencies that are only needed when running the pipeline.
Once we isolate the pipeline representation, we can use it for various purposes, such as analysing its structure, documenting it, or sharing it with others.
This representation can be saved to a static file (e.g. JSON). The saved pipeline can then be visualised using the Kedro-Viz package, or read by any other tool (written in any programming language) that understands the pipeline file format.
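As a minimal sketch of how another tool might consume the saved file, the snippet below reads a JSON file and lists each node with its inputs and outputs. It assumes the file has a top-level `nodes` array whose entries carry `name`, `inputs` and `outputs` keys, as in the spaceflights-pandas example in this document. The file name `pipeline.json` and the helper `summarise_nodes` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def summarise_nodes(path):
    """Return one 'name: inputs -> outputs' line per node in a saved pipeline file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    lines = []
    for node in data.get("nodes", []):  # assumed top-level key
        lines.append(f"{node['name']}: {node['inputs']} -> {node['outputs']}")
    return lines


# Stand-in for a file written by kedro-inspect with the -o option.
sample = {
    "nodes": [
        {
            "name": "preprocess_companies_node",
            "inputs": "companies",
            "outputs": "preprocessed_companies",
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    pipeline_file = Path(tmp) / "pipeline.json"
    pipeline_file.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarise_nodes(pipeline_file)

print("\n".join(summary))
```

Only the standard library is needed here: neither Kedro nor the project's own dependencies are imported.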
The plan is to make the inspection richer over time, i.e. to add more information to the pipeline representation, such as fine-grained type information or package dependencies per node.
This added information can be useful for various purposes, such as:
- Generating documentation & schemas for the pipeline
- Visualisation
- Optimising pipeline execution
- Generating a pipeline test suite
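To illustrate the documentation use case, here is a rough sketch that renders a one-line signature for a node from its `function` entry. It assumes the `parameters`, `type_hint` and `return_value` fields shown in the spaceflights-pandas example in this document. The helper `describe_node` and the `Any` fallback for missing type hints are choices made for this sketch, not part of kedro-inspect.

```python
def describe_node(node):
    """Render a human-readable signature line for one node entry."""
    function = node["function"]
    params = ", ".join(
        # Fall back to "Any" when no type hint was recorded (sketch assumption).
        f"{p['name']}: {p.get('type_hint') or 'Any'}"
        for p in function["parameters"]
    )
    returns = function.get("return_value") or "Any"
    return f"{node['name']}({params}) -> {returns}"


# Trimmed copy of the node entry from the spaceflights-pandas example.
node = {
    "name": "preprocess_companies_node",
    "function": {
        "parameters": [
            {
                "name": "companies",
                "kind": "POSITIONAL_OR_KEYWORD",
                "type_hint": "pandas.core.frame.DataFrame",
            }
        ],
        "return_value": "pandas.core.frame.DataFrame",
    },
}

doc_line = describe_node(node)
print(doc_line)
```

Because the type hints are stored as plain strings, a line like this can be produced without importing pandas or the project code.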
Kedro itself provides serialisation of the pipeline. The crucial difference is that kedro-inspect does not require the Kedro project, and hence can be used without setting up the project or its dependencies.
    usage: kedro-inspect [-h] [-p PIPELINE] [-o OUTPUT] [--indent INDENT] project_path

    Inspect a Kedro pipeline.

    positional arguments:
      project_path          path to the Kedro project

    optional arguments:
      -h, --help            show this help message and exit
      -p PIPELINE, --pipeline PIPELINE
                            name of the pipeline to inspect (default: __default__)
      -o OUTPUT, --output OUTPUT
                            path to the output file (default: None)
      --indent INDENT       indentation for JSON output (default: None)
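The options in the help text can also be driven from a script. The following is a sketch only: it builds the argument list from the flags documented in the help text and leaves the actual call to `subprocess.run`. The helpers `build_command` and `run_inspect`, the project path, and the output file name are hypothetical, and `kedro-inspect` is assumed to be on the `PATH`.

```python
import subprocess


def build_command(project_path, pipeline=None, output=None, indent=None):
    """Assemble the kedro-inspect argument list from the documented options."""
    command = ["kedro-inspect"]
    if pipeline is not None:
        command += ["--pipeline", pipeline]
    if output is not None:
        command += ["--output", str(output)]
    if indent is not None:
        command += ["--indent", str(indent)]
    command.append(str(project_path))
    return command


def run_inspect(project_path, **options):
    """Run kedro-inspect; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(project_path, **options), check=True)


# Placeholder path and file name; nothing is executed here.
command = build_command("path/to/spaceflights-pandas", output="pipeline.json", indent=2)
print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the project path contains spaces.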
Running kedro-inspect on spaceflights-pandas, we get a list of representations of the nodes in the pipeline. For example, the first node is represented as follows:

    "nodes": [
      {
        "name": "preprocess_companies_node",
        "tags": [],
        "confirms": [],
        "namespace": null,
        "inputs": "companies",
        "outputs": "preprocessed_companies",
        "function": {
          "func": "spaceflights_pandas.pipelines.data_processing.nodes.preprocess_companies",
          "parameters": [
            {
              "name": "companies",
              "kind": "POSITIONAL_OR_KEYWORD",
              "type_hint": "pandas.core.frame.DataFrame"
            }
          ],
          "return_value": "pandas.core.frame.DataFrame"
        },
        "param_to_input": {
          "companies": [
            "companies"
          ]
        }
      },
      ...
    ]
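Node entries like the one above carry enough information to analyse the pipeline structure. The sketch below derives, for each node, the nodes that produce its inputs. It assumes that `inputs` and `outputs` may be a single dataset name, a list of names, a mapping whose values are dataset names, or null; only the single-string form appears in the example, so the other forms are guesses based on how Kedro nodes accept inputs. The helpers `as_names` and `node_dependencies` and the second node in the sample data are hypothetical.

```python
def as_names(value):
    """Normalise an inputs/outputs field to a list of dataset names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return list(value.values())  # assumed: mapping values are dataset names
    return list(value)


def node_dependencies(nodes):
    """Map each node name to the sorted names of the nodes producing its inputs."""
    producer = {}
    for node in nodes:
        for dataset in as_names(node.get("outputs")):
            producer[dataset] = node["name"]
    deps = {}
    for node in nodes:
        upstream = {
            producer[dataset]
            for dataset in as_names(node.get("inputs"))
            if dataset in producer  # free inputs come from the catalog, not a node
        }
        deps[node["name"]] = sorted(upstream)
    return deps


nodes = [
    {
        "name": "preprocess_companies_node",
        "inputs": "companies",
        "outputs": "preprocessed_companies",
    },
    {
        # Illustrative entry, not taken from real kedro-inspect output.
        "name": "example_downstream_node",
        "inputs": ["preprocessed_companies", "reviews"],
        "outputs": "example_table",
    },
]

deps = node_dependencies(nodes)
print(deps)
```

The same mapping could feed a topological sort or a graph renderer, again without importing the project code.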