Lightweight Kedro Viz Experimentation using AST #1966

Merged: 63 commits, Sep 3, 2024
Changes from 25 commits
Commits
0e7f24d
merge main from remote
ravi-kumar-pilla Apr 25, 2024
c1aae75
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Apr 26, 2024
177ccbc
merging remote
ravi-kumar-pilla May 1, 2024
8ecf9bf
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 2, 2024
37f3bf4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 8, 2024
499d8c4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 14, 2024
b3ab479
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 16, 2024
e295e92
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 20, 2024
905b198
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 21, 2024
490a89f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 30, 2024
c1a099b
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla May 31, 2024
573e3c0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 10, 2024
5a12c65
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 13, 2024
960c113
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 18, 2024
49c05b1
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
354e024
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 21, 2024
60e2f27
Merge branch 'main' of https://github.com/kedro-org/kedro-viz
ravi-kumar-pilla Jun 26, 2024
52c2060
partially working parser - WIP
ravi-kumar-pilla Jun 27, 2024
cfd99a7
partial working commit
ravi-kumar-pilla Jun 29, 2024
de4a4ef
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 3, 2024
7125927
testing show code
ravi-kumar-pilla Jul 3, 2024
bff5a4c
adjust file permissions
ravi-kumar-pilla Jul 3, 2024
3038afd
update comments and rename parser file
ravi-kumar-pilla Jul 3, 2024
0e91504
remove gitignore
ravi-kumar-pilla Jul 3, 2024
a4b3b1a
handle func lambda case
ravi-kumar-pilla Jul 3, 2024
0a80f6c
mocking working draft proposal
ravi-kumar-pilla Jul 12, 2024
e31242f
reuse session with mock modules
ravi-kumar-pilla Jul 15, 2024
8b8e337
wip integration tests
ravi-kumar-pilla Jul 17, 2024
8e0ae73
sporadic working needs testing
ravi-kumar-pilla Jul 18, 2024
38782e3
update sys modules with patch
ravi-kumar-pilla Jul 18, 2024
1fc1faf
fix lint and pytests
ravi-kumar-pilla Jul 18, 2024
98361e3
add dataset factories test
ravi-kumar-pilla Jul 22, 2024
e120ccc
add e2e test
ravi-kumar-pilla Jul 22, 2024
a711cf0
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
b7a1862
fix CI
ravi-kumar-pilla Jul 22, 2024
c5a6f2a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 22, 2024
06e35bf
dataset factory pattern support in lite mode
ravi-kumar-pilla Jul 23, 2024
78cd413
add doc strings
ravi-kumar-pilla Jul 23, 2024
f2dda93
add e2e test and clear unused func
ravi-kumar-pilla Jul 24, 2024
bfe069f
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 24, 2024
35f1ed5
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 24, 2024
1cffd8a
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Jul 25, 2024
fc8f7e4
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Jul 30, 2024
c31fbda
Merge branch 'main' into feature/kedro-viz-lite
ravi-kumar-pilla Aug 9, 2024
bc4aea2
testing relative to absolute imports
ravi-kumar-pilla Aug 13, 2024
60f9cd3
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 16, 2024
8162147
testing relative imports
ravi-kumar-pilla Aug 16, 2024
840cb9f
working draft for relative imports multi-level
ravi-kumar-pilla Aug 17, 2024
76e3c2b
remove resolving relative dependencies
ravi-kumar-pilla Aug 19, 2024
2d18e9a
test
ravi-kumar-pilla Aug 19, 2024
16e1ef5
working draft
ravi-kumar-pilla Aug 19, 2024
8c6d878
modify test and standalone support for lite
ravi-kumar-pilla Aug 19, 2024
f9de2fe
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 19, 2024
db1b416
improve readability
ravi-kumar-pilla Aug 20, 2024
fe09d20
fix lint and pytest
ravi-kumar-pilla Aug 20, 2024
fefafa6
revert link redirect
ravi-kumar-pilla Aug 21, 2024
ae94f1e
remove side effects
ravi-kumar-pilla Aug 21, 2024
57ea66a
Merge branch 'main' of https://github.com/kedro-org/kedro-viz into fe…
ravi-kumar-pilla Aug 22, 2024
45da624
pr suggestions addressed
ravi-kumar-pilla Aug 22, 2024
bcdd304
fix dict issue
ravi-kumar-pilla Aug 22, 2024
f4cd1dd
merge main
ravi-kumar-pilla Aug 22, 2024
050bff2
moved package check under dirs and add exception block
ravi-kumar-pilla Aug 22, 2024
63b9fd3
merge main
ravi-kumar-pilla Sep 3, 2024
83 changes: 54 additions & 29 deletions package/kedro_viz/integrations/kedro/data_loader.py
@@ -11,14 +11,14 @@
 from typing import Any, Dict, Optional, Tuple

 from kedro import __version__
-from kedro.framework.project import configure_project, pipelines
-from kedro.framework.session import KedroSession
+from kedro.config.omegaconf_config import OmegaConfigLoader
+from kedro.framework.context.context import KedroContext
 from kedro.framework.session.store import BaseSessionStore
-from kedro.framework.startup import bootstrap_project
 from kedro.io import DataCatalog
 from kedro.pipeline import Pipeline

 from kedro_viz.constants import VIZ_METADATA_ARGS
+from kedro_viz.integrations.kedro.lite_parser import parse_project

 logger = logging.getLogger(__name__)

@@ -75,6 +75,7 @@ def load_data(
     include_hooks: bool = False,
     package_name: Optional[str] = None,
     extra_params: Optional[Dict[str, Any]] = None,
+    is_lite: bool = False,
 ) -> Tuple[DataCatalog, Dict[str, Pipeline], BaseSessionStore, Dict]:
     """Load data from a Kedro project.
     Args:
@@ -91,30 +92,54 @@ def load_data(
         A tuple containing the data catalog and the pipeline dictionary
         and the session store.
     """
-    if package_name:
-        configure_project(package_name)
-    else:
-        # bootstrap project when viz is run in dev mode
-        bootstrap_project(project_path)
-
-    with KedroSession.create(
-        project_path=project_path,
-        env=env,
-        save_on_close=False,
-        extra_params=extra_params,
-    ) as session:
-        # check for --include-hooks option
-        if not include_hooks:
-            session._hook_manager = _VizNullPluginManager()  # type: ignore
-
-        context = session.load_context()
-        session_store = session._store
-        catalog = context.catalog
-
-        # Pipelines is a lazy dict-like object, so we force it to populate here
-        # in case user doesn't have an active session down the line when it's first accessed.
-        # Useful for users who have `get_current_session` in their `register_pipelines()`.
-        pipelines_dict = dict(pipelines)
-        stats_dict = _get_dataset_stats(project_path)
+    if is_lite:
+        # [TODO: Confirm on the context creation]
+        context = KedroContext(
+            package_name="{{ cookiecutter.python_package }}",
+            project_path=project_path,
+            config_loader=OmegaConfigLoader(conf_source=str(project_path)),
+            hook_manager=_VizNullPluginManager(),
+            env=env,
+        )
+
+        # [TODO: Confirm on the session store creation]
+        session_store = None

-    return catalog, pipelines_dict, session_store, stats_dict
+        # [TODO: Confirm on the DataCatalog creation]
+        catalog = DataCatalog()
+
+        stats_dict = _get_dataset_stats(project_path)
+        pipelines_dict = dict(parse_project(project_path))
+        return catalog, pipelines_dict, session_store, stats_dict
+    else:
+        from kedro.framework.project import configure_project, pipelines
+        from kedro.framework.session import KedroSession
+        from kedro.framework.startup import bootstrap_project
+
+        if package_name:
+            configure_project(package_name)
+        else:
+            # bootstrap project when viz is run in dev mode
+            bootstrap_project(project_path)
+
+        with KedroSession.create(
+            project_path=project_path,
+            env=env,
+            save_on_close=False,
+            extra_params=extra_params,
+        ) as session:
+            # check for --include-hooks option
+            if not include_hooks:
+                session._hook_manager = _VizNullPluginManager()  # type: ignore
+
+            context = session.load_context()
+            session_store = session._store
+            catalog = context.catalog
+
+            # Pipelines is a lazy dict-like object, so we force it to populate here
+            # in case user doesn't have an active session down the line when it's first accessed.
+            # Useful for users who have `get_current_session` in their `register_pipelines()`.
+            pipelines_dict = dict(pipelines)
+            stats_dict = _get_dataset_stats(project_path)
+
+        return catalog, pipelines_dict, session_store, stats_dict
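Note on the change above: the session-related imports move from module level into the non-lite branch, so lite mode never has to load them. A minimal sketch of that deferred-import pattern follows, using the standard-library `statistics` module as a stand-in for the heavy dependency. The function `summarise` and its return shape are hypothetical and are not part of Kedro-Viz.

```python
from typing import Dict, List


def summarise(values: List[float], is_lite: bool = False) -> Dict[str, float]:
    """Return a summary; only the full path imports the extra module."""
    if is_lite:
        # Lite path: no optional imports, cheaper but less detailed result.
        return {"count": len(values)}

    # Full path: import deferred until this branch actually runs
    # (stand-in for the KedroSession / bootstrap_project imports).
    import statistics

    return {"count": len(values), "mean": statistics.mean(values)}


print(summarise([1, 2, 3], is_lite=True))
print(summarise([1, 2, 3]))
```

The trade-off in this sketch is the same as in the diff: the lite path stays importable when the optional dependency is missing, at the cost of returning less information.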
242 changes: 242 additions & 0 deletions package/kedro_viz/integrations/kedro/lite_parser.py
@@ -0,0 +1,242 @@
+import ast
+import logging
+from collections import defaultdict
+from pathlib import Path
+from typing import Dict, Iterable, List
+
+from kedro.pipeline.modular_pipeline import pipeline as ModularPipeline
+from kedro.pipeline.pipeline import Node, Pipeline
+
+logger = logging.getLogger(__name__)
+
+
+class KedroPipelineLocator(ast.NodeVisitor):
+    """
+    Represents a pipeline that is located when parsing
+    the Kedro project's `create_pipeline` function
+
+    """
+
+    def __init__(self):
+        self.pipeline = None
+
+    def visit_FunctionDef(self, node):
+        try:
+            if node.name == "create_pipeline":
+                # Explore the located pipeline for nodes
+                # and other keyword args
+                kedro_pipeline_explorer = KedroPipelineExplorer()
+                kedro_pipeline_explorer.visit(node)
+                try:
+                    # modular pipeline
+                    if kedro_pipeline_explorer.namespace:
+                        self.pipeline = ModularPipeline(
+                            pipe=kedro_pipeline_explorer.nodes,
+                            inputs=kedro_pipeline_explorer.inputs,
+                            outputs=kedro_pipeline_explorer.outputs,
+                            parameters=kedro_pipeline_explorer.parameters,
+                            tags=kedro_pipeline_explorer.tags,
+                            namespace=kedro_pipeline_explorer.namespace,
+                        )
+                    else:
+                        # kedro pipeline
+                        self.pipeline = Pipeline(
+                            nodes=kedro_pipeline_explorer.nodes,
+                            tags=kedro_pipeline_explorer.tags,
+                        )
+                except Exception as exc:
+                    # [TODO: Error with modular pipeline, try creating regular pipeline]
+                    logger.error(exc)
+                    self.pipeline = Pipeline(
+                        nodes=kedro_pipeline_explorer.nodes,
+                        tags=kedro_pipeline_explorer.tags,
+                    )
+
+            self.generic_visit(node)
+
+        except Exception as exc:
+            # [TODO: Error with parsing the file,
+            # dump the visiting node for debugging]
+            logger.error(exc)
+            logger.info(ast.dump(node, indent=2))
+
+
+class KedroPipelineExplorer(ast.NodeVisitor):
+    # [TODO: Current explorer only serves for 1 pipeline() function within a create_pipeline def]
+    def __init__(self):
+        # keeping these here for future use-case
+        # when dealing with multiple `pipeline()` functions
+        # within a create_pipeline def
+        self.nodes: List[Node] = []
+        self.inputs = None
+        self.outputs = None
+        self.namespace = None
+        self.parameters = None
+        self.tags = None
+
+    def visit_Call(self, node):
+        if isinstance(node.func, ast.Name) and node.func.id == "pipeline":
+            # for a modular pipeline
+            # [TODO: pipe to be explored later]
+            # pipe: Iterable[Node | Pipeline] | Pipeline
+
+            pipeline_inputs: str | set[str] | dict[str, str] | None = None
+            pipeline_outputs: str | set[str] | dict[str, str] | None = None
+            pipeline_namespace: str | None = None
+            pipeline_parameters: str | set[str] | dict[str, str] | None = None
+            pipeline_tags: str | Iterable[str] | None = None
+
+            for keyword in node.keywords:
+                if keyword.arg == "namespace":
+                    pipeline_namespace = parse_value(keyword.value)
+                elif keyword.arg == "inputs":
+                    pipeline_inputs = parse_value(keyword.value)
+                elif keyword.arg == "outputs":
+                    pipeline_outputs = parse_value(keyword.value)
+                elif keyword.arg == "parameters":
+                    pipeline_parameters = parse_value(keyword.value)
+                elif keyword.arg == "tags":
+                    pipeline_tags = parse_value(keyword.value)
+
+            # exploring nodes
+            for arg in node.args:
+                if isinstance(arg, ast.List):
+                    for elt in arg.elts:
+                        if (
+                            isinstance(elt, ast.Call)
+                            and isinstance(elt.func, ast.Name)
+                            and elt.func.id == "node"
+                        ):
+                            node_func = None
+                            node_inputs: str | list[str] | dict[str, str] | None = None
+                            node_outputs: str | list[str] | dict[str, str] | None = None
+                            node_name: str | None = None
+                            node_tags: str | Iterable[str] | None = None
+                            node_confirms: str | list[str] | None = None
+                            node_namespace: str | None = None
+
+                            for keyword in elt.keywords:
+                                # [TODO: func is WIP. Need to create a Callable]
+                                if keyword.arg == "func":
+                                    if isinstance(keyword.value, ast.Name):
+                                        func_name = keyword.value.id
+                                        exec(
+                                            f"def {func_name}(*args, **kwargs): pass",
+                                            globals(),
+                                        )
+                                        node_func = globals()[func_name]
+                                    else:
+                                        node_func = lambda *args, **kwargs: None
+                                elif keyword.arg == "inputs":
+                                    node_inputs = parse_value(keyword.value)
+                                elif keyword.arg == "outputs":
+                                    node_outputs = parse_value(keyword.value)
+                                elif keyword.arg == "name":
+                                    node_name = parse_value(keyword.value)
+                                elif keyword.arg == "tags":
+                                    node_tags = parse_value(keyword.value)
+                                elif keyword.arg == "confirms":
+                                    node_confirms = parse_value(keyword.value)
+                                elif keyword.arg == "namespace":
+                                    node_namespace = parse_value(keyword.value)
+
+                            # Create Node
+                            kedro_node = Node(
+                                func=node_func,
+                                inputs=node_inputs,
+                                outputs=node_outputs,
+                                name=node_name,
+                                tags=node_tags,
+                                confirms=node_confirms,
+                                namespace=node_namespace,
+                            )
+
+                            self.nodes.append(kedro_node)
+
+            # These will be used for modular pipeline creation
+            self.inputs = pipeline_inputs
+            self.outputs = pipeline_outputs
+            self.namespace = pipeline_namespace
+            self.parameters = pipeline_parameters
+            self.tags = pipeline_tags
+
+        self.generic_visit(node)
+
+
+# Helper functions
+def parse_value(keyword_value):
+    """Helper to parse values assigned to node/pipeline properties"""
+    if isinstance(keyword_value, ast.Constant):
+        if not keyword_value.value:
+            return None
+        return str(keyword_value.value)
+    elif isinstance(keyword_value, (ast.List, ast.Set)):
+        return [parse_value(elt) for elt in keyword_value.elts]
+    elif isinstance(keyword_value, ast.Dict):
+        return {
+            parse_value(k): parse_value(v)
+            for k, v in zip(keyword_value.keys, keyword_value.values)
+        }
+    elif isinstance(keyword_value, ast.ListComp):
+        # [TODO: For list comprehensions, complex case handling]
+        # [Example can be found under demo_project/pipelines/modelling]
+        return f"ListComp({ast.dump(keyword_value)})"
+    elif isinstance(keyword_value, ast.DictComp):
+        # [TODO: For dict comprehensions, complex case handling]
+        # [Example can be found under demo_project/pipelines/modelling]
+        return f"DictComp({ast.dump(keyword_value)})"
+    elif isinstance(keyword_value, ast.FormattedValue):
+        # [TODO: For formatted strings i.e., single formatted fields,
+        # complex case handling]
+        # [Example can be found under demo_project/pipelines/modelling]
+        return f"FormattedValue({ast.dump(keyword_value)})"
+    elif isinstance(keyword_value, ast.JoinedStr):
+        # [TODO: For joined strings i.e., multiple formatted fields,
+        # complex case handling]
+        # [Example can be found under demo_project/pipelines/modelling]
+        return f"JoinedStr({ast.dump(keyword_value)})"
+    elif isinstance(keyword_value, ast.Name):
+        # [TODO: For variable references, complex case handling]
+        # [Example can be found under demo_project/pipelines/modelling]
+        return f"Variable({ast.dump(keyword_value)})"
+    else:
+        # [TODO: For any other complex case handling]
+        return f"Unsupported({ast.dump(keyword_value)})"
+
+
+# [WIP: Naive parsing and exploring pipelines. Not sure of any better way for now]
+def parse_project(project_path: Path) -> Dict[str, Pipeline]:
+    # Result
+    pipelines: Dict[str, Pipeline] = defaultdict(dict)
+
+    # Loop through all the .py files in the kedro project
+    # and start locating create_pipeline
+    for filepath in project_path.rglob("*.py"):
+        with open(filepath, "r") as file:
+            file_content = file.read()
+
+        # parse file content using ast
+        parsed_content_ast_node = ast.parse(file_content)
+
+        # extract pipeline name from file path
+        pipeline_name = filepath.relative_to(project_path).parent.name
+
+        # Locate pipelines (tested for only 1 create_pipeline per pipeline file)
+        # [TODO: confirm with Kedro team if more than 1 create_pipeline existence]
+        kedro_pipeline_locator = KedroPipelineLocator()
+        kedro_pipeline_locator.visit(parsed_content_ast_node)
+        located_pipeline = kedro_pipeline_locator.pipeline
+
+        # add to the result if a pipeline is located
+        if located_pipeline:
+            pipelines[pipeline_name] = located_pipeline
+
+    # foolproof to have at least 1 pipeline
+    # so the UI won't break
+    if len(pipelines.keys()):
+        # creating a default pipeline
+        pipelines["__default__"] = sum(pipelines.values())
+    else:
+        pipelines["__default__"] = Pipeline(nodes=[])
+
+    return pipelines
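Note on `lite_parser.py`: the parser walks each file's syntax tree instead of importing it, so the project's dependencies never need to be installed. The following is a minimal standalone sketch of the same idea using only the standard library. `NodeCallCollector`, `literal_or_source`, and `SAMPLE_SOURCE` are hypothetical names for illustration; the sketch records plain dictionaries rather than building Kedro `Node` objects, and it assumes Python 3.9+ for `ast.unparse`.

```python
import ast
from typing import Any, Dict, List

# Sample pipeline file; it is only parsed, never executed,
# so the kedro import inside it does not need to resolve.
SAMPLE_SOURCE = '''
from kedro.pipeline import node, pipeline

def create_pipeline(**kwargs):
    return pipeline(
        [
            node(func=clean, inputs="raw", outputs="clean_data", name="clean_node"),
            node(func=train, inputs=["clean_data", "params:lr"], outputs="model"),
        ],
        namespace="demo",
    )
'''


def literal_or_source(value: ast.expr) -> Any:
    """Return the Python literal for `value`, or its source text if it is not a literal."""
    try:
        return ast.literal_eval(value)
    except (ValueError, TypeError, SyntaxError):
        # Variables, calls, comprehensions, f-strings and so on.
        return ast.unparse(value)


class NodeCallCollector(ast.NodeVisitor):
    """Collect keyword arguments of each `node(...)` call inside `create_pipeline`."""

    def __init__(self) -> None:
        self.nodes: List[Dict[str, Any]] = []
        self._inside_create_pipeline = False

    def visit_FunctionDef(self, fn: ast.FunctionDef) -> None:
        # Only descend into functions named `create_pipeline`.
        if fn.name == "create_pipeline":
            self._inside_create_pipeline = True
            self.generic_visit(fn)
            self._inside_create_pipeline = False

    def visit_Call(self, call: ast.Call) -> None:
        if (
            self._inside_create_pipeline
            and isinstance(call.func, ast.Name)
            and call.func.id == "node"
        ):
            self.nodes.append(
                {kw.arg: literal_or_source(kw.value) for kw in call.keywords if kw.arg}
            )
        # Keep walking so nested calls are also seen.
        self.generic_visit(call)


collector = NodeCallCollector()
collector.visit(ast.parse(SAMPLE_SOURCE))
for entry in collector.nodes:
    print(entry)
```

Falling back to the expression's source text for non-literal values is a design choice of this sketch, not what the PR does; the PR instead returns tagged strings such as `Variable(...)` and `Unsupported(...)` for those cases.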
7 changes: 7 additions & 0 deletions package/kedro_viz/launchers/cli.py
@@ -116,6 +116,11 @@ def viz(ctx):  # pylint: disable=unused-argument
     help=PARAMS_ARG_HELP,
     callback=_split_params,
 )
+@click.option(
+    "--lite",
+    is_flag=True,
+    help="A flag to load an experimental light-weight Kedro Viz",
+)
 # pylint: disable=import-outside-toplevel, too-many-locals
 def run(
     host,
@@ -128,6 +133,7 @@ def run(
     autoreload,
     include_hooks,
     params,
+    lite,
 ):
     """Launch local Kedro Viz instance"""
     from kedro_viz.server import run_server
@@ -171,6 +177,7 @@ def run(
             "include_hooks": include_hooks,
             "package_name": PACKAGE_NAME,
             "extra_params": params,
+            "is_lite": lite,
         }
         if autoreload:
             run_process_kwargs = {