Validate YAML files without loading the nodes #2438

Merged 80 commits on May 4, 2022
172c5e9
Remove BasePipeline and make a module for RayPipeline
ZanSara Apr 14, 2022
ee97934
Can load pipelines from yaml, plenty of issues left
ZanSara Apr 14, 2022
47f82df
Extract graph validation logic into _add_node_to_pipeline_graph & ref…
ZanSara Apr 20, 2022
c50ad91
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
0db7762
Merge branch 'master' into validate_yaml_without_loading
ZanSara Apr 20, 2022
4b978f4
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
9fc20a3
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
f521afe
Fix pipeline tests
ZanSara Apr 20, 2022
91c6276
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
2ed4eaa
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
ede24b5
Move some tests out of test_pipeline.py and create MockDenseRetriever
ZanSara Apr 20, 2022
a2d054c
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
af70ba9
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
0dbf466
mypy and pylint (silencing too-many-public-methods)
ZanSara Apr 20, 2022
fec9d02
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
b2ebd88
Fix issue found in some yaml files and in schema files
ZanSara Apr 20, 2022
3aa66a9
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
62fcf4a
Unused import
ZanSara Apr 20, 2022
11cb865
Remove duplicate test
ZanSara Apr 20, 2022
b11ea05
Fix paths to YAML and fix some typos in Ray
ZanSara Apr 20, 2022
6b51db2
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
e0af829
Fix eval tests
ZanSara Apr 20, 2022
2b0126a
Update Documentation & Code Style
github-actions[bot] Apr 20, 2022
6bdc9f1
Simplify MockDenseRetriever
ZanSara Apr 20, 2022
6a12508
Typo
ZanSara Apr 20, 2022
92d0fcc
Fix Ray test
ZanSara Apr 20, 2022
7be796a
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 20, 2022
95f1c6f
Accidentally pushed merge conflict, fixed
ZanSara Apr 20, 2022
f310fd4
Typo in schemas
ZanSara Apr 21, 2022
5c65da3
Typo in _json_schema.py
ZanSara Apr 21, 2022
32c21da
Slightly reduce noisiness of version validation warnings
ZanSara Apr 21, 2022
f568d44
Update Documentation & Code Style
github-actions[bot] Apr 21, 2022
f44de23
Fix version logs tests
ZanSara Apr 21, 2022
53a5187
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 21, 2022
89872b7
Fix version logs tests again
ZanSara Apr 21, 2022
5a378b8
remove seemingly unused file
ZanSara Apr 21, 2022
da06f2e
Add check and test to avoid adding the same node to the pipeline twice
ZanSara Apr 26, 2022
68a3720
Update Documentation & Code Style
github-actions[bot] Apr 26, 2022
636b195
Revert config to pipeline_config
ZanSara Apr 26, 2022
6c47016
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara Apr 26, 2022
1585212
Remove unused import
ZanSara Apr 26, 2022
08c2530
Merge branch 'master' into validate_yaml_without_loading
ZanSara Apr 26, 2022
29727f2
Complete reverting to pipeline_config
ZanSara Apr 26, 2022
92f29b5
Some more stray config=
ZanSara Apr 26, 2022
657e630
Update Documentation & Code Style
github-actions[bot] Apr 26, 2022
1077eba
Feedback
ZanSara May 2, 2022
a0e1e81
Move back other_nodes tests into pipeline tests temporarily
ZanSara May 2, 2022
e8dcd1a
Update Documentation & Code Style
github-actions[bot] May 2, 2022
99883b3
Fixing tests
ZanSara May 2, 2022
9d885fe
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 2, 2022
23739d2
Update Documentation & Code Style
github-actions[bot] May 2, 2022
207d4f9
Fixing ray and standard pipeline tests
ZanSara May 2, 2022
1561a2d
Rename colliding load() methods in dense retrievers and faiss
ZanSara May 2, 2022
121d8c6
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 2, 2022
1c67df6
Update Documentation & Code Style
github-actions[bot] May 2, 2022
7f8148e
Fix mypy on ray.py as well
ZanSara May 2, 2022
22e13d1
Add check for no root node
ZanSara May 2, 2022
079cdb5
Fix tests to use load_from_directory and load_index
ZanSara May 2, 2022
9c94689
Try to workaround the disabled add_node of RayPipeline
ZanSara May 2, 2022
cc6c17f
Update Documentation & Code Style
github-actions[bot] May 2, 2022
95045f7
Fix Ray test
ZanSara May 3, 2022
c994f71
Fix FAISS tests
ZanSara May 3, 2022
fb3504c
Relax class check in _add_node_to_pipeline_graph
ZanSara May 3, 2022
90b2a51
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 3, 2022
099112a
Update Documentation & Code Style
github-actions[bot] May 3, 2022
b0ec2ed
Try to fix mypy in ray.py
ZanSara May 3, 2022
9e95cce
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 3, 2022
fbc67b8
unused import
ZanSara May 3, 2022
8c22014
Try another fix for Ray
ZanSara May 3, 2022
e19c541
Fix connector tests
ZanSara May 3, 2022
c997a03
Update Documentation & Code Style
github-actions[bot] May 3, 2022
a0aa7be
Fix ray
ZanSara May 3, 2022
6a7eaaf
Update Documentation & Code Style
github-actions[bot] May 3, 2022
b4ee7ba
use BaseComponent.load() in pipelines/base.py
ZanSara May 3, 2022
6a1c48a
another round of feedback
ZanSara May 4, 2022
47462a5
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 4, 2022
5e590a1
stray BaseComponent.load()
ZanSara May 4, 2022
2be673d
Update Documentation & Code Style
github-actions[bot] May 4, 2022
28dd934
Fix FAISS tests too
ZanSara May 4, 2022
1cc52ed
Merge branch 'validate_yaml_without_loading' of github.com:deepset-ai…
ZanSara May 4, 2022
264 changes: 72 additions & 192 deletions docs/_src/api/api/pipelines.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/_src/api/pydoc/pipelines.yml
@@ -1,7 +1,7 @@
loaders:
- type: python
search_path: [../../../../haystack/pipelines]
modules: ['base', 'standard_pipelines']
modules: ['base', 'ray', 'standard_pipelines']
ignore_when_discovered: ['__init__']
processors:
- type: filter
4 changes: 2 additions & 2 deletions haystack/document_stores/faiss.py
@@ -180,8 +180,8 @@ def _validate_index_sync(self):
# used when creating the original FAISS index
if not self.get_document_count() == self.get_embedding_count():
raise ValueError(
"The number of documents present in the SQL database does not "
"match the number of embeddings in FAISS. Make sure your FAISS "
f"The number of documents present in the SQL database ({self.get_document_count()}) does not "
f"match the number of embeddings in FAISS ({self.get_embedding_count()}). Make sure your FAISS "
"configuration file correctly points to the same database that "
"was used when creating the original index."
)
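The change above interpolates both counts into the error message, so a mismatch can be diagnosed from the exception text alone. A minimal standalone sketch of the same check, assuming the two counts arrive as plain integers (the function name `check_index_sync` is hypothetical and not part of Haystack):

```python
def check_index_sync(document_count: int, embedding_count: int) -> None:
    """Raise ValueError if the two counts differ, reporting both values."""
    if document_count != embedding_count:
        raise ValueError(
            f"The number of documents present in the SQL database ({document_count}) does not "
            f"match the number of embeddings in FAISS ({embedding_count}). Make sure your FAISS "
            "configuration file correctly points to the same database that "
            "was used when creating the original index."
        )


# Matching counts pass silently; a mismatch raises with both numbers in the message.
check_index_sync(3, 3)
```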
2 changes: 1 addition & 1 deletion haystack/json-schemas/haystack-pipeline-1.0.0.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/json-schemas/haystack-pipeline-1.0.0.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.0.0.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
2 changes: 1 addition & 1 deletion haystack/json-schemas/haystack-pipeline-1.1.0.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/json-schemas/haystack-pipeline-1.1.0.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.1.0.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
2 changes: 1 addition & 1 deletion haystack/json-schemas/haystack-pipeline-1.2.0.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/json-schemas/haystack-pipeline-1.2.0.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.2.0.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
2 changes: 1 addition & 1 deletion haystack/json-schemas/haystack-pipeline-1.3.0.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/haystack/json-schemas/haystack-pipeline-1.2.1rc0.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.2.1rc0.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
2 changes: 1 addition & 1 deletion haystack/json-schemas/haystack-pipeline-master.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/haystack/json-schemas/haystack-pipeline-master.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-master.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
12 changes: 6 additions & 6 deletions haystack/json-schemas/haystack-pipeline.schema.json
@@ -1,6 +1,6 @@
{
"$schema": "http://json-schema.org/draft-07/schema",
"$id": "https://haystack.deepset.ai/json-schemas/haystack-pipeline.schema.json",
"$id": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline.schema.json",
"title": "Haystack Pipeline",
"description": "Haystack Pipeline YAML file describing the nodes of the pipelines. For more info read the docs at: https://haystack.deepset.ai/components/pipelines#yaml-file-definitions",
"type": "object",
@@ -15,7 +15,7 @@
}
},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/haystack-pipeline-master.schema.json"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-master.schema.json"
}
]
},
@@ -29,7 +29,7 @@
}
},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/haystack-pipeline-1.0.0.schema.json"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.0.0.schema.json"
}
]
},
@@ -43,7 +43,7 @@
}
},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/haystack-pipeline-1.1.0.schema.json"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.1.0.schema.json"
}
]
},
@@ -57,7 +57,7 @@
}
},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/haystack-pipeline-1.2.0.schema.json"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.2.0.schema.json"
}
]
},
@@ -71,7 +71,7 @@
}
},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/haystack-pipeline-1.3.0.schema.json"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/haystack-pipeline-1.3.0.schema.json"
}
]
}
4 changes: 2 additions & 2 deletions haystack/nodes/_json_schema.py
@@ -28,7 +28,7 @@


JSON_SCHEMAS_PATH = Path(__file__).parent.parent.parent / "haystack" / "json-schemas"
SCHEMA_URL = "https://haystack.deepset.ai/haystack/json-schemas/"
SCHEMA_URL = "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/"

# Allows accessory classes (like enums and helpers) to be registered as valid input for
# custom node's init parameters. For now we disable this feature, but flipping this variables
@@ -351,7 +351,7 @@ def update_json_schema(destination_path: Path = JSON_SCHEMAS_PATH):
"allOf": [
{"properties": {"version": {"const": haystack_version}}},
{
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/json-schemas/"
"$ref": "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/"
f"haystack-pipeline-{haystack_version}.schema.json"
},
]
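The hunk above pairs a `version` constant with a `$ref` to the schema file for that version. A minimal sketch of how such an entry could be assembled, using the base URL from the diff; the helper name `versioned_schema_entry` and the constant name `SCHEMA_BASE_URL` are hypothetical and only illustrate the string construction:

```python
from typing import Any, Dict

# Base URL as it appears in the diff above; the constant name is an assumption.
SCHEMA_BASE_URL = "https://raw.githubusercontent.com/deepset-ai/haystack/master/haystack/json-schemas/"


def versioned_schema_entry(haystack_version: str) -> Dict[str, Any]:
    """Build the `allOf` entry that ties a version string to its schema file."""
    return {
        "allOf": [
            # First condition: the YAML must declare exactly this version.
            {"properties": {"version": {"const": haystack_version}}},
            # Second condition: the YAML must satisfy the schema for that version.
            {"$ref": f"{SCHEMA_BASE_URL}haystack-pipeline-{haystack_version}.schema.json"},
        ]
    }


entry = versioned_schema_entry("1.3.0")
print(entry["allOf"][1]["$ref"])
```

Keeping the base URL in one place means that a path change like the one in this diff (adding the `haystack/` segment) only has to be made once.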
46 changes: 17 additions & 29 deletions haystack/nodes/base.py
@@ -1,5 +1,5 @@
from __future__ import annotations
from typing import Any, Optional, Dict, List, Tuple
from typing import Any, Optional, Dict, List, Tuple, Type

from copy import deepcopy
from abc import ABC, abstractmethod
@@ -109,47 +109,24 @@ def get_params(self, return_defaults: bool = False) -> Dict[str, Any]:
return params

@classmethod
def get_subclass(cls, component_type: str):
def get_subclass(cls, component_type: str) -> Type[BaseComponent]:
if component_type not in cls._subclasses.keys():
raise PipelineSchemaError(f"Haystack component with the name '{component_type}' not found.")
subclass = cls._subclasses[component_type]
return subclass

@classmethod
def load_from_args(cls, component_type: str, **kwargs):
def _create_instance(cls, component_type: str, component_params: Dict[str, Any]):
"""
Load a component instance of the given type using the kwargs.
Returns an instance of the given subclass of BaseComponent.

:param component_type: name of the component class to load.
:param kwargs: parameters to pass to the __init__() for the component.
:param component_params: parameters to pass to the __init__() for the component.
"""
subclass = cls.get_subclass(component_type)
instance = subclass(**kwargs)
instance = subclass(**component_params)
return instance

@classmethod
def load_from_pipeline_config(cls, pipeline_config: dict, component_name: str):
"""
Load an individual component from a YAML config for Pipelines.

:param pipeline_config: the Pipelines YAML config parsed as a dict.
:param component_name: the name of the component to load.
"""
if pipeline_config:
all_component_configs = pipeline_config["components"]
all_component_names = [comp["name"] for comp in all_component_configs]
component_config = next(comp for comp in all_component_configs if comp["name"] == component_name)
component_params = component_config["params"]

for key, value in component_params.items():
if value in all_component_names: # check if the param value is a reference to another component
component_params[key] = cls.load_from_pipeline_config(pipeline_config, value)

component_instance = cls.load_from_args(component_config["type"], **component_params)
else:
component_instance = cls.load_from_args(component_name)
return component_instance

@abstractmethod
def run(
self,
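The hunk above keeps `get_subclass` as a lookup by class name and replaces `load_from_args` with `_create_instance`, which takes the init parameters as a dict. The diff does not show how `_subclasses` is populated, so the sketch below assumes a registry filled through `__init_subclass__`. The names `Component`, `SchemaError` and `Retriever` are hypothetical stand-ins, not the Haystack classes:

```python
from typing import Any, Dict, Type


class SchemaError(Exception):
    """Stand-in for PipelineSchemaError (hypothetical, for this sketch only)."""


class Component:
    # Maps class name -> class; filled in automatically as subclasses are defined.
    _subclasses: Dict[str, Type["Component"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._subclasses[cls.__name__] = cls

    @classmethod
    def get_subclass(cls, component_type: str) -> Type["Component"]:
        if component_type not in cls._subclasses:
            raise SchemaError(f"Component with the name '{component_type}' not found.")
        return cls._subclasses[component_type]

    @classmethod
    def _create_instance(cls, component_type: str, component_params: Dict[str, Any]) -> "Component":
        # Look the class up by name, then pass the params dict to its __init__().
        subclass = cls.get_subclass(component_type)
        return subclass(**component_params)


class Retriever(Component):
    def __init__(self, top_k: int = 10) -> None:
        self.top_k = top_k


retriever = Component._create_instance("Retriever", {"top_k": 5})
```

Because `get_subclass` returns the class without calling it, a caller can check that a component type exists, or inspect its signature, without constructing the node.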
@@ -251,3 +228,14 @@ def _get_signature(cls) -> Dict[str, inspect.Parameter]:
for param_key, parameter in inspect.signature(class_).parameters.items()
}
return component_signature


class RootNode(BaseComponent):
"""
RootNode feeds inputs together with corresponding params to a Pipeline.
"""

outgoing_edges = 1

def run(self): # type: ignore
return {}, "output_1"
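`RootNode.run()` returns an empty output dict together with the name of its single outgoing edge. A minimal sketch of how a caller might consume that pair; `RootNodeSketch` mirrors the class in the diff, while the `step` function is a hypothetical runner that is not taken from Haystack:

```python
from typing import Any, Dict, Tuple


class RootNodeSketch:
    """Stand-in mirroring the RootNode in the diff; not the Haystack class itself."""

    outgoing_edges = 1

    def run(self) -> Tuple[Dict[str, Any], str]:
        # Produces no output of its own; it only names the edge to follow.
        return {}, "output_1"


def step(node: RootNodeSketch, inputs: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Hypothetical runner step: merge the node's output into the running inputs."""
    output, edge = node.run()
    return {**inputs, **output}, edge


state, edge = step(RootNodeSketch(), {"query": "Who wrote Faust?"})
```

Under this assumption the root node leaves the inputs unchanged and simply hands them to whatever is connected to `output_1`.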
3 changes: 2 additions & 1 deletion haystack/pipelines/__init__.py
@@ -1,4 +1,5 @@
from haystack.pipelines.base import Pipeline, RootNode, RayPipeline
from haystack.pipelines.base import Pipeline, RootNode
from haystack.pipelines.ray import RayPipeline
from haystack.pipelines.standard_pipelines import (
BaseStandardPipeline,
DocumentSearchPipeline,