Classify pipeline's type based on its components #3132

vblagoje · 2022-09-01T14:30:12Z

Related Issues

The encompassing issue is https://github.com/deepset-ai/haystack-private/issues/9

Proposed Changes:

Add a method that classifies the type of the pipeline based on components found

How did you test it?

Repeated invocations from a Colab notebook on a branch where the update trigger is 2 minutes instead of 24 hours
Added a unit test for pipeline classification

Notes for the reviewer

Focus on the pipeline_invocation_counter decorator in haystack/utils/reflection.py, how the event is triggered (by time), and what details are sent to the telemetry server
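
For reviewers who want a mental model of a time-gated invocation counter, here is a minimal sketch. It is not the code in haystack/utils/reflection.py: the names invocation_counter_sketch, send_event, and sent_events are hypothetical, the event is only recorded in a local list, and sending on the first call is an assumption made for the example.

```python
import functools
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for a telemetry client: events are only stored locally.
sent_events: List[Dict[str, Any]] = []


def send_event(payload: Dict[str, Any]) -> None:
    sent_events.append(payload)


def invocation_counter_sketch(min_interval_s: float, clock: Callable[[], float] = time.monotonic):
    """Count calls to the wrapped function and emit at most one event per interval."""

    def decorator(func):
        count = 0
        last_sent: Optional[float] = None

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal count, last_sent
            count += 1
            now = clock()
            # Assumption: send on the first call, then only once the interval has elapsed.
            if last_sent is None or now - last_sent >= min_interval_s:
                send_event({"function": func.__name__, "invocations": count})
                last_sent = now
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Usage with a fake clock, mirroring the 2-minute trigger used for testing.
fake_now = [0.0]


@invocation_counter_sketch(min_interval_s=120, clock=lambda: fake_now[0])
def run(query: str) -> str:
    return query.upper()


run("a")           # first call: event sent with invocations == 1
run("b")           # still inside the interval: counted, nothing sent
fake_now[0] = 121.0
run("c")           # interval elapsed: event sent with invocations == 3
```

Injecting the clock keeps the interval logic testable without waiting for real time to pass.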

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added tests that demonstrate the correct behaviour of the change
I've used the conventional commit convention for my PR title
I documented my code
I ran pre-commit hooks and fixed any issue

julian-risch

Great to see this issue addressed so quickly. 👍 The PR is a good start. As discussed via Slack, I would suggest also adding a get_type() method to the BaseStandardPipeline class, which would internally just call self.pipeline.get_type().
Regarding pipeline_types, it just came to my mind that an enum would be an alternative to the dictionary. We aren't expecting a pipeline to be of multiple types at the same time.
Something that might be interesting in addition is to distinguish whether the retrieval part of the different pipeline types uses a sparse retriever, a dense retriever, or both.
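
The two suggestions above (an enum for the type, and a delegating get_type() on BaseStandardPipeline) could look roughly like the following sketch. The classes here are simplified stand-ins, not the haystack implementations, and the PipelineType enum with its member names and the "Unknown pipeline" value is hypothetical.

```python
from enum import Enum


class PipelineType(Enum):
    # Hypothetical enum; values reuse pipeline class names quoted in this PR.
    EXTRACTIVE_QA = "ExtractiveQAPipeline"
    DOCUMENT_SEARCH = "DocumentSearchPipeline"
    UNKNOWN = "Unknown pipeline"


class Pipeline:
    """Stand-in for haystack's Pipeline that only tracks components by name."""

    def __init__(self):
        self.components = {}

    def add_node(self, component, name, inputs):
        self.components[name] = component

    def get_type(self) -> PipelineType:
        names = set(self.components)
        # More specific combinations are checked before more general ones.
        if {"Reader", "Retriever"} <= names:
            return PipelineType.EXTRACTIVE_QA
        if {"Retriever"} <= names:
            return PipelineType.DOCUMENT_SEARCH
        return PipelineType.UNKNOWN


class BaseStandardPipeline:
    pipeline: Pipeline

    def get_type(self) -> PipelineType:
        # Delegate to the wrapped pipeline, as suggested in the review.
        return self.pipeline.get_type()


class ExtractiveQAPipeline(BaseStandardPipeline):
    def __init__(self, reader, retriever):
        self.pipeline = Pipeline()
        self.pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
        self.pipeline.add_node(component=reader, name="Reader", inputs=["Retriever"])


qa_type = ExtractiveQAPipeline(reader=object(), retriever=object()).get_type()
```

An enum makes the "exactly one type" expectation explicit, since get_type() can only return a single member.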

masci

Left a comment/question but overall LGTM

masci · 2022-09-06T09:33:34Z

haystack/pipelines/base.py

+            "RetrieverQuestionGenerationPipeline": lambda x: {"Retriever", "Question Generator"} <= set(x.keys()),
+            "QuestionAnswerGenerationPipeline": lambda x: {"QuestionGenerator", "Reader"} <= set(x.keys()),
+            "MostSimilarDocumentsPipeline": lambda x: len(x.values()) == 1
+            and any(comp for comp in x.values() if isinstance(comp, BaseDocumentStore)),


I'm not sure I get this: if len(x.values()) is 1, why the comprehension comp for comp in x.values()?

Good point - thanks @masci - will correct it
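
One possible simplification of the predicate discussed here is sketched below, assuming the intent is "exactly one component, and it is a document store". BaseDocumentStore is a local stand-in class, and is_most_similar_documents_pipeline is a hypothetical helper name; the correction that was actually committed may differ.

```python
class BaseDocumentStore:
    """Stand-in for haystack's BaseDocumentStore."""


class MockDocumentStore(BaseDocumentStore):
    pass


def is_most_similar_documents_pipeline(components: dict) -> bool:
    # With exactly one component there is nothing to iterate over:
    # unpack the single value and check its type directly.
    if len(components) != 1:
        return False
    (only_component,) = components.values()
    return isinstance(only_component, BaseDocumentStore)


single_store = is_most_similar_documents_pipeline({"DocumentStore": MockDocumentStore()})
single_other = is_most_similar_documents_pipeline({"Retriever": object()})
```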

bogdankostic

Nice PR! Almost good to go, I think only get_type needs to be slightly adapted as it contains cases that do not seem to be reachable.

bogdankostic · 2022-09-19T15:08:02Z

haystack/pipelines/base.py

+            "GenerativeQAPipeline": lambda x: {"Generator", "Retriever"} <= set(x.keys()),
+            "FAQPipeline": lambda x: {"Docs2Answers"} <= set(x.keys()),
+            "ExtractiveQAPipeline": lambda x: {"Reader", "Retriever"} <= set(x.keys()),
+            "DocumentSearchPipeline": lambda x: {"Retriever"} <= set(x.keys()),


I think the Pipeline types "SearchSummarizationPipeline" and "RetrieverQuestionGenerationPipeline" are not reachable, given that here we define that each Pipeline that contains a "Retriever" is a "DocumentSearchPipeline".

+1, great catch @bogdankostic
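
The reachability problem can be reproduced with a small sketch. It assumes the classification returns the first matching entry in dictionary order; the classify helper and the fallback string are hypothetical, and only the two predicates are taken from the diff above.

```python
from typing import Callable, Dict, Iterable, Set

Predicate = Callable[[Set[str]], bool]


def classify(names: Iterable[str], rules: Dict[str, Predicate]) -> str:
    """Return the first pipeline type whose predicate matches (dict order is insertion order)."""
    name_set = set(names)
    for pipeline_type, matches in rules.items():
        if matches(name_set):
            return pipeline_type
    return "Unknown pipeline type"


# General rule first: any pipeline containing a Retriever stops here.
general_first: Dict[str, Predicate] = {
    "DocumentSearchPipeline": lambda n: {"Retriever"} <= n,
    "SearchSummarizationPipeline": lambda n: {"Retriever", "Summarizer"} <= n,
}

# Specific rule first: the summarization pipeline becomes reachable.
specific_first: Dict[str, Predicate] = {
    "SearchSummarizationPipeline": lambda n: {"Retriever", "Summarizer"} <= n,
    "DocumentSearchPipeline": lambda n: {"Retriever"} <= n,
}

names = ["Retriever", "Summarizer"]
shadowed = classify(names, general_first)    # "DocumentSearchPipeline"
reachable = classify(names, specific_first)  # "SearchSummarizationPipeline"
```

Ordering rules from most specific to most general keeps every entry reachable under a first-match strategy.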

bogdankostic · 2022-09-19T15:08:51Z

haystack/pipelines/base.py

+            "SearchSummarizationPipeline": lambda x: {"Retriever", "Summarizer"} <= set(x.keys()),
+            "TranslationWrapperPipeline": lambda x: {"InputTranslator", "OutputTranslator"} <= set(x.keys()),
+            "QuestionGenerationPipeline": lambda x: {"QuestionGenerator"} <= set(x.keys()),
+            "RetrieverQuestionGenerationPipeline": lambda x: {"Retriever", "Question Generator"} <= set(x.keys()),


"Question Generator" should probably be "QuestionGenerator" here?

+1, another one

Yes, and also moved to the back of the list
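
Because the predicates compare node names as exact strings, a single space is enough to make the subset check fail. A short illustration, where the registered node names are a hypothetical example:

```python
required = {"Retriever", "Question Generator"}    # names used in the predicate
registered = {"Retriever", "QuestionGenerator"}   # hypothetical node names in a pipeline

names_match = required <= registered              # False: the names differ by a space
missing = required - registered                   # {"Question Generator"}
```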

vblagoje · 2022-09-20T09:07:18Z

haystack/pipelines/standard_pipelines.py

@@ -654,7 +654,7 @@ class RetrieverQuestionGenerationPipeline(BaseStandardPipeline):
    def __init__(self, retriever: BaseRetriever, question_generator: QuestionGenerator):
        self.pipeline = Pipeline()
        self.pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
-        self.pipeline.add_node(component=question_generator, name="Question Generator", inputs=["Retriever"])
+        self.pipeline.add_node(component=question_generator, name="QuestionGenerator", inputs=["Retriever"])


@bogdankostic note this. Surprised this didn't cause any issues.

vblagoje · 2022-09-20T09:07:46Z

test/pipelines/test_pipeline.py

+    )
+    pipe.get_type().startswith("TranslationWrapperPipeline")
+
+    # pipe = MostSimilarDocumentsPipeline(document_store=MockDocumentStore())


@bogdankostic we have an issue with MostSimilarDocumentsPipeline

vblagoje · 2022-09-20T09:08:20Z

@bogdankostic added a unit test, and I found one subtle bug - MostSimilarDocumentsPipeline doesn't initialize the pipeline attribute. All pipeline methods that rely on this property will fail (draw, get_node, save_to_yaml, and, of course, get_type). We have to fix that issue! Today, wdyt?
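
The failure mode described above can be sketched with stand-in classes. This assumes the base class only delegates to self.pipeline and never assigns it; the class names InitializedPipeline and UninitializedPipeline and the try_get_type helper are hypothetical.

```python
class Pipeline:
    """Stand-in for haystack's Pipeline."""

    def get_type(self) -> str:
        return "stand-in type"


class BaseStandardPipeline:
    pipeline: Pipeline  # annotation only; subclasses are expected to assign it

    def get_type(self) -> str:
        return self.pipeline.get_type()


class InitializedPipeline(BaseStandardPipeline):
    def __init__(self):
        self.pipeline = Pipeline()


class UninitializedPipeline(BaseStandardPipeline):
    def __init__(self, document_store):
        # Stores its dependency but never sets self.pipeline.
        self.document_store = document_store


def try_get_type(pipe: BaseStandardPipeline) -> str:
    try:
        return pipe.get_type()
    except AttributeError as exc:
        return f"AttributeError: {exc}"


ok_result = try_get_type(InitializedPipeline())
broken_result = try_get_type(UninitializedPipeline(document_store=None))
```

Every delegating method fails the same way, which is why the missing attribute affects draw, get_node, save_to_yaml, and get_type alike.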

vblagoje · 2022-09-20T10:00:39Z

@bogdankostic I found instances of "Question Generator" used in tests.

vblagoje · 2022-09-20T11:13:58Z

@masci would love to consult with you regarding this PR. By adding unit tests, I discovered some subtle bugs in MostSimilarDocumentsPipeline. See the discussion above.

bogdankostic · 2022-09-20T13:40:41Z

> @bogdankostic added a unit test, and I found one subtle bug - MostSimilarDocumentsPipeline doesn't initialize the pipeline attribute. All pipeline methods that rely on this property will fail (draw, get_node, save_to_yaml, and, of course, get_type). We have to fix that issue! Today, wdyt?

I see the problem. How would you propose fixing this?

vblagoje · 2022-09-21T09:56:30Z

> > @bogdankostic added a unit test, and I found one subtle bug - MostSimilarDocumentsPipeline doesn't initialize the pipeline attribute. All pipeline methods that rely on this property will fail (draw, get_node, save_to_yaml, and, of course, get_type). We have to fix that issue! Today, wdyt?
>
> I see the problem. How would you propose fixing this?

We'll fix it soon. Here is the issue

masci

Let's merge, we will follow up on the bug you found in a separate PR

Bogdan is unavailable today and we want to try shipping this with 1.9

* Add pipeline get_type method
* Add pipeline uptime
* Add pipeline telemetry event sending
* Send pipeline telemetry once a day (at most)
* Add pipeline invocation counter, change invocation counter logic
* Update allowed telemetry parameters - allow pipeline parameters
* PR review: add unit test

vblagoje requested review from a team as code owners September 1, 2022 14:30

vblagoje requested review from masci and removed request for a team September 1, 2022 14:30

julian-risch reviewed Sep 1, 2022

agnieszka-m approved these changes Sep 2, 2022

masci reviewed Sep 6, 2022

bogdankostic previously requested changes Sep 19, 2022

vblagoje added 17 commits September 20, 2022 09:20

Add pipeline get_type method

8ccb106

Add pipeline uptime

13daf77

Add pipeline telemetry event sending

2281837

Send pipeline telemetry once a day (at most)

875ff33

Fix docs

69943c4

Fix mypy

62c716f

Fix docs

b7096d0

Add pipeline invocation counter, change invocation counter logic

629853f

Update docs

4229c57

Update allowed telemetry parameters - allow pipeline parameters

ffeeab0

Pipeline uptime: round it to the closest second

a1cda50

Pipeline uptime: docs

c6592cc

PR review - Massi's correction

a5613fd

PR review - small docs correction

a995e24

Fix PR failures

4e44ada

PR review: add Bogdan's corrections

d7eb14f

PR review: add unit test

bb6c81b

vblagoje commented Sep 20, 2022

PR review: fix failing unit tests after QuestionGenerator concatenation

e41d6e0

PR review: fix failing unit tests after QuestionGenerator concatenation pt2

ece51f6

masci approved these changes Sep 21, 2022

vblagoje merged commit 938e6fd into deepset-ai:main Sep 21, 2022

vblagoje deleted the classify_pipe branch October 24, 2022 08:21
