-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Classify pipeline's type based on its components #3132
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this issue addressed so quickly now. 👍 The PR is a good start and as discussed via slack, I would suggest to add a get_type()
method to the BaseStandardPipeline
class as well, which would internally just call self.pipeline.get_type()
.
Regarding the pipeline_types
it just came to my mind that an enum would be an alternative to the dictionary. We aren't expecting a pipeline to be of multiple types at the same type.
Something that might be interesting in addition is to distinguish whether the retrieval part of the different pipeline types is a sparse retriever or a dense retriever or both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment/question but overall LGTM
haystack/pipelines/base.py
Outdated
"RetrieverQuestionGenerationPipeline": lambda x: {"Retriever", "Question Generator"} <= set(x.keys()), | ||
"QuestionAnswerGenerationPipeline": lambda x: {"QuestionGenerator", "Reader"} <= set(x.keys()), | ||
"MostSimilarDocumentsPipeline": lambda x: len(x.values()) == 1 | ||
and any(comp for comp in x.values() if isinstance(comp, BaseDocumentStore)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I get this: if x.values()
is 1, why the comprehension comp for comp in x.values()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - thanks @masci - will correct it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR! Almost good to go, I think only get_type
needs to be slightly adapted as it contains cases that do not seem to be reachable.
haystack/pipelines/base.py
Outdated
"GenerativeQAPipeline": lambda x: {"Generator", "Retriever"} <= set(x.keys()), | ||
"FAQPipeline": lambda x: {"Docs2Answers"} <= set(x.keys()), | ||
"ExtractiveQAPipeline": lambda x: {"Reader", "Retriever"} <= set(x.keys()), | ||
"DocumentSearchPipeline": lambda x: {"Retriever"} <= set(x.keys()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Pipeline types "SearchSummarizationPipeline"
and "RetrieverQuestionGenerationPipeline"
are not reachable, given that here we define that each Pipeline that contains a "Retriever"
is a "DocumentSearchPipeline"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, great catch @bogdankostic
haystack/pipelines/base.py
Outdated
"SearchSummarizationPipeline": lambda x: {"Retriever", "Summarizer"} <= set(x.keys()), | ||
"TranslationWrapperPipeline": lambda x: {"InputTranslator", "OutputTranslator"} <= set(x.keys()), | ||
"QuestionGenerationPipeline": lambda x: {"QuestionGenerator"} <= set(x.keys()), | ||
"RetrieverQuestionGenerationPipeline": lambda x: {"Retriever", "Question Generator"} <= set(x.keys()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Question Generator"
should probably be "QuestionGenerator"
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, another one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and also moved to the back of the list
@@ -654,7 +654,7 @@ class RetrieverQuestionGenerationPipeline(BaseStandardPipeline): | |||
def __init__(self, retriever: BaseRetriever, question_generator: QuestionGenerator): | |||
self.pipeline = Pipeline() | |||
self.pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]) | |||
self.pipeline.add_node(component=question_generator, name="Question Generator", inputs=["Retriever"]) | |||
self.pipeline.add_node(component=question_generator, name="QuestionGenerator", inputs=["Retriever"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bogdankostic note this. Suprised this didn't cause any issues
) | ||
pipe.get_type().startswith("TranslationWrapperPipeline") | ||
|
||
# pipe = MostSimilarDocumentsPipeline(document_store=MockDocumentStore()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bogdankostic we have an issue with MostSimilarDocumentsPipeline
@bogdankostic added a unit test, and I found one subtle bug - MostSimilarDocumentsPipeline doesn't initialize the pipeline attribute. All pipeline methods that rely on this property will fail (draw, get_node, save_to_yaml, and, of course, get_type). We have to fix that issue! Today, wdyt? |
@bogdankostic I found instances of "Question Generator" used in tests. |
@masci would love to consult with you regarding this PR. By adding unit tests, I discovered some subtle bugs in MostSimilarDocumentsPipeline. See the discussion above. |
I see the problem. How would you propose fixing this? |
We'll fix it soon. Here is the issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge, we will follow up on the bug you found in a separate PR
Bogdan is unavailable today and we want to try shipping this with 1.9
* Add pipeline get_type mehod * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test
* Add pipeline get_type mehod * Add pipeline uptime * Add pipeline telemetry event sending * Send pipeline telemetry once a day (at most) * Add pipeline invocation counter, change invocation counter logic * Update allowed telemetry parameters - allow pipeline parameters * PR review: add unit test
Related Issues
The encompassing issue is https://github.com/deepset-ai/haystack-private/issues/9
Proposed Changes:
Add a method that classifies the type of the pipeline based on components found
How did you test it?
Repeated invocations from colab notebook on a branch where the update trigger is 2 min instead of 24 hours
Added a unit test for pipeline classification
Notes for the reviewer
Focus on
pipeline_invocation_counter
decorator in haystack/utils/reflection.py, how the even is triggered (time) and what details are sent to telemetry serverChecklist