Test: Show current pipeline run issues (DO NOT MERGE) #8695
base: main
Conversation
return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
This test is a minor variation on the test above. The only change is swapping the insertion order of the document creators. As a consequence, the order of documents in the DocumentJoiner's output is swapped as well.
This behavior is consistent, but as far as I know it is not documented anywhere, and it is very surprising for users to see their pipeline's output change just because they changed the insertion order of their components.
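To make this concrete, here is a small self-contained sketch, using my own stand-in component rather than the test's ConditionalDocumentCreator, that builds the same graph twice and swaps only the add_component order; under the behavior described above, the joined documents come out in the opposite order:

```python
from typing import List

from haystack import Document, Pipeline, component
from haystack.components.joiners import DocumentJoiner


@component
class DocumentCreator:
    """Stand-in for the test's ConditionalDocumentCreator: emits one document on demand."""

    def __init__(self, content: str):
        self._content = content

    @component.output_types(documents=List[Document])
    def run(self, create_document: bool = False):
        return {"documents": [Document(content=self._content)] if create_document else []}


def build(swap_insertion_order: bool) -> Pipeline:
    pipe = Pipeline(max_runs_per_component=1)
    creators = [
        ("first_creator", DocumentCreator(content="First document")),
        ("second_creator", DocumentCreator(content="Second document")),
    ]
    if swap_insertion_order:
        creators.reverse()  # only the add_component order changes, not the graph
    for name, instance in creators:
        pipe.add_component(name, instance)
    pipe.add_component("joiner", DocumentJoiner())
    pipe.connect("first_creator.documents", "joiner.documents")
    pipe.connect("second_creator.documents", "joiner.documents")
    return pipe


inputs = {"first_creator": {"create_document": True}, "second_creator": {"create_document": True}}
for swapped in (False, True):
    result = build(swapped).run(inputs)
    print([doc.content for doc in result["joiner"]["documents"]])
```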
"joiner", | ||
"agent_llm", | ||
"router", | ||
"answer_builder" |
In this test, rag_prompt and rag_llm run again even though they did not receive any input from the cycle during this iteration. As a result, a PipelineMaxComponentRuns exception is raised because they attempt to run a third time. With correct run logic, the expected behavior would be 2 iterations.
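As a side note, here is a minimal self-contained sketch of how exceeding max_runs_per_component surfaces. It is not the PR's test but an artificial never-ending cycle built from BranchJoiner and a pass-through component, and it assumes PipelineMaxComponentRuns is importable from haystack.core.errors:

```python
from haystack import Pipeline, component
from haystack.components.joiners import BranchJoiner
from haystack.core.errors import PipelineMaxComponentRuns  # assumed import path


@component
class Echo:
    """Passes its input straight through, so the cycle never terminates on its own."""

    @component.output_types(value=int)
    def run(self, value: int):
        return {"value": value}


pipe = Pipeline(max_runs_per_component=2)
pipe.add_component("joiner", BranchJoiner(int))
pipe.add_component("echo", Echo())
pipe.connect("joiner.value", "echo.value")
pipe.connect("echo.value", "joiner.value")

try:
    pipe.run({"joiner": {"value": 1}})
except PipelineMaxComponentRuns as exc:
    # Raised as soon as a component in the cycle is scheduled for a third run.
    print(exc)
```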
},
expected_run_order=[
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],
This test is very similar to the one above but without user input to the second PromptBuilder. The test fails because feedback_prompt and feedback_llm are executed a third time. feedback_prompt does not receive any inputs but still executes.
inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}},
expected_outputs={
    "answer_builder": {
        "answers": [GeneratedAnswer(data="valid code", query=task, documents=[])]
This test is very similar to the one above, but with the order of the connections changed:
+ pp.connect("concatenator.output", "code_prompt.feedback")
pp.connect("code_prompt.prompt", "code_llm.prompt")
pp.connect("code_llm.replies", "feedback_prompt.code")
pp.connect("feedback_llm.replies", "router.replies")
pp.connect("router.fail", "concatenator.feedback")
pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
pp.connect("router.pass", "answer_builder.replies")
pp.connect("code_llm.replies", "router.code")
pp.connect("code_llm.replies", "concatenator.current_prompt")
- pp.connect("concatenator.output", "code_prompt.feedback")
As a result, at least when I run the tests locally, the failure reason becomes nondeterministic. For some runs the expected run order differs from the actual run order, and for other runs the output is an answer containing "invalid code" instead of "valid code".
Looking into both our code and networkx, I don't understand how these different outcomes can occur.
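One thing that might help while debugging is dumping the edge order of the underlying graph. This sketch relies on pipeline.graph, Haystack's internal networkx MultiDiGraph (an implementation detail, not a public API), and on networkx preserving edge insertion order:

```python
from haystack import Pipeline


def dump_edge_order(pipeline: Pipeline) -> None:
    """Print the pipeline's connections in the order networkx stores them."""
    # If connect() order leaks into scheduling, it should show up here first.
    for sender, receiver, key in pipeline.graph.edges(keys=True):
        print(f"{sender} -> {receiver} ({key})")
```

Comparing this output for the two connection orders would at least show whether the graphs differ before the scheduler ever runs.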
)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():
This test intends to replicate #8657. However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error AttributeError("'NoneType' object has no attribute 'keys'"), but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.
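For context, the ValueError comes from BranchJoiner itself, which requires exactly one value on its variadic input per invocation. A minimal sketch of that check, calling the component directly and assuming its variadic input is passed as a list when invoked outside a pipeline:

```python
from haystack.components.joiners import BranchJoiner

joiner = BranchJoiner(str)

# Exactly one value: passed through unchanged.
print(joiner.run(value=["hello"]))  # {'value': 'hello'}

# Zero values: the situation the scheduler apparently puts the component in,
# which raises the ValueError quoted above.
joiner.run(value=[])
```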
Confirmed point 3 in the CI. Both tests yield different results on different runs. See here for an example:
Related Issues
Proposed Changes:
This PR's purpose is to highlight and discuss current issues with the Pipeline.run logic. It adds behavioral tests that demonstrate different failures of the method (some of them share the same underlying cause).
This is a first step towards resolving these issues and getting to a robust execution logic for our pipelines.
Findings
3. At least locally, some tests fail for different reasons with no changes to the code.
How did you test it?
Notes for the reviewer
Checklist
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.