Test: Show current pipeline run issues (DO NOT MERGE) #8695

Open · wants to merge 5 commits into base: main
Conversation

mathislucka (Member)

Related Issues

Proposed Changes:

This PR's purpose is to highlight and discuss current issues with the Pipeline.run logic.
It adds behavioral tests that demonstrate different failures of the method (some of them share the same underlying cause).
This is a first step towards resolving these issues and arriving at robust execution logic for our pipelines.

Findings

  1. The insertion order of components and connections influences the component execution order, the number of times a component is executed, and the pipeline outputs.
  2. Components with only optional inputs run even in cases where they received no input from connected components and have already run (see the sketch after this list).
  3. At least locally, some tests fail for different reasons across runs, with no changes to the code.
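
A minimal sketch of the kind of component affected by finding 2, assuming Haystack's @component decorator API; the component name and its single input are hypothetical, chosen only to show that every input has a default and is therefore optional:

    from typing import Optional

    from haystack import component

    @component
    class OptionalGreeter:
        """Illustration only: a component whose single input is optional."""

        @component.output_types(greeting=str)
        def run(self, name: Optional[str] = None):
            # Because every input is optional, the current run logic can consider
            # this component runnable even when no connected component sent it
            # anything, so it may be scheduled again after it already ran.
            return {"greeting": f"Hello, {name or 'nobody'}!"}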

How did you test it?

  • behavioral tests

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))

mathislucka (Member, Author) commented:

This test is a minor variation on the test above. The only change is swapping the insertion order of the document creators. As a consequence, the order of the documents in the DocumentJoiner's output is swapped as well.

This behavior is consistent, but as far as I know it is not documented anywhere, and it is very surprising for a user that the output of their pipeline changes just because they changed the insertion order of their components.
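
A minimal, self-contained sketch of the setup described here; DocumentCreator is a simplified stand-in for the PR's ConditionalDocumentCreator test helper (without its condition input), and DocumentJoiner is Haystack's built-in joiner:

    from typing import List

    from haystack import Document, Pipeline, component
    from haystack.components.joiners import DocumentJoiner

    @component
    class DocumentCreator:
        """Stand-in for the PR's ConditionalDocumentCreator test helper."""

        def __init__(self, content: str):
            self._content = content

        @component.output_types(documents=List[Document])
        def run(self):
            return {"documents": [Document(content=self._content)]}

    pipeline = Pipeline(max_runs_per_component=1)
    # Swapping the next two add_component calls swaps the order of the documents
    # in the joiner's output, even though the connections stay identical.
    pipeline.add_component("first_creator", DocumentCreator(content="First document"))
    pipeline.add_component("second_creator", DocumentCreator(content="Second document"))
    pipeline.add_component("joiner", DocumentJoiner())
    pipeline.connect("first_creator.documents", "joiner.documents")
    pipeline.connect("second_creator.documents", "joiner.documents")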

"joiner",
"agent_llm",
"router",
"answer_builder"

mathislucka (Member, Author) commented:

In this test, rag_prompt and rag_llm run again even though they did not receive any input from the cycle during this iteration. As a result, a PipelineMaxComponentRuns exception is raised because they attempt to run a third time. With correct run logic, the expected behavior would be two iterations.
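
For reference, a sketch of how this failure mode surfaces in a test, assuming PipelineMaxComponentRuns is importable from haystack.core.errors and that pipeline and inputs come from the behavioral test discussed here:

    import pytest

    from haystack.core.errors import PipelineMaxComponentRuns

    # rag_prompt / rag_llm are scheduled a third time even though the cycle sent
    # them no new input; with correct run logic the pipeline would finish after
    # two iterations instead of hitting the per-component run limit.
    with pytest.raises(PipelineMaxComponentRuns):
        pipeline.run(inputs)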

},
expected_run_order=[
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],

mathislucka (Member, Author) commented:

This test is very similar to the one above but without user input to the second PromptBuilder. The test will fail because feedback_prompt and feedback_llm are executed a third time. feedback_prompt does not receive any inputs but still executes.

inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}},
expected_outputs={
"answer_builder": {
"answers": [GeneratedAnswer(data="valid code", query=task, documents=[])]

mathislucka (Member, Author) commented:

This test is very similar to the one above, but the order of the connections is changed:

    + pp.connect("concatenator.output", "code_prompt.feedback")
    pp.connect("code_prompt.prompt", "code_llm.prompt")
    pp.connect("code_llm.replies", "feedback_prompt.code")
    pp.connect("feedback_llm.replies", "router.replies")
    pp.connect("router.fail", "concatenator.feedback")
    pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
    pp.connect("router.pass", "answer_builder.replies")
    pp.connect("code_llm.replies", "router.code")
    pp.connect("code_llm.replies", "concatenator.current_prompt")
    - pp.connect("concatenator.output", "code_prompt.feedback")

As a result, at least when I run the tests locally, the failure reason becomes nondeterministic. For some test runs the actual run order differs from the expected run order, and for other runs the output is an Answer containing "invalid code" instead of "valid code".

Looking into both our code and networkx, I don't understand how these different outcomes can occur.
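
One way to characterize the nondeterminism is to rebuild and run the pipeline repeatedly and count the distinct outcomes. A rough sketch, assuming a hypothetical build_pipeline() helper that reconstructs the pipeline with the connection order shown above:

    from collections import Counter

    outcomes = Counter()
    for _ in range(50):
        pp = build_pipeline()  # hypothetical helper: rebuilds the pipeline above
        result = pp.run({"code_prompt": {"task": task}, "answer_builder": {"query": task}})
        # Reduce each run to the generated answer's text ("valid code" vs. "invalid code").
        outcomes[result["answer_builder"]["answers"][0].data] += 1

    print(outcomes)  # more than one key means the behavior is nondeterministic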

)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():

mathislucka (Member, Author) commented:

This test intends to replicate: #8657

However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error, AttributeError("'NoneType' object has no attribute 'keys'"), but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.
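
If the goal is only to pin down the currently observed failure modes while the underlying bug is investigated, the assertion could be widened. A sketch, assuming pipeline and inputs come from the agent_with_feedback_cycle fixture:

    import pytest

    # Locally the run fails either with AttributeError ("'NoneType' object has no
    # attribute 'keys'") or with ValueError ("BranchJoiner expects only one input,
    # but 0 were received."), so both are accepted until the run logic is fixed.
    with pytest.raises((AttributeError, ValueError)):
        pipeline.run(inputs)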

@mathislucka mathislucka marked this pull request as ready for review January 9, 2025 14:49
@mathislucka mathislucka requested a review from a team as a code owner January 9, 2025 14:49
@mathislucka mathislucka requested review from vblagoje and removed request for a team January 9, 2025 14:49
@julian-risch julian-risch requested review from Amnah199 and julian-risch and removed request for vblagoje January 9, 2025 14:49

coveralls (Collaborator) commented Jan 9, 2025

Pull Request Test Coverage Report for Build 12693854002

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 9 unchanged lines in 3 files lost coverage.
  • Overall coverage remained the same at 91.094%

Files with Coverage Reduction                   New Missed Lines    Coverage %
components/converters/openapi_functions.py      1                   78.63%
marshal/yaml.py                                 1                   95.24%
document_stores/in_memory/document_store.py     7                   96.03%
Totals Coverage Status
  • Change from base Build 12689025274: 0.0%
  • Covered Lines: 8653
  • Relevant Lines: 9499

💛 - Coveralls
