bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node #3170

JeffRisberg · 2022-09-06T03:14:31Z

Related Issues

fixes Join Nodes produce incorrect results if preceded by another JoinNode #3169

Proposed Changes:

There is a subtle bug in the pipeline execution code when a pipeline includes a join node followed by another join node.
We have a pipeline that has four retrievers. They are joined in pairs by two JoinDocuments nodes, followed by another JoinDocuments node that uses the results of the prior joins.

However, not all results from the retrievers are processed and returned by the final JoinDocuments node. Documents are lost.

The pipeline is built correctly, because all nodes are connected correctly in the DiGraph of class Pipeline.
However, the code at line 526 of pipelines/base.py builds up a list of inputs. It assumes that the parameters dict does not already have a key called "inputs" for the new node.

However, when a join node is called, it does have a parameter key called "inputs".
This value is returned in the output when the node executes.
Hence the second join node in the chain receives inputs that include the inputs of the prior node.
As a result, the number of inputs does not equal the number of weights in the join, and the documents are not joined together correctly.
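
To make the mechanism concrete, here is a minimal, self-contained sketch of the failure mode described above. It is a simplified model, not the actual Haystack code: `enqueue`, `join_output`, `final_join_inputs`, and the node name `"JoinFinal"` are hypothetical names chosen for illustration, and the queueing logic only assumes what is stated above (an existing "inputs" key is treated as the list of inputs already collected for the new node).

```python
from typing import Any, Dict, List


def enqueue(queue: Dict[str, Dict[str, Any]], node: str, node_output: Dict[str, Any]) -> None:
    """Queue a predecessor's output for `node` (hypothetical stand-in for the pipeline code)."""
    existing = queue.get(node)
    if existing is None:
        # First predecessor: its output is stored as-is.
        queue[node] = node_output
    elif "inputs" not in existing:
        # Second predecessor: start a list of inputs for the join node.
        queue[node] = {"inputs": [existing, node_output]}
    else:
        # Assumes "inputs" was created by the branch above. If the first
        # predecessor leaked its own "inputs", this appends to the wrong list.
        existing["inputs"].append(node_output)


def join_output(inputs: List[Dict[str, Any]], leak_inputs: bool) -> Dict[str, Any]:
    """Stand-in for a join node's output: merged documents, optionally echoing 'inputs'."""
    output: Dict[str, Any] = {"documents": [d for i in inputs for d in i["documents"]]}
    if leak_inputs:
        output["inputs"] = inputs  # the behaviour this PR removes
    return output


def final_join_inputs(leak_inputs: bool) -> List[Dict[str, Any]]:
    """Return the inputs that the final join node would receive."""
    retrievers = [{"documents": [f"doc{n}"]} for n in range(1, 5)]
    join_a = join_output([retrievers[0], retrievers[1]], leak_inputs)
    join_b = join_output([retrievers[2], retrievers[3]], leak_inputs)
    queue: Dict[str, Dict[str, Any]] = {}
    enqueue(queue, "JoinFinal", join_a)
    enqueue(queue, "JoinFinal", join_b)
    return queue["JoinFinal"]["inputs"]


print(len(final_join_inputs(leak_inputs=True)))   # inputs seen by the final join with the leak
print(len(final_join_inputs(leak_inputs=False)))  # inputs seen once "inputs" is not echoed
```

In this model the final join expects two inputs (one per upstream join), but with the leak it receives the first join's two retriever outputs plus the second join's output, so the count no longer matches a two-element weights list.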

How did you test it?

There is a test located at https://github.com/JeffRisberg/HaystackPipelineTest

Notes for the reviewer

I determined this by putting a breakpoint in the run() method of the JoinNode class and checking whether the inputs were correct.

The solution is at line 258 in nodes/base.py:

    # add "extra" args that were not used by the node and are not inputs
    for k, v in arguments.items():
        if k not in output.keys() and k != "inputs":
            output[k] = v
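
For readers who want to try the filtering step in isolation, the loop can be wrapped in a small standalone function. This is only a sketch: the function name `add_extra_args` and the sample dictionaries are assumptions made for illustration and do not appear in nodes/base.py.

```python
from typing import Any, Dict


def add_extra_args(arguments: Dict[str, Any], output: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `output` plus the unused arguments, excluding "inputs"."""
    merged = dict(output)  # copy so the caller's dict is not mutated
    for k, v in arguments.items():
        # Pass through anything the node did not produce itself,
        # except "inputs", which must not leak to the next node.
        if k not in merged and k != "inputs":
            merged[k] = v
    return merged


# Hypothetical arguments of a join node and the output it produced.
arguments = {"inputs": [{"documents": ["a"]}, {"documents": ["b"]}], "query": "q"}
output = {"documents": ["a", "b"]}
result = add_extra_args(arguments, output)
print(result)
```

The "query" argument is still forwarded to the next node, while "inputs" stays local to the node that consumed it.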

Checklist

[yes] I have read the contributors guidelines and the code of conduct
[yes] I have updated the related issue with new insights and changes
[yes] I added tests that demonstrate the correct behavior of the change
[yes] I've used the conventional commit convention for my PR title
[yes] I documented my code
[yes] I ran pre-commit hooks and fixed any issue

masci · 2022-09-10T17:08:51Z

@JeffRisberg apologies for the latency here. This is on my radar and I'll come back to you in a couple of days.

ZanSara · 2022-09-13T10:45:44Z

Hello @JeffRisberg, thank you for this tricky fix. Much appreciated!

I tested out your change and it seems safe to me. However, could you add some tests? They should be added to test/pipelines/test_pipeline.py. Once the tests are added and passing, I'll approve. Thanks again 😊

JeffRisberg · 2022-09-26T16:57:16Z

@ZanSara I have added a test case as requested, and all PR checks ran successfully in the last 24-48 hours. Is there anything else you need from me?

ZanSara · 2022-09-27T07:30:29Z

Hey @JeffRisberg ! Thanks for the ping, I lost sight of this PR. I'll review it shortly and be back with some feedback.

JeffRisberg · 2022-09-30T04:25:51Z

@ZanSara I have added a test case as requested, and all PR checks ran successfully in the last 24-48 hours. Is there anything else you need from me?

ZanSara

Looks good! Thank you and sorry again for the delay

don't send the list of inputs back as an output in the running of a n…

2159414

…ode.

JeffRisberg requested a review from a team as a code owner September 6, 2022 03:14

JeffRisberg requested review from masci and removed request for a team September 6, 2022 03:14

ZanSara requested review from ZanSara and removed request for masci September 13, 2022 15:01

ZanSara added type:bug Something isn't working topic:pipeline journey:advanced labels Sep 13, 2022

ZanSara self-assigned this Sep 14, 2022

JeffRisberg and others added 4 commits September 14, 2022 16:03

updated documentation

fd46229

Update pydoc-markdown.py

b370cc2

added test case for pipeline join fix

0ffd7a1

Merge branch 'main' into fix_input_processing_for_join_nodes

0c6be27

ZanSara approved these changes Sep 30, 2022

ZanSara merged commit ad8fbe5 into deepset-ai:main Sep 30, 2022

JeffRisberg deleted the fix_input_processing_for_join_nodes branch September 30, 2022 16:44
