Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: JoinDocuments nodes produce incorrect results if preceded by another JoinDocuments node #3170

Merged

Conversation

JeffRisberg
Copy link
Contributor

@JeffRisberg JeffRisberg commented Sep 6, 2022

Related Issues

Proposed Changes:

There is subtle bug in the pipeline execution code when a pipeline includes a joinNode followed by another joinNode.
We have a pipeline has four retrievers. They are joined by two pair of JoinDocuments nodes, followed by another JoinDocuments node that uses the results of the prior joins.

However, not all results from the retrievers are processed and returned by the final JoinDocuments node. Documents are lost

The pipeline is built correctly, because all nodes are connected correctly in the DiGraph of class Pipeline.
However, the code at line 526 of pipelines/base.py, builds up a list of inputs. It assumes that the parameters dict does not have a key called "inputs" for the new node.

However, when a joinNode is called, it does have parameter key called "inputs".
This value is returned from execution of the node.
Hence for the second node in the chain, it will receive inputs which include the inputs from the prior node.
Hence the number of inputs is not equal to the number of weights in the join, and the documents are not joined together correctly.

How did you test it?

There is a test located at https://github.com/JeffRisberg/HaystackPipelineTest

Notes for the reviewer

I determined this by putting a breakpoint into the run() method of the JoinNode class, and checking that the inputs are correct.

The solution is at line 258 in nodes/base.py
# add "extra" args that were not used by the node and are not inputs
for k, v in arguments.items():
if k not in output.keys() and k != "inputs":
output[k] = v

Checklist

@JeffRisberg JeffRisberg requested a review from a team as a code owner September 6, 2022 03:14
@JeffRisberg JeffRisberg requested review from masci and removed request for a team September 6, 2022 03:14
@masci
Copy link
Contributor

masci commented Sep 10, 2022

@JeffRisberg apologies for the latency here, this is on my radar I'll come back to you in a couple of days.

@ZanSara
Copy link
Contributor

ZanSara commented Sep 13, 2022

Hello @JeffRisberg, thank you for this tricky fix. Much appreciated!

I tested out your change and it seems safe to me. However, could you add some tests? They should be added to test/pipelines/test_pipeline.py. Once the tests are added and passing, I'll approve. Thanks again 😊

@ZanSara ZanSara requested review from ZanSara and removed request for masci September 13, 2022 15:01
@ZanSara ZanSara self-assigned this Sep 14, 2022
@JeffRisberg
Copy link
Contributor Author

@ZanSara I have added test case as requested, and all PR checks ran successful in the last 24-48 hours. Is there anything else you need from me?

@ZanSara
Copy link
Contributor

ZanSara commented Sep 27, 2022

Hey @JeffRisberg ! Thanks for the ping, I lost sight of this PR. I'll review it shortly and be back with some feedback.

@JeffRisberg
Copy link
Contributor Author

@ZanSara I have added test case as requested, and all PR checks ran successful in the last 24-48 hours. Is there anything else you need from me?

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you and sorry again for the delay

@ZanSara ZanSara merged commit ad8fbe5 into deepset-ai:main Sep 30, 2022
@JeffRisberg JeffRisberg deleted the fix_input_processing_for_join_nodes branch September 30, 2022 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic:pipeline type:bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Join Nodes produce incorrect results if preceded by another JoinNode
3 participants