Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed join_docs.py concatenate #5970

Merged
merged 47 commits into from
Oct 16, 2023
Merged

fixed join_docs.py concatenate #5970

merged 47 commits into from
Oct 16, 2023

Conversation

nickprock
Copy link
Contributor

Related Issues

Proposed Changes:

I merged the two lists keeping only the document with the highest score

How did you test it?

Manual verification.

Checklist

@nickprock nickprock requested a review from a team as a code owner October 4, 2023 18:20
@nickprock nickprock requested review from ZanSara and removed request for a team October 4, 2023 18:20
@github-actions github-actions bot added topic:pipeline type:documentation Improvements on the docs labels Oct 4, 2023
@ZanSara
Copy link
Contributor

ZanSara commented Oct 9, 2023

Hey @nickprock! When fixing issues and modifying the behavior of some node, we normally ask contributors to add automated tests to verify that their changes behave as expected.

In this case, the tests for JoinDocuments live in tests/nodes/test_join_documents.py.

You just need to add at the end of the file a piece of code similar to this:

@pytest.mark.unit
@pytest.mark.parametrize("join_mode", ["concatenate", "merge", "reciprocal_rank_fusion"])
def test_joindocuments_keep_only_highest_ranking_duplicate(join_mode):
    inputs = [
        {"documents": [Document(content="text document 1", content_type="text", score=0.2)]},
        {"documents": [Document(content="text document 2", content_type="text", score=0.3)]},
        {"documents": [Document(content="text document 2", content_type="text", score=0.7)]},
    ]
    expected_outputs = [
        {"documents": [Document(content="text document 1", content_type="text", score=0.2)]},
        {"documents": [Document(content="text document 2", content_type="text", score=0.7)]},
    ]

    join_docs = JoinDocuments(join_mode=join_mode)
    result, _ = join_docs.run(inputs)
    assert len(result["documents"]) == 2
    assert result["documents"] == expected_outputs

(note: I haven't tested this code, you might have to adjust it a bit).

Once the test is added this PR is ready for another review 🙂

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment about the release notes

@ZanSara
Copy link
Contributor

ZanSara commented Oct 9, 2023

Also, don't forget to update your branch every now and then, to make sure your PR is not too out of date with main.

@nickprock nickprock requested a review from ZanSara October 10, 2023 06:41
@nickprock
Copy link
Contributor Author

Hi @ZanSara I hope the PR is ok now.

Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last detail and we're ready to merge 😊

test/nodes/test_join_documents.py Outdated Show resolved Hide resolved
@nickprock
Copy link
Contributor Author

Hi @ZanSara,
I made all the required changes and all the tests were ok after this commit.
After branch alignment with main keeps giving me this error:

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

@anakin87
Copy link
Member

Hey, @nickprock! 👋

We have problems with main, which we are trying to solve in #6029.

When these problems are solved, we can take care of pushing this PR to the finish line...

@nickprock nickprock requested a review from ZanSara October 14, 2023 07:34
Copy link
Contributor

@ZanSara ZanSara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for your contribution!

@ZanSara ZanSara merged commit 32e87d3 into deepset-ai:main Oct 16, 2023
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JoinDocuments should use highest score when multiple retrievers recall the same document
4 participants