refactor: simplify Summarizer, add Document Merger #3452
As expected, this refactoring broke a few things, which I tried to fix. 😄 @ZanSara @brandenchan @bogdankostic please feel free to jump in and help steer this in the right direction...
Looking great so far! I don't think you need more tests for this node right now, given we're only removing functionality. I found a couple of leftover prints and have just one question.
In addition, let's add the DocumentMerger in this same PR. This way we can re-introduce some of the removed tests with the new setup (docs merger + summarizer) and make sure no functionality is lost.
@ZanSara thanks! When I have some time, I'll make the changes you suggested... Just some questions:
So, the We might want to deal with the possibility of receiving input from multiple nodes. That's a messy topic though, so if you see that it becomes intractable and makes the node complex, ignore that. If anyone needs to handle multiple inputs, they can use
Now that's a great question 😄 My intuition says that we should apply the following heuristic:
Mind that keys might contain nested values. I'd apply this heuristic recursively on dictionary entries. Let's also take care to keep "important" keys like
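The heuristic itself is not spelled out in this thread, so the following is only a sketch of one possible reading: keep a meta key when every document carries the same value for it, and recurse into nested dictionaries. The function name `common_meta` and the "keep only what agrees" rule are assumptions made for illustration, not the code that ended up in the PR.

```python
from typing import Any, Dict, List


def common_meta(metas: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: keep only keys whose values agree in every meta dict."""
    if not metas:
        return {}
    first, rest = metas[0], metas[1:]
    merged: Dict[str, Any] = {}
    for key, value in first.items():
        # A key missing from any document cannot be "common".
        if not all(key in other for other in rest):
            continue
        others = [other[key] for other in rest]
        if isinstance(value, dict) and all(isinstance(o, dict) for o in others):
            # Nested values: apply the same rule recursively.
            nested = common_meta([value, *others])
            if nested:
                merged[key] = nested
        elif all(o == value for o in others):
            merged[key] = value
    return merged


metas = [
    {"name": "a.txt", "lang": "en", "src": {"site": "x", "page": 1}},
    {"name": "b.txt", "lang": "en", "src": {"site": "x", "page": 2}},
]
print(common_meta(metas))
```

With this rule, keys that differ between documents (here `name` and `src.page`) are dropped, which is why "important" keys would need separate handling.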
By introducing Document Merger, I think we would offer the same feature in a different and more structured way. @vblagoje @ZanSara Before investing time in this PR, I'll wait for your opinions!
Hey @anakin87! I'm going to review this PR now. Two new test suites were introduced a moment ago; they just need another rebase and they'll pass 👍
Great progress! That's precisely how I imagined it.
A couple of things to note:
- I think I found a heavy simplification of the meta fields merging algorithm and I've suggested it. Test it out to see whether it works as expected or whether I forgot something!
- We need to make sure the documentation for DocumentMerger is generated. To do so, let's add document_merger to this list: modules: ['docs2answers', 'join_docs', 'join_answers', 'route_documents']
@ZanSara thanks for the great review!!!
Any ideas?
Hey @anakin87 sorry for this CI issue. We're working on it!
Hi @ZanSara, I see that thanks to your contributions the tests are passing now! Can you explain to me very quickly what you did?
Sure! This was an OOM (out-of-memory) error on the Windows GH runner. It's not the first time we've seen them... What it means is that the machine simply doesn't have enough RAM left for all tests. What I did was reduce the memory footprint of the tests in two ways:
Unfortunately the test suites are very heavy and Windows runners tend to collapse rather fast. If that happens again, reducing the model size will probably help, but don't be afraid to ask for help if that's not enough. That's our fault after all! 😄
Related Issues
Proposed Changes:
As discussed in #3403
generate_single_summary parameter: currently it transforms several documents into one (it is better to design a dedicated node for the purpose of merging documents)
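To illustrate the split being proposed, here is a rough sketch in plain Python that keeps merging and summarizing as two separate steps. None of this is Haystack's API: `merge_documents`, `summarize_all`, and `toy_summarize` (which just keeps the first sentence) are hypothetical stand-ins, and documents are reduced to plain strings.

```python
from typing import Callable, List


def merge_documents(contents: List[str], separator: str = " ") -> str:
    """Hypothetical merger step: join non-empty document texts into one."""
    return separator.join(c.strip() for c in contents if c.strip())


def summarize_all(contents: List[str], summarize: Callable[[str], str]) -> List[str]:
    """Hypothetical summarizer step: always one summary per input document."""
    return [summarize(c) for c in contents]


def toy_summarize(text: str) -> str:
    # Stand-in for a real model: keep only the first sentence.
    return text.split(".")[0].strip() + "."


docs = [
    "Berlin is the capital of Germany. It has many museums.",
    "Paris is the capital of France. It lies on the Seine.",
]

# Summarizer alone: one summary per document.
per_doc = summarize_all(docs, toy_summarize)

# Merger followed by summarizer: a single summary over all documents.
single = summarize_all([merge_documents(docs)], toy_summarize)

print(per_doc)
print(single)
```

The point of the design is that the summarizer no longer needs a flag to decide how many outputs it produces; whoever wants a single summary places a merging step in front of it.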
How did you test it?
Notes for the reviewer
Just a first draft to understand what this change breaks
Checklist