langchain: Use map() and avoid "ValueError: max() arg is an empty sequence" in MergerRetriever #18679

Merged: 5 commits, Mar 28, 2024

n1lanjan
Contributor

@n1lanjan n1lanjan commented Mar 6, 2024

  • Issue: When an empty list is passed to MergerRetriever, it fails with "ValueError: max() arg is an empty sequence".

  • Description: We select retrievers dynamically and use MergerRetriever to merge their output. We hit this error when the retriever_docs list is empty. This change passes default=0 to max() so that an empty retriever_docs no longer raises. It also switches to map(), which is more than twice as fast as the current generator expression (benchmark below).

```
import timeit
# Sample retriever_docs with varying lengths of sublists
retriever_docs = [[i for i in range(j)] for j in range(1, 1000)]
# First code snippet
code1 = '''
max_docs = max(len(docs) for docs in retriever_docs)
'''
# Second code snippet
code2 = '''
max_docs = max(map(len, retriever_docs), default=0)
'''
# Benchmarking
time1 = timeit.timeit(stmt=code1, globals=globals(), number=10000)
time2 = timeit.timeit(stmt=code2, globals=globals(), number=10000)
# Output
print(f"Execution time for code snippet 1: {time1} seconds")
print(f"Execution time for code snippet 2: {time2} seconds")
```
  • Dependencies: none
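
The two timed statements in the description differ in both iteration style and the `default` argument. A minimal sketch that isolates the `map()` versus generator-expression comparison, passing `default=0` to both, might look like the following. The helper names `max_len_genexpr` and `max_len_map` are hypothetical, the repeat count is reduced to keep the run short, and timings vary by machine and Python version, so no expected numbers are shown.

```python
import timeit

# Same sample data as in the description: sublists of length 1..999.
retriever_docs = [list(range(j)) for j in range(1, 1000)]


def max_len_genexpr(docs_per_retriever):
    """Longest sublist length via a generator expression (hypothetical helper)."""
    return max((len(docs) for docs in docs_per_retriever), default=0)


def max_len_map(docs_per_retriever):
    """Longest sublist length via map() (hypothetical helper)."""
    return max(map(len, docs_per_retriever), default=0)


if __name__ == "__main__":
    # Time both helpers on identical input so only the iteration style differs.
    for fn in (max_len_genexpr, max_len_map):
        elapsed = timeit.timeit(lambda: fn(retriever_docs), number=1_000)
        print(f"{fn.__name__}: {elapsed:.3f} seconds")
```

Both helpers return the same value for any input, including an empty list, so whatever difference shows up in the timings comes from the iteration style alone.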

Adding a default 0 for cases when retriever_docs is empty list to avoid "ValueError: max() arg is an empty sequence"
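
To illustrate why the default matters, here is a minimal sketch of a round-robin merge built around the same `max(map(len, ...), default=0)` expression. The function name `merge_round_robin` and the use of plain strings in place of documents are assumptions for illustration; this is not the library's code.

```python
from typing import List, Sequence


def merge_round_robin(retriever_docs: Sequence[Sequence[str]]) -> List[str]:
    """Interleave results: first item of each list, then second of each, and so on."""
    merged: List[str] = []
    # default=0 keeps max() from raising ValueError when retriever_docs is empty.
    max_docs = max(map(len, retriever_docs), default=0)
    for i in range(max_docs):
        for docs in retriever_docs:
            if i < len(docs):
                merged.append(docs[i])
    return merged


print(merge_round_robin([["a1", "a2"], ["b1"]]))  # ['a1', 'b1', 'a2']
print(merge_round_robin([]))  # []

# Without a default, max() over an empty iterable raises ValueError.
# The exact message wording depends on the Python version.
try:
    max(len(docs) for docs in [])
except ValueError as exc:
    print(f"ValueError: {exc}")
```

With `default=0`, the outer loop runs zero times for empty input, so the merge returns an empty list instead of failing.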
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Mar 6, 2024
@dosubot dosubot bot added Ɑ: retriever Related to retriever module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Mar 6, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Mar 28, 2024
@baskaryan baskaryan merged commit 239dd7c into langchain-ai:master Mar 28, 2024
41 checks passed
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…pty sequence" in MergerRetriever (langchain-ai#18679)

hinthornw pushed a commit that referenced this pull request Apr 26, 2024
…pty sequence" in MergerRetriever (#18679)
