Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSearch top k parameter fix #5216

Conversation

dev2049
Copy link
Contributor

@dev2049 dev2049 commented May 24, 2023

For most queries it's the size parameter that determines final number of documents to return. Since our abstractions refer to this as k, set this to be k everywhere instead of expecting a separate param. Would be great to have someone more familiar with OpenSearch validate that this is reasonable (e.g. that having size and what OpenSearch calls k be the same won't lead to any strange behavior). cc @naveentatikonda

Closes #5212

@dev2049 dev2049 requested a review from hwchase17 May 24, 2023 22:36
@naveentatikonda
Copy link
Contributor

@dev2049 size and k are different parameters. k is the number of neighbors the search of each graph will return. You must also include the size option, which indicates how many results the query actually returns.

@dev2049
Copy link
Contributor Author

dev2049 commented May 24, 2023

@dev2049 size and k are different parameters. k is the number of neighbors the search of each graph will return. You must also include the size option, which indicates how many results the query actually returns.

right, but in LangChain k refers to total num docs to return. happy to add an optional knn_top_k parameter as well and pass that in

@naveentatikonda
Copy link
Contributor

naveentatikonda commented May 24, 2023

@dev2049 size and k are different parameters. k is the number of neighbors the search of each graph will return. You must also include the size option, which indicates how many results the query actually returns.

right, but in LangChain k refers to total num docs to return. happy to add an optional knn_top_k parameter as well and pass that in

People might get confused if we introduce new parameter. I guess that should be fine let's go with what you have already implemented in this PR.

@dev2049 dev2049 merged commit 3be9ba1 into master May 25, 2023
@dev2049 dev2049 deleted the 5212-opensearch-vectorstore-cannot-return-more-than-4-retrieved-result branch May 25, 2023 16:51
@danielchalef danielchalef mentioned this pull request Jun 5, 2023
Undertone0809 pushed a commit to Undertone0809/langchain that referenced this pull request Jun 19, 2023
For most queries it's the `size` parameter that determines final number
of documents to return. Since our abstractions refer to this as `k`, set
this to be `k` everywhere instead of expecting a separate param. Would
be great to have someone more familiar with OpenSearch validate that
this is reasonable (e.g. that having `size` and what OpenSearch calls
`k` be the same won't lead to any strange behavior). cc @naveentatikonda

Closes langchain-ai#5212
This was referenced Jun 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OpenSearch VectorStore cannot return more than 4 retrieved result.
2 participants