[FEATURE] Adding default pre/post process function for neural search text embedding model #1304
Comments
Hi all, please create a doc issue or PR ASAP if this has doc implications for 2.11. Thanks.
Hi @hdhalter, I've created this doc issue: opensearch-project/documentation-website#5081.
@zane-neo - Can you please confirm that this feature has been moved to 2.12? Thanks.
@hdhalter From my knowledge, this is still a 2.11 feature.
This feature was released in 2.11.
Can you please update the release train? It is showing up in the 2.12 roadmap. Thanks!
Is your feature request related to a problem?
ml-commons has two default pre/post process functions, for OpenAI and Cohere, both written in Painless script. There is no default pre/post process function for the neural search text embedding case, so if a user wants to use neural search with a remote model to embed text, they have to write a complex (required) pre-process Painless script like this when creating the connector:
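The original Painless script is not reproduced in this issue. As a rough, hypothetical illustration, the logic such a pre-process script has to implement can be sketched in Python (the real script must be written in Painless; the `text_docs` and `input` names follow the connector convention described here):

```python
import json

def pre_process(text_docs):
    """Hypothetical Python equivalent of a pre-process Painless script:
    turn the neural search input text docs into the parameter map that
    substitutes placeholders in the connector request body."""
    # "input" holds the JSON-encoded list of input texts.
    return {"parameters": {"input": json.dumps(text_docs)}}

print(pre_process(["hello world", "how are you"]))
# → {'parameters': {'input': '["hello world", "how are you"]'}}
```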
This Painless script builds a parameter map that is used to substitute placeholders in the connector request body. A post-process function is not required, but to fit the code that extracts tensors in the neural search plugin, the user has to write a post-process function like this:

As the above examples show, writing either the pre-process or the post-process function is not an easy task.
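The post-process script is likewise not reproduced in this issue. As a hedged sketch, the logic it implements can be illustrated in Python; the tensor field names below (`name`, `data_type`, `shape`, `data`, `sentence_embedding`) are assumptions about the tensor structure the neural search plugin extracts, not taken from this issue:

```python
def post_process(embeddings):
    """Hypothetical Python equivalent of a post-process Painless script:
    wrap the model's two-dimensional embedding array into a per-text
    tensor structure that the neural search plugin can extract."""
    return [
        {
            "name": "sentence_embedding",  # assumed conventional tensor name
            "data_type": "FLOAT32",
            "shape": [len(vec)],
            "data": vec,
        }
        for vec in embeddings
    ]

print(post_process([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]))
```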
What solution would you like?
If the user follows the suggested formats on the model-serving side, including the model input and output data structures, it is possible to provide default pre/post process functions.
Suggested format
The suggested model input format is a list of strings. The suggested model output format is a two-dimensional array, where each inner element is the embedding of the corresponding input text.
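The concrete examples were not preserved in this issue; the following hypothetical values illustrate the suggested shapes (the embedding numbers are made up):

```python
# Suggested model input: a list of strings.
model_input = ["hello world", "how are you"]

# Suggested model output: a two-dimensional array, one inner array
# (embedding vector) per input text.
model_output = [
    [0.123, -0.456, 0.789],   # embedding of "hello world"
    [0.987, 0.654, -0.321],   # embedding of "how are you"
]

assert all(isinstance(t, str) for t in model_input)
assert len(model_output) == len(model_input)
```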
The suggested request body template is:
"request_body": "${parameters.input}"
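How this template is expanded can be sketched in Python: the `${parameters.input}` placeholder is replaced with the JSON-encoded list of input texts produced by the pre-process function (an illustration of the substitution, not the plugin's actual implementation):

```python
import json

# Hypothetical illustration of request-body placeholder substitution.
template = "${parameters.input}"
parameters = {"input": json.dumps(["hello world", "how are you"])}
request_body = template.replace("${parameters.input}", parameters["input"])
print(request_body)  # → ["hello world", "how are you"]
```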
Default process functions
With these premises in place, the user can use the default process functions connector.pre_process.neural_search.text_embedding and connector.post_process.neural_search.text_embedding instead of Painless scripts. The default pre-process function parses the neural search input text docs into the model input, and the default post-process function parses the model response into a ModelTensorOutput.

What alternatives have you considered?
NA
Do you have any additional context?
Add any other context or screenshots about the feature request here.