[ Bug ] Neural Search Leads to "modelId is marked non-null but is null" when Targeting Multiple Indices #759

Closed
imbarazz opened this issue May 24, 2024 · 13 comments

@imbarazz

Note: this error was observed on OpenSearch 2.11 running on AWS cloud.

Performing a neural search against an alias, or performing a multi-search with multiple indices in a single header, leads to the following error:

"null_pointer_exception: modelId is marked non-null but is null".

This is problematic when searching across different indices, each with their own embedding model.

Reproduction

Search Pipeline for Embedder Model 1

PUT /_search/pipeline/embed_pipeline_1

{
  "request_processors": [
    {
      "neural_query_enricher": {
        "neural_field_default_id": {
          "common_vector_field": "embed_model_id_1"
        }
      }
    }
  ]
}

Search Pipeline for Embedder Model 2

PUT /_search/pipeline/embed_pipeline_2

{
  "request_processors": [
    {
      "neural_query_enricher": {
        "neural_field_default_id": {
          "common_vector_field": "embed_model_id_2"
        }
      }
    }
  ]
}

Update Index 1 with Pipeline 1

PUT /index1/_settings

{
  "index.search.default_pipeline" : "embed_pipeline_1"
}

Update Index 2 with Pipeline 2

PUT /index2/_settings

{
  "index.search.default_pipeline" : "embed_pipeline_2"
}

Perform Multi-Search

GET /_msearch

{
  "index": [
    "index1",
    "index2"
  ]
}
{
  "query": {
    "neural": {
      "common_vector_field": {
        "query_text": "How do I perform a neural multi-search when dealing with multiple indices?",
        "k": 5
      }
    }
  },
  "from": 0,
  "size": 5
}
@imbarazz imbarazz added bug Something isn't working untriaged labels May 24, 2024
@dblock
Member

dblock commented Jun 24, 2024

Catch All Triage - 1 2 3 4 5 6

@dblock dblock removed the untriaged label Jun 24, 2024
@imbarazz
Author

imbarazz commented Jul 12, 2024

Playing around with this further, I've realized that the issue here is that the index.search.default_pipeline assigned to an index does not take effect when targeting indices via an alias. To get this to work, one must explicitly pass the search pipeline via a URL parameter, like so:

/my_alias/_search?search_pipeline=my_search_pipeline
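
For anyone assembling this URL in code, here is a minimal Python sketch of the workaround. The helper name `build_search_url` and the base URL `http://localhost:9200` are assumptions made for illustration; only the `search_pipeline` query parameter and the alias and pipeline names come from the example above.

```python
from urllib.parse import quote, urlencode


def build_search_url(base_url, target, pipeline):
    """Return the _search URL for `target` with an explicit search pipeline.

    `build_search_url` is a hypothetical helper, not part of any client library.
    """
    # Percent-encode the alias or index name; keep commas so that
    # comma-separated targets such as "index1,index2" stay readable.
    path = quote(target, safe=",")
    # search_pipeline is the query parameter described in the comment above.
    query = urlencode({"search_pipeline": pipeline})
    return f"{base_url.rstrip('/')}/{path}/_search?{query}"


# Assumed local endpoint; replace with the real cluster address.
url = build_search_url("http://localhost:9200", "my_alias", "my_search_pipeline")
print(url)
```

The resulting URL can be used with any HTTP client, which may help when a client library does not expose the parameter directly.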

@vibrantvarun
Member

@martin-gaievski I don't think this is a bug. Thoughts?

@martin-gaievski
Member

martin-gaievski commented Jul 15, 2024

If I understood this correctly, index.search.default_pipeline supports only single-index search, correct? If that's the case, then the enricher processor will work the same way, and this is expected. I do have doubts about the error message, though: we could return a more meaningful error, something actionable from the user's perspective.
We also need to state this clearly in the documentation for the enricher processor: https://opensearch.org/docs/latest/search-plugins/search-pipelines/neural-query-enricher/

@imbarazz
Author

If I understood this correctly, index.search.default_pipeline supports only single-index search, correct? If that's the case, then the enricher processor will work the same way, and this is expected.

Thanks for the reply.

Currently, an index's configured neural_query_enricher is completely ignored during a neural search against an alias. Is this expected behaviour? This seems strange to me.

@martin-gaievski
Member

martin-gaievski commented Jul 17, 2024

It's not expected behavior; it's more of a gap. The team didn't check this scenario.

Do these steps summarize the issue correctly, @imbarazz?

  • Create index index_A with a default model (e.g. model_id_1) assigned via the index.search.default_pipeline index setting.
  • Create alias alias_A with some filter, pointing to index_A.
  • Run a hybrid query with alias_A as the target index. Use a neural sub-query and do not specify a model ID.

Expected result: the hybrid query executes, and for the neural sub-query, model model_id_1 is picked up from the index setting of index_A.
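
The three steps could be written out as the following request sequence, sketched in Python as plain data. The pipeline name `embed_pipeline_1` and the field `common_vector_field` are reused from the reproduction at the top of this issue; the alias filter on a `status` field, the query text, and the helper name are hypothetical. Index mappings and pipeline creation are omitted, so this is an outline of the calls rather than a complete setup.

```python
import json


def alias_repro_requests(index="index_A", alias="alias_A", pipeline="embed_pipeline_1"):
    """Return (method, path, body) tuples for the three steps (hypothetical helper)."""
    # Step 1: create the index with a default search pipeline.
    create_index = (
        "PUT",
        f"/{index}",
        {"settings": {"index.search.default_pipeline": pipeline}},
    )
    # Step 2: create a filtered alias; the "status" filter is a made-up example.
    create_alias = (
        "POST",
        "/_aliases",
        {"actions": [{"add": {
            "index": index,
            "alias": alias,
            "filter": {"term": {"status": "published"}},
        }}]},
    )
    # Step 3: hybrid query against the alias; the neural sub-query carries no model_id.
    search = (
        "GET",
        f"/{alias}/_search",
        {"query": {"hybrid": {"queries": [
            {"neural": {"common_vector_field": {"query_text": "example question", "k": 5}}}
        ]}}},
    )
    return [create_index, create_alias, search]


for method, path, body in alias_repro_requests():
    print(method, path)
    print(json.dumps(body, indent=2))
```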

@Sylver11

Is this being picked up? To me this is a fatal one and the opposite of expected behaviour. It took me an hour to figure out what the problem was.

@navneet1v
Collaborator

@Romasato

Ran into this very same issue just today. I was hoping to get away from needing to pass the ML model ID in the search query.

As a workaround, I was planning to use the search_pipeline= query string parameter with the search request, but I could not find a way to pass this extra parameter via either the high-level or the low-level .NET client library.

Any suggestions?

@aalbahem

I encountered the same bug today. It was unexpected, and it would be beneficial to resolve it or prioritize it. Additionally, highlighting this issue in the documentation for now would be helpful.

@naveentatikonda naveentatikonda changed the title Neural Search Leads to "modelId is marked non-null but is null" when Targeting Multiple Indices [ Bug ] Neural Search Leads to "modelId is marked non-null but is null" when Targeting Multiple Indices Sep 18, 2024
@naveentatikonda naveentatikonda moved this from Backlog to Backlog (Hot) in Vector Search RoadMap Sep 18, 2024
@vibrantvarun
Member

Until now, search pipelines have not been supported with msearch. A PR adding this support has already been raised and merged by @owaiskazi19 in OpenSearch. Starting with the next OpenSearch release, search pipelines will be supported with msearch.

@vibrantvarun
Member

vibrantvarun commented Oct 28, 2024

@imbarazz From OpenSearch 2.18, you can explicitly pass a search pipeline in the _msearch API.

{"index":["my-nlp-index","my-nlp-index1"]}
{"query": {"hybrid": {"queries": [{"neural": {"common_vector_field": {"query_text": "How do I perform a neural multi-search when dealing with multiple indices?", "k": 5}}}]}}, "from": 0, "size": 10, "search_pipeline": "nlp-search-pipeline"}
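
For the original case of two indices with different embedding models, one possible way to apply this is to send one header/body pair per index, each naming its own pipeline. The Python sketch below only builds the newline-delimited payload. Splitting the search per index is an assumption rather than something confirmed in this thread, the helper names are hypothetical, and the index and pipeline names are taken from the reproduction at the top of the issue.

```python
import json


def neural_search(index, pipeline, query_text, k=5):
    """Build one (header, body) pair for _msearch (hypothetical helper)."""
    header = {"index": [index]}
    body = {
        "query": {"neural": {"common_vector_field": {"query_text": query_text, "k": k}}},
        "from": 0,
        "size": k,
        # Explicit pipeline in the body, as in the 2.18 example above.
        "search_pipeline": pipeline,
    }
    return header, body


def to_ndjson(pairs):
    """Serialize (header, body) pairs as newline-delimited JSON.

    Each JSON object occupies exactly one line, and the payload ends with a newline.
    """
    lines = []
    for header, body in pairs:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


question = "How do I perform a neural multi-search when dealing with multiple indices?"
payload = to_ndjson([
    neural_search("index1", "embed_pipeline_1", question),
    neural_search("index2", "embed_pipeline_2", question),
])
print(payload)
```

Note that _msearch returns one response per search, so with this layout the hits from index1 and index2 arrive separately and any merging is left to the caller.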

@vibrantvarun
Member

For multiple indices, you can now explicitly define the search pipeline. With search pipeline support enabled for msearch, this bug is resolved.

@github-project-automation github-project-automation bot moved this from Backlog (Hot) to ✅ Done in Vector Search RoadMap Oct 28, 2024