Add field for warnings (and maybe debug/info?) to SearchResponse #6794

Open
msfroh opened this issue Mar 22, 2023 · 13 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Comments

@msfroh
Collaborator

msfroh commented Mar 22, 2023

Is your feature request related to a problem? Please describe.
When calling out to an external reranker, it's possible that the network call fails (because networks). For our plugin that calls out to AWS Kendra Intelligent Ranking service, we swallow and log any exceptions, and just return results in the original order. While the server-side application log captures what went wrong, the client who made the search request has no way of knowing that they're receiving degraded results.

More broadly, this will be a problem for any search pipeline processor that makes an external call and wants to "fail open", as called out in a comment by @lukas-vlcek on opensearch-project/search-processor#80.

Describe the solution you'd like
The best solution I've been able to come up with so far is that the search response could include an extra "warnings" property. This could be an array of strings (or objects?).

Ideally, any warnings could communicate what failed and why, so that a human observing these warnings might be able to take corrective action. For example, a call to an external service may have failed because credentials are no longer valid, or a call quota was exceeded.

For now, I'm leaning toward human-readable string warnings, since I haven't thought of a reason to make them machine-friendly (though that may just be a lack of imagination on my part).

Describe alternatives you've considered
The status quo, where we log the warning to the application logs, "works", but it assumes that callers (who may not have access to those logs) don't need to know whether a specific response was degraded.

Another option would be to recommend always "failing closed". That is, if a call to an external reranker fails, then just throw an exception (with explanation) instead of returning results. That's bad from an availability standpoint.

We could also make failing closed optional (e.g. via a request parameter), so that production traffic fails open while a "canary" service periodically probes the search cluster, asking for an exception if the external call fails. That only works if the canary's calls fail the same way as production traffic's calls. A rough sketch of such a request is below.
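
For illustration, a hypothetical request-level toggle might look something like the following; the processor name and the fail_open flag are made-up placeholders here, not existing parameters:

POST /my-index/_search
{
  "query": { "match": { "title": "wind" } },
  "ext": {
    "external_reranker": {
      "fail_open": false
    }
  }
}

A canary would send fail_open: false and alert on the resulting exception, while production traffic omits the flag and keeps the fail-open behavior.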

Additional context
As a stretch goal, I think we could generalize this to include messages at other "log levels". In particular, it has always bothered me that OpenSearch and its predecessor don't have a debug URL parameter for search requests, the way Solr has had since forever.

Maybe this search response property could have a form like:

"additional_info": {
  "warn": [
    {
      "source": "org.opensearch.search.foo.RankingWhatever",
      "message": "Call to https://rankingservice.example.com/ failed -- invalid credentials"
    }
  ],
  "debug": [
    {
      "source": "org.opensearch.search.query.QueryPhase",
      "message": "Query object sent to Lucene was: '+foo:bar (field1:val1 field2:val2 field3:val3)~2'"
    },
    {
      "source": "org.opensearch.search.query.QueryPhase",
      "message": "Time spent in searcher.search was 53.25ms"
    }
  ]
}
@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 22, 2023
@minalsha minalsha added Search Search query, autocomplete ...etc and removed untriaged labels Mar 23, 2023
@dblock
Member

dblock commented Mar 27, 2023

Rather than additional_info, I propose that the ?explain=true response contain all the information about which re-rankers were called, which ones succeeded or failed, and how long each took. This should work for your canary idea.

> For our plugin that calls out to AWS Kendra Intelligent Ranking service, we swallow and log any exceptions, and just return results in the original order.

> Another option would be to recommend always "failing closed". That is, if a call to an external reranker fails, then just throw an exception (with explanation) instead of returning results. That's bad from an availability standpoint.

I feel pretty strongly that this was the right option: as a user, I don't want results that are "sometimes better ranked"; I like my systems to be predictable. But I also understand that calling out to Kendra isn't like other rerankers because there's networking involved, and I also understand a website may want some kind of fallback. So I can see how some administrators may want this behavior to be optional - there's another feature request here to support both hard failure and the current behavior.

@msfroh
Collaborator Author

msfroh commented Mar 28, 2023

I don't think ?explain=true is a great approach, since the explain process is pretty expensive (it explains the scoring applied to every returned result).

Maybe we could add levels of verbosity to the explain param so people can decide if they want a detailed explanation for every result or an overview of where their query spent time and what each component did to the query broadly (which is roughly what Solr's debug parameter offers)?

@dblock
Member

dblock commented Apr 3, 2023

@msfroh Is it a problem that explain is expensive when one wants to get debug info in the search response? Isn't this exactly what it was designed for?

@msfroh
Collaborator Author

msfroh commented Apr 4, 2023

Yeah, I guess it's not a problem.

I'm not a fan of the wall of text from explaining scores (and I don't usually need score explanations unless I'm specifically investigating a relevance issue), but sure, we can put all the debug, info, and warning stuff in one place.

@dblock
Member

dblock commented Apr 4, 2023

> Yeah, I guess it's not a problem.
>
> I'm not a fan of the wall of text from explaining scores (and I don't usually need score explanations unless I'm specifically investigating a relevance issue), but sure, we can put all the debug, info, and warning stuff in one place.

I do like the idea of adding more options to limit the depth/verbosity of explain, too! But that's a different feature.

@msfroh
Collaborator Author

msfroh commented Apr 14, 2023

I was looking into this a bit more, and the explain option (when true) attaches an org.apache.lucene.search.Explanation object to each SearchHit, since it exists to explain why documents are scored the way they are. The Explanation class is entirely devoted to details of score calculations, which is not what this issue is about.

I still don't know where we could put additional (debug/warning) information about query processing, but explain is definitely not the right API.
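
For reference, this is the kind of thing explain=true adds to each hit today: an _explanation object that is entirely about score math (values and descriptions abbreviated here):

"_explanation": {
  "value": 1.2,
  "description": "weight(title:wind in 0) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 1.2,
      "description": "score(freq=1.0), computed as boost * idf * tf from:",
      "details": []
    }
  ]
}

There's no natural place in that structure for something like "the call to the external reranker failed", which is per-response information rather than per-hit scoring detail.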

@dblock
Member

dblock commented Apr 14, 2023

@msfroh Just because something is implemented in a certain way doesn't mean it's not what users want an API to return. What would our users want? Then we can discuss the implementation.

@msfroh
Collaborator Author

msfroh commented Apr 14, 2023

I don't believe that our users would want a scoring-focused API to start returning things unrelated to scoring.

@dblock
Member

dblock commented Apr 17, 2023

That is indeed what explain is designed to do: "Wondering why a specific document ranks higher (or lower) for a query? You can use the explain API for an explanation of how the relevance score (_score) is calculated for every result." So you have a point.

Another related set of concerns may be #1061?

@macohen
Contributor

macohen commented Apr 18, 2023

#1061 is probably closer to what we're looking for here. Following that issue...

@msfroh
Collaborator Author

msfroh commented Aug 3, 2023

In opensearch-project/ml-commons#1150, @austintlee suggested the possible addition of an ext block to search responses, similar to the one in search requests, providing a flexible extension point for information returned in a search response. I think that idea is pretty intriguing and would satisfy the needs of this issue.
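
As a rough sketch (the field names here are placeholders, not an agreed format), a response-level ext block carrying warnings might look like:

"ext": {
  "warnings": [
    {
      "source": "org.opensearch.search.foo.RankingWhatever",
      "message": "Call to https://rankingservice.example.com/ failed -- invalid credentials"
    }
  ]
}

That keeps the information out of the individual SearchHits while still giving callers a per-response signal that results may be degraded.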

@austintlee
Contributor

I just created a PR and am looking to get feedback on that. Thanks.

@saratvemulapalli
Member

Reading through the context, I like the free-form ext block with an extension point. It cleanly abstracts out additional information without fiddling with SearchHits.
