Add field for warnings (and maybe debug/info?) to SearchResponse #6794

Open
msfroh opened this issue Mar 22, 2023 · 13 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Comments

@msfroh
Collaborator

msfroh commented Mar 22, 2023

Is your feature request related to a problem? Please describe.
When calling out to an external reranker, it's possible that the network call fails (because networks). For our plugin that calls out to AWS Kendra Intelligent Ranking service, we swallow and log any exceptions, and just return results in the original order. While the server-side application log captures what went wrong, the client who made the search request has no way of knowing that they're receiving degraded results.

More broadly, this will be a problem for any search pipeline processor that makes an external call and wants to "fail open", as called out in a comment by @lukas-vlcek on opensearch-project/search-processor#80.

Describe the solution you'd like
The best solution I've been able to come up with so far is that the search response could include an extra "warnings" property. This could be an array of strings (or objects?).

Ideally, any warnings could communicate what failed and why, so that a human observing these warnings might be able to take corrective action. For example, a call to an external service may have failed because credentials are no longer valid, or a call quota was exceeded.

For now, I'm leaning toward human-readable string warnings, since I haven't thought of a reason to make them machine-friendly (though that may just be a lack of imagination on my part).

Describe alternatives you've considered
The status quo, where we log the warning to the application logs, "works", but it assumes that callers (who may not have access to those logs) don't need to know whether a specific response was degraded.

Another option would be to recommend always "failing closed". That is, if a call to an external reranker fails, then just throw an exception (with explanation) instead of returning results. That's bad from an availability standpoint.

We could also make failing closed optional (e.g. via a request parameter), so that production traffic fails open while a "canary" service periodically probes the search cluster, asking for an exception if the external call fails. That only works if the canary's calls fail the same way as production traffic's calls. A rough sketch of such a request is below.
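
For illustration, a hypothetical request-level toggle might look something like the following; the processor name and the fail_open flag are made-up placeholders here, not existing parameters:

POST /my-index/_search
{
  "query": { "match": { "title": "wind" } },
  "ext": {
    "external_reranker": {
      "fail_open": false
    }
  }
}

A canary would send fail_open: false and alert on the resulting exception, while production traffic omits the flag and keeps the fail-open behavior.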

Additional context
As a stretch goal, I think we could generalize this to include messages at other "log levels". In particular, it has always bothered me that OpenSearch and its predecessor don't have a debug URL parameter for search requests, the way Solr has had since forever.

Maybe this search response property could have a form like:

"additional_info": {
  "warn": [
    {
      "source": "org.opensearch.search.foo.RankingWhatever",
      "message": "Call to https://rankingservice.example.com/ failed -- invalid credentials"
    }
  ],
  "debug": [
    {
      "source": "org.opensearch.search.query.QueryPhase",
      "message": "Query object sent to Lucene was: '+foo:bar (field1:val1 field2:val2 field3:val3)~2'"
    },
    {
      "source": "org.opensearch.search.query.QueryPhase",
      "message": "Time spent in searcher.search was 53.25ms"
    }
  ]
}
@msfroh msfroh added enhancement Enhancement or improvement to existing feature or request untriaged labels Mar 22, 2023
@minalsha minalsha added Search Search query, autocomplete ...etc and removed untriaged labels Mar 23, 2023
@dblock
Member

dblock commented Mar 27, 2023

Rather than additional_info, I propose that the ?explain=true response contain all the information about which re-rankers were called, which ones succeeded or failed, and how long each took. This should work for your canary idea.

> For our plugin that calls out to AWS Kendra Intelligent Ranking service, we swallow and log any exceptions, and just return results in the original order.

> Another option would be to recommend always "failing closed". That is, if a call to an external reranker fails, then just throw an exception (with explanation) instead of returning results. That's bad from an availability standpoint.

I feel pretty strongly that this was the right option: as a user, I don't want results that are "sometimes better ranked"; I like my systems to be predictable. But I also understand that calling out to Kendra isn't like other rerankers because there's networking involved, and I also understand a website may want some kind of fallback. So I can see how some administrators may want this behavior to be optional - there's another feature request here to support both hard failure and the current behavior.

@msfroh
Collaborator Author

msfroh commented Mar 28, 2023

I don't think ?explain=true is a great approach, since the explain process is pretty expensive (it explains the scoring applied to every returned result).

Maybe we could add levels of verbosity to the explain param so people can decide if they want a detailed explanation for every result or an overview of where their query spent time and what each component did to the query broadly (which is roughly what Solr's debug parameter offers)?

@dblock
Member

dblock commented Apr 3, 2023

@msfroh Is it a problem that explain is expensive when one wants to get debug info in the search response? Isn't this exactly what it was designed for?

@msfroh
Collaborator Author

msfroh commented Apr 4, 2023

Yeah, I guess it's not a problem.

I'm not a fan of the wall of text from explaining scores (and I don't usually need score explanations unless I'm specifically investigating a relevance issue), but sure, we can put all the debug, info, and warning stuff in one place.

@dblock
Member

dblock commented Apr 4, 2023

> Yeah, I guess it's not a problem.
>
> I'm not a fan of the wall of text from explaining scores (and I don't usually need score explanations unless I'm specifically investigating a relevance issue), but sure, we can put all the debug, info, and warning stuff in one place.

I do like the idea of adding more options to limit the depth/verbosity of explain, too! But that's a different feature.

@msfroh
Collaborator Author

msfroh commented Apr 14, 2023

I was looking into this a bit more, and the explain option (when true) attaches an org.apache.lucene.search.Explanation object to each SearchHit, since it exists to explain why documents are scored the way they are. The Explanation class is entirely devoted to details of score calculations, which is not what this issue is about.

I still don't know where we could put additional (debug/warning) information about query processing, but explain is definitely not the right API.
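
For reference, this is the kind of thing explain=true adds to each hit today: an _explanation object that is entirely about score math (values and descriptions abbreviated here):

"_explanation": {
  "value": 1.2,
  "description": "weight(title:wind in 0) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 1.2,
      "description": "score(freq=1.0), computed as boost * idf * tf from:",
      "details": []
    }
  ]
}

There's no natural place in that structure for something like "the call to the external reranker failed", which is per-response information rather than per-hit scoring detail.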

@dblock
Member

dblock commented Apr 14, 2023

@msfroh Just because something is implemented in a certain way doesn't mean it's not what users want an API to return. What would our users want? Then we can discuss the implementation.

@msfroh
Collaborator Author

msfroh commented Apr 14, 2023

I don't believe that our users would want a scoring-focused API to start returning things unrelated to scoring.

@dblock
Member

dblock commented Apr 17, 2023

That is indeed what explain is designed to do: "Wondering why a specific document ranks higher (or lower) for a query? You can use the explain API for an explanation of how the relevance score (_score) is calculated for every result." So you have a point.

Another related set of concerns may be #1061?

@macohen
Contributor

macohen commented Apr 18, 2023

#1061 is probably closer to what we're looking for here. Following that issue...

@msfroh
Collaborator Author

msfroh commented Aug 3, 2023

In opensearch-project/ml-commons#1150, @austintlee suggested the possible addition of an ext block to search responses, similar to the one in search requests, providing a flexible extension point for information returned in a search response. I think that idea is pretty intriguing and would satisfy the needs of this issue.
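
As a rough sketch (the field names here are placeholders, not an agreed format), a response-level ext block carrying warnings might look like:

"ext": {
  "warnings": [
    {
      "source": "org.opensearch.search.foo.RankingWhatever",
      "message": "Call to https://rankingservice.example.com/ failed -- invalid credentials"
    }
  ]
}

That keeps the information out of the individual SearchHits while still giving callers a per-response signal that results may be degraded.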

@austintlee
Contributor

I just created a PR and am looking to get feedback on that. Thanks.

@saratvemulapalli
Member

Reading through the context, I like the free-form ext block with an extension point. It cleanly abstracts out additional information without fiddling with SearchHits.
