-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: implement fuzzy search API #10184
Conversation
I'll cover API docs and e2e tests in a followup PR. First I want to make sure @DingoEatingFuzz / @backspace are happy with the response format, of which a larger example is below. edit: see API docs examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @shoenig. The search/sorting algorithm is really clear and understandable. I've left some usability remarks and a few other odds and ends.
|
||
- `fuzzy_enabled` `(bool: true)` - Specifies whether the fuzzy search API is | ||
enabled. If not enabled, requests to the fuzzy search API endpoint will return | ||
an error response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a link here to the API docs for this new endpoint as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -0,0 +1,129 @@ | |||
package structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ for splitting anything out of structs.go
return s.srv.blockingRPC(&opts) | ||
} | ||
|
||
func expandContext(context structs.Context) []structs.Context { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to factor this out of nomad/search_endpoint_ent.go
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allContexts
is defined per the _oss
and _ent
files; this wrapper Just Works with either one
|
||
prefix := alloc.ID[:len(alloc.ID)-2] | ||
data := structs.SearchRequest{Prefix: prefix, Context: structs.Allocs} | ||
req, err := http.NewRequest("POST", "/v1/search", encodeReq(data)) | ||
assert.Nil(err) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to tidy all these existing tests up, btw.
nomad/search_endpoint.go
Outdated
return fmt.Errorf("fuzzy search is not enabled") | ||
} | ||
|
||
// check the query term meets minimum length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ACL check should be before argument validation so that we return ACL errors first. Probably before config validation too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I agree. One thing though - to make namespace=*
work, this up front ACL check is basically bypassed, instead defering to iterators wrapped with filters, doing the same ACL checks removing objects not allowed by the given token. So this is really more of a "at least try to give them an error if they specify a namespace they can't access" optimization.
accumulateSet := func(limited bool, set map[structs.Context][]fuzzyMatch) { | ||
for ctx, matches := range set { | ||
for _, match := range matches { | ||
if len(unsorted[ctx]) < limitResults { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to sound nitpicky, but the results limit config talks about a maximum number of results, whereas this is getting us the maximum number of results per context. Should we update this so that we're tracking a count of how many results overall, or just change the docs? (Either seems reasonable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch; clarified in docs
if b.Search.LimitQuery > 0 { | ||
result.Search.LimitQuery = b.Search.LimitQuery | ||
} | ||
if b.Search.LimitResults > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we validate a relationship between LimitQuery
and LimitResults
, even if just to warn in the logs that it's unexpected to have LimitResults > LimitQuery
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could actually be the case where the number of results exceeds LimitQuery
- because of the way the jobs
type is expanded into its subtypes (groups
, tasks
, services
, images
, commands
, classes
). There could be only 1 job registered, but contains more of any of those subtypes than LimitResults
.
Hey @shoenig, I annotated the mock response with my thoughts in this gist: https://gist.github.com/DingoEatingFuzz/27ac49ea722c4df5011f58b6207fa3ab The tl;dr is we need to make sure that all the responses have all the requisite info to build links from. |
I poked at the code a bit but had trouble finding answers to my two questions so I'll just ask them:
|
re cross-namespace searches, it’s known to be desired 🤞 |
Hey @DingoEatingFuzz and @backspace I tweaked the response objects to better identify the matching object. I've got API docs in there now with more query examples.
It should, yes. With ACLs enabled, if
Nope, at least not yet. The substring matches we're doing now are already going to use a questionable amount of resources, so this change includes server config to disable and/or limit queries. Once we get a better understanding of the performance impact maybe we can open up the matching patterns. |
@shoenig I pulled this branch down locally and I'm not getting the results I'd expect to see with this endpoint. But maybe I'm missing something in the docs:
|
hmm is it possible your queries returned empty results, @tgross? I tried it and got some:
|
I'd run the example job from
|
Maybe I’m wrong? Maybe I’m missing something in local development 🤔 I’ll look more into this and let you know, sorry. |
yes, I was wrong, disregard 😳 |
I don’t know that this needs to be blocking, but now that I’m trying this, case-insensitivity would be nice to have. Compy is named Probably not possible to include now but a future improvement could be returning the substring indices matched for bolding in the UI. |
I think we can slip this in without much of a performance impact; I'll add it.
Yup can definitely add Name. Currently the search is performed on Name, and then ID is used in the ID part of the return struct; which is confusing 🙂 . I'm thinking we should change it so that the search is performed on Name, the Name is used in the ID field, and the job ID is appended to the scope. That would be most consistent with how all the other context types are structured.
Here is great! |
This PR introduces the /v1/search/fuzzy API endpoint, used for fuzzy searching objects in Nomad. The fuzzy search endpoint routes requests to the Nomad Server leader, which implements the Search.FuzzySearch RPC method. Requests to the fuzzy search API are based on the api.FuzzySearchRequest object, e.g. { "Text": "ed", "Context": "all" } Responses from the fuzzy search API are based on the api.FuzzySearchResponse object, e.g. { "Index": 27, "KnownLeader": true, "LastContact": 0, "Matches": { "tasks": [ { "ID": "redis", "Scope": [ "default", "example", "cache" ] } ], "evals": [], "deployment": [], "volumes": [], "scaling_policy": [], "images": [ { "ID": "redis:3.2", "Scope": [ "default", "example", "cache", "redis" ] } ] }, "Truncations": { "volumes": false, "scaling_policy": false, "evals": false, "deployment": false } } The API is tunable using the new server.search stanza, e.g. server { search { fuzzy_enabled = true limit_query = 200 limit_results = 1000 min_term_length = 5 } } These values can be increased or decreased, so as to provide more search results or to reduce load on the Nomad Server. The fuzzy search API can be disabled entirely by setting `fuzzy_enabled` to `false`.
Those 2 additions should work now @backspace
In this case the JobID and JobName are the same but it should be correct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Let's ship it!
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
This PR introduces the
/v1/search/fuzzy
API endpoint, used for fuzzysearching objects in Nomad. The fuzzy search endpoint routes requests
to the Nomad Server leader, which implements the
Search.FuzzySearch
RPCmethod.
Requests to the fuzzy search API are based on the
api.FuzzySearchRequest
object, e.g.
Responses from the fuzzy search API are based on the
api.FuzzySearchResponse
object, e.g.
The API is configurable using the
server.search
stanza, e.g.These values can be increased or decreased, so as to provide more
search results or to reduce load on the Nomad Server. The fuzzy search
API can be disabled entirely by setting
fuzzy_enabled
tofalse
.