-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add go-bexpr
filters to evals and deployment list endpoints
#12034
Conversation
} else if strings.HasSuffix(errMsg, structs.ErrIncompatibleFiltering.Error()) { | ||
errMsg = structs.ErrIncompatibleFiltering.Error() | ||
code = 400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this is the right place to handle this, but it would be good to have it in a code path that is common to all endpoints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is absolutely a weird place to jam it and not the API (HasSuffix?!) any of us would prefer but... it's idiomatic for now. 👍
// Iterator is the interface that must be implemented to use the Paginator. | ||
type Iterator interface { | ||
// Next returns the next element to be considered for pagination. | ||
// The page will end if nil is returned. | ||
Next() interface{} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this while working on a different approach, where the filter would create another iterator, but I kind of like this change regardless?
But I can revert it if we want to keep it like it was.
"github.com/hashicorp/nomad/nomad/structs" | ||
) | ||
|
||
func BenchmarkEvalListFilter(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark results from my machine:
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEvalListFilter/filter_with_index-16 3700 6832056 ns/op 458 B/op 13 allocs/op
BenchmarkEvalListFilter/filter_with_go-bexpr-16 100 234049608 ns/op 47202395 B/op 4000009 allocs/op
BenchmarkEvalListFilter/paginated_filter_with_index-16 4819960 4628 ns/op 2616 B/op 24 allocs/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr-16 59338 409490 ns/op 122438 B/op 8325 allocs/op
Filtered results are an order or two of magnitude slower than using an index so we should document query params that are index based.
I also had a version that used FilterIterator
, but the result was the same as filter_with_go-bexpr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 100 eval page in the paginated results are definitely more "realistic" than us trying to iterate over 100k evals at once.
Something else to keep in mind looking at this is that in paginated_filter_with_go-bexpr
we're creating the bexpr evaluator every iteration whereas in filter_with_go-bexpr
we're creating the bexpr evaluator once outside the b.N
loop. Unfortunately a given API call doesn't get to amortize creating the evaluator across multiple pages, so paginated_filter_with_go-bexpr
is going to be even worse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point, should I move the call to CreateEvaluator
so it's within the bechmark for
loop?
Another thing that I thought is that there's a diminishing advantage of using pagination when querying later pages since we need to iterate over everything that comes before anyway, even if we don't apply the filter:
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkEvalListFilter/filter_with_index-16 3673 6704301 ns/op
BenchmarkEvalListFilter/filter_with_go-bexpr-16 100 220508127 ns/op
BenchmarkEvalListFilter/paginated_filter_with_index-16 5146934 4724 ns/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr-16 55220 480332 ns/op
BenchmarkEvalListFilter/paginated_filter_with_index_last_page-16 2017 11966223 ns/op
BenchmarkEvalListFilter/paginated_filter_with_go-bexpr_last_page-16 872 27409941 ns/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we combine this with #12054 it seems like we "just" need to take into account the Filter values when picking the index to use, which we basically already do when the requested Namespace is set.
i.e. in this block
string parameter. | ||
string parameter and is backed by a data store index, so it can be used to | ||
reduce the number of entries processed by `filter`. | ||
|
||
- `namespace` `(string: "default")` - Specifies the target | ||
namespace. Specifying `*` will return all evaluations across all | ||
authorized namespaces. | ||
authorized namespaces. This parameter is backed by a data store index, so it | ||
can be used to reduce the number of entries processed by `filter`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if these additions are good. Feel free to suggest alternatives 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if here we focus on the gotcha and link to general filter documentation for more information?
This parameter is used before any
filter
is applied. See the thefilter
docs for more information.
Then we could add -filter
related documentation here: https://www.nomadproject.io/docs/commands#filter This would be the ideal place to call out how the NOMAD_NAMESPACE
env var will take precedence over any namespace filtering. Although I suppose api
package users will miss that... 🤔
And ?filter=...
related documentation here: https://www.nomadproject.io/api-docs#filter
With perhaps more expansive docs on the query language here? https://www.nomadproject.io/docs/internals/filtering
Or perhaps skip the command and api docs and link directly to an internals doc?
Point being while I think noting these performance impacts is important, I think we also need to guide users toward proper usage of env vars vs query params vs filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm planning on having more comprehensive docs about filtering and pagination in a follow-up PR. Basically copy most of Consul's docs 😅
Having a CLI section also sounds like a good idea.
In this PR I will add just the first sentence, and then add a link to the new sections in the follow-up PR.
nomad/state/paginator.go
Outdated
match, err := p.filterEvaluator.Evaluate(raw) | ||
if !match || err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What errors can occur here and should we return them and cease iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. An error would happen here if the user filters by an invalid field (second example below). I pushed a commit to improve error handling and also make their location more descriptive.
Some failed request samples:
$ curl --get 'http://localhost:4646/v1/evaluations?per_page=1' --data-urlencode 'filter=InvalidExpression'
failed to create result paginator: failed to read filter expression: 1:18 (17): no match found, expected: "!=", ".", "==", "[", [ \t\r\n] or [a-zA-Z0-9_/]
$ curl --get 'http://localhost:4646/v1/evaluations?per_page=1' --data-urlencode 'filter=InvalidField == ""'
failed to read result page: error finding value in datum: /InvalidField at part 0: couldn't find key: struct field with name "InvalidField"
string parameter. | ||
string parameter and is backed by a data store index, so it can be used to | ||
reduce the number of entries processed by `filter`. | ||
|
||
- `namespace` `(string: "default")` - Specifies the target | ||
namespace. Specifying `*` will return all evaluations across all | ||
authorized namespaces. | ||
authorized namespaces. This parameter is backed by a data store index, so it | ||
can be used to reduce the number of entries processed by `filter`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if here we focus on the gotcha and link to general filter documentation for more information?
This parameter is used before any
filter
is applied. See the thefilter
docs for more information.
Then we could add -filter
related documentation here: https://www.nomadproject.io/docs/commands#filter This would be the ideal place to call out how the NOMAD_NAMESPACE
env var will take precedence over any namespace filtering. Although I suppose api
package users will miss that... 🤔
And ?filter=...
related documentation here: https://www.nomadproject.io/api-docs#filter
With perhaps more expansive docs on the query language here? https://www.nomadproject.io/docs/internals/filtering
Or perhaps skip the command and api docs and link directly to an internals doc?
Point being while I think noting these performance impacts is important, I think we also need to guide users toward proper usage of env vars vs query params vs filters.
if err != nil { | ||
p.pageErr = err | ||
return nil, paginatorComplete | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tgross any thoughts on this approach for handling errors during pagination?
I thought of creating a new paginatorState
called paginatorError
, but having a non-nil pageErr
would already indicate that so I just used the paginatorComplete
state to break the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as-is, but if you want to go the extra mile and make this really conventional Go, take a look at the Err()
pattern used for bufio.Scanner
https://pkg.go.dev/bufio#example-Scanner-Lines
The caller then only deals with 1 error after iteration is "complete", rather than on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting, good to know about this pattern. But I think that, in this case, callers would deal with one page at a time, so having to remember to call Err()
may be a little more error-prone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just some wording suggestions and the idea for cleaning up pagination errors, feel free to ignore
"github.com/hashicorp/nomad/nomad/structs" | ||
) | ||
|
||
func BenchmarkEvalListFilter(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we combine this with #12054 it seems like we "just" need to take into account the Filter values when picking the index to use, which we basically already do when the requested Namespace is set.
i.e. in this block
if err != nil { | ||
p.pageErr = err | ||
return nil, paginatorComplete | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine as-is, but if you want to go the extra mile and make this really conventional Go, take a look at the Err()
pattern used for bufio.Scanner
https://pkg.go.dev/bufio#example-Scanner-Lines
The caller then only deals with 1 error after iteration is "complete", rather than on each iteration.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Add support for
go-bexpr
filters to/v1/evaluations
and/v1/deployments
.Detailed documentation and changelog entry will come in follow-up PRs where this pattern will be applied to other endpoints.
Note to reviewers:
A previous version of this work had an attempt to use
go-memdb
indexes if possible when the filter expression was a simple equality check overnamespace
orprefix
, but I found some problems with that implementation.For
prefix
, there are multiple ways to express that withgo-bexpr
. The most strict way is usingmatches
and regexp:ID matches "^123"
, and checking for a strict equal value (ID == "..."
) seems unusual for a list endpoint. Calling/v1/evaluation/:id
directly would make more sense.Extracting
namespace
out of filters likeNamespace = "dev"
worked better, but the problem is that whenparseNamespace
is called at the HTTP layer, the RPC handler is not able to tell ifnamespace=default
means the user didn't set any namespace or if they explicitly setdefault
. Overwriting the namespace could then lead to surprising results when using thedefault
namespace.