Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: add support for partial shard results #116388

Merged
Merged
Show file tree
Hide file tree
Changes from 23 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
4aa4a97
EQL: add support for partial shard failures
luigidellaquila Nov 7, 2024
95f9a1b
Add shard_failures to the response
luigidellaquila Nov 7, 2024
d14c1f6
Docs + request param
luigidellaquila Nov 7, 2024
b72fd37
Add cluster setting
luigidellaquila Nov 8, 2024
a86ab6a
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 8, 2024
18cfa60
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 11, 2024
8777fc8
Randomize spec tests
luigidellaquila Nov 11, 2024
4e63b3c
More tests
luigidellaquila Nov 11, 2024
b6501cc
Update docs/changelog/116388.yaml
luigidellaquila Nov 11, 2024
f052782
Fine tune sequences behavior using allow_partial_sequence_results
luigidellaquila Nov 18, 2024
3e5439e
Merge remote-tracking branch 'luigidellaquila/eql/allow_partial_searc…
luigidellaquila Nov 18, 2024
90fc499
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 18, 2024
35eb31e
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 19, 2024
b003c1c
allow_partial_sequence_results=false by default
luigidellaquila Nov 21, 2024
8978a01
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 21, 2024
4c421c0
Add CCS test
luigidellaquila Nov 25, 2024
2ab3972
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 25, 2024
3ab9740
Fix license
luigidellaquila Nov 25, 2024
8207ea0
Refactor test
luigidellaquila Nov 25, 2024
f3a1a65
Make CCS test more deterministic
luigidellaquila Nov 25, 2024
c54a0c5
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 26, 2024
a8f5fb5
Cleanup
luigidellaquila Nov 26, 2024
fcfa021
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 27, 2024
706935c
Cleanup serialization code
luigidellaquila Nov 27, 2024
545e614
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 28, 2024
32a7aef
More yaml tests
luigidellaquila Nov 28, 2024
1e97b85
Add toml tests and fix REST params
luigidellaquila Nov 29, 2024
1da924c
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 29, 2024
3ebacb8
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 29, 2024
ed6b9a7
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 2, 2024
9f9eba8
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 4, 2024
7efff36
Merge remote-tracking branch 'luigidellaquila/eql/allow_partial_searc…
luigidellaquila Dec 4, 2024
045d8da
More tests
luigidellaquila Dec 6, 2024
672e512
Refactoring
luigidellaquila Dec 9, 2024
e1e83a6
More tests
luigidellaquila Dec 9, 2024
625acf4
API spec + more tests
luigidellaquila Dec 10, 2024
7f36a69
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 10, 2024
984fe02
Fix conflict
luigidellaquila Dec 10, 2024
a1c903f
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 13, 2024
f58fd1c
Refactor tests
luigidellaquila Dec 16, 2024
3aba03a
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/116388.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 116388
summary: Add support for partial shard results
area: EQL
type: enhancement
issues: []
47 changes: 47 additions & 0 deletions docs/reference/eql/eql-search-api.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,53 @@ request that targets only `bar*` still returns an error.
+
Defaults to `true`.

`allow_partial_search_results`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ES search API supports partial results for async search as well, if I read the documentation correctly. Is there something stopping us for doing the same with EQL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when _search docs mention partial results at the beginning of the page, they refer to something slightly different, ie. the first part of a response that is still being calculated.
Search results can also be partial because of missing shards, that is the same we have in EQL, regardless of the request being sync or async.

This said, we definitely need some tests for async EQL queries with allow_partial_search_results=true. I'm adding them.

(Optional, Boolean)

If `false`, the request returns an error if one or more shards involved in the query are unavailable.
+
If `true`, the query is executed only on the available shards, ignoring shard request timeouts and
<<shard-failures,shard failures>>.
+
Defaults to `false`.
+
To override the default for this field, set the
`xpack.eql.default_allow_partial_results` cluster setting to `true`.


[IMPORTANT]
====
You can also specify this value using the `allow_partial_search_results` request body parameter.
If both parameters are specified, only the query parameter is used.
====


`allow_partial_sequence_results`::
(Optional, Boolean)


Used together with `allow_partial_search_results=true`, controls the behavior of sequence queries specifically
(if `allow_partial_search_results=false`, this setting has no effect).
If `true` and if some shards are unavailable, the sequences are calculated on available shards only.
+
If `false` and if some shards are unavailable, the query only returns information about the shard failures,
but no further results.
+
Defaults to `false`.
+
Consider that sequences calculated with `allow_partial_search_results=true` can return incorrect results
(eg. if a <<eql-missing-events, missing event>> clause matches records in unavailable shards)
+
To override the default for this field, set the
`xpack.eql.default_allow_partial_sequence_results` cluster setting to `true`.


[IMPORTANT]
====
You can also specify this value using the `allow_partial_sequence_results` request body parameter.
If both parameters are specified, only the query parameter is used.
====

`ccs_minimize_roundtrips`::
(Optional, Boolean) If `true`, network round-trips between the local and the
remote cluster are minimized when running cross-cluster search (CCS) requests.
Expand Down
2 changes: 2 additions & 0 deletions server/src/main/java/org/elasticsearch/TransportVersions.java
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,8 @@ static TransportVersion def(int id) {
public static final TransportVersion REINDEX_DATA_STREAMS = def(8_799_00_0);
public static final TransportVersion ESQL_REMOVE_NODE_LEVEL_PLAN = def(8_800_00_0);
public static final TransportVersion LOGSDB_TELEMETRY_CUSTOM_CUTOFF_DATE = def(8_801_00_0);
public static final TransportVersion EQL_ALLOW_PARTIAL_SEARCH_RESULTS = def(8_802_00_0);

/*
* STOP! READ THIS FIRST! No, really,
* ____ _____ ___ ____ _ ____ _____ _ ____ _____ _ _ ___ ____ _____ ___ ____ ____ _____ _
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2453,7 +2453,7 @@ public static void afterClass() throws Exception {
/**
* After the cluster is stopped, there are a few netty threads that can linger, so we make sure we don't leak any tasks on them.
*/
static void awaitGlobalNettyThreadsFinish() throws Exception {
public static void awaitGlobalNettyThreadsFinish() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find an obvious reason for this change. Can you shed some light, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a leftover, I don't need it anymore. Reverting

// Don't use GlobalEventExecutor#awaitInactivity. It will waste up to 1s for every call and we expect no tasks queued for it
// except for the odd scheduled shutdown task.
assertBusy(() -> assertEquals(0, GlobalEventExecutor.INSTANCE.pendingTasks()));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -163,13 +163,19 @@ protected ObjectPath runQuery(String index, String query) throws Exception {
if (maxSamplesPerKey > 0) {
builder.field("max_samples_per_key", maxSamplesPerKey);
}
if (randomBoolean()) {
builder.field("allow_partial_search_results", randomBoolean());
}
builder.endObject();

Request request = new Request("POST", "/" + index + "/_eql/search");
Boolean ccsMinimizeRoundtrips = ccsMinimizeRoundtrips();
if (ccsMinimizeRoundtrips != null) {
request.addParameter("ccs_minimize_roundtrips", ccsMinimizeRoundtrips.toString());
}
if (randomBoolean()) {
request.addParameter("allow_partial_search_results", String.valueOf(randomBoolean()));
}
int timeout = Math.toIntExact(timeout().millis());
RequestConfig config = RequestConfig.copy(RequestConfig.DEFAULT)
.setConnectionRequestTimeout(timeout)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -478,3 +478,34 @@ setup:
query: 'sequence with maxspan=10d [network where user == "ADMIN"] ![network where used == "SYSTEM"]'
- match: { error.root_cause.0.type: "verification_exception" }
- match: { error.root_cause.0.reason: "Found 1 problem\nline 1:75: Unknown column [used], did you mean [user]?" }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe in this or a new yml file to add the more varied scenarios for "failed shards" queries that I mentioned above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more cases to toml tests


---
"Execute query with allow_partial_search_results":
- do:
eql.search:
index: eql_test
body:
query: 'process where user == "SYSTEM"'
fields: [{"field":"@timestamp","format":"epoch_millis"},"id","valid","day_of_week"]
allow_partial_search_results: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vary the placement of the parameter as a request parameter as well, not only the body of the request.


- match: {timed_out: false}
- match: {hits.total.value: 3}
- match: {hits.total.relation: "eq"}
- match: {hits.events.0._source.user: "SYSTEM"}
- match: {hits.events.0._id: "1"}
- match: {hits.events.0.fields.@timestamp: ["1580733296000"]}
- match: {hits.events.0.fields.id: [123]}
- match: {hits.events.0.fields.valid: [false]}
- match: {hits.events.0.fields.day_of_week: ["Monday"]}
- match: {hits.events.1._id: "2"}
- match: {hits.events.1.fields.@timestamp: ["1580819696000"]}
- match: {hits.events.1.fields.id: [123]}
- match: {hits.events.1.fields.valid: [true]}
- match: {hits.events.1.fields.day_of_week: ["Tuesday"]}
- match: {hits.events.2._id: "3"}
- match: {hits.events.2.fields.@timestamp: ["1580906096000"]}
- match: {hits.events.2.fields.id: [123]}
- match: {hits.events.2.fields.valid: [true]}
- match: {hits.events.2.fields.day_of_week: ["Wednesday"]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to have a test with a simulated shard failure to see that it gets printed out in the response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simulate shard failures in yaml tests? Do you have an example?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can certainly cause shard failures, for instance by querying multiple indices, out of which one returns an error for the query. Can be simple mapping related issues like a type conflict etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to play with it a bit; probably mapping related problems are not the way, since EQL does quite some pre-analysis based on field_caps before running the search queries, so this kind of problems will result in an early validation exception.
In the IT I just shut down nodes and make shards unavailable, but I don't think I can do the same with yaml tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few more tests here, simulating shard failures with painless.

Loading