Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: add support for partial shard results #116388

Merged
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
4aa4a97
EQL: add support for partial shard failures
luigidellaquila Nov 7, 2024
95f9a1b
Add shard_failures to the response
luigidellaquila Nov 7, 2024
d14c1f6
Docs + request param
luigidellaquila Nov 7, 2024
b72fd37
Add cluster setting
luigidellaquila Nov 8, 2024
a86ab6a
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 8, 2024
18cfa60
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 11, 2024
8777fc8
Randomize spec tests
luigidellaquila Nov 11, 2024
4e63b3c
More tests
luigidellaquila Nov 11, 2024
b6501cc
Update docs/changelog/116388.yaml
luigidellaquila Nov 11, 2024
f052782
Fine tune sequences behavior using allow_partial_sequence_results
luigidellaquila Nov 18, 2024
3e5439e
Merge remote-tracking branch 'luigidellaquila/eql/allow_partial_searc…
luigidellaquila Nov 18, 2024
90fc499
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 18, 2024
35eb31e
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 19, 2024
b003c1c
allow_partial_sequence_results=false by default
luigidellaquila Nov 21, 2024
8978a01
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 21, 2024
4c421c0
Add CCS test
luigidellaquila Nov 25, 2024
2ab3972
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 25, 2024
3ab9740
Fix license
luigidellaquila Nov 25, 2024
8207ea0
Refactor test
luigidellaquila Nov 25, 2024
f3a1a65
Make CCS test more deterministic
luigidellaquila Nov 25, 2024
c54a0c5
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 26, 2024
a8f5fb5
Cleanup
luigidellaquila Nov 26, 2024
fcfa021
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 27, 2024
706935c
Cleanup serialization code
luigidellaquila Nov 27, 2024
545e614
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 28, 2024
32a7aef
More yaml tests
luigidellaquila Nov 28, 2024
1e97b85
Add toml tests and fix REST params
luigidellaquila Nov 29, 2024
1da924c
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 29, 2024
3ebacb8
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Nov 29, 2024
ed6b9a7
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 2, 2024
9f9eba8
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 4, 2024
7efff36
Merge remote-tracking branch 'luigidellaquila/eql/allow_partial_searc…
luigidellaquila Dec 4, 2024
045d8da
More tests
luigidellaquila Dec 6, 2024
672e512
Refactoring
luigidellaquila Dec 9, 2024
e1e83a6
More tests
luigidellaquila Dec 9, 2024
625acf4
API spec + more tests
luigidellaquila Dec 10, 2024
7f36a69
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 10, 2024
984fe02
Fix conflict
luigidellaquila Dec 10, 2024
a1c903f
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 13, 2024
f58fd1c
Refactor tests
luigidellaquila Dec 16, 2024
3aba03a
Merge branch 'main' into eql/allow_partial_search_results
luigidellaquila Dec 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/116388.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 116388
summary: Add support for partial shard results
area: EQL
type: enhancement
issues: []
47 changes: 47 additions & 0 deletions docs/reference/eql/eql-search-api.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,53 @@ request that targets only `bar*` still returns an error.
+
Defaults to `true`.

`allow_partial_search_results`::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ES search API supports partial results for async search as well, if I read the documentation correctly. Is there something stopping us for doing the same with EQL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when _search docs mention partial results at the beginning of the page, they refer to something slightly different, ie. the first part of a response that is still being calculated.
Search results can also be partial because of missing shards, that is the same we have in EQL, regardless of the request being sync or async.

This said, we definitely need some tests for async EQL queries with allow_partial_search_results=true. I'm adding them.

(Optional, Boolean)

If `false`, the request returns an error if one or more shards involved in the query are unavailable.
+
If `true`, the query is executed only on the available shards, ignoring shard request timeouts and
<<shard-failures,shard failures>>.
+
Defaults to `false`.
+
To override the default for this field, set the
`xpack.eql.default_allow_partial_results` cluster setting to `true`.


[IMPORTANT]
====
You can also specify this value using the `allow_partial_search_results` request body parameter.
If both parameters are specified, only the query parameter is used.
====


`allow_partial_sequence_results`::
(Optional, Boolean)


Used together with `allow_partial_search_results=true`, controls the behavior of sequence queries specifically
(if `allow_partial_search_results=false`, this setting has no effect).
If `true` and if some shards are unavailable, the sequences are calculated on available shards only.
+
If `false` and if some shards are unavailable, the query only returns information about the shard failures,
but no further results.
+
Defaults to `false`.
+
Consider that sequences calculated with `allow_partial_search_results=true` can return incorrect results
(eg. if a <<eql-missing-events, missing event>> clause matches records in unavailable shards)
+
To override the default for this field, set the
`xpack.eql.default_allow_partial_sequence_results` cluster setting to `true`.


[IMPORTANT]
====
You can also specify this value using the `allow_partial_sequence_results` request body parameter.
If both parameters are specified, only the query parameter is used.
====

`ccs_minimize_roundtrips`::
(Optional, Boolean) If `true`, network round-trips between the local and the
remote cluster are minimized when running cross-cluster search (CCS) requests.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -212,6 +212,7 @@ static TransportVersion def(int id) {
public static final TransportVersion LOGSDB_TELEMETRY_CUSTOM_CUTOFF_DATE = def(8_801_00_0);
public static final TransportVersion SOURCE_MODE_TELEMETRY = def(8_802_00_0);
public static final TransportVersion NEW_REFRESH_CLUSTER_BLOCK = def(8_803_00_0);
public static final TransportVersion EQL_ALLOW_PARTIAL_SEARCH_RESULTS = def(8_804_00_0);

/*
* STOP! READ THIS FIRST! No, really,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2453,7 +2453,7 @@ public static void afterClass() throws Exception {
/**
* After the cluster is stopped, there are a few netty threads that can linger, so we make sure we don't leak any tasks on them.
*/
static void awaitGlobalNettyThreadsFinish() throws Exception {
public static void awaitGlobalNettyThreadsFinish() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find an obvious reason for this change. Can you shed some light, please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a leftover, I don't need it anymore. Reverting

// Don't use GlobalEventExecutor#awaitInactivity. It will waste up to 1s for every call and we expect no tasks queued for it
// except for the odd scheduled shutdown task.
assertBusy(() -> assertEquals(0, GlobalEventExecutor.INSTANCE.pendingTasks()));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,9 @@
import java.util.function.Function;
import java.util.stream.Collectors;

import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.is;

public abstract class BaseEqlSpecTestCase extends RemoteClusterAwareEqlRestTestCase {

protected static final String PARAM_FORMATTING = "%2$s";
Expand All @@ -52,6 +55,9 @@ public abstract class BaseEqlSpecTestCase extends RemoteClusterAwareEqlRestTestC
*/
private final int size;
private final int maxSamplesPerKey;
private final Boolean allowPartialSearchResults;
private final Boolean allowPartialSequenceResults;
private final Boolean expectShardFailures;

@Before
public void setup() throws Exception {
Expand Down Expand Up @@ -104,7 +110,16 @@ protected static List<Object[]> asArray(List<EqlSpec> specs) {
}

results.add(
new Object[] { spec.query(), name, spec.expectedEventIds(), spec.joinKeys(), spec.size(), spec.maxSamplesPerKey() }
new Object[] {
spec.query(),
name,
spec.expectedEventIds(),
spec.joinKeys(),
spec.size(),
spec.maxSamplesPerKey(),
spec.allowPartialSearchResults(),
spec.allowPartialSequenceResults(),
spec.expectShardFailures() }
);
}

Expand All @@ -118,7 +133,10 @@ protected static List<Object[]> asArray(List<EqlSpec> specs) {
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
this.index = index;

Expand All @@ -128,6 +146,9 @@ protected static List<Object[]> asArray(List<EqlSpec> specs) {
this.joinKeys = joinKeys;
this.size = size == null ? -1 : size;
this.maxSamplesPerKey = maxSamplesPerKey == null ? -1 : maxSamplesPerKey;
this.allowPartialSearchResults = allowPartialSearchResults;
this.allowPartialSequenceResults = allowPartialSequenceResults;
this.expectShardFailures = expectShardFailures;
}

public void test() throws Exception {
Expand All @@ -137,6 +158,7 @@ public void test() throws Exception {
private void assertResponse(ObjectPath response) throws Exception {
List<Map<String, Object>> events = response.evaluate("hits.events");
List<Map<String, Object>> sequences = response.evaluate("hits.sequences");
Object shardFailures = response.evaluate("shard_failures");

if (events != null) {
assertEvents(events);
Expand All @@ -145,6 +167,7 @@ private void assertResponse(ObjectPath response) throws Exception {
} else {
fail("No events or sequences found");
}
assertShardFailures(shardFailures);
}

protected ObjectPath runQuery(String index, String query) throws Exception {
Expand All @@ -163,13 +186,34 @@ protected ObjectPath runQuery(String index, String query) throws Exception {
if (maxSamplesPerKey > 0) {
builder.field("max_samples_per_key", maxSamplesPerKey);
}
boolean allowPartialResultsInBody = randomBoolean();
if (allowPartialSearchResults != null) {
if (allowPartialResultsInBody) {
builder.field("allow_partial_search_results", String.valueOf(allowPartialSearchResults));
if (allowPartialSequenceResults != null) {
builder.field("allow_partial_sequence_results", String.valueOf(allowPartialSequenceResults));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if an user provides only allow_partial_sequece_results=true or uses allow_partial_search_results=false and allow_partial_sequece_results=true at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_partial_sequece_results alone (or with allow_partial_search_results=false) has no effect.
There are randomized tests for this in PartialSearchResultsIT , but probably I can extend them a bit, randomly setting allow_partial_search_results=false

}
}
} else if (randomBoolean()) {
builder.field("allow_partial_search_results", randomBoolean());
}
builder.endObject();

Request request = new Request("POST", "/" + index + "/_eql/search");
Boolean ccsMinimizeRoundtrips = ccsMinimizeRoundtrips();
if (ccsMinimizeRoundtrips != null) {
request.addParameter("ccs_minimize_roundtrips", ccsMinimizeRoundtrips.toString());
}
if (allowPartialSearchResults != null) {
if (allowPartialResultsInBody == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowPartialResultsInBody value seems to suggest that "partial results" settings can be used either as request parameter or request body parameter. I think we should test the case when the user provides:

  • request parameter values AND request body parameters (both of them or just one)
  • either one of them

In other words, randomize any combination of request parameters presence and values in the two places where we allow them.

request.addParameter("allow_partial_search_results", String.valueOf(allowPartialSearchResults));
if (allowPartialSequenceResults != null) {
request.addParameter("allow_partial_sequence_results", String.valueOf(allowPartialSequenceResults));
}
}
} else if (randomBoolean()) {
request.addParameter("allow_partial_search_results", String.valueOf(randomBoolean()));
}
int timeout = Math.toIntExact(timeout().millis());
RequestConfig config = RequestConfig.copy(RequestConfig.DEFAULT)
.setConnectionRequestTimeout(timeout)
Expand All @@ -182,6 +226,18 @@ protected ObjectPath runQuery(String index, String query) throws Exception {
return ObjectPath.createFromResponse(client().performRequest(request));
}

private void assertShardFailures(Object shardFailures) {
if (expectShardFailures != null) {
if (expectShardFailures) {
assertNotNull(shardFailures);
List<?> list = (List<?>) shardFailures;
assertThat(list.size(), is(greaterThan(0)));
} else {
assertNull(shardFailures);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expectShardFailures == null I think you should add assertNull(shardFailures).

}

private void assertEvents(List<Map<String, Object>> events) {
assertNotNull(events);
logger.debug("Events {}", new Object() {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@
*/
public class DataLoader {
public static final String TEST_INDEX = "endgame-140";
public static final String TEST_SHARD_FAILURES_INDEX = "endgame-shard-failures";
public static final String TEST_EXTRA_INDEX = "extra";
public static final String TEST_NANOS_INDEX = "endgame-140-nanos";
public static final String TEST_SAMPLE = "sample1,sample2,sample3";
Expand Down Expand Up @@ -103,6 +104,11 @@ public static void loadDatasetIntoEs(RestClient client, CheckedBiFunction<XConte
//
load(client, TEST_MISSING_EVENTS_INDEX, null, null, p);
load(client, TEST_SAMPLE_MULTI, null, null, p);
//
// index with a runtime field ("broken", type long) that causes shard failures.
// the rest of the mapping is the same as TEST_INDEX
//
load(client, TEST_SHARD_FAILURES_INDEX, null, DataLoader::timestampToUnixMillis, p);
}

private static void load(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,23 @@ public EqlDateNanosSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
this(TEST_NANOS_INDEX, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
this(
TEST_NANOS_INDEX,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

// constructor for multi-cluster tests
Expand All @@ -40,9 +54,23 @@ public EqlDateNanosSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
super(index, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
super(
index,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,23 @@ public EqlExtraSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
this(TEST_EXTRA_INDEX, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
this(
TEST_EXTRA_INDEX,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

// constructor for multi-cluster tests
Expand All @@ -40,9 +54,23 @@ public EqlExtraSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
super(index, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
super(
index,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

@Override
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,9 +27,23 @@ public EqlMissingEventsSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if all the existent tests in EQL that have no idea about partial results (all the tests based on BaseEqlSpecTestCase before this PR) would completely randomize these two new settings. All of them should behave completely the same because we have no "broken" shard setup for them which means that irrespective of the values provided, the tests should pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some randomization already, but I'm making it more complete.

Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
this(TEST_MISSING_EVENTS_INDEX, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
this(
TEST_MISSING_EVENTS_INDEX,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

// constructor for multi-cluster tests
Expand All @@ -40,9 +54,23 @@ public EqlMissingEventsSpecTestCase(
List<long[]> eventIds,
String[] joinKeys,
Integer size,
Integer maxSamplesPerKey
Integer maxSamplesPerKey,
Boolean allowPartialSearchResults,
Boolean allowPartialSequenceResults,
Boolean expectShardFailures
) {
super(index, query, name, eventIds, joinKeys, size, maxSamplesPerKey);
super(
index,
query,
name,
eventIds,
joinKeys,
size,
maxSamplesPerKey,
allowPartialSearchResults,
allowPartialSequenceResults,
expectShardFailures
);
}

@Override
Expand Down
Loading