Add specific cluster error info, shard info and additional metadata for CCS when minimizing roundtrips #97731
Conversation
[UPDATE] This comment is now out of date, as the code has changed based on feedback to this question. See this comment for the latest output format details: Current state for review. All of the discussion below pertains only to searches with `ccs_minimize_roundtrips=true`.
Clusters for a CCS search can be in one of 5 states.
The testing/demo setup I used has three clusters. Query used for testing: the busywork function score is used to make queries run for 30 to 90 seconds.
Example responses:
- Intermediate state where all clusters are still running the search
- Response showing all clusters having finished successfully
- Intermediate state shows updated state of clusters that have finished (successfully or not)
- Final state shows updated state of clusters and additional info about shards and duration
- Final state where a skip_unavailable:false cluster fully failed, causing the whole search to fail
- The `GET _async_search/status/:id` output also shows the new _clusters info
Things to note:
Questions:
Sounds like a good idea to me, although skipped and failed are never going to be greater than 0 at the same time? Should we hide either of the two depending on skip_unavailable? Or maybe that's too magic?
Sounds good to me.
I would call it
I would not include it, it can easily be computed.
I guess it won't matter much if Kibana ends up always providing the flag? What is the plan there in terms of consuming the additional output?
I am not sure we would be able to trim it down considerably. Maybe we could remove the shards information from it. I am not entirely sure; it depends again on how we expect this info to be consumed.
I suspect this applies to async search as it exposes partial state, but less to _search. I would not consider this necessary for now.
Yes, you can have both
Hi @quux00, I've created a changelog YAML for you.
Pinging @elastic/es-search (Team:Search)
private final AtomicInteger skippedShards;
private final AtomicInteger failedShards;
private final AtomicLong took; // search latency in millis for this cluster sub-search
private final AtomicBoolean timedOut;
The thread safety model of this class does not use synchronization because it relies on a single-writer principle elsewhere in the code.
The Cluster object is updated by one ActionListener callback thread. No other thread should ever update a given Cluster object. (Each is updated by a separate, single ActionListener.)
However, the Cluster objects do need to be read by queries (e.g., `GET _async_search/:id`), so all variables are declared volatile or Atomic (or final, for `clusterAlias` and `indexExpression`) to allow reading the latest state for status queries.
Note: the status access (e.g., `GET _async_search/:id`) is "best effort" since there is no synchronization. It is possible that while the status is being read for a status response, the Cluster is being actively updated by a callback thread. This is considered acceptable since it would only happen for "intermediate state". The final state when the search is completed will always be fully accurate.
Also, I think technically `took` and `timedOut` could just be volatile primitives, since they are only ever set and never incremented.
Is it an option to create a new Cluster object at every change and keep the class immutable? I think that would be easier to reason about compared to all the mutable state it has now.
For MRT=true, yes, we could move to immutable Cluster objects that get overwritten (in the Clusters' shared, thread-safe `clusterInfo` Map) because each cluster has only a single callback thread, so you have a single writer per Cluster.
But for MRT=false, the Cluster updating/writer thread is at the granularity of a shard. So during the CanMatch (SearchShards) phase and during the Query/Fetch phase you have N threads (N = number of shards per cluster) all potentially updating the same Cluster object with specific shard info. So either we need a shared thread-safe `Cluster` object (what I've tried to do in this PR), OR, if we want to use immutable `Cluster` objects, we need a shared per-Cluster lock so that each per-shard callback "writer" thread can safely overwrite the Cluster object with a new one with "incremented" successful/skipped/failed counters and failure list, without a data race that loses information.
I think both are feasible. For the latter, I think we would need the shared Clusters object (which can no longer be immutable for CCS) to construct a set of lock objects, one per Cluster, in its "CCS" constructor. Then the shard callback threads (MRT=false) would need to obtain that lock in order to create a new Cluster object with the updated information, overwrite the existing one, and then release the lock.
If we build such a locking system, we might want to wire the MRT=true paths to use it as well, though there, with a single writer per cluster, it isn't necessary.
Let me know your view (or if there's another option I'm not thinking of) on how to proceed.
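As a rough sketch of that lock-per-cluster alternative (hypothetical names, not code from this PR), each per-shard callback would do its read-modify-write under the cluster's lock:

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: immutable snapshots replaced under a per-cluster lock so
// that N per-shard callback threads (MRT=false) cannot lose each other's updates.
record ClusterSnapshot(String alias, int successful, int failed, List<String> failures) {}

class LockedClusterRegistry {
    private final ConcurrentHashMap<String, ClusterSnapshot> snapshots = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    // Built up front (e.g., in a "CCS" constructor), one lock per cluster.
    void register(String alias) {
        snapshots.put(alias, new ClusterSnapshot(alias, 0, 0, List.of()));
        locks.put(alias, new Object());
    }

    // Each per-shard callback thread performs its read-modify-write under the cluster's lock.
    void recordShardFailure(String alias, String reason) {
        synchronized (locks.get(alias)) {
            ClusterSnapshot old = snapshots.get(alias);
            List<String> failures = new java.util.ArrayList<>(old.failures());
            failures.add(reason);
            snapshots.put(alias, new ClusterSnapshot(alias, old.successful(), old.failed() + 1, failures));
        }
    }
}
```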
In the latest push I changed the `SearchResponse.Cluster` to be fully immutable. The `SearchResponse.Clusters` object holds an unmodifiable `Map<String, AtomicReference<Cluster>>`, where the key is the clusterAlias.
For MRT=true (this PR), each CCSActionListener (one per cluster) gets a reference to the AtomicReference holding the Cluster it is searching. It then does a CAS operation to update state with a new Cluster object. This isn't strictly necessary for the MRT=true paths, since there is a single writer to the Map per Cluster.
But it sets up the concurrency model for MRT=false, where the updates will be done per shard, not per Cluster. For that work (future ticket), each updater thread will update a Cluster object by retrieving the existing Cluster object, incrementing the appropriate counters and other state (e.g., status, timedOut, or failure lists), and then doing a CAS back into the AtomicReference. If that CAS fails, it will grab the new Cluster now held by the AtomicReference, re-apply its increments/additions to a new Cluster, and again try to CAS it into the AtomicReference.
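A minimal sketch of that CAS retry loop, using hypothetical stand-ins for the real `SearchResponse.Cluster`/`Clusters` classes:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the adopted model: immutable per-cluster state held in an
// AtomicReference and replaced via compareAndSet, retrying on contention.
record ClusterInfo(String alias, int successful, int failed, boolean timedOut) {}

class ClustersHolder {
    // Unmodifiable map keyed by clusterAlias; only the AtomicReference contents change.
    private final Map<String, AtomicReference<ClusterInfo>> clusterInfo;

    ClustersHolder(Map<String, AtomicReference<ClusterInfo>> clusterInfo) {
        this.clusterInfo = Map.copyOf(clusterInfo);
    }

    // For MRT=true there is one writer per cluster, so the CAS never retries;
    // for MRT=false (future work) many per-shard writers would retry until their CAS wins.
    void markShardFailed(String alias) {
        AtomicReference<ClusterInfo> ref = clusterInfo.get(alias);
        ClusterInfo current;
        ClusterInfo updated;
        do {
            current = ref.get();
            updated = new ClusterInfo(current.alias(), current.successful(), current.failed() + 1, current.timedOut());
        } while (ref.compareAndSet(current, updated) == false);
    }
}
```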
* @param clusterAlias cluster on which the failure occurred
* @param skipUnavailable the skip_unavailable setting of the cluster with the search error
*/
private static void logCCSError(ShardSearchFailure f, String clusterAlias, boolean skipUnavailable) {
This "feature" was requested by Product at a recent meeting - to start logging any CCS errors that occur so that we can begin to discover what types of CCS issues happen in large production environments, especially as we don't yet have any CCS telemetry in place.
Would it make sense to extract this to a separate PR?
OK, I can do that.
@@ -189,7 +189,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
}
RestActions.buildBroadcastShardsHeader(builder, params, totalShards, successfulShards, skippedShards, failedShards, null);
if (clusters != null) {
builder = clusters.toXContent(builder, null);
builder = clusters.toXContent(builder, params);
Bug fix that was uncovered by the newly added code in this PR.
@@ -523,11 +523,14 @@ public void testCCSRemoteReduceMergeFails() throws Exception {
ActionListener.wrap(r -> fail("no response expected"), failure::set),
latch
);
SearchResponse.Clusters initClusters = new SearchResponse.Clusters(localIndices, remoteIndicesByCluster, true);
This sets up the "initial Clusters" with the list of local indices and remote clusters, so that a separate underlying `AtomicReference<SearchResponse.Cluster>` object can be created for each cluster (local or remote), which will be handed to each CCSActionListener created (these are cluster-specific).
Removed the deprecated Clusters constructor. Created a common determineCountFromClusterInfo method for dynamic counts in Clusters from the underlying Cluster objects. The took value on Cluster objects is now set from the underlying single-cluster SearchResponse.
… changing to AtomicReference - not working/incomplete
The SearchResponse.Clusters object now uses an immutable HashMap<String, AtomicReference<Cluster>> rather than a ConcurrentHashMap<String, Cluster>, since the concurrency control is now a CAS operation on the underlying AtomicReference<Cluster>, not an overwrite into the HashMap. Changed took in SearchResponse.Cluster to be a TimeValue rather than a long, to match its usage in SearchResponse.
Relates elastic#97731
Updates the spec for the response objects for `POST _async_search`, `GET _async_search/:id`, `GET _async_search/status/:id` and `GET _search`. Two high-level changes have happened in 8.10:
1) `completion_time` and `completion_time_in_millis` (async_search only) elastic/elasticsearch#97700
2) Adding `details` metadata to the `_clusters` section of both async and synchronous search responses elastic/elasticsearch#97731
With the recent addition of per-cluster metadata to the `_clusters` section of the response for cross-cluster searches (see elastic#97731), the `is_partial` setting in the async-search response now acts as a useful summary to end-users that search/aggs data from all shards is potentially incomplete (not all shards fully searched), which could be for one of 3 reasons:
1. at least one shard was not successfully searched (a PARTIAL search cluster state)
2. at least one cluster (marked as `skip_unavailable`=`true`) was unavailable (or all searches on all shards of that cluster failed), causing the cluster to be marked as SKIPPED
3. a search on at least one cluster timed out (`timed_out`=`true`, resulting in a PARTIAL cluster search status)
This commit changes local-only (non-CCS) searches to behave consistently with cross-cluster searches: if any search on any shard fails or if the search times out, the is_partial flag is set to true. Closes elastic#98725
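A minimal sketch of the resulting rule, using a hypothetical helper (not the actual commit's code):

```java
// Hypothetical helper illustrating the rule described above: the result is partial if any
// shard search failed, any skip_unavailable=true cluster was skipped entirely, or the
// search timed out on at least one cluster.
final class PartialResultRule {
    static boolean isPartial(int failedShards, int skippedClusters, boolean timedOut) {
        return failedShards > 0 || skippedClusters > 0 || timedOut;
    }
}
```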
For CCS searches with ccs_minimize_roundtrips=true, when an error is returned, it is unclear which cluster
caused the problem. This commit adds additional accounting and error information to the search response
for each cluster involved in a cross-cluster search.
The `_clusters` section of the SearchResponse has a new `details` section added with an entry for each cluster (remote and local). It includes status info, shard accounting counters and error information that are added incrementally as the search happens.
The search on each cluster can be in one of 5 states:
- RUNNING
- SUCCESSFUL - all shards were successfully searched (successful or skipped)
- PARTIAL - some shard searches failed, but at least one succeeded and partial data has been returned
- SKIPPED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=true
- FAILED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=false
A new `SearchResponse.Cluster` object has been added. Each `TransportSearchAction.CCSActionListener` (one for each cluster) has a reference to a separate Cluster instance and updates it once it gets back information from its cluster.
The `SearchResponse.Clusters` object only uses the new `Cluster` object for CCS minimize_roundtrips=true. For local-only searches and CCS minimize_roundtrips=false, it uses the current (immutable) Clusters object as before.
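To make the five states concrete, here is a hedged sketch with a hypothetical helper (not the actual `SearchResponse.Cluster` code) that maps shard accounting and skip_unavailable to a per-cluster status, following the definitions above:

```java
// Hypothetical sketch mapping shard accounting to the five per-cluster states listed above.
enum ClusterSearchStatus { RUNNING, SUCCESSFUL, PARTIAL, SKIPPED, FAILED }

final class ClusterStatusRule {
    static ClusterSearchStatus resolve(
        boolean finished,
        int totalShards,
        int successfulShards,   // counts shards searched successfully or skipped, per the definition above
        boolean skipUnavailable
    ) {
        if (finished == false) {
            return ClusterSearchStatus.RUNNING;
        }
        if (successfulShards == totalShards) {
            return ClusterSearchStatus.SUCCESSFUL;   // all shards were successfully searched (successful or skipped)
        }
        if (successfulShards > 0) {
            return ClusterSearchStatus.PARTIAL;      // some shard searches failed, but at least one succeeded
        }
        // No shard succeeded (all failed or cluster unavailable): outcome depends on skip_unavailable.
        return skipUnavailable ? ClusterSearchStatus.SKIPPED : ClusterSearchStatus.FAILED;
    }
}
```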