-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reindex search resiliency prototype #43187
Reindex search resiliency prototype #43187
Conversation
This commit is just the (first) integTest for validating that reindex can survive data node restarts.
This cements the SPI to be the individual operations and the capability to convert the results and failures to common format.
|
||
ensureGreen("test"); | ||
String reindexNode = internalCluster().startCoordinatingOnlyNode(Settings.EMPTY); | ||
NodeClient reindexNodeClient = internalCluster().getInstance(NodeClient.class, reindexNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internalCluster().client(reindexNode)
should be simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason this is done like this is to get the NodeClient since it has executeLocally, which returns a Task object.
|
||
private void assertSameDocs(int numberOfDocuments, String... indices) { | ||
refresh(indices); | ||
SearchSourceBuilder sourceBuilder = SearchSourceBuilder.searchSource().size(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps just do a count or use a cardinality aggregation on the "data" field? No need for scripting I think.
} else if (retryFromRequest.source().query() == null) { | ||
retryFromRequest.source().query(rangeQueryBuilder); | ||
} else { | ||
throw new UnsupportedOperationException("not yet implemented"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps wrap original query using BoolQueryBuilder where the rangeQueryBuilder is added as filter and the original query as must.
and add seq_no range as filter.
…rch_resiliency_poc
Refactor ScrollableHitSource to pump data out and have a simplified interface (callers should no longer call startNextScroll, instead they simply mark that they are done with the previous result, triggering a new batch of data). This eases making reindex resilient, since we will sometimes need to rerun search during retries. Relates elastic#43187 and elastic#42612
Refactor ScrollableHitSource to pump data out and have a simplified interface (callers should no longer call startNextScroll, instead they simply mark that they are done with the previous result, triggering a new batch of data). This eases making reindex resilient, since we will sometimes need to rerun search during retries. Relates #43187 and #42612
Refactor ScrollableHitSource to pump data out and have a simplified interface (callers should no longer call startNextScroll, instead they simply mark that they are done with the previous result, triggering a new batch of data). This eases making reindex resilient, since we will sometimes need to rerun search during retries. Relates #43187 and #42612
Local reindex can now survive loosing data nodes that contain source data. The original query will be restarted with a filter for `_seq_no >= last_seq_no` when a failure is detected. Part of elastic#42612 and split out from elastic#43187
Local reindex can now survive loosing data nodes that contain source data. The original query will be restarted with a filter for `_seq_no >= last_seq_no` when a failure is detected. The original/first search request is not retried/restarted since this is not what we used to do and it leads to long wait times to get the info back that a search request is bad. Part of #42612 and split out from #43187
This is all covered in split out PRs, mainly #45497 so closing this. |
This PR contains a prototype for ensuring that reindex searching can survive data node restarts and/or related network issues.
Primarily created this PR to collaborate internally.
Most tests only modified to compile.
ReindexResilientSearchIT
checks that resiliency works.A large part of the change is changing the retry handling to now be handled by
ScrollableHitSource
instead of the two subclasses. This ensures it is handled the same way and made it easier to restart the query from scratch. The subclasses main responsibility is not to execute one search or scroll request and convert the result to common format.Notable questions:
SearchRequest
_seq_no >= retryFromValue
condition if there is an existing condition.ThreadContext
handling may not be complete, though I would expect transport and threadPool to handle this automatically?PRs to split out:
ScrollableHitSource