Add delete-by-query plugin #11516
}
scanRequest.source(source);

logger.debug("executing scan request");
this probably needs to be trace; the action package has DEBUG enabled by default
applies to other logging statements here
Done
I left some minor comments around logging and usage of the thread pool (not needed, I think). I couldn't follow why we need a semaphore and such; I think I am missing something. My thought was that we do search -> bulk -> search -> ... until there are no more results, so it is always an async callback execution chain until we are done.
@kimchy thanks for your review! Your comments make sense, no need to use semaphore stuff... I rebased and updated the code, it is way simpler now. I'll add some rest tests too.
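The search -> bulk -> search chain discussed above can be sketched as follows. This is only an illustration under stated assumptions: `ScrollBulkChain`, `Client` and `InMemoryClient` are hypothetical names invented for the sketch, not Elasticsearch types and not the code in this pull request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical sketch of an async callback chain; none of these are Elasticsearch types.
class ScrollBulkChain {

    /** Hypothetical async client: every call reports back through a callback. */
    interface Client {
        // Hands the next page of document ids to the callback; an empty page means "no more results".
        void scroll(Consumer<List<String>> onPage);

        // Deletes the given ids, then runs the callback.
        void bulkDelete(List<String> ids, Runnable onDone);
    }

    private final Client client;
    private final CompletableFuture<Integer> result = new CompletableFuture<>();
    private int deleted = 0;

    ScrollBulkChain(Client client) {
        this.client = client;
    }

    /** Starts the chain; the future completes with the number of deleted documents. */
    CompletableFuture<Integer> start() {
        nextPage();
        return result;
    }

    // One link of the chain: scroll, then bulk, then scroll again from the bulk callback.
    private void nextPage() {
        client.scroll(page -> {
            if (page.isEmpty()) {
                result.complete(deleted); // nothing left to delete, the chain ends here
                return;
            }
            client.bulkDelete(page, () -> {
                deleted += page.size();
                nextPage();
            });
        });
    }

    /** In-memory stand-in that invokes callbacks synchronously; only suitable for small demos. */
    static final class InMemoryClient implements Client {
        private final List<String> remaining;
        private final int pageSize;
        final List<String> deletedIds = new ArrayList<>();

        InMemoryClient(List<String> ids, int pageSize) {
            this.remaining = new ArrayList<>(ids);
            this.pageSize = pageSize;
        }

        @Override
        public void scroll(Consumer<List<String>> onPage) {
            int n = Math.min(pageSize, remaining.size());
            List<String> page = new ArrayList<>(remaining.subList(0, n));
            remaining.subList(0, n).clear();
            onPage.accept(page);
        }

        @Override
        public void bulkDelete(List<String> ids, Runnable onDone) {
            deletedIds.addAll(ids);
            onDone.run();
        }
    }
}
```

Because each step is triggered from the previous step's callback, at most one request is in flight at a time, so no semaphore is needed to coordinate them.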
final String nextScrollId = scrollResponse.getScrollId();
addShardFailures(scrollResponse.getShardFailures());

if (logger.isDebugEnabled()) {
this should be trace?
raaaah yes
@s1monw thanks for your review. I updated the code following your comment and added a REST test. Can you please have another look if possible? Thanks :) Documentation will be added in another PR.
out.writeBoolean(false);
} else {
out.writeBoolean(true);
out.writeVLong(timeout);
if you write a VLong make sure it's not negative!
Ok, timeout has been changed to TimeValue
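To illustrate the concern about negative values, here is a minimal sketch of a variable-length long encoding, assuming the common scheme of 7 payload bits per byte with the high bit as a continuation flag. `VLongSketch`, `encodeVLong` and `encodeNonNegative` are hypothetical names for this example and are not the stream classes touched by this pull request.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a 7-bits-per-byte variable-length encoding; not Elasticsearch code.
class VLongSketch {

    /** Writes the value low bits first; the high bit of each byte marks "more bytes follow". */
    static byte[] encodeVLong(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift, so a negative input keeps all 64 bits in play
        }
        out.write((int) value);
        return out.toByteArray();
    }

    /** Hypothetical guard that rejects negative input before encoding. */
    static byte[] encodeNonNegative(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value not supported: " + value);
        }
        return encodeVLong(value);
    }

    public static void main(String[] args) {
        System.out.println(encodeVLong(1000L).length); // small positive value, few bytes
        System.out.println(encodeVLong(-1L).length);   // negative value, maximum length
    }
}
```

Under this scheme a small positive value takes a couple of bytes, while any negative value takes the maximum of ten, which defeats the purpose of a variable-length encoding.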
}

private boolean isTimedOut() {
return request.timeout() != null && (System.currentTimeMillis() >= (startTime + request.timeout().millis()));
also use the ThreadPool estimations here maybe?
Yes
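One way to read this suggestion is to make the time source pluggable, so the check can use a cheap estimated clock instead of calling `System.currentTimeMillis()` directly. The sketch below assumes that reading; `TimeoutCheck` and its `LongSupplier` clock are hypothetical stand-ins, not the code in this pull request.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a timeout check driven by an injectable clock.
class TimeoutCheck {
    private final LongSupplier clockMillis; // e.g. an estimated/cached clock, or System::currentTimeMillis
    private final Long timeoutMillis;       // null means "no timeout"
    private final long startTime;

    TimeoutCheck(LongSupplier clockMillis, Long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.timeoutMillis = timeoutMillis;
        this.startTime = clockMillis.getAsLong();
    }

    /** Same condition as the reviewed method, but reading time from the supplied clock. */
    boolean isTimedOut() {
        return timeoutMillis != null && clockMillis.getAsLong() >= startTime + timeoutMillis;
    }
}
```

An injectable clock also makes the timeout path easy to exercise in tests without sleeping.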
looks pretty good though. I left a bunch of comments
@s1monw thanks a lot for your review, very valuable. I updated the code following your comments, please let me know if there are still things to improve. I'd love to have your help writing documentation for this plugin, since I'm not sure I'd be able to explain all the fallacies of the previous implementation.
Let's get this in as is and open an issue for the documentation. I will take a look and comment on it with the aspects I would take into account.
oh yeah so here is my LGTM ;)
The delete by query plugin adds support for deleting all of the documents (from one or more indices) which match the specified query. It is a replacement for the problematic delete-by-query functionality which has been removed from Elasticsearch core in 2.0. Internally, it uses the Scan/Scroll and Bulk APIs to delete documents in an efficient and safe manner. It is slower than the old delete-by-query functionality, but fixes the problems with the previous implementation. Closes elastic#7052
This page is placed in a /plugins directory until we figure where to place all plugins documentation.
This pull request adds a new plugin called "delete-by-query" which implements the now deprecated delete-by-query feature using scan/scroll/bulk requests.
Notes:
- The size parameter controls the scroll shard_size and the number of actions in bulk requests (defaults to 1000).
- The timeout parameter can be used to stop scrolling documents after a given time.
- Since the process involves the execution of a scan request (which can fail), then successive async scroll requests (which can also fail), we may imagine better failure reporting.
- When a scroll request succeeds, the scrolled documents are added to a Bulk request executed in an async manner. If the bulk fails, all documents are reported as failed documents in the counter.
- REST API documentation and tests will be added later.
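The failure accounting described in the notes can be sketched as below. This is a simplified, synchronous illustration: `FailureAccounting`, its `found`/`deleted`/`failed` fields and the `bulkFails` predicate are hypothetical names chosen for the example, and the real plugin works with async scroll and bulk requests.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch of the counters: a failed bulk marks every document in it as failed.
class FailureAccounting {
    static final int DEFAULT_SIZE = 1000; // default page/bulk size stated in the notes

    long found;
    long deleted;
    long failed;

    /**
     * Walks `total` matching documents in pages of `size`. `bulkFails` decides, per
     * zero-based page index, whether the bulk request for that page fails as a whole.
     */
    static FailureAccounting run(int total, int size, IntPredicate bulkFails) {
        FailureAccounting counters = new FailureAccounting();
        int page = 0;
        for (int offset = 0; offset < total; offset += size, page++) {
            int docs = Math.min(size, total - offset);
            counters.found += docs;
            if (bulkFails.test(page)) {
                counters.failed += docs;  // whole bulk failed: all of its documents count as failed
            } else {
                counters.deleted += docs;
            }
        }
        return counters;
    }
}
```

For example, with 2500 matching documents, the default size and a failure on the second bulk, the sketch reports 2500 found, 1500 deleted and 1000 failed.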