-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add atomic recursive delete to ZK client and use for drop instance #2994
Conversation
helix-core/src/test/java/org/apache/helix/manager/zk/TestZkBaseDataAccessor.java
Outdated
Show resolved
Hide resolved
helix-core/src/main/java/org/apache/helix/manager/zk/ZKHelixAdmin.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Outdated
Show resolved
Hide resolved
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
try { | ||
multi(ops); | ||
} | ||
catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I feel like we should check the multi result. Even if there are exception thrown, multi may or may not succeeded.
- I think having specific exception class might be better comparing to do a global catch here, and log corresponding error messages . For example, if we get a
InterruptedException
, we know there is concurrent edit of the paths.
related doc:
OpResult: https://www.javadoc.io/static/org.apache.zookeeper/zookeeper/3.7.1/org/apache/zookeeper/OpResult.DeleteResult.html
Multi(): https://www.javadoc.io/doc/org.apache.zookeeper/zookeeper/3.7.1/org/apache/zookeeper/ZooKeeper.html#multi-java.lang.Iterable-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Added check for any ErrorResults in response from multi() call. The class only provides int values that correspond to specific ZK KeeperExceptions. We can convert that int value to a KeeperException.Code and provide a list of when we log + throw error.
- The goal with the global catch was to follow the same pattern as the original deleteRecursively method which global catches and throws a ZkClientException. Do you think specific error catching is necessary in this case? My understanding is that the original exception type (lets say InterruptedException) will still be printed to log and the cause will still be preserved in the
cause
of the new ZkClientException. The main draw back is that callers of this API will not be able to handle specific failures themselves, but will only be able to catch and respond to ZkClientException
Thank you for the feedback @xyuanlu 🙏
zookeeper-api/src/main/java/org/apache/helix/zookeeper/zkclient/ZkClient.java
Show resolved
Hide resolved
Generally LGTM. Please address final comments. |
Pull request approved by: @xyuanlu |
Issues
Feature warranted by discussion from PR #2932
Discussion here: #2932 (comment)
Description
This PR adds an ATOMIC recursive delete method and uses it for dropping/purging instances from the cluster. The deleteRecursivelyAtomic method will either fully fail or fully succeed. This will prevent partially deleted instances from entering the cluster into an invalid state. This behaves similarly to the current deleteRecursively method, but with the addition of atomicity. There are two signatures available, one for deleting a single path and its children, and one for deleting multiple paths and all their children. Atomicity across different paths is needed to ensure atomicity of dropping an instance.
The FederatedZkClient implementation throws an IllegalArgumentException when deleteRecursivelyAtomic is called upon 2 paths that exist in different ZK realms as atomicity cannot be guaranteed across different realms due to their being 2 separate ZKClients and therefore 2 separate calls.
Tests
testDropInstance in TestZkHelixAdmin.java
testDeleteRecursivelyAtomic in TestRawZkClient.java
Ran manual CI on my personal fork and saw no new failing tests introduced.
Changes that Break Backward Compatibility (Optional)
N/A
Commits
Code Quality
(helix-style-intellij.xml if IntelliJ IDE is used)