ZOOKEEPER-4655: Communicate the Zxid that triggered a WatchEvent to fire #1950
Sorry, I can't open a ticket for this. I already emailed the mailing list.
Thank you for this useful feature!
I created this JIRA. Can you please amend your commit and update the commit message?
https://issues.apache.org/jira/browse/ZOOKEEPER-4655
Overall LGTM.
I have one question about backward compatibility.
What happens if a "new client" connects to an "old version server"?
The observed ID will be -1, correct?
We should state it clearly in the javadocs.
Follow-up question/work:
If possible, would you like to contribute support for this feature in apache/curator?
Hey @eolivelli, thanks for taking a look at this! Happy to hear it looks good. I was just lucky there was room in the existing API to squeeze this in. Let me update the commit message and poke around the docs to see where I can make it more obvious that this is available. Regarding the new client/old server issue: yes, old servers will always send -1, so it's a pretty clear way to know whether you're talking to an old server. I'll make sure to state that in the docs. Let me also take a look at Curator and see what can be done; it may be feasible to expose this in a sane way.
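Since an old server never fills in the zxid, a client can use the -1 sentinel to detect it, but only on node events: the PR description says the zxid is set only for NodeCreated, NodeDeleted, NodeDataChanged and NodeChildrenChanged, so connection-state events carry -1 on any server. A minimal sketch of that check; `NodeEventKind`, `ZxidSupport` and `serverSupportsZxid` are hypothetical names for illustration, not part of the ZooKeeper API.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical mirror of the watch event types; only the names follow ZooKeeper's EventType. */
enum NodeEventKind { NONE, NODE_CREATED, NODE_DELETED, NODE_DATA_CHANGED, NODE_CHILDREN_CHANGED }

final class ZxidSupport {
    /** What a new client observes when the server did not fill in the zxid. */
    static final long NO_ZXID = -1L;

    /** Per this PR, only these event types have the zxid set by a new server. */
    private static final Set<NodeEventKind> ZXID_BEARING = EnumSet.of(
            NodeEventKind.NODE_CREATED,
            NodeEventKind.NODE_DELETED,
            NodeEventKind.NODE_DATA_CHANGED,
            NodeEventKind.NODE_CHILDREN_CHANGED);

    private ZxidSupport() {}

    /**
     * Returns TRUE or FALSE when the event reveals whether the server supports the feature,
     * or null when it cannot tell (e.g. connection-state events carry -1 on any server).
     */
    static Boolean serverSupportsZxid(NodeEventKind kind, long zxid) {
        if (!ZXID_BEARING.contains(kind)) {
            return null;
        }
        return zxid != NO_ZXID;
    }
}
```

A cache could run this check on the first node event after connecting and, on FALSE, fall back to re-reading the node for every event.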
@eolivelli Done.
Hey @eolivelli, can you take another look at this? I'd really appreciate it!
I was finally able to get all the checks to pass. Sorry about that! The branch is looking good now.
Thank you for a nice feature. It looks good generally, except for WatchedEvent.equals.
Normally, I think a local cache of the ZooKeeper data tree can only guarantee eventual consistency. Does this feature affect correctness when building a local cache?
From the PR description: "For example, imagine that between steps 1 and 2, a node /a was deleted then re-created. By the time step 3 is executed, there will be a NodeDeleted event queued up followed by a NodeCreated, causing at best a double read (one from the bootstrap, one from the NodeCreated) or at worst some data inconsistencies in the local cache."
I see that outdated events could be filtered out by comparing them with stat.mzxid. But how could inconsistencies be introduced? Or would it be tricky to build an eventually consistent local cache of the ZooKeeper data tree without this feature?
zookeeper-server/src/main/java/org/apache/zookeeper/WatchedEvent.java
zookeeper-server/src/test/java/org/apache/zookeeper/server/watch/WatchManagerTest.java
Hey @kezhuw, thanks so much for taking a look at this! Regarding your comment: yes, without this feature, building an eventually consistent cache can be tricky, because it requires handling a lot of edge cases when a read does not match what's expected based on the watch events received. Depending on what the local cache does when it receives events, it could end up in a weird state. I banged my head against the wall enough trying to implement a nice, consistent local cache that I opened this PR :)
I think there's a flaky test with
Because the metrics were updated _after_ the listener is invoked, the listener does not always see the fresh metric value. This fixes it so that the test waits for the value to become what we expect.
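The fix above replaces a one-shot assertion, which races with the thread that updates the metric, with polling until the value matches. A minimal generic sketch of such a helper; `Polling.waitFor` is a hypothetical standalone version and is not the actual signature of `waitFor`/`waitForMetric` in `ZKTestCase`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class Polling {
    private Polling() {}

    /**
     * Re-checks the condition until it holds or the timeout elapses, instead of asserting once
     * and racing with the thread that updates the value. Returns whether the condition held in time.
     */
    static boolean waitFor(BooleanSupplier condition, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(10); // brief pause between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

In a test, a metric check then becomes something like `assertTrue(Polling.waitFor(() -> metric.get() == expected, 5, TimeUnit.SECONDS))`, where `metric` and `expected` are placeholders.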
Just for info, ZOOKEEPER-1289 (Multi Op Watch Events) is probably related.
Hey @kezhuw, what's the use case for having a single multi modify the same node more than once? Is there a specific ZK behavior that is only available via this sort of multi? Based on my understanding of common uses of ZK, this feels like a rare usage pattern, so this change is still very useful for anyone who does not do this.
Also, based on my experience building a client-side cache, having the zxid in the header would resolve a lot of internal inconsistencies, so ZOOKEEPER-1289 may be at least partially solved :)
Hi @PapaCharlie, I have opened ZOOKEEPER-4695. I agree with your point; I think it is a question of breaking changes.
It seems that you are building something new in another language; I think ZooKeeper Watches could help.
Yeah, I am building something in Go (go-zookeeper/zk#89), but it would still help significantly to have the zxid in the watch event. Fundamentally, because this fits in the existing API, I don't think the fact that multis can behave this way is a strong enough reason not to have it. This can greatly simplify client-side behavior in many applications and avoid double lookups in many circumstances.
@kezhuw: agreed that regular ZK watches can preserve the ordering of modification events; however, for maintaining an in-memory cache of a Znode tree, a single recursive persistent watch is easier to use than regular watches, which have to be set on each Znode in the tree to trigger events on Znode content changes. Regarding the multi-op API, will the order of watch events differ from the order of write operations in the same multi-op tx? If the orders are the same (e.g. write op1, op2 ==> watch event op1, watch event op2), then I'd say that even though they have the same txid, we can still order the events in memory on the watch event consumer side and process them one by one to maintain cache consistency. FWIW, we only need the cache to be eventually consistent, yet the simplification we gain from leveraging persistent watches would be significant.
…nt watches
I found this in reply to apache#1950 (comment), but it turns out to be a known issue: apache#1106 (comment).
> An important question to all committers. In DataTree.setWatches persistent watchers are not applied. This means that after a network partition, no persistent watchers will trigger. I don't have a feeling about this one way or another - the current implementation works fine for Curator's use cases.
@gu0keno0 They are the same, to my knowledge.
I think this is crucial. The "deal" should tolerate disconnection. Say "/foo" and "/bar" are watched in a subtree. Two changes(
Though, persistent watch does not work well with
My understanding of @PapaCharlie's use case is that his cache would also rebuild upon reconnection. AFAICT, this feature should simplify the rebuilding process, during which we re-read all Znodes from ZK while new persistent watch events may fire at the same time. As @PapaCharlie suggested in his initial statement, this feature should simplify the cache maintenance logic in that situation: if the persistent watch event's txid is lower than or equal to the txid of the cached data, we know that the newer data has already been read.
@PapaCharlie I think you can send an email to the dev mailing list to ask for opinions and reviews. A PR gets less exposure if it does not get attention soon after it is created; I did this once and have seen others do it. But before doing that, I recommend filtering out the #1989-related changes to make this PR more focused.
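The rule above (an event whose zxid is at or below the cached data's zxid is already reflected) can be sketched as a per-path merge keyed by zxid, which makes the order in which re-reads and events arrive during a rebuild irrelevant. This is a minimal sketch under stated assumptions: `ZxidVersionedCache` and its methods are hypothetical, reads are assumed to supply the `mzxid` from the returned `Stat`, and only data and deletion events are handled (NodeChildrenChanged would relate to the parent's `pzxid` instead).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache that keeps, per path, the newest state seen so far, ordered by zxid. */
final class ZxidVersionedCache {
    private static final class Entry {
        final byte[] data; // null marks a tombstone left by a deletion
        final long zxid;

        Entry(byte[] data, long zxid) {
            this.data = data;
            this.zxid = zxid;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Stores the update only if it is strictly newer than what is cached. */
    private boolean offer(String path, byte[] data, long zxid) {
        Entry current = entries.get(path);
        if (current != null && zxid <= current.zxid) {
            return false; // the cache already reflects this write or a later one
        }
        entries.put(path, new Entry(data, zxid));
        return true;
    }

    /** Result of re-reading a node during the rebuild; pass the mzxid from its Stat. */
    boolean onRead(String path, byte[] data, long mzxid) {
        return offer(path, data, mzxid);
    }

    /** NodeDeleted event: the tombstone stops an older, in-flight read from resurrecting the node. */
    boolean onDeleted(String path, long eventZxid) {
        return offer(path, null, eventZxid);
    }

    /** NodeCreated/NodeDataChanged carry no data: report whether the node must be re-read. */
    boolean needsRead(String path, long eventZxid) {
        Entry current = entries.get(path);
        return current == null || eventZxid > current.zxid;
    }

    /** Cached data, or null if the node is unknown or deleted. */
    byte[] get(String path) {
        Entry e = entries.get(path);
        return e == null ? null : e.data;
    }
}
```

Using `<=` rather than `<` also means that several events produced by one multi-op transaction, which share a single zxid, trigger at most one re-read once the cache holds data read at that zxid.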
I support this patch.
Unfortunately I had to add one blocker comment.
There is a public class in which we are changing the constructor.
This may have side effects on applications, especially testing frameworks that mock ZK events.
Can you please address that comment? Then I will be +1.
 */
-public WatchedEvent(EventType eventType, KeeperState keeperState, String path) {
+public WatchedEvent(EventType eventType, KeeperState keeperState, String path, long zxid) {
This is public API; we should keep the original constructors. Maybe you can mark them "@deprecated".
This is preserved below; it's just not marked as @deprecated because there are many uses of WatchedEvent that correctly do not have a zxid.
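Keeping the original constructor next to the new one can be sketched as an overload that delegates with a -1 sentinel, so existing callers, including test code that mocks events, keep compiling. `SketchWatchedEvent` and its `NO_ZXID` constant are hypothetical simplifications, not the actual `WatchedEvent` source; the real class takes `EventType` and `KeeperState` enums rather than a string.

```java
/** Simplified, hypothetical stand-in for WatchedEvent showing the two-constructor pattern. */
final class SketchWatchedEvent {
    /** Sentinel for "no zxid": connection-state events, or events from servers without this feature. */
    static final long NO_ZXID = -1L;

    private final String type; // the real class uses EventType/KeeperState enums
    private final String path;
    private final long zxid;

    /** Original signature, kept (not deprecated) because many events legitimately have no zxid. */
    SketchWatchedEvent(String type, String path) {
        this(type, path, NO_ZXID);
    }

    /** New signature carrying the zxid of the write that triggered the event. */
    SketchWatchedEvent(String type, String path, long zxid) {
        this.type = type;
        this.path = path;
        this.zxid = zxid;
    }

    String getType() { return type; }

    String getPath() { return path; }

    long getZxid() { return zxid; }
}
```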
Lgtm
lgtm.
Nice refactoring in the tests.
@PapaCharlie Please assign the JIRA ticket to yourself and set the status to Patch Available.
Hey @anmolnar and @eolivelli, thanks for taking a look! I really appreciate the review. I didn't have an account, so I asked @abhilash1in to update the ticket. I have an account now, so I'll assign it to myself.
PR apache#1950 (ZOOKEEPER-4655) was created before apache#1859 (ZOOKEEPER-4466) was merged. It changes the signature of `assertEvent`, which apache#1859 depends on.
…2012) PR #1950 (ZOOKEEPER-4655) was created before #1859 (ZOOKEEPER-4466) was merged. It changes the signature of `assertEvent`, which #1859 depends on.
…ire (apache#1950)
* Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics
Because the metrics were updated _after_ the listener is invoked, the listener does not always see the fresh metric value. This fixes it so that the test waits for the value to become what we expect.
* Leverage an existing method and refactor the rest of the code to match
Since there was an existing waitFor method in ZKTestCase, along with an existing implementation of waitForMetric in LearnerMetricsTest, this commit moves waitForMetric to ZKTestCase and refactors the metric-related usages of waitFor.
* Communicate the Zxid that triggered a WatchEvent to fire
With the recent addition of persistent watches, many doors have opened up to significantly more performant and intuitive local caches of remote state, but the actual implementation can be difficult because to cache data locally, one needs to execute the following steps:
1. Set the watch
2. Bootstrap the watched subtree
3. Catch up on the events that fired during the bootstrap
The issue is that it's now very difficult to deduplicate and sanely resolve the remote state during step 3, because it's unknown whether an event arrived during the bootstrap or after. For example, imagine that between steps 1 and 2, a node /a was deleted then re-created. By the time step 3 is executed, there will be a NodeDeleted event queued up followed by a NodeCreated, causing at best a double read (one from the bootstrap, one from the NodeCreated) or at worst some data inconsistencies in the local cache.
This change sets the Zxid in the response header whenever the watch event type is NodeCreated, NodeDeleted, NodeDataChanged or NodeChildrenChanged.
…ire (apache#1950) (#55) * Fix a race condition in WatcherCleanerTest.testDeadWatcherMetrics Because the metrics were updated _after_ the listener is invoked, the listener does not always see the fresh metric value. This fixes it so that the test waits for the value to become what we expect. * Leverage an existing method and refactor the rest of the code to match Since there was an existing waitFor method in ZKTestCase, along with an existing implementation of waitForMetric in LearnerMetricsTest, this commit moves waitForMetric to ZKTestCase and refactors the metric-related usages of waitFor. * Communicate the Zxid that triggered a WatchEvent to fire With the recent addition of persistent watches, many doors have opened up to significantly more performant and intuitive local caches of remote state, but the actual implementation can be difficult because to cache data locally, one needs to execute the following steps: 1. Set the watch 2. Bootstrap the watched subtree 3. Catch up on the events that fired during the bootstrap The issue is that it's now very difficult to deduplicate and sanely resolve the remote state during step 3, because it's unknown whether an event arrived during the bootstrap or after. For example, imagine that between steps 1 and 2, a node /a was deleted then re-created. By the time step 3 is executed, there will be a NodeDeleted event queued up followed by a NodeCreated, causing at best a double read (one from the bootstrap, one from the NodeCreated) or at worst some data inconsistencies in the local cache. This change sets the Zxid in the response header whenever the watch event type is NodeCreated, NodeDeleted, NodeDataChanged or NodeChildrenChanged. Co-authored-by: Paul Chesnais <[email protected]>
PR apache#1950 (ZOOKEEPER-4655) was created before apache#1859 (ZOOKEEPER-4466) was merged. It changes the signature of `assertEvent`, which apache#1859 depends on. Co-authored-by: Kezhu Wang <[email protected]>
With the recent addition of persistent watches, many doors have opened up to significantly more performant and intuitive local caches of remote state, but the actual implementation can be difficult because to cache data locally, one needs to execute the following steps:
1. Set the watch
2. Bootstrap the watched subtree
3. Catch up on the events that fired during the bootstrap
The issue is that it's now very difficult to deduplicate and sanely resolve the remote state during step 3, because it's unknown whether an event arrived during the bootstrap or after. For example, imagine that between steps 1 and 2, a node /a was deleted then re-created. By the time step 3 is executed, there will be a NodeDeleted event queued up followed by a NodeCreated, causing at best a double read (one from the bootstrap, one from the NodeCreated) or at worst some data inconsistencies in the local cache.
This change sets the Zxid in the response header whenever the watch event type is NodeCreated, NodeDeleted, NodeDataChanged or NodeChildrenChanged.
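With the zxid available on each event, step 3 reduces to comparing each queued event against the `mzxid` of the node as read during the bootstrap. The following is a minimal in-memory sketch of that catch-up, not the PR's code: `BootstrapCatchUp`, `Ev`, `Kind` and the action strings are hypothetical, and a node absent during the bootstrap is conservatively treated as needing action because there is no zxid to compare against.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, in-memory model of step 3 (catching up on events queued during the bootstrap). */
final class BootstrapCatchUp {
    enum Kind { CREATED, DELETED, DATA_CHANGED }

    static final class Ev {
        final Kind kind;
        final String path;
        final long zxid;

        Ev(Kind kind, String path, long zxid) {
            this.kind = kind;
            this.path = path;
            this.zxid = zxid;
        }
    }

    private BootstrapCatchUp() {}

    /**
     * Replays events queued since the watch was set (step 1), in arrival order, against the
     * mzxid of each node as read during the bootstrap (step 2). Events at or below that mzxid
     * are already reflected in the bootstrapped data and are skipped.
     */
    static List<String> catchUp(List<Ev> queued, Map<String, Long> bootstrapMzxid) {
        List<String> actions = new ArrayList<>();
        for (Ev ev : queued) {
            Long seen = bootstrapMzxid.get(ev.path);
            // A node absent during the bootstrap has no mzxid to compare, so act on the event.
            if (seen != null && ev.zxid <= seen) {
                continue;
            }
            actions.add((ev.kind == Kind.DELETED ? "remove " : "re-read ") + ev.path);
        }
        return actions;
    }
}
```

With made-up zxids for the example in the description (/a deleted at 11 and re-created at 12 before the bootstrap read, which therefore returns mzxid 12), the queued NodeDeleted and NodeCreated are dropped and only a later change, say at 15, yields a single "re-read /a". Without zxids, both queued events would look actionable.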