-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refresh cache with refreshAfterWrite doesn't update value #714
Comments
This is due to how
The cache attaches a The problems are exhausterated in application by the fact that dependent actions are processed in LIFO order. A slow action from the application will delay the cache's to be delayed, which could make this happen more frequently. Ideally if we returned the cache's dependent future some of these problems would go away, but at the cost of whomever produced the future not receiving any downstream signals. The write conflict is a less pronounced race that I'd need some more time to revisit, but I don't think that would be easy to weaken without losing linearizability which is a more important to keep. |
…714) Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache uses a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead we can detect this case by seeing if the timestamp delta exceeds our maximum value (120yrs). IF so, then we know this race occurred and can trust the update. Of course intermediate writes might still have occurred, so we have to check that our refresh future was sucessfully unregistered, which would have been discarded in had any other write occurred.
I might have found a refinement for the write timestamp conflict which will resolve the problem. We'll see if the test suite passes, as this is a complex area. If so, then the changes allow your test case to pass by being a little less aggressive in this race condition. However I am unsure if there is any way to write a reliable unit test, so it might remain best-effort. Since a refresh is an optimistic read, you can still observe that different behavior if you replace the entry during its reload. In that case while the cache will discard the refresh, the future will return the new value. Here I am trying to be less aggressive by smarter detection of when we must discard it, but that will always be possible given the linearization requirements. Hopefully I can make this a bit better for you, though. |
…714) Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache uses a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead we can detect this case by seeing if the timestamp delta exceeds our maximum value (120yrs). IF so, then we know this race occurred and can trust the update. Of course intermediate writes might still have occurred, so we have to check that our refresh future was sucessfully unregistered, which would have been discarded in had any other write occurred.
Hi @ben-manes. Thanks very much for the detailed description of the cause of this issue. I will try to revisit this issue when/if your latest change is applied. As I understand, due to the nature of CompletableFuture there is currently no way around this. I understand that the problem is two folded:
I which to have a cache with similar behavior to refreshAfterWrite, but I want to be able to manually discard/refresh the old value from the cache and be guaranteed that next time the key is requested, a new value is computed. Is there an alternative to my approach? |
Yes, fundamentally we cannot work around this. I might be able to make it handle some edge cases better, but there will always be races that violate the initial assumption. You can guarantee by invalidating or explicitly replacing the value. The refresh tries to optimistically load it while serving the latest copy, and then try to swap it if no changes occurred that would invalidate data consistency. This is good for hiding latency instead of an explicit removal and load (e.g. expiration), but it may have to discard the optimistic load. As reads are lock-free, you could use |
…714) [no ci] Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache uses a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead we can detect this case by seeing if the timestamp delta exceeds our maximum value (120yrs). IF so, then we know this race occurred and can trust the update. Of course intermediate writes might still have occurred, so we have to check that our refresh future was sucessfully unregistered, which would have been discarded in had any other write occurred.
My idea to detect and handle one of the race conditions is not strong enough to fix this case. Other races still exist and when I do get your test to pass, it is touchy enough to not be relied upon (e.g. must be in a debugger to ensure certain event ordering). Since the optimization only helps in a rare case and not fully, I don't think it is worth adding if it risks that I accidentally violate linearizability. The idea was to detect if the original read was the infinite timestamp and that the current is the same future. If the latest timestamp was the same then no writes or if greater than the user allowed maximum then the metadata write back then it is okay, else an explicit ABA-style of write would be discarded. I am not positive that this would pass linearizability as I think an ABA could still happen in the delta case. That can't be detected because we discard the refresh future on a write, so a conditional removal for the future during the reload's callback would always fail as the metadata write discarded it. Anyway, I won't be making any improvements here. You have to be explicit by removing or replacing the entry, which will ensure the new value is visible. This optimistic reload can't offer that strictness as it is best-effort so it has to discard in those racy cases. |
Alright @ben-manes, thanks a lot for your effort. At least I got a better understanding of the issue. I will go with your other suggestions instead 👍 |
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Linearization means that the refresh should be dropped if another write for that entry occurs, as we may populate the cache with stale data. The cache used a write timestamp to help detect this. However, when this could be too aggressive when a refresh runs concurrently with the async load completing. An inflight load is given an infinite timestamp so that it never expires. A whenComplete callback then updates the entry with its policy metadata, such as its expiration time. That changes the write timestamp and if a refresh runs concurrently with the callback then the refresh may see the infinite timestamp, the replace updates it, and the refresh drops the entry as a conflict. Instead, if the refresh is successfully unregistered within the entry's compute to swap the value, then it can be assumed to be valid. Any other write will discard the refresh under the entry's compute, so we retain linearizability. This requires that the load callback to update the metadata does not discard the refresh, since that is not needed for a psuedo write. The write timestamp trick for ABA detection is a layover from the previous implementation (2.x) which did not promise linearization. Note that a refresh is optimistic and races with other writes, so it may be discarded. As it relies on callbacks to write back into the cache, one cannot expect the entry to be updated after the reload completes (e.g. see #714).
Thanks for the prompt to revisit this. I was able to improve the implementation to handle conflicts better and simplify away from the timestamp check. That doesn’t change our discussion, but did help make things better. 😁 |
Hi.
We are experiencing some issues when trying to manually refresh a cache with refreshAfterWrite.
I have created a minimal project to demonstrate the problem:
Output
By calling refresh on the cache and waiting for the future to be complete, a new value has been returned. However the subsequently .get call (both secondGet and thirdGet) to the cache still returns the old value. I would expect that the cache would provide the new value.
It seems like the issue only occur when the computations in the buildAsync callback are slow (hence the Thread.sleep).
I'm using caffeine version 3.1.0.
Is this is a bug or is it something I have misunderstood?
The text was updated successfully, but these errors were encountered: