Unexpected expireAfterWrite behavior when computation of entry duration > expireAfterWrite value #191
My initial guess is that it is due to the compute time exceeding the expiration time. Currently the clock is read once per operation to minimize expensive calls to the time source. We could perform a re-read of the current time on the write so that the entry's timestamp does not include the penalty of wait time and computation. For loads that are relatively quick compared to the expiration time it doesn't matter; for ones as tight as yours it does. I am unsure how often that pattern occurs and, if it does, whether it isn't already problematic. It would be nice to understand the use case better.
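To illustrate the behavior being described, here is a minimal, self-contained sketch (not from the issue; the class name, key, and durations are illustrative) in which the load takes longer than the expireAfterWrite window, so a timestamp taken before the load can leave the entry already expired once it is written and each lookup triggers a fresh reload:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class SlowLoadDemo {
  public static void main(String[] args) {
    LoadingCache<String, String> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.SECONDS)    // short expiration window
        .build(key -> {
          Thread.sleep(3_000);                    // computation exceeds the window
          return "value-for-" + key;
        });

    // With the timestamp taken before the load, the entry can already be
    // considered expired by the time the load finishes, so each call reloads.
    for (int i = 0; i < 3; i++) {
      long start = System.nanoTime();
      cache.get("k");
      System.out.printf("get took %d ms%n",
          TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
    }
  }
}
```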
We kind of stumbled on this while trying to debug an issue where our caches seem to get stuck, unable to ever really return any value. We are not entirely sure this is the cause, but it still felt odd; we will do more testing with expireAfterWrite and see if the issue persists. As for the use case: we have some work that is normally pretty fast (<1 second), but during certain peak times it may suddenly take up to 90 seconds to complete due to unavailable database connections, I/O and so on. This leads us to exceed our expireAfterWrite (not by design, and not something we desire), even though 99.999% of the time we are well within it. I am guessing we then get a whole chain of threads wanting to recompute the value, grinding our system to a halt, as no cache value ever really seems to be produced in that case. (We are taking steps to reduce the 90 seconds, as this is also not desired behavior, but the contract of expireAfterWrite still seems broken.) We often support peak loads of >500 pageviews/sec, so having 50 threads hitting the caches at 100 ms intervals is not at all unusual. I am not familiar with the inner workings here, but it seems to me that when one thread is done recomputing, all others waiting for the result should be given the result of that computation, regardless of whether the expiration has been reached, since it was just produced and thus cannot have exceeded expireAfterWrite unless that was effectively 0. Subsequent accesses would then read a correct writeTime if it were set at the end of the computation. Please let me know if you need more details or if something is unclear.
I think that is fair. I don't know if I will get a chance to walk through the code during the week, but I will try to get a release for you over the weekend. If you use an async cache, this doesn't happen: the async cache stores a future that subsequent callers block on, and when it completes the write time is set, so the entry doesn't expire while in-flight like you are seeing now. If we set it the same way in the synchronous case too, then a waiter or a subsequent read will get the full duration as expected. It only impacts narrow cases like yours, but the cost is tiny, so I just need to work out the tests to assert the change.
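As a rough sketch of the async workaround mentioned above (the key, value, and durations are made up for illustration), the future-based cache is built with buildAsync; concurrent callers share the in-flight future and the entry receives its write time when the future completes:

```java
import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

public class AsyncWorkaround {
  public static void main(String[] args) {
    // The write time is recorded when the future completes, so an in-flight
    // load is not penalized by its own duration.
    AsyncLoadingCache<String, String> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.SECONDS)
        .buildAsync(key -> {
          Thread.sleep(3_000);           // slow load, longer than the expiration window
          return "value-for-" + key;
        });

    // Concurrent callers share the same future; once it completes, the entry
    // gets a fresh write time and is served until it genuinely expires.
    String value = cache.get("k").join();
    System.out.println(value);
  }
}
```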
Yeah, I think we can work around it, so no rush on our account. Thank you very much for the swift response, and for a great library!
Sorry that I didn't get a chance to work on this over the weekend. Hopefully I'll be able to catch up during my commute this week. Thanks for being patient.
Really sorry that I'm not on top of this. I usually try to have a one-week turnaround on bugs. It's just been hectic lately and hard to decompress enough to switch contexts over.
I found a little time Sunday night, and I have the fix passing your test. I'll try to work on the unit tests on the train tomorrow. Again, apologies for not being able to work on this more promptly.
I stole some time at work to wrap this up. I'll release this evening if the CI passes.
Previously, the current time was read once at the beginning of the computation. This was used to determine if the entry had expired and, when computed, the associated timestamp. For long computations and short expiration times, this could result in the newly computed entry being expired. A chain of computations would build up, each with timestamps older than the last due to waiting for its predecessor. Now the ticker is read again after the computation completes and the timestamps are set. This means there are two reads on a computation, but only 1 in the happy path of a cache hit. Note that obtaining the time can be an expensive operation, so minimizing this is useful for high loads. Updated dependencies and fixed issues discovered by Spotbugs and ErrorProne.
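A simplified, self-contained sketch of the change described in the release note, assuming nothing about Caffeine's internals (the variable names and durations are illustrative): the entry's effective age is measured from a second clock read taken after the computation, rather than from the read taken before it started:

```java
import java.util.concurrent.TimeUnit;

public class TwoReadSketch {
  public static void main(String[] args) throws Exception {
    long expireAfterWriteNanos = TimeUnit.SECONDS.toNanos(1);

    long startRead = System.nanoTime();   // first read: used for the expiration check
    Thread.sleep(3_000);                  // the "computation", longer than the window
    long endRead = System.nanoTime();     // second read: now used as the entry's write time

    long now = System.nanoTime();
    System.out.println("expired if stamped at start: " + (now - startRead > expireAfterWriteNanos));
    System.out.println("expired if stamped at end:   " + (now - endRead > expireAfterWriteNanos));
  }
}
```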
Released. Thank you for being patient and apologies for being slow to resolve this.
We are experiencing some behavior we cannot really make sense of. When running the following test, we expect:
Instead, we see non-deterministic behavior, with threads waiting anywhere from 3 to 12 seconds with no real pattern. The output from the test shows the timings going off fairly fast.
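The original test is not included in this excerpt; the following is a hedged sketch of that kind of reproduction (the thread count, key, and durations are illustrative), with several threads repeatedly reading one key whose load takes longer than the expireAfterWrite window:

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import java.util.concurrent.TimeUnit;

public class ExpireAfterWriteRepro {
  public static void main(String[] args) {
    LoadingCache<String, String> cache = Caffeine.newBuilder()
        .expireAfterWrite(1, TimeUnit.SECONDS)
        .build(key -> {
          Thread.sleep(3_000);    // the computation takes longer than the window
          return "value";
        });

    // Several readers of the same key; with the pre-fix behavior their wait
    // times drift upward as recomputations chain behind one another.
    for (int i = 0; i < 10; i++) {
      new Thread(() -> {
        for (int n = 0; n < 20; n++) {
          long start = System.nanoTime();
          cache.get("key");
          System.out.printf("%s waited %d ms%n", Thread.currentThread().getName(),
              TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
      }).start();
    }
  }
}
```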