Added a soft reference based shared cache for S3 reads #5357
It's hard for me to evaluate this in a reasonable amount of time. I worry we're adding complexity and it's not obvious to me that things are correct. I'm posting this now, but I estimate I would need at least another half a day to go through it all.
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3ChannelContext.java
 * {@link #cleanup() cleanup} once the reference count reaches zero.
 */
// TODO Move the release method lower in file after the fill method. Kept it here for ease of review.
void release() {
I'm trying to figure out what the advantage of acquire/release is compared to relying on Java reachability of the Request. Should we instead have a SoftReference<Request> structure somewhere?
The main challenge was controlling when to cancel the request. This uses CleanupReference for that, and the request holds a soft reference to the corresponding ByteBuffer. Once the ByteBuffer becomes unreachable, the request can be cleared. The acquire/release approach let each context control the lifecycle of the buffer, and thus of the corresponding request.
We have since changed the design slightly so that each context holds {Request, ownershipToken}, where internally the ownershipToken is simply the buffer.
The reachability of the buffer now directly controls the lifecycle of the Request. The new design is hard to explain in words, but the code is much cleaner and should be easier to follow.
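To make the {Request, ownershipToken} idea concrete, here is a minimal sketch under stated assumptions. The class and method names (SketchRequest, Acquired, simulateCollection) are hypothetical and are not the PR's actual classes; cancellation via CleanupReference is left out. It only illustrates how a request that softly references its buffer can hand out a strong reference as the ownership token, and how acquisition fails once the buffer is gone.

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

// Hypothetical sketch; names do not come from the PR.
final class SketchRequest {
    private final long fragmentIndex;
    // The request only softly references its buffer, so the request alone never keeps the data alive.
    private final SoftReference<ByteBuffer> bufferRef;

    private SketchRequest(long fragmentIndex, ByteBuffer buffer) {
        this.fragmentIndex = fragmentIndex;
        this.bufferRef = new SoftReference<>(buffer);
    }

    /** A request paired with the strong reference (the "ownership token") that keeps its buffer reachable. */
    static final class Acquired {
        final SketchRequest request;
        final ByteBuffer ownershipToken;

        Acquired(SketchRequest request, ByteBuffer ownershipToken) {
            this.request = request;
            this.ownershipToken = ownershipToken;
        }
    }

    /** Creates a request and immediately hands the caller a strong reference to its buffer. */
    static Acquired createAndAcquire(long fragmentIndex, int fragmentSize) {
        final ByteBuffer buffer = ByteBuffer.allocate(fragmentSize);
        return new Acquired(new SketchRequest(fragmentIndex, buffer), buffer);
    }

    /** Returns null once the buffer has been collected or cleared; the caller must then create a new request. */
    Acquired tryAcquire() {
        final ByteBuffer buffer = bufferRef.get();
        return buffer == null ? null : new Acquired(this, buffer);
    }

    long fragmentIndex() {
        return fragmentIndex;
    }

    // Test hook only: simulates the garbage collector clearing the soft reference.
    void simulateCollection() {
        bufferRef.clear();
    }
}
```

In this sketch a context would simply hold on to the Acquired pair for as long as it needs the data; dropping it makes the buffer eligible for collection under memory pressure, with no explicit release call.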
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3RequestCache.java
if (existingRequest == null) {
    // Ideally, we could have used ".replace" in this case as well, but KeyedObjectHashMap.replace currently
    // has a bug when the key is not present in the map.
    added = requests.putIfAbsent(key, newAcquiredRequest.request) == null;
When this fails, we can avoid the get.
You found this more complicated. I still prefer to avoid an extra lookup. Maybe moot anyway, since we can't seem to trust replace right now.
Request existingRequest = requests.get(key);
while (true) {
    if (existingRequest != null) {
        final Request.AcquiredRequest acquired = existingRequest.tryAcquire();
        if (acquired != null) {
            return acquired;
        }
    }
    if (newAcquiredRequest == null) {
        newAcquiredRequest = Request.createAndAcquire(fragmentIndex, context);
    }
    final boolean added;
    if (existingRequest == null) {
        added = (existingRequest = requests.putIfAbsent(key, newAcquiredRequest.request)) == null;
    } else {
        if (!(added = requests.replace(key, existingRequest, newAcquiredRequest.request))) {
            existingRequest = requests.get(key);
        }
    }
Reverted to the remove + putIfAbsent pattern and added a TODO for this, tracked in issue #5486.
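For readers following the thread, a remove + putIfAbsent loop can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: it uses java.util.concurrent.ConcurrentHashMap rather than KeyedObjectHashMap, and the Entry class with its live flag is a hypothetical stand-in for a Request whose buffer may have been collected.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the remove + putIfAbsent pattern; names do not come from the PR.
final class AcquireLoopSketch {
    /** Stand-in for a cached request; "live" models whether its buffer is still available. */
    static final class Entry {
        final String payload;
        volatile boolean live = true;

        Entry(String payload) {
            this.payload = payload;
        }

        /** Returns this entry if it can still be used, or null if it is stale. */
        Entry tryAcquire() {
            return live ? this : null;
        }
    }

    static Entry getOrCreate(ConcurrentMap<String, Entry> requests, String key) {
        Entry fresh = null;
        while (true) {
            final Entry existing = requests.get(key);
            if (existing != null) {
                final Entry acquired = existing.tryAcquire();
                if (acquired != null) {
                    return acquired;
                }
                // Stale entry: remove it only if the map still maps the key to this exact entry.
                requests.remove(key, existing);
            }
            if (fresh == null) {
                // Create the replacement at most once, even if the loop retries.
                fresh = new Entry("fetched:" + key);
            }
            if (requests.putIfAbsent(key, fresh) == null) {
                return fresh;
            }
            // Another thread installed an entry first; loop and try to acquire theirs.
        }
    }
}
```

Compared with the replace-based variant discussed above, this costs an extra lookup on the retry path, but it relies only on the conditional remove and putIfAbsent operations.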
Python LGTM
I can approve like this, but I had one suggestion/question.
Labels indicate documentation is required. Issues for documentation have been opened: Community: deephaven/deephaven-docs-community#211
This PR adds a soft-reference based cache for recently fetched fragments from S3 for faster lookup.
This cache would be especially useful for smaller files which have fewer fragments and can fit in the cache.
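As a rough illustration of why smaller files benefit most, the arithmetic can be sketched as follows. Everything here is an assumption made for the example: the helper names are hypothetical, and the sketch assumes fixed-size fragments and a cache capacity counted in fragments, which the description above does not spell out.

```java
// Hypothetical helpers; the fragment layout and cache-capacity unit are assumptions.
final class FragmentMath {
    /** Index of the fragment containing the given byte position, assuming fixed-size fragments. */
    static long fragmentIndex(long position, int fragmentSize) {
        return position / fragmentSize;
    }

    /** Number of fragments needed to cover a file of the given size (ceiling division). */
    static long numFragments(long fileSize, int fragmentSize) {
        return (fileSize + fragmentSize - 1) / fragmentSize;
    }

    /** True if every fragment of the file could be cached at once, assuming capacity is counted in fragments. */
    static boolean fitsInCache(long fileSize, int fragmentSize, int maxCacheSize) {
        return numFragments(fileSize, fragmentSize) <= maxCacheSize;
    }
}
```

Under these assumptions, a file whose fragment count is at most the cache capacity can be served entirely from cache after its first read, while memory pressure does not force the soft references to be cleared.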
Documentation:
S3.maxFragmentSize, since we won't use a buffer pool internally anymore.
maxConcurrentRequests, readAheadCount, fragmentSize, maxCacheSize.