-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable markSupported in SlicedInputStream #112563
Enable markSupported in SlicedInputStream #112563
Conversation
This means openSlice must be able to be called for a previous slice even if the previous slice has been closed. This is possible for current implementations as a slice corresponds to a blob from the object store, which can be re-opened. Relates ES-9248
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides a few minor comments below, I also wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.
// We ignore readLimit since openSlice() can re-open previous InputStreams, and we can skip as many bytes as we'd like. | ||
// According to JDK documentation, marking a closed InputStream should have no effect. | ||
if (closed == false) { | ||
if (initialized && nextSlice > 0) { // nextSlice > 0 guards the case it is initialized and numSlices is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How it possible for initialized = true
and nextSlice = 0
unless there are concurrent invocation on nextStream and mark? But this class seems to assume single threaded usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe my comment misled you to think of concurrency.
There's indeed a single-threaded edge case where it can happen: if you create a SlicedInputStream with 0 slices (effectively an empty InputStream), and try to read, then the SlicedInputStream is initialized and nextSlice = 0. That's why I had added the nextSlice>0 guard here.
However, I reworked the code to make it better: I added an explicit numSlices > 0 guard to both mark/reset to make them ineffective. And added a test testMarkResetZeroSlices
that covers this edge case.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ywangd for the review! I handled your feedback, so feel free to review again.
Besides a few minor comments below, I also wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.
Several reasons: (a) I believe the mark/reset functionality may be useful in the main codebase, (b) the mark/reset code is more readable and easier to do in the same class rather than in a seperate sub-class, (c) some functions are final and did not want to tamper with them (I only removed final from the close function because we need to override it in stateless).
However, I agree the tighter scope would be safer, so I made the current implementations to override markSupported() to false, and it is true only for the stateless implementation.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
// We ignore readLimit since openSlice() can re-open previous InputStreams, and we can skip as many bytes as we'd like. | ||
// According to JDK documentation, marking a closed InputStream should have no effect. | ||
if (closed == false) { | ||
if (initialized && nextSlice > 0) { // nextSlice > 0 guards the case it is initialized and numSlices is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, maybe my comment misled you to think of concurrency.
There's indeed a single-threaded edge case where it can happen: if you create a SlicedInputStream with 0 slices (effectively an empty InputStream), and try to read, then the SlicedInputStream is initialized and nextSlice = 0. That's why I had added the nextSlice>0 guard here.
However, I reworked the code to make it better: I added an explicit numSlices > 0 guard to both mark/reset to make them ineffective. And added a test testMarkResetZeroSlices
that covers this edge case.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Outdated
Show resolved
Hide resolved
b966546
to
093a18d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
public void reset() throws IOException { | ||
if (markSupported()) { | ||
// JDK documentation does not clarify if a closed InputStream (that has been previously marked) can be reset. We assume the same | ||
// behavior specified for mark(), i.e., that reset on a closed InputStream has no effect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to fail on a closed stream here. Even assert false too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it throw if closed. Did not add an assertion, so I can test the behavior in the unit test. Feel free to tell me if the assertion is important to add.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
@Override | ||
public boolean markSupported() { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this is important? Can we remove it? Similar for the next one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a question whether we'd like current implementations to have markSupported as true? I originally enabled them in the PR because I believe they are resumable as they read blobs. But @ywangd mentioned:
I wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.
In light of this, I disabled markSupported in current implementations. They will behave as they were (without resumability). But feel free to tell me if I should go ahead and enable markSupported in all current implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking to add a separate subclass for SlicedInputStream
and implement mark/reset support in the subclass so that existing usages of SlicedInputStream
keep unchanged. But I guess it might be challenging to implement the new features in a separate class? The current version works as well. I slightly prefer to have a follow-up PR just to enable the mark/reset support for existing usages since this PR is to support the stateless side changes. Separating them seems to make that concept clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it'd be challenging to have them separate, plus I believe it's useful to have the capability in the core SlicedInputStream since all implementations seem to support mark/reset (even though we do not enable them just to maintain their old behavior just to be safe).
server/src/test/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStreamTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStreamTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I handled it. Added an isClosed() function as well which is needed in the stateless PR. I am in the habit of resolving conversations I think I handled, but feel free to re-open them.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
public void reset() throws IOException { | ||
if (markSupported()) { | ||
// JDK documentation does not clarify if a closed InputStream (that has been previously marked) can be reset. We assume the same | ||
// behavior specified for mark(), i.e., that reset on a closed InputStream has no effect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it throw if closed. Did not add an assertion, so I can test the behavior in the unit test. Feel free to tell me if the assertion is important to add.
@Override | ||
public boolean markSupported() { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a question whether we'd like current implementations to have markSupported as true? I originally enabled them in the PR because I believe they are resumable as they read blobs. But @ywangd mentioned:
I wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.
In light of this, I disabled markSupported in current implementations. They will behave as they were (without resumability). But feel free to tell me if I should go ahead and enable markSupported in all current implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews! Will merge this PR as soon as it passes the CI.
server/src/main/java/org/elasticsearch/index/snapshots/blobstore/SlicedInputStream.java
Show resolved
Hide resolved
@Override | ||
public boolean markSupported() { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it'd be challenging to have them separate, plus I believe it's useful to have the capability in the core SlicedInputStream since all implementations seem to support mark/reset (even though we do not enable them just to maintain their old behavior just to be safe).
This means openSlice must be able to be called for a previous slice even if the previous slice has been closed. We disable the support for current implementations to keep their old behavior just to be sure. Relates ES-9248
This means openSlice must be able to be called for a previous slice even if the previous slice has been closed.
We disable markSupported in older implementation to keep the same behavior for them.
Relates ES-9248