Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable markSupported in SlicedInputStream #112563

Merged
merged 10 commits into from
Sep 11, 2024

Conversation

kingherc
Copy link
Contributor

@kingherc kingherc commented Sep 5, 2024

This means openSlice must be able to be called for a previous slice even if the previous slice has been closed.

We disable markSupported in older implementation to keep the same behavior for them.

Relates ES-9248

This means openSlice must be able to be called for a previous slice
even if the previous slice has been closed. This is possible for
current implementations as a slice corresponds to a blob from the
object store, which can be re-opened.

Relates ES-9248
@kingherc kingherc added >non-issue :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Sep 5, 2024
@kingherc kingherc self-assigned this Sep 5, 2024
@kingherc kingherc added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Sep 5, 2024
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Sep 5, 2024
@kingherc kingherc marked this pull request as ready for review September 5, 2024 19:12
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@kingherc kingherc requested review from fcofdez and ywangd September 5, 2024 19:12
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides a few minor comments below, I also wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.

// We ignore readLimit since openSlice() can re-open previous InputStreams, and we can skip as many bytes as we'd like.
// According to JDK documentation, marking a closed InputStream should have no effect.
if (closed == false) {
if (initialized && nextSlice > 0) { // nextSlice > 0 guards the case it is initialized and numSlices is 0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How it possible for initialized = true and nextSlice = 0 unless there are concurrent invocation on nextStream and mark? But this class seems to assume single threaded usage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe my comment misled you to think of concurrency.

There's indeed a single-threaded edge case where it can happen: if you create a SlicedInputStream with 0 slices (effectively an empty InputStream), and try to read, then the SlicedInputStream is initialized and nextSlice = 0. That's why I had added the nextSlice>0 guard here.

However, I reworked the code to make it better: I added an explicit numSlices > 0 guard to both mark/reset to make them ineffective. And added a test testMarkResetZeroSlices that covers this edge case.

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ywangd for the review! I handled your feedback, so feel free to review again.

Besides a few minor comments below, I also wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.

Several reasons: (a) I believe the mark/reset functionality may be useful in the main codebase, (b) the mark/reset code is more readable and easier to do in the same class rather than in a seperate sub-class, (c) some functions are final and did not want to tamper with them (I only removed final from the close function because we need to override it in stateless).

However, I agree the tighter scope would be safer, so I made the current implementations to override markSupported() to false, and it is true only for the stateless implementation.

// We ignore readLimit since openSlice() can re-open previous InputStreams, and we can skip as many bytes as we'd like.
// According to JDK documentation, marking a closed InputStream should have no effect.
if (closed == false) {
if (initialized && nextSlice > 0) { // nextSlice > 0 guards the case it is initialized and numSlices is 0.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, maybe my comment misled you to think of concurrency.

There's indeed a single-threaded edge case where it can happen: if you create a SlicedInputStream with 0 slices (effectively an empty InputStream), and try to read, then the SlicedInputStream is initialized and nextSlice = 0. That's why I had added the nextSlice>0 guard here.

However, I reworked the code to make it better: I added an explicit numSlices > 0 guard to both mark/reset to make them ineffective. And added a test testMarkResetZeroSlices that covers this edge case.

@kingherc kingherc requested a review from ywangd September 6, 2024 09:42
@kingherc kingherc force-pushed the non-issue/adapt-sliced-input-stream branch from b966546 to 093a18d Compare September 6, 2024 10:08
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

public void reset() throws IOException {
if (markSupported()) {
// JDK documentation does not clarify if a closed InputStream (that has been previously marked) can be reset. We assume the same
// behavior specified for mark(), i.e., that reset on a closed InputStream has no effect.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to fail on a closed stream here. Even assert false too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it throw if closed. Did not add an assertion, so I can test the behavior in the unit test. Feel free to tell me if the assertion is important to add.

Comment on lines +56 to +59
@Override
public boolean markSupported() {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this is important? Can we remove it? Similar for the next one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a question whether we'd like current implementations to have markSupported as true? I originally enabled them in the PR because I believe they are resumable as they read blobs. But @ywangd mentioned:

I wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.

In light of this, I disabled markSupported in current implementations. They will behave as they were (without resumability). But feel free to tell me if I should go ahead and enable markSupported in all current implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to add a separate subclass for SlicedInputStream and implement mark/reset support in the subclass so that existing usages of SlicedInputStream keep unchanged. But I guess it might be challenging to implement the new features in a separate class? The current version works as well. I slightly prefer to have a follow-up PR just to enable the mark/reset support for existing usages since this PR is to support the stateless side changes. Separating them seems to make that concept clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it'd be challenging to have them separate, plus I believe it's useful to have the capability in the core SlicedInputStream since all implementations seem to support mark/reset (even though we do not enable them just to maintain their old behavior just to be safe).

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I handled it. Added an isClosed() function as well which is needed in the stateless PR. I am in the habit of resolving conversations I think I handled, but feel free to re-open them.

public void reset() throws IOException {
if (markSupported()) {
// JDK documentation does not clarify if a closed InputStream (that has been previously marked) can be reset. We assume the same
// behavior specified for mark(), i.e., that reset on a closed InputStream has no effect.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it throw if closed. Did not add an assertion, so I can test the behavior in the unit test. Feel free to tell me if the assertion is important to add.

Comment on lines +56 to +59
@Override
public boolean markSupported() {
return false;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a question whether we'd like current implementations to have markSupported as true? I originally enabled them in the PR because I believe they are resumable as they read blobs. But @ywangd mentioned:

I wonder whether you have considered adding the changes as a subclass in the stateless codebase? Not that I find any change here is unsafe, but a tighter scope generally feels safer to me.

In light of this, I disabled markSupported in current implementations. They will behave as they were (without resumability). But feel free to tell me if I should go ahead and enable markSupported in all current implementations.

@kingherc kingherc removed the serverless-linked Added by automation, don't add manually label Sep 6, 2024
Copy link
Member

@ywangd ywangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor Author

@kingherc kingherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews! Will merge this PR as soon as it passes the CI.

Comment on lines +56 to +59
@Override
public boolean markSupported() {
return false;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it'd be challenging to have them separate, plus I believe it's useful to have the capability in the core SlicedInputStream since all implementations seem to support mark/reset (even though we do not enable them just to maintain their old behavior just to be safe).

@kingherc kingherc merged commit be90011 into elastic:main Sep 11, 2024
15 checks passed
@kingherc kingherc deleted the non-issue/adapt-sliced-input-stream branch September 11, 2024 09:31
davidkyle pushed a commit that referenced this pull request Sep 12, 2024
This means openSlice must be able to be called for a previous slice
even if the previous slice has been closed.

We disable the support for current implementations to keep
their old behavior just to be sure.

Relates ES-9248
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants