
S3 Incomplete Read Warning Despite Aborting #1657

Closed
atorstling opened this issue Jun 28, 2018 · 5 comments
Labels
investigating This issue is being investigated and/or work is in progress to resolve the issue.

Comments

@atorstling

atorstling commented Jun 28, 2018

Hello!

I'm getting the following warning when reading an S3 getObject stream incompletely and then aborting:

try (S3Object object = s3.getObject(bucketName, objectKey)) {
    S3ObjectInputStream is = object.getObjectContent();
    try {
        readSomeFrom(is);
    } finally {
        is.abort();
    }
}

The log entry is:

WARN com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
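The warning's own advice ("drain the input stream after use") can be followed with a small helper that reads off any remaining bytes before close. A minimal sketch in plain Java; the helper name `drain` and the `ByteArrayInputStream` stand-in for the S3 stream are my own, not from the SDK:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Drain {
    // Read and discard any remaining bytes so close() sees a fully-consumed stream.
    static long drain(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an S3 object stream: pretend we only read the first 4 bytes.
        InputStream in = new ByteArrayInputStream(new byte[100]);
        in.read(new byte[4]);
        long drained = drain(in);    // consume the remaining 96 bytes
        System.out.println(drained); // prints 96
        in.close();                  // no incomplete-read condition remains
    }
}
```

Note that draining only makes sense for small remainders; for a large object, a ranged GET (or abort, as attempted above) avoids pulling the rest of the body over the wire.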

I've debugged it and dug up the following stack:

close:178, S3AbortableInputStream (com.amazonaws.services.s3.internal)
close:99, SdkFilterInputStream (com.amazonaws.internal)
close:136, S3ObjectInputStream (com.amazonaws.services.s3.model)
close:99, SdkFilterInputStream (com.amazonaws.internal)
close:99, SdkFilterInputStream (com.amazonaws.internal)
close:211, ProgressInputStream (com.amazonaws.event)
close:181, FilterInputStream (java.io)
closeQuietly:70, IOUtils (com.amazonaws.util)
abort:98, S3ObjectInputStream (com.amazonaws.services.s3.model)

The lowest frame:

@Override
public void abort() {
    super.abort();

    if (httpRequest != null) {
        httpRequest.abort();
    }

    // The default abort() implementation calls abort on the wrapped stream
    // if it's an SdkFilterInputStream; otherwise we'll need to close the
    // stream.
    if (!(in instanceof SdkFilterInputStream)) {
-->     IOUtils.closeQuietly(in, null);
    }
}

The highest frame:

@Override
public void close() throws IOException {
    if (readAllBytes() || isAborted()) {
        super.close();
    } else {
-->     LOG.warn(
                "Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and " +
                "may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input " +
                "stream after use.");
        if (httpRequest != null) {
            httpRequest.abort();
        }
        IOUtils.closeQuietly(in, null);
    }
}

I think the problem is an incomplete abort: the delegate stream of the S3ObjectInputStream is a DigestValidationInputStream, not an SdkFilterInputStream, so the instanceof check above falls through to closeQuietly(). That seems to happen due to ETag validation. I'm on 1.11.355.
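The mechanism described above can be reproduced in miniature without the SDK. In the sketch below, `FilterStreamBase` stands in for SdkFilterInputStream and a plain FilterInputStream stands in for DigestValidationInputStream (both names and the toy classes are my own); it shows how abort() propagation stops as soon as a non-matching delegate sits in the chain, leaving the inner layers to see only close():

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stand-in for SdkFilterInputStream: abort() propagates to a wrapped stream
// of the same kind, otherwise it falls back to closing the delegate.
class FilterStreamBase extends FilterInputStream {
    boolean aborted = false;

    FilterStreamBase(InputStream in) { super(in); }

    void abort() {
        aborted = true;
        // Mirrors the SDK logic quoted above.
        if (in instanceof FilterStreamBase) {
            ((FilterStreamBase) in).abort();
        } else {
            try { in.close(); } catch (IOException ignored) { }
        }
    }
}

public class AbortSketch {
    public static void main(String[] args) {
        // Case 1: delegate IS a FilterStreamBase -> abort propagates all the way down.
        FilterStreamBase inner = new FilterStreamBase(new ByteArrayInputStream(new byte[10]));
        FilterStreamBase outer = new FilterStreamBase(inner);
        outer.abort();
        System.out.println(inner.aborted); // prints true

        // Case 2: a plain FilterInputStream sits in between -> propagation stops,
        // and the inner layer only ever sees close(), analogous to the issue.
        FilterStreamBase inner2 = new FilterStreamBase(new ByteArrayInputStream(new byte[10]));
        FilterInputStream middle = new FilterInputStream(inner2) { };
        FilterStreamBase outer2 = new FilterStreamBase(middle);
        outer2.abort();
        System.out.println(inner2.aborted); // prints false
    }
}
```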

@steveloughran

Looks like a regression; Hadoop is still only on 1.11.271 and isn't seeing this on its codepaths. If it has come back: regression.

Or: it was always there, but your choice of input validation has changed the input stream chain.

When it is fixed, can the fix include a test that calls abort() and verifies the message didn't get logged? This is clearly a fairly brittle fix, and if it regresses once it's going to keep coming back. I'm already thinking of making one step of the Hadoop AWS SDK update process "play with it on the CLI to see what new error messages appear", but at least this one could be automated away. Thanks.

@spfink spfink added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Jun 28, 2018
@shorea
Contributor

shorea commented Jul 18, 2018

Was able to reproduce this, have a fix in mind.

@shorea
Contributor

shorea commented Jul 19, 2018

I've pushed a fix for this, it will be available in the next release.

@shorea shorea closed this as completed Jul 19, 2018
@steveloughran

thanks.

do you have a regression test to verify that this message won't get logged in future?

this is one of those behaviours where a test run that doesn't look at the logs will report success in terms of semantics, but the logs will be full of noise. As this issue now seems to recur, I'd be happy if the patch included not just a fix but a check of the output logs. I know it's fiddly, but log4j output is there to be captured if you try hard
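The log-capture idea above can be sketched with a custom handler that records messages for a test to assert on. This sketch uses java.util.logging as a stand-in; the SDK actually logs via commons-logging (typically backed by log4j), so a real regression test would attach an appender there instead. The class names and handler are my own, not from any test framework:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogCaptureSketch {
    // A handler that records every message so a test can assert on them.
    static class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();
        @Override public void publish(LogRecord record) { messages.add(record.getMessage()); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger(
                "com.amazonaws.services.s3.internal.S3AbortableInputStream");
        CapturingHandler capture = new CapturingHandler();
        logger.addHandler(capture);
        logger.setUseParentHandlers(false);

        // ... exercise the code path that calls abort() here ...

        // The regression assertion: no "Not all bytes were read" warning was emitted.
        boolean warned = capture.messages.stream()
                .anyMatch(m -> m.contains("Not all bytes were read"));
        System.out.println(warned); // prints false
    }
}
```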

@steveloughran

FWIW, I'm doing an upgrade of Hadoop trunk from v1.11.271 of the SDK to 1.11.374 and I'm not seeing this error message in our logs. Either it's been fixed, or our uses of abort() never managed to trigger it.
