-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions #6425
HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions #6425
Conversation
🎊 +1 overall
This message was automatically generated. |
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ARetryPolicy.java
Outdated
Show resolved
Hide resolved
Re-ran the whole test suite as well. Don't see any new failures. |
🎊 +1 overall
This message was automatically generated. |
a22fd68
to
4537fa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. Good tests.
...hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
note aws sdk has done something here with Netty |
Testing
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits; will fix
if (values.length == 2) { | ||
try { | ||
long start = Long.parseUnsignedLong(values[0]); | ||
long end = Long.parseUnsignedLong(values[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, got this wrong. as it is in the production code...will fix.
.cause(cause) | ||
.build(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cut this
} | ||
|
||
}, | ||
with(Statistic.ACTION_HTTP_GET_REQUEST, 0)); // vector stats don't add this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs to await result and then the count will be incremented
as it stands, this PR will
not done: any attempt for readFully() to recover from that -1 response. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
92c9005
to
a46b4ee
Compare
💔 -1 overall
This message was automatically generated. |
…ions Differentiate from "EOF out of range" from "channel problems" through two different subclasses of EOFException and input streams to always retry on http channel errors; out of range GET requests are not retried. Currently an EOFException is always treated as a fail-fast call in read() This allows for all existing external code catching EOFException to handle both, but S3AInputStream to cleanly differentiate range errors (map to -1) from channel errors (retry) - HttpChannelEOFException is subclass of EOFException, so all code which catches EOFException is still happy. retry policy: connectivityFailure - RangeNotSatisfiableEOFException is the subclass of EOFException raised on 416 GET range errors. retry policy: fail - Method ErrorTranslation.maybeExtractChannelException() to create this from shaded/unshaded NoHttpResponseException, using string match to avoid classpath problems. - And do this for SdkClientExceptions with OpenSSL error code WFOPENSSL0035. This isn't strictly the right place for this as its not an IOE we are remapping... - ErrorTranslation.maybeExtractIOException() to perform this translation as appropriate. S3AInputStream.reopen() code retries on EOF, except on RangeNotSatisfiableEOFException, which is converted to a -1 response to the caller as is done historically. HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions S3AInputStream knows to handle these with read(): HttpChannelEOFException: stream aborting close then retry lazySeek(): Map RangeNotSatisfiableEOFException to -1, but do not map any other EOFException class raised. This means that * out of range reads map to -1 * channel problems in reopen are retried * channel problems in read() abort the failed http connection so it isn't recycled Tests for this using/abusing mocking. Change-Id: I0a31c1ae291ea2b38b0294a50dca5e9d0d4d1fdf
…xceptions Testing through actually raising 416 exceptions and verifying that readFully(), char read() and vector reads are all good. Not yet validated, PositionedReadable.read() where we need to * return -1 past EOF * retry on the http channel issues Change-Id: Ia37f348ccc7d730ae439422e99dfc828612d0bfd
…xceptions If a read() on the inner wrapped stream returned -1 then it is closed. There is no attempt to actually recover within a readFully(); there's a switch to turn this on, but if anyone does it a test will spin forever as the inner PositionedReadable.read(position, buffer, len) downgrades all EOF exceptions to -1. A new method would need to be added which controls whether to downgrade/rethrow exceptions, which makes for more complex work. What does that mean? Possibly reduced resilience to non-retried failures on the inner stream, even though more channel exceptions are retried on. +Split out tests of different read methods into their own test cases. Change-Id: I8f7e75ba73a376e7dfd06c5bb9ebc4138bc80394
Change-Id: Ided5c0c3967199b7485641a40b06fba92aeea948
Change-Id: I37f05880cad808db9f53035d4d49af48a333f2c7
Change-Id: I51d860b8ecc774149307ebba2d2b6ec393671fff
Change-Id: I483d0aa29d45956a4cf9b1efd939978b550fd722
54811a8
to
776ac8b
Compare
rebased to force rebuild |
@mukund-thakur @ahmarsuhail can i get a review of this...want to merge and ideally get onto 3.4 branch in case there's another RC of that |
🎊 +1 overall
This message was automatically generated. |
tested: s3 london; 12 threads and -Dscale. No failures other than the two I'm always seeing on trunk right now |
update, one new failure, probably from having 12 threads as it was a timeout in one of the tests
Does not happen on a standalone rerun.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
…xceptions (#6425) Differentiate from "EOF out of range/end of GET" from "EOF channel problems" through two different subclasses of EOFException and input streams to always retry on http channel errors; out of range GET requests are not retried. Currently an EOFException is always treated as a fail-fast call in read() This allows for all existing external code catching EOFException to handle both, but S3AInputStream to cleanly differentiate range errors (map to -1) from channel errors (retry) - HttpChannelEOFException is subclass of EOFException, so all code which catches EOFException is still happy. retry policy: connectivityFailure - RangeNotSatisfiableEOFException is the subclass of EOFException raised on 416 GET range errors. retry policy: fail - Method ErrorTranslation.maybeExtractChannelException() to create this from shaded/unshaded NoHttpResponseException, using string match to avoid classpath problems. - And do this for SdkClientExceptions with OpenSSL error code WFOPENSSL0035. We believe this is the OpenSSL equivalent. - ErrorTranslation.maybeExtractIOException() to perform this translation as appropriate. S3AInputStream.reopen() code retries on EOF, except on RangeNotSatisfiableEOFException, which is converted to a -1 response to the caller as is done historically. S3AInputStream knows to handle these with read(): HttpChannelEOFException: stream aborting close then retry lazySeek(): Map RangeNotSatisfiableEOFException to -1, but do not map any other EOFException class raised. This means that * out of range reads map to -1 * channel problems in reopen are retried * channel problems in read() abort the failed http connection so it isn't recycled Tests for this using/abusing mocking. Testing through actually raising 416 exceptions and verifying that readFully(), char read() and vector reads are all good. There is no attempt to recover within a readFully(); there's a boolean constant switch to turn this on, but if anyone does it a test will spin forever as the inner PositionedReadable.read(position, buffer, len) downgrades all EOF exceptions to -1. A new method would need to be added which controls whether to downgrade/rethrow exceptions. What does that mean? Possibly reduced resilience to non-retried failures on the inner stream, even though more channel exceptions are retried on. Contributed by Steve Loughran
…xceptions (#6425) Differentiate from "EOF out of range/end of GET" from "EOF channel problems" through two different subclasses of EOFException and input streams to always retry on http channel errors; out of range GET requests are not retried. Currently an EOFException is always treated as a fail-fast call in read() This allows for all existing external code catching EOFException to handle both, but S3AInputStream to cleanly differentiate range errors (map to -1) from channel errors (retry) - HttpChannelEOFException is subclass of EOFException, so all code which catches EOFException is still happy. retry policy: connectivityFailure - RangeNotSatisfiableEOFException is the subclass of EOFException raised on 416 GET range errors. retry policy: fail - Method ErrorTranslation.maybeExtractChannelException() to create this from shaded/unshaded NoHttpResponseException, using string match to avoid classpath problems. - And do this for SdkClientExceptions with OpenSSL error code WFOPENSSL0035. We believe this is the OpenSSL equivalent. - ErrorTranslation.maybeExtractIOException() to perform this translation as appropriate. S3AInputStream.reopen() code retries on EOF, except on RangeNotSatisfiableEOFException, which is converted to a -1 response to the caller as is done historically. S3AInputStream knows to handle these with read(): HttpChannelEOFException: stream aborting close then retry lazySeek(): Map RangeNotSatisfiableEOFException to -1, but do not map any other EOFException class raised. This means that * out of range reads map to -1 * channel problems in reopen are retried * channel problems in read() abort the failed http connection so it isn't recycled Tests for this using/abusing mocking. Testing through actually raising 416 exceptions and verifying that readFully(), char read() and vector reads are all good. There is no attempt to recover within a readFully(); there's a boolean constant switch to turn this on, but if anyone does it a test will spin forever as the inner PositionedReadable.read(position, buffer, len) downgrades all EOF exceptions to -1. A new method would need to be added which controls whether to downgrade/rethrow exceptions. What does that mean? Possibly reduced resilience to non-retried failures on the inner stream, even though more channel exceptions are retried on. Contributed by Steve Loughran
…xceptions (apache#6425) Differentiate from "EOF out of range/end of GET" from "EOF channel problems" through two different subclasses of EOFException and input streams to always retry on http channel errors; out of range GET requests are not retried. Currently an EOFException is always treated as a fail-fast call in read() This allows for all existing external code catching EOFException to handle both, but S3AInputStream to cleanly differentiate range errors (map to -1) from channel errors (retry) - HttpChannelEOFException is subclass of EOFException, so all code which catches EOFException is still happy. retry policy: connectivityFailure - RangeNotSatisfiableEOFException is the subclass of EOFException raised on 416 GET range errors. retry policy: fail - Method ErrorTranslation.maybeExtractChannelException() to create this from shaded/unshaded NoHttpResponseException, using string match to avoid classpath problems. - And do this for SdkClientExceptions with OpenSSL error code WFOPENSSL0035. We believe this is the OpenSSL equivalent. - ErrorTranslation.maybeExtractIOException() to perform this translation as appropriate. S3AInputStream.reopen() code retries on EOF, except on RangeNotSatisfiableEOFException, which is converted to a -1 response to the caller as is done historically. S3AInputStream knows to handle these with read(): HttpChannelEOFException: stream aborting close then retry lazySeek(): Map RangeNotSatisfiableEOFException to -1, but do not map any other EOFException class raised. This means that * out of range reads map to -1 * channel problems in reopen are retried * channel problems in read() abort the failed http connection so it isn't recycled Tests for this using/abusing mocking. Testing through actually raising 416 exceptions and verifying that readFully(), char read() and vector reads are all good. There is no attempt to recover within a readFully(); there's a boolean constant switch to turn this on, but if anyone does it a test will spin forever as the inner PositionedReadable.read(position, buffer, len) downgrades all EOF exceptions to -1. A new method would need to be added which controls whether to downgrade/rethrow exceptions. What does that mean? Possibly reduced resilience to non-retried failures on the inner stream, even though more channel exceptions are retried on. Contributed by Steve Loughran
…t runs (#6465) Disables the new tests added in: HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions #6425 The underlying issue here is that the block prefetch code can identify when there's a mismatch between declared and actual length, and doesn't store any of the incomplete buffer. This should be addressed in HADOOP-18184. Contributed by Steve Loughran
…t runs (#6465) Disables the new tests added in: HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions #6425 The underlying issue here is that the block prefetch code can identify when there's a mismatch between declared and actual length, and doesn't store any of the incomplete buffer. This should be addressed in HADOOP-18184. Contributed by Steve Loughran
S3AInputStream.reopen() code now retries, except on EOFException, which is converted to a -1 response to the caller, as is done historically.
New tests for the translation, but not for the input stream logic. The place for that will be TestS3AInputStreamRetry
How was this patch tested?
New unit test cases; so far run the contract tests.
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?