HADOOP-18184. S3A Prefetching unbuffer. #5832
base: trunk
Conversation
based on #4298
Force-pushed 7d6ae13 to ffd3180
💔 -1 overall — automated Yetus test report.
Force-pushed ffd3180 to f24f1fd
HADOOP-18184. S3A prefetch unbuffer
Overall, the prefetch reads of the large files are slow, and it's critical we do better: one of the huge-file tests uses prefetching with a small block size of 1 MB.
Yes, this is a lot more than just unbuffer, but it's the first time I've really had the code in the IDE with me writing tests to use IOStats, context iostats, waiting for tests to finish etc. I have more to do which I will follow up on in different JIRAs. Key: actually support small-block memory caching so you can use the stream without any disk use; needed to switch to this everywhere.
Timeout in LRU tests
The issue here is having all the different bulk reads in the same test case; if it takes too long (> 10 minutes!) the test fails. The solution shouldn't be "add a bigger timeout"; it should be "make these tests faster by working with smaller files and smaller blocks".
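A minimal sketch of that tuning, assuming the real fs.s3a.prefetch.block.size key; the sizes themselves are illustrative, not values from the patch:

```java
import org.apache.hadoop.conf.Configuration;

public final class PrefetchTestTuning {
  // one small block keeps each bulk read cheap
  public static final int SMALL_BLOCK_SIZE = 128 * 1024;           // 128 KiB
  public static final int SMALL_FILE_SIZE = 4 * SMALL_BLOCK_SIZE;  // 4 blocks in total

  /** Shrink the prefetch block size so every bulk read finishes quickly. */
  public static Configuration tuneForFastTests(Configuration conf) {
    conf.setInt("fs.s3a.prefetch.block.size", SMALL_BLOCK_SIZE);
    return conf;
  }
}
```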
Tested: S3 London. All the landsat tests are going to be long-haul for most people; the existing huge-file tests should be extended to do the reading on their own files, which (a) are in the chosen AWS region and (b) let you control the file size.
PR #5851
Force-pushed f24f1fd to 2e6ab81
Have done an initial review of the prod code (not looked at tests). Looks good; have some questions and minor suggestions.
  }
} else {
  // free the buffers
  bufferPool.getAll().forEach(BufferData::setDone);
Done buffers will get released on the next prefetch... but wondering if we can just release here instead.
We had so much grief with the ABFS prefetch premature-release bug that I am scared now.
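For illustration, a self-contained sketch of the deferred-release pattern being defended here: unbuffer() only marks buffers done, and the next prefetch reclaims them, so an in-flight read can never see a released buffer. BufferSlot and BufferPoolSketch are stand-ins for the real BufferData/BufferPool, not the PR's code:

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class BufferSlot {
  volatile ByteBuffer buffer = ByteBuffer.allocate(1 << 20);
  volatile boolean done;

  void setDone() { done = true; }    // safe: a reader may still hold the buffer
  void release() { buffer = null; }  // eager release: unsafe while a read is in flight
}

class BufferPoolSketch {
  private final List<BufferSlot> slots = new CopyOnWriteArrayList<>();

  /** unbuffer(): mark every slot done; nothing is freed yet. */
  void unbuffer() {
    slots.forEach(BufferSlot::setDone);
  }

  /** next prefetch: reclaim only the slots already marked done. */
  void reclaimDoneSlots() {
    slots.removeIf(slot -> {
      if (slot.done) {
        slot.release();
        return true;
      }
      return false;
    });
  }
}
```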
  return false;
}

if (unbuffer) {
Why don't we just do blockData = null, since initializeUnderlyingResources() creates a new BlockData object?
done
also, why just on unbuffer? shouldn't this be cleaned up on close() too?
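One way to answer that question would be a single cleanup helper shared by both paths — a sketch only, under the assumption that close() needs the same release plus its own teardown; every name here except initializeUnderlyingResources() is illustrative:

```java
class PrefetchingStreamSketch {
  private Object blockData;  // stands in for the real BlockData field

  public synchronized void unbuffer() {
    releaseUnderlyingResources();
  }

  public synchronized void close() {
    releaseUnderlyingResources();
    // ... plus close-only teardown: executors, statistics, etc.
  }

  /** Shared by unbuffer() and close(); a later read recreates the state
      via initializeUnderlyingResources(). */
  private void releaseUnderlyingResources() {
    blockData = null;
  }
}
```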
String message = String.format(
    "Caching disabled because of slow operation (%.1f sec)", endOp.duration());
LOG_CACHING_DISABLED.info(message);
prefetchingStatistics.setPrefetchCachingState(false);
Nit: the naming here can get confusing; it's not immediately clear whether prefetchCachingState refers to just caching blocks in memory via prefetching, or caching them to disk. Something like setPrefetchDiskCachingState would make things clearer. If we do rename, also renaming the caching methods would make things clearer.
Done in the method name, but not in the gauge name. What if we ever allow a cache-to-memory option here?
LOG.debug("Block {}: Preparing to cache block", blockNumber);

if (isCachingDisabled()) {
  LOG.debug("Block {}: Preparing caching disabled, not prefetching", blockNumber);
I think this should be "caching disabled, not caching" or something; prefetching may or may not be happening here (it could already be done prefetching by the time it gets here).
final int blockNumber = data.getBlockNumber();
LOG.debug("Block {}: Preparing to cache block", blockNumber);

if (isCachingDisabled()) {
I'm confused about why we're doing this twice, here and on line 577; as far as I can tell, nothing in between these two changes the isCachingDisabled state.
There's a .get() on the future, which blocks until the data is received; the checks on L577 are for whether caching changed during that time. Added some more comments and reviewed/tuned the log messages.
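A stripped-down sketch of that check/block/re-check sequence (the class and flag names are illustrative, not the patch's):

```java
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

class CacheGateSketch {
  private final AtomicBoolean cachingDisabled = new AtomicBoolean(false);

  boolean isCachingDisabled() {
    return cachingDisabled.get();
  }

  void addToCacheAndRelease(Future<Void> blockFuture) throws Exception {
    if (isCachingDisabled()) {
      return;              // first check: skip the wait entirely
    }
    blockFuture.get();     // blocks until the block's data has been fetched
    if (isCachingDisabled()) {
      return;              // second check: caching may have been disabled
    }                      // while we were blocked on the future
    // ... write the block to the cache ...
  }
}
```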
@@ -526,18 +585,21 @@ private void addToCacheAndRelease(BufferData data, Future<Void> blockFuture,
synchronized (data) {
  try {
    if (data.stateEqualsOneOf(BufferData.State.DONE)) {
      LOG.debug("Block {}: Block already in cache; not adding", blockNumber);
this should be something like "block no longer in use, not adding"
int fileSize = (int) s3Attributes.getLen();
LOG.debug("Created caching input stream for {} (size = {})", this.getName(),
    fileSize);
streamStatistics.setPrefetchState(numBlocksToPrefetch > 0,
we could also update this statistic in S3AInMemoryInputStream?
what do you think we should publish? that we are prefetching the entire file?
nevermind...since S3AInMemoryInputStream doesn't actually do any prefetching
Why the numBlocksToPrefetch > 0 though? numBlocksToPrefetch will always be > 0, because in S3AFS we do prefetchBlockCount = intOption(conf, PREFETCH_BLOCK_COUNT_KEY, PREFETCH_BLOCK_DEFAULT_COUNT, 1);
Set to true. I am thinking about a "no prefetch" option where it's only on demand/vector IO, but can tune that if/when implemented.
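For context, a sketch of the option handling being referenced: intOption() rejects values below the minimum, so with a minimum of 1 the configured block count can never be zero today, and a "no prefetch" mode would need that constraint relaxed. The default shown as 8 is an assumption about PREFETCH_BLOCK_DEFAULT_COUNT, and the class is illustrative:

```java
import org.apache.hadoop.conf.Configuration;

final class PrefetchOptionsSketch {
  static final String PREFETCH_BLOCK_COUNT_KEY = "fs.s3a.prefetch.block.count";
  static final int PREFETCH_BLOCK_DEFAULT_COUNT = 8;  // assumed default

  /** Read an int option, failing if it is below the permitted minimum. */
  static int intOption(Configuration conf, String key, int defVal, int min) {
    int value = conf.getInt(key, defVal);
    if (value < min) {
      throw new IllegalArgumentException(
          "Value of " + key + ": " + value + " is below the minimum " + min);
    }
    return value;
  }

  static int prefetchBlockCount(Configuration conf) {
    // minimum of 1: this is why numBlocksToPrefetch is always > 0
    return intOption(conf, PREFETCH_BLOCK_COUNT_KEY, PREFETCH_BLOCK_DEFAULT_COUNT, 1);
  }
}
```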
  return context;
}

private void incrementBytesRead(int bytesRead) {
  if (bytesRead > 0) {
    streamStatistics.bytesRead(bytesRead);
    streamStatistics.bytesReadFromBuffer(bytesRead);
what's the difference between this and the one on line 432?
Trying to differentiate bytes we read remotely vs bytes which were returned from cached data.
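A sketch of that split, with illustrative names: every byte returned to the caller bumps the total, and bytes served from an already-fetched buffer also bump a buffer-specific counter, so remote reads can be derived as the difference:

```java
import java.util.concurrent.atomic.AtomicLong;

class ReadStatisticsSketch {
  private final AtomicLong bytesRead = new AtomicLong();            // all bytes returned
  private final AtomicLong bytesReadFromBuffer = new AtomicLong();  // served from cached data

  /** A read satisfied from a prefetched/cached buffer. */
  void bufferRead(int bytes) {
    if (bytes > 0) {
      bytesRead.addAndGet(bytes);
      bytesReadFromBuffer.addAndGet(bytes);
    }
  }

  /** A read that had to fetch from the remote store. */
  void remoteRead(int bytes) {
    if (bytes > 0) {
      bytesRead.addAndGet(bytes);
    }
  }
}
```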
Force-pushed d8694da to c0e4f1c
Updated patch. The caching test is failing because files aren't being added to the buffer dir; theory: they are going somewhere else. Trivia: the rebased patch wouldn't push to the repo until I updated my GitHub OAuth tokens; see https://docs.github.com/en/get-started/getting-started-with-git/caching-your-github-credentials-in-git
reviewed test code as well
}
// update the statistics
prefetchingStatistics.fetchOperationCompleted(isPrefetch, bytesFetched);
if (tracker != null) {
Potentially could remove this null check and the one on 404 for the tracker; it used to be null for non-prefetching ops before, but won't be null anymore.
 * @param prefetch prefetch option
 * @return the modified configuration.
 */
public static Configuration enablePrefetch(final Configuration conf, boolean prefetch) {
Nit: renaming to setPrefetchState or something would improve readability; enablePrefetch, at a glance, makes it seem like we're always enabling it.
verifyStatisticCounterValue(ioStats, STREAM_READ_BYTES,
    expectedReadBytes);
// unbuffer
in.unbuffer();
Since this test is getting quite big, it might be better to have a separate test for unbuffer.
byte[] buffer = new byte[prefetchBlockSize];

in.read(buffer, 0, prefetchBlockSize - 10240);
assertCacheFileExists();
shouldn't we add a seek and then a read here? Though I tried that locally and the test still fails
This turns out to be more complex than I'd thought, as there are some long-standing behaviours in ensureCurrentBuffer which I suspect are broken.
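The suggested variant would extend the snippet above roughly like this — a sketch only, reusing the test's own in, prefetchBlockSize and assertCacheFileExists(); the seek target is illustrative:

```java
byte[] buffer = new byte[prefetchBlockSize];
in.read(buffer, 0, prefetchBlockSize - 10240);  // read most of block 0
in.seek(2L * prefetchBlockSize);                // jump beyond the current block
in.read(buffer, 0, buffer.length);              // force another block fetch
assertCacheFileExists();                        // a block should now be cached to disk
```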
 * asserts whether file with .bin suffix is present. It also verifies certain file stats.
 */
@Test
public void testCacheFileExistence() throws Throwable {
Thinking if we can also add a test to check that caching gets disabled if it takes too long... but not sure how to do it (or if it's possible). Also a test that after unbuffer(), the data doesn't get cached.
  ioStatisticsContext = getCurrentIOStatisticsContext();
  ioStatisticsContext.reset();
}

private void createLargeFile() throws Exception {
Do you think it's worth following the same pattern as AbstractS3ACostTest, which creates a huge file in a test and then has other tests assert that the file exists? ITestInMemoryInputStream could extend it as well, and avoid creating and tearing down the small file multiple times.
Testing: some failures; also a timeout in ITestS3APrefetchingLruEviction, which I think shows a test in need of some tuning.
HADOOP-19043. S3A: Regression: ITestS3AOpenCost fails on prefetch test runs

This is actually trickier than it seems, as we will need to go deep into the implementation of caching. Specifically: the prefetcher knows the file length, and if you open a file shorter than that, but less than one block, the read is considered a failure and the whole block is skipped, so read() of the nominally in-range data returns -1.

This fix has to be considered a PoC and should be combined with the other big PR for prefetching, apache#5832, as that is where changes should go. Here is just test tuning and some differentiation of channel problems from other EOFs.

Change-Id: Icdf7e2fb10ca77b6ca427eb207472fad277130d7
Force-pushed 1381966 to 2b613ff
Force-pushed 774bb73 to 8a55155
Force-pushed 8a55155 to fb83df2
There's some race condition with the list add/evict causing intermittent failures of one of the tests. Suspect there's some kind of list update issue; a sketch of the suspected fix is below.
Force-pushed fb83df2 to e7a73c0
Force-pushed e7a73c0 to 2134dc1
Force-pushed 2134dc1 to bfd3716
… prefetch range

This passes the values down but doesn't interpret them; future work.
Change-Id: I523b26e5a5a43fbf6ba5d2b6e44614c7e4fc70b7

HADOOP-18184. S3A Prefetching unbuffer. Compiles against v2 SDK now.
Change-Id: Ic96af7f76931c6dcc453368ad02ae87d07fa4484

HADOOP-18184. temp file creation/test validation
* use block id in filename
* log statements include fs path
* tests more resilient
* logging auditor prints GET range and length
Tests are failing with signs of:
* too many GETs
* incomplete buffers. race conditions?
Change-Id: Ibdca6292df8cf0149697cecfec24035e2be473d8

HADOOP-19043. S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
This is actually trickier than it seems, as we will need to go deep into the implementation of caching. Specifically: the prefetcher knows the file length, and if you open a file shorter than that, but less than one block, the read is considered a failure and the whole block is skipped, so read() of the nominally in-range data returns -1. This fix has to be considered a PoC and should be combined with the other big PR for prefetching, apache#5832, as that is where changes should go. Here is just test tuning and some differentiation of channel problems from other EOFs.
Change-Id: Icdf7e2fb10ca77b6ca427eb207472fad277130d7

HADOOP-19043. S3A: Regression: ITestS3AOpenCost fails on prefetch test runs
* Adds EOF logic deep into the prefetching code
* Tests still failing.
* this has all conflicts with hadoop trunk resolved
Change-Id: I9b23b01d010d8a1a680ce849d26a0aebab2389e2

HADOOP-18184. fix NPEs in BlockManager unit tests by adding withPath()
Change-Id: Ie3d1c266b1231fa85c01092dd79f2dcf961fe498

HADOOP-18184. prefetching
- Cache was not thread safe, and it was possible for cleanup to happen while the caller had just verified it was there and before a read lock was acquired. Fix: synchronize check and get into one block; use synchronized elsewhere.
- Try to cut back on assertions in ITestS3APrefetchingLargeFiles which seem too brittle to prefetch behaviour/race conditions.
- Minor doc, log, assertion changes.
More work on that test failure needed.
Change-Id: I288540ec1fb08e1a5684cde8e94e1c7933d1e41d

HADOOP-18184. prefetching: style
Change-Id: Ifdde5ab33f24515c306a8ccc27ec784c3b6c0a76

HADOOP-18184. unbuffer: reinstate commented-out asserts. These asserts fail as I don't understand the prefetch logic well enough to make valid assertions.
Change-Id: I198d10ccead99754afd17040dc4f4c9ebc919906

HADOOP-18184. javadocs
Change-Id: I61d013ba439c8f4093ad0634a67c6a20e82062ad
Force-pushed fa4b3f4 to 5b85014
* Lots of logging
* prefetch operation appears to be blocking for much longer than the read will take, and the read doesn't take place then anyway. Sync problem?
* Test didn't expect any prefetching
Change-Id: Ie9da9a464ddccd481865eec2bc97fce5c50ed306
Making this a pre-requisite for vectored IO as
How was this patch tested?
S3 London. Slow; want to get the latest changes to speed up the prefetch tests, then rebase onto them.
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?