-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. #3240
HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. #3240
Conversation
tested |
…timing out. Changes * subclass can declare whether or not -direct should be default * all tests then switch to that * major cut back on default depth/width of directories * if you set the file size of "scale.test.distcp.file.size.kb" to 0 the large file test case is skipped. * aggressive cutback on all needless mkdir() calls. S3A suite * declares -direct always ABFS suites * deletes superfluous ITestAbfsFileSystemContractSecureDistCp * uses abfs scale test timeout * only runs with -Dscale All these changes bring execution time down from 4 min to 2 min against each store. Change-Id: Ide8c369fb7b97b84c2099e36a8925e183901d9e1
Change-Id: I464f2071db2a0b17ac7589ea0ccb9b671d065c65
b7b11a0
to
8c9d528
Compare
Reviews invited from all, including @mukund-thakur @mehakmeet @ayushtkn @sumangala-patki @bilaharith |
full azure test run with parallel dir patch against cardiff, This test is now a -Dscale option as even though its now a lot faster, it's still slow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, thanx for fixing
...tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractDistCp.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
One thing though since we have changed the base test class, have we verified if other distcp tests like hdfs and local are not getting impacted?
* false by default; enable for stores where rename is slow. | ||
* @return true if direct write should be used in all tests. | ||
*/ | ||
protected boolean directWriteAlways() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: method name contains Always but default is false, a bit ambiguous no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to shouldUseDirectWrite
@@ -758,7 +828,7 @@ public void testDistCpWithUpdateExistFile() throws Exception { | |||
verifyPathExists(remoteFS, "", source); | |||
verifyPathExists(localFS, "", dest); | |||
DistCpTestUtils.assertRunDistCp(DistCpConstants.SUCCESS, source.toString(), | |||
dest.toString(), "-delete -update", conf); | |||
dest.toString(), "-delete -update" + getDefaultCLIOptions(), conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : options can be refactored like it is done above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did think about it, but I also felt that it might be prudent to have tests with the CLI to parse, so make sure that -direct is handled there properly. We've been hit in the past by some failures with s3guard and the FS shell API because we were always working at the API level
Will add to release notes the fact you can turn off the large file uploads through <property>
<name>scale.test.distcp.file.size.kb</name>
<value>0</value>
</property> This is useful for anyone doing testing from home on a network with slower upload speeds |
optional OptionalTestHDFSContractDistCp runs fine from IDE'; 4 minutes. All the performance issues of the HDFS contract are related to the large file tests at 2 minutes for one, 1:30 for the other. If those tests were turned off then they could always be run, which would give us better regression checks on the object store behaviours matching HDFS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on Central-India(ABFS) and ap-south-1(S3AFS).
+1, some nits.
int fileSizeKb = conf.getInt(SCALE_TEST_DISTCP_FILE_SIZE_KB, | ||
DEFAULT_DISTCP_SIZE_KB); | ||
if (fileSizeKb < 1) { | ||
skip("File size in " + SCALE_TEST_DISTCP_FILE_SIZE_KB + " too small"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe, we should say to "make fileSize in SCALE_TEST_DISTCP_FILE_SIZE_KB
be greater than or equal to 1" or "File Size in SCALE_TEST_DISTCP_FILE_SIZE_KB
smaller than 1"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now
"File size in " + SCALE_TEST_DISTCP_FILE_SIZE_KB + " is zero
It's not bug, just a fact...for HDFS suite it'll be zero by default now
@@ -612,6 +634,9 @@ public void testDirectWrite() throws Exception { | |||
|
|||
@Test | |||
public void testNonDirectWrite() throws Exception { | |||
if (directWriteAlways()) { | |||
skip("not needed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe move below describe()
, or mention in skip message what is being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, it should be in the previous test. So moved up. thanks for drawing my attention to it.
* Address review comments * log IOStats after each test case. Important: as the cached FS retains statistics, the numbers get bigger over time. * HDFS test is now reinstated, as we've identified that most of its long execution time is from the large file upload/download suites. Disable them and its execution time drops from 4m to 30s, which means it can then be used to make sure the contract suite is consistent between HDFS and the object stores. Change-Id: I6d1cf5bf42916035a806fa9e78c003074f8f12b9
Latest release
IOStats of full suite against S3 london (1:43s)
IOStats of full suite against AWS cardiff (1:28). That region is about 30 miles away from here, though I don't know how cables are routed across the Bristol Channel; it'll probably be a bit longer. In contrast, london will be 100-120 miles away, so latency always going to be a bit higher there.
ABFS is collecting many fewer stats, we really need
Really interesting there that HEAD -> 404 has a mean time of 46ms; HEAD to 200 of 36 millis. |
🎊 +1 overall
This message was automatically generated. |
Change-Id: Ie64af6e109935243cd4c65454cf8ee66ec6fbf58
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
…timing out. (#3240) This patch cuts down the size of directory trees used for distcp contract tests against object stores, so making them much faster against distant/slow stores. On abfs, the test only runs with -Dscale (as was the case for s3a already), and has the larger scale test timeout. After every test case, the FileSystem IOStatistics are logged, to provide information about what IO is taking place and what it's performance is. There are some test cases which upload files of 1+ MiB; you can increase the size of the upload in the option "scale.test.distcp.file.size.kb" Set it to zero and the large file tests are skipped. Contributed by Steve Loughran.
…timing out. (apache#3240) This patch cuts down the size of directory trees used for distcp contract tests against object stores, so making them much faster against distant/slow stores. On abfs, the test only runs with -Dscale (as was the case for s3a already), and has the larger scale test timeout. After every test case, the FileSystem IOStatistics are logged, to provide information about what IO is taking place and what it's performance is. There are some test cases which upload files of 1+ MiB; you can increase the size of the upload in the option "scale.test.distcp.file.size.kb" Set it to zero and the large file tests are skipped. Contributed by Steve Loughran.
…le checking for file skip. (apache#5387) Adding toggleable support for modification time during distcp -update between two stores with incompatible checksum comparison. Also Contains: "HADOOP-18633. fix test AbstractContractDistCpTest#testDistCpUpdateCheckFileSkip." Also Contains: "CDPD-28513. HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. (apache#3240)" to ensure correctness in the tests. Change-Id: I5ce8d630396c8ea525195841a5e3d3f6aaea7455
Changes
the large file test case is skipped.
S3A suite
ABFS suites
All these changes bring execution time down from 4 min to 2 min against each store.