HADOOP-17166. ABFS: making max concurrent requests and max requests that can be queued configurable for AbfsOutputStream #2179

Conversation
Test results posted have failures. What's the plan to handle them?

Please find below the JIRAs filed to track them.
Driver test results using accounts in Canary region

Account with HNS Support

SharedKey
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

OAuth
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Account without HNS support

SharedKey
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

OAuth
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Reported the error on the following JIRA.
Prefer you use the strategy outlined in HADOOP-17195. Rather than pool settings for each output stream, the store has a single shared pool, with semaphores to limit the number of active entries per output stream. org.apache.hadoop.util.SemaphoredDelegatingExecutor does this for you. This ensures that when there is a low number of output streams they get good upload performance, but under heavy use it throttles back.
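The pattern described here (one shared pool, a per-stream semaphore capping in-flight work) can be sketched with plain JDK primitives. This is an editor's illustration, not the hadoop-common SemaphoredDelegatingExecutor itself; the class and method names are invented for the sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: caps the number of tasks one "stream" may have in flight on a
// shared executor; submit() blocks once the per-stream cap is reached.
public class SemaphoredSubmitter {
    private final ExecutorService sharedPool;
    private final Semaphore permits;

    public SemaphoredSubmitter(ExecutorService sharedPool, int maxActivePerStream) {
        this.sharedPool = sharedPool;
        this.permits = new Semaphore(maxActivePerStream);
    }

    public void submit(Runnable task) throws InterruptedException {
        permits.acquire();              // blocks when this stream is at its cap
        sharedPool.execute(() -> {
            try {
                task.run();
            } finally {
                permits.release();      // free a slot for the next block upload
            }
        });
    }

    // Runs `tasks` no-op uploads through one stream capped at 2 in flight.
    public static int runDemo(int tasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        SemaphoredSubmitter stream = new SemaphoredSubmitter(pool, 2);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            stream.submit(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(10)); // 10
    }
}
```

With many such streams sharing one pool, each stream still completes all its work; the semaphore only bounds how much of the shared capacity one stream can hold at once.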
We are working on a similar approach, but it needs extensive testing to ensure the performance side is fine. Until that fix is available, this can be used to control the memory consumption. This has been verified with a customer as well.
The s3a queue design is pretty well tested.

Here is my current view of this patch:
If you look at the S3A code, we have a single shared thread pool, which is based on another ASF project: https://github.com/apache/incubator-retired-s4/blob/master/subprojects/s4-comm/src/main/java/org/apache/s4/comm/staging/BlockingThreadPoolExecutorService.java

```java
private ListeningExecutorService boundedThreadPool;

// and in initialize()
int totalTasks = intOption(conf,
    MAX_TOTAL_TASKS, DEFAULT_MAX_TOTAL_TASKS, 1);
long keepAliveTime = longOption(conf, KEEPALIVE_TIME,
    DEFAULT_KEEPALIVE_TIME, 0);
boundedThreadPool = BlockingThreadPoolExecutorService.newInstance(
    maxThreads,
    maxThreads + totalTasks,
    keepAliveTime, TimeUnit.SECONDS,
    "s3a-transfer-shared");

// default value is 4
blockOutputActiveBlocks = intOption(conf,
    FAST_UPLOAD_ACTIVE_BLOCKS, DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS, 1);
```

When we create an output stream, it uses the shared thread pool, but limits the number of blocks each stream can queue for upload:

```java
new S3ABlockOutputStream(this,
    destKey,
    new SemaphoredDelegatingExecutor(boundedThreadPool,
        blockOutputActiveBlocks, true),
    progress,
    partSize,
    blockFactory,
    statisticsContext.newOutputStreamStatistics(),
    getWriteOperationHelper(),
    putTracker),
```

This gives us

Weaknesses

I think the S3A code
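The BlockingThreadPoolExecutorService referenced above has one distinguishing behaviour: when the pool and its queue are full, new submissions block the caller rather than being rejected. A rough pure-JDK sketch of that idea (class and method names are invented for illustration, this is not the Hadoop implementation):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a bounded pool where execute() blocks once
// (active threads + queued tasks) reaches a fixed limit,
// instead of throwing RejectedExecutionException.
public class BlockingBoundedPool {
    private final ThreadPoolExecutor pool;
    private final Semaphore capacity;   // worker slots + queue slots

    public BlockingBoundedPool(int threads, int totalTasks) {
        this.pool = new ThreadPoolExecutor(threads, threads,
            0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        this.capacity = new Semaphore(threads + totalTasks);
    }

    public void execute(Runnable task) throws InterruptedException {
        capacity.acquire();             // blocks the submitter when full
        try {
            pool.execute(() -> {
                try { task.run(); } finally { capacity.release(); }
            });
        } catch (RejectedExecutionException e) {
            capacity.release();         // e.g. pool already shut down
            throw e;
        }
    }

    public void shutdownAndWait() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static int runDemo(int tasks) throws InterruptedException {
        BlockingBoundedPool p = new BlockingBoundedPool(2, 4);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            p.execute(done::incrementAndGet);
        }
        p.shutdownAndWait();
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(20)); // 20
    }
}
```

Blocking the submitter is what keeps memory bounded: a writer generating blocks faster than they upload simply stalls instead of buffering without limit.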
for point #3: just done it for IOStatistics: https://github.com/apache/hadoop/blob/f5efa4b27536a9e266d9dc06cd3a1e11ded3bfd3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SemaphoredDelegatingExecutor.java

If you pass in a duration factory, the executor will measure the time to acquire a thread before the actual execution. This is nice to know when trying to answer the "why so slow?" question, as it will either show a problem or show where not to look. Of course, a long wait doesn't just mean "not enough executors": "why are all the submitted operations taking so long?" could be a sign of network problems.

To summarise then
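The "time to acquire a thread" measurement mentioned here can be illustrated with a bare-bones timestamp pair around the submit/start boundary. This is an editor's sketch of the idea only; the real SemaphoredDelegatingExecutor uses IOStatistics duration trackers, and all names below are invented.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: record how long submitted work waits before a worker thread
// actually starts it, to help answer the "why so slow?" question.
public class QueueWaitTimer {
    public static long maxWaitNanos(ExecutorService pool, int tasks) throws InterruptedException {
        AtomicLong maxWait = new AtomicLong();
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            final long submitted = System.nanoTime();   // submit timestamp
            pool.execute(() -> {
                // elapsed time between submission and execution start
                long waited = System.nanoTime() - submitted;
                maxWait.accumulateAndGet(waited, Math::max);
                try { Thread.sleep(5); } catch (InterruptedException ignored) { }
                latch.countDown();
            });
        }
        latch.await();
        return maxWait.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // 8 tasks on 2 threads: later tasks must queue behind earlier ones,
        // so the maximum observed wait grows with contention.
        long nanos = maxWaitNanos(pool, 8);
        System.out.println(nanos > 0);
        pool.shutdown();
    }
}
```

A high wait time with an idle network points at the pool being too small; a high wait time with slow uploads points elsewhere, which is exactly the triage value described in the comment.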
Thanks for the inputs. There is already work in progress on a long-term fix, but I will try these suggestions before raising a PR for it.
💔 -1 overall
This message was automatically generated.
@@ -52,6 +52,8 @@
public static final String AZURE_OAUTH_TOKEN_FETCH_RETRY_DELTA_BACKOFF = "fs.azure.oauth.token.fetch.retry.delta.backoff";

// Read and write buffer sizes defined by the user
public static final String AZURE_WRITE_MAX_CONCURRENT_REQUESTS = "fs.azure.write.max.concurrent.requests";
use "experimental" in the name to show they are exactly that
These configs are tested in prod environments. They can remain as a means to control the resource usage. Based on the internal discussions we had, we would like to keep them this way.
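As I read the patch, the two options bound each AbfsOutputStream's executor: one caps worker threads, the other caps queued writes. A rough pure-JDK equivalent of that wiring (the class and method names below are illustrative, not the actual AbfsOutputStream code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Rough sketch of a per-output-stream pool bounded by the two new
// settings: max concurrent requests (threads) and max queued requests.
public class PerStreamPool {
    public static ThreadPoolExecutor create(int maxConcurrentRequests,
                                            int maxRequestsToQueue) {
        return new ThreadPoolExecutor(
            maxConcurrentRequests, maxConcurrentRequests,
            10L, TimeUnit.SECONDS,
            // bounded queue: limits buffered (memory-consuming) writes
            new LinkedBlockingQueue<>(maxRequestsToQueue));
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 8);
        System.out.println(pool.getMaximumPoolSize());           // 4
        System.out.println(pool.getQueue().remainingCapacity()); // 8
        pool.shutdown();
    }
}
```

Each queued request holds a write buffer in memory, so lowering these two numbers directly lowers the worst-case memory held per open stream.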
@@ -796,6 +796,18 @@
will be -1. To disable readaheads, set this value to 0. If your workload is
doing only random reads (non-sequential) or you are seeing throttling, you
may try setting this value to 0.

To run under limited memory situations, configure the following.
Make clear: or when doing many writes in the same process (bulk uploads, Hive LLAP/Spark with many workers).
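For reference, settings like these would go into core-site.xml along the following lines. The values here are illustrative only and should be tuned to the workload; the property names are the ones added by this PR.

```xml
<!-- Illustrative values for memory-constrained processes or for
     processes with many concurrently open ABFS output streams -->
<property>
  <name>fs.azure.write.max.concurrent.requests</name>
  <value>2</value>
</property>
<property>
  <name>fs.azure.write.max.requests.to.queue</name>
  <value>4</value>
</property>
```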
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsOutputStream.java
💔 -1 overall
This message was automatically generated.
LGTM, +1 pending you make clear in the option names and docs that this is experimental.
We all have this problem, and we all want to get a fix in. My point of view is that a shared thread pool, with the queue managed to stop one single output stream using up all the capacity, is the correct solution. I base this on S3ABlockOutputStream, whose pool class is in hadoop-common. I also understand why a simple "let's do this right now" fix can address the situation rapidly, before the ABFS streams switch to a shared pool.

However, it is precisely because we know it is an interim fix that I want all the options to have "experimental" in their name. That way, when they get removed, people won't get upset that the options they were using have gone away.

I recognise that you are shipping with this fix, and that you have cluster configurations which use these options. However, there is a long-standing policy in the project: the ASF project must not have its decisions determined by the fact that someone has already shipped a feature in their own branch. That's important: we have all shipped fixes early, and then had to deal with catching up with production releases. I believe the HDFS IPC wire format change between Hadoop 0.20.205 (used in CDH) and Hadoop 2.2.0 was the most controversial here, as it was an actual protocol incompatibility. The situation here is minor in comparison. Anyone is free to ship a Hadoop build with their own changes, but that cannot be used as a veto on changes in the open source codebase itself. This makes sense when you think of it.

The good news,
If the old option is used, the user will see a message logged at INFO (unless they have turned off the deprecation log), and the value is remapped to the new one.
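The remap-on-read behaviour described here is what Hadoop's Configuration deprecation list provides. A toy pure-Java illustration of the idea follows; it is not the Hadoop implementation, and the "experimental" key name used in the demo is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of deprecated-key remapping: uses of an old key are
// redirected to the new key, with a warning logged on write.
public class DeprecatedKeyMap {
    private final Map<String, String> deprecations = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    public void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    public void set(String key, String value) {
        String target = deprecations.get(key);
        if (target != null) {
            // mirrors the "message logged at INFO" behaviour
            System.out.println("INFO: " + key + " is deprecated; using " + target);
            key = target;
        }
        values.put(key, value);
    }

    public String get(String key) {
        // reads of either the old or the new key resolve to the new key
        return values.get(deprecations.getOrDefault(key, key));
    }

    public static void main(String[] args) {
        DeprecatedKeyMap conf = new DeprecatedKeyMap();
        // hypothetical rename for the demo, not a real option name
        conf.addDeprecation("fs.azure.write.max.concurrent.requests",
            "fs.azure.experimental.write.max.concurrent.requests");
        conf.set("fs.azure.write.max.concurrent.requests", "4");
        System.out.println(
            conf.get("fs.azure.experimental.write.max.concurrent.requests")); // 4
    }
}
```

This is why renaming the options later is cheap: existing configurations keep working while the deprecation log nudges users toward the new names.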
So, please use "experimental" in the name; it gives us freedom to come up with a good design for how to do block upload buffering/pooling without having any future design decisions constrained by this intermediate patch. Thanks.
Adds the options to control the size of the per-output-stream threadpool when writing data through the abfs connector:

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H
…pache#2179)

Adds the options to control the size of the per-output-stream threadpool when writing data through the abfs connector:

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H

Conflicts:
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java

Change-Id: Iab45ca33cf903dc834b6867ac10f7936637c2c8a
Makes the AbfsOutputStream maxConcurrentRequests, and the maximum size to which the threadpool queue can grow, configurable.
Driver test results using accounts in Central India
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
**Client credentials**

Account with HNS Support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 75
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24
Account without HNS support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 248
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24
**Access key**

Account with HNS Support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[ERROR] ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:160->AbstractAbfsIntegrationTest.getFileSystem:254 » KeyProvider
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 2, Skipped: 42
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16
Account without HNS support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0
[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 245
[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16