
HADOOP-17166. ABFS: making max concurrent requests and max requests that can be que… #2179

Merged
merged 7 commits, Sep 9, 2020

Conversation

bilaharith
Copy link
Contributor

Makes the AbfsOutputStream maxConcurrentRequests and the maximum size to which the thread pool queue can grow configurable.
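For reference, the two options added by this change (named in the merge commit at the bottom of this PR) could be set in core-site.xml like this (the values shown are purely illustrative, not defaults):

```xml
<!-- Illustrative values only: caps the per-output-stream upload
     concurrency and the number of requests that may be queued. -->
<property>
  <name>fs.azure.write.max.concurrent.requests</name>
  <value>4</value>
</property>
<property>
  <name>fs.azure.write.max.requests.to.queue</name>
  <value>8</value>
</property>
```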

Driver test results using accounts in Central India
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Client credentials

Account with HNS Support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 75

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Account without HNS support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 248

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Access key

Account with HNS Support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[ERROR] ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:160->AbstractAbfsIntegrationTest.getFileSystem:254 » KeyProvider
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 2, Skipped: 42

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

Account without HNS support
[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 245

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

@snvijaya
Copy link
Contributor

The test results posted have failures. What's the plan to handle them?

@bilaharith
Copy link
Contributor Author

bilaharith commented Aug 19, 2020

The test results posted have failures. What's the plan to handle them?

Please find below the JIRAs tracking these failures.
https://issues.apache.org/jira/browse/HADOOP-17160
https://issues.apache.org/jira/browse/HADOOP-17149

@bilaharith bilaharith closed this Aug 19, 2020
@bilaharith bilaharith reopened this Aug 19, 2020
@bilaharith
Copy link
Contributor Author

Driver test results using accounts in Canary region
mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify

Account with HNS Support

SharedKey

[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 451, Failures: 0, Errors: 1, Skipped: 32

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

OAuth

[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 453, Failures: 0, Errors: 1, Skipped: 74

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Account without HNS support

SharedKey

[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 453, Failures: 0, Errors: 1, Skipped: 245

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 16

OAuth

[INFO] Tests run: 87, Failures: 0, Errors: 0, Skipped: 0

[ERROR] Errors:
[ERROR] ITestAbfsInputStreamStatistics.testReadAheadCounters:346 » TestTimedOut test t...
[INFO]
[ERROR] Tests run: 453, Failures: 0, Errors: 1, Skipped: 249

[WARNING] Tests run: 207, Failures: 0, Errors: 0, Skipped: 24

Reported the error on the following JIRA.
https://issues.apache.org/jira/browse/HADOOP-17158

@steveloughran
Copy link
Contributor

Prefer you use the strategy outlined in HADOOP-17195. Rather than pool settings for each output stream, the store has a single shared pool, with semaphores to limit the number of active entries per output stream. org.apache.hadoop.util.SemaphoredDelegatingExecutor does this for you. This ensures that when there is a low number of output streams they get good upload performance, but under heavy use it throttles back.

@bilaharith
Copy link
Contributor Author

Prefer you use the strategy outlined in HADOOP-17195. Rather than pool settings for each output stream, the store has a single shared pool, with semaphores to limit the number of active entries per output stream. org.apache.hadoop.util.SemaphoredDelegatingExecutor does this for you. This ensures that when there is a low number of output streams they get good upload performance, but under heavy use it throttles back.

We are working on a similar approach; it needs extensive testing to ensure performance is not affected. Until that fix is available, this change can be used to control memory consumption. It has been verified with a customer as well.

@steveloughran
Copy link
Contributor

We are working on a similar approach; it needs extensive testing to ensure performance is not affected. Until that fix is available, this change can be used to control memory consumption. It has been verified with a customer as well.

The s3a queue design is pretty well tested.

@steveloughran
Copy link
Contributor

Here is my current view of this patch:

  • as a quick workaround for scale-up issues in the existing code, it works
  • as the long-term design for something which delivers great performance when scaled up as well as down, it is not suitable
  • the code you need is already there

If you look at the S3A code, we have a single shared thread pool, which is based on another ASF project: https://github.com/apache/incubator-retired-s4/blob/master/subprojects/s4-comm/src/main/java/org/apache/s4/comm/staging/BlockingThreadPoolExecutorService.java

  private ListeningExecutorService boundedThreadPool;
  
  // and in initialize()
  int totalTasks = intOption(conf,
      MAX_TOTAL_TASKS, DEFAULT_MAX_TOTAL_TASKS, 1);
  long keepAliveTime = longOption(conf, KEEPALIVE_TIME,
      DEFAULT_KEEPALIVE_TIME, 0);
  boundedThreadPool = BlockingThreadPoolExecutorService.newInstance(
      maxThreads,
      maxThreads + totalTasks,
      keepAliveTime, TimeUnit.SECONDS,
      "s3a-transfer-shared");
  
  // default value is 4
  blockOutputActiveBlocks = intOption(conf,
      FAST_UPLOAD_ACTIVE_BLOCKS, DEFAULT_FAST_UPLOAD_ACTIVE_BLOCKS, 1);

When we create the output stream, we wrap the shared pool in a SemaphoredDelegatingExecutor, which limits the number of blocks each stream can queue for upload.

    ...
new S3ABlockOutputStream(this,
    destKey,
    new SemaphoredDelegatingExecutor(boundedThreadPool,
        blockOutputActiveBlocks, true),
    progress,
    partSize,
    blockFactory,
    statisticsContext.newOutputStreamStatistics(),
    getWriteOperationHelper(),
    putTracker),

This gives us

  • low latency for output stream launch (shared pool)
  • observability into total pool size/load (could make it a gauge...)
  • semaphore to stop a single worker from overloading the system
  • A thread pool to use for other operations.

Weaknesses

  • I think we overuse the blocking thread pool in other places (openFile(), ...) and should do a review to make sure we are using the unbounded one more. I managed to create a deadlock once.
  • If you set the active block option "fs.s3a.fast.upload.active.blocks" too big then you can still OOM if you buffer in RAM. The default S3A block buffer is on disk to avoid this problem; now people only complain about us not cleaning up temp disk space when Spark workers are killed.
  • You still need to decide on the blocking thread pool size (default = 32).

I think the S3A code

  1. should lift that * cores option and say "if the pool size < 0 then we use it as a multiplier for cores", so -4 would mean "4 * cores". Maybe too late to do this in a backwards-compatible way now, unless I add a new option, deprecate the old one, etc.
  2. Add pool size as a gauge. If we could also (somehow) include #of pending blocks across all semaphored executors then you'd get a view of how many threads were being held up.
  3. And we could add "time to wait for a thread" as a metric too. Actually, I like that...if we can tie to a stream we can tie to a task, and hence to a job. Tricky though. Let me see what I can do there with the IOStatistics PR in front of me.

for point #3: just done it for IOStatistics: https://github.com/apache/hadoop/blob/f5efa4b27536a9e266d9dc06cd3a1e11ded3bfd3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SemaphoredDelegatingExecutor.java

If you pass in a duration factory then the executor will measure the time to acquire a thread before the actual execution. This is nice to know when trying to answer the "why so slow?" question, as it will either show a problem or show where not to look. Of course, a long wait doesn't just mean "not enough executors": "why are all the submitted operations taking so long?" could be a sign of network problems.

To summarise then

  • just go for the same thread pool in Store and semaphored wrapper around this passed in to each output stream
  • choose some pool size, with the idea of a negative value == per core making it easy to do a scalable config
  • plan for wiring up statistics collection in the near future.
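To make the shared-pool-plus-semaphore pattern above concrete, here is a minimal self-contained sketch. The class and method names are ours, purely illustrative; this is not the real org.apache.hadoop.util.SemaphoredDelegatingExecutor API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: one shared pool per store, plus a per-stream semaphore that
// caps how many "block uploads" a single stream may have queued or
// running at once. Names are illustrative, not Hadoop's.
public class SemaphoredExecutorSketch {

  static final class BoundedSubmitter {
    private final ExecutorService shared;
    private final Semaphore permits;

    BoundedSubmitter(ExecutorService shared, int maxActivePerStream) {
      this.shared = shared;
      this.permits = new Semaphore(maxActivePerStream);
    }

    void submit(Runnable task) throws InterruptedException {
      permits.acquire();            // blocks once this stream hits its cap
      shared.submit(() -> {
        try {
          task.run();
        } finally {
          permits.release();        // free a slot for the next block
        }
      });
    }
  }

  /** Pushes 8 no-op "block uploads" through a per-stream cap of 2. */
  static int runDemo() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4); // shared store pool
    BoundedSubmitter stream = new BoundedSubmitter(pool, 2); // per-stream cap
    AtomicInteger done = new AtomicInteger();
    for (int i = 0; i < 8; i++) {
      stream.submit(done::incrementAndGet);
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    return done.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runDemo()); // prints 8
  }
}
```

Acquiring the permit before submission is the key design point: a saturated stream blocks in the caller rather than growing an unbounded queue, which is exactly what bounds memory consumption.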

@bilaharith
Copy link
Contributor Author

Here is my current view of this patch […]

To summarise then

  • just go for the same thread pool in Store and semaphored wrapper around this passed in to each output stream
  • choose some pool size, with the idea of a negative value == per core making it easy to do a scalable config
  • plan for wiring up statistics collection in the near future.

Thanks for the inputs. There is already work in progress on a long-term fix, and I will try these suggestions before raising a PR for it. For now, I would like this change in as a quick workaround/interim fix.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 32m 31s trunk passed
+1 💚 compile 0m 36s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 36s trunk passed
+1 💚 shadedclient 16m 27s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 25s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 0m 58s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 56s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 30s the patch passed
+1 💚 compile 0m 32s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 32s the patch passed
+1 💚 compile 0m 24s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 24s the patch passed
+1 💚 checkstyle 0m 16s the patch passed
+1 💚 mvnsite 0m 29s the patch passed
+1 💚 whitespace 0m 1s The patch has no whitespace issues.
+1 💚 shadedclient 16m 41s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 24s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 26s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 1m 12s the patch passed
_ Other Tests _
+1 💚 unit 1m 18s hadoop-azure in the patch passed.
-1 ❌ asflicense 0m 31s The patch generated 3 ASF License warnings.
78m 2s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/8/artifact/out/Dockerfile
GITHUB PR #2179
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 01aa91d20d02 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 4454286
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/8/testReport/
asflicense https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/8/artifact/out/patch-asflicense-problems.txt
Max. process+thread count 309 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/8/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@@ -52,6 +52,8 @@
public static final String AZURE_OAUTH_TOKEN_FETCH_RETRY_DELTA_BACKOFF = "fs.azure.oauth.token.fetch.retry.delta.backoff";

// Read and write buffer sizes defined by the user
public static final String AZURE_WRITE_MAX_CONCURRENT_REQUESTS = "fs.azure.write.max.concurrent.requests";
Copy link
Contributor


use experimental in name to show they are exactly that

Copy link
Contributor Author


These configs are tested in prod environments. They can remain as a means to control resource usage. Based on the internal discussions we had, we would like to keep them as they are.

@@ -796,6 +796,18 @@ will be -1. To disable readaheads, set this value to 0. If your workload is
doing only random reads (non-sequential) or you are seeing throttling, you
may try setting this value to 0.

To run under limited-memory situations, configure the following.
Copy link
Contributor


make clear: or when doing many writes in the same process (bulk uploads, Hive LLAP / Spark with many workers)

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 25s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 markdownlint 0m 1s markdownlint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 31m 4s trunk passed
+1 💚 compile 0m 34s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 compile 0m 28s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 checkstyle 0m 22s trunk passed
+1 💚 mvnsite 0m 33s trunk passed
+1 💚 shadedclient 16m 29s branch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 29s trunk passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 23s trunk passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+0 🆗 spotbugs 0m 55s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 0m 53s trunk passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 27s the patch passed
+1 💚 compile 0m 27s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javac 0m 27s the patch passed
+1 💚 compile 0m 22s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 javac 0m 22s the patch passed
+1 💚 checkstyle 0m 15s the patch passed
+1 💚 mvnsite 0m 26s the patch passed
-1 ❌ whitespace 0m 0s The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 shadedclient 15m 30s patch has no errors when building and testing our client artifacts.
+1 💚 javadoc 0m 22s the patch passed with JDK Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1
+1 💚 javadoc 0m 20s the patch passed with JDK Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
+1 💚 findbugs 0m 57s the patch passed
_ Other Tests _
+1 💚 unit 1m 14s hadoop-azure in the patch passed.
+1 💚 asflicense 0m 28s The patch does not generate ASF License warnings.
74m 12s
Subsystem Report/Notes
Docker ClientAPI=1.40 ServerAPI=1.40 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/9/artifact/out/Dockerfile
GITHUB PR #2179
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle markdownlint
uname Linux 240462ef6ff8 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0207f5c
Default Java Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.8+10-post-Ubuntu-0ubuntu118.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_265-8u265-b01-0ubuntu2~18.04-b01
whitespace https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/9/artifact/out/whitespace-eol.txt
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/9/testReport/
Max. process+thread count 357 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2179/9/console
versions git=2.17.1 maven=3.6.0 findbugs=4.0.6
Powered by Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@apache apache deleted a comment from hadoop-yetus Sep 3, 2020
@steveloughran
Copy link
Contributor

LGTM, +1 pending you make clear in the option names and docs that this is experimental

These configs are tested in prod environments.

They can remain as a means to control resource usage. Based on the internal discussions we had, we would like to keep them as they are.

We all have this problem, we all want to get a fix in.

My point of view is that a shared thread pool, with a queue managed to stop one single output stream from using up all the capacity, is the correct solution. I base this on S3ABlockOutputStream, whose pool class is in hadoop-common.

I also understand why a simple "let's do this right now" fix can address the situation rapidly, before the ABFS streams switch to BlockingThreadPoolExecutorService.

However, it is precisely because we know it is an interim fix that I want all the options to have 'experimental' in their name. That way, when they get removed, people won't get upset that the options they were using have gone away.

I recognise that you are shipping with this fix, and that you have cluster configurations which use it. However, there is a long-standing policy in the project: the ASF project must not have its decisions determined by the fact that someone has already shipped a feature in their own branch. That's important: we have all shipped fixes early, and then had to deal with catching up with production releases. I believe the HDFS IPC wire format change between Hadoop 2.0.205 (used in CDH) and Hadoop 2.2.0 was the most controversial here, as it was an actual protocol incompatibility. The situation here is minor in comparison.

Anyone is free to ship a Hadoop build with their own changes, but that cannot be used as a veto on changes in the open source codebase itself.

This makes sense, when you think of it.

The good news: Configuration.addDeprecations() lets you provide an automated way to map from deprecated keys (your current set of options) to ones with "experimental" in the name, as here (S3ATestUtils):

  private static void addDeprecatedKeys() {
    Configuration.DeprecationDelta[] deltas = {
        // STS endpoint configuration option
        new Configuration.DeprecationDelta(
            S3ATestConstants.TEST_STS_ENDPOINT,
            ASSUMED_ROLE_STS_ENDPOINT)
    };

    Configuration.addDeprecations(deltas);
    Configuration.reloadExistingConfigurations();
  }

  static {
    addDeprecatedKeys();
  }

If the old option is used, the user will see a message logged at INFO (unless they have turned off the deprecation log), and the value is remapped to the new one.

log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN

So, please use "experimental" in the name; it gives us the freedom to come up with a good design for block upload buffering/pooling without having any future design decisions constrained by this interim patch.

thanks.

asfgit pushed a commit that referenced this pull request Oct 14, 2020
Adds the options to control the size of the per-output-stream threadpool
when writing data through the abfs connector

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…pache#2179)

Adds the options to control the size of the per-output-stream threadpool
when writing data through the abfs connector

* fs.azure.write.max.concurrent.requests
* fs.azure.write.max.requests.to.queue

Contributed by Bilahari T H

Conflicts:
	hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java

Change-Id: Iab45ca33cf903dc834b6867ac10f7936637c2c8a