HADOOP-18971: [ABFS] Enable Footer Read Optimizations with Appropriate Footer Read Buffer Size #6270
Conversation
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
```diff
@@ -190,7 +193,8 @@ private void seekReadAndTest(final FileSystem fs, final Path testFilePath,
     try (FSDataInputStream iStream = fs.open(testFilePath)) {
       AbfsInputStream abfsInputStream = (AbfsInputStream) iStream
           .getWrappedStream();
-      long bufferSize = abfsInputStream.getBufferSize();
+      long footerReadBufferSize = abfsInputStream.getFooterReadBufferSize();
```
+1 on testing different file sizes. Should we parameterize the values for getFooterReadBufferSize? Right now it depends on whatever test config the developer has.
Not able to get this, can you please elaborate?
The footer buffer size here will be the default one unless the user sets it explicitly in the configs.
Are you recommending this be hardcoded?
The default value of this config is 256KB, but a developer can have any other value in their config. Right now the test is tied to using 256 KB. What I am proposing is that the test set the config itself and not depend on the dev-supplied config. I am also proposing we run this test for different values of footerBufferSize.
I'd propose something else, will comment below
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 25 mins 53 secs.
I'm going to propose that you add the option as a parameter to the openFile() builder API. This will
- allow applications to explicitly set it (unlikely given parquet is trapped compiling against hadoop 2.8)
- make it trivial to write tests which open files of different footer length, including the stress test cases (4 bytes, 0 bytes...); see the sketch below
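A hedged sketch of what that could look like at a call site. The option key `fs.azure.footer.read.request.size` is the one this PR introduces; `FutureDataInputStreamBuilder` and `opt()` are the existing openFile() builder API, but the class and method names here are illustrative only:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FutureDataInputStreamBuilder;
import org.apache.hadoop.fs.Path;

public class FooterOptionExample {
  // Open a file with an explicit footer read buffer size of 256 KB,
  // set per-stream through the openFile() builder rather than cluster config.
  static FSDataInputStream openWithFooterBuffer(FileSystem fs, Path path)
      throws Exception {
    FutureDataInputStreamBuilder builder = fs.openFile(path);
    builder.opt("fs.azure.footer.read.request.size", 256 * 1024);
    // build() returns a CompletableFuture<FSDataInputStream>
    return builder.build().get();
  }
}
```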
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
```diff
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
```
this is roughly the same for ORC, isn't it?
Yes, the procedure is the same for ORC as well.
This optimization will impact workloads running on both Parquet and ORC files.
```diff
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
+   * First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
+   * is done for reading that metadata. With this optimizations these two reads can be combined into 1.
```
nit "optimization"
```diff
@@ -59,7 +59,8 @@ public final class FileSystemConfigurations {
   public static final boolean DEFAULT_AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = false;
   public static final int DEFAULT_READ_BUFFER_SIZE = 4 * ONE_MB;  // 4 MB
   public static final boolean DEFAULT_READ_SMALL_FILES_COMPLETELY = false;
-  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = false;
+  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = true;
+  public static final int DEFAULT_FOOTER_READ_BUFFER_SIZE = 512 * ONE_KB;
```
this is 512K; docs in the file above say 256K.
Updated the doc.
256KB is enough data, but to be on the safer side we want to read more. It was found that 512 KB is a sweet spot between caching as much data as possible and not impacting performance.
```diff
   public static final String AZURE_READ_OPTIMIZE_FOOTER_READ = "fs.azure.read.optimizefooterread";
+  /**
+   * In case of footer reads it was not required to read full buffer size.
+   * Most of the metadata information required was within 256KB and it will be more performant to read lesser.
```
"read less"
```diff
@@ -358,6 +360,7 @@ private int readFileCompletely(final byte[] b, final int off, final int len)
     return optimisedRead(b, off, len, 0, contentLength);
   }

+  // To do footer read of files when enabled
```
nit: . at end for javadoc
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 27 mins 14 secs.
Tests are getting into better shape. Some points:
- the default value comment
- some tests follow the builder route and some the config route; can we have a way for a given test to run once via the builder and once via the config (a sketch of one possible shape follows below)?
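Not how the ABFS integration tests are structured today; just a hedged sketch (JUnit 4, hypothetical class name, route-specific setup left as comments) of running the same assertions once per route:

```java
import java.util.Arrays;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

// Hypothetical harness: each test method runs twice, once with the footer
// buffer size supplied via openFile() builder opts, once via Configuration.
@RunWith(Parameterized.class)
public class FooterBufferRouteTest {

  @Parameterized.Parameters(name = "builderRoute={0}")
  public static Iterable<Object[]> routes() {
    return Arrays.asList(new Object[][] {{true}, {false}});
  }

  private final boolean builderRoute;

  public FooterBufferRouteTest(boolean builderRoute) {
    this.builderRoute = builderRoute;
  }

  @Test
  public void testFooterBufferSizeHonoured() throws Exception {
    if (builderRoute) {
      // set fs.azure.footer.read.request.size via fs.openFile(path).opt(...)
    } else {
      // set it on the Configuration before creating the FileSystem
    }
    // ...shared verification of the effective footer read buffer size...
  }
}
```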
```diff
@@ -811,13 +813,17 @@ private AbfsInputStreamContext populateAbfsInputStreamContext(
     boolean bufferedPreadDisabled = options
         .map(c -> c.getBoolean(FS_AZURE_BUFFERED_PREAD_DISABLE, false))
         .orElse(false);
+    int footerReadBufferSize = options.map(c -> c.getInt(
+        AZURE_FOOTER_READ_BUFFER_SIZE, DEFAULT_FOOTER_READ_BUFFER_SIZE))
```
If a dev goes the builder route and has not given fs.azure.footer.read.request.size in the builder opts, it will take the default value even though a value is defined in the config. Better, in place of the default value, use abfsConfiguration.getFooterReadBufferSize().
You can verify this happens by changing a bit of your testNumBackendCalls code:
```java
private void testNumBackendCalls(boolean optimizeFooterRead)
    throws Exception {
  for (int i = 0; i <= 4; i++) {
    for (int j = 0; j <= 2; j++) {
      int fileSize = (int) Math.pow(2, i) * 256 * ONE_KB;
      int footerReadBufferSize = (int) Math.pow(2, j) * 256 * ONE_KB;
      final AzureBlobFileSystem fs = getFileSystem(optimizeFooterRead,
          fileSize, footerReadBufferSize);
      String fileName = methodName.getMethodName() + i;
      byte[] fileContent = getRandomBytesArray(fileSize);
      Path testFilePath = createFileWithContent(fs, fileName, fileContent);
      int length = AbfsInputStream.FOOTER_SIZE;
      FutureDataInputStreamBuilder builder = fs.openFile(testFilePath);
      // builder.opt(ConfigurationKeys.AZURE_FOOTER_READ_BUFFER_SIZE,
      //     footerReadBufferSize);
      try (FSDataInputStream iStream = builder.build().get()) {
        ......
```
Let's add this use case in the tests as well.
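A hedged sketch of the suggested fallback (assuming it sits in populateAbfsInputStreamContext, where abfsConfiguration is in scope, and that getFooterReadBufferSize() is its config-backed getter, as named in the comment above):

```java
// Prefer the Configuration-backed value over the hardcoded default, so an
// openFile() call without the opt still honours fs.azure.footer.read.request.size.
int footerReadBufferSize = options.map(c -> c.getInt(
    AZURE_FOOTER_READ_BUFFER_SIZE,
    abfsConfiguration.getFooterReadBufferSize()))
    .orElse(abfsConfiguration.getFooterReadBufferSize());
```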
All look good, just one thing:
```java
// Verify that default value is used if nothing is set explicitly
FSDataInputStream iStream = fs.open(testFilePath);
verifyConfigValueInStream(iStream, DEFAULT_FOOTER_READ_BUFFER_SIZE);
```
There should be a check for whether a value is already set in the test-config.
Taken
@steveloughran @mukund-thakur @mehakmeet
Thanks for taking the suggestions. Looks good.
LGTM
+1
OK, you need to merge in trunk and it'll be good to go. Can you also add as a comment the explicit text you want in the commit message? I'm thinking something like
Side issue: I'm actually thinking of adding cross-stream caching in the s3a fs, so that threads working on different row chunks can share footers. Some small LRU cache, with IOStats collection to help measure its value. Thoughts?
Thanks for the review @steveloughran.
Regarding the cross-stream caching.
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 25 mins 3 secs.
I don't know how common that use is... whereas for spark/tez workers, reopening the same file is not unusual; they just process different parts. I think this is why prefetching doesn't do anything for ORC/Parquet. Note that Impala does cache the column indexes/page indexes, so it doesn't need the filesystem to secretly do it for them.
Yes, we have also had similar observations. But I feel this cross-stream caching is a good idea for both footer reads and small file reads.
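Not part of this PR, but a minimal sketch of the kind of cross-stream LRU cache being floated here. All names are hypothetical; a real version would need IOStatistics wiring, invalidation on file overwrite, and a memory bound on bytes rather than entry count:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-filesystem footer cache: maps (path + etag) -> footer bytes,
// evicting the least-recently-used entry once maxEntries is exceeded.
public final class FooterCache {
  private final int maxEntries;
  private final Map<String, byte[]> cache;

  public FooterCache(int maxEntries) {
    this.maxEntries = maxEntries;
    // accessOrder=true gives LRU iteration order; removeEldestEntry evicts.
    this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > FooterCache.this.maxEntries;
      }
    };
  }

  public synchronized byte[] get(String pathAndEtag) {
    return cache.get(pathAndEtag);
  }

  public synchronized void put(String pathAndEtag, byte[] footer) {
    cache.put(pathAndEtag, footer);
  }
}
```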
…read.request.size (apache#6270)

The option fs.azure.footer.read.request.size sets the size of the footer to read and cache; the default value of 524288 has been measured to be good for most workloads running on parquet, ORC and similar file formats.

Contributed by Anuj Modi
https://issues.apache.org/jira/browse/HADOOP-18971
Description of PR
Footer read optimization was introduced to hadoop-azure in https://issues.apache.org/jira/browse/HADOOP-17347 and was kept disabled by default.
This PR enables footer reads by default, based on the results of the analysis below.
In our scale workload analysis, it was found that workloads working with Parquet (or, for that matter, ORC etc.) have a lot of footer reads. Footer reads here refer to the read operations a workload issues to get the metadata of the Parquet file, which is required to understand where the actual data resides in the file.
This whole process takes place in 3 steps:
1. Read the last few bytes of the file to get the position of the metadata.
2. Read that metadata.
3. Read the actual data it points to.
Here the first two steps are metadata reads that can be combined into a single footer read. When a workload tries to read the last few bytes of data (let's say this value is the footer size), the driver will intelligently read some extra bytes above the footer size to cater to the next read that is going to come.
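For context, a minimal sketch of those first two reads as a generic reader would issue them against the FileSystem API. This is illustrative only, not ABFS or this PR's code; it assumes the standard Parquet layout, where the last 8 bytes are a 4-byte little-endian footer length followed by the "PAR1" magic:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParquetFooterReads {
  static byte[] readFooter(FileSystem fs, Path path) throws IOException {
    long fileLen = fs.getFileStatus(path).getLen();
    try (FSDataInputStream in = fs.open(path)) {
      byte[] tail = new byte[8];
      in.readFully(fileLen - 8, tail);                // read 1: footer length + magic
      int footerLen = ByteBuffer.wrap(tail, 0, 4)
          .order(ByteOrder.LITTLE_ENDIAN).getInt();
      byte[] footer = new byte[footerLen];
      in.readFully(fileLen - 8 - footerLen, footer);  // read 2: the metadata itself
      return footer;                                  // read 3+ would fetch column data
    }
  }
}
```

With the optimization, the driver can serve both of these reads from a single server call by fetching the last footer-read-buffer-size bytes up front.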
Q. What is the footer size of a file?
A. 16KB. Any read request trying to get data within the last 16KB of the file will qualify for a whole footer read. This value is enough to cater to all types of files, including Parquet, ORC, etc.
Q. What is the buffer size to read when reading the footer?
A. Let's call this the footer read buffer size. Prior to this PR, the footer read buffer size was the same as the read buffer size (default 4MB). It was found that for most workloads the required footer size was only 256KB, i.e. for almost all Parquet files the metadata was within the last 256KB. Keeping this in mind, it does not make sense to read the whole buffer length of 4MB as part of a footer read. Moreover, reading more data than required incurs additional cost in terms of server and network latencies. Based on this and extensive experimentation, it was observed that a footer read buffer size of 512KB is ideal for almost all workloads running on Parquet, ORC, etc.
The following configuration was introduced to control the footer read buffer size:
fs.azure.footer.read.request.size: default 512 KB.
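A hedged example of setting these options programmatically; the keys come from this PR, and the values shown equal the defaults:

```java
import org.apache.hadoop.conf.Configuration;

public class FooterReadConfig {
  public static Configuration withFooterRead() {
    Configuration conf = new Configuration();
    // Enabled by default as of this PR; set explicitly here only for clarity.
    conf.setBoolean("fs.azure.read.optimizefooterread", true);
    // Footer read buffer size: 512 KB default.
    conf.setInt("fs.azure.footer.read.request.size", 512 * 1024);
    return conf;
  }
}
```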
Quantitative stats: for a workload running on Parquet files with the above optimization enabled, the number of read requests was reduced by 2.3M, down from 20M. That means around a 10% reduction in overall TPS.
How was this patch tested?
The test suite was run and no failures were found.
Existing tests were modified to cover the production code changes.