HADOOP-18971: [ABFS] Enable Footer Read Optimizations with Appropriate Footer Read Buffer Size #6270
Conversation
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
```diff
@@ -190,7 +193,8 @@ private void seekReadAndTest(final FileSystem fs, final Path testFilePath,
     try (FSDataInputStream iStream = fs.open(testFilePath)) {
       AbfsInputStream abfsInputStream = (AbfsInputStream) iStream
           .getWrappedStream();
-      long bufferSize = abfsInputStream.getBufferSize();
+      long footerReadBufferSize = abfsInputStream.getFooterReadBufferSize();
```
+1 on testing different file sizes. Should we parameterize the values for getFooterReadBufferSize? Right now it depends on whatever test config the developer has.
Not able to get this, can you please elaborate?
The footer buffer size here will be the default one unless the user sets it explicitly in the configs.
Are you recommending this be hardcoded?
The default value of this config is 256KB, but a developer can have any other value in their config. Right now the test is tied to using 256 KB. What I am proposing is that the test set the config itself and not depend on the dev-supplied config. I am also proposing we run this test for different values of footerBufferSize.
I'd propose something else, will comment below
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 25 mins 53 secs.
I'm going to propose that you add the option as a parameter to the openFile() builder API. This will
- allow applications to explicitly set it (unlikely given parquet is trapped compiling against hadoop 2.8)
- make it trivial to write tests which open files of different footer length, including the stress test cases (4 bytes, 0 bytes...); see the sketch below
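A hedged sketch of what that could look like at a call site. The option key `fs.azure.footer.read.request.size` is the one this PR introduces; `FutureDataInputStreamBuilder` and `opt()` are the existing openFile() builder API, but the class and method names here are illustrative only:

```java
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FutureDataInputStreamBuilder;
import org.apache.hadoop.fs.Path;

public class FooterOptionExample {
  // Open a file with an explicit footer read buffer size of 256 KB,
  // set per-stream through the openFile() builder rather than cluster config.
  static FSDataInputStream openWithFooterBuffer(FileSystem fs, Path path)
      throws Exception {
    FutureDataInputStreamBuilder builder = fs.openFile(path);
    builder.opt("fs.azure.footer.read.request.size", 256 * 1024);
    // build() returns a CompletableFuture<FSDataInputStream>
    return builder.build().get();
  }
}
```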
...ls/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java
```diff
@@ -104,7 +104,20 @@ public final class ConfigurationKeys {
   public static final String AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = "fs.azure.write.enableappendwithflush";
   public static final String AZURE_READ_BUFFER_SIZE = "fs.azure.read.request.size";
   public static final String AZURE_READ_SMALL_FILES_COMPLETELY = "fs.azure.read.smallfilescompletely";
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
```
this is roughly the same for ORC, isn't it?
Yes, the procedure is the same for ORC as well.
This optimization will impact workloads running on both Parquet and ORC files.
```diff
+  /**
+   * When parquet files are read, first few read are metadata reads before reading the actual data.
+   * First the read is done of last 8 bytes of parquet file to get the postion of metadta and next read
+   * is done for reading that metadata. With this optimizations these two reads can be combined into 1.
```
nit "optimization"
```diff
@@ -59,7 +59,8 @@ public final class FileSystemConfigurations {
   public static final boolean DEFAULT_AZURE_ENABLE_SMALL_WRITE_OPTIMIZATION = false;
   public static final int DEFAULT_READ_BUFFER_SIZE = 4 * ONE_MB;  // 4 MB
   public static final boolean DEFAULT_READ_SMALL_FILES_COMPLETELY = false;
-  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = false;
+  public static final boolean DEFAULT_OPTIMIZE_FOOTER_READ = true;
+  public static final int DEFAULT_FOOTER_READ_BUFFER_SIZE = 512 * ONE_KB;
```
this is 512K; docs in the file above say 256K.
Updated the doc.
256KB is enough data, but to be on the safer side we want to read more. It was found that 512 KB is a sweet spot between caching as much data as possible and not impacting performance.
```diff
   public static final String AZURE_READ_OPTIMIZE_FOOTER_READ = "fs.azure.read.optimizefooterread";
+  /**
+   * In case of footer reads it was not required to read full buffer size.
+   * Most of the metadata information required was within 256KB and it will be more performant to read lesser.
```
"read less"
```diff
@@ -358,6 +360,7 @@ private int readFileCompletely(final byte[] b, final int off, final int len)
     return optimisedRead(b, off, len, 0, contentLength);
   }

+  // To do footer read of files when enabled
```
nit: . at end for javadoc
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 27 mins 14 secs.
Tests are getting into better shape. Some points:
- the default value comment
- some tests follow the builder route and some the config route; can we have a way for a given test to run once via the builder and once via the config (a sketch of one possible shape follows below)?
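Not how the ABFS integration tests are structured today; just a hedged sketch (JUnit 4, hypothetical class name, route-specific setup left as comments) of running the same assertions once per route:

```java
import java.util.Arrays;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

// Hypothetical harness: each test method runs twice, once with the footer
// buffer size supplied via openFile() builder opts, once via Configuration.
@RunWith(Parameterized.class)
public class FooterBufferRouteTest {

  @Parameterized.Parameters(name = "builderRoute={0}")
  public static Iterable<Object[]> routes() {
    return Arrays.asList(new Object[][] {{true}, {false}});
  }

  private final boolean builderRoute;

  public FooterBufferRouteTest(boolean builderRoute) {
    this.builderRoute = builderRoute;
  }

  @Test
  public void testFooterBufferSizeHonoured() throws Exception {
    if (builderRoute) {
      // set fs.azure.footer.read.request.size via fs.openFile(path).opt(...)
    } else {
      // set it on the Configuration before creating the FileSystem
    }
    // ...shared verification of the effective footer read buffer size...
  }
}
```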
```diff
@@ -811,13 +813,17 @@ private AbfsInputStreamContext populateAbfsInputStreamContext(
     boolean bufferedPreadDisabled = options
         .map(c -> c.getBoolean(FS_AZURE_BUFFERED_PREAD_DISABLE, false))
         .orElse(false);
+    int footerReadBufferSize = options.map(c -> c.getInt(
+        AZURE_FOOTER_READ_BUFFER_SIZE, DEFAULT_FOOTER_READ_BUFFER_SIZE))
```
If a dev goes the builder route and has not given fs.azure.footer.read.request.size in the builder opts, it will take the default value even though a value is defined in the config. Better, in place of the default value, use abfsConfiguration.getFooterReadBufferSize().
You can verify this happens by changing a bit of your testNumBackendCalls code:
```java
private void testNumBackendCalls(boolean optimizeFooterRead)
    throws Exception {
  for (int i = 0; i <= 4; i++) {
    for (int j = 0; j <= 2; j++) {
      int fileSize = (int) Math.pow(2, i) * 256 * ONE_KB;
      int footerReadBufferSize = (int) Math.pow(2, j) * 256 * ONE_KB;
      final AzureBlobFileSystem fs = getFileSystem(optimizeFooterRead,
          fileSize, footerReadBufferSize);
      String fileName = methodName.getMethodName() + i;
      byte[] fileContent = getRandomBytesArray(fileSize);
      Path testFilePath = createFileWithContent(fs, fileName, fileContent);
      int length = AbfsInputStream.FOOTER_SIZE;
      FutureDataInputStreamBuilder builder = fs.openFile(testFilePath);
      // builder.opt(ConfigurationKeys.AZURE_FOOTER_READ_BUFFER_SIZE,
      //     footerReadBufferSize);
      try (FSDataInputStream iStream = builder.build().get()) {
        ......
```
Let's add this use case in the tests as well.
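A hedged sketch of the suggested fallback (assuming it sits in populateAbfsInputStreamContext, where abfsConfiguration is in scope, and that getFooterReadBufferSize() is its config-backed getter, as named in the comment above):

```java
// Prefer the Configuration-backed value over the hardcoded default, so an
// openFile() call without the opt still honours fs.azure.footer.read.request.size.
int footerReadBufferSize = options.map(c -> c.getInt(
    AZURE_FOOTER_READ_BUFFER_SIZE,
    abfsConfiguration.getFooterReadBufferSize()))
    .orElse(abfsConfiguration.getFooterReadBufferSize());
```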
All look good, just one thing:
```java
// Verify that default value is used if nothing is set explicitly
FSDataInputStream iStream = fs.open(testFilePath);
verifyConfigValueInStream(iStream, DEFAULT_FOOTER_READ_BUFFER_SIZE);
```
There should be a check for whether a value is already set in the test-config.
Taken
@steveloughran @mukund-thakur @mehakmeet
Thanks for taking the suggestions. Looks good.
LGTM
+1
OK, you need to merge in trunk and it'll be good to go. Can you also add as a comment the explicit text you want in the commit message? I'm thinking something like
Side issue: I'm actually thinking of adding cross-stream caching in the s3a fs, so that threads working on different row chunks can share footers. Some small LRU cache, with IOStats collection to help measure its value. Thoughts?
Thanks for the review @steveloughran.
Regarding the cross-stream caching.
:::: AGGREGATED TEST RESULT ::::
HNS-OAuth: [INFO] Results:
HNS-SharedKey: [INFO] Results:
NonHNS-SharedKey: [INFO] Results:
AppendBlob-HNS-OAuth: [INFO] Results:
Time taken: 25 mins 3 secs.
I don't know how common that use is... whereas for spark/tez workers, reopening the same file is not unusual; they just process different parts. I think this is why prefetching doesn't do anything for ORC/Parquet. Note that Impala does cache the column indexes/page indexes, so it doesn't need the filesystem to secretly do it for them.
Yes, we have also had similar observations. But I feel this cross-stream caching is a good idea for both footer reads and small file reads.
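Not part of this PR, but a minimal sketch of the kind of cross-stream LRU cache being floated here. All names are hypothetical; a real version would need IOStatistics wiring, invalidation on file overwrite, and a memory bound on bytes rather than entry count:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-filesystem footer cache: maps (path + etag) -> footer bytes,
// evicting the least-recently-used entry once maxEntries is exceeded.
public final class FooterCache {
  private final int maxEntries;
  private final Map<String, byte[]> cache;

  public FooterCache(int maxEntries) {
    this.maxEntries = maxEntries;
    // accessOrder=true gives LRU iteration order; removeEldestEntry evicts.
    this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > FooterCache.this.maxEntries;
      }
    };
  }

  public synchronized byte[] get(String pathAndEtag) {
    return cache.get(pathAndEtag);
  }

  public synchronized void put(String pathAndEtag, byte[] footer) {
    cache.put(pathAndEtag, footer);
  }
}
```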
…read.request.size (apache#6270)

The option fs.azure.footer.read.request.size sets the size of the footer to read and cache; the default value of 524288 has been measured to be good for most workloads running on parquet, ORC and similar file formats.

Contributed by Anuj Modi
https://issues.apache.org/jira/browse/HADOOP-18971
Description of PR
Footer read optimization was introduced to hadoop-azure in https://issues.apache.org/jira/browse/HADOOP-17347 and was kept disabled by default.
This PR enables footer reads by default, based on the results of the analysis below.
In our scale workload analysis, it was found that workloads working with Parquet (or, for that matter, ORC etc.) have a lot of footer reads. Footer reads here refer to the read operations a workload issues to get the metadata of the Parquet file, which is required to understand where the actual data resides in the file.
This whole process takes place in 3 steps:
1. Read the last few bytes of the file to get the position of the metadata.
2. Read that metadata.
3. Read the actual data it points to.
Here the first two steps are metadata reads that can be combined into a single footer read. When a workload tries to read the last few bytes of data (let's say this value is the footer size), the driver will intelligently read some extra bytes above the footer size to cater to the next read that is going to come.
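For context, a minimal sketch of those first two reads as a generic reader would issue them against the FileSystem API. This is illustrative only, not ABFS or this PR's code; it assumes the standard Parquet layout, where the last 8 bytes are a 4-byte little-endian footer length followed by the "PAR1" magic:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParquetFooterReads {
  static byte[] readFooter(FileSystem fs, Path path) throws IOException {
    long fileLen = fs.getFileStatus(path).getLen();
    try (FSDataInputStream in = fs.open(path)) {
      byte[] tail = new byte[8];
      in.readFully(fileLen - 8, tail);                // read 1: footer length + magic
      int footerLen = ByteBuffer.wrap(tail, 0, 4)
          .order(ByteOrder.LITTLE_ENDIAN).getInt();
      byte[] footer = new byte[footerLen];
      in.readFully(fileLen - 8 - footerLen, footer);  // read 2: the metadata itself
      return footer;                                  // read 3+ would fetch column data
    }
  }
}
```

With the optimization, the driver can serve both of these reads from a single server call by fetching the last footer-read-buffer-size bytes up front.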
Q. What is the footer size of a file?
A. 16KB. Any read request trying to get data within the last 16KB of the file will qualify for a whole footer read. This value is enough to cater to all types of files, including Parquet, ORC, etc.
Q. What is the buffer size to read when reading the footer?
A. Let's call this the footer read buffer size. Prior to this PR, the footer read buffer size was the same as the read buffer size (default 4MB). It was found that for most workloads the required footer size was only 256KB, i.e. for almost all Parquet files the metadata was within the last 256KB. Keeping this in mind, it does not make sense to read the whole buffer length of 4MB as part of a footer read. Moreover, reading more data than required incurs additional cost in terms of server and network latencies. Based on this and extensive experimentation, it was observed that a footer read buffer size of 512KB is ideal for almost all workloads running on Parquet, ORC, etc.
The following configuration was introduced to control the footer read buffer size:
fs.azure.footer.read.request.size: default 512 KB.
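A hedged example of setting these options programmatically; the keys come from this PR, and the values shown equal the defaults:

```java
import org.apache.hadoop.conf.Configuration;

public class FooterReadConfig {
  public static Configuration withFooterRead() {
    Configuration conf = new Configuration();
    // Enabled by default as of this PR; set explicitly here only for clarity.
    conf.setBoolean("fs.azure.read.optimizefooterread", true);
    // Footer read buffer size: 512 KB default.
    conf.setInt("fs.azure.footer.read.request.size", 512 * 1024);
    return conf;
  }
}
```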
Quantitative stats: for a workload running on Parquet files with the above optimization enabled, the number of read requests was reduced by 2.3M, down from 20M. That means around a 10% reduction in overall TPS.
How was this patch tested?
The test suite was run and no failures were found.
Existing tests were modified to cover the production code changes.