ORC-1749: Fix supportVectoredIO for hadoop version string with optional patch labels
#1990
The version I am getting from AWS for Hadoop is
@dongjoon-hyun I can certainly write a test case, but I would have to pass down the VersionInfo and do a little refactoring. Let me know if you would like me to do that.
@dongjoon-hyun Just pushed the tests.
3.3.6-amzn-3 looks like a reasonable case that we had better handle, because it follows Semantic Versioning.
Can we make this proposal more compatible with Semantic Versioning? For example, if they follow Semantic Versioning, 3.3.6-amzn-3 includes HADOOP-18103, so I believe supportVectoredIO should return true.
Could you change it so that it takes Semantic Versioning into account?
int minor = Integer.parseInt(versionParts[1]);
int patch = Integer.parseInt(versionParts[2]);
return major == 3 && (minor > 3 || (minor == 3 && patch > 4));
default boolean supportVectoredIO(String version) {
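A minimal sketch of the reviewer's request, in Java: labels after the patch number are ignored, so `3.3.6-amzn-3` is compared as 3.3.6 and reported as supporting vectored IO. The class name `VectoredIoVersionCheck` is hypothetical, and splitting on both `.` and `-` (the regex `[.-]`) is an assumption about how the label could be dropped, not necessarily the code ORC merged. The final comparison is copied from the review excerpt above and treats 3.3.5 and later 3.x releases as including HADOOP-18103.

```java
public class VectoredIoVersionCheck {
  // Hypothetical check. Splitting on '.' and '-' drops any label after the
  // patch number: "3.3.6-amzn-3" -> ["3", "3", "6", "amzn", "3"].
  static boolean supportVectoredIO(String version) {
    String[] versionParts = version.split("[.-]");
    int major = Integer.parseInt(versionParts[0]);
    int minor = Integer.parseInt(versionParts[1]);
    int patch = Integer.parseInt(versionParts[2]);
    // Comparison from the review excerpt: 3.3.5+ or 3.4+ within Hadoop 3.x.
    return major == 3 && (minor > 3 || (minor == 3 && patch > 4));
  }

  public static void main(String[] args) {
    for (String v : new String[] {"3.3.4", "3.3.5", "3.3.6-amzn-3", "3.4.0"}) {
      System.out.println(v + " -> " + supportVectoredIO(v));
    }
  }
}
```

A version string with fewer than three numeric components (for example `3.3`) would still throw here, so a production check might pair this parsing with a fallback.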
This change is okay for me.
Done, @dongjoon-hyun.
As a last piece, please update the PR title and description according to your latest code change. Here, there is no
You caught me doing that :). Done.
Yes, please apply for one. I can approve the ID creation quickly for you.
@@ -48,7 +49,7 @@
*/
public class RecordReaderUtils {
private static final HadoopShims SHIMS = HadoopShimsFactory.get();
private static final boolean supportVectoredIO = SHIMS.supportVectoredIO();
private static final boolean supportVectoredIO = SHIMS.supportVectoredIO(VersionInfo.getVersion());
Could you make CI happy?
[INFO] There is 1 error reported by Checkstyle 10.17.0 with checkstyle.xml ruleset.
Error: src/java/org/apache/orc/impl/RecordReaderUtils.java:[52] (sizes) LineLength: Line is longer than 100 characters (found 101).
Let's see now.
I have requested one with
Ya, I approved your request. And, I changed the reporter field of ORC-1749 to When this PR is merged, the assignee field will be filled with
There are cases where the Hadoop version info may not follow Semantic Versioning. This is the case for the Hadoop version provided in some AWS managed services. This causes an ExceptionInInitializerError while trying to instantiate an ORC file reader. Fix this by defaulting to non-vectored IO when the version does not follow Semantic Versioning.
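The fallback described in this earlier version of the description can be sketched as follows, assuming a check that parses `major.minor.patch` and returns false whenever that fails. `VectoredIoFallback` is a hypothetical name, and the 3.3.5 threshold is taken from the comparison quoted in the review.

```java
public class VectoredIoFallback {
  // Hypothetical check: true only for Hadoop 3.3.5+ (3.x line), and false
  // whenever the version string cannot be read as major.minor.patch.
  static boolean supportVectoredIO(String version) {
    String[] parts = version.split("\\.");
    if (parts.length < 3) {
      return false; // not enough components: assume no vectored IO
    }
    try {
      int major = Integer.parseInt(parts[0]);
      int minor = Integer.parseInt(parts[1]);
      int patch = Integer.parseInt(parts[2]);
      return major == 3 && (minor > 3 || (minor == 3 && patch > 4));
    } catch (NumberFormatException e) {
      // e.g. "3.3.6-amzn-3": the patch part "6-amzn-3" is not an integer,
      // so fall back to the non-vectored read path instead of failing.
      return false;
    }
  }

  public static void main(String[] args) {
    for (String v : new String[] {"3.3.6", "3.3.6-amzn-3", "3.3"}) {
      System.out.println(v + " -> " + supportVectoredIO(v));
    }
  }
}
```

The trade-off is the one raised in review: with this fallback, `3.3.6-amzn-3` is reported as not supporting vectored IO, even though a Semantic Versioning reading says it includes HADOOP-18103.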
…onal patch labels

### What changes were proposed in this pull request?
Correctly parse Semantic Versioning when optional labels are present in the patch version.

### Why are the changes needed?
There are cases where the Hadoop version info may not follow Semantic Versioning. This is the case for the Hadoop version provided in some AWS managed services. This causes an ExceptionInInitializerError while trying to instantiate an ORC file reader.

### How was this patch tested?
Included unit tests. It required a small refactor to make the check easier to test.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1990 from otorreno/fix/supportVectoredIO.

Authored-by: Oscar Torreno
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 37201cb)
Signed-off-by: Dongjoon Hyun
Welcome to the Apache ORC community, @otorreno. You have been added to the Apache ORC contributor group and ORC-1749 is assigned to you. This is part of the Apache ORC 2.0.2 release milestone, which will be released within 3 weeks, before August 15th. In addition, this will be part of Apache Spark 4.0.0-preview2 in the near future. Thank you again, and congratulations on your first commit to an ASF repository.
### What changes were proposed in this pull request?
This PR aims to upgrade ORC to 2.0.2 for Apache Spark 4.0.0.

### Why are the changes needed?
To bring the latest maintenance release with bug fixes.
- https://orc.apache.org/news/2024/08/15/ORC-2.0.2/
- apache/orc#1989
- apache/orc#1990

### Does this PR introduce _any_ user-facing change?
No. This is not released.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47774 from dongjoon-hyun/SPARK-49251.

Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
What changes were proposed in this pull request?
Correctly parse Semantic Versioning when optional labels are present in the patch version.
Why are the changes needed?
There are cases where the Hadoop version info may not follow Semantic Versioning. This is the case for the Hadoop version provided in some AWS managed services. This causes an ExceptionInInitializerError while trying to instantiate an ORC file reader.
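Why a parse failure surfaces as `ExceptionInInitializerError` rather than a plain `NumberFormatException`: `RecordReaderUtils` computes `supportVectoredIO` in a `static final` field initializer, and a runtime exception thrown while a class's static initializers run is wrapped by the JVM in `ExceptionInInitializerError`. The sketch below reproduces that with hypothetical names (`StaticInitFailureDemo`, `Holder`, `parsePatch`) and assumes a naive parser that splits only on `.`, so the patch component of `3.3.6-amzn-3` becomes `6-amzn-3`, which `Integer.parseInt` rejects.

```java
public class StaticInitFailureDemo {
  // Naive parser that assumes a plain "major.minor.patch" string (hypothetical).
  // For "3.3.6-amzn-3" the third part is "6-amzn-3", so parseInt throws
  // NumberFormatException.
  static int parsePatch(String version) {
    return Integer.parseInt(version.split("\\.")[2]);
  }

  // Stand-in for a class such as RecordReaderUtils whose static field is
  // computed once, when the class is initialized.
  static class Holder {
    static final boolean SUPPORTED = parsePatch("3.3.6-amzn-3") > 4;
  }

  public static void main(String[] args) {
    try {
      System.out.println(Holder.SUPPORTED);
    } catch (ExceptionInInitializerError e) {
      // The JVM wraps the NumberFormatException thrown during class init.
      System.out.println("Class init failed, cause: " + e.getCause());
    }
  }
}
```

After the first failure the class is left in an erroneous state, so any later use of it fails with `NoClassDefFoundError` for the lifetime of the JVM.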
How was this patch tested?
Included unit tests. This required a small refactor (passing the version string into supportVectoredIO) to make the check easier to test.
Was this patch authored or co-authored using generative AI tooling?
No