[BEAM-8889] Upgrade GCSIO to 2.2.2 #14817
@mprashanthsagar Could you take a look at the changes to address breaking changes from GCSIO 2.2.x? |
This was tested against the bleeding-edge version of GCSIO using the steps below. I verified that the Beam job uses DirectPath for GCS operations with the latest version of GCSIO.
Build GCSIO (2.2.1-SNAPSHOT)
Build Beam (after changing the gcsio version to 2.2.1-SNAPSHOT)
Build the word-count example (after changing the Beam version to 2.29.0-SNAPSHOT)
|
What is the next step on this PR? Do you need a review, or are you waiting until the new GCSIO changes are merged? |
@aaltay This is still a draft and it needs to wait for the next GCSIO release. Once it's released, this PR will be ready to get reviewed. |
@@ -763,7 +751,7 @@ public void onSuccess(StorageObject response, HttpHeaders httpHeaders)
 @Override
 public void onFailure(GoogleJsonError e, HttpHeaders httpHeaders) throws IOException {
 IOException ioException;
-if (errorExtractor.itemNotFound(e)) {
+if (e.getCode() == HttpStatusCodes.STATUS_CODE_NOT_FOUND) {
itemNotFound(e) does a recursive check to find the exception. We could have a STATUS_CODE_NOT_FOUND in the cause but not in the root exception. Can we retain usage of errorExtractor.itemNotFound()?
This is not possible anymore because ApiErrorExtractor no longer supports GoogleJsonError as of GoogleCloudDataproc/hadoop-connectors#327. I copied the same routine from the old ApiErrorExtractor to do the same thing.
So is this a regression? Note that Beam file IO connectors are very sensitive to changes in the behavior of rename/copy/delete etc., since the current behavior is carefully implemented (after many bugs) to be correct when there are step failures and retries.
(unresolving)
Yeah, the itemNotFound method was removed in the recent gcsio, so I copied the actual implementation of the method to keep it consistent.
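To make the concern in this thread concrete, here is a minimal sketch of the difference between checking only the outermost error and walking the cause chain for a 404. It uses only the JDK. StatusException and NotFoundChecks are hypothetical names invented for illustration; they are not GCSIO or Beam classes, and this is not the code in the PR.

```java
import java.io.IOException;

/** Hypothetical stand-in for an error that carries an HTTP status code; not a GCSIO or Beam class. */
final class StatusException extends IOException {
  private final int statusCode;

  StatusException(int statusCode, String message) {
    super(message);
    this.statusCode = statusCode;
  }

  int getStatusCode() {
    return statusCode;
  }
}

/** Hypothetical helper contrasting a top-level check with a cause-chain check. */
final class NotFoundChecks {
  static final int STATUS_CODE_NOT_FOUND = 404;

  private NotFoundChecks() {}

  /** Looks only at the outermost throwable. */
  static boolean isNotFoundTopLevel(Throwable t) {
    return t instanceof StatusException
        && ((StatusException) t).getStatusCode() == STATUS_CODE_NOT_FOUND;
  }

  /** Walks the cause chain and reports a 404 found at any depth. */
  static boolean isNotFoundRecursive(Throwable t) {
    // Bound the walk so a cyclic cause chain cannot loop forever.
    int depth = 0;
    for (Throwable cur = t; cur != null && depth < 32; cur = cur.getCause(), depth++) {
      if (isNotFoundTopLevel(cur)) {
        return true;
      }
    }
    return false;
  }
}
```

With a 404 wrapped inside a plain IOException, the top-level check returns false while the recursive check returns true. That gap is what the reviewer is asking about; per the author's reply, the new line reproduces what the old itemNotFound(GoogleJsonError) did.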
...e-cloud-platform-core/src/test/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtilTest.java
Thanks!
@@ -107,6 +110,7 @@ public GcsUtil create(PipelineOptions options) {
 storageBuilder.getHttpRequestInitializer(),
 gcsOptions.getExecutorService(),
 hasExperiment(options, "use_grpc_for_gcs"),
+gcsOptions.getGcpCredential(),
This is not backwards compatible. What if gcpCredentials is not provided? (I assume default credentials will be used, but we should make sure that this does not result in a regression.)
* @return a SeekableByteChannel that can read the object data
*/
@VisibleForTesting
SeekableByteChannel open(GcsPath path, GoogleCloudStorageReadOptions readOptions)
Where is this used?
This is used for the test. Previously the test accessed an implementation detail (e.g. GoogleCloudStorageReadChannel) in testGCSChannelCloseIdempotent(), but with this function it no longer needs to.
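As a rough illustration of a close-idempotency check written against the SeekableByteChannel interface alone, the sketch below opens a channel on a local temp file and closes it twice. This is an assumption-laden stand-in: it uses java.nio file channels instead of GCS, ChannelCloseSketch is a hypothetical name, and it is not the body of testGCSChannelCloseIdempotent().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: exercises double close() through the channel interface only. */
final class ChannelCloseSketch {
  private ChannelCloseSketch() {}

  /** Opens a channel on a temp file, closes it twice, and returns whether it ended up closed. */
  static boolean closeTwice() {
    try {
      Path tmp = Files.createTempFile("channel-close", ".bin");
      try {
        Files.write(tmp, new byte[] {1, 2, 3});
        SeekableByteChannel channel = Files.newByteChannel(tmp, StandardOpenOption.READ);
        channel.close();
        // Per the java.nio.channels.Channel contract, closing an already
        // closed channel has no effect, so this must not throw.
        channel.close();
        return !channel.isOpen();
      } finally {
        Files.deleteIfExists(tmp);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the check depends only on SeekableByteChannel, it does not need to name the concrete channel class, which is the decoupling the open(path, readOptions) helper is described as providing.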
Retest this please |
Run Java PostCommit |
Run Dataflow ValidatesRunner |
Codecov Report
@@ Coverage Diff @@
## master #14817 +/- ##
=======================================
Coverage 83.78% 83.78%
=======================================
Files 441 441
Lines 59500 59500
=======================================
+ Hits 49852 49855 +3
+ Misses 9648 9645 -3
Continue to review full report at Codecov.
|
Run PythonDocker PreCommit |
Run Java_Examples_Dataflow PreCommit |
Thanks. LGTM (please make sure that all internal tests pass as well) |
This is for upgrading GCSIO to 2.2.2
R: @kennknowles
Changes:
- [...] ResilientOperation.getGoogleRequestCallable.
- [...] CreateObjectOptions build pattern.
- [...] StorageResourceId type.
- [...] AsyncWriteChannelOptions and GoogleCloudStorageOptions.
- [...] (AbstractGoogleAsyncWriteChannel)
- [...] Credentials to GoogleCloudStorageOptions so that it can use DirectPath properly with GCSIO 2.2.2 or later.