[BEAM-13250] Factorise gcsio.GcsIO() from gcsfilesystem to allow overriding #17455
Can one of the admins verify this patch? |
Codecov Report
@@            Coverage Diff             @@
##           master   #17455      +/-   ##
==========================================
- Coverage   73.92%   73.92%   -0.01%
==========================================
  Files         689      689
  Lines       90397    90400       +3
==========================================
- Hits        66829    66827       -2
- Misses      22384    22389       +5
  Partials     1184     1184
|
R: @johnjcasey |
I don't see anything wrong with this change per se, but I think it would make more sense alongside a test change that leverages the new factorization. Do you have a specific test scenario that would be helped by this? |
This could be done to simplify some tests that currently patch at the module level:
# sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py
@mock.patch('apache_beam.io.gcp.gcsfilesystem.gcsio')  # this could be removed
def test_create(self, mock_gcsio):
  # Prepare mocks.
  gcsio_mock = mock.MagicMock()
  gcsfilesystem.gcsio.GcsIO = lambda: gcsio_mock  # this would become: self.fs.get_gcsio = lambda: gcsio_mock
  ...
Would that be enough? If so, I'll apply those changes when I have time. It might look irrelevant, but at the very least SDK users who use these classes directly would not have to patch modules to override some gcsio.GcsIO properties when they consider it necessary. |
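As a rough illustration of the instance-level override mentioned above, here is a minimal, self-contained sketch. It does not import Beam: `FakeGcsFileSystem` and `run_create_with_mock` are hypothetical stand-ins, and only the `get_gcsio` name comes from this PR.

```python
from unittest import mock


class FakeGcsFileSystem:
    """Hypothetical stand-in for GCSFileSystem; NOT the real Beam class."""

    def get_gcsio(self):
        # The real factory would return gcsio.GcsIO(); this sketch has no client.
        raise RuntimeError("no real GCS client in this sketch")

    def create(self, path):
        # Every operation obtains its client through the factory method.
        return self.get_gcsio().open(path, "wb")


def run_create_with_mock(path):
    fs = FakeGcsFileSystem()
    gcsio_mock = mock.MagicMock()
    # Instance-level override: no @mock.patch on the module is needed.
    fs.get_gcsio = lambda: gcsio_mock
    handle = fs.create(path)
    return gcsio_mock, handle
```

The override only touches one filesystem instance, so other tests in the same module would presumably be unaffected.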
@johnjcasey - would you be able to respond to the last question? |
Yep, that would work for me. |
Perfect, I'll apply the change in the coming week. |
Closing this PR since a similar change seems to have been merged to master. Currently on master:
# sdks/python/apache_beam/io/gcp/gcsfilesystem.py
...
def _gcsIO(self):
  return gcsio.GcsIO(pipeline_options=self._pipeline_options) |
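For what it's worth, a factory method like `_gcsIO` can also be replaced with `mock.patch.object` instead of patching the module. The sketch below uses a hypothetical stand-in class (`FakeGcsFileSystem`) rather than the real Beam code, and `check_exists_with_patch` is an illustrative helper, not something from the repository.

```python
from unittest import mock


class FakeGcsFileSystem:
    """Hypothetical stand-in that mirrors the _gcsIO factory shown above."""

    def __init__(self, pipeline_options):
        self._pipeline_options = pipeline_options

    def _gcsIO(self):
        # The real method builds gcsio.GcsIO(pipeline_options=...).
        raise RuntimeError("no real GCS client in this sketch")

    def exists(self, path):
        return self._gcsIO().exists(path)


def check_exists_with_patch(path):
    fs = FakeGcsFileSystem(pipeline_options=None)
    # Patch only the factory method for the duration of the with-block.
    with mock.patch.object(FakeGcsFileSystem, "_gcsIO") as factory:
        factory.return_value.exists.return_value = True
        result = fs.exists(path)
    return result, factory
```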
Improvement to resolve: https://issues.apache.org/jira/browse/BEAM-13250
The first commit is identical to #15977, but since I believe this is relevant, I am open to putting in some effort to integrate a proper solution.
This change is relevant for testing purposes. Being able to change storage_client makes it possible to use a GCS emulator or other GCS configurations (for example: fsouza/fake-gcs-server#623 (comment)). This way it is possible, at the very least, to do:
Please note that this is not only useful for testing purposes: it allows more flexibility in configuring the storage_client, at the price of adding another user-defined filesystem scheme.
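A minimal sketch of what such a user-defined filesystem might look like, under stated assumptions: the classes below are stand-ins and do not import Beam, the `gs-emulator` scheme name and the string "clients" are purely illustrative, and only `get_gcsio` and `storage_client` are names taken from this PR.

```python
class FakeGcsIO:
    """Hypothetical stand-in for gcsio.GcsIO; keeps the client it is given."""

    def __init__(self, storage_client=None):
        self.storage_client = storage_client or "default-client"


class FakeGcsFileSystem:
    """Hypothetical stand-in for GCSFileSystem with the proposed factory."""

    @classmethod
    def scheme(cls):
        return "gs"

    def get_gcsio(self):
        return FakeGcsIO()


class EmulatorGcsFileSystem(FakeGcsFileSystem):
    """User-defined filesystem whose client points at a local emulator."""

    @classmethod
    def scheme(cls):
        return "gs-emulator"  # illustrative scheme name, not a Beam built-in

    def get_gcsio(self):
        # In practice this would pass a storage client configured for the emulator.
        return FakeGcsIO(storage_client="emulator-client")
```

Paths using the custom scheme would go through the overridden factory, while plain `gs://` paths keep the default client.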
Another alternative could be to add a Beam option instead, to replace the URL or other settings. This might be better, since it does not add a new function to GcsFileSystem that could be used by SDK users.
Thanks :)
R: @aaltay @ihji