[BEAM-13250] Factorise gcsio.GcsIO() from gcsfilesystem to allow overriding #17455


BigJerBD

This improvement resolves https://issues.apache.org/jira/browse/BEAM-13250.
The first commit is identical to #15977, but since I believe this is relevant, I am open to putting in some effort to integrate a proper solution.

This change is relevant for testing purposes. Being able to change storage_client makes it possible to use a GCS emulator or other GCS configurations (for example: fsouza/fake-gcs-server#623 (comment)).

This way it is possible, at the very least, to do:

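# Import paths as used in the Beam Python SDK at the time of this PR.
from apache_beam.io.gcp import gcsio
from apache_beam.io.gcp.gcsfilesystem import GCSFileSystem
from apache_beam.io.gcp.internal.clients import storage


class TestGCSFileSystem(GCSFileSystem):
  # Register a separate scheme so the stock 'gs://' filesystem is untouched.
  GCS_PREFIX = 'gs-test://'

  @classmethod
  def scheme(cls):
    return 'gs-test'

  @staticmethod
  def get_gcsio():
    # get_gcsio is the hook introduced by this PR.
    return gcsio.GcsIO(
      storage_client=storage.StorageV1(
          url="http://my_custom_url...",
          # Other custom configurations ...
      )
    )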

Please note that this is not only useful for testing purposes: it allows more flexibility in configuring the storage_client, at the price of adding another user-defined filesystem scheme.

An alternative could be to add a Beam pipeline option for replacing the URL or other settings. This might be better, since it does not add a new function to GCSFileSystem that could be used by SDK users.

Thanks :)

R: @aaltay @ihji



@asf-ci

asf-ci commented Apr 24, 2022

Can one of the admins verify this patch?


@codecov

codecov bot commented Apr 24, 2022

Codecov Report

Merging #17455 (c34cd42) into master (3f2e3c7) will decrease coverage by 0.00%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master   #17455      +/-   ##
==========================================
- Coverage   73.92%   73.92%   -0.01%     
==========================================
  Files         689      689              
  Lines       90397    90400       +3     
==========================================
- Hits        66829    66827       -2     
- Misses      22384    22389       +5     
  Partials     1184     1184              
Flag Coverage Δ
python 83.64% <85.71%> (-0.01%) ⬇️

Impacted Files Coverage Δ
sdks/python/apache_beam/io/gcp/gcsfilesystem.py 90.22% <85.71%> (+0.22%) ⬆️
sdks/python/apache_beam/utils/interactive_utils.py 87.80% <0.00%> (-7.32%) ⬇️
...eam/runners/interactive/interactive_environment.py 90.18% <0.00%> (-0.31%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.39% <0.00%> (-0.25%) ⬇️
sdks/python/apache_beam/transforms/util.py 95.98% <0.00%> (-0.17%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 89.06% <0.00%> (+0.15%) ⬆️
.../python/apache_beam/typehints/trivial_inference.py 96.41% <0.00%> (+0.29%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@aaltay
Member

aaltay commented Apr 28, 2022

R: @johnjcasey

@johnjcasey
Contributor

I don't see anything wrong with this change per se, but I think it would make more sense alongside a test change that leverages the new factorization. Do you have a specific test scenario that would be helped by this?

@BigJerBD
Author

BigJerBD commented May 1, 2022

This could be used to simplify some tests that currently patch at the module level:

  # sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py
  @mock.patch('apache_beam.io.gcp.gcsfilesystem.gcsio')   # this could be removed
  def test_create(self, mock_gcsio):
    # Prepare mocks.
    gcsio_mock = mock.MagicMock()
    gcsfilesystem.gcsio.GcsIO = lambda: gcsio_mock   # this would become:  self.fs.get_gcsio =  lambda: gcsio_mock
    ...

Would that be enough? If so, I'll apply those changes when I have time.

It might look irrelevant, but at the very least SDK users who use those classes directly would not have to patch modules to override some gcsio.GcsIO properties when necessary.
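
A minimal sketch of the instance-level override described above, written with stand-in classes so it runs without Beam installed. FakeGcsIO and FakeFileSystem are hypothetical; only the get_gcsio hook name comes from this PR, and the real GCSFileSystem methods may route their calls differently.

```python
from unittest import mock


class FakeGcsIO:
  """Hypothetical stand-in for gcsio.GcsIO; not the real Beam class."""
  def exists(self, path):
    raise RuntimeError('a real client would hit the network here')


class FakeFileSystem:
  """Hypothetical stand-in for GCSFileSystem with the factored-out hook."""
  @staticmethod
  def get_gcsio():
    return FakeGcsIO()

  def exists(self, path):
    # Route the call through the hook instead of building GcsIO() inline.
    return self.get_gcsio().exists(path)


def check_exists_with_instance_override():
  fs = FakeFileSystem()
  gcsio_mock = mock.MagicMock()
  gcsio_mock.exists.return_value = True
  # Override the hook on this one instance; no module-level patch is needed.
  fs.get_gcsio = lambda: gcsio_mock
  result = fs.exists('gs://bucket/object')
  gcsio_mock.exists.assert_called_once_with('gs://bucket/object')
  return result
```

Because the override lives on a single instance, other filesystem objects created in the same test run keep the default factory.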

@aaltay
Member

aaltay commented May 12, 2022

@johnjcasey - would you be able to respond to the last question?

@johnjcasey
Contributor

Yep, that would work for me.

@BigJerBD
Author

Perfect, I'll apply the change in the coming week.

@BigJerBD
Author

Closing this PR since a similar change seems to have been merged to master.

Currently on master:

# sdks/python/apache_beam/io/gcp/gcsfilesystem.py ...
  def _gcsIO(self):
    return gcsio.GcsIO(pipeline_options=self._pipeline_options)
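
As a rough illustration of how a subclass might use that method as an override point, here is a sketch with stand-in classes. StubGcsIO, StubFileSystem, EmulatorFileSystem, the endpoint parameter, and the URL are all hypothetical; the real gcsio.GcsIO constructor is only known here to accept pipeline_options.

```python
class StubGcsIO:
  """Hypothetical stand-in for gcsio.GcsIO."""
  def __init__(self, pipeline_options=None, endpoint=None):
    self.pipeline_options = pipeline_options
    # 'endpoint' is an invented parameter, used only for illustration.
    self.endpoint = endpoint


class StubFileSystem:
  """Hypothetical stand-in mirroring the _gcsIO method shown above."""
  def __init__(self, pipeline_options):
    self._pipeline_options = pipeline_options

  def _gcsIO(self):
    return StubGcsIO(pipeline_options=self._pipeline_options)


class EmulatorFileSystem(StubFileSystem):
  """Overrides the factory so every client targets a local emulator."""
  def _gcsIO(self):
    return StubGcsIO(
        pipeline_options=self._pipeline_options,
        endpoint='http://localhost:4443')  # hypothetical emulator address
```

Since _gcsIO starts with an underscore, it is presumably internal, so a subclass that overrides it may need updating when the SDK changes.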

@BigJerBD BigJerBD closed this May 14, 2022