Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12751] Set clientRequestId for Dataflow python job creation #15335

Merged
merged 1 commit into from
Aug 27, 2021

Conversation

baeminbo
Copy link
Contributor

Current Dataflow pipeline python runner doesn't specify clientRequestId which causes job creation API retry to fail with "There is already an active job named...". Java runner doesn't have this issue as it sets clientRequestId properly.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status Build Status Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status ---
XLang Build Status Build Status Build Status Build Status Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@baeminbo
Copy link
Contributor Author

@pabloem , could you take a look at this PR?

@pabloem
Copy link
Member

pabloem commented Aug 19, 2021

I'll look. Thanks @baeminbo

@pabloem
Copy link
Member

pabloem commented Aug 19, 2021

Run Python PreCommit

@pabloem
Copy link
Member

pabloem commented Aug 19, 2021

Run Python 3.8 PostCommit

@codecov
Copy link

codecov bot commented Aug 19, 2021

Codecov Report

Merging #15335 (1e23d3d) into master (af59192) will decrease coverage by 0.13%.
The diff coverage is 88.88%.

❗ Current head 1e23d3d differs from pull request most recent head 81fe195. Consider uploading reports for the commit 81fe195 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master   #15335      +/-   ##
==========================================
- Coverage   83.81%   83.68%   -0.14%     
==========================================
  Files         441      440       -1     
  Lines       59773    59958     +185     
==========================================
+ Hits        50101    50174      +73     
- Misses       9672     9784     +112     
Impacted Files Coverage Δ
...apache_beam/runners/dataflow/internal/apiclient.py 76.32% <88.88%> (+1.66%) ⬆️
sdks/python/apache_beam/io/gcp/bigquery.py 67.62% <0.00%> (-8.76%) ⬇️
...ks/python/apache_beam/runners/worker/data_plane.py 87.70% <0.00%> (-2.90%) ⬇️
sdks/python/apache_beam/internal/metrics/metric.py 90.42% <0.00%> (-1.07%) ⬇️
sdks/python/apache_beam/io/localfilesystem.py 91.47% <0.00%> (-0.78%) ⬇️
...nners/portability/fn_api_runner/worker_handlers.py 79.44% <0.00%> (-0.35%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.39% <0.00%> (-0.25%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.80% <0.00%> (-0.21%) ⬇️
sdks/python/apache_beam/runners/common.py 88.73% <0.00%> (-0.15%) ⬇️
setup.py 0.00% <0.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af59192...81fe195. Read the comment docs.

@pabloem
Copy link
Member

pabloem commented Aug 19, 2021

Run Python 3.8 PostCommit

@baeminbo
Copy link
Contributor Author

Please hold on reviewing this PR. It looks like job creation doesn't work as intended in the following scenario.

  1. Create a python job with a job name (e.g. my-test).
  2. Create another python job with the same job name (my-test).

I saw (1) and (2) have different clientRequestId, so I expected the second job creation failed by already-existing job name. But, (2) returned the job of (1). I'll investigate this issue further.

@pabloem
Copy link
Member

pabloem commented Aug 20, 2021

okay - can you please ping me whenever you've debugged that? thanks!

@baeminbo baeminbo force-pushed the BEAM-12751 branch 3 times, most recently from 1e23d3d to c86bfa1 Compare August 22, 2021 02:51
@baeminbo
Copy link
Contributor Author

@pabloem I fixed the issue of not-failing for already-existing job above. For this, I added a logic to check the clientRequestId of the job returned from the API and throw an DataflowJobAlreadyExistsError if not matched. And, I added two unit tests to apiclient_test.py. Could you review this again? Thanks!

@pabloem pabloem merged commit cbbebcd into apache:master Aug 27, 2021
@pabloem
Copy link
Member

pabloem commented Aug 27, 2021

LGTM thanks

@baeminbo baeminbo deleted the BEAM-12751 branch April 25, 2022 01:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants