Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12405] Reduce tail latency in Storage API writes #14765

Merged
merged 2 commits into from
May 26, 2021

Conversation

reuvenlax
Copy link
Contributor

@reuvenlax reuvenlax commented May 8, 2021

Several improvements

  • Reduce cache timeout to a value smaller than the server-side timeout. This prevents us from hitting errors when we try to use a StreamWriter that has timed out on the server.
  • Reduce maximum retry interval to 10 seconds.
  • Bound the bundle size to the StorageApiWriteXXXRecords classes by using GroupIntoBatches. Previously we would sometimes get very large bundles, and due to the design of the sink we cannot flush the writes until the entire bundle has been written. Bounding the bundle size allows us to flush these items more quickly.

@reuvenlax reuvenlax force-pushed the vortex_sink_tail_latency branch from d23bfb6 to 73db9c1 Compare May 25, 2021 23:59
@reuvenlax reuvenlax changed the title address issues in Storage API writes [BEAM-12405] Reduce tail latency in Storage API writes May 26, 2021
@reuvenlax
Copy link
Contributor Author

R: @yirutang

@codecov
Copy link

codecov bot commented May 26, 2021

Codecov Report

Merging #14765 (4ad9651) into master (fe88aaf) will increase coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #14765   +/-   ##
=======================================
  Coverage   83.77%   83.78%           
=======================================
  Files         434      434           
  Lines       58282    58260   -22     
=======================================
- Hits        48828    48812   -16     
+ Misses       9454     9448    -6     
Impacted Files Coverage Δ
...examples/snippets/transforms/aggregation/latest.py
...y38/build/srcs/sdks/python/apache_beam/pipeline.py
...pache_beam/runners/portability/portable_metrics.py
...testing/benchmarks/nexmark/models/nexmark_model.py
...examples/snippets/transforms/elementwise/values.py
.../apache_beam/io/gcp/datastore/v1new/datastoreio.py
...s/sdks/python/apache_beam/testing/test_pipeline.py
...ache_beam/runners/interactive/recording_manager.py
.../python/apache_beam/transforms/periodicsequence.py
...he_beam/io/flink/flink_streaming_impulse_source.py
... and 858 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8922c1c...4ad9651. Read the comment docs.

Copy link
Contributor

@yirutang yirutang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The batch size of 2M and retry interval looks reasonable to me.

Copy link
Contributor

@yirutang yirutang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server side stream timeout is 10 minutes if there is no active traffic so the change looks good.

@reuvenlax reuvenlax merged commit 79c1932 into apache:master May 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants