Improve BatchElements documentation #32082

Merged
merged 7 commits into from
Aug 30, 2024
Changes from 6 commits
14 changes: 14 additions & 0 deletions sdks/python/apache_beam/transforms/util.py
@@ -802,6 +802,20 @@ class BatchElements(PTransform):
corresponding to its contents. Each batch is emitted with a timestamp at
the end of their window.

When the max_batch_duration_secs arg is provided, a stateful implementation
of BatchElements is used to batch elements across bundles. This is most
impactful in streaming applications where many bundles only contain one
element. Larger max_batch_duration_secs values `might` reduce the throughput

tvalentyn (Contributor) commented on Aug 22, 2024:

Apologies - if I added the backticks here, that was unintentional.

Suggested change
- element. Larger max_batch_duration_secs values `might` reduce the throughput
+ element. Larger max_batch_duration_secs values might reduce the throughput

tvalentyn marked this conversation as resolved.

of the transform, while smaller values `might` improve the throughput but
make it more likely that batches are smaller than the target batch size.

As a general recommendation, start with low values (e.g. 0.005 aka 5ms) and
increase as needed to get the desired tradeoff between target batch size
and latency or throughput.

For more information on tuning parameters to this transform, see
https://beam.apache.org/documentation/patterns/batch-elements

Args:
min_batch_size: (optional) the smallest size of a batch
max_batch_size: (optional) the largest size of a batch
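
To make the tradeoff described in the new docstring text concrete, here is a minimal toy model of duration-bounded batching in plain Python. It is only a sketch and is not Beam's stateful implementation. The function name batch_by_duration, the millisecond units, and the evenly spaced arrival stream are assumptions made for illustration. A limit of 5 ms in this model corresponds to passing max_batch_duration_secs=0.005 to BatchElements.

```python
from typing import List


def batch_by_duration(arrival_times_ms: List[int],
                      max_batch_size: int,
                      max_batch_duration_ms: int) -> List[List[int]]:
    """Toy model (hypothetical helper, not part of Beam).

    Groups sorted arrival times into batches. A batch is closed when it
    reaches max_batch_size elements, or when the next element arrives more
    than max_batch_duration_ms after the first element of the batch.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in arrival_times_ms:
        # Close the open batch if this element arrives too late to join it.
        if current and t - current[0] > max_batch_duration_ms:
            batches.append(current)
            current = []
        current.append(t)
        # Close the batch as soon as it reaches the size limit.
        if len(current) >= max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


# Assumed stream: one element every 2 ms, 100 elements in total.
arrivals_ms = [2 * i for i in range(100)]

for duration_ms in (5, 50, 500):
    batches = batch_by_duration(arrivals_ms, max_batch_size=50,
                                max_batch_duration_ms=duration_ms)
    mean_size = len(arrivals_ms) / len(batches)
    print(f"max duration {duration_ms} ms: "
          f"{len(batches)} batches, mean size {mean_size:.1f}")
```

In this model, the shortest duration closes batches long before they reach the size limit, while the longest duration lets every batch fill up at the cost of holding elements longer. That matches the recommendation to start with a low value and increase it until the batch sizes are acceptable.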