What happened?

When using `BatchElements`, one can almost precisely control the size of each batch by setting `min_batch_size` and `max_batch_size` to the same value and by also providing `element_size_fn`. It almost works.
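For illustration, a minimal sketch of that usage pattern (the byte-string payloads and the 1 KiB cap are invented for this example; `min_batch_size`, `max_batch_size`, and `element_size_fn` are the `BatchElements` parameters named above):

```python
import apache_beam as beam

# Pin the batch size to 1 KiB of payload: min == max, and element_size_fn
# measures each element in bytes instead of counting elements.
with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([b'x' * 100, b'y' * 300, b'z' * 900])
      | beam.BatchElements(
          min_batch_size=1024,
          max_batch_size=1024,
          element_size_fn=len)
      # A size-respecting implementation would emit batches of
      # [100, 300] and [900] bytes; per the report below, the current
      # one emits a single 1300-byte batch that overshoots
      # max_batch_size.
      | beam.Map(lambda batch: print([len(e) for e in batch])))
```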
However, the `process` function of `_GlobalWindowsBatchingDoFn` blindly adds elements to a batch before checking whether adding that element would exceed `max_batch_size`. This can be fixed by changing `process` to only add an element to a batch if (a) the batch is empty or (b) adding the element would not exceed `max_batch_size`. There are cases where exceeding `max_batch_size` at all is unacceptable (for example, APIs that enforce a maximum request payload size).

Example fixed implementation:
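A minimal sketch of the proposed check-before-append logic (the class and attribute names here are illustrative, not the actual fields of `_GlobalWindowsBatchingDoFn` in `apache_beam/transforms/util.py`):

```python
import apache_beam as beam
from apache_beam.transforms import window


class _SizeCappedBatchingDoFn(beam.DoFn):
  """Sketch of the proposed fix: a batch never exceeds max_batch_size."""

  def __init__(self, max_batch_size, element_size_fn):
    self._max_batch_size = max_batch_size
    self._element_size_fn = element_size_fn

  def start_bundle(self):
    self._batch = []
    self._batch_size = 0

  def _flush(self):
    batch, self._batch, self._batch_size = self._batch, [], 0
    return batch

  def process(self, element):
    element_size = self._element_size_fn(element)
    # Key change: flush *before* appending if this element would push
    # the batch past max_batch_size. An empty batch always accepts the
    # element, since it could never fit otherwise.
    if self._batch and self._batch_size + element_size > self._max_batch_size:
      yield self._flush()
    self._batch.append(element)
    self._batch_size += element_size
    if self._batch_size >= self._max_batch_size:
      yield self._flush()

  def finish_bundle(self):
    if self._batch:
      # Values yielded from finish_bundle must be windowed explicitly.
      yield window.GlobalWindows.windowed_value(self._flush())
```

With the guard in place a batch can no longer cross `max_batch_size`; a single element that is itself larger than the cap is still emitted as a one-element batch, since there is no smaller unit to emit.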
Issue Priority
Priority: 3 (minor)
Issue Components
hi @jmdobry! Thanks for reporting. Would you be interested in contributing a fix (and ideally a test) to address this? Let me know if you have any questions. FYI we also have https://s.apache.org/beam-python-dev-wiki for development tips. Thanks!