When running the Kafka reader with multiple workers on a topic populated with a fixed number of records, the job would occasionally read too many records. In one specific scenario, I had a job with 3 workers reading from a topic with 300k records in 10k batches, and it ended up with anywhere between 300k and 330k records in Elasticsearch after the workers finished reading the topic.
I believe the extra reads come from the rebalancing that happens as the workers first start up. Since the workers do not all start at the same time, the first worker can connect to Kafka and fetch a batch of records before the other two have started; once another worker joins, a rebalance is triggered, and the worker holding the first batch is unable to commit its offsets after putting the records in ES. Because no offsets were committed, that data is re-read after the rebalance.
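For illustration, here is a minimal sketch of that sequence, assuming the reader is built on node-rdkafka (the librdkafka-style error string below suggests that); the topic, broker, and group names are made up and this is not the actual reader code:

```js
const Kafka = require('node-rdkafka');

const consumer = new Kafka.KafkaConsumer({
  'group.id': 'example-reader-group',       // illustrative group id
  'metadata.broker.list': 'localhost:9092', // illustrative broker
  'enable.auto.commit': false,              // offsets are committed manually
}, {});

consumer.connect();

consumer.on('ready', () => {
  consumer.subscribe(['example-topic']);

  // The first worker starts alone and fetches a full batch by itself.
  consumer.consume(10000, (err, messages) => {
    if (err) throw err;

    // ...the batch is written to Elasticsearch here...

    // By the time the slice resolves, another worker may have joined the
    // consumer group and triggered a rebalance. The commit below then fails
    // with "Broker: Group rebalance in progress", no offsets are stored, and
    // the same records are re-delivered once partitions are reassigned.
    consumer.commit();
  });
});
```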
The logs did show that the workers were processing more than the 300,000 records that were in the topic, and in one case a worker logged this error after marking its first slice as `resolved`:

```
"msg":"Kafka reader error after slice resolution { Error: Broker: Group rebalance in progress ...
```
Ultimately this stems from the workers committing offsets only after a slice is `resolved`: any problem committing the offsets surfaces only after the data has already been fully processed and written to ES, so those records end up being read again.
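One way to picture the tradeoff that the commit placement creates, continuing the sketch above (again assumed node-rdkafka, with a hypothetical `writeSliceToElasticsearch` standing in for the bulk request):

```js
// Continuation of the sketch above; `writeSliceToElasticsearch` is a
// hypothetical placeholder for the bulk write to Elasticsearch.
consumer.consume(10000, (err, messages) => {
  if (err) throw err;

  // Option A: commit as soon as the slice is fetched (at-most-once).
  // A rebalance during processing can no longer cause re-reads, but an
  // in-flight slice is lost if the worker dies before the ES write finishes.
  // consumer.commit();

  writeSliceToElasticsearch(messages);

  // Option B (the current behaviour described above): commit only after the
  // slice is resolved (at-least-once). Any commit failure, such as a
  // rebalance in progress, happens after the data was already indexed, so
  // those records get read and indexed a second time.
  consumer.commit();
});
```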
refs #5