When running the Kafka reader with multiple workers on a topic populated with a fixed number of records, the job would occasionally read too many records. In one specific scenario, I had a job with 3 workers reading from a topic with 300k records in 10k batches, and it ended up with anywhere between 300k and 330k records in Elasticsearch after the workers finished reading the topic.
I believe the extra reads come from the rebalancing that happens as the workers first start up. Since the workers do not all start at the same time, the first worker can connect to Kafka and fetch a batch of records before the other two have started; once another worker joins, a rebalance is triggered, and the worker holding the first batch is unable to commit its offsets after putting the records in ES. Because no offsets were committed, that data is re-read after the rebalance.
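For illustration, here is a minimal sketch of that sequence, assuming the reader is built on node-rdkafka (the librdkafka-style error string below suggests that); the topic, broker, and group names are made up and this is not the actual reader code:

```js
const Kafka = require('node-rdkafka');

const consumer = new Kafka.KafkaConsumer({
  'group.id': 'example-reader-group',       // illustrative group id
  'metadata.broker.list': 'localhost:9092', // illustrative broker
  'enable.auto.commit': false,              // offsets are committed manually
}, {});

consumer.connect();

consumer.on('ready', () => {
  consumer.subscribe(['example-topic']);

  // The first worker starts alone and fetches a full batch by itself.
  consumer.consume(10000, (err, messages) => {
    if (err) throw err;

    // ...the batch is written to Elasticsearch here...

    // By the time the slice resolves, another worker may have joined the
    // consumer group and triggered a rebalance. The commit below then fails
    // with "Broker: Group rebalance in progress", no offsets are stored, and
    // the same records are re-delivered once partitions are reassigned.
    consumer.commit();
  });
});
```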
The logs did show that the workers were processing more than the 300,000 records that were in the topic, and in one case a worker logged this error after marking its first slice as `resolved`:

```
"msg":"Kafka reader error after slice resolution { Error: Broker: Group rebalance in progress ...
```
Ultimately this stems from the workers committing offsets only after a slice is `resolved`: any problem committing the offsets surfaces only after the data has already been fully processed and written to ES, so those records end up being read again.
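One way to picture the tradeoff that the commit placement creates, continuing the sketch above (again assumed node-rdkafka, with a hypothetical `writeSliceToElasticsearch` standing in for the bulk request):

```js
// Continuation of the sketch above; `writeSliceToElasticsearch` is a
// hypothetical placeholder for the bulk write to Elasticsearch.
consumer.consume(10000, (err, messages) => {
  if (err) throw err;

  // Option A: commit as soon as the slice is fetched (at-most-once).
  // A rebalance during processing can no longer cause re-reads, but an
  // in-flight slice is lost if the worker dies before the ES write finishes.
  // consumer.commit();

  writeSliceToElasticsearch(messages);

  // Option B (the current behaviour described above): commit only after the
  // slice is resolved (at-least-once). Any commit failure, such as a
  // rebalance in progress, happens after the data was already indexed, so
  // those records get read and indexed a second time.
  consumer.commit();
});
```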
refs #5