The main purpose of this project is to reproduce a bug in kafka-streams that is present at least in version 3.3.1 and likely in earlier versions as well.
In a production kafka-streams application, I found that state stores returned stale data during deployments, when partitions move from one application instance to another.
This project reproduces the bug by running multiple instances of a kafka-streams app, with the ability to start and stop instances in an orderly fashion.
Using this I have been able to reproduce the bug in the following setting:
- a processor API based app
- one standby task per active task (`num.standby.replicas = 1`)
- a caching-enabled key-value state store
The interaction between caching and standby tasks is the cause of this bug. When an active task becomes a standby, restoration does not invalidate the cache, so the cached key-value store returns stale values when the task becomes active again on the same instance.
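The sketch below is a minimal illustration of that combination of settings (a standby replica plus a caching-enabled store); the application id, store name, and serdes are assumptions for illustration, not the worker's actual values.

```scala
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.StreamsConfig
import org.apache.kafka.streams.state.Stores

// Streams config with one standby replica per active task
val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "worker-app")         // illustrative app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9085")
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, "1")

// A key-value store with caching enabled. When an active task on this
// instance is demoted to a standby and later promoted again, restoration
// does not invalidate this cache, so stale cached values can be served.
val storeBuilder = Stores
  .keyValueStoreBuilder(
    Stores.persistentKeyValueStore("worker-store"),                   // illustrative store name
    Serdes.String(),
    Serdes.Long()
  )
  .withCachingEnabled()
```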
This project has two sub-projects: `coordinator` and `worker`. The worker is a processor-API based kafka-streams app, and the coordinator coordinates the running of multiple instances of the worker.
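The build definition is not reproduced here; a typical sbt layout for two such sub-projects is sketched below, where everything other than the sub-project names (including the kafka-streams version pin and the assembly jar name) is an assumption.

```scala
// build.sbt (illustrative sketch only, not the project's actual build definition)
lazy val worker = (project in file("worker"))
  .settings(
    libraryDependencies += "org.apache.kafka" % "kafka-streams" % "3.3.1",
    // assumes the sbt-assembly plugin is enabled, for `sbt "worker/assembly"`
    assembly / assemblyJarName := "worker.jar"
  )

lazy val coordinator = (project in file("coordinator"))
```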
When the coordinator is run, it creates an input topic and produces messages to that topic. The worker stores some state in a state store and makes assertions about what it expects that state to be.
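The worker's exact assertion scheme is not shown here; the sketch below only illustrates the general shape of such a check, assuming records carry consecutive per-key values so the previously stored value is predictable (the processor class, store name, and types are all hypothetical).

```scala
import org.apache.kafka.streams.processor.api.{Processor, ProcessorContext, Record}
import org.apache.kafka.streams.state.KeyValueStore

// Illustrative processor: assumes each record's value is one greater than the
// value previously stored for its key, so a mismatch means the store is stale.
class AssertingProcessor(storeName: String)
    extends Processor[String, java.lang.Long, Void, Void] {

  private var store: KeyValueStore[String, java.lang.Long] = _

  override def init(context: ProcessorContext[Void, Void]): Unit =
    store = context.getStateStore(storeName)

  override def process(record: Record[String, java.lang.Long]): Unit = {
    val expected = record.value() - 1
    val found = Option(store.get(record.key())).map(_.longValue).getOrElse(0L)
    if (found != expected)
      println(s"!!! BROKEN !!! Expected $expected but found $found")
    store.put(record.key(), record.value())
  }
}
```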
The coordinator starts and stops instances of the worker according to a "program": a list of integers. Each integer in the program toggles the running state of the corresponding instance, with a 15-second wait between steps (a sketch of this interpretation follows the example below). For example, the program "1 2 1" results in the following sequence of execution
- start instance 1 -> wait 15 seconds -> start instance 2 -> wait 15 seconds -> stop instance 1
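A rough sketch of how such a program could be interpreted is given below; the coordinator's actual implementation may differ, and `startWorker` is a hypothetical helper that launches one worker instance as a separate process.

```scala
import scala.collection.mutable
import scala.sys.process.Process

// Illustrative interpreter for a program such as Seq(1, 2, 1): each integer
// toggles the corresponding worker instance, pausing 15 seconds between steps.
def runProgram(program: Seq[Int], startWorker: Int => Process): Unit = {
  val running = mutable.Map.empty[Int, Process]
  program.zipWithIndex.foreach { case (instance, pc) =>
    if (pc > 0) Thread.sleep(15000)            // 15-second wait between steps
    running.remove(instance) match {
      case Some(proc) =>
        println(s"[$pc] stopping instance -$instance")
        proc.destroy()
      case None =>
        println(s"[$pc] starting instance +$instance")
        running(instance) = startWorker(instance)
    }
  }
}
```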
You'll need to install sbt.
- The coordinator expects a worker JAR to be present in a certain location in the project. Run `sbt "worker/assembly"` to build the worker JAR.
- The worker expects kafka to be running on `localhost:9085` with plaintext auth.
- Run `sbt "coordinator / run 1 2 3"` to run the program "1 2 3" in the coordinator. This particular program reproduces the bug. If no arguments are passed to the coordinator, it runs a random program.
- The coordinator will output the following:
```
appid: 5e23aef6-b9f1-43a1-ba3e-ce7feedd99e6
[0] starting instance +1
[1] starting instance +2
[2] starting instance +3
[0] !!! BROKEN !!! Expected 58 but found 31 partition: 3
[1] !!! BROKEN !!! Expected 64 but found 58 partition: 1
```
The number in square brackets is the "program counter". You can find the worker logs for each instance in `{app_id}.{program-counter}.log`.