In this demo, we will show you how to use docker-compose to run multiple datagen instances and produce 30GB of data to a Kafka cluster.
The docker-compose.yaml file defines the following services:
- redpanda: A single-node Redpanda instance, which is Kafka API-compatible.
- 3 datagen instances that produce data to Redpanda simultaneously.
Each datagen instance produces about 10GB of random data to Redpanda. Every record gets an auto-incrementing key from the iteration.index identifier in the schemas/schema.json file. All three instances generate the same sequence of keys, so you can simulate an upsert source with a total of 30GB of data but only 10GB of unique data.
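The following minimal Python sketch makes the size arithmetic and the upsert behavior concrete. It takes the record count and record size from the -n and --record-size values in the configuration below. It assumes that each instance emits keys 0 to n-1, one per record, as an iteration.index key suggests. It models the upsert source as a last-write-wins dictionary. The function names are hypothetical and not part of datagen.

```python
"""Sketch: total vs. unique data when several datagen instances share keys."""

RECORDS_PER_INSTANCE = 10024  # value of -n in the example entrypoint
RECORD_SIZE_BYTES = 1048576   # value of --record-size (1 MiB)
INSTANCES = 3                 # datagen1, datagen2, datagen3


def total_bytes(instances: int, records: int, record_size: int) -> int:
    """Bytes written to the topic by all instances combined."""
    return instances * records * record_size


def upsert_view(instances: int, records: int) -> dict[int, str]:
    """Model an upsert source: a later write for a key replaces the earlier one.

    Assumption: every instance emits keys 0..records-1. The real instances run
    concurrently, so which instance "wins" a key may differ, but the number of
    unique keys does not. Values record which instance wrote last.
    """
    state: dict[int, str] = {}
    for instance in range(1, instances + 1):
        for key in range(records):
            state[key] = f"datagen{instance}"
    return state


if __name__ == "__main__":
    total = total_bytes(INSTANCES, RECORDS_PER_INSTANCE, RECORD_SIZE_BYTES)
    unique = len(upsert_view(INSTANCES, RECORDS_PER_INSTANCE)) * RECORD_SIZE_BYTES
    print(f"total written: {total / 1024**3:.2f} GiB")
    print(f"unique after upsert: {unique / 1024**3:.2f} GiB")
```

Running it reports 29.37 GiB written and 9.79 GiB unique. In decimal units that is about 31.5GB and 10.5GB, which the text rounds to 30GB and 10GB.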
Example of the datagen instance configuration:
datagen1:
  image: materialize/datagen:latest
  container_name: datagen1
  depends_on:
    - redpanda
  environment:
    KAFKA_BROKERS: redpanda:9092
  volumes:
    - ./schemas:/schemas
  entrypoint:
    datagen -s /schemas/schema.json -f json -n 10024 --record-size 1048576 -d
Rundown of the datagen instance configuration:
- image: The datagen Docker image.
- container_name: The name of the container. This must be unique for each instance.
- depends_on: The datagen instance depends on the redpanda service, so Compose starts redpanda first.
- environment: The KAFKA_BROKERS environment variable configures the Kafka/Redpanda brokers. If you are using a Kafka cluster with SASL authentication, you can also set the SASL_USERNAME, SASL_PASSWORD, and SASL_MECHANISM environment variables.
- volumes: The datagen instance mounts the local schemas directory to the /schemas directory in the container. This is where the schema.json file lives.
- entrypoint: The datagen command-line arguments:
  - The -s flag specifies the schema file.
  - The -f flag specifies the output format.
  - The -n flag specifies the number of records to generate.
  - The --record-size flag specifies the size of each record in bytes. Here 1048576 bytes is 1 MiB, so 10024 records come to roughly 10GB per instance.
  - The -d flag enables debug logging.
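The three datagen services differ only in their service and container names, so you could generate them instead of copying them by hand. The following Python sketch is a hypothetical helper; it is not part of the datagen repository. It renders a services: section with N datagen services, each using the same image, broker, volume, and entrypoint as the example above, including the /schemas mount path. It can also add the SASL variables mentioned above; the values passed in are placeholders. It builds the YAML with plain string formatting, so it needs no third-party library. You would still need to add the redpanda service yourself.

```python
"""Hypothetical helper: render N datagen service blocks for docker-compose.yaml."""

from typing import Optional

ENTRYPOINT = "datagen -s /schemas/schema.json -f json -n 10024 --record-size 1048576 -d"


def datagen_service(index: int, sasl: Optional[dict[str, str]] = None) -> str:
    """Return the YAML for one datagen service, indented under `services:`."""
    name = f"datagen{index}"
    lines = [
        f"  {name}:",
        "    image: materialize/datagen:latest",
        f"    container_name: {name}",  # must be unique per instance
        "    depends_on:",
        "      - redpanda",
        "    environment:",
        "      KAFKA_BROKERS: redpanda:9092",
    ]
    # Optional SASL settings; values are caller-supplied placeholders and are
    # written unquoted, so values with YAML special characters need quoting.
    for key, value in (sasl or {}).items():
        lines.append(f"      {key}: {value}")
    lines += [
        "    volumes:",
        "      - ./schemas:/schemas",
        "    entrypoint:",
        f"      {ENTRYPOINT}",
    ]
    return "\n".join(lines)


def render_services(count: int, sasl: Optional[dict[str, str]] = None) -> str:
    """Return a `services:` section containing `count` datagen services."""
    blocks = [datagen_service(i, sasl) for i in range(1, count + 1)]
    return "services:\n" + "\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(render_services(3))
```

Keeping the shared values in one place means a change to the entrypoint, such as a different -n, applies to every instance at once.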
- Clone the datagen repository:
  git clone https://github.com/MaterializeInc/datagen.git
  cd datagen/examples/docker-compose
- Start the demo:
  docker-compose up -d
  The demo will take a few minutes to start up. You should see output similar to the following:
  Creating network "docker-compose_default" with the default driver
  Creating docker-compose_redpanda_1 ... done
  Creating docker-compose_datagen_1 ... done
  Creating docker-compose_datagen_2 ... done
  Creating docker-compose_datagen_3 ... done
- Verify that the demo is running:
  docker-compose ps -a
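- Stop the demo:
  docker-compose down -v
  The -v flag also removes named volumes declared in the Compose file and anonymous volumes attached to the containers.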