Name		Name	Last commit message	Last commit date
parent directory ..
schemas		schemas
README.md		README.md
docker-compose.yaml		docker-compose.yaml

README.md

Datagen Docker Compose Demo

In this demo, we will show you how to use docker-compose to run multiple datagen instances and produce 30GB of data to a Kafka cluster.

Overview

The docker-compose.yaml file defines the following services:

redpanda: A single-node Kafka instance.
3 datagen instances that produce data to Redpanda simultaneously.

Datagen instances overview

Each datagen instance produces 10GB of random data to Redpanda using an auto incrementing key thanks to the iteration.index identifier in the schemas/schema.json file. This allows you to simulate an upsert source with a total of 30GB of data but only 10GB of unique data.

Example of the datagen instance configuration:

  datagen1:
    image: materialize/datagen:latest
    container_name: datagen1
    depends_on:
      - redpanda
    environment:
      KAFKA_BROKERS: redpanda:9092
    volumes:
      - ./schemas:/schemas
    entrypoint:
      datagen -s /tests/schema.json -f json -n 10024 --record-size 1048576 -d

Rundown of the datagen instance configuration:

image: The datagen Docker image.
container_name: The name of the container. This should be unique for each instance.
depends_on: The datagen instance depends on the redpanda service.
environment: The KAFKA_BROKERS environment variable is used to configure the Kafka/Redpanda brokers. If you are using a Kafka cluster with SASL authentication, you can also set the SASL_USERNAME, SASL_PASSWORD and SASL_MECHANISM environment variables.
volumes: The datagen instance mounts the schemas directory to the /schemas directory in the container. This is where we have the schema.json file.
entrypoint: The datagen command line arguments. The -s flag is used to specify the schema file. The -f flag is used to specify the output format. The -n flag is used to specify the number of records to generate. The --record-size flag is used to specify the size of each record. The -d flag is used to enable debug logging.

Prerequisites

Running the demo

Clone the datagen repository:

git clone https://github.com/MaterializeInc/datagen.git

cd datagen/examples/docker-compose

Start the demo:

docker-compose up -d

The demo will take a few minutes to start up. You should see the following output:

Creating network "docker-compose_default" with the default driver
Creating docker-compose_redpanda_1  ... done
Creating docker-compose_datagen_1   ... done
Creating docker-compose_datagen_2   ... done
Creating docker-compose_datagen_3   ... done

Verify that the demo is running:
```
docker-compose ps -a
```
Stopping the demo:
```
docker-compose down -v
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docker-compose

docker-compose

README.md

Datagen Docker Compose Demo

Overview

Datagen instances overview

Prerequisites

Running the demo

Useful links

Files

docker-compose

Directory actions

More options

Directory actions

More options

Latest commit

History

docker-compose

Folders and files

parent directory

README.md

Datagen Docker Compose Demo

Overview

Datagen instances overview

Prerequisites

Running the demo

Useful links