Skip to content

Commit

Permalink
Add datagen Kubernetes guide (#123)
Browse files Browse the repository at this point in the history
  • Loading branch information
bobbyiliev authored Nov 26, 2024
1 parent d302e28 commit b7780ee
Show file tree
Hide file tree
Showing 4 changed files with 350 additions and 6 deletions.
13 changes: 7 additions & 6 deletions examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,12 @@

This directory contains end-to-end tutorials for the `datagen` tool.

| Tutorial | Description |
| -------- | ----------- |
| [ecommerce](ecommerce) | A tutorial for the `datagen` tool that generates data for an ecommerce website. |
| [docker-compose](docker-compose) | A `docker-compose` setup for the `datagen`. |
| [blog](blog) | Sample data for a blog with users, posts, and comments. |
| [webhook](webhook) | A tutorial for the `datagen` tool that generates data for a webhook. |
| Tutorial | Description |
| -------------------------------- | ------------------------------------------------------------------------------------------------ |
| [ecommerce](ecommerce) | A tutorial for the `datagen` tool that generates data for an ecommerce website. |
| [docker-compose](docker-compose) | A `docker-compose` setup for the `datagen`. |
| [blog](blog) | Sample data for a blog with users, posts, and comments. |
| [webhook](webhook) | A tutorial for the `datagen` tool that generates data for a webhook. |
| [kubernetes](kubernetes) | A tutorial for the `datagen` tool that deploys to Kubernetes alongside a Redpanda Kafka cluster. |

To request a new tutorial, please [open an issue](https://github.com/MaterializeInc/datagen/issues/new?assignees=&labels=feature%2C+enhancement&template=feature_request.md&title=Feature%3A+).
221 changes: 221 additions & 0 deletions examples/kubernetes/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,221 @@
# Kubernetes Example

This example demonstrates how to deploy the datagen tool to Kubernetes alongside a Redpanda Kafka cluster.

## Overview

The example includes:
- A single-node Redpanda deployment for Kafka
- A datagen deployment that produces data to Redpanda
- ConfigMap to store the datagen schema
- Associated Kubernetes services

## Prerequisites

- A Kubernetes cluster
- `kubectl` configured to interact with your cluster
- Basic understanding of Kubernetes concepts (Deployments, Services, ConfigMaps)

## Setup

1. First, create a namespace for our resources (if not already exists):

```bash
kubectl create namespace materialize
```

2. Apply the Kubernetes manifests, which will create the datagen and Redpanda deployments:

```bash
kubectl apply -f examples/kubernetes/datagen.yaml
kubectl apply -f examples/kubernetes/redpanda.yaml
```

## Manifest Details

The deployment consists of several Kubernetes resources. Let's examine each one:

### 1. Schema ConfigMap

This ConfigMap stores the schema definition that datagen will use to generate data:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: datagen-schema
namespace: materialize
data:
schema.json: |
[
{
"_meta": {
"topic": "mz_datagen_test"
},
"id": "iteration.index",
"name": "faker.internet.userName()"
}
]
```
You can customize the schema to generate different data. For more information, see the datagen [README](../../README.md) file.
### 2. Datagen Deployment
The datagen deployment uses the official `materialize/datagen` image and mounts the schema `ConfigMap`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: datagen
namespace: materialize
spec:
replicas: 1
selector:
matchLabels:
app: datagen
template:
metadata:
labels:
app: datagen
spec:
containers:
- name: datagen
image: materialize/datagen:latest
args:
[
"datagen",
"-s", "/schemas/schema.json",
"-f", "json",
"-n", "10024",
"-w", "2000",
"-d"
]
env:
- name: KAFKA_BROKERS
value: "redpanda.materialize.svc.cluster.local:9092"
volumeMounts:
- name: datagen-schema-volume
mountPath: /schemas
readOnly: true
volumes:
- name: datagen-schema-volume
configMap:
name: datagen-schema
```

### 3. Redpanda Deployment and Service

The Redpanda deployment provides a Kafka-compatible message broker:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redpanda
namespace: materialize
spec:
replicas: 1
selector:
matchLabels:
app: redpanda
template:
metadata:
labels:
app: redpanda
spec:
containers:
- name: redpanda
image: docker.vectorized.io/vectorized/redpanda:v23.3.5
command: ["/usr/bin/rpk"]
args: [
"redpanda",
"start",
"--overprovisioned",
"--smp", "1",
"--memory", "1G",
"--reserve-memory", "0M",
"--node-id", "0",
"--check=false",
"--kafka-addr", "0.0.0.0:9092",
"--advertise-kafka-addr", "redpanda.materialize.svc.cluster.local:9092",
"--pandaproxy-addr", "0.0.0.0:8082",
"--advertise-pandaproxy-addr", "redpanda.materialize.svc.cluster.local:8082",
"--set", "redpanda.enable_transactions=true",
"--set", "redpanda.enable_idempotence=true",
"--set", "redpanda.auto_create_topics_enabled=true",
"--set", "redpanda.default_topic_partitions=1"
]
ports:
- containerPort: 9092
- containerPort: 8081
- containerPort: 8082
livenessProbe:
httpGet:
path: /v1/status/ready
port: 9644
initialDelaySeconds: 30
periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
name: redpanda
namespace: materialize
spec:
selector:
app: redpanda
ports:
- name: kafka
protocol: TCP
port: 9092
targetPort: 9092
- name: pandaproxy
protocol: TCP
port: 8082
targetPort: 8082
```

## Verifying the Deployment

1. Check if the pods are running:

```bash
kubectl get pods -n materialize
```

2. View datagen logs:

```bash
kubectl logs -f deployment/datagen -n materialize
```

3. View Redpanda logs:

```bash
kubectl logs -f deployment/redpanda -n materialize
```

## Scaling

You can scale the datagen deployment to produce more data in parallel:

```bash
kubectl scale deployment datagen -n materialize --replicas=3
```

## Cleanup

To remove all resources:

```bash
kubectl delete namespace materialize
```

## Useful Links

- [Materialize documentation](https://materialize.com/docs/)
- [Materialize community Slack](https://materialize.com/s/chat)
- [Materialize Blog](https://materialize.com/blog/)
- [Kubernetes documentation](https://kubernetes.io/docs/home/)
56 changes: 56 additions & 0 deletions examples/kubernetes/datagen.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: datagen-schema
namespace: materialize
data:
schema.json: |
[
{
"_meta": {
"topic": "mz_datagen_test"
},
"id": "iteration.index",
"name": "faker.internet.userName()"
}
]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: datagen
namespace: materialize
spec:
replicas: 1
selector:
matchLabels:
app: datagen
template:
metadata:
labels:
app: datagen
spec:
containers:
- name: datagen
image: materialize/datagen:latest
args:
[
"datagen",
"-s", "/schemas/schema.json",
"-f", "json",
"-n", "10024",
"-w", "2000",
"-d"
]
env:
- name: KAFKA_BROKERS
value: "redpanda.materialize.svc.cluster.local:9092"
volumeMounts:
- name: datagen-schema-volume
mountPath: /schemas
readOnly: true
volumes:
- name: datagen-schema-volume
configMap:
name: datagen-schema
66 changes: 66 additions & 0 deletions examples/kubernetes/redpanda.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: redpanda
namespace: materialize
spec:
replicas: 1
selector:
matchLabels:
app: redpanda
template:
metadata:
labels:
app: redpanda
spec:
containers:
- name: redpanda
image: docker.vectorized.io/vectorized/redpanda:v23.3.5
command: ["/usr/bin/rpk"]
args: [
"redpanda",
"start",
"--overprovisioned",
"--smp", "1",
"--memory", "1G",
"--reserve-memory", "0M",
"--node-id", "0",
"--check=false",
"--kafka-addr", "0.0.0.0:9092",
"--advertise-kafka-addr", "redpanda.materialize.svc.cluster.local:9092",
"--pandaproxy-addr", "0.0.0.0:8082",
"--advertise-pandaproxy-addr", "redpanda.materialize.svc.cluster.local:8082",
"--set", "redpanda.enable_transactions=true",
"--set", "redpanda.enable_idempotence=true",
"--set", "redpanda.auto_create_topics_enabled=true",
"--set", "redpanda.default_topic_partitions=1"
]
ports:
- containerPort: 9092
- containerPort: 8081
- containerPort: 8082
livenessProbe:
httpGet:
path: /v1/status/ready
port: 9644
initialDelaySeconds: 30
periodSeconds: 10

---
apiVersion: v1
kind: Service
metadata:
name: redpanda
namespace: materialize
spec:
selector:
app: redpanda
ports:
- name: kafka
protocol: TCP
port: 9092
targetPort: 9092
- name: pandaproxy
protocol: TCP
port: 8082
targetPort: 8082

0 comments on commit b7780ee

Please sign in to comment.