Postgres #101

Merged
merged 8 commits into from
Jun 15, 2023
7 changes: 7 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
@@ -15,3 +15,10 @@ SSL_KEY_LOCATION=
SCHEMA_REGISTRY_URL=
SCHEMA_REGISTRY_USERNAME=
SCHEMA_REGISTRY_PASSWORD=

# Postgres
export POSTGRES_HOST=
export POSTGRES_PORT=
export POSTGRES_DB=
export POSTGRES_USER=
export POSTGRES_PASSWORD=
7 changes: 7 additions & 0 deletions .github/workflows/integration.yaml
@@ -47,5 +47,12 @@ jobs:
      - name: Clean Kafka topic and schema registry
        run: docker exec datagen datagen -s /tests/schema.avsc -f avro -d --clean --prefix avro

      # Postgres tests
      - name: Produce data to Postgres with Faker.js
        run: docker exec datagen datagen -s /tests/products.sql -f postgres -n 3

      - name: Produce data to Postgres with multiple tables
        run: docker exec datagen datagen -s /tests/schema2.sql -f postgres -n 3

      - name: Docker Compose Down
        run: docker compose down -v
39 changes: 36 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# Datagen CLI

-This command line interface application allows you to take schemas defined in JSON (`.json`), Avro (`.avsc`), or SQL (`.sql`) and produce believable fake data to Kafka in JSON or Avro format.
+This command line interface application allows you to take schemas defined in JSON (`.json`), Avro (`.avsc`), or SQL (`.sql`) and produce believable fake data to Kafka in JSON or Avro format or to Postgres.

The benefits of using this datagen tool are:
- You can specify what values are generated using the expansive [FakerJS API](https://fakerjs.dev/api/) to craft data that more faithfully imitates your use case. This allows you to more easily apply business logic downstream.
@@ -10,6 +10,8 @@ The benefits of using this datagen tool are:

> :construction: Specifying relationships between datasets currently requires using JSON for the input schema.

> :construction: The `postgres` output format currently does not support specifying relationships between datasets.

## Installation

### npm
@@ -56,6 +58,13 @@ SSL_KEY_LOCATION=
SCHEMA_REGISTRY_URL=
SCHEMA_REGISTRY_USERNAME=
SCHEMA_REGISTRY_PASSWORD=

# Postgres
POSTGRES_HOST=
POSTGRES_PORT=
POSTGRES_DB=
POSTGRES_USER=
POSTGRES_PASSWORD=
```

The `datagen` program will read the environment variables from `.env` in the current working directory.
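The PR does not show how `datagen` turns the `POSTGRES_*` variables into a connection, so the following is only a minimal sketch of one way to validate them. The `PostgresConfig` interface and the `postgresConfigFromEnv` function are hypothetical names introduced for illustration; only the variable names come from the `.env` example above.

```typescript
// Hypothetical helper, not part of the datagen codebase shown in this PR.
interface PostgresConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// Builds a config from an env-like object and fails fast on missing or invalid values.
function postgresConfigFromEnv(env: Record<string, string | undefined>): PostgresConfig {
  const required = [
    'POSTGRES_HOST',
    'POSTGRES_PORT',
    'POSTGRES_DB',
    'POSTGRES_USER',
    'POSTGRES_PASSWORD',
  ];
  // Treat unset and empty variables the same way: both are unusable.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing Postgres settings: ${missing.join(', ')}`);
  }

  const port = Number(env['POSTGRES_PORT']);
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`POSTGRES_PORT is not a valid port: ${env['POSTGRES_PORT']}`);
  }

  return {
    host: env['POSTGRES_HOST'] as string,
    port,
    database: env['POSTGRES_DB'] as string,
    user: env['POSTGRES_USER'] as string,
    password: env['POSTGRES_PASSWORD'] as string,
  };
}
```

In a Node.js program the function could be called as `postgresConfigFromEnv(process.env)` once the `.env` file has been loaded. Failing at startup gives a clearer message than a connection error part-way through a run.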
@@ -75,7 +84,7 @@ Fake Data Generator
Options:
-V, --version output the version number
-s, --schema <char> Schema file to use
--f, --format <char> The format of the produced data (choices: "json", "avro", default: "json")
+-f, --format <char> The format of the produced data (choices: "json", "avro", "postgres", default: "json")
-n, --number <char> Number of records to generate. For infinite records, use -1 (default: "10")
-c, --clean Clean (delete) Kafka topics and schema subjects previously created
-dr, --dry-run Dry run (no data will be produced to Kafka)
@@ -239,12 +248,36 @@ CREATE TABLE "ecommerce"."products" (
   "merchant_id" int NOT NULL COMMENT 'faker.datatype.number()',
   "price" int COMMENT 'faker.datatype.number()',
   "status" int COMMENT 'faker.datatype.boolean()',
-  "created_at" datetime DEFAULT (now())
+  "created_at" timestamp DEFAULT (now())
);
```

This will produce the desired mock data to the topic `ecommerce.products`.

#### Producing to Postgres

You can also produce the data to a Postgres database. To do this, you need to specify the `-f postgres` option and provide Postgres connection information in the `.env` file. Here is an example `.env` file:

```
# Postgres
export POSTGRES_HOST=
export POSTGRES_PORT=
export POSTGRES_DB=
export POSTGRES_USER=
export POSTGRES_PASSWORD=
```

Then, you can run the following command to produce the data to Postgres:

```bash
datagen \
-s tests/products.sql \
-f postgres \
-n 1000
```

> :warning: You can only produce to Postgres with a SQL schema.
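
The diff does not include the code that writes generated records to Postgres, so the sketch below only illustrates one plausible way to turn a generated record into a parameterized `INSERT`. The names `buildInsert`, `quoteIdent`, `Value`, and `Row` are hypothetical and do not come from the PR. The returned `{ text, values }` shape is chosen because it matches the query object that the node-postgres (`pg`) client accepts; whether `datagen` uses that client is an assumption.

```typescript
// Hypothetical helpers for illustration; this is not the PR's implementation.
type Value = string | number | boolean | null;
type Row = Record<string, Value>;

// Double-quotes an identifier and escapes embedded double quotes, as Postgres expects.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Builds a parameterized INSERT so values are never spliced into the SQL text.
function buildInsert(schema: string, table: string, row: Row): { text: string; values: Value[] } {
  const entries = Object.entries(row);
  if (entries.length === 0) {
    throw new Error('Cannot insert an empty record');
  }
  const columns = entries.map(([column]) => quoteIdent(column)).join(', ');
  // Placeholders are 1-based: $1, $2, ...
  const placeholders = entries.map((_, i) => `$${i + 1}`).join(', ');
  const text =
    `INSERT INTO ${quoteIdent(schema)}.${quoteIdent(table)} ` +
    `(${columns}) VALUES (${placeholders})`;
  return { text, values: entries.map(([, value]) => value) };
}

// Example record for the "ecommerce"."products" table from the schema above.
const query = buildInsert('ecommerce', 'products', { merchant_id: 7, price: 100 });
// query.text is:
//   INSERT INTO "ecommerce"."products" ("merchant_id", "price") VALUES ($1, $2)
```

Keeping values out of the SQL text means generated strings containing quotes cannot break the statement.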

### Avro Schema

> :construction: Avro input schema currently does not support arbitrary FakerJS methods. Instead, data is randomly generated based on the type.
7 changes: 4 additions & 3 deletions datagen.ts
@@ -23,7 +23,7 @@ program
     .requiredOption('-s, --schema <char>', 'Schema file to use')
     .addOption(
         new Option('-f, --format <char>', 'The format of the produced data')
-            .choices(['json', 'avro'])
+            .choices(['json', 'avro', 'postgres'])
             .default('json')
     )
     .addOption(
@@ -113,11 +113,12 @@ if (!global.wait) {
     process.exit(0);
 }

 // Generate data

 await dataGenerator({
     format: options.format,
     schema: parsedSchema,
-    iterations: options.number
+    iterations: options.number,
+    initialSchema: options.schema
 })

 await end();
14 changes: 14 additions & 0 deletions docker-compose.yaml
@@ -25,6 +25,15 @@ services:
      - 8082:8082
    healthcheck: {test: curl -f localhost:9644/v1/status/ready, interval: 1s, start_period: 30s}

  postgres:
    image: postgres:14.0
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - 5432:5432

  datagen:
    build: .
    container_name: datagen
@@ -33,6 +42,11 @@ services:
    environment:
      SCHEMA_REGISTRY_URL: http://redpanda:8081
      KAFKA_BROKERS: redpanda:9092
      POSTGRES_HOST: postgres
      POSTGRES_PORT: 5432
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - ./tests:/tests
    # Override the entrypoint to run the container and keep it running