refactor faker parsing and enable array relationships (#85)
* refactor faker parsing and enable array relationships
* add helpful error message
* update ecommerce example
* slight change to array example
* update ecommerce example
* accommodate breaking change to pass tests
* update readme
* add warning about executing user input to readme
* fix typo
* beef up examples with blog example
* bump version
1 parent 7441dc3, commit 801d383
Showing 17 changed files with 453 additions and 306 deletions.
@@ -0,0 +1,60 @@
# Blog Demo

This small example generates relational data for a blog where users make posts, and posts have comments by other users.

## Inspect the Schema

1. Take a moment to look at [blog.json](./blog.json) and make a prediction about what the output will look like.

## Do a Dry Run

The following command performs a dry run of a single iteration.

```
datagen \
    --dry-run \
    --debug \
    --schema examples/blog/blog.json \
    --format avro \
    --prefix mz_datagen_blog \
    --number 1
```

Notice that in a single iteration, a user is created, then 2 posts are created, and for each post, 2 comments are created. Then, since comments are made by users, 2 additional users are created. Throughout, the value of a field in a parent record is passed to its child records (e.g., if `users.id` is `5`, then each associated post has `posts.user_id` equal to `5`). This lets downstream systems perform meaningful joins.

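The parent-to-child propagation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not datagen's actual implementation: `make_children` and `post_template` are hypothetical names, and the random integers merely stand in for the faker expressions in `blog.json`. The point is that each child is generated normally and its linking field is then overwritten with the parent's value.

```python
import random


def make_children(parent, relationship, child_template, rng):
    """Create `records_per` child records that inherit the parent's linking value."""
    children = []
    for _ in range(relationship["records_per"]):
        # Generate every field of the child from its stand-in generator...
        child = {field: gen(rng) for field, gen in child_template.items()}
        # ...then overwrite the linking field with the parent's value.
        child[relationship["child_field"]] = parent[relationship["parent_field"]]
        children.append(child)
    return children


# Relationship copied from the `users` entry of blog.json.
users_to_posts = {
    "topic": "posts",
    "parent_field": "id",
    "child_field": "user_id",
    "records_per": 2,
}

# Hypothetical stand-ins for the faker expressions of the `posts` entry.
post_template = {
    "id": lambda rng: rng.randint(0, 1000),
    "user_id": lambda rng: rng.randint(0, 100),
}

rng = random.Random(0)
user = {"id": 5, "name": "example_user"}
posts = make_children(user, users_to_posts, post_template, rng)
print(posts)  # two posts, each carrying user_id == 5
```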
Also notice that the number of unique primary keys in each collection is limited, so over time you will see each key appear multiple times. Downstream systems can interpret these repeated keys as updates.

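To make the "repeated keys as updates" idea concrete, here is a minimal sketch of how a consumer might apply upsert semantics. It assumes a hypothetical stream of user records whose `id` falls in the range 0 to 100, mirroring `faker.datatype.number(100)`. The `version` field and the `latest_by_key` helper are invented for illustration and are not part of datagen.

```python
import random


def latest_by_key(records, key):
    """Keep only the most recent record for each key (upsert semantics)."""
    state = {}
    for record in records:
        # A repeated key overwrites the earlier record, like an update.
        state[record[key]] = record
    return state


rng = random.Random(42)
# 1,000 hypothetical records drawn from at most 101 distinct ids.
stream = [{"id": rng.randint(0, 100), "version": n} for n in range(1000)]

state = latest_by_key(stream, "id")
print(f"{len(stream)} records collapsed to {len(state)} distinct keys")
```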
## (Optional) Produce to Kafka

See [.env.example](../../.env.example) for the environment variables needed to connect to your Kafka cluster.
If you use the `--format avro` option, you also have to set the environment variables for connecting to your Schema Registry.

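Before producing, it can help to confirm that no connection setting was left blank. The sketch below assumes you keep your settings in a dotenv-style file of `NAME=value` lines. The variable names in `sample` are hypothetical placeholders, since the real names are the ones listed in `.env.example`, and `unset_keys` is an illustrative helper rather than anything datagen provides.

```python
def unset_keys(env_text):
    """Return the variable names in a dotenv-style text whose value is empty."""
    missing = []
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Treat bare, ""-quoted, and ''-quoted empty values as unset.
        if not value.strip().strip('"').strip("'"):
            missing.append(name.strip())
    return missing


# Hypothetical file contents; substitute the names from .env.example.
sample = """
# connection settings
EXAMPLE_BROKER=localhost:9092
EXAMPLE_USERNAME=
EXAMPLE_PASSWORD=""
"""
print(unset_keys(sample))
```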
After you set those, you can produce to your Kafka cluster. Press `Ctrl+C` when you are ready to stop the producer.

```
datagen \
    --schema examples/blog/blog.json \
    --format avro \
    --prefix mz_datagen_blog \
    --number -1
```

When you are finished, you can delete all the topics and schema subjects with the `--clean` option.

```
datagen \
    --schema examples/blog/blog.json \
    --format avro \
    --prefix mz_datagen_blog \
    --clean
```

## (Optional) Query in Materialize

Materialize is a [streaming database](https://materialize.com/guides/streaming-database/). You create materialized views with standard SQL, and Materialize eagerly reads from Kafka topics and Postgres tables and keeps your materialized views up to date automatically as new data arrives. It is Postgres wire-compatible, so you can read your materialized views directly with the `psql` CLI or any Postgres client library.

See the [ecommerce example](../ecommerce/README.md) for a full end-to-end example where data is transformed in and served from Materialize in near real-time.

### Learn More

Check out the Materialize [docs](https://www.materialize.com/docs) and [blog](https://www.materialize.com/blog) for more!
@@ -0,0 +1,61 @@
[
    {
        "_meta": {
            "topic": "users",
            "key": "id",
            "relationships": [
                {
                    "topic": "posts",
                    "parent_field": "id",
                    "child_field": "user_id",
                    "records_per": 2
                }
            ]
        },
        "id": "faker.datatype.number(100)",
        "name": "faker.internet.userName()",
        "email": "faker.internet.exampleEmail()",
        "phone": "faker.phone.imei()",
        "website": "faker.internet.domainName()",
        "city": "faker.address.city()",
        "company": "faker.company.name()"
    },
    {
        "_meta": {
            "topic": "posts",
            "key": "id",
            "relationships": [
                {
                    "topic": "comments",
                    "parent_field": "id",
                    "child_field": "post_id",
                    "records_per": 2
                }
            ]
        },
        "id": "faker.datatype.number(1000)",
        "user_id": "faker.datatype.number(100)",
        "title": "faker.lorem.sentence()",
        "body": "faker.lorem.paragraph()"
    },
    {
        "_meta": {
            "topic": "comments",
            "key": "id",
            "relationships": [
                {
                    "topic": "users",
                    "parent_field": "user_id",
                    "child_field": "id",
                    "records_per": 1
                }
            ]
        },
        "id": "faker.datatype.number(2000)",
        "user_id": "faker.datatype.number(100)",
        "body": "faker.lorem.paragraph()",
        "post_id": "faker.datatype.number(1000)",
        "views": "faker.datatype.number({min: 100, max: 1000})",
        "status": "faker.datatype.number(1)"
    }
]
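Since each relationship in the schema above names a child topic, a `parent_field`, and a `child_field`, a quick consistency check can catch typos before a run. The following is a standalone sketch, not part of datagen. `check_relationships` is a hypothetical helper, and the inline schema is a trimmed copy of the `users` and `posts` entries. Whether datagen performs a similar validation itself is not something this commit page confirms.

```python
import json


def check_relationships(schema):
    """Return a list of problems found in the `_meta.relationships` entries."""
    by_topic = {entry["_meta"]["topic"]: entry for entry in schema}
    problems = []
    for entry in schema:
        parent = entry["_meta"]["topic"]
        for rel in entry["_meta"].get("relationships", []):
            child = by_topic.get(rel["topic"])
            if child is None:
                problems.append(f"{parent}: unknown child topic {rel['topic']!r}")
                continue
            # The parent must define the field whose value is handed down.
            if rel["parent_field"] not in entry:
                problems.append(f"{parent}: missing parent_field {rel['parent_field']!r}")
            # The child must define the field that receives the value.
            if rel["child_field"] not in child:
                problems.append(f"{parent}: {rel['topic']} has no field {rel['child_field']!r}")
    return problems


schema = json.loads("""
[
  {"_meta": {"topic": "users", "key": "id",
             "relationships": [{"topic": "posts", "parent_field": "id",
                                "child_field": "user_id", "records_per": 2}]},
   "id": "faker.datatype.number(100)"},
  {"_meta": {"topic": "posts", "key": "id"},
   "id": "faker.datatype.number(1000)",
   "user_id": "faker.datatype.number(100)"}
]
""")
print(check_relationships(schema))
```

To check the real file, you could load `examples/blog/blog.json` with `json.load` and pass the result to the same function.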