Loading JSON and AVRO data from Confluent Cloud Kafka into StarRocks #22791
This tutorial describes how to load AVRO and JSON data from Confluent Cloud into StarRocks. It uses StarRocks' Routine Load, NOT the Kafka Sink Connector.
Sept 2023 update: StarRocks released a StarRocks Kafka Connector. See https://docs.starrocks.io/en-us/latest/loading/Kafka-connector-starrocks for details.
Prerequisites
For this tutorial you need:
- A StarRocks or CelerData database cluster (setting one up is out of scope for this tutorial).
- A Confluent Cloud cluster (also out of scope for this tutorial).
Create a Kafka topic
You can use the Confluent Cloud UI to create the topic, or create an API key, log in with the Confluent CLI, and then run
confluent kafka topic create quickstart
[JSON] Generate test data in the Kafka topic
Create a Confluent Datagen source connector to generate sample clickstream data. We are not using Kafka Schema Registry here.
Sample JSON data will look like this:
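The exact records depend on the connector configuration, but a clickstream record from the Datagen quickstart has roughly this shape (field names follow the clickstream schema; the values here are illustrative):

```json
{
  "ip": "111.245.174.248",
  "userid": 36,
  "remote_user": "-",
  "time": "21491",
  "_time": 21491,
  "request": "GET /images/track.png HTTP/1.1",
  "status": "302",
  "bytes": "1289",
  "referrer": "-",
  "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0"
}
```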
[JSON] Create a Kafka client
Create a C/C++ client connection. StarRocks will connect as if it were the C/C++ client, and the connection page gives you the settings you need to supply on the StarRocks side. Make sure you also create a Kafka cluster API key; those credentials are what StarRocks uses to authenticate to Confluent Cloud. Save all of the configuration data, as you will need it in later steps.
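The client-connection page shows settings along these lines (a sketch; the broker host and credentials below are placeholders for your own values):

```
bootstrap.servers=pkc-xxxxx.<region>.<provider>.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<KAFKA_API_KEY>
sasl.password=<KAFKA_API_SECRET>
```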
[JSON] Create a database and a table, and query the data.
Create the database.
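A minimal sketch; the database name clickstream_db is just an example:

```sql
CREATE DATABASE clickstream_db;
USE clickstream_db;
```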
Create the aggregate table.
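A sketch of an aggregate table shaped around the sample record above; the table name, key columns, and types are assumptions, and bytes is summed on ingest:

```sql
CREATE TABLE clickstream (
    userid  INT,
    ip      VARCHAR(64),
    request VARCHAR(256),
    status  INT,
    -- value column: rows with the same key are merged and bytes is summed
    bytes   BIGINT SUM DEFAULT "0"
)
AGGREGATE KEY(userid, ip, request, status)
DISTRIBUTED BY HASH(userid) BUCKETS 3;
```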
Now load the data.
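A sketch of the Routine Load job, assuming the table above, the quickstart topic created earlier, and the Kafka API key from the client-connection step; replace the broker host and credentials with the values you saved:

```sql
CREATE ROUTINE LOAD clickstream_db.clickstream_load ON clickstream
COLUMNS (userid, ip, request, status, bytes)
PROPERTIES (
    "format" = "json",
    -- map fields in each JSON message to the table columns
    "jsonpaths" = "[\"$.userid\",\"$.ip\",\"$.request\",\"$.status\",\"$.bytes\"]"
)
FROM KAFKA (
    -- the same value Confluent calls bootstrap.servers
    "kafka_broker_list" = "pkc-xxxxx.<region>.<provider>.confluent.cloud:9092",
    "kafka_topic" = "quickstart",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING",
    "property.security.protocol" = "SASL_SSL",
    "property.sasl.mechanism" = "PLAIN",
    "property.sasl.username" = "<KAFKA_API_KEY>",
    "property.sasl.password" = "<KAFKA_API_SECRET>"
);
```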
Tip: Routine Load's "kafka_broker_list" setting takes the same value Confluent calls "bootstrap.servers".
Check "routine load" job status by typing in
In the output, confirm that the State column shows RUNNING and that the Statistic field reports rows being consumed without errors.
Once the data has been loaded, you can do a count.
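For example:

```sql
SELECT COUNT(*) FROM clickstream_db.clickstream;
```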
The count should keep growing as the Datagen connector continues to produce records.
[AVRO] Generate test data in the Kafka topic
You must enable Kafka Schema Registry to use AVRO. Create a Confluent Datagen source connector to generate sample order data.
Sample AVRO data will look like this:
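Field names follow the Datagen orders schema; shown here decoded to JSON, with illustrative values:

```json
{
  "ordertime": 1497014222380,
  "orderid": 18,
  "itemid": "Item_184",
  "orderunits": 9.394,
  "address": {
    "city": "City_61",
    "state": "State_73",
    "zipcode": 94041
  }
}
```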
[AVRO] Create a Kafka client
Create a C/C++ client connection. StarRocks will connect as if it were the C/C++ client, and the connection page gives you the settings you need to supply on the StarRocks side. Make sure you also create a Kafka cluster API key AND a Kafka Schema Registry API key; those credentials are what StarRocks uses to authenticate to Confluent Cloud. Save all of the configuration data, as you will need it in later steps.
Tip: The C/C++ client connection page doesn't show the Schema Registry URI or offer the dialog for creating a Schema Registry API key. The Python client connection page shows that information and includes the dialog for creating your Schema Registry key.
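Taken together, the settings to collect look like this sketch (broker host, Schema Registry endpoint, and all credentials are placeholders):

```
bootstrap.servers=pkc-xxxxx.<region>.<provider>.confluent.cloud:9092
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username=<KAFKA_API_KEY>
sasl.password=<KAFKA_API_SECRET>
schema.registry.url=https://psrc-xxxxx.<region>.<provider>.confluent.cloud
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<SR_API_KEY>:<SR_API_SECRET>
```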
[AVRO] Create a database and a table, and query the data.
Create the database.
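A minimal sketch; orders_db is just an example name:

```sql
CREATE DATABASE orders_db;
USE orders_db;
```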
Create the aggregate table.
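A sketch shaped around the sample order record above; the table name, key columns, and types are assumptions, and orderunits is summed on ingest:

```sql
CREATE TABLE orders (
    orderid    BIGINT,
    itemid     VARCHAR(32),
    city       VARCHAR(64),
    -- value column: orderunits is summed for rows with the same key
    orderunits DOUBLE SUM DEFAULT "0"
)
AGGREGATE KEY(orderid, itemid, city)
DISTRIBUTED BY HASH(orderid) BUCKETS 3;
```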
Now load the data.
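A sketch of the Routine Load job for Avro. It assumes the table above and a topic named orders (substitute whatever topic your Datagen connector writes to); note that the Schema Registry API key and secret are embedded in the confluent.schema.registry.url value:

```sql
CREATE ROUTINE LOAD orders_db.orders_load ON orders
COLUMNS (orderid, itemid, city, orderunits)
PROPERTIES (
    "format" = "avro",
    -- pick fields out of the Avro record, including the nested address.city
    "jsonpaths" = "[\"$.orderid\",\"$.itemid\",\"$.address.city\",\"$.orderunits\"]"
)
FROM KAFKA (
    -- the same value Confluent calls bootstrap.servers
    "kafka_broker_list" = "pkc-xxxxx.<region>.<provider>.confluent.cloud:9092",
    "kafka_topic" = "orders",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING",
    -- Schema Registry credentials go inside the URL
    "confluent.schema.registry.url" = "https://<SR_API_KEY>:<SR_API_SECRET>@psrc-xxxxx.<region>.<provider>.confluent.cloud",
    "property.security.protocol" = "SASL_SSL",
    "property.sasl.mechanism" = "PLAIN",
    "property.sasl.username" = "<KAFKA_API_KEY>",
    "property.sasl.password" = "<KAFKA_API_SECRET>"
);
```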
Tip: Routine Load's "kafka_broker_list" setting takes the same value Confluent calls "bootstrap.servers".
Check "routine load" job status by typing in
In the output, confirm that the State column shows RUNNING and that the Statistic field reports rows being consumed without errors.
Once the data has been loaded, you can do a count.
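For example:

```sql
SELECT COUNT(*) FROM orders_db.orders;
```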
The count should keep growing as the Datagen connector continues to produce records.