Twitter to Kafka pipeline
This tutorial shows how to set up a very basic pipeline which simply reads content from a source and copies it into a destination (here: from the Twitter Streaming API to a Kafka topic).
To learn more about the configuration format, please read the associated documentation on that topic.
We assume that you have a standalone processing node up and running. If not, see the build and deployment instructions for guidance.
The pipeline set up in this use case is named twitter-to-kafka.
As this pipeline serves as a bridge between a source and a sink, it has only two components, which need a single queue to exchange data. The queue used in this case is named twitter-content, after its contents.
The pipeline configures two components: a source and an emitter. The source component establishes a connection to the Twitter Streaming API using the spqr-twitter source. All data is handed over to the Kafka emitter, which exports the contents using the spqr-kafka emitter.
To bring the pipeline to life, we assume that you have a Kafka instance available somewhere and that a processing node is running in standalone mode (to keep things simple).
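If the target topic does not exist yet (and your broker does not auto-create topics), you can create it up front. A minimal sketch using Kafka's bundled CLI, assuming a 0.8.x-era broker with ZooKeeper reachable at localhost:2181 (matching the zookeeperConnect setting used later in the emitter configuration):

# Create the "twitter" topic with a single partition and no replication
# (sufficient for a local, single-broker test setup).
bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --replication-factor 1 \
  --partitions 1 \
  --topic twitter

The partition and replication values here are illustrative; for anything beyond a local test you would size them to your cluster.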
As the processing node comes with a set of pre-installed components, all you have to do is POST the configuration shown below (assumed to live in the file twitter-to-kafka.json) to the processing node running on port 7070 (localhost):
curl -H "Content-Type: application/json" -X POST -d @twitter-to-kafka.json http://localhost:7070/pipelines/
The response should look like the following, and the Kafka topic should start receiving incoming data:
{
  "state" : "OK",
  "msg" : "",
  "pid" : "twitter-to-kafka"
}
Contents of twitter-to-kafka.json:
{
  "id" : "twitter-to-kafka",
  "queues" : [ {
    "id" : "twitter-content",
    "queueSettings" : null
  } ],
  "components" : [ {
    "id" : "twitter-stream-reader",
    "type" : "SOURCE",
    "name" : "twitterSource",
    "version" : "0.0.1",
    "settings" : {
      "twitter.consumer.key" : "<your_consumer_key>",
      "twitter.consumer.secret" : "<your_consumer_secret>",
      "twitter.token.key" : "<your_token_key>",
      "twitter.token.secret" : "<your_token_secret>",
      "twitter.tweet.terms" : "fifa,uefa,soccer"
    },
    "fromQueue" : "",
    "toQueue" : "twitter-content"
  }, {
    "id" : "kafka-topic-emitter",
    "type" : "EMITTER",
    "name" : "kafkaEmitter",
    "version" : "0.0.1",
    "settings" : {
      "clientId" : "twitterToKafka",
      "topic" : "twitter",
      "metadataBrokerList" : "localhost:9092",
      "zookeeperConnect" : "localhost:2181",
      "messageAcking" : "false",
      "charset" : "UTF-8"
    },
    "fromQueue" : "twitter-content",
    "toQueue" : ""
  } ]
}
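To check that tweets are actually arriving in the topic, you can attach a consumer to it. A sketch using Kafka's bundled console consumer, assuming the same 0.8.x-era setup with ZooKeeper at localhost:2181 and the topic name "twitter" from the emitter settings above:

# Read messages from the "twitter" topic from the beginning;
# each matching tweet emitted by the pipeline is printed as one line.
bin/kafka-console-consumer.sh \
  --zookeeper localhost:2181 \
  --topic twitter \
  --from-beginning

Once the pipeline is deployed and the tracked terms produce traffic, tweet payloads should scroll by here within a few seconds.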
SPQR - stream processing and querying in realtime by Otto Group