
MrTweety Analytic


Analyses tweets in real time to figure out the most popular hashtags.

Uses Apache Spark and the Twitter Hosebird Client to compute the top 5 hashtags in a 15-minute sliding window. The result is stored in a JSON file, which a small JavaScript app uses to update a simple website.

See the site live at feud.eu.

Docker

Building

The app is designed to run as a single Docker container.

To build all the jars and generate the Dockerfile, use:

./gradlew build

Then build the Docker image:

docker build -t mrtweety-analytic .

Running

To run the Docker container, use your favourite Docker tool, or simply:

docker run --name mrtweety-analytic -d \
    -p 9090:80 \
    -e TWITTER_CONSUMER_KEY=xxx \
    -e TWITTER_CONSUMER_SECRET=xxx \
    -e TWITTER_ACCESS_TOKEN=xxx \
    -e TWITTER_ACCESS_TOKEN_SECRET=xxx \
    mrtweety-analytic

Replace xxx with your Twitter API credentials. Then point your browser to localhost:9090 to see the result.

The Docker container packages together the Kafka producer, the Spark app, and the JavaScript website.

About

Kafka producer

The producer uses the Twitter Hosebird Client (HBC) to take tweets from Twitter's streaming API and publish them to the Kafka messaging system.
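
Below is a minimal sketch of what this HBC-to-Kafka pipe can look like, not the module's actual code: the "tweet" topic name and the TWITTER_* environment variables come from this README, while the class name, the sample endpoint, and the broker address are illustrative assumptions.

// Minimal sketch of the HBC-to-Kafka pipe (not the module's actual code).
// The "tweet" topic matches the console-consumer command below; the class
// name, endpoint choice and broker address are assumed for illustration.
import com.twitter.hbc.ClientBuilder;
import com.twitter.hbc.core.Constants;
import com.twitter.hbc.core.endpoint.StatusesSampleEndpoint;
import com.twitter.hbc.core.processor.StringDelimitedProcessor;
import com.twitter.hbc.httpclient.BasicClient;
import com.twitter.hbc.httpclient.auth.OAuth1;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class TweetProducer {                       // hypothetical class name
    public static void main(String[] args) throws InterruptedException {
        // HBC delivers raw tweet JSON strings into this queue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(10000);
        OAuth1 auth = new OAuth1(
                System.getenv("TWITTER_CONSUMER_KEY"),
                System.getenv("TWITTER_CONSUMER_SECRET"),
                System.getenv("TWITTER_ACCESS_TOKEN"),
                System.getenv("TWITTER_ACCESS_TOKEN_SECRET"));
        BasicClient client = new ClientBuilder()
                .hosts(Constants.STREAM_HOST)
                .endpoint(new StatusesSampleEndpoint())
                .authentication(auth)
                .processor(new StringDelimitedProcessor(queue))
                .build();
        client.connect();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (!client.isDone()) {
                String tweetJson = queue.take();   // blocks until a tweet arrives
                producer.send(new ProducerRecord<>("tweet", tweetJson));
            }
        } finally {
            client.stop();
        }
    }
}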

For Kafka to run, you first need to start ZooKeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties

then start Kafka itself:

bin/kafka-server-start.sh config/server.properties

The following command is useful for inspecting messages passing through Kafka in the console:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic tweet

Spark app

A single-node Spark app that receives data from Kafka and processes it in real time using Spark Streaming. The processed result is stored in a JSON file every 10 seconds.
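
For illustration, here is a minimal sketch of such a windowed top-hashtag computation, not the actual SparkApplication: it assumes the spark-streaming-kafka-0-10 integration, a broker at localhost:9092, and naive whitespace tokenization of the raw tweet JSON; serializing the top 5 to the JSON file is elided.

// Minimal sketch of a windowed hashtag count (not the actual SparkApplication).
// Assumes spark-streaming-kafka-0-10 and a broker at localhost:9092; tokenizes
// the raw tweet JSON by whitespace for brevity instead of parsing it properly.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;
import scala.Tuple2;

import java.util.*;

public class HashtagTopN {                          // hypothetical class name
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("mrtweety-analytic");
        // 10-second batches match the 10-second result updates described above.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "mrtweety");

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("tweet"), kafkaParams));

        // Count hashtags over a 15-minute window, sliding every 10 seconds.
        JavaPairDStream<String, Integer> counts = stream
                .flatMap(record -> Arrays.asList(record.value().split("\\s+")).iterator())
                .filter(word -> word.startsWith("#"))
                .mapToPair(tag -> new Tuple2<>(tag, 1))
                .reduceByKeyAndWindow(Integer::sum, Durations.minutes(15), Durations.seconds(10));

        // Take the top 5 per batch; writing them to the JSON file is elided here.
        counts.foreachRDD(rdd -> {
            List<Tuple2<Integer, String>> top =
                    rdd.mapToPair(Tuple2::swap).sortByKey(false).take(5);
            System.out.println(top);
        });

        jssc.start();
        jssc.awaitTermination();
    }
}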

Build a "fat jar" like this:

./gradlew shadowJar

Optionally, you can change the default path of the result JSON file by setting the RESULT_FILENAME environment variable, e.g.:

export RESULT_FILENAME=/tmp/analytic.json
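
Inside the app, the variable can be read in the usual way. A hypothetical sketch (the actual default path used by SparkApplication is not documented here):

// Hypothetical sketch of resolving the result path; "analytic.json" is an
// illustrative fallback, not the app's documented default.
String resultFilename = System.getenv("RESULT_FILENAME");
if (resultFilename == null) {
    resultFilename = "analytic.json";
}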

Finally, submit the "fat jar" to Spark for execution:

spark-submit --class "cz.zee.mrtweety.analytic.spark.SparkApplication" --master 'local[4]' spark/build/libs/spark-all.jar

© 2017 Jakub Horák