datapipeline

This project sets up and configures a data pipeline built from Kafka, Python, and MongoDB.

Set up Kafka

Download the tarball file

http://ftp.heanet.ie/mirrors/www.apache.org/dist/kafka/0.9.0.1/kafka-0.9.0.1-src.tgz

Create the log directories

/usr/local/kafka/logs/kafka_logs_1

Adjust the configuration file

/usr/local/kafka/config/server1.properties
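The settings that usually need changing per broker are the broker id, the listener address, and the log directory. A minimal sketch of what server1.properties might contain, assuming it starts as a copy of the stock config/server.properties; the id and port values are illustrative assumptions, not taken from the project:

```properties
# Must be unique for every broker in the cluster (value is an assumption)
broker.id=1
# Address the broker listens on (port is an assumption; 9092 is the stock default)
listeners=PLAINTEXT://:9092
# The log directory created in the previous step
log.dirs=/usr/local/kafka/logs/kafka_logs_1
# ZooKeeper instance started from config/zookeeper.properties
zookeeper.connect=localhost:2181
```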

Launch ZooKeeper

bin/zookeeper-server-start.sh config/zookeeper.properties &

Launch the Kafka server

bin/kafka-server-start.sh config/server1.properties &

Create a Kafka topic (a replication factor of 2 requires at least two running brokers)

bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 2 --replication-factor 2 --topic first

Write Python Producer and Consumer
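As a rough illustration of the producer side, here is a minimal sketch. It assumes the kafka-python client, a broker on localhost:9092, and JSON-encoded messages; none of these are confirmed by this README, and the helper names `encode_event` and `publish` are hypothetical. It is not the contents of kafkaProducer.py.

```python
import json


def encode_event(event):
    """Serialize one event dict to UTF-8 JSON bytes for the Kafka message value."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish(producer, topic, events):
    """Send each event to `topic` and return how many were sent.

    `producer` only needs `send(topic, value)` and `flush()`, which
    kafka-python's KafkaProducer provides.
    """
    count = 0
    for event in events:
        producer.send(topic, encode_event(event))
        count += 1
    producer.flush()  # block until all buffered messages have been sent
    return count


def main():
    # Assumptions: kafka-python is installed (pip install kafka-python) and a
    # broker listens on localhost:9092. "first" is the topic created earlier.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    try:
        events = [{"id": i, "text": "hello"} for i in range(3)]
        print("sent", publish(producer, "first", events), "messages")
    finally:
        producer.close()
```

Calling `main()` with the broker running should publish three small JSON messages to the `first` topic. Keeping `publish` independent of the client class makes it easy to exercise with a stand-in producer before a broker is available.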

Set up MongoDB

mongod --config /etc/conf/datapipeline.conf
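Once mongod is running, a consumer can read from the Kafka topic and store each message as a document. The sketch below assumes kafka-python and pymongo, JSON message values, mongod on the default port 27017 (the port set in datapipeline.conf may differ), and made-up database, collection, and consumer-group names. The helpers `decode_record` and `drain` are hypothetical, and this is not the contents of kafkaConsumer.py.

```python
import json


def decode_record(record):
    """Turn one Kafka record into a MongoDB document.

    Keeps the topic, partition, and offset so each stored document can be
    traced back to the message it came from.
    """
    doc = json.loads(record.value.decode("utf-8"))
    if not isinstance(doc, dict):
        doc = {"value": doc}  # MongoDB documents must be mappings
    doc["_kafka"] = {
        "topic": record.topic,
        "partition": record.partition,
        "offset": record.offset,
    }
    return doc


def drain(records, collection):
    """Insert every record from `records` into `collection`; return the count."""
    count = 0
    for record in records:
        collection.insert_one(decode_record(record))
        count += 1
    return count


def main():
    # Assumptions: kafka-python and pymongo are installed, the broker is on
    # localhost:9092, mongod is on localhost:27017, and the database,
    # collection, and group names are invented for illustration.
    from kafka import KafkaConsumer
    from pymongo import MongoClient

    collection = MongoClient("mongodb://localhost:27017")["datapipeline"]["messages"]
    consumer = KafkaConsumer(
        "first",
        bootstrap_servers="localhost:9092",
        group_id="datapipeline-sink",
        auto_offset_reset="earliest",  # start from the oldest message on first run
    )
    drain(consumer, collection)  # iterating the consumer blocks until interrupted
```

Calling `main()` keeps polling the `first` topic and inserting documents until the process is stopped. Storing the Kafka coordinates under `_kafka` is one possible design choice that makes duplicate deliveries easy to spot later.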
