- Programming Language - Python
- Apache Kafka
- Amazon Web Services (AWS)
  - EC2 - host the Kafka server
  - S3 (Simple Storage Service) - store the data
  - Glue Crawler - create the schema for the data in S3
  - Glue Catalog - store the schema for the data in S3
  - Athena - query the data in S3
Using conda
Create Conda environment
conda create --name env_name python=3.8
Activate the environment
conda activate env_name
Install requirements
pip install -r requirements.txt
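
For reference, a minimal `requirements.txt` for this stack might look like the sketch below; the package names are assumptions based on the tools listed above, and the repository's own file is authoritative.

```text
# Assumed contents -- see the repository's requirements.txt for the exact list
kafka-python
boto3
pandas
```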
To reproduce this project 👇
- Log in to the AWS console
- Create an EC2 instance
- Edit the inbound rules to add a custom rule with the source set to My IP
- SSH into the EC2 instance
- View the Kafka commands for installing and setting up Kafka
- Configure the AWS account with the AWS CLI and `aws configure` (a quick boto3 credential check is sketched after this list)
- Download the CSV file with the credentials
- Create an S3 bucket
- Run the Kafka producer and consumer notebooks to simulate streaming (producer and consumer sketches follow this list)
- Create a crawler in AWS Glue (a scripted boto3 equivalent is also sketched after this list)
  - Choose the S3 data source
  - Create a role in IAM and attach AdministratorAccess
  - Create a new database
  - Run the crawler
- In Athena (a boto3 query sketch follows this list)
  - Create a new S3 bucket to store the query output
- Use a real-time API
- Create a Python program that runs the publishing and subscribing in the terminal, like the producer and consumer sketches below
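
For the `aws configure` step above, a quick way to confirm that the downloaded credentials are visible to Python is an STS identity call; this is a minimal check, not part of the project code:

```python
import boto3

# Prints the AWS account and IAM identity that boto3 resolves from `aws configure`.
print(boto3.client("sts").get_caller_identity())
```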
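
The producer notebook can be simulated with a short script. The sketch below assumes the `kafka-python` package, a topic named `stock_market`, and a stock data CSV file; all names are placeholders, so use your EC2 instance's public IP and your own topic and file names:

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

# Placeholder values -- replace with your EC2 public IP, topic, and CSV file.
BOOTSTRAP_SERVER = "<EC2_PUBLIC_IP>:9092"
TOPIC = "stock_market"

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("stock_data.csv")

# Replay random rows from the CSV to imitate a live market feed.
while True:
    record = df.sample(1).to_dict(orient="records")[0]
    producer.send(TOPIC, value=record)
    print("published:", record)  # show the published record in the terminal
    time.sleep(1)  # throttle the stream so the output stays readable
```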
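
The consumer side prints each record to the terminal and writes it to the S3 bucket so that Glue and Athena can read it later; again a sketch under assumptions (`kafka-python`, `boto3`, credentials from `aws configure`, and a placeholder bucket name):

```python
import json

import boto3
from kafka import KafkaConsumer

# Placeholder values -- match the producer's broker/topic and your S3 bucket.
BOOTSTRAP_SERVER = "<EC2_PUBLIC_IP>:9092"
TOPIC = "stock_market"
BUCKET = "my-stock-market-bucket"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP_SERVER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

s3 = boto3.client("s3")  # picks up the credentials set with `aws configure`

for count, message in enumerate(consumer):
    print("received:", message.value)  # show the subscribed record in the terminal
    # Write each record as its own JSON object so the Glue crawler can infer a schema.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"stock_market_{count}.json",
        Body=json.dumps(message.value),
    )
```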
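
The Glue crawler steps are done in the console, but the same crawler can also be created programmatically; a sketch with boto3, where the crawler, role, database, and bucket names are placeholders for the ones created in the steps above:

```python
import boto3

glue = boto3.client("glue")

# Placeholder names -- use the IAM role, database, and bucket created above.
glue.create_crawler(
    Name="stock-market-crawler",
    Role="arn:aws:iam::<ACCOUNT_ID>:role/<GLUE_ROLE>",
    DatabaseName="stock_market_db",
    Targets={"S3Targets": [{"Path": "s3://my-stock-market-bucket/"}]},
)

# Run the crawler so it writes the inferred schema to the Glue Catalog.
glue.start_crawler(Name="stock-market-crawler")
```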
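
Once the crawler has populated the Glue Catalog, the data can be queried from the Athena console or from Python; the sketch below uses boto3 with placeholder database, table, and results-bucket names:

```python
import time

import boto3

athena = boto3.client("athena")

# Placeholder names -- match the Glue database/table and the query-results bucket.
started = athena.start_query_execution(
    QueryString="SELECT * FROM stock_market LIMIT 10",
    QueryExecutionContext={"Database": "stock_market_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
query_id = started["QueryExecutionId"]

# Poll until the query finishes, then print the returned rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```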