- Programming Language - Python
- Apache Kafka
- Amazon Web Services (AWS)
  - EC2 - host the Kafka server
  - S3 (Simple Storage Service) - store the data
  - Glue Crawler - create the schema for the data in S3
  - Glue Catalog - store the schema for the data in S3
  - Athena - query the data in S3
Using conda
Create Conda environment
conda create --name env_name python=3.8
Activate the environment
conda activate env_name
Install requirements
pip install -r requirements.txt
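
For reference, a minimal `requirements.txt` for this stack might look like the sketch below; the package names are assumptions based on the tools listed above, and the repository's own file is authoritative.

```text
# Assumed contents -- see the repository's requirements.txt for the exact list
kafka-python
boto3
pandas
```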
To reproduce this project 👇
- Log in to the AWS console
- Create an EC2 instance
- Edit the inbound rules to add a custom rule with the source set to My IP
- SSH into the EC2 instance
- View the Kafka commands for installing and setting up Kafka
- Configure the AWS account with the AWS CLI and `aws configure` (a quick boto3 credential check is sketched after this list)
- Download the CSV file with the credentials
- Create an S3 bucket
- Run the Kafka producer and consumer notebooks to simulate streaming (producer and consumer sketches follow this list)
- Create a crawler in AWS Glue (a scripted boto3 equivalent is also sketched after this list)
  - Choose the S3 data source
  - Create a role in IAM and attach AdministratorAccess
  - Create a new database
  - Run the crawler
- In Athena (a boto3 query sketch follows this list)
  - Create a new S3 bucket to store the query output
- Use a real-time API
- Create a Python program that runs the publishing and subscribing in the terminal, like the producer and consumer sketches below
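
For the `aws configure` step above, a quick way to confirm that the downloaded credentials are visible to Python is an STS identity call; this is a minimal check, not part of the project code:

```python
import boto3

# Prints the AWS account and IAM identity that boto3 resolves from `aws configure`.
print(boto3.client("sts").get_caller_identity())
```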
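
The producer notebook can be simulated with a short script. The sketch below assumes the `kafka-python` package, a topic named `stock_market`, and a stock data CSV file; all names are placeholders, so use your EC2 instance's public IP and your own topic and file names:

```python
import json
import time

import pandas as pd
from kafka import KafkaProducer

# Placeholder values -- replace with your EC2 public IP, topic, and CSV file.
BOOTSTRAP_SERVER = "<EC2_PUBLIC_IP>:9092"
TOPIC = "stock_market"

producer = KafkaProducer(
    bootstrap_servers=BOOTSTRAP_SERVER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("stock_data.csv")

# Replay random rows from the CSV to imitate a live market feed.
while True:
    record = df.sample(1).to_dict(orient="records")[0]
    producer.send(TOPIC, value=record)
    print("published:", record)  # show the published record in the terminal
    time.sleep(1)  # throttle the stream so the output stays readable
```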
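
The consumer side prints each record to the terminal and writes it to the S3 bucket so that Glue and Athena can read it later; again a sketch under assumptions (`kafka-python`, `boto3`, credentials from `aws configure`, and a placeholder bucket name):

```python
import json

import boto3
from kafka import KafkaConsumer

# Placeholder values -- match the producer's broker/topic and your S3 bucket.
BOOTSTRAP_SERVER = "<EC2_PUBLIC_IP>:9092"
TOPIC = "stock_market"
BUCKET = "my-stock-market-bucket"

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP_SERVER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

s3 = boto3.client("s3")  # picks up the credentials set with `aws configure`

for count, message in enumerate(consumer):
    print("received:", message.value)  # show the subscribed record in the terminal
    # Write each record as its own JSON object so the Glue crawler can infer a schema.
    s3.put_object(
        Bucket=BUCKET,
        Key=f"stock_market_{count}.json",
        Body=json.dumps(message.value),
    )
```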
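
The Glue crawler steps are done in the console, but the same crawler can also be created programmatically; a sketch with boto3, where the crawler, role, database, and bucket names are placeholders for the ones created in the steps above:

```python
import boto3

glue = boto3.client("glue")

# Placeholder names -- use the IAM role, database, and bucket created above.
glue.create_crawler(
    Name="stock-market-crawler",
    Role="arn:aws:iam::<ACCOUNT_ID>:role/<GLUE_ROLE>",
    DatabaseName="stock_market_db",
    Targets={"S3Targets": [{"Path": "s3://my-stock-market-bucket/"}]},
)

# Run the crawler so it writes the inferred schema to the Glue Catalog.
glue.start_crawler(Name="stock-market-crawler")
```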
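
Once the crawler has populated the Glue Catalog, the data can be queried from the Athena console or from Python; the sketch below uses boto3 with placeholder database, table, and results-bucket names:

```python
import time

import boto3

athena = boto3.client("athena")

# Placeholder names -- match the Glue database/table and the query-results bucket.
started = athena.start_query_execution(
    QueryString="SELECT * FROM stock_market LIMIT 10",
    QueryExecutionContext={"Database": "stock_market_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
)
query_id = started["QueryExecutionId"]

# Poll until the query finishes, then print the returned rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```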