The goal of this project was to securely ingest raw data from Twitter via the Twitter API, then transform and analyze it through an ETL process.
This data pipeline is designed to run daily. The Airflow DAG triggers the Python script (twitter_etl.py), ensuring that the latest data from the Twitter API is fetched regularly.
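A minimal sketch of what the daily DAG could look like is shown below. The DAG id, `default_args`, and the `run_twitter_etl` entry point imported from twitter_etl.py are assumptions for illustration, not the project's exact definitions.

```python
# Hypothetical daily DAG wrapping twitter_etl.py (names are illustrative).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

from twitter_etl import run_twitter_etl  # assumed entry point in twitter_etl.py

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="twitter_etl_dag",             # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",           # run once per day
    catchup=False,
) as dag:
    run_etl = PythonOperator(
        task_id="run_twitter_etl",
        python_callable=run_twitter_etl,  # fetches tweets and runs the ETL
    )
```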
Upon successful extraction, the ETL process is triggered for the specified Twitter user. The data is transformed from JSON to CSV using a Pandas DataFrame, and the resulting CSV object is uploaded to the data lake (AWS S3).
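The sketch below illustrates this transform-and-load step under a few assumptions: the tweet fields, the helper name `transform_and_load`, and the bucket name `my-twitter-data-lake` are hypothetical, and writing to an `s3://` path with Pandas requires the `s3fs` package.

```python
# Illustrative transform/load step: flatten tweet JSON, write CSV to S3.
import pandas as pd

def transform_and_load(tweets: list[dict], bucket: str = "my-twitter-data-lake") -> None:
    # Flatten the raw JSON into the columns used for analysis (field names assumed).
    records = [
        {
            "user": t["user"]["screen_name"],
            "text": t["text"],
            "favorite_count": t["favorite_count"],
            "retweet_count": t["retweet_count"],
            "created_at": t["created_at"],
        }
        for t in tweets
    ]
    df = pd.DataFrame(records)
    # Upload the CSV object to the S3 data lake (requires s3fs).
    df.to_csv(f"s3://{bucket}/refined_tweets.csv", index=False)
```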
The transformed data can then be accessed with a visualization tool such as Tableau, QuickSight, Power BI, or Superset to build dashboards for various types of analysis.
The purpose of this project was to use the Twitter API, AWS services, and Airflow to create an automated data pipeline that efficiently processes user/tweet data and makes it available for analysis. It showcases the versatility of AWS services in building robust, automated data engineering solutions.