
This project uses the Twitter API to ingest data on users' tweets for sentiment analysis. For this project I used various AWS services, Python, and Apache Airflow.


claydoers/twitter-analysis-project


Twitter Data Pipeline Project

Overview

The goal of this project was to securely ingest, streamline, and analyze raw data from Twitter using the Twitter API, and to further transform the data using an ETL process.

Goals

  • Data Ingestion
  • ETL
  • Data Lake
  • Automation
  • Cloud Processing

Tools used

  • Twitter API - How the data is accessed
  • Amazon S3 - Data lake/Object storage
  • Amazon EC2 - Cloud computing service that runs the code so it isn't processed locally.
  • Apache Airflow - Workflow orchestration.
  • Python/Pandas - ETL/data transformation

Simple Architectural Diagram

[Image: architecture diagram]

Automated Pipeline

This data pipeline is designed to run daily. The Airflow DAG triggers the Python script (twitter_etl.py), which ensures that the latest data from the Twitter API is fetched on a regular schedule.
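
The daily trigger described above could be declared roughly as follows. This is a minimal sketch, not the project's actual DAG file. It assumes Airflow 2.4 or later and that twitter_etl.py exposes a callable named `run_twitter_etl`. That function name, the DAG id, the retry settings, and the start date are all hypothetical placeholders.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumption: twitter_etl.py defines a function called run_twitter_etl.
from twitter_etl import run_twitter_etl

# Illustrative defaults; the retry policy is not taken from the project.
default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="twitter_etl_daily",       # hypothetical DAG name
    default_args=default_args,
    start_date=datetime(2024, 1, 1),  # placeholder start date
    schedule="@daily",                # run once per day
    catchup=False,                    # do not backfill missed days
) as dag:
    # Single task that calls the ETL function from twitter_etl.py.
    run_etl = PythonOperator(
        task_id="run_twitter_etl",
        python_callable=run_twitter_etl,
    )
```

Setting `catchup=False` keeps Airflow from scheduling a run for every day between the start date and today when the DAG is first enabled.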

Upon successful extraction, the ETL process runs for the specified Twitter user. The data is transformed from JSON to CSV using a Pandas DataFrame, and the resulting CSV object is uploaded to the data lake (Amazon S3).
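
The JSON-to-CSV step might look something like the sketch below. It is illustrative only. The field names assume Twitter API v1.1-style tweet objects, and the helper names `tweets_to_csv` and `upload_csv`, the selected columns, and the bucket and key arguments are hypothetical rather than taken from twitter_etl.py.

```python
import pandas as pd

# Assumed column set; the project's script may keep different fields.
COLUMNS = ["user", "text", "favorite_count", "retweet_count", "created_at"]


def tweets_to_csv(raw_tweets):
    """Flatten tweet JSON records into a DataFrame and return CSV text."""
    rows = [
        {
            "user": tweet["user"]["screen_name"],
            "text": tweet["text"],
            "favorite_count": tweet.get("favorite_count", 0),
            "retweet_count": tweet.get("retweet_count", 0),
            "created_at": tweet["created_at"],
        }
        for tweet in raw_tweets
    ]
    # Passing columns explicitly keeps the header stable for empty input.
    df = pd.DataFrame(rows, columns=COLUMNS)
    # With no path argument, to_csv returns the CSV as a string.
    return df.to_csv(index=False)


def upload_csv(csv_text, bucket, key):
    """Upload CSV text to S3 as a single object."""
    # Imported here so the transform above can run without AWS dependencies.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
```

In use, the two steps would be chained, for example `upload_csv(tweets_to_csv(tweets), "my-bucket", "tweets.csv")`, where the bucket and key are placeholders and AWS credentials are assumed to be available on the EC2 instance.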

The transformed data can then be accessed with a visualization tool such as Tableau, QuickSight, Power BI, or Superset to build dashboards for analysis.

Conclusion

The purpose of this project was to use the Twitter API, AWS services, and Airflow to create an automated data pipeline that processes user and tweet data and makes it available for analysis. This project showcases the versatility of AWS services in building robust, automated data engineering solutions.
