- Ramon Almeida
- Derek Tak
- Ghazala Rehman
- Subha Vivekanandan
- May Alavi
The client's business is experiencing issues with collating and analysing the data produced at each branch, as their technical setup is limited. The software currently in use only generates reports for single branches, which makes collating data on all branches time-consuming. Because of these limitations, gathering meaningful data on the company as a whole is difficult. The company currently has no way of identifying trends, meaning it is potentially losing out on major revenue streams.
A fully scalable ETL (Extract, Transform, Load) pipeline that handles large volumes of transaction data for the business. The pipeline collects all the transaction data generated by each individual café and places it in a single location.
The client will have three CSV files per branch uploaded to AWS at 8pm every day, which are stored in the Cafe Data S3 bucket. The transformation Lambda is triggered by the S3 event raised when a café file is uploaded to the bucket. The Lambda extracts the CSV files and transforms them, then sends the transformed data to the Transformed Data S3 bucket. As soon as the transformed data is uploaded to the Transformed Data bucket, this constitutes another S3 event, which triggers the load Lambda that loads the data into Redshift. The client can then visualise, query and monitor the data using Grafana and Metabase.
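To make the trigger flow concrete, here is a minimal sketch of the transformation Lambda handler. It is not the project's actual code: the bucket name `cafe-transformed-data` and the cleaning rules are placeholder assumptions, and only the S3-event wiring reflects the design described above.

```python
import csv
import io
import urllib.parse

import boto3  # provided in the AWS Lambda Python runtime

s3 = boto3.client("s3")

TRANSFORMED_BUCKET = "cafe-transformed-data"  # placeholder bucket name


def handler(event, context):
    """Triggered by an S3 event when a café CSV lands in the raw bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 events are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Extract: read the raw CSV from the Cafe Data bucket.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = list(csv.reader(io.StringIO(body)))

        # Transform: illustrative cleaning only -- strip whitespace and
        # drop empty rows (the real rules depend on the café file format).
        cleaned = [[field.strip() for field in row] for row in rows if any(row)]

        out = io.StringIO()
        csv.writer(out).writerows(cleaned)

        # Writing to the Transformed Data bucket raises the second S3
        # event, which in turn triggers the load Lambda.
        s3.put_object(Bucket=TRANSFORMED_BUCKET, Key=key, Body=out.getvalue())
```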
- Lambda: enabled us to apply custom logic to the pipeline and works in real time.
- S3 bucket: a durable and easy-to-navigate storage service which enabled us to keep the entire pipeline on AWS.
- Redshift: a scalable and cost-effective data warehouse, appropriate for this pipeline as the data was continuously growing (see the load sketch after this list).
- EC2: enabled containers running external technologies to be hosted and accessed by all group members.
- Grafana: allows multiple data sources to be brought into one place, which is why we used it for Lambda metrics; however, it was less effective at connecting to Redshift.
- Metabase: an easy-to-navigate, user-friendly tool for visualising the Redshift database using integrated SQL queries.
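The sketch below shows one plausible shape for the load Lambda referenced above. It assumes Redshift is reached over a standard PostgreSQL connection via `psycopg2` (packaged with the function, e.g. as a layer), that connection details live in environment variables, and that the `transactions` table and the IAM role are placeholders; the project's real table layout may differ. It uses Redshift's `COPY` command, the idiomatic bulk-load path from S3.

```python
import os
import urllib.parse

import psycopg2  # must be bundled with the Lambda, e.g. via a layer


def handler(event, context):
    """Triggered by an S3 event when transformed data lands in the bucket."""
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=int(os.environ.get("REDSHIFT_PORT", "5439")),
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        # `with conn` commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            for record in event["Records"]:
                bucket = record["s3"]["bucket"]["name"]
                key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
                # COPY reads the CSV straight from S3 using an attached
                # IAM role, rather than streaming rows through the Lambda.
                cur.execute(
                    f"""
                    COPY transactions
                    FROM 's3://{bucket}/{key}'
                    IAM_ROLE '{os.environ["REDSHIFT_IAM_ROLE"]}'
                    FORMAT AS CSV;
                    """
                )
    finally:
        conn.close()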
- Having to rewrite our structure and code entirely at the end of Sprint 1.
- Not having enough time for consistent and regular code reviews.
- Ensuring all group members got equal exposure to, and practice with, the key elements of the pipeline.
- Using GitHub consistently and as intended for the project.
- Implementing a queue system.
- Using Grafana for data visualisation successfully.
- Optimising the Lambda code.
- Using a test-driven approach.