The world of data engineering is ever-changing, with new tools and technologies emerging regularly. Building an effective analytics platform can be daunting, especially if you're not familiar with the tools available. How do you turn scattered, complex data into a model that drives insights and decision-making? In this project, I built a robust data pipeline for a fictional e-commerce company, applying best practices such as data modeling, testing, documentation, and version control to efficiently extract, load, and transform data into a unified, analytics-ready format. The project uses the following stack:
- BigQuery
- dbt
- Docker
- Airbyte
- Dagster
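Before the setup steps, it may help to see the ELT pattern this stack implements. Below is a minimal, library-free Python sketch of the idea; in the real pipeline Airbyte handles extract/load, dbt runs the transforms inside BigQuery, and Dagster orchestrates the whole flow. All table and column names here are hypothetical.

```python
# Hypothetical "extracted" source rows, as Airbyte would land them in the warehouse.
raw_orders = [
    {"order_id": 1, "amount_cents": 1999, "status": "complete"},
    {"order_id": 2, "amount_cents": 500, "status": "cancelled"},
]

def load(rows):
    """Load step: land raw rows in a staging area (here, just a list)."""
    return list(rows)

def transform(staged):
    """Transform step (dbt-style): filter and reshape into an analytics model."""
    return [
        {"order_id": r["order_id"], "amount_usd": r["amount_cents"] / 100}
        for r in staged
        if r["status"] == "complete"
    ]

orders_model = transform(load(raw_orders))
print(orders_model)  # [{'order_id': 1, 'amount_usd': 19.99}]
```

The key point is the ordering: data is loaded raw first, and all modeling happens afterwards in the warehouse, which is what makes the pipeline testable and re-runnable.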
Ensure you have Python 3 installed. If not, you can download and install it from Python's official website.
- Fork the Repository:
- Click the "Fork" button on the top right corner of this repository.
- Clone the repository:
git clone https://github.com/YOUR_USERNAME/data-engineering.git
- Note: Replace YOUR_USERNAME with your GitHub username
- Navigate to the directory:
cd data-engineering
- Set Up a Virtual Environment:
- For Mac:
python3 -m venv venv
source venv/bin/activate
- For Windows:
python -m venv venv
.\venv\Scripts\activate
- Install Dependencies:
pip install -e ".[dev]"
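The editable install above assumes the repository ships a `pyproject.toml` (or `setup.py`) that defines a `dev` extra. As a rough illustration only, such a file might look like the sketch below; the package name and dependencies are hypothetical, not the repo's actual configuration.

```toml
[project]
name = "data-engineering"
version = "0.1.0"
dependencies = [
    "dagster",
    "dbt-bigquery",
]

[project.optional-dependencies]
dev = [
    "pytest",
]
```

With an extra defined this way, `pip install -e ".[dev]"` installs the project in editable mode plus the development-only tools.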