Update Branch #56

Merged
merged 48 commits into from
May 23, 2024
Commits
e955fff
Merge pull request #46 from IvanildoBarauna/feat-UpdateReadme
IvanildoBarauna May 20, 2024
514cca1
chore: Update WorkFlow and Jupyter Notebook
May 20, 2024
725ba7b
Add generated HTML to README
actions-user May 20, 2024
9cfefc7
Merge pull request #47 from IvanildoBarauna/fix-FolderToDeploy
IvanildoBarauna May 20, 2024
43b37a6
Merge pull request #48 from IvanildoBarauna/main
IvanildoBarauna May 20, 2024
cb078ac
chore: Adjust gitatts
IvanildoBarauna May 20, 2024
828c8d9
Update Readme
IvanildoBarauna May 20, 2024
4089256
Update Readme
IvanildoBarauna May 20, 2024
2d3013f
chore: Remove commented lines from .gitattributes file
IvanildoBarauna May 20, 2024
15b6d82
Merge pull request #49 from IvanildoBarauna/feature-MVCImplement
IvanildoBarauna May 20, 2024
ea6a443
chore: Update .gitattributes file to set Python as the language for H…
IvanildoBarauna May 20, 2024
89408dd
Merge branch 'main' into feature-MVCImplement
IvanildoBarauna May 20, 2024
2e3fd37
Merge pull request #50 from IvanildoBarauna/feature-MVCImplement
IvanildoBarauna May 20, 2024
5f1692e
Refactor: pipeline execution and parameter validation
IvanildoBarauna May 21, 2024
f509300
refactor: separate ParamsValidator as a standalone component
IvanildoBarauna May 21, 2024
f158776
chore: Remove unused code and update API endpoints
IvanildoBarauna May 21, 2024
de4f6a4
refactor: change path to test
IvanildoBarauna May 21, 2024
4549b7e
refactor: Separate ParamsValidator into standalone component
IvanildoBarauna May 21, 2024
e7e94bc
chore: Bump version to v4.4.0 and update dependencies
IvanildoBarauna May 21, 2024
3ea08c0
refactor: Added Transform Component
IvanildoBarauna May 22, 2024
bb4b4bd
refactor: Update requests dependency to version 2.32.2
IvanildoBarauna May 22, 2024
00dae48
Add generated HTML to README
actions-user May 22, 2024
9db9735
refactor: Update dependencies and improve CI/CD workflow
IvanildoBarauna May 22, 2024
e42d490
Merge branch 'feature-ArchChanges' of github.com:IvanildoBarauna/ETL-…
IvanildoBarauna May 22, 2024
6e2b31b
refactor: Update extraction and transformation tests
IvanildoBarauna May 22, 2024
51298f8
tests: ParamsValidator
IvanildoBarauna May 22, 2024
96e88c0
Refactor: Remove Output Path from extracted_files variable
IvanildoBarauna May 22, 2024
cf80fd5
Refactor: Tests
IvanildoBarauna May 22, 2024
ad8b2eb
Refactor: Tests
IvanildoBarauna May 22, 2024
4fb366b
Refactor: Tests
IvanildoBarauna May 22, 2024
eb73447
Refactor: Tests
IvanildoBarauna May 22, 2024
9c9f9e3
chore: Tests in CI
IvanildoBarauna May 22, 2024
ac1cc15
Refactor: Add validation for case in many parameters
IvanildoBarauna May 22, 2024
8418437
Add load module
IvanildoBarauna May 22, 2024
b6352a6
feat: Added Queue from processing and Load Data with paralelism
IvanildoBarauna May 22, 2024
f3e1e14
Refactor for respect single-responsability in transform module
IvanildoBarauna May 22, 2024
6c32918
fix: quantity of params
IvanildoBarauna May 22, 2024
da3fdee
feat: Addded progress bar for visualize paralelism
IvanildoBarauna May 22, 2024
99a6238
refact: name of variables and CamelCase components
IvanildoBarauna May 23, 2024
2369e06
refact: rename objects
IvanildoBarauna May 23, 2024
c4bca33
chore: rename entrypoint and internalizing objects
IvanildoBarauna May 23, 2024
40d9148
feat: Refactor CI/CD workflow to use Poetry for dependency management…
IvanildoBarauna May 23, 2024
3c6be5c
refactor: Remove unused code in setup.py
IvanildoBarauna May 23, 2024
71c4089
Refactor Dockerfile CMD to use "etl/run.py" as the entrypoint
IvanildoBarauna May 23, 2024
de1731e
Merge pull request #51 from IvanildoBarauna/feature-ArchChanges
IvanildoBarauna May 23, 2024
8a05499
feat: Update Readme
IvanildoBarauna May 23, 2024
ad71a14
feat: Update Readme
IvanildoBarauna May 23, 2024
461c363
Merge pull request #55 from IvanildoBarauna/feature-UpdateReadme
IvanildoBarauna May 23, 2024
10 changes: 5 additions & 5 deletions .gitattributes
@@ -1,9 +1,9 @@
# .gitattributes file

# Allows language detection only for code cells
*.ipynb linguist-language=Python
*.html linguist-language=Jupyter Notebook
# *.ipynb linguist-language=Python
*.html linguist-language=Python

# Indicates that only code cells should be detected
*.ipynb.diff linguist-language=Python
*.ipynb.merge linguist-language=Python
# # Indicates that only code cells should be detected
*.ipynb.diff linguist-language=Jupyter Notebook
*.ipynb.merge linguist-language=Jupyter Notebook
135 changes: 66 additions & 69 deletions .github/workflows/CI-CD.yaml
@@ -5,82 +5,78 @@ on:
branches:
- main
paths-ignore:
- '**/README.md'
- '**/CONTRIBUTING.md'
- '**/CODE_OF_CONDUCT.md'
- '.github/**'
- 'docs/**'
- '**/.editorconfig'
- '**/.gitignore'
- '**/LICENSE'
- '**/CREDITS'
- "**/README.md"
- "**/CONTRIBUTING.md"
- "**/CODE_OF_CONDUCT.md"
- ".github/**"
- "docs/**"
- "**/.editorconfig"
- "**/.gitignore"
- "**/LICENSE"
- "**/CREDITS"

workflow_dispatch:

jobs:
test:
if: github.actor != 'actions[bot]'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.head_ref }}

- name: Set up Python 3.9
uses: actions/setup-python@v2
with:
python-version: 3.9

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
# Generate the .env file
echo "SERVER_URL=https://economia.awesomeapi.com.br" > .env

- name: Run tests
run: pytest
- uses: actions/checkout@v2
with:
ref: ${{ github.head_ref }}

- name: Set up Python 3.9
uses: actions/setup-python@v2
with:
python-version: 3.9

- name: Install and Run Tests
run: |
python -m pip install --upgrade pip
python -m pip install poetry
poetry install
poetry run pytest

build:
needs: test
if: github.actor != 'actions[bot]'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
with:
ref: ${{ github.head_ref }}

- name: Build Docker Image using Dockerfile
run: |
docker build -t etl-awesome-api .

- name: Run Docker Image using Dockerfile
run: |
docker run etl-awesome-api

- name: Build Docker Image using Docker Compose
run: |
docker-compose up --build -d

- name: Run Docker image using Docker Compose
run: |
docker run etl-awesome-api-compose

- name: Run Application using Python Native
run: |
python -m venv .venv
source .venv/bin/activate
.venv/bin/python -m pip install --upgrade pip
echo "SERVER_URL=https://economia.awesomeapi.com.br" > .env
pip install -e .
python etl/main.py

- name: Run Application using Poetry
run: |
pip install poetry
poetry install
poetry run python etl/main.py

- uses: actions/checkout@v2
with:
ref: ${{ github.head_ref }}

- name: Build Docker Image using Dockerfile
run: |
docker build -t etl-awesome-api .

- name: Run Docker Image using Dockerfile
run: |
docker run etl-awesome-api

- name: Build Docker Image using Docker Compose
run: |
docker-compose up --build -d

- name: Run Docker image using Docker Compose
run: |
docker run etl-awesome-api-compose

- name: Run Application using Python Native
run: |
python -m venv .venv
source .venv/bin/activate
.venv/bin/python -m pip install --upgrade pip
pip install -e .
python etl/run.py

- name: Run Application using Poetry
run: |
pip install poetry
poetry install
poetry run python etl/run.py

deploy:
needs: build
if: github.actor != 'actions[bot]'
@@ -97,20 +93,21 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.9'
python-version: "3.9"

- name: Install Jupyter
run: pip install notebook

- name: Convert notebook to HTML
run: |
jupyter nbconvert --to html notebooks/data_explorer.ipynb --output-dir=views --output=index

jupyter nbconvert --to html notebooks/data_explorer.ipynb --output-dir=docs --output=index
jupyter nbconvert --to html notebooks/data_explorer.ipynb --output-dir=etl/views --output=index

- name: Setup Git
run: |
git config --global user.name 'GitHub Actions'
git config --global user.email '[email protected]'

- name: Commit and Push Notebook
run: |
git add .
@@ -119,4 +116,4 @@ jobs:
else
git commit -m "Add generated HTML to README"
git push
fi
fi
2 changes: 2 additions & 0 deletions .gitignore
@@ -165,3 +165,5 @@ cython_debug/
poetry.lock
etl/common/logs/*

*.DS_Store

22 changes: 7 additions & 15 deletions .vscode/settings.json
@@ -1,16 +1,8 @@
{
"python.testing.pytestArgs": [
"tests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.testing.unittestArgs": [
"-v",
"-s",
"./tests",
"-p",
"test_*.py"
],
"python.analysis.autoImportCompletions": true,
"python.analysis.typeCheckingMode": "basic"
}
"python.testing.pytestArgs": ["."],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.testing.unittestArgs": ["-v", "-s", "./tests", "-p", "test_*.py"],
"python.analysis.autoImportCompletions": true,
"python.analysis.typeCheckingMode": "basic"
}
5 changes: 1 addition & 4 deletions Dockerfile
@@ -18,8 +18,5 @@ COPY . .
# Install the dependencies
RUN poetry install

# Generate the .env file
RUN echo "SERVER_URL=https://economia.awesomeapi.com.br" > .env

# Run the container
CMD ["poetry", "run", "python", "etl/main.py"]
CMD ["poetry", "run", "python", "etl/run.py"]
69 changes: 48 additions & 21 deletions README.md
@@ -4,32 +4,60 @@

## Project Stack

<img src="https://github.com/devicons/devicon/blob/master/icons/python/python-original.svg" Alt="Python" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/docker/docker-original.svg" Alt="Docker" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/poetry/poetry-original.svg" Alt="Poetry" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/pandas/pandas-original.svg" Alt="Pandas" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/jupyter/jupyter-original.svg" Alt="Jupyter" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/matplotlib/matplotlib-original.svg" Alt="Matplotlib" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/github/github-original.svg" Alt="GitHub" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/githubactions/githubactions-original.svg" Alt="GitHub Actions" width="50" height="50">
<img src="https://github.com/devicons/devicon/blob/master/icons/python/python-original.svg" Alt="Python" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/docker/docker-original.svg" Alt="Docker" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/poetry/poetry-original.svg" Alt="Poetry" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/pandas/pandas-original.svg" Alt="Pandas" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/jupyter/jupyter-original.svg" Alt="Jupyter" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/matplotlib/matplotlib-original.svg" Alt="Matplotlib" width="50" height="50"> <img src="https://github.com/devicons/devicon/blob/master/icons/githubactions/githubactions-original.svg" Alt="GitHub Actions" width="50" height="50">

## Project Description
## Project Description

This project, called "Awesome Project: ETL Process for Currency Quotes Data", is a solution dedicated to extracting, transforming, and loading (ETL) currency quote data. It makes a single request to a specific endpoint to obtain quotes for multiple currencies.
The "Awesome Project: ETL Process for Currency Quotes Data" project is a complete solution dedicated to the extraction, transformation, and loading (ETL) of currency quote data. It uses several advanced techniques and architectures to keep the ETL process efficient and robust.

The request response is then processed, where each currency quote is separated and stored in individual files in Parquet format. This makes it easier to organize data and efficiently retrieve it for future analysis.
## Project Highlights:

Additionally, the project includes a Jupyter Notebook for data exploration. This notebook is responsible for consolidating all individual Parquet files into a single dataset. From there, the data can be explored and analyzed to gain valuable insights into currency quotes.
- MVC Architecture: Implements the Model-View-Controller (MVC) architecture, separating business logic, user interface, and data handling for better code organization and maintainability.

In summary, this project provides a complete solution for collecting, processing, and analyzing currency quote data.
- Comprehensive Tests: Tests that ensure code quality and robustness across the stages of the ETL process.

- Parallelism in the Models: Parallelism in the data transformation and loading stages, increasing efficiency and reducing processing time.

- Fire-and-Forget Messaging: Uses a queue (queue.Queue) in the fire-and-forget model to manage the files generated between the transformation and loading stages, ensuring a continuous and efficient data flow.

- Parameter Validation: Sends only valid parameters, based on the request data source itself, ensuring the integrity and accuracy of the processed information.

- Configuration Management: Uses a configuration module to manage endpoints, retry delays, and the number of attempts, providing flexibility and ease of adjustment.

- Common Module: Implements a common module for code reuse across the project, promoting consistency and reducing redundancy.

- Dynamic Views: Generates views as index.html using nbconvert, based on consolidated data from a Jupyter Notebook that integrates the generated files into a single dataset for exploration and analysis.
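
The fire-and-forget messaging between the transform and load stages can be sketched with the standard library alone. This is a minimal illustration, not the project's code: `publish`, `load`, `run_pipeline`, and the `_STOP` sentinel are hypothetical names, and the loader only records symbols where the real project writes Parquet files.

```python
import queue
import threading

# Hypothetical sentinel that tells the loader thread to stop.
_STOP = object()


def publish(work_queue, response):
    """Split a multi-currency response and enqueue one item per currency."""
    for symbol, quote in response.items():
        # Fire-and-forget: put() on an unbounded queue returns immediately,
        # and the publisher never waits for a result from the loader.
        work_queue.put({"symbol": symbol, "quote": quote})
    work_queue.put(_STOP)


def load(work_queue, loaded):
    """Consume items until the sentinel arrives (runs in its own thread)."""
    while True:
        item = work_queue.get()
        if item is _STOP:
            break
        # A real loader would write a Parquet file here; this sketch only
        # records the symbol so it stays dependency-free.
        loaded.append(item["symbol"])


def run_pipeline(response):
    """Start the loader thread, publish every quote, and wait for the loader."""
    work_queue = queue.Queue()
    loaded = []
    loader = threading.Thread(target=load, args=(work_queue, loaded))
    loader.start()
    publish(work_queue, response)
    loader.join()
    return loaded
```

With a single loader thread the queue preserves publication order, so `run_pipeline` returns the symbols in the order they appear in the response. Adding more loader threads would require one sentinel per thread.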

# ETL Process:

- Extraction: A single request is made to a specific endpoint to obtain quotes for multiple currencies.
- Transformation: The request response is processed, separating each currency quote and storing it in individual files in Parquet format, which makes the data easier to organize and retrieve.
- Loading: The individual Parquet files are consolidated into a single dataset using a Jupyter Notebook, enabling comprehensive analysis and valuable insights into currency quotes.

In summary, the "Awesome Project: ETL Process for Currency Quotes Data" offers a robust and efficient solution for collecting, processing, and analyzing currency quote data, using advanced architecture and parallelism techniques to optimize each stage of the ETL process.
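
As a rough sketch of the transformation and consolidation steps, the functions below split a multi-currency response into one record per pair and then merge several batches into one list of rows. The response shape (a dictionary keyed by currency pair) and the names `split_quotes` and `consolidate` are assumptions made for illustration; the project itself stores Parquet files and consolidates them in a notebook.

```python
def split_quotes(response):
    """Return one flat record per currency pair found in the response."""
    records = []
    for pair, quote in response.items():
        record = {"pair": pair}   # keep the pair key alongside the quote fields
        record.update(quote)
        records.append(record)
    return records


def consolidate(batches):
    """Merge several per-currency batches into a single list of rows."""
    return [row for batch in batches for row in batch]
```

Keeping each pair in its own record mirrors the one-file-per-currency layout, while `consolidate` stands in for the notebook step that rebuilds a single dataset.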

## Project Structure

- [`data/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/data): Stores raw data in Parquet format.
- ETH-EUR-1713658884.parquet: Example: Raw data for ETH-EUR quotes. file-name = symbol + unix timestamp of extraction
- [`notebooks/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/notebooks): Contains the `data_explorer.ipynb` notebook for data exploration.
- [`etl/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl): Holds the project source code.
- [`main.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/main.py): The entry point for the ETL Module.
- [`models/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/jobs): ETL Modules.
- [`ExtractApiData/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/jobs/ExtractApiData): Module for data extraction from API.
- [`ApiToParquetFile.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/jobs/ExtractApiData/ApiToParquetFile.py): Extract API data to Parquet File and store in /data.
- [`utils/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/utils)
- [`logs.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/utils/logs.py): Package for managing logs.
- [`common.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/utils/common.py): Package for common tasks in the code.
- [`constants.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/utils/constants.py): Constants used in the code.
- [`data/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/data): Stores raw data in Parquet format.
- ETH-EUR-1713658884.parquet: Example: raw data for ETH-EUR quotes. file_name = symbol + unix timestamp of extraction
- [`notebooks/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/notebooks): Contains the `data_explorer.ipynb` notebook for data exploration.
- [`etl/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl): Holds the project source code.
- [`run.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/run.py): Application entry point.
- [`common/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/common): Library for code reuse and standardization.
- [`utils/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/utils)
- [`logs.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/utils/logs.py): Package for managing logs.
- [`common.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/utils/common.py): Package for common tasks in the code, such as retrieving the output directory or the default timestamp.
- [`logs/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/common/logs): Stores debug logs.
- [`controller/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/controller)
- [`pipeline.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/controller/pipeline.py): Receives data extraction requests and orchestrates the ETL models.
- [`models/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/models):
- [`extract/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/models/extract)
- [`api_data_extractor.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/models/extract/api_data_extractor.py): Receives the parameters from the controller, sends the request, and returns the response as JSON.
- [`transform/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/models/transform)
- [`publisher.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/models/extract/publisher.py): Receives the JSON from the extractor, splits the dictionary by currency, and publishes each one to a queue to be processed individually.
- [`load/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/models/load)
- [`parquet_loader.py`](https://github.com/IvanildoBarauna/ETL-awesome-api/blob/main/etl/models/extract/parquet_loader.py): In a separate thread, receives each new dictionary from the queue that the transformer is publishing to and generates .parquet files in the default directory.
- [`views/`](https://github.com/IvanildoBarauna/ETL-awesome-api/tree/main/etl/views): Stores data analysis and visualization output.
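
The file naming rule listed under `data/` (symbol plus the unix timestamp of extraction) could be implemented along these lines. The helper names `build_file_name` and `parse_file_name` are hypothetical and only illustrate the convention; they are not taken from the repository.

```python
import time
from pathlib import Path


def build_file_name(symbol, extracted_at=None):
    """Build '<symbol>-<unix timestamp>.parquet'; defaults to the current time."""
    ts = int(time.time()) if extracted_at is None else int(extracted_at)
    return f"{symbol}-{ts}.parquet"


def parse_file_name(file_name):
    """Recover (symbol, timestamp) from a name produced by build_file_name."""
    stem = Path(file_name).stem              # drops the ".parquet" suffix
    symbol, _, ts = stem.rpartition("-")     # split on the last hyphen only
    return symbol, int(ts)
```

Splitting on the last hyphen matters because the symbol itself contains a hyphen, as in `ETH-EUR`.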

## How to run this project and verify execution time:

@@ -50,9 +78,8 @@ In summary, this project provides a complete solution for collecting, processing
$ python -m venv .venv
$ source .venv/bin/activate # On Windows use `venv\Scripts\activate`
$ .venv/bin/python -m pip install --upgrade pip
$ echo "SERVER_URL=https://economia.awesomeapi.com.br" > .env # Create enviroment variable for server URL`
$ pip install -e .
$ python etl/main.py
$ python etl/run.py
```

Learn more about [venv module in python](https://docs.python.org/pt-br/3/library/venv.html)
@@ -73,7 +100,7 @@ In summary, this project provides a complete solution for collecting, processing

4. Or you can install and run the project using the dependency manager [`poetry`](https://python-poetry.org/):
`sh
$ poetry install && poetry run python etl/main.py
$ poetry install && poetry run python etl/run.py
`
</details>
