Change capture and materialization constants and parameters related to the `br_rj_riodejaneiro_gtfs` flows (#687)

* Initial commit

* Update changelog

* Update changelog

* Update changelog

* Revert log enrichment in the `get_raw_from_sources` task

* Fix E226

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
eng-rodrigocunha and mergify[bot] authored May 21, 2024
1 parent ad32fd4 commit 186748e
Showing 3 changed files with 23 additions and 5 deletions.
11 changes: 11 additions & 0 deletions pipelines/rj_smtr/CHANGELOG.md
@@ -0,0 +1,11 @@
# Changelog - rj_smtr

## [1.0.0] - 2024-05-21

### Changed

- Add specific handling to the `transform_raw_to_nested_structure` task for the `br_rj_riodejaneiro_gtfs` flows (https://github.com/prefeitura-rio/pipelines/pull/687)
- Change capture and materialization constants and parameters related to the `br_rj_riodejaneiro_gtfs` flows (https://github.com/prefeitura-rio/pipelines/pull/687)

### Fixed
- Fix the error `pipelines/rj_smtr/tasks.py:116:64: E226 missing whitespace around arithmetic operator` in the `build_incremental_model` task (https://github.com/prefeitura-rio/pipelines/pull/687)
8 changes: 4 additions & 4 deletions pipelines/rj_smtr/constants.py
@@ -1289,7 +1289,7 @@ class constants(Enum):  # pylint: disable=c0103
     }

     # GTFS
-    GTFS_DATASET_ID = "gtfs"
+    GTFS_DATASET_ID = "br_rj_riodejaneiro_gtfs"

     GTFS_GENERAL_CAPTURE_PARAMS = {
         "partition_date_only": True,
@@ -1346,12 +1346,12 @@ class constants(Enum):  # pylint: disable=c0103
         },
         {
             "table_id": "ordem_servico",
-            "primary_key": ["servico"],
+            "primary_key": ["servico", "tipo_os"],
             "extract_params": {"filename": "ordem_servico"},
         },
         {
             "table_id": "ordem_servico_trajeto_alternativo",
-            "primary_key": ["servico"],
+            "primary_key": ["servico", "tipo_os"],
             "extract_params": {"filename": "ordem_servico_trajeto_alternativo"},
         },
         {
@@ -1361,7 +1361,7 @@ class constants(Enum):  # pylint: disable=c0103
     ]

     GTFS_MATERIALIZACAO_PARAMS = {
-        "dataset_id": GTFS_DATASET_ID,
+        "dataset_id": "gtfs",
         "dbt_vars": {
             "data_versao_gtfs": "",
             "version": {},
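The widened `primary_key` for the two `ordem_servico` tables can be read as a uniqueness fix. The sketch below is only an illustration: it assumes that the same `servico` may appear once per `tipo_os`, and both the `duplicated_keys` helper and the sample rows (including the `"Extra"` value) are made up for this example. How the pipeline itself consumes `primary_key` is not shown in this diff.

```python
from collections import Counter


def duplicated_keys(rows, primary_key):
    """Return the key tuples that occur more than once in `rows`.

    Hypothetical helper, not part of the pipeline.
    """
    counts = Counter(tuple(row[col] for col in primary_key) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Invented sample rows: one service with two service-order types.
rows = [
    {"servico": "100", "tipo_os": "Regular"},
    {"servico": "100", "tipo_os": "Extra"},
]

# With the old key the two rows collide; with the new key they do not.
old_collisions = duplicated_keys(rows, ["servico"])
new_collisions = duplicated_keys(rows, ["servico", "tipo_os"])
```

Under that assumption, `["servico"]` alone no longer identifies a row, while `["servico", "tipo_os"]` does.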
9 changes: 8 additions & 1 deletion pipelines/rj_smtr/tasks.py
@@ -113,7 +113,7 @@ def build_incremental_model(  # pylint: disable=too-many-arguments

     if refresh:
         log("Running in full refresh mode")
-        log(f"DBT will run the following command:\n{run_command+' --full-refresh'}")
+        log(f"DBT will run the following command:\n{run_command + ' --full-refresh'}")
         dbt_client.cli(run_command + " --full-refresh", sync=True)
         last_mat_date = get_table_min_max_value(
             query_project_id, dataset_id, mat_table_id, field_name, "max"
@@ -1455,6 +1455,13 @@ def transform_raw_to_nested_structure(
 for col in data.columns[data.dtypes == "object"].to_list():
     data[col] = data[col].str.strip()

+if (
+    constants.GTFS_DATASET_ID.value in raw_filepath
+    and "ordem_servico" in raw_filepath
+    and "tipo_os" not in data.columns
+):
+    data["tipo_os"] = "Regular"
+
 log(
     f"Finished cleaning! Data:\n{data_info_str(data)}", level="info"
 )
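The condition added to `transform_raw_to_nested_structure` can be isolated as a small predicate to make its behaviour easier to see. This is a sketch, not code from the repository: the `needs_default_tipo_os` function and the example paths are hypothetical, and the real task works on a pandas DataFrame rather than a plain list of column names.

```python
# Value assigned to GTFS_DATASET_ID in constants.py by this commit.
GTFS_DATASET_ID = "br_rj_riodejaneiro_gtfs"


def needs_default_tipo_os(raw_filepath, columns):
    """Mirror the three-part check from the diff (hypothetical helper)."""
    return (
        GTFS_DATASET_ID in raw_filepath
        and "ordem_servico" in raw_filepath
        and "tipo_os" not in columns
    )


# Made-up file paths, used only to exercise the substring checks.
regular = needs_default_tipo_os(
    "raw/br_rj_riodejaneiro_gtfs/ordem_servico/data.csv", ["servico"]
)
alternative = needs_default_tipo_os(
    "raw/br_rj_riodejaneiro_gtfs/ordem_servico_trajeto_alternativo/data.csv",
    ["servico"],
)
already_present = needs_default_tipo_os(
    "raw/br_rj_riodejaneiro_gtfs/ordem_servico/data.csv", ["servico", "tipo_os"]
)
other_dataset = needs_default_tipo_os("raw/gtfs/ordem_servico/data.csv", ["servico"])
```

Because `"ordem_servico" in raw_filepath` is a substring test, it also matches paths containing `ordem_servico_trajeto_alternativo`, so both tables whose `primary_key` now includes `tipo_os` receive the `"Regular"` default when the column is missing.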
