Initial dbt models to support GTFS guidelines checks #1712
Merged
6 commits
bfe0f91 initial work towards #1688 (lauriemerrell)
85a9950 gtfs guidelines initial implementation: tweaks & improvements (lauriemerrell)
6d95229 gtfs guidelines: add metabase semantic type for calitp agency name (lauriemerrell)
4220748 sync new dataset to metabase (lauriemerrell)
6ba71b6 gtfs guidelines: rename table, formatting updates (lauriemerrell)
2fa72e2 rename compliance gtfs feature per PR review (lauriemerrell)
@@ -65,3 +65,5 @@ models:
mart:
  transit_database:
    schema: mart_transit_database
  gtfs_guidelines:
    schema: mart_gtfs_guidelines
@@ -0,0 +1,25 @@
-- declare checks
{% macro static_feed_downloaded_successfully() %}
    "Static GTFS feed downloads successfully"
{% endmacro %}

{% macro no_validation_errors_in_last_30_days() %}
    "No validation errors in last 30 days"
{% endmacro %}

-- declare features
{% macro compliant_on_the_map() %}
    "Compliance"
{% endmacro %}


-- columns
{% macro gtfs_guidelines_columns() %}
    date,
    calitp_itp_id,
    calitp_url_number,
    calitp_agency_name,
    check,
    status,
    feature
{% endmacro %}
48 changes: 48 additions & 0 deletions
warehouse/models/mart/gtfs_guidelines/_gtfs_guidelines.yml
@@ -0,0 +1,48 @@
version: 2

models:
  - name: fact_daily_guideline_checks
    description: |
      Each row represents a date/guideline check/feed combination, with pass/fail information
      indicating whether that feed complied with that check on that date.

      Note that this table is only partially implemented; run "SELECT DISTINCT check"
      to see the list of checks that are currently evaluated.
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - date
            - calitp_itp_id
            - calitp_url_number
            - check
      - dbt_utils.equal_rowcount:
          compare_model: ref('stg_gtfs_guidelines__feed_guideline_index')
    columns:
      - name: date
        description: Date on which the check is being evaluated.
      - name: calitp_itp_id
        description: '{{ doc("column_calitp_itp_id") }}'
        meta:
          metabase.semantic_type: type/FK
      - name: calitp_url_number
        description: '{{ doc("column_calitp_url_number") }}'
        meta:
          metabase.semantic_type: type/FK
      - name: calitp_agency_name
        description: Human readable agency name, provided for convenience.
        meta:
          metabase.semantic_type: type/Title
      - name: check
        description: |
          A string description of the GTFS guideline check being performed. For example,
          "Static GTFS feed downloads successfully".
      - name: status
        description: |
          Either "PASS" or "FAIL", indicating check status on the given date for the
          given feed.
        tests:
          - not_null
      - name: feature
        description: |
          A string label for the GTFS "feature" associated with the given check. For example,
          "Compliance".
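The two table-level tests declared above assert that no date/feed/check combination repeats and that the fact table has exactly as many rows as the index model. The following is a rough sketch of those two properties in Python, with the standard-library sqlite3 module standing in for the warehouse. The helper names `duplicate_combinations` and `row_count` and the sample rows are invented for illustration; this is not how dbt_utils implements its tests.

```python
import sqlite3

KEY_COLUMNS = ["date", "calitp_itp_id", "calitp_url_number", "check"]


def duplicate_combinations(conn, table, columns):
    """Return every key combination that occurs more than once, with its count."""
    cols = ", ".join(f'"{c}"' for c in columns)
    query = (
        f'SELECT {cols}, COUNT(*) FROM "{table}" '
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()


def row_count(conn, table):
    """Return the number of rows in `table`."""
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
# "check" is a keyword in SQLite, so identifiers are double-quoted throughout.
conn.execute(
    'CREATE TABLE feed_guideline_index '
    '("date" TEXT, "calitp_itp_id" INTEGER, "calitp_url_number" INTEGER, "check" TEXT)'
)
conn.execute(
    'CREATE TABLE fact_daily_guideline_checks '
    '("date" TEXT, "calitp_itp_id" INTEGER, "calitp_url_number" INTEGER, '
    '"check" TEXT, "status" TEXT)'
)

# Hypothetical sample data: one feed, one date, two checks.
rows = [
    ("2022-08-01", 1, 0, "Static GTFS feed downloads successfully"),
    ("2022-08-01", 1, 0, "No validation errors in last 30 days"),
]
conn.executemany("INSERT INTO feed_guideline_index VALUES (?, ?, ?, ?)", rows)
conn.executemany(
    "INSERT INTO fact_daily_guideline_checks VALUES (?, ?, ?, ?, 'PASS')", rows
)

duplicates_before = duplicate_combinations(conn, "fact_daily_guideline_checks", KEY_COLUMNS)
counts_match_before = (
    row_count(conn, "fact_daily_guideline_checks") == row_count(conn, "feed_guideline_index")
)

# A duplicated fact row breaks both properties at once.
conn.execute(
    "INSERT INTO fact_daily_guideline_checks VALUES (?, ?, ?, ?, 'FAIL')", rows[0]
)
duplicates_after = duplicate_combinations(conn, "fact_daily_guideline_checks", KEY_COLUMNS)
counts_match_after = (
    row_count(conn, "fact_daily_guideline_checks") == row_count(conn, "feed_guideline_index")
)
```

Taken together, the two properties mean a staging check model can neither drop index rows nor fan them out without a test failing.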
22 changes: 22 additions & 0 deletions
warehouse/models/mart/gtfs_guidelines/fact_daily_guideline_checks.sql
@@ -0,0 +1,22 @@
{{ config(materialized='table') }}

-- start query
WITH stg_gtfs_guidelines__schedule_downloaded_successfully AS (
    SELECT * FROM {{ ref('stg_gtfs_guidelines__schedule_downloaded_successfully') }}
),

stg_gtfs_guidelines__no_validation_errors_in_last_30_days AS (
    SELECT * FROM {{ ref('stg_gtfs_guidelines__no_validation_errors_in_last_30_days') }}
),

fact_daily_guideline_checks AS (
    SELECT
        {{ gtfs_guidelines_columns() }}
    FROM stg_gtfs_guidelines__schedule_downloaded_successfully
    UNION ALL
    SELECT
        {{ gtfs_guidelines_columns() }}
    FROM stg_gtfs_guidelines__no_validation_errors_in_last_30_days
)

SELECT * FROM fact_daily_guideline_checks
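UNION ALL pairs columns by position rather than by name, which is presumably one reason both branches above select through the same gtfs_guidelines_columns() macro. A minimal sketch of the failure mode it guards against, in Python with sqlite3 as a stand-in for the warehouse; the two table names and their rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical staging tables with the same columns in a different order.
    CREATE TABLE download_check ("check" TEXT, status TEXT);
    CREATE TABLE validation_check (status TEXT, "check" TEXT);
    INSERT INTO download_check VALUES ('Static GTFS feed downloads successfully', 'PASS');
    INSERT INTO validation_check VALUES ('FAIL', 'No validation errors in last 30 days');
""")

# SELECT * stacks by position, so the second branch lands in the wrong columns.
misaligned = conn.execute(
    "SELECT * FROM download_check UNION ALL SELECT * FROM validation_check"
).fetchall()

# Naming the columns in one fixed order in every branch keeps them aligned.
columns = '"check", status'
aligned = conn.execute(
    f"SELECT {columns} FROM download_check "
    f"UNION ALL SELECT {columns} FROM validation_check"
).fetchall()
```

In `misaligned` the validation row carries the status text in the check position; in `aligned` both rows have the check first and the status second.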
34 changes: 34 additions & 0 deletions
warehouse/models/staging/gtfs_guidelines/stg_gtfs_guidelines__feed_guideline_index.sql
@@ -0,0 +1,34 @@
{{ config(materialized='table') }}

WITH gtfs_schedule_fact_daily_feeds AS (
    SELECT * FROM {{ ref('gtfs_schedule_fact_daily_feeds') }}
),

gtfs_schedule_dim_feeds AS (
    SELECT * FROM {{ ref('gtfs_schedule_dim_feeds') }}
),

-- list all the checks that have been implemented
checks_implemented AS (
    SELECT {{ static_feed_downloaded_successfully() }} AS check, {{ compliant_on_the_map() }} AS feature
    UNION ALL
    SELECT {{ no_validation_errors_in_last_30_days() }}, {{ compliant_on_the_map() }}
),

-- create an index: all feed/date/check combinations
stg_gtfs_guidelines__feed_check_index AS (
    SELECT
        t2.calitp_itp_id,
        t2.calitp_url_number,
        t2.calitp_agency_name,
        t1.date,
        t1.feed_key,
        t3.check,
        t3.feature
    FROM gtfs_schedule_fact_daily_feeds AS t1
    LEFT JOIN gtfs_schedule_dim_feeds AS t2
        USING (feed_key)
    CROSS JOIN checks_implemented AS t3
)

SELECT * FROM stg_gtfs_guidelines__feed_check_index
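The index model above pairs every feed/date row with every implemented check, so its row count should be the number of daily feed rows times the number of checks. A small sketch of that shape in Python with sqlite3 standing in for the warehouse; the tables are cut down to a few columns and the rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical stand-ins for the upstream models.
    CREATE TABLE daily_feeds ("date" TEXT, feed_key INTEGER);
    CREATE TABLE dim_feeds (feed_key INTEGER, calitp_itp_id INTEGER);
    CREATE TABLE checks_implemented ("check" TEXT, feature TEXT);

    INSERT INTO daily_feeds VALUES
        ('2022-08-01', 10), ('2022-08-02', 10), ('2022-08-01', 20);
    INSERT INTO dim_feeds VALUES (10, 1), (20, 2);
    INSERT INTO checks_implemented VALUES
        ('Static GTFS feed downloads successfully', 'Compliance'),
        ('No validation errors in last 30 days', 'Compliance');
""")

# Same join shape as the model: feed/date rows, enriched with feed attributes,
# then crossed with the list of checks.
index_rows = conn.execute("""
    SELECT t2.calitp_itp_id, t1."date", t1.feed_key, t3."check", t3.feature
    FROM daily_feeds AS t1
    LEFT JOIN dim_feeds AS t2 USING (feed_key)
    CROSS JOIN checks_implemented AS t3
    ORDER BY t1."date", t1.feed_key, t3."check"
""").fetchall()

n_feed_days = conn.execute("SELECT COUNT(*) FROM daily_feeds").fetchone()[0]
n_checks = conn.execute("SELECT COUNT(*) FROM checks_implemented").fetchone()[0]
```

Here `len(index_rows)` equals `n_feed_days * n_checks`. Adding a check to the list therefore adds one row per feed/date, whether or not a staging model for that check exists yet.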
60 changes: 60 additions & 0 deletions
...els/staging/gtfs_guidelines/stg_gtfs_guidelines__no_validation_errors_in_last_30_days.sql
@@ -0,0 +1,60 @@
WITH feed_guideline_index AS (
    SELECT * FROM {{ ref('stg_gtfs_guidelines__feed_guideline_index') }}
    WHERE check = {{ no_validation_errors_in_last_30_days() }}
),

validation_fact_daily_feed_codes AS (
    SELECT * FROM {{ ref('validation_fact_daily_feed_codes') }}
),

validation_dim_codes AS (
    SELECT * FROM {{ ref('validation_dim_codes') }}
),

validation_errors_by_day AS (
    SELECT
        feed_key,
        date,
        SUM(n_notices) AS validation_errors
    FROM validation_fact_daily_feed_codes
    LEFT JOIN validation_dim_codes USING(code)
    WHERE severity = "ERROR"
    GROUP BY feed_key, date
),

validation_errors_in_last_30_days_check AS (
    SELECT
        date,
        calitp_itp_id,
        calitp_url_number,
        calitp_agency_name,
        check,
        feature,
        SUM(validation_errors)
            OVER (
                PARTITION BY
                    calitp_itp_id,
                    calitp_url_number
                ORDER BY date
                ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
            ) AS errors_last_30_days
    FROM feed_guideline_index
    LEFT JOIN validation_errors_by_day USING (feed_key, date)
),

validation_errors_in_last_30_days_idx AS (
    SELECT
        date,
        calitp_itp_id,
        calitp_url_number,
        calitp_agency_name,
        check,
        CASE
            WHEN errors_last_30_days > 0 THEN "FAIL"
            WHEN errors_last_30_days = 0 THEN "PASS"
        END AS status,
        feature
    FROM validation_errors_in_last_30_days_check
)

SELECT * FROM validation_errors_in_last_30_days_idx
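Two properties of the window above may be worth checking against real data. The frame ROWS BETWEEN 30 PRECEDING AND CURRENT ROW counts rows rather than calendar days and includes the current row, so it can span 31 rows. And a SUM over a frame containing only NULLs is NULL, which matches neither CASE branch. The sketch below reproduces both in Python with sqlite3 as a stand-in for the warehouse. The table `daily`, the feeds "a" and "b", and the dates are invented, and whether all-NULL frames occur in practice depends on the upstream validation tables.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE daily (feed TEXT, "date" TEXT, validation_errors INTEGER)')

start = date(2022, 1, 1)
rows = []
for offset in range(40):
    day = (start + timedelta(days=offset)).isoformat()
    # Feed "a": one error on the first day, then days with zero errors.
    rows.append(("a", day, 1 if offset == 0 else 0))
    # Feed "b": no matching validation rows at all, as after a LEFT JOIN miss.
    rows.append(("b", day, None))
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", rows)

result = conn.execute("""
    SELECT feed, "date", errors_last_30_days,
        CASE
            WHEN errors_last_30_days > 0 THEN 'FAIL'
            WHEN errors_last_30_days = 0 THEN 'PASS'
        END AS status
    FROM (
        SELECT feed, "date",
            SUM(validation_errors) OVER (
                PARTITION BY feed
                ORDER BY "date"
                ROWS BETWEEN 30 PRECEDING AND CURRENT ROW
            ) AS errors_last_30_days
        FROM daily
    )
""").fetchall()

# Map (feed, date) to the computed status for easy lookup.
status = {(feed, day): s for feed, day, _, s in result}
```

In this sample the error on 2022-01-01 still produces "FAIL" for feed "a" on 2022-01-31 and clears on 2022-02-01, while every row for feed "b" has a NULL status. If NULL statuses are not intended, wrapping the sum in COALESCE(..., 0) would be one option, at the cost of reporting "PASS" for feeds with no validation data.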
37 changes: 37 additions & 0 deletions
.../models/staging/gtfs_guidelines/stg_gtfs_guidelines__schedule_downloaded_successfully.sql
@@ -0,0 +1,37 @@
WITH feed_guideline_index AS (
    SELECT * FROM {{ ref('stg_gtfs_guidelines__feed_guideline_index') }}
    WHERE check = {{ static_feed_downloaded_successfully() }}
),

gtfs_schedule_fact_daily_feeds AS (
    SELECT * FROM {{ ref('gtfs_schedule_fact_daily_feeds') }}
),

static_feed_downloaded_successfully_check AS (
    SELECT
        feed_key,
        date,
        CASE
            WHEN extraction_status = "success" THEN "PASS"
            WHEN extraction_status = "error" THEN "FAIL"
            ELSE null
        END AS status,
        {{ static_feed_downloaded_successfully() }} AS check
    FROM gtfs_schedule_fact_daily_feeds
),

static_feed_downloaded_successfully_check_idx AS (
    SELECT
        t1.date,
        t1.calitp_itp_id,
        t1.calitp_url_number,
        t1.calitp_agency_name,
        t1.check,
        t2.status,
        t1.feature,
    FROM feed_guideline_index AS t1
    LEFT JOIN static_feed_downloaded_successfully_check AS t2
        USING (feed_key, date, check)
)

SELECT * FROM static_feed_downloaded_successfully_check_idx
You could potentially use dbt variables (https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables) instead of macros.
Is there a specific reason you think that would be better? I thought that macros were more accessible. (I feel like we don't want average dbt users editing dbt_project.yml a lot? And this table is going to have a lot of iteration.)
IMO I think a macro for a string value is overkill, that's about it
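Discussed offline. My summary: all the check values will be used in at least two places (in their actual staging check table and in the construction of the index). I wanted to make it easy to reference these hard-coded values if we do want to use them in other places, and to make it so we can update one location and be confident it will propagate.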
values will be used at least two places (in their actual staging check table and in the construction of the index), I wanted to make it easy to be able to reference these hard-coded values if we do want to use them other places / make it so we can update one location and be confident it will propagate