Gtfs loader v2 jsonification #1738
do we want to set a job failure threshold anywhere?
airflow/dags/unzip_and_validate_gtfs_schedule/convert_to_json/METADATA.yml
airflow/dags/unzip_and_validate_gtfs_schedule/convert_to_json/agency.yml
What do you mean? Like if >X% of outcomes are failures, we fail the job? Open to it but not sure what we'd set the threshold at
Yeah really it's to catch situations where a bug broadly impacted the job and/or we think that the overall success rate is oddly low and worth "failing" the job. It can occur after the results upload so it's primarily something to bubble up through alerting. The downloader is 95%, maybe this could be similar?
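For illustration only, the threshold idea discussed above could look roughly like the sketch below. The `Outcome` class, the `check_success_rate` function, and the exception type are hypothetical names, not code from this PR, and the 0.95 default is borrowed from the downloader figure mentioned in the comment on the assumption that it refers to a minimum success rate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Outcome:
    # Hypothetical stand-in for one per-file conversion outcome.
    filename: str
    success: bool


def check_success_rate(outcomes: Sequence[Outcome], threshold: float = 0.95) -> float:
    """Return the success rate, raising if it falls below the threshold.

    Intended to run after the outcomes have been uploaded, so a failure
    here only surfaces through alerting and does not lose any results.
    """
    if not outcomes:
        raise ValueError("no outcomes to evaluate")
    rate = sum(o.success for o in outcomes) / len(outcomes)
    if rate < threshold:
        raise RuntimeError(
            f"success rate {rate:.1%} is below threshold {threshold:.1%}"
        )
    return rate
```

Calling this as the last step of the task would keep the uploaded results intact while still marking the run as failed when the success rate is unexpectedly low.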
Description
Completes #1533, building on the work in #1696. Specifically, reads in files from unzipped GTFS downloads, converts them to jsonl, saves the jsonl files in the same hive partitions as the input files, and creates an accompanying outcomes file.
This PR does not create external tables for the JSONL data; that will be handled in #1536.
Resolves #1533
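As a rough sketch of the conversion step described above (not the code in this PR), one GTFS CSV file can be turned into JSONL with the standard library alone. The function names and the example path layout are hypothetical, and the sketch assumes the input has a header row and that only the file extension changes, so the partition directories are preserved.

```python
import csv
import io
import json
import posixpath
from typing import Tuple


def csv_to_jsonl(csv_text: str) -> Tuple[str, int]:
    """Convert CSV text with a header row to JSONL.

    Returns the JSONL text and the number of data rows converted.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    lines = [json.dumps(row) for row in reader]
    return "".join(line + "\n" for line in lines), len(lines)


def jsonl_path(input_path: str) -> str:
    """Replace the file extension with .jsonl, keeping all directories."""
    root, _ext = posixpath.splitext(input_path)
    return root + ".jsonl"


# Example with a hypothetical partition layout:
jsonl, row_count = csv_to_jsonl("agency_id,agency_name\n1,Test Transit\n")
out_path = jsonl_path("schedule/dt=2022-08-12/agency.txt")
```

The row count returned here is the kind of detail that could be recorded in an outcomes file alongside success or failure for each input.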
Type of change
How has this been tested?
Ran in local Airflow for 8/12 (a date that has test data available).
Screenshots
Top two (existing) tasks manually marked as failures just to speed things up