Gtfs loader v2 jsonification #1738

Merged
lauriemerrell merged 8 commits into main from gtfs-loader-v2-jsonification
Sep 7, 2022

@lauriemerrell lauriemerrell commented Sep 1, 2022

Description

Completes #1533, building on the work in #1696. Specifically, this PR reads in files from unzipped GTFS downloads, converts them to JSONL, saves the JSONL files in the same hive partitions as the input files, and creates an accompanying outcomes file.
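For illustration only, the conversion step described above might be sketched roughly as follows. This is not the code in this PR: the function names, the outcome fields, and the example partition path are hypothetical, and it assumes each unzipped GTFS file is a plain CSV text file.

```python
import csv
import io
import json
import posixpath
from typing import Optional, Tuple


def csv_to_jsonl(csv_text: str) -> Tuple[str, int]:
    """Convert the text of one GTFS CSV file to JSONL; return (jsonl, record count)."""
    # GTFS files are sometimes saved with a UTF-8 byte order mark; drop it if present.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("\ufeff")))
    lines = [json.dumps(row) for row in reader]
    return "".join(line + "\n" for line in lines), len(lines)


def jsonl_path(input_path: str) -> str:
    """Keep the hive partition directories and swap only the file extension."""
    directory, filename = posixpath.split(input_path)
    stem, _ = posixpath.splitext(filename)
    return posixpath.join(directory, stem + ".jsonl")


def convert_with_outcome(input_path: str, csv_text: str) -> Tuple[Optional[str], dict]:
    """Convert one file and build a (hypothetical) outcome record for it."""
    outcome = {"input": input_path, "output": None, "success": False, "exception": None}
    try:
        jsonl, count = csv_to_jsonl(csv_text)
    except csv.Error as e:
        # Record the failure instead of raising so other files still get processed.
        outcome["exception"] = str(e)
        return None, outcome
    outcome.update(output=jsonl_path(input_path), success=True, records=count)
    return jsonl, outcome
```

Returning an outcome per file rather than raising means one malformed file does not stop the rest of the feed from being converted, and the outcomes can be written out together at the end.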

This PR does not create external tables for the JSONL data; that will be handled in #1536.

Resolves #1533

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation
  • agencies.yml

How has this been tested?

Ran in local Airflow for 8/12 (a date that has test data available).

Screenshots

Top two (existing) tasks manually marked as failures just to speed things up
image

@lauriemerrell lauriemerrell marked this pull request as ready for review September 2, 2022 18:47
@lauriemerrell lauriemerrell force-pushed the gtfs-loader-v2-jsonification branch from 110fdd2 to 8335323 Compare September 2, 2022 18:48
Contributor

@atvaccaro atvaccaro left a comment

do we want to set a job failure threshold anywhere?

@lauriemerrell
Contributor Author

> do we want to set a job failure threshold anywhere?

What do you mean? Like if >X% of outcomes are failures, we fail the job? Open to it but not sure what we'd set the threshold at

@atvaccaro
Contributor

atvaccaro commented Sep 2, 2022

Yeah, really it's to catch situations where a bug broadly impacted the job and/or we think that the overall success rate is oddly low and worth "failing" the job. It can occur after the results upload, so it's primarily something to bubble up through alerting. The downloader's threshold is 95%; maybe this could be similar?
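As a rough sketch of what such a threshold check could look like (the function name is hypothetical, and the 0.95 default just mirrors the downloader figure mentioned above, not a decided value):

```python
from typing import Iterable


def check_success_rate(successes: Iterable[bool], threshold: float = 0.95) -> float:
    """Raise if the share of successful outcomes falls below the threshold.

    Intended to be called after the outcomes file has been uploaded, so the
    results are still saved and the failure mainly serves to trigger alerting.
    """
    results = list(successes)
    if not results:
        raise RuntimeError("no outcomes were recorded")
    rate = sum(results) / len(results)
    if rate < threshold:
        raise RuntimeError(
            f"success rate {rate:.1%} is below the threshold of {threshold:.1%}"
        )
    return rate
```

Running the check as the last step keeps a low success rate from blocking the upload itself while still marking the task as failed.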

@lauriemerrell lauriemerrell merged commit 64470b6 into main Sep 7, 2022
@lauriemerrell lauriemerrell deleted the gtfs-loader-v2-jsonification branch September 7, 2022 16:54
Successfully merging this pull request may close these issues.

GTFS Schedule Pipeline: gtfs_loader core updates