Timestamp hive partitions #1634

Merged
@atvaccaro merged 5 commits into main from timestamp-hive-partitions on Jul 14, 2022

@lauriemerrell lauriemerrell commented Jul 14, 2022

Description

Refactors GTFS schedule and Airtable to use timestamps in Hive partitions (the `ts=` key) instead of "time" strings.
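
For illustration, here is a minimal sketch of how a timestamp-partitioned path of this shape could be built. The function name `hive_partition_path` and its parameters are hypothetical and are not taken from this PR's code; only the `dt=<date>/ts=<ISO 8601 timestamp>` layout comes from the log output in the testing section.

```python
from datetime import datetime, timezone


def hive_partition_path(table: str, ts: datetime, filename: str) -> str:
    """Build `<table>/dt=<date>/ts=<ISO 8601 timestamp>/<filename>`.

    Hypothetical helper: the name and signature are assumptions made
    for this example, not the project's actual API.
    """
    if ts.tzinfo is None:
        # A naive datetime would produce a timestamp with no UTC offset,
        # which is ambiguous as a partition value.
        raise ValueError("ts must be timezone-aware")
    ts_utc = ts.astimezone(timezone.utc)
    # isoformat() on a UTC-aware datetime ends in "+00:00".
    return f"{table}/dt={ts_utc.date().isoformat()}/ts={ts_utc.isoformat()}/{filename}"


if __name__ == "__main__":
    run_ts = datetime(2022, 7, 14, 19, 19, 28, 126183, tzinfo=timezone.utc)
    print(hive_partition_path("download_schedule_feed_results", run_ts, "results.jsonl"))
```

Normalizing to UTC before formatting keeps the `dt=` and `ts=` values consistent with each other regardless of the caller's local time zone.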

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected) -- we are doing a backfill so that the external tables will continue to work
  • Documentation
  • agencies.yml

How has this been tested?

Local Airflow.

tY2EtdXMvYmVsbGZsb3dlci1jYS11cy56aXA=/ts=2022-07-14T19:26:22.469787+00:00/bellflower-ca-us.zip
took 6 minutes ago to process 178 records
[2022-07-14 19:26:23,734] {storage.py:255} INFO - saving 581.4 kB to gs://test-calitp-gtfs-schedule-raw/download_schedule_feed_results/dt=2022-07-14/ts=2022-07-14T19:19:28.126183+00:00/results.jsonl
successfully fetched 177 of 178
Failures:
 404 Client Error: Not Found for url: http://data.trilliumtransit.com/gtfs/taft-co-us/taft-ca-us.zip
Skipping since in development mode! Would have emailed 1 failures.
[2022-07-14 19:26:24,872] {python.py:151} INFO - Done. Returned value was: None
[2022-07-14 19:26:24,878] {taskinstance.py:1212} INFO - Marking task as SUCCESS. dag_id=download_gtfs_schedule_v2, task_id=download_schedule_feeds, execution_date=20220401T000000, start_date=20220630T160127, end_date=20220714T192624
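
The `dt=`/`ts=` segments in the saved path above can be read back into typed values. The following is a rough sketch only; `parse_hive_partitions` and `partition_timestamp` are hypothetical names introduced for this example and are not part of the PR.

```python
import re
from datetime import datetime

# A partition segment looks like `key=value` with a non-empty value.
# Requiring a value means a segment that merely ends in "=" (for example
# base64 padding) is not mistaken for a partition.
_PARTITION = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def parse_hive_partitions(path: str) -> dict:
    """Return the `key=value` partition segments of a path as a dict of strings."""
    partitions = {}
    for segment in path.split("/"):
        match = _PARTITION.match(segment)
        if match:
            partitions[match.group(1)] = match.group(2)
    return partitions


def partition_timestamp(path: str) -> datetime:
    """Parse the `ts=` partition of a path into a timezone-aware datetime."""
    return datetime.fromisoformat(parse_hive_partitions(path)["ts"])


if __name__ == "__main__":
    path = (
        "gs://test-calitp-gtfs-schedule-raw/download_schedule_feed_results/"
        "dt=2022-07-14/ts=2022-07-14T19:19:28.126183+00:00/results.jsonl"
    )
    print(parse_hive_partitions(path))
    print(partition_timestamp(path))
```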

@lauriemerrell lauriemerrell marked this pull request as ready for review July 14, 2022 19:28
@atvaccaro atvaccaro merged commit 6f0829d into main Jul 14, 2022
@atvaccaro atvaccaro deleted the timestamp-hive-partitions branch July 14, 2022 19:28