Deal with two-row feed_info.txt #1471

Closed
2 tasks
lauriemerrell opened this issue May 4, 2022 · 2 comments · Fixed by #1712 or #1804
Assignees
Labels
project-gtfs-schedule For issues related to gtfs-schedule project

Comments

@lauriemerrell
Contributor

The current Foothill Transit (ITP ID 112) feed's feed_info.txt has two rows with two different sets of active dates (both share the same start date) but the same feed version identifier.
image

This causes fanout in the warehouse, because we have two copies of the same feed in gtfs_schedule_dim_feeds. Specifically, a test is failing on gtfs_schedule_fact_daily_trips because we have two copies of the same trip.
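To illustrate the fanout, here is a minimal sketch in plain Python rather than the warehouse's actual SQL. The column names (`itp_id`, `feed_version`, `feed_end_date`, `trip_id`, `service_date`) and all values are assumptions made up for the example; the point is only that a join against a dimension with two rows for the same feed returns every trip twice.

```python
# Hypothetical rows: names and values are illustrative, not taken from the warehouse.
feeds = [
    {"itp_id": 112, "feed_version": "v1", "feed_end_date": "20220630"},
    {"itp_id": 112, "feed_version": "v1", "feed_end_date": "20221231"},
]
trips = [
    {"itp_id": 112, "trip_id": "t1", "service_date": "2022-05-03"},
    {"itp_id": 112, "trip_id": "t2", "service_date": "2022-05-03"},
]


def join_trips_to_feeds(trips, feeds):
    """Inner join on itp_id, similar to a fact table joining to a feed dimension."""
    return [
        {**trip, "feed_version": feed["feed_version"]}
        for trip in trips
        for feed in feeds
        if trip["itp_id"] == feed["itp_id"]
    ]


joined = join_trips_to_feeds(trips, feeds)
print(len(trips), len(joined))  # 2 4: each trip appears once per duplicate feed row
```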

While this is clearly a bug in their feed, I am not entirely sure how to address it in the warehouse. I am inclined to put a hacky condition in gtfs_schedule_dim_feeds or feed_info_clean that filters out one of these rows specifically: this should be rare, and logic that tries to address this kind of situation programmatically is unlikely to be robust.
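A one-off filter of the kind described above could look roughly like the following. This is a sketch in Python, not the actual change to gtfs_schedule_dim_feeds or feed_info_clean; the column name `calitp_itp_id`, the choice of which row to drop, and every value in `ROWS_TO_DROP` are placeholders assumed for illustration.

```python
# Assumption: (ITP ID, feed_version, feed_end_date) is enough to single out the bad row.
# The tuple below is a placeholder, not the real Foothill Transit values.
ROWS_TO_DROP = {(112, "v1", "20221231")}


def drop_known_bad_rows(feed_info_rows):
    """Remove only the explicitly listed rows; every other row passes through."""
    return [
        row
        for row in feed_info_rows
        if (row["calitp_itp_id"], row["feed_version"], row["feed_end_date"])
        not in ROWS_TO_DROP
    ]


feed_info_rows = [
    {"calitp_itp_id": 112, "feed_version": "v1", "feed_end_date": "20220630"},
    {"calitp_itp_id": 112, "feed_version": "v1", "feed_end_date": "20221231"},
    {"calitp_itp_id": 200, "feed_version": "a", "feed_end_date": "20221231"},
]
cleaned = drop_known_bad_rows(feed_info_rows)
```

Listing the exact row keeps the condition narrow: a different feed with the same end date is left alone, and the entry is easy to delete once the feed is fixed.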

AC:

  • Only one row in gtfs_schedule_dim_feeds for the Foothill Transit feed from 2022-05-03.
  • The unique combination of columns test on gtfs_schedule_fact_daily_trips should pass.
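The second criterion amounts to checking that no combination of key columns appears more than once. A minimal Python sketch of that check is below; the key columns used here (`itp_id`, `trip_id`, `service_date`) are assumed for the example and may not match the columns the real test uses.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return each key combination that occurs more than once, with its count."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical fact rows: the first two are the same trip, as happens after fanout.
daily_trips = [
    {"itp_id": 112, "trip_id": "t1", "service_date": "2022-05-03"},
    {"itp_id": 112, "trip_id": "t1", "service_date": "2022-05-03"},
    {"itp_id": 112, "trip_id": "t2", "service_date": "2022-05-03"},
]
dupes = duplicate_keys(daily_trips, ["itp_id", "trip_id", "service_date"])
# The check passes only when dupes is empty.
```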
@lauriemerrell lauriemerrell self-assigned this May 4, 2022
@lauriemerrell lauriemerrell added the project-gtfs-schedule For issues related to gtfs-schedule project label May 4, 2022
@holly-g
Contributor

holly-g commented May 11, 2022

Slack thread here: https://cal-itp.slack.com/archives/C02JFS4LAMU/p1651671937502189
Let's revisit 5/31 after receiving clarification in the GTFS spec.

@lauriemerrell
Contributor Author

whoops this was not supposed to be closed by #1712, reopening
