GTFS Schedule: We should check for file-level deletion in type 2 / SCD logic #1184
Comments
AFAICT, our pipeline doesn't check whether an entire file has been deleted. We need to make …
Affected feed is …
@holly-g, I'm removing the … We need to add a second check either before or after … Affected files are saved in …
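The check this thread asks for is noticing when a file that was present in one extract of a feed is missing from the next extract. Here is a minimal, hypothetical Python sketch of that idea. It is not the pipeline's actual code. It assumes each extract can be reduced to a timestamp plus the set of file names it contained, and the names `Extract`, `FileDeletion`, and `find_file_deletions` are invented for illustration.

```python
from datetime import datetime
from typing import Iterable, NamedTuple


class Extract(NamedTuple):
    """Hypothetical stand-in for one extract (download) of a single feed."""
    extracted_at: datetime
    files: frozenset[str]


class FileDeletion(NamedTuple):
    file_name: str
    deleted_at: datetime  # timestamp of the first extract that lacks the file


def find_file_deletions(extracts: Iterable[Extract]) -> list[FileDeletion]:
    """Return one event each time a file disappears between consecutive extracts."""
    ordered = sorted(extracts, key=lambda e: e.extracted_at)
    deletions: list[FileDeletion] = []
    for previous, current in zip(ordered, ordered[1:]):
        # Files present in the previous extract but absent from the current one
        for name in sorted(previous.files - current.files):
            deletions.append(FileDeletion(name, current.extracted_at))
    return deletions
```

Because the comparison is made at the file-set level, it catches whole-file removals like the `feed_info.txt` case. A row-level diff of file contents never sees a "next version" of a file that no longer exists.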
I wonder if a version of the logic that we use to get the latest feed for GTFS schedule would work here. Example query:
Here, … There may be a preferable way to get this info from a later …
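The example query mentioned above is not preserved in this copy of the thread. To illustrate the general idea only, the "latest extract per feed" logic can be reused as a deletion check. Keep the newest extract for each feed, then flag any file that still has an active version but is missing from that newest extract. The record shapes (`(feed_key, extracted_at, files)` tuples and `(feed_key, file_name)` pairs) and both function names below are assumptions, not the real query or schema.

```python
from datetime import datetime
from typing import Iterable

LatestExtracts = dict[str, tuple[datetime, frozenset[str]]]


def latest_files_per_feed(
    extracts: Iterable[tuple[str, datetime, frozenset[str]]],
) -> LatestExtracts:
    """Keep only the most recent extract (timestamp and file set) for each feed key."""
    latest: LatestExtracts = {}
    for feed_key, extracted_at, files in extracts:
        current = latest.get(feed_key)
        if current is None or extracted_at > current[0]:
            latest[feed_key] = (extracted_at, files)
    return latest


def stale_active_files(
    active_files: Iterable[tuple[str, str]],
    latest: LatestExtracts,
) -> list[tuple[str, str]]:
    """Return (feed_key, file_name) pairs still marked active but absent from the latest extract."""
    return [
        (feed_key, name)
        for feed_key, name in active_files
        if feed_key in latest and name not in latest[feed_key][1]
    ]
```

Run as a periodic check, the second function would surface rows like the stale SolTrans `feed_info` described in this issue, without changing how versions are written.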
Update: see #1280; the proposal immediately above about …
Another update: we need to look at …
We should do this as part of #1259.
Thanks @o-ram for sharing these examples; it's very helpful to know that this is happening. cc @holly-g and @edasmalchi from a reports data display perspective: do we think this needs to be fixed in the current data structure (i.e., within the next ~6 weeks), or can it wait until we refactor versioning in the schedule pipeline? That could take a while but will likely be a more robust fix. Tagging this against #1536; I'm actually not sure we're going to do #1259 under the current pipeline anymore.
@evansiroky and @e-lo, I'm wondering whether we should update the Transit Tech Stacks Airtable to note the apparent tech stack changes that Olivia pointed out above in #1184 (comment).
Per the conversation in sprint planning, I'm going to close this, given an example of us handling file deletion properly. Since we version at the feed (aka extract) level, it's easy to see where a …
Also closing related Schedule issues as "not planned."
In December, we changed URL 0 for SolTrans. The old feed had a `feed_info.txt` file, while the new feed does not. However, the pipeline did not mark the old `feed_info` as deleted when that change occurred.

This means that we still show that old `feed_info` file as active (for example, it is still in `feed_info_clean` with `calitp_deleted_at = 2099-01-01`, i.e., "active").

This leads to information from that old feed appearing in the reports for SolTrans even though the rest of the feed is correct and updated, because some fields in the report are calculated from `feed_info`.

We need to figure out why `feed_info` wasn't marked as deleted in this case.
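In type 2 / SCD terms, the missing step is closing out the open version of a file when the whole file disappears, instead of leaving it at the `2099-01-01` "active" sentinel. The following is a minimal sketch under simplifying assumptions. `FileVersion` is a hypothetical row type that borrows only the `calitp_extracted_at` / `calitp_deleted_at` naming from this issue and is not the real `feed_info_clean` schema. `close_missing_files` and the `"soltrans"` feed key are likewise invented for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime

# Sentinel this issue describes for rows that are still "active"
ACTIVE_SENTINEL = datetime(2099, 1, 1)


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical, simplified type 2 row: one version of one file in one feed."""
    feed_key: str
    file_name: str
    calitp_extracted_at: datetime
    calitp_deleted_at: datetime = ACTIVE_SENTINEL


def close_missing_files(
    versions: list[FileVersion],
    feed_key: str,
    extract_files: frozenset[str],
    extracted_at: datetime,
) -> list[FileVersion]:
    """Mark open versions of files absent from a new extract as deleted at that extract's time."""
    result = []
    for v in versions:
        is_open = v.calitp_deleted_at == ACTIVE_SENTINEL
        if v.feed_key == feed_key and is_open and v.file_name not in extract_files:
            # File-level deletion: the whole file is gone, so close its open version
            v = replace(v, calitp_deleted_at=extracted_at)
        result.append(v)
    return result
```

For example, with open `feed_info.txt` and `stops.txt` rows for `"soltrans"` and a new extract containing only `stops.txt`, the `feed_info.txt` row gets the new extract's timestamp as its `calitp_deleted_at`, while `stops.txt` stays at the sentinel. The function returns new rows rather than mutating its input, which keeps earlier history intact.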