User Story: Python dbt model to produce shape-stop grain table #2237

Closed
tiffanychu90 opened this issue Feb 1, 2023 · 4 comments
Labels
warehouse-poc Staging proof-of-concept tables related to analytics-driven warehouse models

Comments

@tiffanychu90
Member

tiffanychu90 commented Feb 1, 2023

User stories


Summary

Use this issue to test out Python dbt models to cut stop-to-stop segments. For additional context, see the RT v2 Speeds Roadmap.

Can the Python scripts prep_stop_segments and cut_stop_segments be adapted directly?

Table Schema

  • Grain: feed_key-route_id-shape_id-stop_id
  • Identifier columns: shape_array_key, service_date, stop_id, stop_sequence
  • Drop duplicates on above.
    • Is it ever possible that stop_id has different stop_sequence values for different routes? It seems possible across trips, but hopefully, after accounting for route and direction, the same stop_id has only one stop_sequence value. Exploratory finding: yes, it can. At the shape_id level, stop_id-stop_sequence is a unique combination, but not at any level more aggregated than shape_id.
  • Potential use cases:
    • Stop-to-stop segments to get us stop-level metrics: perhaps delay (difference between scheduled and actual arrival time) and speed (from the previous stop to this stop)?
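
A minimal sketch of the schema above, using only the Python standard library: deduplicate on the identifier columns, then pair consecutive stops along each shape as candidate stop-to-stop segments. The function names and sample rows are hypothetical, not taken from prep_stop_segments or cut_stop_segments, and a real dbt Python model would operate on a DataFrame rather than a list of dicts.

```python
from itertools import groupby
from operator import itemgetter

# Identifier columns as listed in the issue.
ID_COLS = ("shape_array_key", "service_date", "stop_id", "stop_sequence")


def shape_stop_grain(rows):
    """Drop duplicates on the identifier columns, keeping the first occurrence."""
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in ID_COLS)
        if key not in seen:
            seen.add(key)
            out.append({c: row[c] for c in ID_COLS})
    return out


def stop_to_stop_pairs(grain_rows):
    """Pair each stop with the previous stop on the same shape and service date."""
    sort_key = itemgetter("shape_array_key", "service_date", "stop_sequence")
    group_key = itemgetter("shape_array_key", "service_date")
    pairs = []
    for (shape, _date), group in groupby(sorted(grain_rows, key=sort_key), key=group_key):
        stops = list(group)
        for prev, cur in zip(stops, stops[1:]):
            pairs.append((shape, prev["stop_id"], cur["stop_id"]))
    return pairs


# Made-up trip-level rows: two trips share shape s1, one stops short.
rows = [
    {"shape_array_key": "s1", "service_date": "2023-02-01", "stop_id": "A", "stop_sequence": 1, "trip_id": "t1"},
    {"shape_array_key": "s1", "service_date": "2023-02-01", "stop_id": "B", "stop_sequence": 2, "trip_id": "t1"},
    {"shape_array_key": "s1", "service_date": "2023-02-01", "stop_id": "A", "stop_sequence": 1, "trip_id": "t2"},
    {"shape_array_key": "s1", "service_date": "2023-02-01", "stop_id": "B", "stop_sequence": 2, "trip_id": "t2"},
    {"shape_array_key": "s1", "service_date": "2023-02-01", "stop_id": "C", "stop_sequence": 3, "trip_id": "t2"},
]

grain = shape_stop_grain(rows)
pairs = stop_to_stop_pairs(grain)
```

Each pair names the stops bounding one segment; the geometry cutting itself is out of scope for this sketch.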

Tester [Stakeholder]

  1. @tiffanychu90

Sprint Ready Checklist

    • Acceptance criteria defined
    • Team understands acceptance criteria
    • Team has defined solution / steps to satisfy acceptance criteria
    • Acceptance criteria is verifiable / testable
    • External / 3rd Party dependencies identified
tiffanychu90 added the warehouse-poc label Feb 1, 2023
@edasmalchi
Member

Identifier columns: feed_key, service_date, route_id, direction_id, stop_id, stop_sequence
Drop duplicates on above.
Is it ever possible that stop_id has different stop_sequence values for different routes? It seems possible that it can have different values across trips, but hopefully by accounting for route and direction, the same stop_id only has 1 stop sequence value
TODO: do exploratory work related to this with stop_times table

I think we'll have to take a really close look at this. I can think of a few examples of when this would not be true -- for example a route where one trip in a particular direction deviates to serve a high school around dismissal time, then resumes the route and serves all remaining stops. It seems possible that those remaining stops would have a shifted (or even completely different) stop_sequence...

My first thought is that making a table like this at the shape_id level could avoid some of those issues, hopefully capturing complexities like the one above. But as far as I know the spec makes no guarantees that shapes map to a particular service pattern, stop sequence, etc -- only that they describe where the vehicle passes through space.

Totally get why a table like this would be helpful, but it also seems to involve assumptions beyond what we can rely on from the GTFS spec since my read is that stop sequence relationships are only required to be consistent within the individual trip (hence the cumbersome joins).
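
One way to frame the exploratory check is sketched below: group stop_times-like rows at a chosen level and flag any stop_id that carries more than one stop_sequence. The function name and rows are hypothetical, and the sample assumes the deviating trip has its own shape_id, which the spec does not guarantee.

```python
from collections import defaultdict


def stops_with_multiple_sequences(rows, group_cols):
    """Return {(group values..., stop_id): sorted stop_sequence values} for conflicts."""
    seqs = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in group_cols) + (row["stop_id"],)
        seqs[key].add(row["stop_sequence"])
    return {k: sorted(v) for k, v in seqs.items() if len(v) > 1}


# Made-up rows: trip t2 deviates to serve a school stop X, shifting later stops.
rows = [
    {"route_id": "r1", "direction_id": 0, "shape_id": "s1", "trip_id": "t1", "stop_id": "A", "stop_sequence": 1},
    {"route_id": "r1", "direction_id": 0, "shape_id": "s1", "trip_id": "t1", "stop_id": "B", "stop_sequence": 2},
    {"route_id": "r1", "direction_id": 0, "shape_id": "s2", "trip_id": "t2", "stop_id": "A", "stop_sequence": 1},
    {"route_id": "r1", "direction_id": 0, "shape_id": "s2", "trip_id": "t2", "stop_id": "X", "stop_sequence": 2},
    {"route_id": "r1", "direction_id": 0, "shape_id": "s2", "trip_id": "t2", "stop_id": "B", "stop_sequence": 3},
]

by_route = stops_with_multiple_sequences(rows, ("route_id", "direction_id"))
by_shape = stops_with_multiple_sequences(rows, ("route_id", "direction_id", "shape_id"))
```

In this toy data, stop B conflicts at the route-direction level but not once shape_id is added to the grouping. Real feeds may behave differently.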

Curious to see the results of the EDA (or even participate if the timing works out post-Better Buses).

Perhaps one middle ground would be to make this table without stop_sequence. It could be used to show things like "here are all the stops generally served by this route in this direction", without claiming that there is a consistent sequence or that all trips on that route in that direction serve exactly those stops.
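
A rough sketch of that middle-ground table, assuming trip-level rows that carry route_id, direction_id, and stop_id (the function name and data are made up for illustration):

```python
def stops_served(rows):
    """Collect the distinct stop_ids seen for each (route_id, direction_id)."""
    served = {}
    for row in rows:
        key = (row["route_id"], row["direction_id"])
        served.setdefault(key, set()).add(row["stop_id"])
    # Sorted only for stable output; the order carries no sequence meaning.
    return {key: sorted(stops) for key, stops in served.items()}


# Made-up rows: one trip on route r1 also serves stop X.
rows = [
    {"route_id": "r1", "direction_id": 0, "trip_id": "t1", "stop_id": "A"},
    {"route_id": "r1", "direction_id": 0, "trip_id": "t1", "stop_id": "B"},
    {"route_id": "r1", "direction_id": 0, "trip_id": "t2", "stop_id": "A"},
    {"route_id": "r1", "direction_id": 0, "trip_id": "t2", "stop_id": "X"},
    {"route_id": "r1", "direction_id": 0, "trip_id": "t2", "stop_id": "B"},
]

served = stops_served(rows)
```

The result lists every stop any trip served, so it makes no claim that a single trip serves all of them.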

@tiffanychu90
Member Author

@edasmalchi Good point, I'll open up a research task to dive into what exactly is happening with stop_sequence and then we can decide whether stop_sequence should be included or not in the table

tiffanychu90 changed the title from "User Story: route-direction-stop grain table" to "User Story: shape-stop grain table" Mar 1, 2023
@tiffanychu90
Member Author

@edasmalchi Good point, I'll open up a research task to dive into what exactly is happening with stop_sequence and then we can decide whether stop_sequence should be included or not in the table

Conclusion: cannot go above shape-level if we want to use stop_sequence.

Changing this to shape-stop_id grain table, which can support the stop-to-stop segments.

tiffanychu90 changed the title from "User Story: shape-stop grain table" to "User Story: Python dbt model to produce shape-stop grain table" Mar 15, 2023
@tiffanychu90
Member Author

Closing. Punting this task to the future; we have what we need in make segmentize for now.
