Research Request - review RT segments speeds preprocessing stages #751

Closed
edasmalchi opened this issue May 16, 2023 · 3 comments
Labels
gtfs-rt Work related to GTFS-Realtime research request Issues that serve as a request for research (summary and handoff)

Comments

@edasmalchi
Member

Research Question

Single sentence description:
As we work towards an open data release, review the first several steps of the segment speeds pipeline to make sure we're keeping as much usable data as feasible.

How will this research be used?

Will be used to adjust scripts as necessary.

Stakeholders & End-Users

All open data stakeholders

Metrics

tbd

Data sources

existing per #592

Deliverables:

Notebook?

@edasmalchi edasmalchi added the research request Issues that serve as a request for research (summary and handoff) label May 16, 2023
@edasmalchi edasmalchi self-assigned this May 16, 2023
@edasmalchi edasmalchi added data Work related to the management of data gtfs-rt Work related to GTFS-Realtime open-data Work related to publishing, ingesting open data labels May 16, 2023
@edasmalchi
Member Author

read through much of the pipeline code and looked at several operators plus trip stats in rt_segment_speeds/05_threshold_v2_exploratory.ipynb

recommending 10 minutes / 70% of segments for that threshold
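
A rough sketch of how a cutoff like this could be applied, in case it helps with the handoff. Everything here is an assumption for illustration: the `TripStats` fields, the constant names, and the reading that a trip must pass both the 10 minute and the 70% of segments condition. None of it is taken from the pipeline scripts or from `config.yml`.

```python
from dataclasses import dataclass

# Hypothetical constants based on the recommendation above; the pipeline's
# real config keys are not shown in this thread.
MIN_TRIP_MINUTES = 10
MIN_SEGMENT_SHARE = 0.70


@dataclass
class TripStats:
    trip_id: str
    minutes_with_rt: float   # minutes of RT vehicle positions for the trip
    segments_with_rt: int    # segments with at least one vehicle position
    segments_total: int      # all segments on the trip


def is_usable(trip: TripStats) -> bool:
    """Keep a trip only if it passes both the time and segment-share cutoffs."""
    if trip.segments_total == 0:
        return False
    share = trip.segments_with_rt / trip.segments_total
    return trip.minutes_with_rt >= MIN_TRIP_MINUTES and share >= MIN_SEGMENT_SHARE


# Made-up trips, only to show the filter's behavior.
trips = [
    TripStats("a", 45.0, 18, 20),  # long enough, 90% of segments: kept
    TripStats("b", 4.0, 19, 20),   # under 10 minutes: dropped
    TripStats("c", 30.0, 10, 20),  # only 50% of segments: dropped
]
usable = [t.trip_id for t in trips if is_usable(t)]
```

If the intended rule is "either condition" rather than "both", the `and` in `is_usable` would become `or`.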

@edasmalchi
Member Author

looks like 5 minutes is currently set: https://github.com/cal-itp/data-analyses/blob/main/rt_segment_speeds/scripts/config.yml

(that works for me too)

@tiffanychu90
Member

  • Implemented 10 min cutoff in Tweak speeds pipeline #795 for May and June dates.
  • Feb-Apr dates use 5 min cutoffs; can go back and rerun at a later date.
  • Jan shouldn't be used: it predates our timezone handling (UTC vs. Pacific times), and only 1 date of RT data was downloaded, so RT and schedule data won't line up well. Now we download 2 dates of RT data and concat them to cover 1 service date of schedule data.
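
For anyone picking this up later, a minimal sketch of the two-date idea from the last bullet. It assumes RT timestamps are stored in UTC and treats a service date as a Pacific calendar date, which is a simplification. The function name `filter_to_service_date` and the sample timestamps are hypothetical, not taken from the pipeline.

```python
from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def filter_to_service_date(utc_timestamps, service_date):
    """Return the UTC timestamps whose Pacific local date equals service_date."""
    return [
        ts for ts in utc_timestamps
        if ts.astimezone(PACIFIC).date() == service_date
    ]


# Two UTC dates of made-up vehicle position timestamps.
utc_day_1 = [
    datetime(2023, 5, 17, 6, 30, tzinfo=timezone.utc),  # May 16, 23:30 Pacific
    datetime(2023, 5, 17, 15, 0, tzinfo=timezone.utc),  # May 17, 08:00 Pacific
]
utc_day_2 = [
    datetime(2023, 5, 18, 5, 0, tzinfo=timezone.utc),   # May 17, 22:00 Pacific
    datetime(2023, 5, 18, 8, 0, tzinfo=timezone.utc),   # May 18, 01:00 Pacific
]

# Concatenate both UTC dates, then keep only the May 17 Pacific service date.
kept = filter_to_service_date(utc_day_1 + utc_day_2, date(2023, 5, 17))
```

With only `utc_day_1` downloaded, the evening of May 17 Pacific would be missing, which is the mismatch described for the Jan data.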

@tiffanychu90 tiffanychu90 removed data Work related to the management of data open-data Work related to publishing, ingesting open data labels Nov 15, 2023