Add variable fares by time or day #343


omar-kabbani
Contributor

@omar-kabbani omar-kabbani commented Aug 5, 2022

Hi everyone,

MobilityData is moving forward with the second iteration of GTFS-Fares v2. For more information about the overall plan, see issue #341.

This pull request covers fares that vary by time and day, which is a section of the entire GTFS-Fares v2 proposal.

Fares can vary based on the time of day (described using timeframe_id) and day of week/year (described using service_id).

The changes in this pull request are:

  • Add a new file, timeframes.txt, to define timeframes.
  • Extend fare_leg_rules.txt with from_timeframe_id, to_timeframe_id, and service_id to describe time-dependent fares.

The time-dependent variables (timeframe and day information) are modelled in fare legs (not fare products) because, in GTFS-Fares v2, fare_leg_rules.txt models the location and time variables (from/to area, from/to timeframe, and service ID). All other factors are modelled in fare products.

Here's a quick example:

  • A regular fare is required to travel.
  • The fare is discounted if riders travel between 6:00 AM and 7:00 AM.
  • On New Year's, transit is free (no fare).

Define service days and New Year's using calendar.txt and calendar_dates.txt:

service_id       monday  tuesday  wednesday  thursday  friday  saturday  sunday  start_date  end_date
regular_service  1       1        1          1         1       1         1       20230101    20231231

service_id        date      exception_type
regular_service   20230101  2
new_year_service  20230101  1

Define the discounted timeframe using timeframes.txt:

timeframe_id  start_time  end_time
morning       06:00:00    07:00:00
regular       00:00:00    05:59:59
regular       07:00:01    23:59:59
all_day       00:00:00    23:59:59

Define the time and date-based fares using fare_leg_rules.txt

leg_group_id  from_area_id  to_area_id  network_id  from_timeframe_id  to_timeframe_id  service_id        fare_product
leg1          zoneA         zoneB       bus         regular            all_day          regular_service   regular_fare
leg1          zoneA         zoneB       bus         morning            all_day          regular_service   discounted_fare
leg1          zoneA         zoneB       bus         regular            all_day          new_year_service  free_fare
leg1          zoneA         zoneB       bus         morning            all_day          new_year_service  free_fare
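
To make the matching concrete, here is a minimal Python sketch of how a consumer might resolve a fare product from the example tables above. It is only an illustration, not part of the proposal. The function names `timeframes_at` and `fare_products` are hypothetical, the second `regular` window is assumed to start at 07:00:01, times are limited to a single 24-hour day, and both ends of each window are treated as inclusive.

```python
from datetime import time

# timeframes.txt rows from the example: (timeframe_id, start_time, end_time).
# Assumption: the second "regular" window starts at 07:00:01.
TIMEFRAMES = [
    ("morning", time(6, 0, 0), time(7, 0, 0)),
    ("regular", time(0, 0, 0), time(5, 59, 59)),
    ("regular", time(7, 0, 1), time(23, 59, 59)),
    ("all_day", time(0, 0, 0), time(23, 59, 59)),
]

# fare_leg_rules.txt rows, reduced to the time-related columns:
# (from_timeframe_id, to_timeframe_id, service_id, fare_product).
FARE_LEG_RULES = [
    ("regular", "all_day", "regular_service", "regular_fare"),
    ("morning", "all_day", "regular_service", "discounted_fare"),
    ("regular", "all_day", "new_year_service", "free_fare"),
    ("morning", "all_day", "new_year_service", "free_fare"),
]


def timeframes_at(moment: time) -> set[str]:
    """Return every timeframe_id whose window contains `moment` (inclusive ends)."""
    return {tf_id for tf_id, start, end in TIMEFRAMES if start <= moment <= end}


def fare_products(tap_on: time, tap_off: time, service_id: str) -> list[str]:
    """Return the fare products of all rules matching the leg's start and end times."""
    from_ids = timeframes_at(tap_on)
    to_ids = timeframes_at(tap_off)
    return [
        product
        for from_id, to_id, rule_service, product in FARE_LEG_RULES
        if from_id in from_ids and to_id in to_ids and rule_service == service_id
    ]


print(fare_products(time(6, 30), time(6, 50), "regular_service"))
print(fare_products(time(9, 0), time(9, 40), "regular_service"))
print(fare_products(time(6, 30), time(6, 50), "new_year_service"))
```

With this sample data, a 06:30 tap-on under regular_service resolves to discounted_fare, a 09:00 tap-on resolves to regular_fare, and any tap-on under new_year_service resolves to free_fare.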

Data consumer: Apple
Data producer: (To be announced)

Please go through the changes and share your thoughts here!
Looking forward to feedback and contribution on this proposal.

For other questions/concerns, don’t hesitate to reach out to [email protected].

@skinkie
Contributor

skinkie commented Aug 5, 2022

I suggest to make a change and don't mention a 24h period, but keep it in line with the rest of the GTFS spec (hence 24+h) formats.
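
For reference, the GTFS Time type allows values past 24:00:00 for service that runs beyond midnight of the service day. A minimal Python sketch of parsing such values into seconds, so that a window like 22:00:00 to 26:00:00 can be compared numerically, could look like the following. The helper name `gtfs_time_to_seconds` is hypothetical.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Convert a GTFS Time string (HH:MM:SS or H:MM:SS) to seconds.

    Hours may be 24 or more, which is how GTFS expresses times after
    midnight that still belong to the previous service day.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"not a valid GTFS Time: {value!r}")
    return hours * 3600 + minutes * 60 + seconds


# A late-night window that crosses midnight stays a simple start <= end pair.
start = gtfs_time_to_seconds("22:00:00")
end = gtfs_time_to_seconds("26:00:00")
print(start <= gtfs_time_to_seconds("25:30:00") <= end)
```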

@flocsy
Contributor

flocsy commented Aug 7, 2022

Need to clarify the start_time/end_time formats: do we require to have them be between 00:00 and 23:59? Do we allow timeframe like: start_time: 07:00, end_time: 06:00? Or is tomorrow's 06:00 represented as 30:00? Or we don't allow this and it has to be like in the example that we have it cut in two: 00:00-06:00 and 07:00-23:00? Currently the proposal has "24-hour format" so maybe we should add "allowed values: 00:00 - 23:59"

It'll be beneficial IMHO (easier to look at the data) if instead of [06:00-07:59], [08:00-15:59] we could use [06:00-08:00), [08:00-16:00).

IMHO we need a constraint that the timeframes are distinct. You can't have both 06:00-06:59 and 05:00-07:59. And depending on what we decide about the end of the timeframe ("06:59]" vs "07:00)"), we should not allow both 06:00-07:00 and 07:00-18:00, because 07:00 would be listed in 2 timeframes. Though it's not clear to me whether we want this constraint in timeframes.txt (probably not) or somehow on the data in fare_leg_rules.txt (probably yes, but it'll be very complicated and it should also incorporate service_id).
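
As an illustration of what such a distinctness check could look like, here is a small Python sketch. It assumes minute-resolution HH:MM values and inclusive end times, and it ignores service_id entirely. The helper names `to_minutes` and `overlapping_pairs` are hypothetical.

```python
from itertools import combinations


def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since the start of the day."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def overlapping_pairs(rows):
    """Return (id, id) for every pair of rows whose inclusive windows overlap.

    Each row is (timeframe_id, start, end).
    """
    found = []
    for first, second in combinations(rows, 2):
        first_start, first_end = to_minutes(first[1]), to_minutes(first[2])
        second_start, second_end = to_minutes(second[1]), to_minutes(second[2])
        # Two closed intervals overlap when each starts no later than the other ends.
        if first_start <= second_end and second_start <= first_end:
            found.append((first[0], second[0]))
    return found


print(overlapping_pairs([("a", "06:00", "06:59"), ("b", "05:00", "07:59")]))
print(overlapping_pairs([("a", "06:00", "07:00"), ("b", "07:00", "18:00")]))
print(overlapping_pairs([("a", "06:00", "06:59"), ("b", "07:00", "18:00")]))
```

With inclusive ends, the first two inputs report an overlap and the third does not. Under a half-open convention the second comparison would use strict inequalities and 06:00-07:00 followed by 07:00-18:00 would pass.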

[We might also need to have a constraint that if there are timeframes then they need to cover all the 24 hours (you can't have peak-hour: 06:00-07:59 without also having the 00:00-05:59 and 08:00-23:59). Looks like it's not needed because empty timeframe_id covers the "rest", but might not be that simple, see below]

The problem with what an empty timeframe_id means ("all timeframes defined in timeframes.timeframe_id excluding the ones listed under fare_leg_rules.[from/to/?_]timeframe_id") is that this might not be the case. For example, on weekdays (defined in calendar.txt, not in timeframes.txt) we might have peak: 06:00-08:00 and 16:00-18:00 ("rest of the day" should be "off-peak"), but the weekends are probably considered off-peak: 00:00-24:00 on Saturday and Sunday. This very realistic example also shows why the distinctness constraint is not that simple.

Maybe we could add timeframes.service_id as an optional field? That way the constraint could be that timeframes having the same service_id must be distinct.

Do we really need seconds? To me it sounds unrealistic and unnecessary.

fare_leg_rules.txt: IMHO we only need to add service_id and timeframe_id. While I can see what from/to timeframe_id could be used for, it probably also makes things too complex and error-prone (or needs some sentences to constrain them to consecutive timeframes, but then that is probably something that can't be enforced either...)
Let's say we have: 00:00-06:00: off-peak, 06:00-08:00: peak, 08:00-16:00: off-peak, 16:00-18:00: peak, 18:00-24:00: off-peak. Now if we could have a rule like: from_timeframe_id: off-peak, to_timeframe_id: off-peak, then a trip that starts at 05:55 and ends at 08:05 would be.... what? considered a legal off-peak trip? I don't think so...
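
The concern can be reproduced with a short Python sketch. It assumes the five windows listed above are half-open (start included, end excluded) and that a rule matches when the trip's start falls in from_timeframe_id and its end falls in to_timeframe_id. The helper names are hypothetical.

```python
# The windows from the example above: (timeframe_id, start, end), end excluded.
WINDOWS = [
    ("off-peak", "00:00", "06:00"),
    ("peak", "06:00", "08:00"),
    ("off-peak", "08:00", "16:00"),
    ("peak", "16:00", "18:00"),
    ("off-peak", "18:00", "24:00"),
]


def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since the start of the day."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def timeframe_at(hhmm: str) -> str:
    """Return the timeframe_id whose window contains the given time."""
    minute = to_minutes(hhmm)
    for timeframe_id, start, end in WINDOWS:
        if to_minutes(start) <= minute < to_minutes(end):
            return timeframe_id
    raise ValueError(f"no window covers {hhmm}")


def matches_rule(trip_start: str, trip_end: str, from_id: str, to_id: str) -> bool:
    """Endpoint-only matching: look at where the trip starts and where it ends."""
    return timeframe_at(trip_start) == from_id and timeframe_at(trip_end) == to_id


# Both trips match the off-peak/off-peak rule, although the first one
# spans the entire morning peak.
print(matches_rule("05:55", "08:05", "off-peak", "off-peak"))
print(matches_rule("10:12", "11:34", "off-peak", "off-peak"))
```

Endpoint-only matching cannot tell these two trips apart, which is the ambiguity being described.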

@davidlewis-ito

Great to see this conversation started.

We have a few comments.

  1. timeframe_ids & service_ids in fare_leg_rules, fare_products or elsewhere ?
    I am not sure of the logic of putting these time discriminating elements in the fare_leg_rules table. Although this was the original concept during the historic "GTFS Fares V2 Google Doc" discussion, I understood that the thinking has now moved on to use the fare_products table to combine the various discriminating factors.

I would take this a step further and propose that these time discriminating elements should go into another table ("time_rules.txt") that would yield a single key to be matched in the product table.

As well as providing stronger semantic modelling, this also keeps the fare_products table (or fare_leg_rules) from exploding in size when one needs to model more complicated peak definitions in the context of (say) a complex zonal system.

  2. Scope of timeframes
    We have found use cases where the time discrimination for a fare depends on either:
  • the time the rider boards the bus
  • the time the train started or ended its trip
  • the time the rider enters the station

We would therefore propose further attributes: from_timeframe_type and to_timeframe_type with values of trip, leg, station (and potentially others).

  3. We also agree with the comments of @flocsy above on eradicating the use of XX:59:59 times.

@omar-kabbani
Contributor Author

Thanks @skinkie, @flocsy, and @davidlewis-ito for your input! I have responded to your comments below - please let me know if something is outstanding.

@skinkie:

I suggest to make a change and don't mention a 24h period, but keep it in line with the rest of the GTFS spec (hence 24+h) formats.

  • Correct! I will change the wording to reference the Time field type in the specification.

@flocsy:

Need to clarify the start_time/end_time formats

It'll be beneficial IMHO (easier to look at the data) if instead of: [06:00-07:59], [08:00-15:59] we could use [06:00-08:00), [08:00-16:00)

  • With regards to the interval, start_time and end_time are included in the interval - just like how start_date and end_date are in calendar.txt. I will change the wording to clarify that.

IMHO we need to have a constraint that the timeframes are distinct

  • I think that overlapping timeframes that are part of the same timeframe_id should be forbidden. I can change the wording to add that.

We might also need to have a constraint that if there are timeframes then they need to cover all the 24 hours

  • This is always going to be a constraint when using empty fields in fare_leg_rules.txt and fare_transfer_rules.txt. The exact same issue arises if an area_id is present in more than one network_id. If using empty fields complicates things, then the timeframes should be defined explicitly.

Do we really need seconds? To me it sounds unrealistic and unnecessary.

  • I believe most data producers will not use seconds, but this is how Time is formatted in GTFS. Adding another data type may unnecessarily complicate things. Removing seconds from Time would break current datasets.

fare_leg_rules.txt: IMHO we only need to add service_id, timeframe_id.

  • Currently, the field from_timeframe_id indicates the window for when a trip starts to be eligible for the special pricing. I realize now that in my example above I made an error by leaving to_timeframe_id blank (it should be a timeframe that covers all the day). I have changed that to a timeframe that covers all day, thanks for catching that! Basically any trip that starts between 6:00 AM and 7:00 AM and ends whenever is eligible for special pricing. If an agency has special restrictions on when a trip should end to be eligible for special pricing, then they should use the field to_timeframe_id with that timeframe.

@davidlewis-ito:

timeframe_ids & service_ids in fare_leg_rules, fare_products or elsewhere ?

  • I leave this to what the community wants to see - currently, fare_leg_rules.txt models the spatial (from area - to area) and temporal (from time window - to time window) factors of a trip to determine pricing. All other variables are modelled in fare_products.txt. If there is a push to model timeframes separately, then we will do as the industry sees fit.

Scope of timeframes

  • Good catch! For fare calculation purposes, the start/end of a trip is not clear. Since this is specific to fares, the start time is when the fare is first validated (tap on the bus, tap to enter the station, etc.), and the end time is when the fare is validated on exit (tap off). I can modify the wording to reflect that.

@flocsy
Contributor

flocsy commented Aug 10, 2022

@omar-kabbani
Regarding keeping both from_timeframe_id and to_timeframe_id: this poses a problem even with the most common use case: most of the day is "off-peak" but 06:00-07:59 it's "peak". If the rule would be: from_timeframe_id: off-peak, to_timeframe_id: off-peak, then probably the goal of the producer would be to say that when the whole trip is in off-peak then it's cheaper. However it can't differentiate between a trip: 05:55-08:13 and 10:12-11:34 where the 1st trip probably shouldn't be considered, since it includes the whole peak hour as well.

Regarding the constraint about overlapping timeframes: you wrote above that "overlapping timeframes that are part of the same timeframe_id should be forbidden". That for sure has to be added, but is that enough? What would you make of the following rules:

timeframe_id,start_time,end_time
off-peak,00:00,05:59
peak,06:00,07:59
off-peak,07:00,23:59

Part of the problem seems to be related to what both @davidlewis-ito and I touched on: the timeframes and the service dates together define the times, so there should be something that connects them, either by having some explicit table for it or by having complicated constraints in the definitions. The 3 lines above do not look OK because 2 timeframes include 07:00-07:59, so it's probably a bug. On the other hand, if I changed it to:

timeframe_id,start_time,end_time
off-peak,00:00,05:59
peak,06:00,07:59
off-peak,08:00,23:59
off-peak,00:00,23:59

Then it would still look strange at first sight, but once I add that the first 3 lines cover the whole 24 hours for work days and the 4th line covers the whole 24 hours on weekends, it makes sense. But then this should be explicitly defined somewhere, so the service_ids work-day and weekend should be related to them. Of course we could also change the 4th line's timeframe_id to weekend, but that would only help humans looking at it, and it would probably also double the number of lines in fare_leg_rules.
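
One way to picture an explicit link is a service_id column on each timeframe row. The Python sketch below is only an illustration of that idea: the extra column and the `timeframes_for` helper are hypothetical and not part of the proposal, ends are treated as inclusive, and the service_ids work-day and weekend are taken from the paragraph above.

```python
# Hypothetical timeframes rows with an added service_id column:
# (timeframe_id, service_id, start, end), ends inclusive.
TIMEFRAMES = [
    ("off-peak", "work-day", "00:00", "05:59"),
    ("peak", "work-day", "06:00", "07:59"),
    ("off-peak", "work-day", "08:00", "23:59"),
    ("off-peak", "weekend", "00:00", "23:59"),
]


def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since the start of the day."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def timeframes_for(service_id: str, hhmm: str) -> list[str]:
    """Return the timeframe_ids active at `hhmm` for the given service_id."""
    minute = to_minutes(hhmm)
    return [
        timeframe_id
        for timeframe_id, row_service, start, end in TIMEFRAMES
        if row_service == service_id and to_minutes(start) <= minute <= to_minutes(end)
    ]


print(timeframes_for("work-day", "07:30"))
print(timeframes_for("weekend", "07:30"))
```

Scoped this way, each service_id has exactly one timeframe at any minute, so the fourth row no longer conflicts with the first three.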

Regarding the timeframe_type: I think that modifying the wording is not enough, because it would only clarify the confusion on a local level, which anyway probably would not be confusing for the locals, but it misses the whole point of GTFS Fares V2: automated fare calculations. When you have an algorithm that calculates the fares and it knows that it takes 2 minutes to walk from the station entrance to the 23rd platform then it can and should take that into account. And then it is important for the algorithm to differentiate between these nuances: start to count at the entrance of the station or when you enter the train (reach the platform) or when the train leaves...

@omar-kabbani
Contributor Author

Hi @flocsy -

Now if we could have a rule like: from_timeframe_id: off-peak, to_timeframe_id: off-peak, then a trip that starts at 05:55 and ends at 08:05 would be.... what? considered a legal off-peak trip? I don't think so...

Technically, yes - this would be an off-peak trip, since the fields are based on fare validation. The examples for off-peak fares that I found online only look at when the trip starts. Hence, if the discounted pricing starts at 7:00 AM and a rider taps at 6:59 AM, they will not pay the discounted price. For example, TfL's fare structure states that "Peak and off-peak fares are charged based on the time you touch in." Hence, from_timeframe_id=morning and to_timeframe_id=all_day: riders are eligible for a discount if they tap in the morning, and it does not matter at what time their trip ends (it could be during the morning period or after).

However it can't differentiate between a trip: 05:55-08:13 and 10:12-11:34 where the 1st trip probably shouldn't be considered, since it includes the whole peak hour as well.

Same here - if we are basing this on fare validation (tap on/tap off, etc.), both trips are off-peak. Unless we add a field contains_timeframe_id similar to how areas are handled, but this sounds a bit overkill - I do not know about all agencies, but I think most of the ones who provide variable pricing base it off of the start of the journey (tapping on).
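
For comparison, a contains-style check could be sketched as below. This is one possible reading only: it assumes "contains" means the trip's interval overlaps a window of the named timeframe, uses the 06:00-07:59 peak window from the earlier comment with inclusive ends, and the helper names are hypothetical.

```python
# Peak window from the discussion above, ends inclusive.
PEAK_WINDOWS = [("06:00", "07:59")]


def to_minutes(hhmm: str) -> int:
    """Convert an HH:MM string to minutes since the start of the day."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def trip_contains_peak(trip_start: str, trip_end: str) -> bool:
    """True when any part of the trip falls inside a peak window."""
    start, end = to_minutes(trip_start), to_minutes(trip_end)
    return any(
        start <= to_minutes(window_end) and to_minutes(window_start) <= end
        for window_start, window_end in PEAK_WINDOWS
    )


print(trip_contains_peak("05:55", "08:13"))
print(trip_contains_peak("10:12", "11:34"))
```

Under this reading the 05:55-08:13 trip is flagged as touching the peak and the 10:12-11:34 trip is not, which is the distinction that endpoint-only matching cannot make.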

From your posts, I take it that you are suggesting using one field for timeframes in fare_leg_rules.txt - what would that represent? The start of a trip, or a window which the trip should start and end in, or something else?

For associating service_id to timeframe_id, an option would be to add a service_id field to timeframes.txt - that way, the timeframe is bound to a set of days.

As for timeframe_type, I leave this to data consumers as to how to handle it - but the current proposal is based on the fare validation (as in when a rider taps on or pays a fare or passes through fare gates etc.). How long it takes riders to go from the entrance to the fare gates is at the discretion of the trip planning application as well as through the use of pathways.txt. If there is a need to make this more specific, what enumerations should be included for timeframe type?

@flocsy
Contributor

flocsy commented Aug 11, 2022

I am brainstorming rather than saying that there should be only 1 timeframe_id field plus timeframe_type. With those two we can probably handle many logical use cases (timeframe_type: station_entrance / vehicle_entrance would mean what you described, but there would also be the possibility of trip_duration or whole_trip, which could disallow a trip such as 05:55-06:13). Also, with timeframe_id + timeframe_type we can handle more real-life use cases IMHO than with from_timeframe_id + to_timeframe_id (and in most cases from_timeframe_id would equal to_timeframe_id anyway).

Regarding adding service_id to somewhere: well, yes, of course the proposal would put them in fare_leg_rules, but there are other places we could add it, like to the timeframes. We'll need to think about the pros and cons of these (and maybe more) possibilities. As I see it we'll have to consider the most common use cases, which I believe mostly will have these in common:
the most common rules would be something like day-of-week (represented in a service) and hour-of-day (represented in timeframe) combined somehow (simplest example: M-F: 06:00-07:59 => peak, rest of the day => off-peak, but S-S: 00:00-23:59 => off-peak) For this basic case it's probably best to add service_id to timeframes. Up to this point it makes sense and also it "fixes" the overlap issue.

However, considering what I think would be the next most common use case (exceptions for holidays): if we add just 1 service exception for X-mas, how would it look? If we add service_id to timeframes, then we would probably have 1 more line in timeframes (maybe 2 if there's still some peak/off-peak thing), but then we would also need to duplicate most rules in fare_leg_rules, wouldn't we? I'm not really sure what's best. And if some lines behave on X-mas as usual, but other lines as on a weekend, then things get even more complicated.

@omar-kabbani
Contributor Author

Hey everyone - just a reminder that MobilityData is hosting a roundtable discussion on Wednesday 24 August at 11:00 AM ET to discuss the outstanding items regarding time-variable fares. If you would like to attend, please react to this message or send an email to [email protected] and we will send you the invite.

For a summary and more details, please check out this post.

@flocsy
Contributor

flocsy commented Aug 22, 2022 via email

@halbertram

halbertram commented Aug 23, 2022

I'm afraid I can't make the discussion. I think there's a part of @davidlewis-ito's suggestion on timeframe_type that hasn't been picked up fully: the case where the peak-ness of a leg is determined by the time the vehicle leaves its origin (or reaches its destination), e.g. LIRR in NYC. It's effectively an attribute of the trip and is not connected at all to when the rider boards that trip. This feels like something worth considering: whether it could be captured in timeframes, or whether it would need to be represented in another way, such as trip-level fare attributes.

@omar-kabbani
Contributor Author

Summary of roundtable discussion on variable fares by time or day:

August 24 11:00 AM ET

Attended by: @Cristhian-HA, @omar-kabbani, @skinkie, Chris Erickson (Trillium), and Scott Jackson (Apple).

Number of fields

  • It was discussed that 2 fields are required to effectively describe time-based fare calculations:
    • "from" timeframe which defines the window for when a rider has to start a journey to be eligible for the special fare.
    • "to" timeframe which defines the window for when a rider has to end a journey to be eligible for the special fare.
  • In most cases, only the "from" timeframe field will be used, since many agencies handle special fares depending on when riders start their journey and not when they end it (for example: TfL and MRT).
  • Since most agencies do not use the trip end times to process time-based fares, and only rely on when the trip starts - one option is to have the field to_timeframe_id default to "All day" if left blank. This will go against the default behaviour of other blank fields in fare_leg_rules.txt, but this will save data producers the hassle of creating a separate entry in timeframes.txt to describe "All day".
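
The blank-means-all-day option could be sketched in Python as follows. This is a hedged illustration of the idea discussed at the roundtable, not agreed behaviour, and the `rule_applies` helper and the dictionary layout are hypothetical.

```python
def rule_applies(rule: dict, from_ids: set, to_ids: set) -> bool:
    """Check a fare leg rule against the timeframes active at tap-on and tap-off.

    `from_ids` and `to_ids` are the timeframe_ids that contain the start and
    end of the leg. A blank to_timeframe_id is treated as "All day".
    """
    from_ok = rule["from_timeframe_id"] in from_ids
    to_id = rule.get("to_timeframe_id", "")
    to_ok = to_id == "" or to_id in to_ids
    return from_ok and to_ok


discount = {"from_timeframe_id": "morning", "to_timeframe_id": "", "fare_product": "discounted_fare"}
strict = {"from_timeframe_id": "morning", "to_timeframe_id": "morning", "fare_product": "discounted_fare"}

# A leg that starts in the morning window and ends after it.
print(rule_applies(discount, {"morning"}, {"regular"}))
print(rule_applies(strict, {"morning"}, {"regular"}))
```

With the blank default, the producer does not need a separate all-day row in timeframes.txt. The second rule shows that an explicit to_timeframe_id still restricts when the leg may end.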

Timeframe type

  • The need for a file or field to describe the timeframe type was highlighted since transit services have different ways of defining when a fare is activated (at station entrance, upon boarding, upon departure, etc.).
  • One option is to extend fare_leg_rules.txt with timeframe_type with enumerations.
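
As a rough Python sketch of what an enumerated timeframe_type might select, the example below picks the reference time for a leg from three recorded events. The enumeration values simply mirror the cases named above (station entrance, boarding, departure); they, the `LegEvents` record, and `reference_time` are all hypothetical and were not settled in the discussion.

```python
from dataclasses import dataclass
from enum import Enum


class TimeframeType(Enum):
    """Hypothetical enumeration of when a fare is considered activated."""
    STATION_ENTRANCE = "station_entrance"
    BOARDING = "boarding"
    DEPARTURE = "departure"


@dataclass
class LegEvents:
    """Times (GTFS Time strings) of the events a fare could be keyed on."""
    station_entrance: str
    boarding: str
    departure: str


def reference_time(events: LegEvents, timeframe_type: TimeframeType) -> str:
    """Return the event time that should be tested against the timeframe."""
    return getattr(events, timeframe_type.value)


events = LegEvents(station_entrance="05:58:00", boarding="06:01:00", departure="06:03:00")
for timeframe_type in TimeframeType:
    print(timeframe_type.value, reference_time(events, timeframe_type))
```

In this example the rider enters the station before a 06:00 window opens but boards after it, so the chosen type decides which fare applies.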

File structure

  • from_timeframe_id and to_timeframe_id to remain in fare_leg_rules.txt
  • timeframe_type to be added to fare_leg_rules.txt
  • Use timeframes.txt and calendar.txt to define days and times - and use the fields from_timeframe_id, to_timeframe_id, and service_id in fare_leg_rules.txt to associate times and days to fare legs.

@omar-kabbani
Contributor Author

Hey everybody, I will close this pull request since I will be leaving MobilityData and therefore I cannot remain the advocate for this proposal.

In order to preserve the conversation history, I will open an issue in google/transit and reference this pull request. Please continue the discussion there, and provide feedback on how to move forward with this proposal.

I also wanted to thank you all for your constant engagement in these GTFS extension proposals and for all the valuable feedback!

@flocsy
Contributor

flocsy commented Oct 11, 2022 via email

@isabelle-dr
Collaborator

isabelle-dr commented Nov 15, 2022

Opened #350 #357 to continue the work on this PR

@flocsy
Contributor

flocsy commented Nov 15, 2022

can't this be reopened?

@isabelle-dr
Collaborator

isabelle-dr commented Nov 15, 2022

@flocsy I linked the issue in the post above but I meant to link to PR #357.
Omar was the advocate of this PR and I can't continue on his behalf, unfortunately.
I know it's not ideal; I've read everything in this PR and I'm updating #357 as a result so we can resume where we left off.
Sorry for the inconvenience.
