-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define semantic cardinality for GTFS-realtime fields #64
Conversation
* As originally discussed in google#19 and https://groups.google.com/forum/#!msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ, the current GTFS-realtime spec documentation describes field Protocol Buffer cardinality, not semantic cardinality. This has created confusion for consumers and producers where fields have been omitted based on them being labeled as "optional" in the GTFS-realtime spec, even if they were required under certain logical transit conditions (e.g., stop_sequence must be provided if a trip contains a loop). * This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional". It also bumps the gtfs_realtime_version in the .proto file to 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.
I should also mention that with the exception of making the header timestamp required, I believe all of the other field cardinalities reflect existing GTFS-rt documentation. So, this isn't so much proposing something new as it is trying to capture existing logic spread around the docs into a concise description in the Cardinality and Description fields. |
gtfs-realtime/spec/en/reference.md
Outdated
@@ -114,7 +129,7 @@ Note that the update can describe a trip that has already completed.To this end, | |||
|------------------|------------|-------------------|-------------------| | |||
| **trip** | [TripDescriptor](#message-tripdescriptor) | required | The Trip that this message applies to. There can be at most one TripUpdate entity for each actual trip instance. If there is none, that means there is no prediction information available. It does *not* mean that the trip is progressing according to schedule. | | |||
| **vehicle** | [VehicleDescriptor](#message-vehicledescriptor) | optional | Additional information on the vehicle that is serving this trip. | | |||
| **stop_time_update** | [StopTimeUpdate](#message-stoptimeupdate) | repeated | Updates to StopTimes for the trip (both future, i.e., predictions, and in some cases, past ones, i.e., those that already happened). The updates must be sorted by stop_sequence, and apply for all the following stops of the trip up to the next specified one. | | |||
| **stop_time_update** | [StopTimeUpdate](#message-stoptimeupdate) | required | Updates to StopTimes for the trip (both future, i.e., predictions, and in some cases, past ones, i.e., those that already happened). The updates must be sorted by stop_sequence, and apply for all the following stops of the trip up to the next specified one. At least one stop_time_update must be provided for the trip. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a trip is canceled and its ScheduleRelationship
is reported as CANCELED
, some (most?) operators don't list any stops in StopTimeUpdate
as its assumed that the trip will not be making any stops.
Do we want this to be the case? Or, would it preferred to list all stops with the StopTimeUpdate
's ScheduleRelationship
as SKIPPED
for all stops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in bf3d2a5.
gtfs-realtime/spec/en/reference.md
Outdated
| **departure** | [StopTimeEvent](#message-stoptimeevent) | optional | | ||
| **stop_sequence** | [uint32](https://developers.google.com/protocol-buffers/docs/proto#scalar) | conditionally required | Must be the same as in stop_times.txt in the corresponding GTFS feed. Either stop_sequence or stop_id must be provided within a StopTimeUpdate - both fields cannot be empty. stop_sequence is required for trips that visit the same stop_id more than once (e.g., a loop) to disambiguate which stop the prediction is for. | | ||
| **stop_id** | [string](https://developers.google.com/protocol-buffers/docs/proto#scalar) | conditionally required | Must be the same as in stops.txt in the corresponding GTFS feed. Either stop_sequence or stop_id must be provided within a StopTimeUpdate - both fields cannot be empty. | | ||
| **arrival** | [StopTimeEvent](#message-stoptimeevent) | conditionally required | Either arrival or departure must be provided within a StopTimeUpdate - both fields cannot be empty. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a StopTimeUpdate
has a ScheduleRelationship
of SKIPPED
, I don't think it would be necessary to provide an arrival
or a departure
StopTimeEvent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed via 56066ee
|
||
### Term Definitions | ||
|
||
* **required**: Exactly one | ||
* **repeated**: Zero or more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned that repeated
has been removed; I think it should be included so that people know they can supply 0 or more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RachM I had concerns about removing repeated
too, but wasn't sure how to represent this in the table alongside the new info. Currently I added text to the "Description" field, but I do like the ability to scan the table and easily see which fields are 0+ elements.
Questions for this:
- Should we add a new column for one/many relationships? If so, what should this be called?
- Do we leave this as the "Cardinality" column and change the name of the column for the new semantic fields to "Requirement", or something similar?
Also, I've never liked the term repeated
for this concept because to my knowledge it's very protocol buffer (implementation)-specific. I'd prefer to find another term - suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we leave this as the "Cardinality" column and change the name of the column for the new semantic fields to "Requirement", or something similar?
I think that's the best approach. My reasoning:
- Cardinality best describes the number of entities allowed in that field, so let's keep it purely to numbers. Of course the savvy user may determine the "requirement" from the number, but I think it's useful to explicitly state the "requirement" in a separate column.
- A new column,
Required
or something similar makes it very clear that the user should check that column to see if the field is needed.
I think it's good to separate the two concepts; a) is it needed (check Required
column) and 2) how many can I provide (check Cardinality
column).
I agree about repeated
; it's not user friendly. If you have the Cardinality
column, then the Required
column entries become: Yes (for required
) and No (for optional
and repeated
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good on adding the new column.
So here's what I'm working on now to update this proposal, in summary:
Required
column will address semantic requirementsCardinality
column will include information about how many elements should be provided for a particular field
The goal of this proposal is to represent the semantic requirements of fields, based on transit use cases and domain logic. The main challenge here is that there isn't a direct mapping between semantic requirements in Required
and protobuf cardinality. As a result I think it will be confusing to show semantic requirements in Required
and protobuf cardinality in Cardinality
.
I'd prefer to keep the spec documentation independent of the implementation - in other words, the docs should be purely semantic, with the .proto
file being the reference for the protobuf cardinality.
With this in mind, I propose that we define semantic cardinality something like the following:
Cardinality represents the number of elements that may be provided for a particular field:
- One - A single one element may be provided for this field. This maps to the Protocol Buffer required and optional cardinalities.
- Many - Many elements (0, 1, or more) may be provided for this field. This maps to the Protocol Buffer repeated cardinality.
Always reference the Required and Description fields to see when a field is required, conditionally required, or optional. Please reference
gtfs-realtime.proto
for Protocol Buffer cardinality.
Using these definitions, FeedMessage
and TripUpdate
would look like the following:
FeedMessage
Field Name | Type | Required | Cardinality | Description |
---|---|---|---|---|
header | FeedHeader | Required | One | Metadata about this feed and feed message. |
entity | FeedEntity | Required | Many | Contents of the feed. |
TripUpdate
Field Name | Type | Required | Cardinality | Description |
---|---|---|---|---|
trip | TripDescriptor | Required | One | XXXX |
vehicle | VehicleDescriptor | Optional | One | XXXX |
stop_time_update | StopTimeUpdate | Conditionally required | Many | XXXX |
timestamp | uint64 | Optional | One | XXXX |
delay | int32 | Optional | One | XXXX |
…ule_relationship Addresses google#64 (review)
...based on stop_time_update.schedule_relationship. See https://github.com/google/transit/blob/master/gtfs-realtime/spec/en/reference.md#enum-schedulerelationship
@jxeeno Good catch on the |
* A new "Required" field holds the information as to whether a particular field is Required, Optional, or Conditionally required * The existing "Cardinality" field is changed to a semantic version of cardinality instead of using Protobuf cardinality
I've pushed a new commit 040f49c into this PR that creates a new column for "Required", and uses semantic cardinality definitions ( Feedback welcome! I'm definitely open to comments on how to improve this. |
Any additional comments on this proposal? I'd like to make any further changes based on feedback prior to calling a vote. |
LGTM - I'm happy for voting to commence. |
Great! I'd like to call for a vote on this proposal then - voting will end Thursday July 20th at 23:59:59 UTC. |
I vote yes (on behalf of Google). |
+1 |
* This field can't be "Required", because there are legitimate scenarios where a feed may not have any real-time information about the system (e.g., if it's 3am, and the agency doesn't offer any service at 3am). So, change to "Conditionally required" with description.
Sorry for the delay updating this proposal - I was out on vacation last week. Unfortunately we currently don't have the 3 needed votes for this to pass (we're at 2), and the voting period ended July 20th. This ends up being a good thing, as I realized there was one issue in this proposal (FeedMessage.entity should be "Conditionally required" - I just fixed this in 3c24ff7, see commit description for more info). So, I'd like to call for another vote on this PR - voting will end Tuesday Aug 8th at 23:59:59 UTC. |
I vote yes. |
+1 I vote yes. |
I vote yes.
…On Tue, Aug 1, 2017 at 2:04 AM, Kenneth Tsang ***@***.***> wrote:
+1 I vote yes.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#64 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABrtS4u7HRBhQ_opQROXQh6VOnZJXO2lks5sTmuIgaJpZM4OIlGJ>
.
|
Alright, looks like this passed the vote - 3 yes votes and 0 nos! Congrats everyone, we have GTFS-realtime v2.0! 🎊 🎊 🎊 This includes better documentation for semantic cardinality and requirements - which fields are required and under what conditions. @RachM Would you be able to squash and merge this using the Github web UI, so one commit ends up in the master branch? Alternately I could force-push a new single commit to replace the iterative commits currently in the PR, although it will erase the history of commits based on feedback through the proposal. |
I missed making this change in google#64.
I missed making this change in #64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Help
@benkaday3218 would you be able to provide details about the areas where you need help? |
This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional", with an explanation for when "conditionally required" fields are required in the "Description" section of that field.EDIT - Field requirements are now being defined in a new field "Required" - see Define semantic cardinality for GTFS-realtime fields #64 (comment) and discussion that follows. It also bumps the gtfs_realtime_versionin the .proto fileto 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.Announced on GTFS-realtime Google Group at:
https://groups.google.com/forum/#!topic/gtfs-realtime/J-csZSxeWjs