Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define semantic cardinality for GTFS-realtime fields #64

Merged
merged 5 commits into from
Aug 9, 2017

Conversation

barbeau
Copy link
Collaborator

@barbeau barbeau commented Jun 28, 2017

  • As originally discussed in Clarify definition of "cardinality" #19 and https://groups.google.com/forum/#!msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ, the current GTFS-realtime spec documentation describes field Protocol Buffer cardinality, not semantic cardinality. This has created confusion for consumers and producers where fields have been omitted based on them being labeled as "optional" in the GTFS-realtime spec, even if they were required under certain logical transit conditions (e.g., stop_sequence must be provided if a trip contains a loop).
  • This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional", with an explanation for when "conditionally required" fields are required in the "Description" section of that field. EDIT - Field requirements are now being defined in a new field "Required" - see Define semantic cardinality for GTFS-realtime fields #64 (comment) and discussion that follows. It also bumps the gtfs_realtime_version in the .proto file to 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.

Announced on GTFS-realtime Google Group at:
https://groups.google.com/forum/#!topic/gtfs-realtime/J-csZSxeWjs

* As originally discussed in google#19 and https://groups.google.com/forum/#!msg/gtfs-realtime/wm3W7QIEZ9Y/DLyWKkknJyoJ, the current GTFS-realtime spec documentation describes field Protocol Buffer cardinality, not semantic cardinality.  This has created confusion for consumers and producers where fields have been omitted based on them being labeled as "optional" in the GTFS-realtime spec, even if they were required under certain logical transit conditions (e.g., stop_sequence must be provided if a trip contains a loop).
* This patch changes the "Cardinality" documentation to define the semantic cardinality for each data element as "required", "conditionally required", and "optional".  It also bumps the gtfs_realtime_version in the .proto file to 2.0 so validators can strictly enforce semantic cardinality based on the gtfs_realtime_version.
@barbeau
Copy link
Collaborator Author

barbeau commented Jun 28, 2017

I should also mention that with the exception of making the header timestamp required, I believe all of the other field cardinalities reflect existing GTFS-rt documentation. So, this isn't so much proposing something new as it is trying to capture existing logic spread around the docs into a concise description in the Cardinality and Description fields.

@@ -114,7 +129,7 @@ Note that the update can describe a trip that has already completed.To this end,
|------------------|------------|-------------------|-------------------|
| **trip** | [TripDescriptor](#message-tripdescriptor) | required | The Trip that this message applies to. There can be at most one TripUpdate entity for each actual trip instance. If there is none, that means there is no prediction information available. It does *not* mean that the trip is progressing according to schedule. |
| **vehicle** | [VehicleDescriptor](#message-vehicledescriptor) | optional | Additional information on the vehicle that is serving this trip. |
| **stop_time_update** | [StopTimeUpdate](#message-stoptimeupdate) | repeated | Updates to StopTimes for the trip (both future, i.e., predictions, and in some cases, past ones, i.e., those that already happened). The updates must be sorted by stop_sequence, and apply for all the following stops of the trip up to the next specified one. |
| **stop_time_update** | [StopTimeUpdate](#message-stoptimeupdate) | required | Updates to StopTimes for the trip (both future, i.e., predictions, and in some cases, past ones, i.e., those that already happened). The updates must be sorted by stop_sequence, and apply for all the following stops of the trip up to the next specified one. At least one stop_time_update must be provided for the trip. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a trip is canceled and its ScheduleRelationship is reported as CANCELED, some (most?) operators don't list any stops in StopTimeUpdate as its assumed that the trip will not be making any stops.

Do we want this to be the case? Or, would it preferred to list all stops with the StopTimeUpdate's ScheduleRelationship as SKIPPED for all stops?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jxeeno Good catch! I missed this when converting some of our gtfs-rt-validator rules to the text. I agree, if ScheduleRelationship is CANCELED then no stop_time_updates need to be provided. For my reference this is E041 for us. I'll push an update to fix this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in bf3d2a5.

| **departure** | [StopTimeEvent](#message-stoptimeevent) | optional |
| **stop_sequence** | [uint32](https://developers.google.com/protocol-buffers/docs/proto#scalar) | conditionally required | Must be the same as in stop_times.txt in the corresponding GTFS feed. Either stop_sequence or stop_id must be provided within a StopTimeUpdate - both fields cannot be empty. stop_sequence is required for trips that visit the same stop_id more than once (e.g., a loop) to disambiguate which stop the prediction is for. |
| **stop_id** | [string](https://developers.google.com/protocol-buffers/docs/proto#scalar) | conditionally required | Must be the same as in stops.txt in the corresponding GTFS feed. Either stop_sequence or stop_id must be provided within a StopTimeUpdate - both fields cannot be empty. |
| **arrival** | [StopTimeEvent](#message-stoptimeevent) | conditionally required | Either arrival or departure must be provided within a StopTimeUpdate - both fields cannot be empty. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a StopTimeUpdate has a ScheduleRelationship of SKIPPED, I don't think it would be necessary to provide an arrival or a departure StopTimeEvent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jxeeno Good catch again! Yes, for SKIPPED neither arrival nor departures need to be provided (E043 for our validator). Also, it's actually an error to provide arrival and departure for ScheduleRelationship of NO_DATA (E042). I'll fix both these items.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed via 56066ee


### Term Definitions

* **required**: Exactly one
* **repeated**: Zero or more
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned that repeated has been removed; I think it should be included so that people know they can supply 0 or more.

Copy link
Collaborator Author

@barbeau barbeau Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RachM I had concerns about removing repeated too, but wasn't sure how to represent this in the table alongside the new info. Currently I added text to the "Description" field, but I do like the ability to scan the table and easily see which fields are 0+ elements.

Questions for this:

  • Should we add a new column for one/many relationships? If so, what should this be called?
  • Do we leave this as the "Cardinality" column and change the name of the column for the new semantic fields to "Requirement", or something similar?

Also, I've never liked the term repeated for this concept because to my knowledge it's very protocol buffer (implementation)-specific. I'd prefer to find another term - suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we leave this as the "Cardinality" column and change the name of the column for the new semantic fields to "Requirement", or something similar?

I think that's the best approach. My reasoning:

  • Cardinality best describes the number of entities allowed in that field, so let's keep it purely to numbers. Of course the savvy user may determine the "requirement" from the number, but I think it's useful to explicitly state the "requirement" in a separate column.
  • A new column, Required or something similar makes it very clear that the user should check that column to see if the field is needed.

I think it's good to separate the two concepts; a) is it needed (check Required column) and 2) how many can I provide (check Cardinality column).

I agree about repeated; it's not user friendly. If you have the Cardinality column, then the Required column entries become: Yes (for required) and No (for optional and repeated).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good on adding the new column.

So here's what I'm working on now to update this proposal, in summary:

  • Required column will address semantic requirements
  • Cardinality column will include information about how many elements should be provided for a particular field

The goal of this proposal is to represent the semantic requirements of fields, based on transit use cases and domain logic. The main challenge here is that there isn't a direct mapping between semantic requirements in Required and protobuf cardinality. As a result I think it will be confusing to show semantic requirements in Required and protobuf cardinality in Cardinality.

I'd prefer to keep the spec documentation independent of the implementation - in other words, the docs should be purely semantic, with the .proto file being the reference for the protobuf cardinality.

With this in mind, I propose that we define semantic cardinality something like the following:

Cardinality represents the number of elements that may be provided for a particular field:

Always reference the Required and Description fields to see when a field is required, conditionally required, or optional. Please reference gtfs-realtime.proto for Protocol Buffer cardinality.

Using these definitions, FeedMessage and TripUpdate would look like the following:

FeedMessage

Field Name Type Required Cardinality Description
header FeedHeader Required One Metadata about this feed and feed message.
entity FeedEntity Required Many Contents of the feed.

TripUpdate

Field Name Type Required Cardinality Description
trip TripDescriptor Required One XXXX
vehicle VehicleDescriptor Optional One XXXX
stop_time_update StopTimeUpdate Conditionally required Many XXXX
timestamp uint64 Optional One XXXX
delay int32 Optional One XXXX

@barbeau
Copy link
Collaborator Author

barbeau commented Jun 29, 2017

@jxeeno Good catch on the schedule_relationship conditional states - I've pushed commits to fix both of these.

* A new "Required" field holds the information as to whether a particular field is Required, Optional, or Conditionally required
* The existing "Cardinality" field is changed to a semantic version of cardinality instead of using Protobuf cardinality
@barbeau
Copy link
Collaborator Author

barbeau commented Jun 30, 2017

I've pushed a new commit 040f49c into this PR that creates a new column for "Required", and uses semantic cardinality definitions (One and Many) in "Cardinality" instead of protobuf cardinality (as proposed in #64 (comment)).

Feedback welcome! I'm definitely open to comments on how to improve this.

@barbeau
Copy link
Collaborator Author

barbeau commented Jul 12, 2017

Any additional comments on this proposal? I'd like to make any further changes based on feedback prior to calling a vote.

@RachM
Copy link
Contributor

RachM commented Jul 13, 2017

LGTM - I'm happy for voting to commence.

@barbeau
Copy link
Collaborator Author

barbeau commented Jul 13, 2017

Great! I'd like to call for a vote on this proposal then - voting will end Thursday July 20th at 23:59:59 UTC.

@RachM
Copy link
Contributor

RachM commented Jul 13, 2017

I vote yes (on behalf of Google).

@jxeeno
Copy link
Contributor

jxeeno commented Jul 14, 2017

+1

* This field can't be "Required", because there are legitimate scenarios where a feed may not have any real-time information about the system (e.g., if it's 3am, and the agency doesn't offer any service at 3am).  So, change to "Conditionally required" with description.
@barbeau
Copy link
Collaborator Author

barbeau commented Jul 31, 2017

Sorry for the delay updating this proposal - I was out on vacation last week.

Unfortunately we currently don't have the 3 needed votes for this to pass (we're at 2), and the voting period ended July 20th.

This ends up being a good thing, as I realized there was one issue in this proposal (FeedMessage.entity should be "Conditionally required" - I just fixed this in 3c24ff7, see commit description for more info).

So, I'd like to call for another vote on this PR - voting will end Tuesday Aug 8th at 23:59:59 UTC.

@RachM
Copy link
Contributor

RachM commented Jul 31, 2017

I vote yes.

@jxeeno
Copy link
Contributor

jxeeno commented Aug 1, 2017

+1 I vote yes.

@qwasar
Copy link

qwasar commented Aug 4, 2017 via email

@barbeau
Copy link
Collaborator Author

barbeau commented Aug 9, 2017

Alright, looks like this passed the vote - 3 yes votes and 0 nos! Congrats everyone, we have GTFS-realtime v2.0! 🎊 🎊 🎊

This includes better documentation for semantic cardinality and requirements - which fields are required and under what conditions.

@RachM Would you be able to squash and merge this using the Github web UI, so one commit ends up in the master branch?

Alternately I could force-push a new single commit to replace the iterative commits currently in the PR, although it will erase the history of commits based on feedback through the proposal.

@RachM RachM merged commit eb4b243 into google:master Aug 9, 2017
@barbeau barbeau deleted the semantic-cardinality branch August 10, 2017 17:48
barbeau added a commit to barbeau/transit that referenced this pull request Aug 31, 2017
RachM pushed a commit that referenced this pull request Sep 3, 2017
Copy link

@benkaday3218 benkaday3218 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help

@Cristhian-HA
Copy link

@benkaday3218 would you be able to provide details about the areas where you need help?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants