Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

828 MCAP specification for osi tracefiles #833

Closed
wants to merge 13 commits into from

Conversation

TimmRuppert
Copy link

@TimmRuppert TimmRuppert commented Oct 14, 2024

Reference to a related issue in the repository

Closes #828

Add a description

  • Add an initial support for mcap files as an alternative to tradtional .osi and .txth files
  • Precision of .osi and .txth spec
  • Removed example code as this here is a spec and not the implementation
  • Mentioning of a companion repo for utils which implement this spec (separation of spec and tutorials/examples/code)

Some questions to ask:
What is this change?

  • Spec change to support mcap
    Is this a bug fix or a feature? Does it break any existing functionality or force me to update to a new version?
    How has it been tested?
  • Feature. Breaking changes for naming convention are currently in discussion..

Take this checklist as orientation for yourself, if this PR is ready for the Change Control Board:

If you can’t check all of them, please explain why.
If all boxes are checked or commented and you have achieved at least one positive review, you can assign the label ReadyForCCBReview!

@TimmRuppert TimmRuppert linked an issue Oct 14, 2024 that may be closed by this pull request
@jdsika
Copy link
Contributor

jdsika commented Oct 15, 2024

I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (at best all) objects filled according to standard.

@TimmRuppert
Copy link
Author

TimmRuppert commented Oct 16, 2024

I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (at best all) objects filled according to standard.

Here a three example files: Example_OSI_MCAP.zip

  • Only_SD_v1.mcap contains multiple sensor data on one channel (topic)
  • Two_SD_different_rate_v1.mcap contains multiple sensor data on two channels
  • All_Top-Level_with_Timestamp_v1.mcap contains all top-level messages which have a timestamp field

I quickly tried to adapt most outcomes of the last meeting. The messages are basically empty except that varying timestamps are set.

Signed-off-by: Timm Ruppert <[email protected]>
@TimmRuppert TimmRuppert force-pushed the 828-mcap-example-for-osi-tracefiles branch from 89eaa8c to 38249ce Compare October 16, 2024 12:36
@jdsika
Copy link
Contributor

jdsika commented Oct 16, 2024

I would (rather quickly) need an example MCAP file for OSI v3.7.0 with (at best all) objects filled according to standard.

Here a three example files: Example_OSI_MCAP.zip

* `Only_SD_v1.mcap` contains multiple sensor data on one channel (topic)

* `Two_SD_different_rate_v1.mcap` contains multiple sensor data on **two** channels

* `All_Top-Level_with_Timestamp_v1.mcap` contains all top-level messages which have a timestamp field

I quickly tried to adapt most outcomes of the last meeting. The messages are basically empty except that varying timestamps are set.

Thanks! Some ground truth data from esmini with objects etc from as many types as possible would be great :))

@TimmRuppert
Copy link
Author

Thanks! Some ground truth data from esmini with objects etc from as many types as possible would be great :))

I just updated esmini and it seems like there is an issue with the FMU.. Anyways, for starters here is one of our highway examples where I converted a native tracefile SensorView to an mcap file (the ground truth is therefore a submessage)

demo_SV_onramp_V1.zip

I can provide a GT top level message as well but I am a bit tight on schedule for today and tomorrow.

@jdsika
Copy link
Contributor

jdsika commented Oct 17, 2024

No, thank you. It was just important for me to have a "third set of eyes" creating a file in order to debug something. Otherwise everyone is always blaming in circles :)

@jdsika
Copy link
Contributor

jdsika commented Oct 18, 2024

FYI: OSI TRacefile writer in openPASS

@TimmRuppert
Copy link
Author

I have updated the spec based on our last discussion. In the next meeting we need to address the following things :

@TimmRuppert
Copy link
Author

Documenting today's meeting concerning the points mentioned above:

  • We agreed on
    • <opt prefix> To help sort after maneuver or similar
    • <opt timestamp> To help sort after recording time (if no prefix) or quickly identify a recording. Should represent abs. timestamp of zero-time of top-level messages (synchronized global time)
    • <required type> Like for .txth and .osi but additionally multi
    • <opt suffix> Further details you might want to provide without having to inspect the .mcap metadata (e.g. if applicable a min. required OSI version)
  • Requirement of the protobuf version in the meta and/or the naming convention
  • move to the channel metadata
  • might not be used by anyone except for some debugging edge-cases but does not hurt either
  • Clarify timestamp details based on Documentation page for timing #834

    • timestamps in the file name according to the naming convention
    • in the publish_time field of the MCAP message
  • General review and potential new things somebody would like to be considered

  • log_time = publish_time = timestamp of top-level message

@jdsika
Copy link
Contributor

jdsika commented Oct 21, 2024

Consider the following case:
We will have one mcap trace with a lot of information and traces and a user wants to add osi traces to it. This has as a consequence that as much information as possible must be placed in the trace/channel meta data and not

@thomassedlmayer
Copy link
Contributor

thomassedlmayer commented Oct 21, 2024

List of potential optional metadata fields that mainly emerged from the Gaia-X project (focus on measurement data):

  • (1) Start/stop timestamp / duration (include/exclude depending on mcap capabilities of extracting those from the mcap timestamps)
  • (2) Frame rate (@ClemensLinnhoff You mentioned frame rate as an interesting metadata field. I thought about how to handle variable frame rates: Maybe define it to be an approximated frame rate in the case of variable frame rates?)
  • (3) Content granularity: Level of granularity of contained data, e.g. trace contains object lists / detection lists (could also be extended to flag if a trace contains (detected) lane network, traffic lights, traffic signs, environmental conditions)
  • (4) Traffic direction: Left-hand/right-hand
  • (5) Location: E.g. country, state, city, etc. (could also be put into trace description)
  • (6) Contained road types: Indicate that road types like motorway/rural/city/traffic calmed zone etc. are contained in the trace
  • (7) Contained lane types: Indicate that biking/walking/parking lanes are contained in the trace
  • (8) Scenario identifier: User-/company-/project-specific scenario identifiers
  • (9) Host moving object description, e.g. measurement vehicle details -> Could also be put in "Used data sources" (@ClemensLinnhoff You mentioned, this could be useful to you; Do you have an example use case for synthetic data?)
  • (10) Target moving objects: OSI-id(s)+description of object(s) of interest (vehicle/pedestrian performing certain actions of interest)
  • (11) Events: Tag certain events of interest with timestamp+description, e.g. cut-out action of vehicle x at timestamp y, sensor fault at timestamp z
  • (12) Used data sources/tools: How/when was the trace generated/processed (repeated field to record tool/processing chain), e.g.
    • 2024-10-17 15:54:08+02:00: Captured on measurement vehicle x with sensor y (firmware version 1.0)
    • 2024-10-17 15:55:20+02:00: Processed with tool x version 1.0
    • 2024-10-17 15:54:08+02:00: Synthetically generated with tool x version 1.0/sensor model x version 1.0
  • (13) Creator information: Contact/name/company/license

@TimmRuppert @ClemensLinnhoff @jdsika Feel free to add your opinion on which we should include.

In case we keep a lot of the fields above I would list the less important metadata definitions with less normative priority:

  1. Required ("must"): osi_version, protobuf
  2. Recommended ("should"): e.g. description, creator information, data sources, frame rate
  3. Metadata hints ("optionally"): We suggest to use the proposed structure if the respective information is available and the trace creator wants to include it (e.g. location, granularity, contained road/lane types, events, traffic direction).

@ClemensLinnhoff
Copy link
Contributor

ClemensLinnhoff commented Oct 21, 2024

2: This is also something that mcap natively supports, as far as I know. @TimmRuppert how does it handle varying frame rates? Does it take the mean?
5: This is probably only valid for measurements which is not the standard case for OSI, so I would not put this in the standard specification.
9: My example use-case would be for re-simulating a measurement for model validation. Then you know, which kind of measurement vehicle is simulated.

@TimmRuppert
Copy link
Author

  1. start and stop timestamps are present in the "summary" section and thus easily accessible (either via API or CLI). Therefore, I would recommend to not store them in the metadata again.
  2. The average framerate is given by the message-count and start/stop timestamps. This is for example supported by mcap CLI . Therefore, I would recommend to not store this in the metadata again.
  3. I think this is somewhat covered by the shacls and hard to unify without further spec. I would agree with Clemens and leave that out for the moment
  4. -9. Sure, why not
  5. This might work for simple scenarios. But if we think about more complex city scenarios, deciding which traffic participant might be relevant is sometimes up to the function consuming this data.
  6. Strongly agree
  7. We thought about something similar as well. Instead of a repeated field (where the "field" would need to be defined by us as only string values are accepted) we could also do it per channel and add it to the channel metadata?
  8. Good idea, how about splitting this into "license" and "contact person" (or similar). Storing a license in the file might help sharing/reusing data.

In any case everything needs good key-name and category-name.

@TimmRuppert TimmRuppert changed the title 828 mcap example for osi tracefiles 828 mcap specification for osi tracefiles Oct 22, 2024
@TimmRuppert TimmRuppert changed the title 828 mcap specification for osi tracefiles 828 MCAP specification for osi tracefiles Oct 22, 2024
@jdsika jdsika added the Documentation Everything which impacts the quality of the documentation and guidelines. label Oct 25, 2024
@TimmRuppert
Copy link
Author

TimmRuppert commented Oct 25, 2024

General:

  • Can only contain one scenario with a unique global time
  • one mcap file is a dataset

Metadata:

  • Strongly recommended additional detailed metadata, category asam_osi
    • description e.g. cut-in
    • creator e.g. person or company (not tool) csv
    • license csv of spdx identifiers
    • data_sources e.g. csv of model, recorder, scenario player
  • Channel-wise: zero_time : ISO 8601 YYYYMMDDThhmmss.f representing the wall clock time

Solved by "Can only contain one scenario with a unique global time "

  • Optional if you want to use more detailed metadata which follows some kind of public schema or standard

    • Add a file wide metadata with the name context . This category of metadata should contain prefixes as keys and details about the used metadata specification in form of links/names as values. This prefix should be an identifier for other metadata used on a file or channel level. Here are some examples:
    • gaiax_hdmap : https://github.com/GAIA-X4PLC-AAD/ontology-management-base/tree/main/hdmap/
      • Usage in channel gaiax_hdmaps_HdMapShape : value 123
      • So everybody seeing this channel metadata has the chance to get the document specifing what that data means
    • Add another example for left hand / right traffic rule of openDrive
    • Add another example for some random iso

    See also Project structure and c++ trace file reader/writer Lichtblick-Suite/asam-osi-utilities#2 (comment)

@jdsika jdsika added this to the V3.7.1 milestone Nov 8, 2024
@jdsika
Copy link
Contributor

jdsika commented Nov 8, 2024

I would like to suggest to add a comment to "Nested Top Level Messages" as e.g. the SensorView in SensorData like the following:

When using ASAM OSI MCAP as a container for OSI traces the user is allowed to remove nested OSI Top level messages and add them as separate channels into the MCAP container"

This comment shoule be added above the top level messages in the .proto files as well.

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

@ClemensLinnhoff
Copy link
Contributor

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

I think the nesting is not only for trace files, but also for the messages between FMUs directly. I think it is just simpler to send one SensorData instead of a SensorData, a SensorView and a GroundTruth. So I would not add this to the proto files but just in the MCAP trace file documentation.

@TimmRuppert
Copy link
Author

TimmRuppert commented Nov 8, 2024

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

Besides the points @ClemensLinnhoff mentioned, this might result in a good portion of looking back and forth in the file in order to the retrieve the corresponding messages. Especially as the SensorData::timestamp is not required to (but should) be the GroundTruth::timestamp. So there is no enforcement of a real identifier to match messages. Furthermore, SensorView is repeated field in the SensorData. How would one specify which ones are meant?

Nonetheless, I understand and totally share your motivation. Maybe something for OSI4. It would just require a lot of breaking changes.

@pmai
Copy link
Contributor

pmai commented Nov 8, 2024

What is your opinion about this? I think the nested messages were created as the collection of traces in one container was not defined at the time with .osi files?

I think the nesting is not only for trace files, but also for the messages between FMUs directly. I think it is just simpler to send one SensorData instead of a SensorData, a SensorView and a GroundTruth. So I would not add this to the proto files but just in the MCAP trace file documentation.

I don't think this is even at all related to trace files. I expect a trace file to contain the messages as they are being sent/received. No one is forcing anyone to use the nested messages, if they don't want to (I personally think they are usually a mistake in the case of SensorData->SensorView just for traceability purposes, but that's just me). But if they are used then they should be stored as is.

Now if someone wants to be creative they can do all kinds of things as they like. It's not the standards job to say how to use it. So I think this needs no mention anywhere, since it is purely up to the user and use case.

TimmRuppert and others added 3 commits November 11, 2024 19:11
Signed-off-by: Timm Ruppert <[email protected]>
Signed-off-by: ClemensLinnhoff <[email protected]>
Signed-off-by: ClemensLinnhoff <[email protected]>
@ClemensLinnhoff ClemensLinnhoff marked this pull request as ready for review November 12, 2024 07:48
ClemensLinnhoff and others added 3 commits November 12, 2024 13:47
Signed-off-by: ClemensLinnhoff <[email protected]>
Signed-off-by: Timm Ruppert <[email protected]>
- `publish_time` field:
** Must reflect the timestamp of the stored OSI top-level message
** Must be in nanoseconds
- `log_time` field: Must reflect the time when the message was enqueued for MCAP file addition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be removed here?

Copy link
Author

@TimmRuppert TimmRuppert Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate further?

I see the issue. log_time is defined twice

Comment on lines 41 to 42
** `zero_time`: ISO 8601 YYYYMMDDThhmmss.f formatted point in time representing the zero time of the scenario
** `timestamp`: ISO 8601 YYYYMMDDThhmmss.f formatted creation time of the file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specification of timezone is missing.

Why not use nanoseconds (unix epoch)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally also favor the unix epoch in ms or nanoseconds. The proposal is using ISO 8601 as it seems like that was used for trace file names etc. before.

That would also make more sense to be compliant with https://opensimulationinterface.github.io/osi-antora-generator/asamosi/latest/gen/structosi3_1_1EnvironmentalConditions.html#a636bb78627046f34208f42f586ab2086?

Lets wait an see if anyone disagrees.

- An MCAP file is considered a single dataset

== Schema
- `name` field: Full message type name, including package (e.g., `osi3.SensorData`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it defined somewhere what "full message type name" means exactly?
Does it make sense to specify that the channels must be named "osi3.MessageType"?
Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).

Copy link
Author

@TimmRuppert TimmRuppert Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it defined somewhere what "full message type name" means exactly?

I understand the protobuf documentation as if this is defined. But to be more precise I will change it to fully-qualified name of the protobuf message type

Does it make sense to specify that the channels must be named "osi3.MessageType"?

There is a mix usage of "must" and leaving it out. Should be changed.

Especially, because you used "OSI3::SensorData" in line 60 (not in the same context but anyways).

There I meant the osi3::SensorData Struct. But it makes sense to simply write "A channel containing OSI SensorData messages" and circumvent this. I will change line 60

- Must allow other non-OSI data to be present in the MCAP file
- Message records must be written into `chunk records` for indexed files
- Only OSI top-level messages containing a timestamp field are permitted to be directly stored in MCAP channels
- Must contain only a single scenario with a unique global time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly understand what's meant by "unique global time". There could be multiple scenarios with the same time which means it can not be unique. Did you mean defined/specified global time or that all contained messages must be in the same time frame?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our last meeting we agreed that an mcap must only contain one scenario (while technically it could contain multiple). We decided to ditch the technical possibility to store multiple independent scenarios to avoid extreme confusion with interesting files and usage that it not intended: One could come up with the idea to store all possible NCAP scenarios at once in a huge file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember that we talked about a common/unified time frame but not about that an MCAP file should only be allowed to contain a single scenario.

Also, I thought that the term "unique global time" is maybe a bit confusing as well. I'd rather write something like common time frame.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Must contain only a single scenario with a unique global time
- Must contain only a single scenario with a common time frame

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, I don't agree that an mcap file must contain only a single scenario. I think the word "scenario" could be misleading. You can put an arbitrary number of scenarios in one trace file. IMO the only relevant information here is that all the messages in the file must have a common time frame.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the term scenario is a bit misleading. You could have multiple scenarios one after another in one simulation. The important thing is, that all channels have the same time line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Must contain only a single scenario with a unique global time
- Must be limited to a single, unified sequence of events within the same time frame.



== File-wide Metadata
- Must include metadata with the name `versions` containing at least the following key-value pair:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be some convention or at least one sentence talking about the used separator character for the prefix.

I see that you used "-" as a separator in the examples. I feel like this is quite confusing (especially when reading this document) because to me it isn't exactly clear, what's part of the prefix name and what's the actual key when there are multiple "-" in the key or (e.g. "GAIA-X4PLC-AAD-hdmap-actual-key"). Something like a "." would probably make it more obvious what the actual prefix is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have any opinion about this. @jdsika had the idea to add prefixes, so lets see what he suggests.

doc/architecture/trace_file_mcap_format.adoc Outdated Show resolved Hide resolved
|An optional prefix which may be used to specify the type of scenario (e.g. `cut-in`) or uniqueness of the setup (e.g. `target-5m`). May not contain any `_` characters.

|opt. timestamp
|Defines the absolute start time for a scenario or recording. If following the recommended zero time for the timestamps of the top-level messages, this time must represent the zero time. The format must adhere to ISO 8601 [cite:iso8601].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual timestamp format (including timezone information) should be specified. Probably the same as in OSI file naming convention.

Copy link
Author

@TimmRuppert TimmRuppert Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my limited understanding of the ISO 8601, the .osi and .txth spec is not stating a format. I just provides the example 20210818T150542Z and refers to ISO 8601

Something like YYYYMMDDThhmmssZ would be a valid format with respect to ISO 8601 but YYYY-MM-DD-HH as well right right?

I would like to add the format YYYYMMDDThhmmssZ and the mention that it must be in UTC (not local, due to the Z) to the .osi/.txth trace file naming convention but this assumes that generalizing the example is not considered a breaking change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming convention even states that the recommended format is YYYMMDDThhmmssZ (even though with exemplary numbers). I would just add the format specification and maybe even consider decimal places. The recommended format remains the same as far as I'm concerned.

doc/architecture/trace_file_mcap_format.adoc Outdated Show resolved Hide resolved
|opt. prefix
|An optional prefix which may be used to specify the type of scenario (e.g. `cut-in`) or uniqueness of the setup (e.g. `target-5m`). May not contain any `_` characters.

|opt. timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should state that the timestamp can only exceptionally be omitted when there really is no reference to a global time in the file?

I think if you have a real-world capture or any other trace file that has any meaningful relation to a global time frame, it should be visible in the filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion the meaningful relation to a global time is the exception here, since simulation should be the "normal" use-case rather than measurements.

doc/architecture/trace_file_mcap_format.adoc Outdated Show resolved Hide resolved
- Must include metadata with the name `asam_osi` containing at least the following key-value pairs:
** `zero_time`: ISO 8601 YYYYMMDDThhmmss.fTZD formatted point in time representing the zero time of the scenario
** `timestamp`: ISO 8601 YYYYMMDDThhmmss.fTZD formatted creation time of the file
- It is strongly recommended to include metadata with the name `asam_osi` containing the following key-value pairs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be two "asam_osi" metadata records (see two lines above) or should all the metadata fields be in one metadata record?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't know if it is technically possible to have two metadata records with the same name (I assume it might be), it would make more sense to have all of that in one record. Considering your proposal to always speak of "a metadata record" in this context (see other comment) it should be clearer once this has been added.

Binary trace file.
Messages are separated by a length specification before each message.
The length is represented by a four-byte, little-endian, unsigned integer.
The length does not include the integer itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The information of this line got lost.

@pmai
Copy link
Contributor

pmai commented Nov 24, 2024

I have taken the liberty to morph the current state into something that tries to be a more precise and normative specification while trying to be more minimalist in what it touches, and leaves other topics (like changes in the naming convention, the specific mapping of non-OSI meta-data into meta-data records) for either separate PRs or other layered specifications: #841.

It also tries to make the spec more robust, by specifying more explicitly how to use MCAP elements (e.g. placement of records, chunking, ...).

The other major change is that it now recommends making log_time and published_time identical, if no specific reasons would speak to making them differ, as this enables much better use of MCAP indexing facilities to do random access of traces.

People who want to replay while keeping jitter of their middleware (due to asynchronous communication) intact can still do so, but more sane use cases that abstract away middleware jitter (or are synchronous in nature) can still reap the benefits of the MCAP format index machinery (one might suggest to MCAP that they might like to add indexing on published_time to enable both use cases at the same time, but that's a different story).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Everything which impacts the quality of the documentation and guidelines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MCAP example for OSI tracefiles
5 participants