Add captureTime, receiveTime and rtpTimestamp to VideoFrameMetadata #813

guidou · 2024-07-05T13:10:27Z

These fields are useful for WebRTC-based applications.
See issue #601

guidou · 2024-07-05T14:29:32Z

cc @Djuffin, @padenot, @youennf, @aboba

aboba · 2024-07-08T14:15:21Z

Does this PR imply any behavior in WebCodecs API?

For example, on encoding, is there an expectation that VideoFrame.captureTime is copied to EncodedVideoChunk.captureTime? Or, on decoding, are EncodedVideoChunk.receiveTime or EncodedVideoChunk.rtpMetadata expected to be copied to VideoFrame.receiveTime or VideoFrame.rtpMetadata?

If there are no changes in behavior (e.g. if the attributes don't affect the encode or decode process or some other aspect of WebCodecs) then the attributes could be defined in another specification where behavior is affected (e.g. mediacapture-transform?), and added to the VideoFrame Metadata Registry.

guidou · 2024-07-08T14:46:48Z

This PR as currently written does not imply any behavior in the WebCodecs API, although I would expect the things you mentioned (e.g., forwarding them to/from EncodedVideoChunk) as potentially useful.

The idea for this PR is to provide information to applications so that they can do similar things to what they can do with requestVideoFrameCallback (e.g., better A/V sync and delay measurements). This doesn't require any other behavior changes in WebCodecs (at least for applications using mediacapture-transform + WebRTC).
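To make the "delay measurements" use case concrete, here is a minimal sketch. It assumes the metadata dictionary carries the proposed captureTime/receiveTime fields as DOMHighResTimeStamp values (milliseconds) in the same local time base as performance.now(). The FrameTimingMetadata interface and the measureDelays helper are hypothetical names for illustration; in a browser the dictionary would presumably come from VideoFrame.metadata().

```typescript
// Hypothetical shape mirroring the fields proposed in this PR.
// Times are assumed to be DOMHighResTimeStamp values (ms) relative to the
// local Performance.timeOrigin.
interface FrameTimingMetadata {
  captureTime?: number;
  receiveTime?: number;
  rtpTimestamp?: number;
}

interface FrameDelays {
  captureToNowMs?: number; // delay since capture, if captureTime is present
  receiveToNowMs?: number; // local delay since network receipt, if present
}

// Computes delays relative to `nowMs` (e.g. performance.now() at render time).
// Absent fields stay undefined, since registry entries are optional.
function measureDelays(meta: FrameTimingMetadata, nowMs: number): FrameDelays {
  const delays: FrameDelays = {};
  if (meta.captureTime !== undefined) {
    delays.captureToNowMs = nowMs - meta.captureTime;
  }
  if (meta.receiveTime !== undefined) {
    delays.receiveToNowMs = nowMs - meta.receiveTime;
  }
  return delays;
}

// Usage with plain values (in a browser, `meta` might come from frame.metadata()).
const d = measureDelays({ captureTime: 1000, receiveTime: 1040 }, 1075);
console.log(d); // { captureToNowMs: 75, receiveToNowMs: 35 }
```

Keeping absent fields undefined, rather than defaulting them to 0, avoids reporting bogus delays for sources that do not populate a given entry.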

guidou · 2024-07-08T14:48:15Z

I think we can specify forwarding to EncodedVideoChunk in a separate PR since this one has value on its own without specifying further changes to WebCodecs.

Djuffin · 2024-07-08T20:47:52Z

I used to be skeptical about these timestamps since they are not passed through the encoding-decoding cycle, but since we already have entries in VideoFrame Metadata Registry that don't do that, I think it's okay now.

And RTC software like Teams, Meet and FaceTime can really use it for A/V sync and latency estimation, even if they have to pass this information via separate channels. So LGTM.
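Because these entries are not carried through the encode/decode cycle, an application would have to pass them "via separate channels". A minimal sketch of one way to do that locally: keep the metadata in a side table keyed by the frame timestamp (microseconds) before encoding and look it up when the decoder emits a frame with the same timestamp. This assumes timestamps are preserved from VideoFrame to EncodedVideoChunk and back, and the MetadataSideChannel class is a hypothetical helper, not part of WebCodecs.

```typescript
interface FrameTimingMetadata {
  captureTime?: number;
  receiveTime?: number;
  rtpTimestamp?: number;
}

// Hypothetical helper: stores per-frame metadata outside the codec, keyed by
// frame timestamp (microseconds), so it can be re-attached after decoding.
class MetadataSideChannel {
  private readonly entries = new Map<number, FrameTimingMetadata>();
  private readonly maxEntries: number;

  constructor(maxEntries = 300) {
    this.maxEntries = maxEntries;
  }

  // Call before encoder.encode(frame), passing frame.timestamp.
  remember(timestampUs: number, meta: FrameTimingMetadata): void {
    this.entries.set(timestampUs, { ...meta });
    // Map preserves insertion order, so the first key is the oldest entry;
    // evict it to bound memory if frames are dropped and never decoded.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value as number;
      this.entries.delete(oldest);
    }
  }

  // Call in the decoder output callback, passing the decoded frame's timestamp.
  take(timestampUs: number): FrameTimingMetadata | undefined {
    const meta = this.entries.get(timestampUs);
    this.entries.delete(timestampUs);
    return meta;
  }
}

// Usage with a tiny capacity to show eviction.
const channel = new MetadataSideChannel(2);
channel.remember(0, { captureTime: 10 });
channel.remember(33333, { captureTime: 43 });
channel.remember(66667, { captureTime: 76 }); // evicts timestamp 0
console.log(channel.take(0)); // undefined
console.log(channel.take(33333)); // { captureTime: 43 }
```

For a remote peer the same idea applies, except the side table would be populated from values serialized on the wire rather than from the local frame.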

aboba · 2024-07-08T21:52:49Z

I agree that this metadata is useful. The question is whether behavior is well specified, so that interop is possible. For example, there is the question of where the metadata originates:

MAY/SHOULD/MUST the MediaStreamTrackProcessor method provide VideoFrame.captureTimestamp if the MST is obtained from a local capture?
MAY/SHOULD/MUST the MediaStreamTrackProcessor method provide VideoFrame.receiveTimestamp and VideoFrame.rtpMetadata if the MST is obtained remotely via WebRTC-PC?

Djuffin · 2024-07-08T23:37:39Z

I thought for all metadata entries the answer to these questions is MAY.

aboba · 2024-07-09T01:01:48Z

@Djuffin MAY might be ok for these metadata fields. However, is alignment of VideoFrame.timestamp and EncodedVideoChunk.timestamp optional for WebCodecs implementations?

youennf · 2024-07-09T08:16:14Z

"I thought for all metadata entries the answer to these questions is MAY."

I agree from a WebCodecs POV.
But it is not sufficient from an interop point of view.
Probably each spec defining a MST video source should describe which metadata it generates, just like each spec defines which constraints are supported by a given source.
Putting the definition at the source ensures the same metadata is exposed via MSTP or via VideoFrame constructor (from a video element).

That would mean mediacapture-main and webrtc-pc here.
As for mediacapture-transform VideoTrackGenerator, nothing seems to be needed, though we could add a note stating that metadata is preserved.

guidou · 2024-07-09T14:41:01Z

FWIW, the requestVideoFrameCallback spec, where these fields were originally defined, says that captureTime applies to local cameras and remote (WebRTC) frames, receiveTime to WebRTC frames, and rtpTimestamp to WebRTC frames. But I agree with @youennf that having each MST source spec indicate the metadata it generates is the best way to organize that.

In any case, we need to have entries for these fields in the VideoFrameMetadata registry.

chrisn · 2024-07-09T15:14:09Z

Media WG meets today, please add agenda label if you'd like to discuss.

Djuffin · 2024-07-09T21:10:30Z

"@Djuffin MAY might be ok for these metadata fields. However, is VideoFrame.timestamp and EncodedVideoChunk.timestamp optional for WebCodecs implementations?"

They're mandatory.

Is there some kind of deep connection here that I miss?

aboba

Changes to VideoFrame registry are ok, but should probably not reference RVFC. receiveTime and rtpMetadata could be defined in WebRTC-PC or WebRTC-Extensions and captureTime could be defined in Media Capture & Streams or Media Capture Extensions.

chrisn · 2024-07-11T10:56:30Z

Minutes from 9 July 2024 Media WG meeting. @aboba summarised the conclusion in #813 (review).

Djuffin · 2024-07-15T22:18:00Z

Summary of WG discussion:
HTMLVideoElement.requestVideoFrameCallback is not the best spec to reference here, because it doesn't describe how and when these timestamps are set.
Corresponding changes need to be made in MediaStreamTrackProcessor and Media Capture and Streams specs. Something along the lines of: "MediaStreamTrackProcessor sets capture timestamps for VideoFrames coming from camera..."

Later this PR should reference these specs.

aboba · 2024-09-06T15:34:48Z

"And RTC software like Teams, Meet and FaceTime can really use it for A/V sync and latency estimation, even if they have to pass this information via separate channels."

[BA] To do A/V sync, captureTime and receiveTime need to be provided for both audio and video.

Also, if they are to be usable for non-RTP transports, they need to be defined in a way that is independent of RTP/RTCP. For example, on the local peer, captureTime represents the capture time of the first byte according to the local wallclock. On a remote peer, captureTime is set by the receiver. For example, the local peer's captureTime can be serialized on the wire and then set on the receiver (i.e., not adjusted to the receiver wallclock). receiveTime is set on the receiver, based on the receiver's wallclock. (receiveTime - captureTime) can then be used to estimate the sender/receiver offset as well as jitter.
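A minimal sketch of how (receiveTime - captureTime) could feed those estimates, assuming captureTime is the sender's wallclock value serialized on the wire and receiveTime is the receiver's wallclock, both in milliseconds. The minimum transit over a window bounds the sender/receiver offset plus the smallest one-way delay (the two cannot be separated from one-way samples alone). For jitter, the sketch swaps in the RFC 3550 interarrival-jitter smoothing (J += (|D| - J) / 16), adapted from RTP timestamp units to milliseconds. The TimingSample type and estimateTransit function are hypothetical.

```typescript
interface TimingSample {
  captureTime: number; // sender wallclock, serialized on the wire (ms)
  receiveTime: number; // receiver wallclock at arrival (ms)
}

interface TransitStats {
  minTransitMs: number; // clock offset + minimum one-way delay (not separable)
  jitterMs: number; // RFC 3550-style smoothed interarrival jitter, in ms
}

function estimateTransit(samples: TimingSample[]): TransitStats {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  let minTransit = Infinity;
  let jitter = 0;
  let prevTransit: number | undefined;
  for (const s of samples) {
    // Transit mixes the clock offset with network and queuing delay.
    const transit = s.receiveTime - s.captureTime;
    minTransit = Math.min(minTransit, transit);
    if (prevTransit !== undefined) {
      // D is the change in transit between consecutive frames.
      const d = Math.abs(transit - prevTransit);
      jitter += (d - jitter) / 16;
    }
    prevTransit = transit;
  }
  return { minTransitMs: minTransit, jitterMs: jitter };
}

// Usage: transits of 100, 116 and 100 ms.
const samples: TimingSample[] = [
  { captureTime: 0, receiveTime: 100 },
  { captureTime: 33, receiveTime: 149 },
  { captureTime: 66, receiveTime: 166 },
];
console.log(estimateTransit(samples)); // { minTransitMs: 100, jitterMs: 1.9375 }
```

The same computation would work for audio samples, which is what makes both captureTime and receiveTime necessary on both media types for A/V sync.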


aboba · 2024-09-13T16:16:15Z

video_frame_metadata_registry.src.html

@@ -61,6 +61,18 @@
    <td>segments</td>
    <td>[Human face segmentation](https://w3c.github.io/mediacapture-extensions/#human-face-segmentation)</td>
  </tr>
+  <tr>
+    <td>captureTime</td>
+    <td>[Capture time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-capturetime)</td>


The RVFC text is too RTP-centric to be used here. I'd copy over the text and make some changes. For a WebCodecs application, captureTime can be serialized on the wire and set by a WebCodecs application for frames received from a remote peer. Also, NTP timestamp format does not imply a global clock so change “estimated using clock synchronization” to “aligned to the sender wallclock”.

The plan is to

Define what the concept "capture time" is in mediacapture-extensions, and define "remote capture time" over at webrtc-extensions (this would refer to the new mediacapture-extensions "capture time" concept plus text copied from rVFC to describe estimation).

Refer to these new definitions here (local tracks: look here; remote webrtc tracks: look here) instead of the rVFC reference, but use DOMHighResTimeStamp relative to local Performance.timeOrigin.

That would leave captureTime unset otherwise. A webcodecs app could use whichever out-of-band techniques to compute & set a valid local DOMHighResTimeStamp. Does it need to be mentioned in the spec?

The PR is now updated to reference mediacapture-extensions, where these concepts are now defined.
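As an illustration of the "out-of-band techniques" mentioned above, a minimal sketch assuming the sender serializes an absolute time (its Performance.timeOrigin plus the local captureTime, i.e. milliseconds since the Unix epoch) and the receiver maps it back into its own DOMHighResTimeStamp space. The clockOffsetMs parameter (how far the receiver's wallclock is ahead of the sender's) is assumed to be estimated separately; leaving it at 0 keeps the value aligned to the sender wallclock. Both helpers are hypothetical; in a browser the time origins would be performance.timeOrigin on each side.

```typescript
// Sender side: local DOMHighResTimeStamp -> absolute ms since the Unix epoch,
// suitable for serializing on the wire.
function toWireTime(localTimestampMs: number, senderTimeOriginMs: number): number {
  return senderTimeOriginMs + localTimestampMs;
}

// Receiver side: absolute sender time -> receiver-local DOMHighResTimeStamp.
// clockOffsetMs is an out-of-band estimate of (receiver clock - sender clock);
// 0 leaves the value aligned to the sender wallclock.
function fromWireTime(
  wireTimeMs: number,
  receiverTimeOriginMs: number,
  clockOffsetMs = 0,
): number {
  return wireTimeMs + clockOffsetMs - receiverTimeOriginMs;
}

// Usage with fixed example origins instead of performance.timeOrigin.
const wire = toWireTime(500, 1_700_000_000_000);
const local = fromWireTime(wire, 1_700_000_000_200, 30);
console.log(local); // 330
```

Note that timeOrigin-based wallclock values can drift from the system clock over long sessions, so the offset estimate would presumably need periodic refreshing.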

aboba · 2024-09-13T16:17:18Z

video_frame_metadata_registry.src.html

+  </tr>
+  <tr>
+    <td>receiveTime</td>
+    <td>[Receive time](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-receivetime)</td>


Comment: RVFC text is also too WebRTC-centric. Can we allow this to be present in a WebCodecs application as well (set by the receiver)?

The plan is to describe this as "set for remote webrtc tracks" and use DOMHighResTimeStamp relative to local Performance.timeOrigin. I agree it should be settable by a WebCodecs application.

aboba · 2024-09-13T16:19:30Z

video_frame_metadata_registry.src.html

+  </tr>
+  <tr>
+    <td>rtpTimestamp</td>
+    <td>[RTP timestamp](https://wicg.github.io/video-rvfc/#dom-videoframecallbackmetadata-rtptimestamp)</td>


The RVFC text is ok here but I'd still probably copy it over rather than referencing it.

I planned to define what an "RTP timestamp" is over at webrtc-extensions and simply define here that it's present for "remote webrtc tracks" together with a reference.

Replaced with the reference to mediacapture-extensions.
Did not copy the text, to follow the format used for Human face segmentation.
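For the rtpTimestamp entry, a short sketch of how an application might turn two values into elapsed media time. It relies on two facts about RTP: the timestamp is a 32-bit unsigned counter that wraps around, and video payloads commonly use a 90 kHz clock. The helper names and the default clock rate are illustrative assumptions; the actual clock rate depends on the negotiated payload format.

```typescript
// Commonly used RTP clock rate for video payloads (assumed default).
const VIDEO_RTP_CLOCK_RATE_HZ = 90_000;

// Signed difference (b - a) between two 32-bit RTP timestamps, handling wraparound.
function rtpTimestampDelta(a: number, b: number): number {
  const diff = (b - a) >>> 0; // unsigned 32-bit difference
  // Interpret differences of at least 2^31 as negative (b is before a).
  return diff >= 0x80000000 ? diff - 0x100000000 : diff;
}

// Elapsed media time in milliseconds between two RTP timestamps.
function rtpDeltaToMs(a: number, b: number, clockRateHz = VIDEO_RTP_CLOCK_RATE_HZ): number {
  return (rtpTimestampDelta(a, b) * 1000) / clockRateHz;
}

// Usage: 2700 ticks across the 2^32 wrap, i.e. 30 ms at 90 kHz.
console.log(rtpDeltaToMs(4294966396, 1800)); // 30
console.log(rtpTimestampDelta(1800, 4294966396)); // -2700
```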

guidou · 2024-11-12T14:21:48Z

This PR has been updated to reference mediacapture-extensions where these concepts are now properly defined (similar to human face segmentation).

aboba · 2024-12-05T00:05:58Z

@guidou Does this PR also resolve #599 ?

SHA: 41636a6 Reason: push, by Djuffin Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Add captureTime, receiveTime and rtpMetadata to VideoFrameMetadata

250fe3a

aboba requested review from Djuffin and padenot July 8, 2024 19:03

Djuffin approved these changes Jul 8, 2024

aboba added the agenda Add to Media WG call agenda label Jul 9, 2024

aboba requested changes Jul 10, 2024

handellm added a commit to handellm/mediacapture-extensions that referenced this pull request Sep 9, 2024

Define captureTime.

6f40605

Partly addresses w3c/webcodecs#813 (review).

handellm mentioned this pull request Sep 11, 2024

Capture, receive, and RTP timestamp concept definitions & normative requirements for gUM/gDM w3c/mediacapture-extensions#156

Merged

aboba requested changes Sep 13, 2024

This was referenced Sep 20, 2024

Add WebRTC-specific interactions with capture/receive/RTP timestamps w3c/webrtc-extensions#224

Open

Add interactions with capture/presentation/receive/RTP timestamps w3c/mediacapture-transform#112

Draft

Update references to mediacapture-extensions.

a74786c

Update references

3a88a39

guidou requested a review from aboba November 18, 2024 09:41

aboba approved these changes Dec 4, 2024

This was referenced Dec 5, 2024

Expose in VideoFrameMetadata some fields from VideoFrameCallbackMetadata #601

Closed

Relationship between VideoFrameMetadata and VideoFrameCallbackMetadata #599

Open

guidou changed the title ~~Add captureTime, receiveTime and rtpMetadata to VideoFrameMetadata~~ Add captureTime, receiveTime and rtpTimestamp to VideoFrameMetadata Dec 5, 2024

guidou mentioned this pull request Dec 5, 2024

Should there be an AudioDataMetadata? #855

Open

Djuffin merged commit 41636a6 into w3c:main Dec 10, 2024
2 checks passed

github-actions bot added a commit that referenced this pull request Dec 10, 2024

Merge pull request #813 from guidou/guidou/capture-time-metadata

d24c667
