Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal for mid based signalling #943

Draft
wants to merge 12 commits into
base: full-mesh
Choose a base branch
from
Draft

Conversation

dbkr
Copy link
Member

@dbkr dbkr commented Mar 3, 2023

No description provided.

Copy link
Contributor

@SimonBrandner SimonBrandner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really does seem to look great apart of some nits. The only thing I think we should think deeper about is the depth :D

Comment on lines 26 to 37
"m.negotiate": {
"version": 1,
"call_id": "35657a5b793ce",
"conf_id": "bbe53499f82e3",
"invitee": "@bob:example.org",
"lifetime": 60000,
"party_id": "123456",
"description": {
"type": "offer",
"sdp": "[...]",
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this should also be split up into several pieces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in split the common call stuff out from the actual negotiation? Interesting idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well m.call.candidates would still include some of this, so it would make sense to have that in a common mixin

doc/mid-based-signalling.md Outdated Show resolved Hide resolved
"m.tracks.describe": {
"1": { // transceiver mid 1
"media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa",
"media_group_uuid": "1234-1234-123456-1234-1234", // rather than 'track group ID' to match media UUID?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense this way ✔️

"version": 1,
"call_id": "35657a5b793ce",
"conf_id": "bbe53499f82e3",
"invitee": "@bob:example.org",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of this field? Isn't it clear who the recipient is if this is to-device

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's really just consistency with 1:1 calls, although if we're essentially deprecating them then maybe we shouldn't worry about it.

Comment on lines +147 to +149
XXX: How flat vs deep do we want the structure to be here? I've done it quite deep here,
organised by user ID / device ID / media group UUID, but they could also just be a flat
list of tracks. It would be more duplication but maybe less effort to read.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be useful to think about how would we actually implement this and how it fits in with the current implementation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've expanded on this a bit in 457578a, was this the sort of thing you were thinking about?

Comment on lines 182 to 184
This has also been rearrnaged a little to make the media UUIDs the keys and remove the
unsubscribe section which is unnecessary if we always send the complete set of tracks we
want to receive (we unsubscribe by just removing the media UUID from the dict).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure we actually wanted to avoid this to keep the messages short. Perhaps we could use null to say we want to unsubscribe. We could use a similar technique for unpublishing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe I am barking up the wrong tree with this and it should just be the deltas each time, especially if we have sequence numbers to ensure we don't lose events.

unsubscribe section which is unnecessary if we always send the complete set of tracks we
want to receive (we unsubscribe by just removing the media UUID from the dict).

This also now contains a sequence number, so the focus can reply with a an ack:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do all events now have a sequence number? What does the number mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add feedback to track subscriptions with seqnums so every response can be matched to a request

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I've added a bit more detail in the doc.

doc/mid-based-signalling.md Outdated Show resolved Hide resolved
"m.call.transferee": false,
"m.call.dtmf": false,
},
"m.tracks.advertise": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m.call?

Comment on lines +195 to +199
"aaaa-aaaa-aaaaaa-aaaa-aaaa": {
"width": 1024,
"height": 576,
},
"bbbb-bbbb-bbbbbb-bbbb-bbbb": {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For pull-based media in the future, this is going to need to include the full path (e.g. userId/deviceId/mediaGroupUUID/mediaUUID)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why would this be different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that the focus knows from where to request the track in case it can't read state

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, yes, although in this case the address would also need to include the SFU that the user was publishing to? I wonder if this should look more like:

media: [
  {
    "media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb",
    "user_id": "@alice:example.org",
    "device_id": "aaaaaa",
    "sfu_user_id": "@sfu:example.org",
    "sfu_device_id": "sfusfusfu",
  }
]

@@ -0,0 +1,276 @@
# MID Based Signalling
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be nice opportunity to fix conf vs call id somehow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. afaik its just the MSC that is confused somewhere, but note to check.

doc/mid-based-signalling.md Outdated Show resolved Hide resolved
doc/mid-based-signalling.md Outdated Show resolved Hide resolved

m.call.subscribe
```
"m.call.subscribe": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"m.call.subscribe": {
"m.call.subscribe_to_media": {

Perhaps this for clarity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it m.call.subscribe_media as I think it's fine as an abbreviation for consistency.

Comment on lines 129 to 143
"m.call.advertise": {
"alice:example.org": { // user ID
"88888888": { // device ID
"2345-2345-234567-2345-2345": [{ // media group uuid
"media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa":
"purpose": "m.usermedia",
"kind": "video",
}, {
"media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb":
"purpose": "m.usermedia",
"kind": "audio",
},
},
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the js-sdk class structure POV, it would make much more sense for this to be structured the same way metadata is atm (user_id and device_id being props of media group uuid) since the individual CallFeeds (media groups) have the userIds and deviceIds as props, so the mapping makes much more sense that way

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically remove the user ID & device ID levels and move them to attributes of the media group ID? It would be towards the flatter end of possible structures and maybe a nice compromise between depth & duplication. I'm not sure we should necessarily be basing the choices on what the js-sdk does directly, but if the js-sdk has a god reason for doing it that way then that's certainly valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I mean. I'd say the js-sdk doesn't have a god reason for this since I was the one who wrote and I haven't gone mad enough to consider myself a diety (yet) :D But it does have a good reason - I think it makes sense in terms of media groups being what you display in the UI (but I wrote it so I am biased)

(sorry, I sometimes can't help myself 😅 )

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like c62296c?

"m.call.transferee": false,
"m.call.dtmf": false,
},
"m.call.describe": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question we should ask ourselves is if we'd ever want to have metadata for media groups too which would then mean it would make sense to have this structured in a way similar to the current metadata structure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we do this in the advertise message though? That's where the bulk of the metadata should be.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we might (or not) have some fast-changing metadata on the media group level, so it would be useful for that case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I see. My gut feeling is that the media groups would probably be purely a way to associate two tracks together as a pair (or group) and deliberately not add any further data, but perhaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, that does make sense, but perhaps, not sure

media UUID, perhaps?)

If the focus needs to renegotiate to send the tracks, it does so, describing the media UUIDs it intends to send on the
transceivers once the negotiation is complete:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, thats the transceiver id's (mid's) from the Focus side. Right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be the same, no? Since the media lines must be the same on each side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the media lines are bidirectional and correlate to the remote side.

What i mean is, whoever initiates the offer determines what is sent over the mid-line. When Focus only receives tracks via a connection, the Focus will never have a reason for start a media renegotiation, because on their side the media will never change.

In case the Focus start a renegotiation and creates an offer, then the focus sending media as well. In this case the Focus defines, what is sending over the media lines. But this means the client will receive a track from another peer.

If the client not sending media to the Focus the Focus starts with 0 media line. If the client already sending to the Focus the Focus could use as well the 0 media line. But we decide in this case the Focus will use a new Transceiver.

Long story short: The media lines alone define not what will sending over it. I know we want use pre defined Transceiver and reserve the 0-media lines and 1-media lines for Client sending medias. However, I have still my doubts that this is a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. From my PoV, the three different scenarios work like this:

If Alice (the client) isn't sending any media to the focus, the focus starts sending media to her and starts a renegotiation to add a transceiver, which becomes mid 0. It includes an m.call.describe_media saying that the media on mid 0 is Bob's media.

If Alice is already sending media, the focus renegotiates but decides to add a new transceiver. This becomes mid 2, so its m.call.describe_media informs the client that mid 2 contains Bob's media. There is no entry for mid 0 or 1 because the focus isn't sending anything on those transceivers, it's only receiving.

If Alice is already sending media and the focus decides to re-use one of those transceivers, its m.call.describe_media informs the client that Bob's media will arrive on mid 0. Alice will re-send her own m.call.describe_media in her answer re-iterating that her own media is being sent on mid 0 since she's still sending it in the other direction. The focus's m.call.describe_media describes what's flowing on the transceivers in the focus -> client direction and the client's m.call.describe_media describes what's going client -> focus.

then look very similar to the structure of the `m.call.advertise` event. It could either keep
a reference to the transceiver it was receiving media on in the structure itself alongside
the media UUID, or maintain a separate map of media UUID to transceiver / peer connection such
that the first structure could be marshalled to JSON and sent to clients as-is.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SFU do not need to save and maintain the map of media UUID to transceiver / peer connection. The mid's are only needed to identify the tracks inside the SDP. If the connection is negotiated, they are no longer required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather, the SFU needs to know which tracks belong to which UUID (send and receive).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it need to know what media UUID corresponds to what it's sending in each case: whether it does that by mapping them to mid or track ID or whatever is up to the implementation - is that what you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly, that is what I meant. I only just wanted to pointed out that's MID's quite unstable and only needed to init this process. And we should not based on it in such whatever implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood – hopefully c06d3d4 clarifies this a bit.

doc/mid-based-signalling.md Outdated Show resolved Hide resolved
doc/mid-based-signalling.md Outdated Show resolved Hide resolved
Comment on lines +173 to +174
The `select_answer` is also tweaked to be more extensible-event like although is essentially
the same:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a dumb question, but do we need a select_answer still? It seems like after exchanging the SDP offer and SDP answer, we could establish a connection and start streaming.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did wonder whether to remove it for group calls, but I think I'm inclined to keep it for consistency. The connection doesn't need to be blocked on the select_answer though, we can still start streaming after just the offer & answer.

doc/mid-based-signalling.md Outdated Show resolved Hide resolved
Comment on lines +208 to +210
This also now contains a sequence number. This is a monotonically increasing integer, starting
at 0 and scoped to the lifetime of the peer connection. The focus will send a reply containing
this sequence number to acknowledge that it has processed the message. This can be a positive ack:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🚀

Btw, do we want to use a sequence number or rather some sort of a randomly generated transaction ID?

I know that the advantage of a sequence number is that it still can be used as a transaction ID and we probably need it since To-Device messages are not ordered. The only problem that I can think of though is that what if one party receives a message with a sequence number 10000 after a message with a sequence number 5? What's the behavior? Should we drop the 10000 message or should we buffer it and process it once the previous message arrives (this opens up a possible attack vector of sending 1, 2, 3, 10000000 and causing the receiving client to deplete their own buffer)?

Also, what to do when the SFU gets restarted and does not know the last sent seq? - I suppose this is solved by "scoped to the lifetime of the peer connection", but I wonder if we need to have seq for all events (negotiate events could also fail).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, seqnums allow for a little more flexibility but we wouldn't need the re-ordering capabilities if we're using a channel that guarantees ordering which the WebRTC DC will give us. A randomly generated ID could avoid clients thinking they have to do reordering and they wouldn't have to worry about masking them monotonically increasing.

Co-authored-by: Daniel Abramov <[email protected]>
dbkr and others added 3 commits March 8, 2023 11:21
Co-authored-by: Daniel Abramov <[email protected]>
Co-authored-by: Daniel Abramov <[email protected]>
@CLAassistant
Copy link

CLAassistant commented Sep 6, 2024

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants