-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proposal for mid based signalling #943
base: full-mesh
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really does seem to look great apart of some nits. The only thing I think we should think deeper about is the depth :D
doc/mid-based-signalling.md
Outdated
"m.negotiate": { | ||
"version": 1, | ||
"call_id": "35657a5b793ce", | ||
"conf_id": "bbe53499f82e3", | ||
"invitee": "@bob:example.org", | ||
"lifetime": 60000, | ||
"party_id": "123456", | ||
"description": { | ||
"type": "offer", | ||
"sdp": "[...]", | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this should also be split up into several pieces?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in split the common call stuff out from the actual negotiation? Interesting idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well m.call.candidates
would still include some of this, so it would make sense to have that in a common mixin
"m.tracks.describe": { | ||
"1": { // transceiver mid 1 | ||
"media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa", | ||
"media_group_uuid": "1234-1234-123456-1234-1234", // rather than 'track group ID' to match media UUID? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense this way ✔️
"version": 1, | ||
"call_id": "35657a5b793ce", | ||
"conf_id": "bbe53499f82e3", | ||
"invitee": "@bob:example.org", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this field? Isn't it clear who the recipient is if this is to-device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's really just consistency with 1:1 calls, although if we're essentially deprecating them then maybe we shouldn't worry about it.
XXX: How flat vs deep do we want the structure to be here? I've done it quite deep here, | ||
organised by user ID / device ID / media group UUID, but they could also just be a flat | ||
list of tracks. It would be more duplication but maybe less effort to read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would also be useful to think about how would we actually implement this and how it fits in with the current implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've expanded on this a bit in 457578a, was this the sort of thing you were thinking about?
doc/mid-based-signalling.md
Outdated
This has also been rearrnaged a little to make the media UUIDs the keys and remove the | ||
unsubscribe section which is unnecessary if we always send the complete set of tracks we | ||
want to receive (we unsubscribe by just removing the media UUID from the dict). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am pretty sure we actually wanted to avoid this to keep the messages short. Perhaps we could use null
to say we want to unsubscribe. We could use a similar technique for unpublishing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe I am barking up the wrong tree with this and it should just be the deltas each time, especially if we have sequence numbers to ensure we don't lose events.
doc/mid-based-signalling.md
Outdated
unsubscribe section which is unnecessary if we always send the complete set of tracks we | ||
want to receive (we unsubscribe by just removing the media UUID from the dict). | ||
|
||
This also now contains a sequence number, so the focus can reply with a an ack: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do all events now have a sequence number? What does the number mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add feedback to track subscriptions with seqnums so every response can be matched to a request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I've added a bit more detail in the doc.
doc/mid-based-signalling.md
Outdated
"m.call.transferee": false, | ||
"m.call.dtmf": false, | ||
}, | ||
"m.tracks.advertise": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m.call
?
"aaaa-aaaa-aaaaaa-aaaa-aaaa": { | ||
"width": 1024, | ||
"height": 576, | ||
}, | ||
"bbbb-bbbb-bbbbbb-bbbb-bbbb": {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pull-based media in the future, this is going to need to include the full path (e.g. userId/deviceId/mediaGroupUUID/mediaUUID
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why would this be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that the focus knows from where to request the track in case it can't read state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, yes, although in this case the address would also need to include the SFU that the user was publishing to? I wonder if this should look more like:
media: [
{
"media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb",
"user_id": "@alice:example.org",
"device_id": "aaaaaa",
"sfu_user_id": "@sfu:example.org",
"sfu_device_id": "sfusfusfu",
}
]
@@ -0,0 +1,276 @@ | |||
# MID Based Signalling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be nice opportunity to fix conf vs call id somehow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. afaik its just the MSC that is confused somewhere, but note to check.
doc/mid-based-signalling.md
Outdated
|
||
m.call.subscribe | ||
``` | ||
"m.call.subscribe": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"m.call.subscribe": { | |
"m.call.subscribe_to_media": { |
Perhaps this for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it m.call.subscribe_media
as I think it's fine as an abbreviation for consistency.
doc/mid-based-signalling.md
Outdated
"m.call.advertise": { | ||
"alice:example.org": { // user ID | ||
"88888888": { // device ID | ||
"2345-2345-234567-2345-2345": [{ // media group uuid | ||
"media_uuid": "aaaa-aaaa-aaaaaa-aaaa-aaaa": | ||
"purpose": "m.usermedia", | ||
"kind": "video", | ||
}, { | ||
"media_uuid": "bbbb-bbbb-bbbbbb-bbbb-bbbb": | ||
"purpose": "m.usermedia", | ||
"kind": "audio", | ||
}, | ||
}, | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the js-sdk class structure POV, it would make much more sense for this to be structured the same way metadata is atm (user_id
and device_id
being props of media group uuid) since the individual CallFeed
s (media groups) have the userId
s and deviceId
s as props, so the mapping makes much more sense that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So basically remove the user ID & device ID levels and move them to attributes of the media group ID? It would be towards the flatter end of possible structures and maybe a nice compromise between depth & duplication. I'm not sure we should necessarily be basing the choices on what the js-sdk does directly, but if the js-sdk has a god reason for doing it that way then that's certainly valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I mean. I'd say the js-sdk doesn't have a god reason for this since I was the one who wrote and I haven't gone mad enough to consider myself a diety (yet) :D But it does have a good reason - I think it makes sense in terms of media groups being what you display in the UI (but I wrote it so I am biased)
(sorry, I sometimes can't help myself 😅 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like c62296c?
doc/mid-based-signalling.md
Outdated
"m.call.transferee": false, | ||
"m.call.dtmf": false, | ||
}, | ||
"m.call.describe": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question we should ask ourselves is if we'd ever want to have metadata for media groups too which would then mean it would make sense to have this structured in a way similar to the current metadata structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we do this in the advertise message though? That's where the bulk of the metadata should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we might (or not) have some fast-changing metadata on the media group level, so it would be useful for that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I see. My gut feeling is that the media groups would probably be purely a way to associate two tracks together as a pair (or group) and deliberately not add any further data, but perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, that does make sense, but perhaps, not sure
media UUID, perhaps?) | ||
|
||
If the focus needs to renegotiate to send the tracks, it does so, describing the media UUIDs it intends to send on the | ||
transceivers once the negotiation is complete: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, thats the transceiver id's (mid's) from the Focus side. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They should be the same, no? Since the media lines must be the same on each side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the media lines are bidirectional and correlate to the remote side.
What i mean is, whoever initiates the offer determines what is sent over the mid-line. When Focus only receives tracks via a connection, the Focus will never have a reason for start a media renegotiation, because on their side the media will never change.
In case the Focus start a renegotiation and creates an offer, then the focus sending media as well. In this case the Focus defines, what is sending over the media lines. But this means the client will receive a track from another peer.
If the client not sending media to the Focus the Focus starts with 0 media line. If the client already sending to the Focus the Focus could use as well the 0 media line. But we decide in this case the Focus will use a new Transceiver.
Long story short: The media lines alone define not what will sending over it. I know we want use pre defined Transceiver and reserve the 0-media lines and 1-media lines for Client sending medias. However, I have still my doubts that this is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand. From my PoV, the three different scenarios work like this:
If Alice (the client) isn't sending any media to the focus, the focus starts sending media to her and starts a renegotiation to add a transceiver, which becomes mid 0. It includes an m.call.describe_media
saying that the media on mid 0 is Bob's media.
If Alice is already sending media, the focus renegotiates but decides to add a new transceiver. This becomes mid 2, so its m.call.describe_media
informs the client that mid 2 contains Bob's media. There is no entry for mid 0 or 1 because the focus isn't sending anything on those transceivers, it's only receiving.
If Alice is already sending media and the focus decides to re-use one of those transceivers, its m.call.describe_media
informs the client that Bob's media will arrive on mid 0. Alice will re-send her own m.call.describe_media
in her answer re-iterating that her own media is being sent on mid 0 since she's still sending it in the other direction. The focus's m.call.describe_media
describes what's flowing on the transceivers in the focus -> client direction and the client's m.call.describe_media
describes what's going client -> focus.
doc/mid-based-signalling.md
Outdated
then look very similar to the structure of the `m.call.advertise` event. It could either keep | ||
a reference to the transceiver it was receiving media on in the structure itself alongside | ||
the media UUID, or maintain a separate map of media UUID to transceiver / peer connection such | ||
that the first structure could be marshalled to JSON and sent to clients as-is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the SFU do not need to save and maintain the map of media UUID to transceiver / peer connection. The mid's are only needed to identify the tracks inside the SDP. If the connection is negotiated, they are no longer required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather, the SFU needs to know which tracks belong to which UUID (send and receive).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it need to know what media UUID corresponds to what it's sending in each case: whether it does that by mapping them to mid or track ID or whatever is up to the implementation - is that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly, that is what I meant. I only just wanted to pointed out that's MID's quite unstable and only needed to init this process. And we should not based on it in such whatever implementation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood – hopefully c06d3d4 clarifies this a bit.
The `select_answer` is also tweaked to be more extensible-event like although is essentially | ||
the same: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is a dumb question, but do we need a select_answer
still? It seems like after exchanging the SDP offer and SDP answer, we could establish a connection and start streaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did wonder whether to remove it for group calls, but I think I'm inclined to keep it for consistency. The connection doesn't need to be blocked on the select_answer though, we can still start streaming after just the offer & answer.
This also now contains a sequence number. This is a monotonically increasing integer, starting | ||
at 0 and scoped to the lifetime of the peer connection. The focus will send a reply containing | ||
this sequence number to acknowledge that it has processed the message. This can be a positive ack: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 🚀
Btw, do we want to use a sequence number or rather some sort of a randomly generated transaction ID?
I know that the advantage of a sequence number is that it still can be used as a transaction ID and we probably need it since To-Device messages are not ordered. The only problem that I can think of though is that what if one party receives a message with a sequence number 10000
after a message with a sequence number 5
? What's the behavior? Should we drop the 10000
message or should we buffer it and process it once the previous message arrives (this opens up a possible attack vector of sending 1, 2, 3, 10000000 and causing the receiving client to deplete their own buffer)?
Also, what to do when the SFU gets restarted and does not know the last sent seq
? - I suppose this is solved by "scoped to the lifetime of the peer connection", but I wonder if we need to have seq
for all events (negotiate events could also fail).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, seqnums allow for a little more flexibility but we wouldn't need the re-ordering capabilities if we're using a channel that guarantees ordering which the WebRTC DC will give us. A randomly generated ID could avoid clients thinking they have to do reordering and they wouldn't have to worry about masking them monotonically increasing.
Co-authored-by: Daniel Abramov <[email protected]>
Co-authored-by: Daniel Abramov <[email protected]>
Co-authored-by: Daniel Abramov <[email protected]>
No description provided.