Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.x] Remote publishers Switch problem #3444

Closed
brave44 opened this issue Oct 5, 2024 · 11 comments
Closed

[1.x] Remote publishers Switch problem #3444

brave44 opened this issue Oct 5, 2024 · 11 comments
Labels
multistream Related to Janus 1.x pr-exists

Comments

@brave44
Copy link
Contributor

brave44 commented Oct 5, 2024

What version of Janus is this happening on?
1.3

Have you tested a more recent version of Janus too?
Tested on latest version.

Additional context
Switches for remote publishers leads to lost packets. Additional info here: https://janus.discourse.group/t/remote-publishers-switch-problem/1319

@brave44 brave44 added the multistream Related to Janus 1.x label Oct 5, 2024
@lminiero
Copy link
Member

lminiero commented Oct 5, 2024

I gave it some thought yesterday and I think it's due to the fact that all remote publishers use the same SSRCs, since a simple math is used to then figure out which packet belongs to which m-line: as a result, a switch would not recognize the source change since the SSRC is the same, and the different sequence number/timestamp space would lead to the SRTP errors. I'll have to figure out if that's indeed the cause by looking at the code, though, and I'll try and do that on Monday.

@spscream
Copy link
Contributor

spscream commented Oct 8, 2024

@lminiero any updates? we could provide any info to help fix this

@lminiero
Copy link
Member

lminiero commented Oct 8, 2024

Looking at the code it's very likely what I wrote in my previous post: when you use publish_remotely, all forwarders are created using REMOTE_PUBLISHER_BASE_SSRC (1000) as a basis, which means the forwarder for m-line 0 will have SSRC 1000, m-line 1 1010, m-line 2 1020, and so on. The monotonic steps are left for simulcast (e.g., for m-line 1, base layer will have SSRC 1010, mid layer 1011, high layer 1012). This allows the Janus instance receiving all packets for that publisher on the same port to demultiplex traffic and know to which m-line it corresponds.

The problem, now, is that all remote publishers are created the same way and follow the same logic. This means that when you use add_remote_publisher on a receiving Janus instance, both remote user X and remote user Y will have the same SSRC, but their sequence numbers and timestamps will proceed on their own. If you were subscribed to user X and want to switch to user Y and the m-lines you're switching are exactly the same (e.g., switch from m-line 1 of X to m-line 1 of Y), then the SSRC of the new source will be the same, which means our code to update RTP headers won't kick in, and we'll start relaying packets with the same SSRC but completely different sequence numbers and timestamps. This is what causes those SRTP errors, that only "fix" themselves after a while because you need to wait until sequence numbers are outside of the conflict range.

I think there are a couple of ways to deal with this:

  1. Add a new SSRC offset that can be passed to publish_remotely and add_remote_publisher, so that we don't always use that REMOTE_PUBLISHER_BASE_SSRC=1000 offset but we can use different ones for different remote publishers, avoiding that ambiguity. I don't like this option much because it's an API change, that we can make optional but would leave the default if you want to use switching broken.
  2. Keep the APIs exactly as they are (since we only need them for demultiplexing publishers, and each publisher uses a different port so no ambiguity there) but then, before relaying traffic, always apply a different (fixed but random, for each publisher) offset to different remote publishers, so that as long as subscribers are concerned, we won't encounter this issue. This would be my preferred approach: it might still have ultrarare occurrences of this occurring (e.g., if by chance two remote publishers end up with the same offset when we create them) but to be honest the chances of two random 32-bit integers to be the same are quite slim.

I'll start exploring option 2. and I'll prepare a PR when I think I have it. If you have other ideas please feel free to make suggestions.

@spscream
Copy link
Contributor

spscream commented Oct 8, 2024

Maybe make offset based on publisher_id? it's guarantied to be unique, since two publishers can't have the same publisher_id in the same room

@lminiero
Copy link
Member

lminiero commented Oct 8, 2024

publisher_id is 64-bit, SSRCs are 32-bit. There would be a higher chance of conflict due to the truncating. Besides, publisher_id could be a string and not numeric.

I've just prepared a PR on my approach 2. and it seems to be working, pushing it shortly.

@lminiero
Copy link
Member

lminiero commented Oct 8, 2024

@brave44 @spscream please try the PR above.

spscream pushed a commit to spscream/janus-gateway that referenced this issue Oct 8, 2024
@spscream
Copy link
Contributor

spscream commented Oct 8, 2024

we tested it and it doesn't work, video doesn't play at all. In logs we got repeating messages:

2024-10-08T18:51:45.479297758Z Sending PLI to remote publisher
2024-10-08T18:51:45.479592502Z We need a PLI for the simulcast context
2024-10-08T18:51:45.479601404Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.479604865Z Sending PLI to remote publisher
2024-10-08T18:51:45.479619492Z We need a PLI for the simulcast context
2024-10-08T18:51:45.479641382Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.479645377Z Sending PLI to remote publisher
2024-10-08T18:51:45.479716929Z We need a PLI for the simulcast context
2024-10-08T18:51:45.479740414Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.479746909Z Sending PLI to remote publisher
2024-10-08T18:51:45.510889408Z We need a PLI for the simulcast context
2024-10-08T18:51:45.510924284Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.510928833Z Sending PLI to remote publisher
2024-10-08T18:51:45.510932316Z We need a PLI for the simulcast context
2024-10-08T18:51:45.510935332Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.510938605Z Sending PLI to remote publisher
2024-10-08T18:51:45.511267089Z We need a PLI for the simulcast context
2024-10-08T18:51:45.511291079Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.511296365Z Sending PLI to remote publisher
2024-10-08T18:51:45.512013294Z We need a PLI for the simulcast context
2024-10-08T18:51:45.512034895Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.512054355Z Sending PLI to remote publisher
2024-10-08T18:51:45.512245159Z We need a PLI for the simulcast context
2024-10-08T18:51:45.512259467Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.512264938Z Sending PLI to remote publisher
2024-10-08T18:51:45.512376218Z We need a PLI for the simulcast context
2024-10-08T18:51:45.512398128Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.512404247Z Sending PLI to remote publisher
2024-10-08T18:51:45.512589416Z We need a PLI for the simulcast context
2024-10-08T18:51:45.512607716Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.512613581Z Sending PLI to remote publisher
2024-10-08T18:51:45.512983241Z We need a PLI for the simulcast context
2024-10-08T18:51:45.513005351Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.513011450Z Sending PLI to remote publisher
2024-10-08T18:51:45.513231636Z We need a PLI for the simulcast context
2024-10-08T18:51:45.513247277Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.513250933Z Sending PLI to remote publisher
2024-10-08T18:51:45.513409364Z We need a PLI for the simulcast context
2024-10-08T18:51:45.513425803Z Simulcast change sending PLI to 42019930 (#1, 62d02002-425d-5284-a3ed-e185baaec194:bce19e60-5184-54f8-836c-b69e49c32030:feed)
2024-10-08T18:51:45.513429826Z Sending PLI to remote publisher

@lminiero
Copy link
Member

lminiero commented Oct 9, 2024

I had forgotten to add the offset to the SSRCs in the simulcast array too, should work now.

@spscream
Copy link
Contributor

spscream commented Oct 9, 2024

it works perfectly now! thank you!

@brave44
Copy link
Contributor Author

brave44 commented Oct 9, 2024

Yes, switch for remote publishers now works as good as for local, thanks!

@lminiero
Copy link
Member

lminiero commented Oct 9, 2024

Ack, thank you both for the feedback, I'll merge then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multistream Related to Janus 1.x pr-exists
Projects
None yet
Development

No branches or pull requests

3 participants