
[SCTP] limit the bytes in the PendingQueue by using a semaphore #367

Merged
merged 23 commits into webrtc-rs:master on Jan 2, 2023

Conversation

KillingSpark
Contributor

@KillingSpark KillingSpark commented Dec 15, 2022

As discussed in #360, the pending queue can grow indefinitely if the sender writes packets faster than the association is able to transmit them.

This PR solves this by enforcing a limit on the pending queue, which blocks the sender until enough space is free.
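For illustration, here is a minimal sketch of the idea (a hypothetical `BoundedPendingQueue`, not the code merged in this PR): queued bytes are counted as semaphore permits, `push` waits until the budget has room, and the transmit path returns the permits when it pops a chunk.

```rust
use std::collections::VecDeque;
use tokio::sync::{Mutex, Semaphore};

// Illustrative cap, not the crate's actual constant.
const QUEUE_BYTES_LIMIT: usize = 128 * 1024;

struct BoundedPendingQueue {
    chunks: Mutex<VecDeque<Vec<u8>>>,
    budget: Semaphore, // one permit == one queued byte
}

impl BoundedPendingQueue {
    fn new() -> Self {
        Self {
            chunks: Mutex::new(VecDeque::new()),
            budget: Semaphore::new(QUEUE_BYTES_LIMIT),
        }
    }

    /// Waits (asynchronously) until the queue has room for `chunk`, then enqueues it.
    async fn push(&self, chunk: Vec<u8>) {
        // `forget` keeps the permits consumed after the guard is dropped;
        // they only come back via `add_permits` in `pop`.
        // Caveat: a chunk larger than QUEUE_BYTES_LIMIT would wait forever here --
        // exactly the oversized-message issue discussed further down in this thread.
        self.budget
            .acquire_many(chunk.len() as u32)
            .await
            .expect("semaphore is never closed")
            .forget();
        self.chunks.lock().await.push_back(chunk);
    }

    /// Called by the transmit path; returning the permits unblocks waiting writers.
    async fn pop(&self) -> Option<Vec<u8>> {
        let chunk = self.chunks.lock().await.pop_front()?;
        self.budget.add_permits(chunk.len());
        Some(chunk)
    }
}
```

The crate's pending queue keeps separate ordered/unordered queues and sync accessors, so the merged code looks different, but the backpressure mechanism is the same: writers park on a byte budget instead of letting the queue grow without bound.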

@codecov

codecov bot commented Dec 15, 2022

Codecov Report

Base: 59.97% // Head: 59.87% // Decreases project coverage by 0.09% ⚠️

Coverage data is based on head (ee22bc5) compared to base (225cec0).
Patch coverage: 56.68% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #367      +/-   ##
==========================================
- Coverage   59.97%   59.87%   -0.10%     
==========================================
  Files         504      504              
  Lines       48070    48000      -70     
  Branches    12519    12516       -3     
==========================================
- Hits        28828    28739      -89     
- Misses      10021    10025       +4     
- Partials     9221     9236      +15     
Impacted Files Coverage Δ
sctp/examples/ping.rs 0.00% <0.00%> (ø)
sctp/examples/pong.rs 0.00% <0.00%> (ø)
sctp/examples/throughput.rs 0.00% <0.00%> (ø)
sctp/src/error.rs 41.66% <ø> (ø)
sctp/src/queue/queue_test.rs 65.12% <14.28%> (-3.59%) ⬇️
.../association_internal/association_internal_test.rs 60.67% <25.00%> (-0.42%) ⬇️
sctp/src/stream/stream_test.rs 49.60% <40.00%> (+0.80%) ⬆️
sctp/src/queue/pending_queue.rs 33.33% <60.00%> (-55.24%) ⬇️
sctp/src/association/association_test.rs 59.50% <72.22%> (+0.55%) ⬆️
... and 9 more


@KillingSpark KillingSpark marked this pull request as draft December 15, 2022 09:44
@KillingSpark
Contributor Author

KillingSpark commented Dec 15, 2022

Converted to draft because this has some weird behaviour on my branch where I have all the performance PRs merged.

It causes the sender to no longer print its Send xxxx pkts lines but there is still throughput even long after it stops printing. I need to investigate what is happening there.

Apart from that this works well. Speeds are good (maybe even a little improved, probably because of less pressure on the allocator?). And the backpressure does its thing: the number of packets sent is close to the number of packets received.

Throughput: 108919170 Bytes/s, 1662 pkts, 1662 loops
Send 1733 pkts
Throughput: 80870190 Bytes/s, 1234 pkts, 1234 loops
Send 1666 pkts
Throughput: 137295825 Bytes/s, 2095 pkts, 2095 loops
Send 2097 pkts
Throughput: 109967730 Bytes/s, 1678 pkts, 1678 loops
Send 1680 pkts
Throughput: 134871030 Bytes/s, 2058 pkts, 2058 loops
Send 1672 pkts
Throughput: 137951175 Bytes/s, 2105 pkts, 2105 loops
Send 2126 pkts
Throughput: 54721725 Bytes/s, 835 pkts, 835 loops
Throughput: 1048560 Bytes/s, 16 pkts, 16 loops
Throughput: 137689035 Bytes/s, 2101 pkts, 2101 loops
Throughput: 139720620 Bytes/s, 2132 pkts, 2132 loops
Throughput: 138606525 Bytes/s, 2115 pkts, 2115 loops
Throughput: 52296930 Bytes/s, 798 pkts, 798 loops
Throughput: 139982760 Bytes/s, 2136 pkts, 2136 loops
Throughput: 113047875 Bytes/s, 1725 pkts, 1725 loops
Throughput: 137099220 Bytes/s, 2092 pkts, 2092 loops
Throughput: 110229870 Bytes/s, 1682 pkts, 1682 loops
Throughput: 138016710 Bytes/s, 2106 pkts, 2106 loops
Throughput: 139524015 Bytes/s, 2129 pkts, 2129 loops
Throughput: 137885640 Bytes/s, 2104 pkts, 2104 loops
Throughput: 139851690 Bytes/s, 2134 pkts, 2134 loops
Throughput: 111737175 Bytes/s, 1705 pkts, 1705 loops
Throughput: 124778640 Bytes/s, 1904 pkts, 1904 loops
Throughput: 138278850 Bytes/s, 2110 pkts, 2110 loops
Throughput: 138082245 Bytes/s, 2107 pkts, 2107 loops
Throughput: 109508985 Bytes/s, 1671 pkts, 1671 loops
Throughput: 136116195 Bytes/s, 2077 pkts, 2077 loops
Throughput: 49675530 Bytes/s, 758 pkts, 758 loops
Throughput: 137361360 Bytes/s, 2096 pkts, 2096 loops
Throughput: 111606105 Bytes/s, 1703 pkts, 1703 loops
Throughput: 137033685 Bytes/s, 2091 pkts, 2091 loops
Throughput: 108591495 Bytes/s, 1657 pkts, 1657 loops
Throughput: 106101165 Bytes/s, 1619 pkts, 1619 loops

Edit: this is even stranger: the receiver too stops printing at some point (way later), but netstat still shows traffic being delivered and the process is still using the same amount of CPU... wth

sctp/src/queue/pending_queue.rs (outdated review thread, resolved)

```rust
impl PushLimitSemaphore {
    /// blocks until the credits where sucessfully taken
    fn aquire(&self, credits: u64) {
```
Member

This is ultimately called from an async context, right? It's not good to block in a way that doesn't yield to the tokio runtime. In fact, if we manage to block all threads it's possible to deadlock the whole runtime, since no forward progress can be made to release any permits.

Contributor Author

As far as I can tell the stream presents a sync interface?

Contributor Author
@KillingSpark KillingSpark Dec 15, 2022

Hmmm, nope, you are right. I was fooled because write, write_sctp, and append are sync code. They are called a lot in async contexts.

So these should just become async and use the tokio semaphore?

Edit: The tokio semaphore does not really provide the interface we need; it only allows taking/releasing one permit at a time.

Edit 2: TIL tokio does not have a CondVar, so building the same thing with async primitives isn't easy.

Member
@k0nserv k0nserv Dec 15, 2022

> The tokio semaphore does not really provide the interface we need; it only allows taking/releasing one permit at a time.

It does? add_permits and acquire_many. acquire_many presumably blocks until there are enough permits, so it can act as a condvar.

Contributor Author

.... never mind. I was looking at outdated docs. I'll just update to this.

But it will still be a breaking change to the public API, is that OK?

Contributor Author

Actually, no, it doesn't have what we need. It releases the permits as soon as the guard is dropped.

We need to acquire capacity in one place and grant capacity in another place. I don't think that is doable with the API the tokio Semaphore presents.

Member

Hmm, you can move the permit guard around, but perhaps that's a fool's errand. Maybe ask in the tokio Discord if they have thoughts; otherwise your lock-based approach with a tokio::sync::Notify is appropriate.

Contributor Author
@KillingSpark KillingSpark Dec 15, 2022

I'll ask on their Discord if anyone has a better idea, but yeah, the Notify-based impl would satisfy me too.

I did not see the strange missing prints with the async version, so maybe this was actually caused by blocking something... I'll keep it running for a while and see if it occurs again.

In any case, what are your thoughts on breaking the API? Do you see a way around it?

Member

I don't care much about breaking because we are still pre-1.0.0 and are still figuring out how these things should work to be optimal.

It is interesting that this change would undo work by @melekes in #344, which was quite recent. That's not necessarily wrong, but it would be good to have @melekes's thoughts too.

Contributor

I don't mind bringing back async if it's needed. I changed it to sync because there was no reason for it to be async.

@KillingSpark KillingSpark marked this pull request as ready for review December 18, 2022 11:51
@KillingSpark
Contributor Author

KillingSpark commented Dec 20, 2022

Edit 3: Thought it might be nice to not make everyone read my ramblings when they don't have to. I misunderstood the AsyncWrite trait's doc language and got confused. You may disregard this comment entirely...

Edit:

Gotta go back on this. Looking at how tokio implements this for their TCP socket, they don't seem to care and flush is just a no-op: https://docs.rs/tokio/1.23.0/src/tokio/net/tcp/split.rs.html#374-405

This does NOT seem to match the language in the AsyncWrite trait: "Attempts to flush the object, ensuring that any buffered data reach their destination."

Edit 2: On second thought, I might have misinterpreted the trait's language: flush apparently doesn't need to wait until all data has been received, it just has to make sure it will be received at some point. So as long as the data has been put into the pending queue and the association lives long enough to send all the data, we should be fine...?

-- original --

I think the AsyncWrite implementation was and is broken with respect to the poll_flush and poll_shutdown functions.

  • Flush should wait until there are no more bytes buffered or in flight (which isn't easy to do in the current implementation outside of just busy looping).
  • shutdown should
    1. stream.shutdown(Shutdown::Write) and await the result, so new writes get rejected
    2. self.poll_flush() and await the result, so nothing is buffered anymore
    3. only now return Poll::Ready(Ok(()))

I'll need your opinion on how to proceed on this. I think it's clear that this is an entirely new can of worms that I'd rather not open in this PR. Secondly, I don't think this PR makes the situation significantly worse than before: poll_flush is now essentially a no-op, but it didn't do the right thing before this PR anyway.

Honestly, I'd recommend just deleting (or disabling) the AsyncWrite impl on the Stream. At least poll_flush() does not do what the trait says it must do. (I think the same goes for the data channel implementation.) I'm sorry if this sounds harsh, because it looks like a lot of work went into these, but the current stream just doesn't support a correct implementation of poll_flush.

I thought about how to fix this, but it definitely isn't easy, because you need to tie the amount of buffered / in-flight bytes to some kind of signalling that needs to be reset if new bytes come in before anyone has seen the signal.

To summarize the options in order of my preference:

  1. Delete/Disable the AsyncWrite impl for Stream and Datachannel until Stream is able to support a flush and a shutdown
  2. Keep them as is, and maybe put a warning somewhere?
  3. Fix them in this PR

Sorry that this PR is such a mess with regards to the AsyncWrite impl, I am learning about this stuff as I go.
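For reference, a hedged sketch of the drain-based signalling described above (a hypothetical `Outstanding` helper, not the crate's Stream API): publish the number of buffered / in-flight bytes through a tokio watch channel so a flush can wait for it to hit zero without busy looping and without missing an update that lands while it is checking.

```rust
use tokio::sync::watch;

/// Tracks buffered / in-flight bytes; the write path calls `add`,
/// the transmit/ack path calls `ack`.
struct Outstanding {
    tx: watch::Sender<usize>,
}

impl Outstanding {
    fn new() -> (Self, watch::Receiver<usize>) {
        let (tx, rx) = watch::channel(0usize);
        (Self { tx }, rx)
    }

    fn add(&self, bytes: usize) {
        self.tx.send_modify(|n| *n += bytes);
    }

    fn ack(&self, bytes: usize) {
        self.tx.send_modify(|n| *n -= bytes);
    }
}

/// What a spec-conforming flush could await: no buffered or in-flight bytes left.
async fn wait_drained(rx: &mut watch::Receiver<usize>) {
    // `borrow_and_update` marks the current value as seen, so `changed` cannot
    // miss a transition to zero between the check and the await.
    while *rx.borrow_and_update() > 0 {
        if rx.changed().await.is_err() {
            break; // sender side is gone; nothing more will drain
        }
    }
}
```

poll_flush would then poll a future like this, and poll_shutdown would first reject new writes and only return Ready once the same condition holds.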

@melekes
Contributor

melekes commented Dec 21, 2022

> This limit should probably be configurable by the user, right?

I'd recommend being critical about adding more config options, as they tend to grow indefinitely and most users don't change the defaults anyway.

In this case, if we know the upper bound of the UDP socket buffer, then can't we deduce the max queue length from it?

@KillingSpark
Contributor Author

> I'd recommend being critical about adding more config options, as they tend to grow indefinitely and most users don't change the defaults anyway.
>
> In this case, if we know the upper bound of the UDP socket buffer, then can't we deduce the max queue length from it?

Agreed. I think finding out the upper limit for the UDP sockets is platform-dependent. One can get the current sendbuf size, but I don't know if that is a good value; it might be too little for optimal throughput. Just playing around with the limit indicates that as little as 128 kB as the upper limit seems to be enough to sustain maximum throughput. macOS seems to use this value as the TCP send buffer size, according to sysctl -n net.inet.tcp.sendspace.
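A sketch of what "deduce it from the socket" could look like, using the socket2 crate (an assumption for illustration; it is not part of this PR): read SO_SNDBUF from the kernel and never go below a floor around the 128 kB that sustained full throughput in the experiment above.

```rust
use socket2::Socket;
use std::net::UdpSocket;

/// Derive a pending-queue byte limit from the kernel's UDP send buffer,
/// clamped to a floor that keeps throughput up.
fn queue_limit_from_socket(sock: UdpSocket) -> std::io::Result<usize> {
    const FLOOR: usize = 128 * 1024; // ~128 kB was enough in the tests above

    let raw = Socket::from(sock);
    // SO_SNDBUF as reported by the OS (Linux reports the doubled value).
    let sndbuf = raw.send_buffer_size()?;
    Ok(sndbuf.max(FLOOR))
}
```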

@KillingSpark
Contributor Author

KillingSpark commented Dec 21, 2022

Just realized this has a bug with messages that are bigger than the limit: they will never be able to acquire enough permits.

But the append function is used with this comment: // NOTE: append is used here instead of push in order to prevent chunks interlacing. Is this actually important? If not, I'd change this to append the chunks in bundles of size <= limit / 2 so smooth operation is possible even in the presence of big packets.

Edit:
https://www.rfc-editor.org/rfc/rfc4960#section-3.3

   When a user message is fragmented into multiple chunks, the TSNs are
   used by the receiver to reassemble the message.  This means that the
   TSNs for each fragment of a fragmented user message MUST be strictly
   sequential.

So, this is important.

@melekes
Contributor

melekes commented Dec 21, 2022

> Is this actually important?

I think if chunks interlace, the reader won't be able to reconstruct the message and will skip it entirely.

@KillingSpark
Contributor Author

Yeah, the RFC states that the fragments of one message must have sequential TSNs.

Fixed it by serializing the access to the semaphore. This whole thing is kinda wonky now, but I hope my comments in the code explain what's going on.
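Roughly, the fix amounts to something like the following sketch (hypothetical names, not the merged code): a mutex serializes access to the byte budget so the fragments of one message are appended back-to-back (keeping their TSNs sequential), and the permits requested for one message are capped at the limit so an oversized message occupies the whole budget for a while instead of waiting forever.

```rust
use tokio::sync::{Mutex, Semaphore};

const QUEUE_BYTES_LIMIT: u32 = 128 * 1024; // illustrative

struct AppendGate {
    serialize: Mutex<()>, // only one writer may interact with the budget at a time
    budget: Semaphore,    // permits == bytes the queue may still accept
}

impl AppendGate {
    async fn append_message(&self, fragments: &[Vec<u8>]) {
        // Hold the mutex across acquisition *and* the append so another writer
        // cannot interleave its chunks between this message's fragments
        // (RFC 4960: fragment TSNs must be strictly sequential).
        let _guard = self.serialize.lock().await;

        let bytes: u32 = fragments.iter().map(|f| f.len() as u32).sum();
        // Never request more than the budget can ever hold; a message bigger
        // than the limit simply takes the whole budget until it drains.
        let want = bytes.min(QUEUE_BYTES_LIMIT);
        self.budget
            .acquire_many(want)
            .await
            .expect("semaphore is never closed")
            .forget();

        // ... push all fragments onto the pending queue here, still under `_guard` ...
    }
}
```

The sem_lock / permits.unwrap().forget() hunks quoted in the review comments below appear to be the shape this took in the PR.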

Contributor
@melekes melekes left a comment

Thanks for your work 👍 While I do agree with the general direction, this PR brings more complexity, which we should try to avoid. A possible solution is to make the inner pending queues non-blocking, which may require some architecture changes. However, I believe we should still try to seek a solution that is easier to maintain and understand for future contributors.

Outdated review threads (resolved): sctp/CHANGELOG.md (×2), sctp/src/stream/mod.rs
@KillingSpark
Contributor Author

KillingSpark commented Dec 21, 2022

> A possible solution is to make the inner pending queues non-blocking, which may require some architecture changes.

I don't agree. The point of this PR is that the Association might be sending more slowly than the application is pushing new data into it.

If this situation persists, the data will accumulate in the pending queue. This leads to more and more memory being allocated, and in the end the OOM killer will kick in.

Edit: to make my point clearer: at some point the association needs to apply backpressure to the application. Otherwise, somewhere in the system, the data will accumulate as described above. There might be other ways / better places to apply this concept, but it is necessary IMO.

Edit 2: one example of these situations is the throughput example that started all of these PRs. If this PR is not merged, the throughput example just sends packets as fast as possible, which overloads my 16 GB of RAM in seconds, causing the OOM killer to kick in.

KillingSpark and others added 2 commits December 21, 2022 16:33
Co-authored-by: Anton <[email protected]>
@melekes
Contributor

melekes commented Dec 22, 2022

> Edit: to make my point clearer: at some point the association needs to apply backpressure to the application. Otherwise, somewhere in the system, the data will accumulate as described above. There might be other ways / better places to apply this concept, but it is necessary IMO.

I agree ☝️

To make my point more clear 😄: I'd rather see simple bounded channels / queues than the current version with locks & semaphores. But I guess we can refactor later.

sctp/src/stream/mod.rs (outdated review thread, resolved)
@KillingSpark
Contributor Author

> I'd rather see simple bounded channels / queues than the current version with locks & semaphores. But I guess we can refactor later.

I don't see how that would work, but if it does it would for sure be better than this weird mutex-semaphore combo.
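For what it's worth, a tiny sketch of the backpressure property a bounded channel gives (capacity counted in chunks here; byte accounting would need extra bookkeeping on top):

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Bounded: `send` only resolves once the consumer has made room.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(512);

    let producer = tokio::spawn(async move {
        for i in 0u8..=255 {
            // Suspends when the queue is full -> backpressure on the writer.
            tx.send(vec![i; 1200]).await.expect("receiver alive");
        }
        // `tx` is dropped here, which lets the receive loop below finish.
    });

    while let Some(chunk) = rx.recv().await {
        // Stand-in for the transmit path: popping frees a slot and wakes a sender.
        let _ = chunk.len();
    }
    producer.await.unwrap();
}
```

Whether the pending queue can actually be expressed as a plain FIFO like this is the open question above; the sketch only shows where the blocking would move to.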

```rust
        let mut ordered_queue = self.ordered_queue.write();
        ordered_queue.push_back(c);
    }
    drop(sem_lock);
```
Contributor

Are you sure we need to drop the lock manually? Why isn't it dropped automatically?

Contributor Author

Nope, it's not required, but for functions where the lock is not needed the whole time I like to do it like this. I know it would be dropped and unlocked at the end of the block anyway, but I think it communicates intent better.

If you don't like it, it can be removed.

Contributor Author

Especially since the construct with the sem_lock isn't required by any language-level checks to actually push to the queue, I think it is good to make sure the lock lives exactly as long as needed, but no longer.

```rust
    // unwrap ok because we never close the semaphore unless we have dropped self
    permits.unwrap().forget();
    self.append_unlimited(chunks, total_user_data_len);
    drop(sem_lock);
```
Contributor

ditto

```rust
        self.queue_len.fetch_add(1, Ordering::SeqCst);
    }

    drop(sem_lock);
```
Contributor

ditto

Outdated review threads (resolved): sctp/src/stream/mod.rs (×2)
@melekes melekes enabled auto-merge (squash) January 2, 2023 09:19
@melekes melekes merged commit daaf05d into webrtc-rs:master Jan 2, 2023
melekes added a commit that referenced this pull request Jan 5, 2023
* add comments for QUEUE_BYTES_LIMIT and QUEUE_APPEND_LARGE

* remove unnecessary drop(sem_lock)

lock will be dropped automatically

* use PADDING_MULTIPLE instead of 16

contract: padding_needed <= PADDING_MULTIPLE

Refs #364 and #367