
[WebRTC] investigate data-channels-flow-control example throughput performance issue #101

Open · Tracked by #1
rainliu opened this issue Sep 25, 2021 · 14 comments
Labels: benchmark (benchmark the performance / performance improvement), subcrate:data (for issues specific to the data crate)

Comments

rainliu (Member) commented Sep 25, 2021

Pion reaches more than 500 Mbps:

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
2021/09/25 13:27:29 OnOpen: data-824638619994. Start sending a series of 1024-byte packets as fast as it can
2021/09/25 13:27:29 OnOpen: data-824636958938. Start receiving data
2021/09/25 13:27:30 Throughput: 570.646 Mbps
2021/09/25 13:27:31 Throughput: 569.753 Mbps
2021/09/25 13:27:32 Throughput: 573.001 Mbps
2021/09/25 13:27:33 Throughput: 572.452 Mbps
2021/09/25 13:27:34 Throughput: 571.297 Mbps
2021/09/25 13:27:35 Throughput: 569.525 Mbps
2021/09/25 13:27:36 Throughput: 567.463 Mbps
...

but webrtc-rs only reaches around 13 Mbps:

Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 12.990 Mbps
Throughput: 13.698 Mbps
Throughput: 13.559 Mbps
Throughput: 13.345 Mbps
Throughput: 13.565 Mbps
Throughput: 13.582 Mbps
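
For reference, the Throughput line in these logs is, roughly, bytes received per second converted to Mbps. A minimal sketch of that kind of measurement (the shared counter, packet size, and reporting task below are illustrative, not the example's actual code):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;

// Report the receive rate once per second from a shared byte counter.
async fn report_throughput(total_bytes: Arc<AtomicUsize>) {
    let mut last = 0usize;
    let mut ticker = tokio::time::interval(Duration::from_secs(1));
    loop {
        ticker.tick().await;
        let now = total_bytes.load(Ordering::Relaxed);
        println!("Throughput: {:.3} Mbps", (now - last) as f64 * 8.0 / 1_000_000.0);
        last = now;
    }
}

#[tokio::main]
async fn main() {
    let total = Arc::new(AtomicUsize::new(0));

    // Stand-in for the receive path: in the real example, the data channel's
    // message handler would add the size of each received payload here.
    let rx_counter = total.clone();
    tokio::spawn(async move {
        loop {
            rx_counter.fetch_add(1024, Ordering::Relaxed);
            tokio::task::yield_now().await;
        }
    });

    report_throughput(total).await;
}
```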
rainliu (Member, Author) commented Sep 25, 2021

Building with cargo build --release --example data-channels-flow-control increases performance, but it is still not comparable to Pion.

./target/release/examples/data-channels-flow-control
Press ctrl-c to stop
Peer Connection State has changed: connected (offerer)
Peer Connection State has changed: connected (answerer)
OnOpen: data-1. Start sending a series of 1024-byte packets as fast as it can
OnOpen: data-1. Start receiving data
Throughput: 175.556 Mbps
Throughput: 106.104 Mbps
Throughput: 76.986 Mbps
Throughput: 61.450 Mbps
Throughput: 51.632 Mbps
Throughput: 44.797 Mbps
Throughput: 39.733 Mbps
Throughput: 35.619 Mbps
Throughput: 32.330 Mbps
Throughput: 29.491 Mbps
Throughput: 43.142 Mbps
Throughput: 48.350 Mbps
Throughput: 46.386 Mbps
Throughput: 44.221 Mbps
Throughput: 48.071 Mbps
Throughput: 55.550 Mbps
Throughput: 53.980 Mbps
Throughput: 52.263 Mbps

rainliu added the benchmark / performance labels on Sep 27, 2021
whans commented Oct 5, 2021

This is probably a Tokio performance limit.

Here are some other benchmarks:

goroutines: 3.22234675s total, 3.222346ms avg per iteration
rust_threads: 16.980509645s total, 16.980509ms avg per iteration
rust_tokio: 9.56997204s total, 9.569972ms avg per iteration
rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration

https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/
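
For context, the linked post benchmarks simple file I/O per iteration (reading /dev/urandom and writing /dev/null, as pointed out further down this thread) from goroutines, plain Rust threads, and Tokio tasks. A rough sketch of the shape of its rust_tokio variant, with an illustrative buffer size and iteration count (not the post's exact code):

```rust
use std::time::Instant;
use tokio::fs::File;
use tokio::io::{AsyncReadExt, AsyncWriteExt};

// One iteration: read a chunk from /dev/urandom and write it to /dev/null.
async fn iteration(buf: &mut [u8]) {
    let mut src = File::open("/dev/urandom").await.unwrap();
    let mut dst = File::create("/dev/null").await.unwrap();
    src.read_exact(buf).await.unwrap();
    dst.write_all(buf).await.unwrap();
}

#[tokio::main]
async fn main() {
    let iters: u32 = 1_000;
    let mut buf = vec![0u8; 64 * 1024];

    let start = Instant::now();
    for _ in 0..iters {
        iteration(&mut buf).await;
    }
    let total = start.elapsed();
    println!("rust_tokio: {:?} total, {:?} avg per iteration", total, total / iters);
}
```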

vitdevelop (Contributor):

@rainliu I ran some long-running benchmarks comparing Go (Pion) and Rust; maybe this will help.


Go (Pion)
From the start of the benchmark, throughput grew up to 844 Mbps:

17:49:06 Throughput: 721.371 Mbps
17:49:07 Throughput: 727.991 Mbps
17:49:08 Throughput: 743.665 Mbps
...
17:49:39 Throughput: 842.728 Mbps
17:49:40 Throughput: 843.339 Mbps
17:49:41 Throughput: 843.672 Mbps
17:49:42 Throughput: 844.272 Mbps
17:49:43 Throughput: 844.782 Mbps
17:49:44 Throughput: 844.855 Mbps

after that, an error was thrown:
mux ERROR: 17:49:45 mux: ending readLoop dispatch error packetio.Buffer is full, discarding write

and throughput kept slowing down without settling at a stable point.
I stopped the benchmark at:

18:44:45 Throughput: 9.966 Mbps

Rust
From the start of the benchmark, throughput was:

Throughput: 229.521 Mbps
Throughput: 231.489 Mbps
Throughput: 231.780 Mbps
Throughput: 231.662 Mbps
Throughput: 231.965 Mbps

after that, it started to drop and reached its lowest point:

Throughput: 23.023 Mbps
Throughput: 22.849 Mbps
Throughput: 22.677 Mbps

then it slowly recovered, oscillating between 41.511 Mbps and 66.436 Mbps.


CPU/RAM

Go

RAM 22.304 MB
CPU ~1%

Rust

RAM 162 MB (at the point I stopped; memory was not being released)
CPU ~120%

vitdevelop (Contributor) commented Oct 5, 2021

> should be tokio performance limit
>
> some other benchmark
>
> goroutines: 3.22234675s total, 3.222346ms avg per iteration
> rust_threads: 16.980509645s total, 16.980509ms avg per iteration
> rust_tokio: 9.56997204s total, 9.569972ms avg per iteration
> rust_tokio_block_in_place: 3.578928749s total, 3.578928ms avg per iteration
>
> https://www.reddit.com/r/rust/comments/lg0a7b/benchmarking_tokio_tasks_and_goroutines/

@whans Those benchmarks were made on file I/O, not sockets, and Linux behaves differently for files and sockets.
If you run std's blocking file I/O inside an async block or task::block_in_place, you will get very fast numbers, because Linux uses read-ahead for files.
Also, the benchmarks were run against /dev/urandom and /dev/null, which are in-memory devices.

@rainliu I assume the example here uses a socket connection.
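
A minimal sketch of the block_in_place pattern described above, assuming blocking std file I/O against the same in-memory devices (requires Tokio's multi-threaded runtime; paths and sizes are illustrative):

```rust
use std::fs::File;
use std::io::{Read, Write};

#[tokio::main]
async fn main() {
    let mut buf = vec![0u8; 64 * 1024];

    // block_in_place runs blocking std I/O on the current worker thread
    // without stalling the rest of the runtime. Against in-memory devices
    // like /dev/urandom and /dev/null this looks extremely fast, which is
    // why such numbers say little about socket-bound workloads.
    tokio::task::block_in_place(|| {
        let mut src = File::open("/dev/urandom").unwrap();
        let mut dst = File::create("/dev/null").unwrap();
        src.read_exact(&mut buf).unwrap();
        dst.write_all(&buf).unwrap();
    });
}
```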

whans commented Oct 5, 2021

@vitdevelop thanks.
I compared std::net::UdpSocket vs tokio::net::UdpSocket:
std::net::UdpSocket is almost twice as fast as tokio::net::UdpSocket.
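
A rough sketch of the kind of comparison being described: a connected send/recv loop over loopback, once with std and once with Tokio (addresses, payload size, and iteration count are arbitrary; this is not a rigorous benchmark):

```rust
use std::time::Instant;

const ITERS: u32 = 100_000;
const PAYLOAD: &[u8] = &[0u8; 1024];

// Blocking loopback ping: send from a connected socket, receive on the peer.
fn bench_std() {
    let a = std::net::UdpSocket::bind("127.0.0.1:0").unwrap();
    let b = std::net::UdpSocket::bind("127.0.0.1:0").unwrap();
    a.connect(b.local_addr().unwrap()).unwrap();
    let mut buf = [0u8; 2048];
    let start = Instant::now();
    for _ in 0..ITERS {
        a.send(PAYLOAD).unwrap();
        b.recv_from(&mut buf).unwrap();
    }
    println!("std::net::UdpSocket:   {:?}", start.elapsed());
}

// The same loop through the async sockets on the Tokio runtime.
async fn bench_tokio() {
    let a = tokio::net::UdpSocket::bind("127.0.0.1:0").await.unwrap();
    let b = tokio::net::UdpSocket::bind("127.0.0.1:0").await.unwrap();
    a.connect(b.local_addr().unwrap()).await.unwrap();
    let mut buf = [0u8; 2048];
    let start = Instant::now();
    for _ in 0..ITERS {
        a.send(PAYLOAD).await.unwrap();
        b.recv_from(&mut buf).await.unwrap();
    }
    println!("tokio::net::UdpSocket: {:?}", start.elapsed());
}

#[tokio::main]
async fn main() {
    tokio::task::block_in_place(bench_std);
    bench_tokio().await;
}
```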

rainliu (Member, Author) commented Oct 6, 2021

Thanks @vitdevelop and @whans for the benchmarking.

Looks like we need some effort to profile the hotspots/bottlenecks.

whans commented Oct 6, 2021

Add tokio-console to check for scheduling issues:

https://github.com/tokio-rs/console
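
For anyone wanting to try it, the documented setup is roughly: initialize console-subscriber at startup and build with the tokio_unstable cfg so Tokio emits task instrumentation (crate versions below are illustrative):

```rust
// Cargo.toml (illustrative):
//   console-subscriber = "0.1"
//   tokio = { version = "1", features = ["full", "tracing"] }
//
// Build with the unstable cfg so Tokio emits task instrumentation:
//   RUSTFLAGS="--cfg tokio_unstable" cargo run --release

#[tokio::main]
async fn main() {
    // Starts the console_subscriber endpoint (default 127.0.0.1:6669)
    // that the `tokio-console` CLI connects to.
    console_subscriber::init();

    // ... rest of the example (peer connections, data channels, etc.) ...
}
```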

vitdevelop (Contributor):

> add tokio console to check the schedule issue
>
> https://github.com/tokio-rs/console

@whans thanks for tokio-console, awesome tool

I tried to check tasks' busy/idle times with tokio-console, but it would not connect to console_subscriber.
I then figured out that I can connect before the offer/answer exchange, so I added some tokio::time::sleep points to get a picture;
once the offer/answer exchange started executing, the console hung. The sleep points were:

  • 10 sec before create_offerer
  • 3 sec before create_answerer
  • 3 sec before create_offer and set_remote_description
  • 3 sec before create_answer and set_remote_description

Here is the last tokio-console data:
(screenshot: webrtc_rs_tokio_console)

whans commented Oct 7, 2021

@vitdevelop
You need to slow down the packet sending rate: add a sleep in the sending task.
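
A sketch of that kind of throttling in a send loop; send_packet here is a hypothetical stand-in for the example's actual data-channel write, and the pacing interval is arbitrary:

```rust
use std::time::Duration;

// Hypothetical placeholder for the data-channel write in the example,
// e.g. something like data_channel.send(...).await in the real code.
async fn send_packet(payload: &[u8]) {
    let _ = payload;
}

async fn send_loop() {
    let packet = [0u8; 1024];
    loop {
        send_packet(&packet).await;
        // Throttle the producer so the receive side, tokio-console, and the
        // SCTP timers get scheduled instead of being starved by this loop.
        tokio::time::sleep(Duration::from_micros(100)).await;
    }
}

#[tokio::main]
async fn main() {
    send_loop().await;
}
```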

vitdevelop (Contributor) commented Oct 8, 2021

@whans A sleep in the sending task helped for a while.

@rainliu After ~5 min I found that a lot of tasks from webrtc-sctp-0.3.8/src/timer/[ack,rtx]_timer.rs:[43,153] had been spawned, around 500-600.
Most of the CPU busy time is in those tasks.

I attached a screenshot (webrtc_rs_ack_rtc_timer).

rainliu (Member, Author) commented Oct 9, 2021

@vitdevelop, thanks for the finding. Could you submit a PR adding tokio-console/console_subscriber to the data-channels-flow-control example, so I can take a look?

vitdevelop (Contributor):

@rainliu Added PR
webrtc-rs/examples#1

whans commented Oct 11, 2021

Output of perf top -p:

(screenshot: Xnip2021-10-11_08-19-51)

ramyak-mehra:

I was doing some testing of my own on SCTP, and plain writes to streams are almost 22x slower than the same example on the Pion side. I saw an issue about moving to an I/O-free state-machine style; while that would speed up the protocol, it still doesn't explain the root cause. The Rust example shouldn't be 22x slower; it should be at least comparable to the Pion one, if not faster. Has anyone explored the root cause of this slowdown?
