Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bandwidth Estimator #25

Open
4 of 10 tasks
Sean-Der opened this issue Dec 30, 2020 · 33 comments
Open
4 of 10 tasks

Bandwidth Estimator #25

Sean-Der opened this issue Dec 30, 2020 · 33 comments

Comments

@Sean-Der
Copy link
Member

Sean-Der commented Dec 30, 2020

The issue covers everything we want to do around Congestion Estimation and Pion. This covers a Bandwidth Estimator in Pion, and generating feedback for remote peers.

@mengelbart is the owner of this work. If you have feedback/suggestions/ideas he knows this all the best!


  • bc30165 Implement Sender/Receiver Reports
  • c8a26a2 Implement Transport Wide Congestion Control Feedback
  • Design a interface/API that all Bandwidth Estimators will satify
  • Merge the NetworkConditoner so we can test our implementations
    • Start with basic tests (hard limits on total bandwidth/start queueing at certain bitrate]
    • Implement full test suite from RFC8867
  • Implement Google Congestion Control
    • Have it be Sender estimates only using TWCC. No REMB our Receiver Reports.
  • Implement FlexFEC
    • This will be needed by servers attempting to deliver Simulcast/SVC
  • Make WebRTC Bandwidth Estimation accessiable to others
    • Write blog posts about what you are implementing/challenges
    • Write a dedicated chapter in WebRTC for the Curious about Bandwidth Estimation
  • Make work accessible for other projects.

In the future we will also need


@jech

This comment has been minimized.

@Sean-Der

This comment has been minimized.

@aler9

This comment has been minimized.

@Sean-Der Sean-Der mentioned this issue Feb 15, 2021
56 tasks
@dtaht

This comment has been minimized.

@aler9

This comment has been minimized.

@Sean-Der

This comment has been minimized.

@dtaht

This comment has been minimized.

@Sean-Der

This comment has been minimized.

@jech

This comment has been minimized.

@dtaht

This comment has been minimized.

@dtaht

This comment has been minimized.

@dtaht

This comment has been minimized.

@dtaht

This comment has been minimized.

@dtaht

This comment has been minimized.

@dtaht

This comment has been minimized.

@unicomp21
Copy link

unicomp21 commented Nov 3, 2021

@Sean-Der @mengelbart are there use cases where having all connection stats available real-time/globally in something like redpanda/kafka would be useful to pion users? The intention would be connection orchestration, bandwidth concerns, and business logic specific to a user, which up/down scales automatically using kafka consumer groups. In all likelihood it would be a separate repo. What would be useful about it is being able to try out new strategies more quickly/easily. Could even bring something like tensorflow into the picture for bandwidth concerns, etc.

@dtaht
Copy link

dtaht commented Nov 4, 2021

I'd really like to be able to construct a lag meter somehow.

@jech
Copy link
Member

jech commented Nov 4, 2021

@dtaht you already have one — receiver reports contain all the information needed to compute application-layer RTT. Have a look at /stats.html next time you use Galene, where all of that is exported.

@unicomp21 In my experience, the accounting libraries tend to use a set of global atomics to keep statistics, which causes cache-line bouncing in higly concurrent applications. Since Galene has been carefully tuned to scale well across multiple cores, that's something that makes me nervous. Please don't add dependencies on third-party accounting libraries in low-level code.

@dtaht
Copy link

dtaht commented Nov 5, 2021

@jech on-screen, much like you can see a lagmeter in many games. If the data is there, can the javascript pull it out? galene Push it to each client? (e.g. see the lag others are experiencing). Example:

https://www.earthli.com/quake/lagometer.php

https://en.wikipedia.org/wiki/Lagometer

@jech
Copy link
Member

jech commented Nov 5, 2021

That's pretty much off-topic for this thread, perhaps you'll want to open an issue in Galene or raise the issue on Galene's mailing list.

@dtaht
Copy link

dtaht commented Nov 5, 2021

agreed. Apologies.

@thiblahute
Copy link

You should maybe update the Todo list from the issue description now that gcc has landed? :-)

@Sean-Der
Copy link
Member Author

Sorry I missed this @thiblahute. Updated the tracking issue

Are you using the GCC implementation at all? Any feedback or questions I would always love to talk about it!

@thiblahute
Copy link

@Sean-Der I am not using it no, I just read your implementation while implementing GCC for GStreamer: https://gstreamer.freedesktop.org/documentation/rsrtp/rtpgccbwe.html?gi-language=c#rtpgccbwe

@alexpokotilo
Copy link

First of all I want to thanks everybody involved in GCC implementation! Thanks for very useful comments in this thread either.

I faced with a problem during my tests and I'm seeking a bug in GCC implementation to report it appropriately, but as GCC code is full of math let me explain what I encountered and may be @mengelbart or somebody else give me some advises to find this problem quickly.
Problem reproduced with fresh version of Firefox(120.0.1 (64-bit)) and Chrome(Version 120.0.6099.129 (Official Build) (64-bit)).

Unfortunately it's hard to just provide a test to reproduce this case. I'm sorry.
But there are some simple steps I followed to reproduce this problem.

c.InterceptorFactory, err = cc.NewInterceptor(func() (cc.BandwidthEstimator, error) {
	return gcc.NewSendSideBWE(
		gcc.SendSideBWEMinBitrate(300000),        // set loweset bitrate of abr streams
		gcc.SendSideBWEInitialBitrate(1000000),   // set initial bitrate of abr streams
		gcc.SendSideBWEMaxBitrate(2000000),       // set maximum bitrate of abr streams
		gcc.SendSideBWEPacer(gcc.NewNoOpPacer())) // I set NoOp pacer for simplicity sake
})

if err != nil {
	return nil, nil
}

i.Add(c.InterceptorFactory)
if err = webrtc.ConfigureTWCCHeaderExtensionSender(m, i); err != nil {
	return nil, nil
}

I send single bitrate stream of 750K. I don't change stream bitrate during the test
Once start playback I wait till estimator reached maximum bitrate

target bitrate 2000000
target stats {
"averageLoss": 2.253618279230226e-14,
"delayEstimate": -0.203,
"delayMeasurement": -0.065,
"delayTargetBitrate": 2000000,
"delayThreshold": 6,
"lossTargetBitrate": 2000000,
"state": "increase",
"usage": "normal"
}

Once Max bitrate reached by estimator I run following script

sudo tc qdisc del dev lo root netem delay 0ms
sudo tc qdisc add dev lo root netem delay 0ms

echo "start playback and wait till GCC estimator shows maximum bandwidth then press Enter"
read -n 1 x;

echo "delay 5ms"
sudo tc qdisc change dev lo root netem delay 5ms
sleep 9
echo "delay 1ms"
sudo tc qdisc change dev lo root netem delay 1ms
sleep 9
echo "delay 0ms"
sudo tc qdisc del dev lo root netem delay 0ms

make sure estimator still shows max bitrate and press Enter

During script execution estimated bandwidth drops to min bandwidth and sometimes even below(rarely).
The problem is that estimated bandwidth goes around minimum bandwidth once problem reproduced and doesn't reach max bitrate anymore.

I've attached observed stats after script execution.
stats.zip

Unfortunately problem is not 100% reproducible, but if you run this script several times, you will eventually face it.
I'm trying to find where is the bug or it's browser problem. I run my tests on localhost interface.
Any help appreciated

@dtaht
Copy link

dtaht commented Jan 7, 2024

Interesting. Part of my issue is in trusting netem to change anything without screwing up the link. can you also generate tc -s qdisc show stats? a few times after each change?

Are you observing any drops at all? Any queue depth at all?

As a completely opposite test of what you are trying to do, could you try this with cake, changing the bandwidth parameter? I trust cake to take the change command, and want to permute the gcc estimator to what cake presents as bandwidth and delay (e.g, if you change it to 500k, the delay will skyrocket, and loss start to happen), rather than just applying delay.

(btw, you can get rid of a del/add cycle with replace)

tc qdisc replace dev lo root cake bandwidth 2.5mbit #

tc qdisc change dev lo root cake bandwidth 500kbit # etc

@alexpokotilo
Copy link

@dtaht it's a pleasure to be in the same thread with the legend. I've prepared scripts and even run some but had to dig into
https://www.linkedin.com/pulse/explaining-schcakes-statistics-dave-taht to understand tc stats.
I'll prepare test results and get back. I just need some time to not look like a dummy

@alexpokotilo
Copy link

alexpokotilo commented Jan 12, 2024

@dtaht, I've checked netem and it doesn't hurt link.
If I run the same netem command(with the same delay value) several times nothing happens with BWE.
The problem I described reproduced only in case if I set new delay with big dispersion from original.
E.g if I set 200ms before I start test and then set delays ~ 200ms+-0.06*(200ms) everything works fine.
If I set delay to 0ms and then set delay to 5m the problem is reproduced immediately even though delay increased very little in absolute values.

I've checked loss_based_bwe.go. I both checked and debugged code as well as patched "updateLossEstimate" method and hardcoded "lossRatio" to always be "0". Problem reproduced with these patches. This means the problem is not in loss_based_bwe.go. I'll inspect delay based bwe.

My quess is that delay based bwe calculates delay "mean" and if current "dispersion" >= some X% of mean then this is a signal to decrease bandwidth. The problem is probably in fact that in case of initial 0ms delay in localhost interface then I set 5ms this is a huge dispersion from delay estimator POV even though absolute delay value is very little.
Most probably I will add simple condition that if the square of the standard deviation is less than 10ms we should just consider this fluctuation as white noise.
I'll get back once I check delay based bwe.

I've performed cake-based tests. They reproduced the problem the same way as netem based tests but netem is more simple to calculate.

@alexpokotilo
Copy link

Hi @dtaht,
I was able to reach @mengelbart and ask him to help.
He made a fix #221 and this fix helps in eliminating reset target bitrate to minBitrate.
I found two problems either:

  1. one with variance calculation in exponentialMovingAverage
  2. but much more important problem is that rateController never reset latestDecreaseRate but rfc states we need to reset it.
    Since we don't reset latestDecreaseRate our current bitrate could always fall into latestDecreaseRate condition in the future and we will reset target bitrate in "increase" state if our source bitrate is less that target bitrate and close to latestDecreaseRate.
    So we MUST reset latestDecreaseRate.
    We will work with @mengelbart to fix this properly.

Another problem is why we come to overuse state. But we will fix it a little later.

@alexpokotilo
Copy link

Hi all,
I'm proposing to discuss this #226 request as a start point. This version works fine to me and I am going to use it in production.
@dtaht if you know details about math around gcc BWE, you could really help me with review or maybe we can setup a call so you can guide me and check my results.
@mengelbart could you please review #226 and share your opinion too?
@Sean-Der you asked who is using this. I'm going to use this in production and if somebody can help me with review I can sponsor this job. I will all myself, just help to find somebody to review

@dtaht
Copy link

dtaht commented Jan 28, 2024

It has been 10+ years since I looked at gcc. My principal critique of the current codebase was that delay is more important than loss.

@dtaht
Copy link

dtaht commented Jan 28, 2024

I am otherwise kind of low on brain cells!

@alexpokotilo
Copy link

alexpokotilo commented Jan 28, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

7 participants