Rate calculator optimization #538

Merged
merged 10 commits into from
Apr 20, 2021

Conversation

vpalmisano
Contributor

@vpalmisano vpalmisano commented Mar 31, 2021

In the current implementation of the RateCalculator class, RemoveOldData has high CPU usage because it has to iterate over an array in which each item index represents a moment in time with a resolution of 1 ms. As a consequence, in most cases the array is sparse. This PR changes the calculator algorithm so that each sample is assigned to the next contiguous array position, which avoids long iterations when removing expired elements. The unit test has been modified accordingly.
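
To make the idea concrete, here is a minimal sketch of a rate calculator that stores samples contiguously in a ring buffer, so that expiring old data only visits slots that actually hold samples. It is an illustration under stated assumptions, not the code in this PR: the class name `SketchRateCalculator`, its members, and the choice to merge samples that share a millisecond are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch; not the mediasoup RateCalculator.
// Assumes maxItems > 0 and timestamps that never decrease.
class SketchRateCalculator
{
public:
    SketchRateCalculator(uint64_t windowMs, size_t maxItems)
      : windowMs(windowMs), items(maxItems)
    {
    }

    // Record `bytes` observed at time `nowMs`.
    void Update(size_t bytes, uint64_t nowMs)
    {
        RemoveOldData(nowMs);

        // Samples within the same millisecond share one slot.
        if (used > 0 && items[Newest()].timeMs == nowMs)
        {
            items[Newest()].bytes += bytes;
            totalBytes += bytes;
            return;
        }

        // Buffer full: evict the oldest slot, so the rate is underestimated.
        if (used == items.size())
            DropOldest();

        size_t slot = (oldest + used) % items.size();
        items[slot] = { bytes, nowMs };
        ++used;
        totalBytes += bytes;
    }

    // Bits per second over the window (nowMs - windowMs, nowMs].
    uint64_t GetRate(uint64_t nowMs)
    {
        RemoveOldData(nowMs);
        return totalBytes * 8 * 1000 / windowMs;
    }

private:
    struct Item
    {
        size_t bytes;
        uint64_t timeMs;
    };

    // Only expired slots are visited; empty milliseconds cost nothing.
    void RemoveOldData(uint64_t nowMs)
    {
        while (used > 0 && items[oldest].timeMs + windowMs <= nowMs)
            DropOldest();
    }

    void DropOldest()
    {
        totalBytes -= items[oldest].bytes;
        oldest = (oldest + 1) % items.size();
        --used;
    }

    size_t Newest() const
    {
        return (oldest + used - 1) % items.size();
    }

    uint64_t windowMs;
    std::vector<Item> items;
    size_t oldest{ 0 };
    size_t used{ 0 };
    uint64_t totalBytes{ 0 };
};
```

In this sketch the cost of expiring data is proportional to the number of expired samples rather than to the number of milliseconds elapsed, which is the property the description above is after.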

Test is needed!

  • no downsides
  • no bad corner cases
  • no bugs

Callgrind outputs:

  • v3:
    image

  • with optimization:
    image

@penguinol
Contributor

penguinol commented Apr 2, 2021

I've also tried to optimize the rate calculator by using a deque, see penguinol@009f78b, but the performance improvement is not as good as I expected.
The performance under -O3 is quite different from that under -O0.
In our test, the original rate calculator costs about 5% of CPU time under -O3.

@vpalmisano
Contributor Author

vpalmisano commented Apr 2, 2021

@penguinol right, I've run the same comparison with -O3:

  • v3: 1.45%
  • with my optimization: 0.67%

Anyway, RateCalculator::Update() is the mediasoup function with the greatest self CPU usage:
v3:
image

with my optimization:
image

@ibc ibc requested a review from jmillan April 2, 2021 10:48
@ibc
Member

ibc commented Apr 2, 2021

Thanks for this. How can we test this? And how can we test that there are no downsides? Are unit tests enough?

@jmillan can you please take a look at this?

@vpalmisano
Contributor Author

I have used these environment variables to trace the process:
MEDIASOUP_USE_VALGRIND=true MEDIASOUP_VALGRIND_OPTIONS='--tool=callgrind --dump-instr=yes --simulate-cache=yes'

Looking at the source code, the RateCalculator class is used only to estimate the recvTransmission and sendTransmission to be reported into the transport statistics. If this is true, I see no potential critical issues.

@ibc
Member

ibc commented Apr 2, 2021

Ok, let's merge next week (bank holidays here).

@jmillan
Member

jmillan commented Apr 13, 2021

We are currently using oldestTime and latestTime names. Same for indexes. This IMHO hurts readability.

I propose using: newestXXX & oldestXXX or firstXXX & lastXXX instead.

Personally I prefer newestXXX & oldestXXX. Being related to time, it makes more sense to me. What do you think @ibc, @vpalmisano?

RateCalculator(
  size_t windowSize = DefaultWindowSize,
  float scale = DefaultBpsScale,
  uint16_t windowItems = DefaultWindowItems)
Member

Is there any reason/case for windowItems value to be different than windowSize value?

I see no setter for it, and it takes the same value as windowSize in constructor, by default.

Member

I wonder the very same.

Contributor Author

@vpalmisano vpalmisano Apr 13, 2021

The windowItems value depends on both the rate evaluation window period (windowSize) and the events frequency.
For instance, if the measured event frequency is 1 every 2ms, you can set windowSize=1000 and windowItems=500.

Member

@jmillan jmillan Apr 15, 2021

windowSize is only used to get the scale factor and to retrieve the new oldestTimeIndex whereas windowItems is the range (from 0 to windowItems) where the items are really stored in the buffer.

Having a windowSize bigger than windowItems does not provide accurate results, I would say. I.e.:

windowSize = 1000
|-----------------------------------------------------|

windowItems = 500
|------------------------|

We are calculating the rate for a time window of 1 second, and only considering the data of the last 1/2 second, which is not accurate.

Due to that rationale I think windowItems should be removed. Any comment on this @vpalmisano?
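
The arithmetic behind this concern can be sketched as follows. This is a simplified model under assumptions, not mediasoup code: it supposes evenly spaced samples of equal size and a rate obtained by summing the stored bytes and scaling by the full windowSize. The function name `ReportedBps` and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model: bits per second reported when the buffer keeps at most
// `windowItems` samples but the sum is still scaled by `windowSizeMs`.
// Assumes one sample of `bytesPerSample` every `sampleIntervalMs` milliseconds.
uint64_t ReportedBps(
    uint64_t bytesPerSample, uint64_t sampleIntervalMs, uint64_t windowSizeMs, size_t windowItems)
{
    // Samples arriving during one window, capped by the buffer capacity.
    uint64_t arriving = windowSizeMs / sampleIntervalMs;
    uint64_t stored   = arriving < windowItems ? arriving : windowItems;

    return stored * bytesPerSample * 8 * 1000 / windowSizeMs;
}
```

Under this model, 125-byte samples arriving every millisecond (a true rate of 1,000,000 bps) are reported as 500,000 bps with windowSize = 1000 and windowItems = 500, and as the full 1,000,000 bps with windowItems = 1000.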

Contributor Author

@vpalmisano vpalmisano Apr 15, 2021

The idea is to use different variables to represent the maximum window size expressed as a duration and as a number of items, without mixing the two things.
The windowSize is a duration (in milliseconds, as in the windowSize=1000 example above), while windowItems is the maximum number of items that can be stored at the same time in the time window, so in your example you must also take into account the frequency of the data samples.
If you receive 1 sample per millisecond, you will fill the 500-item array with only 1/2 second of data.
Instead, if the data sample frequency is 1 every 2 milliseconds, a 500-item array will correspond to 1 second of data.
I can perform some tests to evaluate whether the window is filled or not when changing the windowItems value.
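
The relationship between the two parameters described in this comment can be written down as a small helper. This is only a sketch assuming evenly spaced events; the function names `RequiredWindowItems` and `CoveredMs` are hypothetical and do not exist in mediasoup.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: slots needed to hold one full window when one event
// arrives every `eventIntervalMs` milliseconds (rounded up).
constexpr size_t RequiredWindowItems(uint64_t windowSizeMs, uint64_t eventIntervalMs)
{
    return static_cast<size_t>((windowSizeMs + eventIntervalMs - 1) / eventIntervalMs);
}

// Hypothetical helper: time span covered by a full buffer of `windowItems`
// slots at the same event spacing.
constexpr uint64_t CoveredMs(size_t windowItems, uint64_t eventIntervalMs)
{
    return windowItems * eventIntervalMs;
}

// The two cases from the comment above.
static_assert(RequiredWindowItems(1000, 2) == 500, "1 event every 2 ms needs 500 items");
static_assert(CoveredMs(500, 1) == 500, "1 event per ms fills 500 items in half a second");
```

When the events arrive faster than the spacing assumed for windowItems, the buffer covers less than windowSize, which is the inaccuracy discussed earlier in this thread.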

Contributor Author

@jmillan right, the problem was not there before this PR. The drawback of the old approach is that it is using a number of items == window size in ms and in some cases this size could be equal to 2500 or 6000 packets.

Can we re-open this PR or should I open a new one with the latest fixes?

Member

PRs cannot be reopened AFAIK so create a new one, please.

Contributor Author

@notorca thanks for reporting the bug and comments.
Fixed now in: #547

Nice, I will test it soon.

@vpalmisano Works good for me.

@ibc
Member

ibc commented Apr 13, 2021

Personally I prefer newestXXX & oldestXXX. Being related to time, it makes more sense to me.

Agreed

@vpalmisano
Contributor Author

vpalmisano commented Apr 13, 2021

We are currently using oldestTime and latestTime names. Same for indexes. This IMHO hurts readability.

I propose using: newestXXX & oldestXXX or firstXXX & lastXXX instead.

Personally I prefer newestXXX & oldestXXX. Being related to time, it makes more sense to me. What do you think @ibc, @vpalmisano?

Or maybe we can use oldestTime, latestTime and oldestTimeIndex, latestTimeIndex.

@vpalmisano
Contributor Author

For the same reason it will be better to rename windowSize -> windowDuration.

@jmillan
Member

jmillan commented Apr 13, 2021

Or maybe we can use oldestTime, latestTime and oldestTimeIndex, latestTimeIndex.

The thing is that oldest and latest are not antonyms. oldest and newest are, first and last are antonyms too. Being antonyms makes it easier to understand and relate each other.

@jmillan
Member

jmillan commented Apr 13, 2021

Wrapping up with your proposal @vpalmisano:

oldestTime, newestTime and oldestTimeIndex, newestTimeIndex would make it.

@vpalmisano
Contributor Author

Wrapping up with your proposal @vpalmisano:

oldestTime, newestTime and oldestTimeIndex, newestTimeIndex would make it.

Done with the latest commit.

@ibc
Member

ibc commented Apr 19, 2021

Are we ready to merge this? or is there something missing?

@vpalmisano
Contributor Author

I've replaced some variable names and added a warning log in the case the calculation buffer is full. I think there are no other changes to be done. @jmillan ?

@jmillan
Member

jmillan commented Apr 20, 2021

LGTM, thanks @vpalmisano!

vpalmisano and others added 2 commits April 20, 2021 10:30
@ibc
Member

ibc commented Apr 20, 2021

Merging!

@ibc ibc merged commit 56068f8 into versatica:v3 Apr 20, 2021
@ibc
Member

ibc commented Apr 20, 2021

Just a thing:

  CXX(target) /Users/ibc/src/v3-mediasoup/worker/out/Release/obj.target/mediasoup-worker-test/src/RTC/RtpObserver.o
../src/RTC/RateCalculator.cpp:33:6: warning: format specifies type 'unsigned long long' but the
      argument has type 'size_t' (aka 'unsigned long') [-Wformat]
                          this->windowSize,
                          ^~~~~~~~~~~~~~~~~
../include/Logger.hpp:435:22: note: expanded from macro 'MS_WARN_TAG'
        #define MS_WARN_TAG MS_WARN_TAG_STD
                            ^
../include/Logger.hpp:227:90: note: expanded from macro 'MS_WARN_TAG_STD'
  ..._MS_LOG_STR_DESC desc _MS_LOG_SEPARATOR_CHAR_STD, _MS_LOG_ARG, ##__VA_ARGS__); \
                      ~~~~                                            ^~~~~~~~~~~
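
For context, this class of warning is usually resolved by using the `%zu` conversion for `size_t` arguments. The sketch below shows that with plain `std::snprintf`; the function `FormatBufferFull` is a hypothetical stand-in for the logging call, and this is not necessarily the fix that was applied in master.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the warning's message formatting; not MS_WARN_TAG.
std::string FormatBufferFull(size_t windowSize, uint16_t windowItems)
{
    char buffer[128];

    // %zu matches size_t on every platform, whereas %llu only matches
    // where size_t happens to be unsigned long long.
    std::snprintf(
        buffer,
        sizeof(buffer),
        "calculation buffer full, windowSize:%zu ms windowItems:%u",
        windowSize,
        static_cast<unsigned>(windowItems));

    return std::string(buffer);
}
```

An alternative with the same effect is to cast the argument to `unsigned long long` and keep `%llu`, at the cost of a cast at every call site.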

@ibc
Member

ibc commented Apr 20, 2021

fixing in master

@MrSurana

MrSurana commented Apr 29, 2021

Too many buffer full warnings after this commit.

2021-04-29T09:58:02.592Z mediasoup:WARN:Channel [pid:29676] RTC::RateCalculator::Update() | calculation buffer full, windowSize:2500 ms windowItems:1000

@ibc
Copy link
Member

ibc commented Apr 29, 2021

Too many buffer full warnings after this commit.

2021-04-29T09:58:02.592Z mediasoup:WARN:Channel [pid:29676] RTC::RateCalculator::Update() | calculation buffer full, windowSize:2500 ms windowItems:1000

#547

6 participants