-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MEDIA-3445: Reduce memory consumption for audio only conferences #401
base: master
Are you sure you want to change the base?
Conversation
const uint32_t _entriesPerCacheline; | ||
const uint32_t _capacity; | ||
const size_t _blockSize; | ||
alignas(64) std::atomic<VersionedIndex> _readCursor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace the explicit line sepatator fields and forced the alignment on 64 bits. This is a bit more resilient as before we couldn't have sure if readCursor
was not sharing the cache line with another atomic members in a different object. alignas(64)
force it to be the first address on the cache line
concurrency/MpmcQueue.h
Outdated
// Special Cell state for when queue has zero size | ||
// Null Entry is not written and it is only read it when queue is zero size. So it can live in same cache line of | ||
// _readCursor without performance impact | ||
const EntrySate _nullEntry = CellState::State::nullSlot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the same cache line of _readCursor to save a little of memory. This value is unchangeable and it only read when _readCursor is also unchangeable (ie, when the capacity is zero)
concurrency/MpmcQueue.h
Outdated
alignas(64) const uint32_t _capacity; | ||
Entry* const _elements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced that separating this 2 fields from _writeCursor
cache line has a significant impact on the performance as they are read only fields (so they don't cause cache trashing), maybe they can suffer from cache trashing when they are being read but as they are const fileds, I think compiler has ways to optimize it (or even we could to store them on the stack memory on the beginning of each method that needs them).
Anyway, I forced 64 bytes alignment to keep the same idea of _cacheLineSeparator2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contention will be on readcursor and write cursor, those must be separated.
I assume it will invalidate the cacheline if it has been modified but I think that only is valid for atomic variables. If you read a variable that is not atomic like _capacity or _elements, you get whatever value is cached. CPU has not obligation to provide atomic access because you had an atomic in the cache line.
I think the cacheLineSeparator2 was intended to separate _writeCursor from atomics after the MpmcQueue. Otherwise that could cause contention with other unknown object containing atomics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make the comments more concise. (really short and to the point). They take space and make it harder to read the real code.
concurrency/MpmcQueue.h
Outdated
// but forcing a new cache line also ensures tha nothing is placed in this cache line after | ||
// MpmcQueue boundary (which we can predict if can affect performance | ||
// of _writeCursor) | ||
alignas(64) const uint32_t _capacity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought this through, this one should not be aligned. It is not read/written atomically and the value will be from whatever is in cache and it will not stall cpu pipeline to read it (which is ok as it is const).
It is important that writecursor is protected from atomic operations in the next cache line, but that should be resolved by all other objects doing atomic operations align their first atomic member variable. Preferably having the atomic one first to not waste memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this alignas. Then _nullEntry
, _capacity
and _elements
are in the same cache line of _writeCursor to now waste space.
the _writecursor
is protected from other atomic variables outside of this object because the MpmcQueue
as an alignment requirement of 64 and the size needs to be a multiple of 64, so the size of MpmcQueue
is 128. Although there is no standardization on binary level, I think compilers don't have another option than forcing to be a multiple of aligment due to arrays and pointers iterators (Example, if we have a MpmcQueue[N], accessing the memory position should index * sizeof(MpmcQueue)
a and that needs always be a 64 bytes aligned. Same thing for when you iterating a MpmcQueue*
, then a padding needs to added and that ensures no other variables are added to the cache line outside of this object)
ab7ac4a
to
87c598b
Compare
c23a6f0
to
5aafa55
Compare
const bool canDoVideo = hasVideoEnabled && !this->hasVideoDisabled(); | ||
const bool enableUplinkEstimation = canDoVideo; | ||
const size_t expectedInboundStreamCount = canDoVideo ? 16 : 8; | ||
const size_t expectedOutboundStreamCount = canDoVideo ? 256 : 16; | ||
const size_t jobQueueSize = canDoVideo ? 4096 : 1024; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better adjustment of transport memory allocation
The goal of this change is save memory for audio only conferences. On C9 SFU we experienced a problem where sufs runs out of memory with low seats reservation. SMB allocates a lot of memory for each conference and when we have a lot of small conference (low number of participants) the endpoints we can have in a single sfu reduce drastically due to the quantity of memory each conference pre allocated.
This PR tries to save around 1MB allocation per each conference when we know they are audio only (it's always the case for C9) Although for C9 video is never necessary, we decided to make it configurable through the API when the conference is allocated so allow a better memory adjustment when we have mixed scenarios (like we have in Symphony where the VoiceDirect rooms are audio only and maybe the future integration with WhatsApp calls will be also audio only).
If an endpoint tries to allocated video streams, the allocation does not fail. Instead, the video is ignored and the offer does not contains any information about video, only one warning is logged saying that endpoint tried to allocate video in an audio only conference.
Notes
We had to add support to zero capacity on
MpmcHashmap
andMpmcQueue
. Now, we can say we want it with zero capacity and no dynamic memory is allocatedMemory allocated by each EngineMixer before this PR
Memory allocated by each EngineMixer after PR (for audio only). Diff: -6.6MB
Memory allocated by each Transport before this PR
Memory allocated by each Transport after PR (for audio only). Diff: -1MB