Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement and use MultiArrayBuffer #44980

Merged
merged 13 commits into from
Jan 6, 2021

Conversation

geoffkizer
Copy link
Contributor

@geoffkizer geoffkizer commented Nov 20, 2020

MultiArrayBuffer is a struct that manages a sliding buffer, where bytes be added at the end and removed at the beginning, similar to existing ArrayBuffer. Unlike ArrayBuffer, it manages a list of fixed-size (16K) blocks internally.

This is most useful for scenarios where the buffer may grow large, and thus managing it as a single array is probably not efficient.

MultiArrayBuffer is now used in the HTTP2 response buffer code, as well as in StreamBuffer (used by ConnectedStreams and mock QUIC provider).

(Edit: This got moved to a separate PR, now merged) Also, change stream conformance tests to not use Assert.Equals for buffer comparsion, as it is very, very slow. This reduces the time of outerloop StreamConformanceTests by approximately 99%.

Edit: Also add AssertExtensions.SequenceEqual to provide more useful diff information when sequence comparison fails, and use it in stream conformance tests.

@stephentoub @dotnet/ncl

@geoffkizer geoffkizer added area-System.Net tenet-performance Performance related issue labels Nov 20, 2020
@geoffkizer geoffkizer added this to the 6.0.0 milestone Nov 20, 2020
@ghost
Copy link

ghost commented Nov 20, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

MultiArrayBuffer is a struct that manages a sliding buffer, where bytes be added at the end and removed at the beginning, similar to existing ArrayBuffer. Unlike ArrayBuffer, it manages a list of fixed-size (16K) blocks internally.

This is most useful for scenarios where the buffer may grow large, and thus managing it as a single array is probably not efficient.

MultiArrayBuffer is now used in the HTTP2 response buffer code, as well as in StreamBuffer (used by ConnectedStreams and mock QUIC provider).

Also, change stream conformance tests to not use Assert.Equals for buffer comparsion, as it is very, very slow. This reduces the time of outerloop StreamConformanceTests by approximately 99%.

@stephentoub @dotnet/ncl

Author: geoffkizer
Assignees: -
Labels:

area-System.Net, tenet-performance

Milestone: 6.0.0

@geoffkizer
Copy link
Contributor Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@gfoidl gfoidl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the conditions with potential for overflow tests should be added? Or at least -- as it's internal -- asserts for them.

Comment on lines 41 to 45
_blocks = null;
_blockCount = 0;
_activeStart = 0;
_availableStart = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the default-ctor or this = default produces better zeroing code.
Don't know if it's worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I could add a factory method here that does the check, and change callers to use it. Let me look at it...

{
get
{
if (index < 0 || index >= _length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (index < 0 || index >= _length)
if ((uint)index >= (uint)_length)

Saves one cmp.

Debug.Assert(byteCount >= 0);
Debug.Assert(byteCount <= ActiveMemory.Length, $"MultiArrayBuffer.Discard: Expected byteCount={byteCount} <= {ActiveMemory.Length}");

int oldStartBlock = _activeStart / BlockSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT produces optimized division for this. Casting to uint, then the JIT emits bit-shifts.

Suggested change
int oldStartBlock = _activeStart / BlockSize;
int oldStartBlock = (int)((uint)_activeStart / BlockSize);

But code looks uglier.

The same on other places where / BlockSize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forgot that it will only optimize to a bit-shift if it's uint. I should probably change the fields here to be uint.

}

private static int GetBlockIndex(int offset) => offset / BlockSize;
private static int GetOffsetInBlock(int offset) => offset % BlockSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast to uint produces the bit-hack modulo by using and instead of the optimized division.

Suggested change
private static int GetOffsetInBlock(int offset) => offset % BlockSize;
private static int GetOffsetInBlock(int offset) => (int)((uint)offset % BlockSize);

src/libraries/Common/src/System/Net/MultiArrayBuffer.cs Outdated Show resolved Hide resolved

public MultiMemory Slice(int start)
{
if (start < 0 || start > _length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (start < 0 || start > _length)
if ((uint)start > (uint)_length)


public MultiMemory Slice(int start, int length)
{
if (start < 0 || length < 0 || start + length > _length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start + length can overflow.

Suggested change
if (start < 0 || length < 0 || start + length > _length)
if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))

Or tho eke out more perf, one could (on 64-bit) bring it into ulong-domain and compare over there. But I don't think it's worth it.

code
if (Environment.Is64BitProcess)
{
    if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length)
    {
        throw new IndexOutOfRangeException();
    }
}
else
{
    if ((uint)start > (uint)_length || (uint)length > (uint)(_length - start))
    {
        throw new IndexOutOfRangeException();
    }
}

@geoffkizer
Copy link
Contributor Author

Lots of stream conformance failures... I didn't see these locally. Will investigate.

I'm going to factor out the span compare change into a separate PR, so we can get that in and confirm that's not an issue here.

@geoffkizer
Copy link
Contributor Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@En3Tho
Copy link
Contributor

En3Tho commented Nov 20, 2020

@stephentoub Just a side question - I see more and more API's using ArrayPool.Shared. And my question is: aren't we going to get pool drained quickly this way (considering user code too)? Or are there any changes to Shared pool implementation? Afaik this pool is not endless if I may call it like that

@stephentoub
Copy link
Member

I use the term "drained", but it's actually a bit misleading, in that it suggests there's a fixed number of buffers that are available in the pool and if they're gone they're gone and the pool is no longer effective... that's not actually how it works. Today each size bucket in the pool has a max number of buffers that it can store, but they're not pre-generated: if you go to the pool and there isn't a buffer available, it manufactures a new one for you. It's on returning a buffer to the pool that it might get thrown away if the relevant bucket has already reached its threshold. And this is important because it means you have can way, way more buffers of a given size "rented" at any one time than the bucket's threshold suggests, and as long as they're not all returned to live in the pool at the same time, many of them won't be thrown away: the pool ends up just being a brief hand-off point between one component adding a buffer and another one coming along and taking one. Strange analogy, but kind of like a bathroom at a restaurant: you don't need one bathroom per customer, rather you can be feeding many, many more customers than you have bathrooms, and you only need as many bathrooms as you want to enable customers to use them at the same time.

Which is to say, yes, it's possible that as more and more components use the ArrayPool we'll need to update its thresholds and behaviors to accomodate the demand, but we've not yet seen it to be a problem. #12800 tracks revisiting some of this.

@En3Tho
Copy link
Contributor

En3Tho commented Nov 21, 2020

@stephentoub I see. Thanks for the detailed explanation. I've just been concerned that at some point excessive use of array pool might become an "expensive new" in unexpected ways.

@geoffkizer
Copy link
Contributor Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@geoffkizer
Copy link
Contributor Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@geoffkizer
Copy link
Contributor Author

geoffkizer commented Dec 15, 2020

@dotnet/ncl Can someone review?

// Unlike ArrayBuffer, the buffer itself is managed using 16K blocks which are added/removed to the block list as necessary.

// [ActiveBuffer] contains the current buffer contents; these bytes will be preserved on any call to TryEnsureAvailableBytesUpToLimit.
// [AvailableBuffer] contains the available bytes past the end of the current content,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: rather than using braces like [ActiveBuffer], how about using single quotes or backticks, e.g. 'ActiveBuffer'? The braces just immediately make me think we're talking about attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

_allocatedEnd = 0;
_activeStart = 0;
_availableStart = 0;
}
Copy link
Member

@stephentoub stephentoub Jan 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it'd be a bit more efficient (and simpler) to do:

public MultiArrayBuffer(int initialBufferSize) : this()
{
    // 'initialBufferSize' is ignored for now. Some callers are passing useful info here that we might want to act on in the future.
    Debug.Assert(initialBufferSize >= 0);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
else
{
// We can shift the array down to make enough space
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider making the byte[][] circular rather than shifting like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems like a potentially good optimization. I elected not to do it for now.

{
uint ustart = (uint)start;
uint ulength = (uint)length;
if (ustart > _length || ustart + ulength > _length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the first check needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, good catch

Copy link
Contributor Author

@geoffkizer geoffkizer Jan 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually, I remember why I did this.

We aren't explicitly checking here if start >= 0, but if it is, this will get caught in the ustart > _length test. EDIT: To be clear, this is because _length is always less than or equal to int.MaxValue.

Unfortunately the second check here isn't quite right, since it doesn't catch a case like start = 5, length = -1, I think it should instead be ulength > _length - ustart and that should correctly catch all invalid cases.

This is also why we don't have asserts here, as you asked about above.

Note also there are asserts in the MultiMemory constructor, so if we miss something here it will get asserted there.

}

public MultiMemory Slice(int start)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert(start >= 0);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above

return new MultiMemory(_blocks, _start + ustart, _length - ustart);
}

public MultiMemory Slice(int start, int length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert(start >= 0 && length >= 0);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments above

throw new ArgumentOutOfRangeException(nameof(destination));
}

for (int blockIndex = 0; blockIndex < BlockCount; blockIndex++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockCount looks like it does some computation. Store it into a local rather than recomputing it for each iteration?

throw new ArgumentOutOfRangeException(nameof(source));
}

for (int blockIndex = 0; blockIndex < BlockCount; blockIndex++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

@geoffkizer
Copy link
Contributor Author

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@geoffkizer geoffkizer merged commit d448461 into dotnet:master Jan 6, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants