-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StreamReader sync/async optimizations: ArrayPool, IndexOfAny, ValueStringBuilder #62552
Conversation
Tagging subscribers to this area: @dotnet/area-system-io Issue Detailsnull
|
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
First commit:
|
Second commit.
|
Third commit.
|
@stephentoub @adamsitnik moved your suggestions to here. |
Thanks. My use of StringBuilder in the other PR comments was mostly for a quick hack to exemplify the wins. I think we should look here at instead using char arrays from ArrayPool. |
Wouldn't be there a problem when there's only one line and we return and rent new char arrays other and other until we hit EOF? |
8c20efb
to
fb6ccb3
Compare
fb6ccb3
to
e2e318f
Compare
The first and the second cases do need reallocating/resizing pooled array.
|
Why is that a problem? |
I meant reallocating and resizing the polled arrays wouldn't the cost be higher? |
There would be some cost, but it should also be rare. We can afford to rent a larger char[] initially than we can proactively allocate a StringBuilder, and by renting/returning, we save the cost of creating a new StringBuilder for each StreamReader, plus avoid increasing the size of the StreamReader itself. We also avoid holding on to a potentially large object for the lifetime of the StreamReader and instead allow other code in the process to use that same memory, or potentially for it to be reclaimed. |
@stephentoub Added array pools as a draft and they do look better. I've picked 80 as default increase factor, probably gonna change to something different. I assume we could redo passing the arrays and copying them as spans, and change Array.Copy to Buffer.Copy but I'm not sure. Also gonna run unit-tests locally. Feel like I broke something.
|
Can't really run the tests I hope the CI would spot an error if there is one (I hope no errors) :( #62586 (created an issue) Anyway the benchmarks are here so feel free to comment. I'm gonna play with the Spans until any other comments/suggestions. |
Fixed the tests. And figured out how to run them locally (wrote about it in the issue and closed it). |
Benchmarks after fixing the tests.
|
@danmoseley removed readtoend changes. |
src/libraries/System.Private.CoreLib/src/System/IO/StreamReader.cs
Outdated
Show resolved
Hide resolved
Can you update the benchmark now -- I don't see ReadLine results. We care about ReadLine and ReadLineAsync benchmarks, I would expect the others to be unchanged (of course we should verify) Some nice improvements in the ReadLineAsync numbers though. I'm guessing the short line cases are slower because IndexOfAny(..) is slower than a trivial loop when the hit is very early in the buffer. Something interesting I noticed is that the perf characteristics will be slightly different on Windows vs Linux/Mac (that you have). That's because the perf tests use the platform line ending: You might want to locally edit that line to use @stephentoub any concerns about the changes here? The local methods seem ugly, but as discussed, we seem to be missing a reusable type here for this case. |
@danmoseley added NewLine to the params ( BenchmarkDotNet=v0.13.1.1786-nightly, OS=macOS Monterey 12.3.1 (21E258) [Darwin 21.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.3.22179.4
[Host] : .NET 7.0.0 (7.0.22.17504), X64 RyuJIT
Job-HKXHTD : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
Job-RJNYWV : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
|
We could fall-back to initial while loop if the line is short. We need to determine which is short tho. Something around 5-16 chars. |
How do you know how long the line is without searching in the first place? It’s just an assumption we have to make that files are mostly content not newline characters. |
Sorry, I meant running the search loop for 10 chars and then switching back to index of any |
@danmoseley that's what I meant. My naive solution (didn't push it) but what I don't understand why it's allocating 74 kb like that. public override string? ReadLine()
{
ThrowIfDisposed();
CheckAsyncTaskInProgress();
if (_charPos == _charLen)
{
if (ReadBuffer() == 0)
{
return null;
}
}
// fast path
(int indexStart, int length) = (_charPos, _charLen - _charPos);
int indexOfNewLine = -1;
int fastPathMaxLength = length % 9;
Span<char> span = _charBuffer.AsSpan(indexStart, fastPathMaxLength);
int index = fastPathMaxLength - 1;
while (index > -1)
{
char @char = span[index];
if (@char is '\r' or '\n')
{
indexOfNewLine = index;
break;
}
index--;
}
if (indexOfNewLine != -1)
{
string s = new string(_charBuffer, indexStart, indexOfNewLine);
char ch = _charBuffer[_charPos + indexOfNewLine];
_charPos += indexOfNewLine + 1;
// if it is two chars for the new-line character then inc the _charPos
if (ch == '\r' && (_charPos < _charLen || ReadBuffer() > 0))
{
if (_charBuffer[_charPos] == '\n')
{
_charPos++;
}
}
return s;
}
using ValueStringBuilder sb = new(stackalloc char[256]);
do
{
// Note the following common line feed chars:
// \n - UNIX/Mac \r\n - Windows
(indexStart, length) = (_charPos, _charLen - _charPos);
indexOfNewLine = _charBuffer.AsSpan(indexStart, length).IndexOfAny('\r', '\n');
if (indexOfNewLine >= 0)
{
string s;
if (sb.Length > 0)
{
sb.Append(_charBuffer.AsSpan(indexStart, indexOfNewLine));
s = sb.ToString();
}
else
{
s = new string(_charBuffer, indexStart, indexOfNewLine);
}
char ch = _charBuffer[_charPos + indexOfNewLine];
_charPos += indexOfNewLine + 1;
// if it is two chars for the new-line character then inc the _charPos
if (ch == '\r' && (_charPos < _charLen || ReadBuffer() > 0))
{
if (_charBuffer[_charPos] == '\n')
{
_charPos++;
}
}
return s;
}
int totalRead = _charLen - _charPos;
sb.Append(_charBuffer.AsSpan(_charPos, totalRead));
} while (ReadBuffer() > 0);
return sb.ToString();
} BenchmarkDotNet=v0.13.1.1786-nightly, OS=macOS Monterey 12.3.1 (21E258) [Darwin 21.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.3.22179.4
[Host] : .NET 7.0.0 (7.0.22.17504), X64 RyuJIT
Job-QWLAYE : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
Job-PVIZJQ : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=20
MinIterationCount=15 WarmupCount=1
|
I don't think you can have a faster path here without making shaky assumptions. I think just IndexOfAny is good. |
@danmoseley Do we need something else in this PR or we still waiting on something? (I don't know what to change) |
@danmoseley @adamsitnik any update on the PR? |
oops, overlooked this one. @stephentoub do you have any remaining concerns about the approach (I do think we need to stop returning to pool on array) |
This is going to conflict with #69888. @GrabYourPitchforks' change is fixing some functional issues in addition to perf optimizations. I know this PR has been open longer, but I'd like to land @GrabYourPitchforks' change first and then revisit this. |
I need to look into #69888 to merge it correctly then, I guess. Tag me, please, when it's going to be merged? |
Any updates? @danmoseley |
@Trapov unfortunately #69888 was paused for a while as @GrabYourPitchforks was working on something else. That's been picked up again, although I see there's test failures. So unfortunately, this remains waiting.. |
No problem, just tag me please when it's going to be merged. @danmoseley |
@Trapov, thanks for your patience and for working on this. #69888 is about to be merged, but at this point I think it also covers most or all of what you were trying to accomplish in this PR. If there's anything additional to be done, once that's merged you can rebase and filter this down to just the incremental portions. Otherwise, we can probably close it. Again, thank you. |
@stephentoub it covers mostly everything as you said. so probably better to close this PR. |
Ok. Thanks very much, @Trapov. |
No description provided.