-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HTTP/3] Stress hack for msquic dropping connections #84793
[HTTP/3] Stress hack for msquic dropping connections #84793
Conversation
…m dropping connections
Tagging subscribers to this area: @dotnet/ncl |
/azp run runtime-libraries stress-http |
Azure Pipelines successfully started running 1 pipeline(s). |
<Compile Include="..\..\..\..\Common\src\System\Net\Logging\NetEventSource.Common.cs" Link="Common\System\Net\Logging\NetEventSource.Common.cs" /> | ||
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Internal\MsQuicApi.cs" Link="ProductionCode\System\Net\Quic\Internal\MsQuicApi.cs" /> | ||
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Internal\MsQuicSafeHandle.cs" Link="ProductionCode\System\Net\Quic\Internal\MsQuicSafeHandle.cs" /> | ||
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\Interop\*.cs" Link="ProductionCode\System\Net\Quic\Interop\*.cs" /> | ||
<Compile Include="..\..\..\..\System.Net.Quic\src\System\Net\Quic\NetEventSource.Quic.cs" Link="ProductionCode\System\Net\Quic\NetEventSource.Quic.cs" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A refactor of these internals may break stress builds without us noticing, but I guess there is no good way to deal with this. We should remember that HttpStress
took a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: these files are missing from the docker context on Windows, which is why the Windows build failed. Copying them would make this solution even more fragile. Can we implement the worker delay tuning hack with reflection instead and share that implementation between System.Net.Quic.Functional.Tests
and the stress tests to make sure we notice if it's broken? That would also remove the need of including anything but QuicDefaults.cs
in System.Net.Quic.Functional.Tests.csproj
(apart from the utility containing the hack).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflection was one of the options we discussed with @CarnaViire when implementing this in functional tests. I was able to workaround it in functional tests with source code inclusion, which is normal for corresponding tests. However, here it proves problematic.
I could re-work this to create an internal function on MsQuicApi
which we would invoke via reflection from both functional tests as well as stress.
/azp run runtime-libraries stress-http |
Azure Pipelines successfully started running 1 pipeline(s). |
// Do not change the name and signature without looking for textual occurrences! | ||
// This method is invoked via reflection from QUIC functional and HTTP stress tests. | ||
private static (bool, string) SetUpForTests() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely the easiest option to solve this problem, but I thought including test-only methods in production assemblies is no-go. Do we have any existing precedent in the BCL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought about that, but I found this for example:
runtime/src/libraries/Common/src/System/Net/Http/aspnetcore/Http2/Hpack/HPackDecoder.cs
Lines 109 to 110 in 2d833f4
// For testing. | |
internal HPackDecoder(int maxDynamicTableSize, int maxHeadersLength, DynamicTable dynamicTable) |
I can always go back to code sharing though and try to minimize the number of shared files, either with some some reflection, strategically placed #if
and/or code stubs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The HPack ctr is not "dead code" otherwise, because it's also being used by the public constructor above. For me having a private, test-only method is a good tradeoff, considering the maintenance burden that comes with other options, but we should stick to BCL guidance. @stephentoub is it ok to expose such methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combo of shared code + reflection is there. @antonfirsov could you look into copying the appropriate files in the docker container please?
src/libraries/System.Net.Quic/tests/FunctionalTests/QuicListenerTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Net.Quic/src/System/Net/Quic/Internal/MsQuicApi.cs
Outdated
Show resolved
Hide resolved
f85d1af
to
8a080e3
Compare
8a080e3
to
fa5820e
Compare
@@ -1,5 +1,5 @@ | |||
# Builds and copies library artifacts into target dotnet sdk image | |||
ARG BUILD_BASE_IMAGE=mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7-f39df28-20191023143754 | |||
ARG BUILD_BASE_IMAGE=mcr.microsoft.com/dotnet-buildtools/prereqs:centos-7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8.0 will not support Centos 7 AFAIK (hurray!) we should update this at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stream9??? Or should we swap for different distro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we are running stress tests on Centos and not ubuntu, unlike enterprise tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the centos-7
images have build tools and we use them for product. I don't know if that is needed.
I think we should pick whatever is easiest for us to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm switching to centos-stream8
in #85342, which seems to work.
@@ -44,14 +44,20 @@ public abstract class QuicTestBase : IDisposable | |||
|
|||
static unsafe QuicTestBase() | |||
{ | |||
Console.WriteLine($"MsQuic {(IsSupported ? "supported" : "not supported")} and using '{MsQuicApi.MsQuicLibraryVersion}'."); | |||
// If any of the reflection bellow breaks due to changes in "System.Net.Quic.MsQuicApi", also check and fix HttpStress project as it uses the same hack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this should go somewhere to Common/test/....
so we can avoid duplication.
It may be worth of adding explanation as this is unusual craft.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should keep in mind that whatever utility helper we add, its' file has to be also copied to the container, so avoiding duplication has a higher cost in complexity here than normally. At the moment I'm undecided if it's worth it, would prefer to do the docker copying work mentioned in #84793 (comment) first, planning to jump to it tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm holding back. Let me know when/if I should look into it.
@@ -158,6 +162,9 @@ private static async Task<ExitCode> Run(Configuration config) | |||
|
|||
string GetAssemblyInfo(Assembly assembly) => $"{assembly.Location}, modified {new FileInfo(assembly.Location).LastWriteTime}"; | |||
|
|||
Type msQuicApiType = typeof(QuicConnection).Assembly.GetType("System.Net.Quic.MsQuicApi"); | |||
string msQuicLibraryVersion = (string)msQuicApiType.GetProperty("MsQuicLibraryVersion", BindingFlags.NonPublic | BindingFlags.Static).GetGetMethod(true).Invoke(null, Array.Empty<object?>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ManickaP unfortunately, this line fails with NullReferenceException
in release builds. I assume the MsQuicLibraryVersion
is being trimmed away, since it's unused in the production code.
We could workaround this by moving the version extraction to a shared test-utility (together with the MaxWorkerQueue hack helper), but IMHO it's not worth it unless we think that it's super-important to know the msquic version for stress test troubleshooting.
For now I'm just removing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that there is image without MsQuic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? If you are suspecting lack of msquic as the root cause of this error - it's not the case. Removing code messing with MsQuicLibraryVersion makes things work (= HTTP/3 stress runs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that there is image without MsQuic?
In that case you'd get "unknown" as the MsQuic version, the code expects that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to tell the trimmer to keep it there via DynamicDependency
. And this is just one string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should try to clarify whether test-only usage of DynamicDependency
is ok, without that, I prefer to defensively assume it's not.
… avoid the property being trimmed
@@ -143,7 +143,7 @@ static MsQuicApi() | |||
|
|||
if (NetEventSource.Log.IsEnabled()) | |||
{ | |||
NetEventSource.Info(null, $"Loaded MsQuic library version '{version}', commit '{gitHash}'."); | |||
NetEventSource.Info(null, $"Loaded MsQuic library '{MsQuicLibraryVersion}'."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant 😆
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
If this passes, I think it's done. Thanks for the help @antonfirsov ! |
I'm seeing this build error in Linux runs:
Anything familiar Anton? |
This failed in multiple runs on this PR, and not on main, need to check if my changes broke it. |
This comment was marked as resolved.
This comment was marked as resolved.
/azp run runtime-libraries stress-http |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement the same hack as for functional tests to prevent msquic from dropping connections, see #83687.
Fixes #77126