TcpReceiveSendGetsCanceledByDispose test fails on rhel7 #52597

Closed
tmds opened this issue May 11, 2021 · 4 comments · Fixed by #52833

Comments

@tmds
Member

tmds commented May 11, 2021

These TcpReceiveSendGetsCanceledByDispose tests fail on our internal CI server on rhel7:

System.Net.Sockets.Tests.SendReceive_SpanSyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True)
System.Net.Sockets.Tests.SendReceive_SpanSyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False)
System.Net.Sockets.Tests.SendReceive_Sync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True)
System.Net.Sockets.Tests.SendReceive_Sync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False)
System.Net.Sockets.Tests.SendReceive_SyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True)
System.Net.Sockets.Tests.SendReceive_SyncForceNonBlocking.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False)
System.Net.Sockets.Tests.SendReceive_SpanSync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: False, dualModeClient: True)
System.Net.Sockets.Tests.SendReceive_SpanSync.TcpReceiveSendGetsCanceledByDispose(receiveOrSend: True, ipv6Server: True, dualModeClient: False)

The failing tests all match this condition:

        // RHEL7 kernel has a bug preventing close(AF_UNKNOWN) to succeed with IPv6 sockets.
        // In this case Dispose will trigger a graceful shutdown, which means that receive will succeed on socket2.
        // TODO: Remove this, once CI machines are updated to a newer kernel.
        bool expectGracefulShutdown = UsesSync && PlatformDetection.IsRedHatFamily7 && receiveOrSend && (ipv6Server || dualModeClient);

The actual failure exception is masked by the timeout ([Theory(Timeout = 40000)]) applied to this test, so the results only show:

        <failure exception-type="Xunit.Sdk.TestTimeoutException">
          <message><![CDATA[Test execution timed out after 40000 milliseconds]]></message>
          <stack-trace><![CDATA[]]></stack-trace>
        </failure>

cc @antonfirsov

dotnet-issue-labeler bot added the area-System.Net.Sockets and untriaged labels May 11, 2021
@ghost

ghost commented May 11, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@tmds
Member Author

tmds commented May 11, 2021

cc @omajid

@tmds
Member Author

tmds commented May 12, 2021

I have debugged this further.

This looks like a regression in the RHEL kernel. I installed RHEL 7.9 in a VM, which came with kernel 3.10.0-1160.el7.x86_64, and the tests passed.
After I updated the system, the kernel changed to 3.10.0-1160.25.1.el7.x86_64, and now the tests fail.

I'll look at writing a reproducer in C and, if it reproduces the failure, report it as a kernel bug.

Once I have the reproducer, I'd like to make a PR to skip this combination. Otherwise our internal CI will keep failing until a patched kernel is available. The public CI will also fail if and when it moves to the latest RHEL7 kernel.

ghost added the in-pr label May 17, 2021
@tmds
Member Author

tmds commented May 17, 2021

It turns out the test misbehaves on systems whose kernel includes the fix for the kernel bug mentioned in the test comments, so the expectGracefulShutdown special case no longer holds there.
#52833 will fix this.

ghost removed the in-pr label May 17, 2021
karelz added this to the 6.0.0 milestone May 20, 2021
ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
karelz removed the untriaged label Oct 20, 2022