
Fix epoll TCP implementation #3940

Merged: 7 commits merged into main from huanyi/linux-tcp-fix on Oct 25, 2023

Conversation

csujedihy (Contributor) commented Oct 25, 2023

Description

There are a few bugs in the epoll TCP implementation.

  1. TX: busy spinning on the non-blocking socket regardless of whether EAGAIN/EWOULDBLOCK is encountered.
  2. RX: not draining all of the ready data upon an EPOLLIN signal (see the sketch after this list).
  3. Wrong buffer length passed to read, which causes the debug build to crash consistently.
  4. Unnecessary RSS configuration for TCP server sockets.
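
For items 2 and 3, the intended behavior looks roughly like the sketch below. This is a minimal illustration with hypothetical names (`TCP_RECEIVE_STATE`, `TcpIndicateReceive`), not the actual msquic datapath code: on EPOLLIN the handler keeps calling recv() until it reports EAGAIN/EWOULDBLOCK, and each call is given only the remaining space in the buffer rather than the full buffer size.

```c
// Sketch only: hypothetical names, not the real msquic datapath types or functions.
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

#define RECV_BUFFER_SIZE 65536

typedef struct TCP_RECEIVE_STATE {
    int Socket;                     // Non-blocking TCP socket.
    char Buffer[RECV_BUFFER_SIZE];  // Staging buffer for received bytes.
    size_t BytesBuffered;           // Bytes currently held in Buffer.
} TCP_RECEIVE_STATE;

// Hypothetical upper-layer hook that consumes Buffer[0..BytesBuffered).
static void
TcpIndicateReceive(TCP_RECEIVE_STATE* State)
{
    (void)State; // Deliver the buffered bytes to the application here.
}

// Called when epoll reports EPOLLIN for this socket.
// Returns false if the peer closed the connection or a hard error occurred.
static bool
TcpOnReadable(TCP_RECEIVE_STATE* State)
{
    for (;;) {
        if (State->BytesBuffered == RECV_BUFFER_SIZE) {
            // Buffer full: hand off what we have before reading more.
            TcpIndicateReceive(State);
            State->BytesBuffered = 0;
        }

        // Fix #3: pass the *remaining* space, not the total buffer size.
        ssize_t Bytes =
            recv(
                State->Socket,
                State->Buffer + State->BytesBuffered,
                RECV_BUFFER_SIZE - State->BytesBuffered,
                0);

        if (Bytes > 0) {
            State->BytesBuffered += (size_t)Bytes;
            continue; // Fix #2: keep draining until the socket is empty.
        }
        if (Bytes == 0) {
            return false; // Peer performed an orderly shutdown.
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            break; // Nothing left to read; wait for the next EPOLLIN.
        }
        if (errno == EINTR) {
            continue; // Interrupted by a signal; retry.
        }
        return false; // Hard socket error.
    }

    if (State->BytesBuffered > 0) {
        TcpIndicateReceive(State);
        State->BytesBuffered = 0;
    }
    return true;
}
```

Whether draining to EAGAIN is strictly required depends on whether the socket is registered edge-triggered (leftover data may never be re-signaled) or level-triggered (leftover data only costs extra wakeups), but in both cases draining promptly reduces wakeups and keeps data moving.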

Local perf test in WSL showed a 2x perf improvement with loopback, but the numbers from the perf machines didn't seem to move at all.

Testing

CI

csujedihy requested a review from a team as a code owner on October 25, 2023, 18:47.

codecov bot commented Oct 25, 2023

Codecov Report

Merging #3940 (eedd587) into main (102a491) will decrease coverage by 0.14%.
Report is 2 commits behind head on main.
The diff coverage is n/a.

```diff
@@            Coverage Diff             @@
##             main    #3940      +/-   ##
==========================================
- Coverage   86.63%   86.50%   -0.14%
==========================================
  Files          56       56
  Lines       16901    16901
==========================================
- Hits        14643    14621      -22
- Misses       2258     2280      +22
```

see 17 files with indirect coverage changes

nibanks (Member) commented Oct 25, 2023

> Local perf test in WSL showed a 2x perf improvement with loopback, but the numbers from the perf machines didn't seem to move at all.

I assume this means our baremetal tests weren't ever hitting these partial sends/recvs (by chance)?

csujedihy (Contributor, Author) commented Oct 25, 2023

> > Local perf test in WSL showed a 2x perf improvement with loopback, but the numbers from the perf machines didn't seem to move at all.
>
> I assume this means our baremetal tests weren't ever hitting these partial sends/recvs (by chance)?

I believe they do hit the partial sends/recvs most of the time. If not, it means we are not posting data faster than the socket buffers get drained by the kernel. It's also why the `continue` was added in the original PR in the first place: otherwise, the sends won't fully complete.

The baremetal loopback tests without encryption also show very slow numbers, much slower than what I got on my WSL VM, actually. I remember hitting 90 Gbps with netput on my Z240 WSL VM with 1 connection, while on the host I could only get 30-50 Gbps inconsistently. There's definitely something not quite right.
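
Regarding the partial sends and the original `continue`, a minimal send-side sketch (again with hypothetical names like `TCP_SEND_STATE` and `TcpTrySend`, not the actual msquic code) is below. A partial send, where send() accepts fewer bytes than requested, just advances the offset and retries; EAGAIN/EWOULDBLOCK means the socket buffer is full, so instead of spinning the loop arms EPOLLOUT and returns, and epoll wakes the thread once there is room again (the caller would clear EPOLLOUT after the pending data is fully flushed).

```c
// Sketch only: hypothetical names, not the real msquic datapath types or functions.
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef struct TCP_SEND_STATE {
    int Epoll;        // epoll instance watching Socket.
    int Socket;       // Non-blocking TCP socket.
    const char* Data; // Pending payload.
    size_t Offset;    // Bytes already accepted by the kernel.
    size_t Length;    // Total payload length.
} TCP_SEND_STATE;

// Returns true if the payload was fully sent; false if EPOLLOUT was armed
// (the send resumes on the next wakeup) or a hard error occurred.
static bool
TcpTrySend(TCP_SEND_STATE* State)
{
    while (State->Offset < State->Length) {
        ssize_t Bytes =
            send(
                State->Socket,
                State->Data + State->Offset,
                State->Length - State->Offset,
                MSG_NOSIGNAL);

        if (Bytes > 0) {
            State->Offset += (size_t)Bytes; // Partial send: keep going.
            continue;
        }
        if (Bytes < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Socket buffer is full. Instead of spinning, ask epoll to
            // report EPOLLOUT once there is room again.
            struct epoll_event Event = {
                .events = EPOLLIN | EPOLLOUT,
                .data = { .ptr = State },
            };
            epoll_ctl(State->Epoll, EPOLL_CTL_MOD, State->Socket, &Event);
            return false;
        }
        if (Bytes < 0 && errno == EINTR) {
            continue; // Interrupted by a signal; retry.
        }
        return false; // Hard error; caller tears down the connection.
    }
    return true;
}
```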

nibanks (Member) commented Oct 25, 2023

> > > Local perf test in WSL showed a 2x perf improvement with loopback, but the numbers from the perf machines didn't seem to move at all.
> >
> > I assume this means our baremetal tests weren't ever hitting these partial sends/recvs (by chance)?
>
> I believe they do hit the partial sends/recvs most of the time. If not, it means we are not posting data faster than the socket buffers get drained by the kernel. It's also why the `continue` was added in the original PR in the first place: otherwise, the sends won't fully complete.
>
> The baremetal loopback tests without encryption also show very slow numbers, much slower than what I got on my WSL VM, actually. I remember hitting 90 Gbps with netput on my Z240 WSL VM with 1 connection, while on the host I could only get 30-50 Gbps inconsistently. There's definitely something not quite right.

The difference here is that we're doing encrypted TCP. This is much different from simple netput. Crypto is a huge bottleneck.

csujedihy (Contributor, Author) commented:

> > > > Local perf test in WSL showed a 2x perf improvement with loopback, but the numbers from the perf machines didn't seem to move at all.
> > >
> > > I assume this means our baremetal tests weren't ever hitting these partial sends/recvs (by chance)?
> >
> > I believe they do hit the partial sends/recvs most of the time. If not, it means we are not posting data faster than the socket buffers get drained by the kernel. It's also why the `continue` was added in the original PR in the first place: otherwise, the sends won't fully complete.
> >
> > The baremetal loopback tests without encryption also show very slow numbers, much slower than what I got on my WSL VM, actually. I remember hitting 90 Gbps with netput on my Z240 WSL VM with 1 connection, while on the host I could only get 30-50 Gbps inconsistently. There's definitely something not quite right.
>
> The difference here is that we're doing encrypted TCP. This is much different from simple netput. Crypto is a huge bottleneck.

Wait, is "encrypt:0" not honored for TCP?

nibanks (Member) commented Oct 25, 2023

> Wait, is "encrypt:0" not honored for TCP?

No, that feature isn't supported for TCP.

nibanks merged commit df4cc62 into main on Oct 25, 2023.
407 of 411 checks passed.
nibanks deleted the huanyi/linux-tcp-fix branch on October 25, 2023, 20:44.