LeafNodeRank does not close all connections #15

Open
ss7pro opened this issue May 31, 2022 · 3 comments

@ss7pro
Contributor

ss7pro commented May 31, 2022

Hey,

We just noticed that LeafNodeRank loses track of some connections and never closes their sockets. They are left in CLOSE_WAIT:

[root@b4969184c538 301421]# netstat -anp | grep 11222 | grep -v ESTA
tcp 0 0 0.0.0.0:11222 0.0.0.0:* LISTEN 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34310 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34370 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34436 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34374 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34386 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34358 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34384 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34346 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34394 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34308 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34344 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34334 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34366 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34338 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34360 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34428 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34328 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34348 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34336 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34312 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34372 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34350 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34316 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34376 CLOSE_WAIT 301419/build/worklo
tcp 0 0 127.0.0.1:11222 127.0.0.1:34318 CLOSE_WAIT 301419/build/worklo
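
For context, a TCP socket sits in CLOSE_WAIT when the peer has already closed its side and the local process has not yet called close() on its own descriptor. The sketch below is not taken from LeafNodeRank; `handle_connection` is a hypothetical handler that only illustrates the pattern that avoids the leak: treat an empty read as end-of-stream and always close the socket afterwards.

```python
import socket


def handle_connection(conn: socket.socket) -> int:
    """Echo data until the peer closes, then close our side.

    Returns the number of bytes echoed. Hypothetical handler for
    illustration only.
    """
    total = 0
    try:
        while True:
            data = conn.recv(4096)
            if not data:
                # An empty read means the peer has closed its side.
                # For TCP, the local socket is now in CLOSE_WAIT.
                break
            conn.sendall(data)
            total += len(data)
    finally:
        # Without this call the descriptor stays open and the socket
        # remains in CLOSE_WAIT for the lifetime of the process.
        conn.close()
    return total
```

If a server drops its reference to a connection without reaching the close() call (for example on an error path), the kernel keeps the socket around, which would match the netstat output above.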

@meteorfox

@ss7pro Thanks for reporting this.

Can you share the steps to reproduce this?

@ss7pro
Contributor Author

ss7pro commented Jun 7, 2022

For us, it happens during regular runs. Adding netstat -anp | grep CLOSE_WAIT to the hooks is enough to see it. Initially, we thought this might explain the inconsistent results we get from time to time.
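
A hook along those lines could look roughly like the sketch below. It assumes the Linux `netstat -an` column layout shown earlier in this issue (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The names `count_close_wait` and `snapshot` are made up for this example and are not part of the benchmark.

```python
import subprocess


def count_close_wait(netstat_output: str, port: int) -> int:
    """Count TCP sockets in CLOSE_WAIT whose local port equals `port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UNIX-domain sockets and anything too short to parse.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, state = fields[3], fields[5]
        local_port = local_addr.rsplit(":", 1)[-1]
        if state == "CLOSE_WAIT" and local_port == str(port):
            count += 1
    return count


def snapshot(port: int) -> int:
    """Run `netstat -an` and count CLOSE_WAIT sockets on `port`."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_close_wait(result.stdout, port)
```

Calling `snapshot(11222)` from a hook before and after each run would show whether the number of stuck sockets grows.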

@ss7pro
Contributor Author

ss7pro commented Jun 8, 2022

We're running this on a 32C/64T machine with frequencies locked to 1.3 (core) and 1.7 (uncore).

[root@localhost src]# netstat -anp | grep '0 127.0.0.1:11222' | grep WAIT | wc -l
243
[root@localhost src]# timeout 2m build/workloads/ranking/DriverNodeRank --server 0.0.0.0:11222 --threads=15 --connections=3 --qps=55
[root@localhost src]# sleep 15
[root@localhost src]# netstat -anp | grep '0 127.0.0.1:11222' | grep WAIT | wc -l
261
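
The before/after comparison above (243, then 261, so 18 more sockets after one 2-minute run) can be wrapped in a small helper. This is only a sketch: `measure_leak`, `count_fn` and `run_fn` are hypothetical names, and the caller supplies both the counting function and the function that runs the driver.

```python
import time
from typing import Callable


def measure_leak(
    count_fn: Callable[[], int],
    run_fn: Callable[[], None],
    settle_seconds: float = 15.0,
) -> int:
    """Return how many extra sockets count_fn reports after run_fn.

    count_fn: returns the current number of stuck sockets.
    run_fn: runs one load-generation pass and returns when it is done.
    settle_seconds: wait after the run so in-flight closes can finish.
    """
    before = count_fn()
    run_fn()
    time.sleep(settle_seconds)
    after = count_fn()
    return after - before
```

A result that stays above zero across repeated runs would indicate that sockets accumulate rather than being cleaned up late.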
