Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully handle TCP Input listener.Accept errors #296

Merged
merged 6 commits into from
May 7, 2021

Conversation

jsirianni
Copy link
Member

@jsirianni jsirianni commented May 6, 2021

Description of Changes

Added continue to reset the listener.Accept loop. This resolves an issue where Stanza will panic when listener.Accept returns an error.

This introduces a lot of debug logging, so I also implemented time.Sleep() with an automatic backoff. The initial sleep time is 100ms but will backoff to three seconds. The log looks like this:

...
{"level":"debug","timestamp":"2021-05-06T19:33:31.286-0400","message":"Received connection: 10.33.104.10:41696","operator_id":"$.tcp_input_1","operator_type":"tcp_input"}
{"level":"debug","timestamp":"2021-05-06T19:33:31.296-0400","message":"Received connection: 10.33.104.10:41698","operator_id":"$.tcp_input_1","operator_type":"tcp_input"}
{"level":"debug","timestamp":"2021-05-06T19:33:31.305-0400","message":"Received connection: 10.33.104.10:41700","operator_id":"$.tcp_input_1","operator_type":"tcp_input"}
{"level":"debug","timestamp":"2021-05-06T19:33:31.305-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:31.406-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:31.606-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:32.008-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:32.809-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:34.409-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:37.410-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:40.410-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
{"level":"debug","timestamp":"2021-05-06T19:33:43.410-0400","message":"Listener accept error","operator_id":"$.tcp_input_1","operator_type":"tcp_input","error":"accept tcp [::]:5000: accept4: too many open files"}
...

Notice that the "too many open files" errors max out at 3 seconds apart. Without this throttling, the log is very verbose.

Resolves #295

Please check that the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Add a changelog entry (for non-trivial bug fixes / features)
  • CI passes

@jsirianni jsirianni requested a review from djaglowski May 6, 2021 23:47
@djaglowski
Copy link
Member

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.5345516 +0.06903887 124.0726 -2.3069763
1 5000 5.0517445 -0.310431 130.71498 -1.7572784
1 10000 9.965831 -0.44811726 139.47765 +0.43049622
1 50000 49.834866 +0.64421844 171.87958 +0.7805786
1 100000 97.51833 +1.5344162 238.10251 +3.543106
10 100 1.9656388 -0.1034044 128.26024 -0.6932831
10 500 6.06903 -0.17242146 135.47185 +3.1016998
10 1000 12.138115 +0.3104086 140.981 -0.50875854
10 5000 62.0675 +5.9522667 183.48236 +10.645752
10 10000 107.32669 -3.8970947 229.58002 -1.3461609

djaglowski
djaglowski previously approved these changes May 7, 2021
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@djaglowski
Copy link
Member

I believe go mod tidy or make tidy should fix the unit test compilation issues.

@djaglowski
Copy link
Member

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.5000176 -0.08619964 121.34537 -5.028961
1 5000 5.155166 -0.2587638 131.1541 -1.6130066
1 10000 10.413852 -0.2764902 141.43831 -0.059799194
1 50000 50.49422 -0.5888901 168.76764 -5.5724792
1 100000 94.77741 -3.8307648 227.99986 -12.657059
10 100 1.9827919 -0.3967793 128.97226 -1.0856628
10 500 6.258715 -0.34499407 138.10574 +3.6452026
10 1000 12.431083 +0.17251587 143.27559 -3.0829773
10 5000 57.895782 +2.7603989 182.8327 -0.9163513
10 10000 104.295906 -1.6114197 217.8063 -1.9267273

@codecov
Copy link

codecov bot commented May 7, 2021

Codecov Report

Merging #296 (56bda2c) into master (37ab164) will increase coverage by 0.06%.
The diff coverage is 77.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #296      +/-   ##
==========================================
+ Coverage   70.56%   70.62%   +0.06%     
==========================================
  Files         112      112              
  Lines        5961     5970       +9     
==========================================
+ Hits         4206     4216      +10     
  Misses       1307     1307              
+ Partials      448      447       -1     
Impacted Files Coverage Δ
operator/builtin/input/tcp/tcp.go 66.67% <77.78%> (+1.04%) ⬆️
operator/builtin/input/file/file.go 76.03% <0.00%> (+2.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 37ab164...56bda2c. Read the comment docs.

@djaglowski
Copy link
Member

Log Files Logs / Second CPU Avg (%) CPU Avg Δ (%) Memory Avg (MB) Memory Avg Δ (MB)
1 1000 1.534508 -0.05170834 126.684944 +5.0653305
1 5000 4.7759814 -0.31043053 133.45123 +0.03730774
1 10000 10.758965 +0.32759857 139.6755 -0.1376648
1 50000 50.016895 +1.4127808 171.86462 +0.10357666
1 100000 99.098495 +2.551033 251.13942 +15.808472
10 100 2.068978 -0.03451705 124.425514 -3.6916733
10 500 5.931126 +0.12059975 134.92807 -1.2602386
10 1000 12.534426 +0.25835228 144.08783 +2.2704773
10 5000 56.156773 -2.8985481 181.45042 +2.0129242
10 10000 107.70651 -1.7411652 220.86746 -10.386581

@jsirianni jsirianni requested a review from djaglowski May 7, 2021 16:01
@jsirianni jsirianni merged commit 63883bc into master May 7, 2021
@jsirianni jsirianni deleted the tcp-operator-handle-listener-Accept branch May 7, 2021 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TCP Input panics when listener.Accept() returns an error
2 participants