Conntrack loses events due to ENOBUFS #1137

Closed
2opremio opened this issue Mar 7, 2016 · 10 comments
Labels
accuracy: Incorrect information is being shown to the user; usually a bug
bug: Broken end user or developer functionality; not working as the developers intended it

Comments

@2opremio
Contributor

2opremio commented Mar 7, 2016

<probe> ERRO: 2016/03/07 12:21:21.337873 conntrack stderr:WARNING: We have hit ENOBUFS! We are losing events.
<probe> ERRO: 2016/03/07 12:21:21.338081 conntrack stderr:This message means that the current netlink socket buffer size is too small.
<probe> ERRO: 2016/03/07 12:21:21.338189 conntrack stderr:Please, check --buffer-size in conntrack(8) manpage.
<probe> ERRO: 2016/03/07 12:21:21.338211 conntrack stderr:conntrack v1.4.3 (conntrack-tools): Operation failed: No buffer space available
<probe> ERRO: 2016/03/07 12:21:21.338141 conntrack error: EOF
<probe> INFO: 2016/03/07 12:21:21.338307 contrack exiting

Full logs: https://gist.github.com/janwillies/54083a50358718a4fb21
Context: https://weaveworks.slack.com/archives/scope-public/p1457352064000743

@2opremio changed the title from "Conntrack loses events and fails" to "Conntrack loses events and exits" Mar 7, 2016
@2opremio 2opremio added this to the 0.14.0 milestone Mar 7, 2016
@2opremio
Contributor Author

2opremio commented Mar 7, 2016

User was running kernel 4.3.5-300

@tomwilkie
Contributor

This is kinda by design; on a system with a high rate of connections we can't keep up with conntrack and it will fail. We degrade gracefully, falling back to polling.

@2opremio
Contributor Author

2opremio commented Mar 8, 2016

Wouldn't it be reasonable to dynamically adjust the buffer size?
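
One way to read "dynamically adjust" is to restart conntrack with a larger buffer each time it exits with ENOBUFS, up to some cap. The following is only a minimal sketch of that idea: `nextBufferSize`, `minBufferSize`, `maxBufferSize` and the sizes themselves are hypothetical names and values chosen for illustration, not anything taken from Scope.

```go
package main

import "fmt"

// Illustrative bounds only; neither value comes from Scope or the kernel.
const (
	minBufferSize = 4 * 1024
	maxBufferSize = 16 * 1024 * 1024
)

// nextBufferSize returns the buffer size to try after an ENOBUFS failure.
// It doubles the current size, clamped to [minBufferSize, maxBufferSize].
// The boolean is false once the cap has been reached, which is the point
// where the caller would simply accept falling back to polling.
func nextBufferSize(current int) (int, bool) {
	if current >= maxBufferSize {
		return maxBufferSize, false
	}
	if current < minBufferSize {
		return minBufferSize, true
	}
	next := current * 2
	if next > maxBufferSize {
		next = maxBufferSize
	}
	return next, true
}

func main() {
	size := 208 * 1024 // hypothetical starting size
	for {
		next, grew := nextBufferSize(size)
		if !grew {
			break
		}
		fmt.Printf("ENOBUFS: growing buffer from %d to %d bytes\n", size, next)
		size = next
	}
	fmt.Printf("cap reached at %d bytes; rely on polling from here\n", size)
}
```

The cap matters because, as noted in the reply below, a sufficiently busy host can outrun any buffer size.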

@tomwilkie
Contributor

In the worst case there will always be too many events to keep up with. It's reasonable to fall back to polling.

@2opremio
Contributor Author

2opremio commented Mar 8, 2016

I think it would be worth confirming with the user whether they had actually reached the point at which we want to fall back to polling, or whether we want to extend the buffers a bit further.

Also, it would be helpful to print a friendlier error with an explanation similar to the one in this ticket.

@tomwilkie tomwilkie removed this from the 0.14.0 milestone Mar 17, 2016
@2opremio
Contributor Author

I've seen this again in our service:

<probe> ERRO: 2016/05/11 16:54:00.105107 conntrack stderr:WARNING: We have hit ENOBUFS! We are losing events.
<probe> ERRO: 2016/05/11 16:54:00.105178 conntrack stderr:This message means that the current netlink socket buffer size is too small.
<probe> ERRO: 2016/05/11 16:54:00.105200 conntrack stderr:Please, check --buffer-size in conntrack(8) manpage.
<probe> ERRO: 2016/05/11 16:54:00.105211 conntrack stderr:conntrack v1.4.3 (conntrack-tools): Operation failed: No buffer space available
<probe> ERRO: 2016/05/11 16:54:00.203215 conntrack error: EOF
<probe> INFO: 2016/05/11 16:54:00.203289 contrack exiting
<probe> ERRO: 2016/05/11 16:54:00.203360 conntrack error: exit status 1
<probe> ERRO: 2016/05/11 16:54:00.583857 conntrack stderr:WARNING: We have hit ENOBUFS! We are losing events.
<probe> ERRO: 2016/05/11 16:54:00.584331 conntrack stderr:This message means that the current netlink socket buffer size is too small.
<probe> ERRO: 2016/05/11 16:54:00.584588 conntrack stderr:Please, check --buffer-size in conntrack(8) manpage.
<probe> ERRO: 2016/05/11 16:54:00.590787 conntrack stderr:conntrack v1.4.3 (conntrack-tools): Operation failed: No buffer space available
<probe> ERRO: 2016/05/11 16:54:00.778920 conntrack error: EOF
<probe> INFO: 2016/05/11 16:54:00.779309 contrack exiting
<probe> ERRO: 2016/05/11 16:54:00.779656 conntrack error: exit status 1

@rade rade added the bug Broken end user or developer functionality; not working as the developers intended it label Jul 4, 2016
@rade
Member

rade commented Aug 25, 2016

Note that when we encounter this (or any other) error we immediately spawn another conntrackWalker, so we generally won't fall back to polling for long. If spikes are frequent, though, a larger buffer would be better. Plus the message is alarming to users. So we should consider:

  1. making the buffer size configurable
  2. using a higher default value (than provided by /proc/sys/net/core/rmem_default)
  3. printing a friendlier error, noting the impact, current limit, and how to change it
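
The three items above could fit together roughly as follows. This is a sketch under stated assumptions rather than Scope's implementation: `conntrackArgs`, `friendlyStderr`, `defaultBufferSize` and the `--conntrack-buffer-size` option name are hypothetical. The only real interface relied on is the conntrack command line (`-E`, `-o`, `-p` and `--buffer-size`).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultBufferSize is an illustrative default, not the value Scope uses.
const defaultBufferSize = 4 * 1024 * 1024

// conntrackArgs builds the arguments for an event-mode conntrack run with an
// explicit netlink buffer size. A non-positive bufferSize selects the default.
func conntrackArgs(bufferSize int) []string {
	if bufferSize <= 0 {
		bufferSize = defaultBufferSize
	}
	return []string{"-E", "-o", "xml", "-p", "tcp", "--buffer-size", strconv.Itoa(bufferSize)}
}

// friendlyStderr replaces conntrack's ENOBUFS warning with a message that
// states the impact, the current limit and how to change it. Lines that do
// not mention ENOBUFS are returned unchanged. optionName is whatever option
// the calling program exposes for the buffer size.
func friendlyStderr(line string, bufferSize int, optionName string) string {
	if !strings.Contains(line, "ENOBUFS") {
		return line
	}
	return fmt.Sprintf("conntrack dropped events because its netlink buffer (%d bytes) filled up; "+
		"falling back to polling until conntrack restarts. Raise the limit with %s.",
		bufferSize, optionName)
}

func main() {
	fmt.Println("conntrack", strings.Join(conntrackArgs(0), " "))
	// "--conntrack-buffer-size" is a placeholder option name for this sketch.
	fmt.Println(friendlyStderr("WARNING: We have hit ENOBUFS! We are losing events.",
		defaultBufferSize, "--conntrack-buffer-size"))
}
```

Passing the size explicitly means the effective limit no longer depends on /proc/sys/net/core/rmem_default, so the rewritten message can name the one setting that actually applies.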

@rade rade changed the title Conntrack loses events and exits Conntrack loses events due to ENOBUFS Aug 25, 2016
@tomwilkie
Contributor

(1) done in #1896

@rade
Member

rade commented Nov 11, 2016

The error message we log from conntrack is now rather misleading since it suggests that the buffer size can be adjusted in /proc/sys/net/core/rmem_default, which won't help since we have a hard-coded default value.

Perhaps the default value should be read from /proc/sys/net/core/rmem_default? Though that won't help once we do (2).
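
For illustration, reading the default from /proc/sys/net/core/rmem_default could look like the sketch below. `parseRmem` and `kernelDefaultBufferSize` are hypothetical helper names, and the fallback value in `main` is arbitrary. The only fact assumed about the file is that it contains a single decimal integer.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const rmemDefaultPath = "/proc/sys/net/core/rmem_default"

// parseRmem parses the contents of an rmem_default-style file: one decimal
// integer, usually followed by a newline.
func parseRmem(contents string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(contents))
}

// kernelDefaultBufferSize returns the value stored at path, or fallback if
// the file cannot be read or does not hold a positive integer (for example
// on a non-Linux host).
func kernelDefaultBufferSize(path string, fallback int) int {
	data, err := os.ReadFile(path)
	if err != nil {
		return fallback
	}
	n, err := parseRmem(string(data))
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	fmt.Println(kernelDefaultBufferSize(rmemDefaultPath, 208*1024))
}
```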

@rade rade added the accuracy Incorrect information is being shown to the user; usually a bug label Jan 11, 2017
@rade
Member

rade commented Aug 14, 2017

We increased the default buffer size significantly in #2739.

I don't think it's worth doing anything more here.

@rade rade closed this as completed Aug 14, 2017