SNMP Exporter Timeout #507

Closed
fresaelens opened this issue Apr 24, 2020 · 6 comments · Fixed by #511

Comments

@fresaelens

fresaelens commented Apr 24, 2020

Host operating system: output of uname -a

Linux hostname 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux

snmp_exporter version: output of snmp_exporter -version

0.17.0

What device/snmpwalk OID are you using?

We are walking many different devices from several manufacturers (Mikrotik devices, Huawei switches, Ruckus wireless switches, Ruckus wireless APs, Huawei APs, and so on).

What did you do that produced an error?

We are using curl to execute the query ourselves.

curl 'http://127.0.0.1:9116/snmp?target=10.1.32.110&module=mikrotik'
curl 'http://127.0.0.1:9116/snmp?target=10.1.32.110&module=huawei_switch'
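
For anyone trying to reproduce this, it can help to record how long each scrape takes, since a timed-out walk shows up as an unusually long request. Below is a minimal Python sketch that builds the same URLs as the curl commands and times one request. The helper names (`scrape_url`, `timed_scrape`) and the 120s HTTP timeout are my own choices for illustration; they are not part of snmp_exporter.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# Exporter address taken from the curl commands above.
EXPORTER = "http://127.0.0.1:9116/snmp"


def scrape_url(target: str, module: str, base: str = EXPORTER) -> str:
    """Build the same URL that the curl commands request."""
    query = urllib.parse.urlencode({"target": target, "module": module})
    return f"{base}?{query}"


def timed_scrape(url: str, http_timeout_s: float = 120.0):
    """Fetch url once and return (status, elapsed_seconds).

    status is the HTTP status code, or None if no HTTP response arrived
    (connection refused, client-side timeout, and so on).
    The 120s default is an arbitrary value chosen to outlast the 60s gap
    described in this issue.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=http_timeout_s) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # The exporter answered, but with an error status.
        status = err.code
    except OSError:
        # Covers URLError and socket timeouts: no HTTP response at all.
        status = None
    return status, time.monotonic() - start
```

Calling `timed_scrape(scrape_url("10.1.32.110", "mikrotik"))` in a loop and logging the elapsed time would show which scrapes stall, which can then be matched against the packet capture.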

What did you expect to see?

We expect to see SNMP results. Most of the time, the queries succeed and we get back the data. Now and then (with no apparent pattern as to when it succeeds or fails), the SNMP query times out. Packet captures show that the query reaches the device and the device answers, but the response gets lost somewhere along the way. (These queries and replies are sent over VPN tunnels across the public Internet.)

Because UDP is connectionless, we expect the SNMP exporter to resend the query once the timeout interval expires (10s by default) and to retry up to 3 times (also the default).

What did you see instead?

However, as you can see below, the SNMP exporter doesn't send another query for 60 seconds:

[Image: Screenshot 2020-04-24 at 14 40 41]

@brian-brazil
Contributor

With the defaults it should be retrying after 3.3s. Are you sure you haven't changed those settings?

@fresaelens
Author

Hi,

I haven't changed any settings.

Would it be helpful if I include the generate.yml and/or snmp.yml files?

@brian-brazil
Contributor

I don't think that'd help in this case; I'd have to dig through the upstream code again. Can you try an explicit lower timeout?

@fresaelens
Author

Hi,

I noticed that when I don't explicitly define the retries and timeout variables, the requests are only retransmitted after 60 seconds (as described in the original post).

When I explicitly set retries=3 and timeout=15, the requests are retransmitted after 45 seconds.

When I explicitly set retries=3 and timeout=10, the requests are retransmitted after 30 seconds.
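
The three measurements fit a simple pattern: the delay before the first retransmission equals retries × timeout, rather than a single timeout interval. The Python sketch below only checks that arithmetic against the numbers in this thread. It assumes the default timeout is 20s (as stated later in the thread) with 3 retries, and the function name is made up for illustration; it is not a description of the exporter's or the upstream library's code.

```python
def observed_retransmit_delay(timeout_s: float, retries: int) -> float:
    """Delay before the first retransmission that matches the measurements
    in this thread: the full timeout multiplied by the retry count."""
    return timeout_s * retries


# (timeout_s, retries, delay seen in the packet capture)
# The first row assumes the 20s default mentioned later in the thread.
observations = [
    (20, 3, 60),
    (15, 3, 45),
    (10, 3, 30),
]

for timeout_s, retries, seen_s in observations:
    predicted = observed_retransmit_delay(timeout_s, retries)
    print(f"timeout={timeout_s}s retries={retries}: "
          f"predicted {predicted}s, observed {seen_s}s")
```

If the retransmission were driven by the per-attempt timeout alone, the explicit settings would have produced delays of 15s and 10s instead of 45s and 30s.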

@brian-brazil
Contributor

Ah, looks like the docs are wrong: the default timeout is actually 20s, and the upstream docs are incorrect about how the retries and timeout interact.

brian-brazil added a commit that referenced this issue Apr 27, 2020
Upstream changed how their timeout&retries interacted, so
we no longer need to adjust things.

Our docs and code were also out of sync, but let's just reduce timeouts
to 2s as that should be plenty of time for a single GETBULK

Fixes #507

Signed-off-by: Brian Brazil <[email protected]>
brian-brazil added two further commits with the same message that referenced this issue, on Apr 28, 2020 and May 20, 2020.

@burakkalafat

Hi Brian, the community defaults to raritan_public in snmp.yml. I want to query with a different community. Is that possible?
