Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some 'context canceled' error in logs #487

Closed
shdxiang opened this issue Mar 4, 2020 · 10 comments · Fixed by #488
Closed

Some 'context canceled' error in logs #487

shdxiang opened this issue Mar 4, 2020 · 10 comments · Fixed by #488

Comments

@shdxiang
Copy link

shdxiang commented Mar 4, 2020

Host operating system: output of uname -a

Docker container: Linux d8827dda937a 4.19.95-flatcar #1 SMP Sat Feb 8 07:25:12 -00 2020 x86_64 GNU/Linux

snmp_exporter version: output of snmp_exporter -version

version=0.17.0, branch=HEAD, revision=f0ad4551a5c2023e383bc8dde2222f47dc760b83
f0ad455

What device/snmpwalk OID are you using?

generator.yml

nas:
    version: 2
    auth:
      community: public
    walk:
      - 1.3.6.1.4.1.24681.1.3.1 # systemCPU-UsageEX
      - 1.3.6.1.4.1.24681.1.3.11 # systemHdTableEX
      - 1.3.6.1.4.1.24681.1.3.15 # systemFanTable
      - 1.3.6.1.4.1.24681.1.3.17 # systemVolumeTable

If this is a new device, please link to the MIB(s).

Download from QNAP itself.

What did you do that produced an error?

I can get the metrics, but sometimes prometheus says that target is unreachable. And there are several error in snmp-exporter logs.

What did you expect to see?

No error in logs.

What did you see instead?

snmp-exporter_1       | level=info ts=2020-03-04T07:20:52.964Z caller=collector.go:216 module=dell_idrac target=172.30.60.220:161 msg="Error scraping target" err="error walking target 172.30.60.220: context canceled"
snmp-exporter_1       | level=info ts=2020-03-04T07:21:28.405Z caller=collector.go:216 module=nas target=172.30.50.104:161 msg="Error scraping target" err="error getting target 172.30.50.104: context canceled"
snmp-exporter_1       | level=info ts=2020-03-04T07:22:01.730Z caller=collector.go:216 module=dell_idrac target=172.30.60.209:161 msg="Error scraping target" err="error walking target 172.30.60.209: context canceled"
snmp-exporter_1       | level=info ts=2020-03-04T07:24:28.404Z caller=collector.go:216 module=nas target=172.30.50.104:161 msg="Error scraping target" err="error getting target 172.30.50.104: context canceled"
snmp-exporter_1       | level=info ts=2020-03-04T07:28:26.369Z caller=collector.go:216 module=dell_idrac target=172.30.60.211:161 msg="Error scraping target" err="error walking target 172.30.60.211: context canceled"
snmp-exporter_1       | level=info ts=2020-03-04T07:34:28.405Z caller=collector.go:216 module=nas target=172.30.50.104:161 msg="Error scraping target" err="error getting target 172.30.50.104: context canceled"
@brian-brazil
Copy link
Contributor

That's a timeout, your device is presumably slow or far away from the exporter.

@shdxiang
Copy link
Author

shdxiang commented Mar 4, 2020

@brian-brazil They are in the same LAN, maybe the device sent response too slow.

@shdxiang shdxiang closed this as completed Mar 4, 2020
@SuperQ
Copy link
Member

SuperQ commented Mar 4, 2020

@brian-brazil We could improve the error by checking if err == context.Canceled.

SuperQ added a commit that referenced this issue Mar 4, 2020
Catch SNMP context canceled errors so they are displayed to the user as
timeouts.

Fixes: #487

Signed-off-by: Ben Kochie <[email protected]>
SuperQ added a commit that referenced this issue Mar 4, 2020
Catch SNMP context canceled errors so they are displayed to the user as
timeouts.

Fixes: #487

Signed-off-by: Ben Kochie <[email protected]>
SuperQ added a commit that referenced this issue Mar 4, 2020
Catch SNMP context canceled errors so they are displayed to the user as
scrape canceled.

Fixes: #487

Signed-off-by: Ben Kochie <[email protected]>
brian-brazil pushed a commit that referenced this issue Mar 4, 2020
Catch SNMP context canceled errors so they are displayed to the user as
scrape canceled.

Fixes: #487

Signed-off-by: Ben Kochie <[email protected]>
@shdxiang
Copy link
Author

I have troubleshooted this issue for several days, noticed that it is not a simple timeout, my steps are:

  1. For debugging, I simplified the snmp.yml as:
qnap:
  get:
  - 1.3.6.1.4.1.24681.1.3.1.0
  version: 2
  auth:
    community: public

Start snmp_exporter on a Linux server(172.30.50.121) in the same LAN with target QNAP(172.30.50.104):

./snmp_exporter --log.level=debug

Run a script which request to snmp_exporter on 172.30.50.121:

exporter.sh

set -e
while true;
do
   curl -m 10 'http://localhost:9116/snmp?module=qnap&target=172.30.50.104'
   sleep 1
done

It will report timeout in minutes, and the snmp_exporter logged:

...
level=debug ts=2020-03-13T02:35:34.258Z caller=main.go:99 module=qnap target=172.30.50.104:161 msg="Starting scrape"
level=debug ts=2020-03-13T02:35:34.259Z caller=collector.go:132 module=qnap target=172.30.50.104:161 msg="Getting OIDs" oids=1
level=info ts=2020-03-13T02:36:34.259Z caller=collector.go:225 module=qnap target=172.30.50.104:161 msg="Error scraping target" err="scrape canceled (possible timeout) getting target 172.30.50.104"
level=debug ts=2020-03-13T02:36:34.259Z caller=main.go:110 module=qnap target=172.30.50.104:161 msg="Finished scrape" duration_seconds=60.000760423
  1. I tested the same OID with snmpget, also on 172.30.50.121:

snmpget.sh:

set -e
while true;
do
   snmpget -t 5 -v 2c -c public 172.30.50.104 1.3.6.1.4.1.24681.1.3.1.0
   sleep 1
done

But this script never timeout.

  1. I captured their packets with tcpdump, attached them, please help to analyze:
    tcpdump.zip

.

@shdxiang shdxiang reopened this Mar 13, 2020
@brian-brazil
Copy link
Contributor

Those look identical on the wire, so this is probably some networking issue on the box itself. I'd suggest checking your routing setup and which interfaces are being used.

@shdxiang
Copy link
Author

@brian-brazil I am confused, If the NAS did not work well, snmpget should timeout too, right?

@brian-brazil
Copy link
Contributor

netsnmp does some non-standard network stuff iirc, something about not checking that packets are coming back from where you sent them to. So it may work in situations where your routing is actually broken.

@shdxiang
Copy link
Author

Actually the test server and the QNAP NAS are connected to the same switch, so I think there should not have routing issue.

@brian-brazil
Copy link
Contributor

I'd suggest using strace then to see if the packets are making it to the snmp exporter.

@shdxiang
Copy link
Author

I have run tcpdump on the NAS itself, and it captured the response, but the tcpdump on the Linux server which run snmp_exporter did not get that response, so this should be a network issue. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants