-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternate SNMP plugin #1389
Alternate SNMP plugin #1389
Conversation
This looks great, thank you very much @phemmer, I think this will be a good improvement. leave it named snmp2 for now and I'll decide how best to go about maintaining the both of them. Does this plugin also support snmp v3? v1? if not, could it? Please also add some documentation of each of the arguments, thanks! |
|
||
const description = `Retrieves SNMP values from remote agents` | ||
const sampleConfig = ` | ||
[[inputs.snmp2]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[[inputs.snmp2]] doesn't need to be here, it gets added automatically
Updated to address comments. Still WIP though (more cleanup, more tests, & documentation)
It didn't, but does now. Are there any objections to having the tests start up a |
@titilambert do you mind reviewing this as well? |
@phemmer I don't mind starting an snmpd process, if it's available. Those tests should be skipped in "short" mode. |
@sparrc As you guess, unfortunately, I'm really really busy for a few months (and I will for one or two more months :/ ) I will try a quick review today. |
Because whoops :-) |
@phemmer I just don't get why you rewrite something new. Why not patch the first one ? |
@titilambert there are quite a few issues and feature requests for the current SNMP plugin. The current plugin is rather complicated and that makes it difficult for me to maintain or have any hope of adding features. If it's difficult for me, then I can only imagine the difficulties for someone in the open-source community. |
@titilambert largely because of what @sparrc mentioned. Changing the config syntax requires changing the structs, and once you change those, you have to change the code that uses them, and the simpler syntax I proposed was radically different from the existing syntax, which means radically different code. And on top of the config, being able to support some of the things proposed also meant pretty radical change. Sometimes it's easier to just start from scratch (and less buggier in the end). We use SNMP very heavily where I work, and because the existing SNMP plugin doesn't work for us, we needed write a new one anyway. The plan was to propose this upstream, and if if gets merged then great, but otherwise we'd just use the plugin internally. |
I would agree that while the current SNMP plugin is great and works, it has limitations and can be very confusing to get it initially configured. I think its for that reason a lot of people are still using influxsnmp with grafana/influxdb instead of telegraf. One thing that is nice about influxsnmp is that it was understood and meant for network devices. It knew you wanted to collect ifHCInOctets, ifHCOutOctets, etc and you could specify specific interfaces if you wanted to vs having to collect on every interface regardless. I'd personally like to see more flexibility long term in the SNMP plugin. |
Hello guys, new to using the snmp plugin for telegraf. I need to query snmp tables and preferably add them as tables into InfluxDB, I came across this discussion, what is the suggested path now? |
I still need to do cleanup, tests & docs. Probably won't get to it until Wednesday at the earliest. Worst case, this weekend as it's a nice 3 day weekend, so plenty of time. |
thank you @phemmer , will this go upstream as snmp or snmp2, will this be part of the next telegraf stable build? |
@sunnos9 that's yet to be determined, I wouldn't expect this in a stable build for at least a month. |
I've started testing this plugin on our network and here are a few things:
I will need to work with ~5k hosts, so I'll try to make this plugin query agents concurrently and will report back after broader testing. |
Did a bunch of code cleanup (including support for max repetitions, and byteslice fields), and added some more tests. I think this is about ready for formal review.
(And yes, still have to update changelog & readme, and squash commits. Will do as very last thing) |
Thanks for fixes. Another issues I've encountered:
|
Likely like some sort of bug in the
Agree with the prefix thing. However this does raise a good point. I think what I'm going to do is just print the errors as they are received instead of collecting them and dumping them all to the telegraf core to print out. Plugins with partial success/failure is unfortunately something telegraf doesn't handle very well. |
I understand, problem can be even lower - in OS network layer. That's why I suggested closing socket on error. Patch looks like this
I've captured a dump, but didn't find any anomalies, I can send it to you privately if you want to take a look.
SGTM |
That'd be great if you could. patrick.hemmer@gmail |
Is there ETA for this plugin? |
@kaey found the cause of the I'll work on a patch for the issue and submit it upstream (in the next day or two, not tonight). Also after looking at the use case you sent me, I think there's another feature that would be highly beneficial to include. A Also due to the multiple errors per interval thing, I think I'm going to make this PR wait for the resolution to #1446 (which I am also working on). |
@kirillkovalenko at the least a month out before appearing in a stable build. #1389 (comment) |
I've got my dashboard buildout on hold until then :) Excited for this new SNMP plugin. |
Ok, I've pushed an update which reconnects on any error from gosnmp, and includes the I've also submitted a PR on gosnmp (gosnmp/gosnmp#69) to fix the underlying issue. Until that PR is merged, I would recommend setting Still haven't done anything about prefixing errors with the agent address. Waiting on #1446 for that. |
@phemmer is this ready to go from your perspective (besides naming)? Could you writeup a README as well? |
Kinda. I was going to wait for #1446 to address some logging issues. But I could somewhat address them without that, and then fix it properly in the PR for #1446. There are 2 outstanding questions though:
|
don't worry about the |
I understand your point of view, however in my opinion depending of the scenario one approach is more efficient than the other, if you are interested in the majority of the entries of the table, a snmpbulk is the best option, and in the other case a snmp get fits better Regarding the cisco link, it's a little tricky, because if SNMP polling is really harmless why they recommend (below in that link) about how to reduce the data to be poll using views, or how to configure the server to specify oids ;) The problem behind the router is not only the snmp agent, sometimes the data could be store/handled by other processes inside the router, so a inter process interaction begins, and depending of the amount of interaction cpu spikes could arise BTW: Thanks for you prompt answer 👍 |
This is incorrect. While there is separation between routing engine (RE) and packet forwarding engine (PFE), daemons, such as bgpd, snmpd and sshd, run on a single RE, thus noise from one will affect the other.
snmpget requires roundtrip per every single metric. I don't think there's a situation, where get is more efficient then bulkwalk, unless you query 5 metrics vs 500. |
Correct, I was referring to routing traffic. The other services might see the impact from snmp usage, but the snmp usage will be very brief (and a few thousand interfaces is very unlikely to cause any noticeable spike in cpu), and thus have no effective impact. |
@kaey In a test doing continuous polling to any table inside jnxBgpM2Peer, when you have more than 4k sessions, you'll see constant high cpu usage from routing daemons, and if you push further you can even affect the sessions. Regarding
I'm totally agree, and In the case of interfaces for service providers this is basically what happens... that you have many customers/services on subinterfaces, but you are only interested on the physical interface statistic. |
@sparrc |
@sparrc you are correct, however the counterpart of using snmp.field (besides what @sparrc mentioned) is that you need to previously know the table index in order to build the full oid, this approach does not scale because the index could change from time to time. So it would be nice to have a table approach that after an initial get_bulk it gather all indexes (and after a filtering), it begins to do gets to specific in other fields of the table. |
GREAT work on another SNMP plugin! The open-source world is sorely lacking anything that is both configurable and scalable and performant. Regarding the discussion about snmpwalk and snmpget.
That is INCORRECT. Let me prove it: By issuing I get all these metrics in one go:
AND before you think there is some magic happening with the snmpget tool. This actual tcpdump shows that it does not:
So it is perfectly possible to make snmpgets much more efficient that one-metric-one-request that some of you assume is the default/only way :-D I did this back a couple of years ago with collectd and observed reduced CPU usage on both server and switch when combining a few known groups of metrics (such as interface metrics which tend to be standard). Wrote about it here: http://www.peritusconsulting.no/articles/2014-06-02-next-generation-monitoring-using-opentsdb.html#Collection (The 2200MHz to 1800MHz on collectd is from combining a few different data type and definitions so that even more metrics are pulled for every SNMP request). |
Good catch :) |
I'm getting errors collecting data from an APC UPS over SNMPv1 - my config is
and telegraf.log displays errors like:
Both OIDs resolve properly and are retrievable via |
@wlcx You might want to hide your public IP in the snippet since I'm guessing you are also using |
@StianOvrevage read-only and behind a firewall, but snipped nonetheless :) |
@wlcx Did you try to add a '.' in the begining of your oid?
|
Leading dot is optional. |
sure.
P.S. In case you were wondering, I have no idea why the UPS is called Derrick :p |
Ah, you need to use I'll look into logging a more useful message, such as |
Ah, spot on, that works perfectly. I wonder if it would be worth defaulting to an instance of 0 if one is not given - this must be what snmpwalk etc do. Thanks all for the help! |
Comparing to |
I've been excitedly waiting for this to merge. Now that it's in, I've got a dumb, somewhat off topic question. How can I get this on telegraf now without waiting for the official 1.0 release? |
@lizaoreo It should be in the nightlies (i haven't confirmed though): https://influxdata.com/downloads/ Edit: Looks like we have an -rc1, probably in there too |
We have 1.0 release candidates available which have this new SNMP plugin, it is available here: https://github.com/influxdata/telegraf#installation |
@3fr61n can you give your config example where tagdrop is working for you? Thanks! |
Here is an example where I just gather physical interface stats and drop any sub-interface stats ''' name = "snmp_data" [[inputs.snmp.table]] And the output from the test is '''
|
@3fr61n thanks. I'm having a terrible time getting tagdrop to work. Even if i put a wildcard in I still see results in my test output. My goal is to try to get rid of the interfaces with no description (IfAlias below) but I can't even seem to get rid of anything with my tagdrop config in just testing it with the wildcard. Do you see anything odd I have going on?
I snipped some of the output lines
|
I think you have a syntax issue, please check below my config ''' [[inputs.snmp]] name = "snmp_data" [[inputs.snmp.table]]
''' root@052f081c68de:/etc/telegraf# telegraf -test -config my.conf
|
Thanks, that double bracket was it. 👍 Now I just need to get the glob match for the space |
This is a proposal for an alternate SNMP plugin.
I wrote this because of issues like #1371, #808 & #1361
I also found the configuration of the existing SNMP plugin to be overly complex & confusing.
This new plugin has:
snmptranslate_file
)Example usage:
SNMP data:
Config:
Output:
Direction
The PR is incomplete. I wasn't sure if this would be accepted at all, and if it was, should it replace the existing SNMP plugin, or go in as an alternate plugin.
Depending on the responses, I can clean up the code a little bit, add more documentation, more tests, and take care of the other stuff like changelog & readme.
Required for all PRs: