Alternate SNMP plugin #1389

Merged
merged 2 commits into from
Aug 22, 2016

phemmer
Contributor

@phemmer phemmer commented Jun 20, 2016

This is a proposal for an alternate SNMP plugin.

I wrote this because of issues like #1371, #808 & #1361
I also found the configuration of the existing SNMP plugin to be overly complex & confusing.

This new plugin has:

Example usage:

SNMP data:

.1.2.3.0.0.1.0 octet_str "foo"
.1.2.3.0.0.1.1 octet_str "bar"
.1.2.3.0.0.102 octet_str "bad"
.1.2.3.0.0.2.0 uinteger 1
.1.2.3.0.0.2.1 uinteger 2
.1.2.3.0.0.3.0 octet_str "0.123"
.1.2.3.0.0.3.1 octet_str "0.456"
.1.2.3.0.0.3.2 octet_str "9.999"
.1.2.3.0.1 octet_str "baz"
.1.2.3.0.2 integer 234

Config:

[[inputs.snmp2]]
    uris = [ "snmp2c://localhost:1161" ]

    name = "normalvalues"
    [[inputs.snmp2.field]]
        name = "hostname"
        is_tag = true
        oid = ".1.2.3.0.1"
    [[inputs.snmp2.field]]
        name = "loadavg"
        oid = ".1.2.3.0.2"
        conversion = "float(2)" # converts 123 to 1.23

    [[inputs.snmp2.table]]
        name = "mytable"
        inherit_tags = ["hostname"]
        [[inputs.snmp2.table.field]]
            name = "tablefield1"
            oid = ".1.2.3.0.0.1"
            is_tag = true
        [[inputs.snmp2.table.field]]
            name = "tablefield2"
            oid = ".1.2.3.0.0.2"
        [[inputs.snmp2.table.field]]
            name = "tablefield3"
            oid = ".1.2.3.0.0.3"
            conversion = "float" # converts "0.123" to 0.123

Output:

* Plugin: snmp2, Collection 1
> normalvalues,agent_host=localhost,host=myhost,hostname=baz loadavg=2.34 1466400920000000000
> mytable,agent_host=localhost,host=myhost,hostname=baz,tablefield1=bar tablefield2=2i,tablefield3=0.456 1466400920000000000
> mytable,agent_host=localhost,host=myhost,hostname=baz,tablefield1=foo tablefield2=1i,tablefield3=0.123 1466400920000000000
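
To illustrate the two conversion settings in the config above, here is a minimal Go sketch. It is not the plugin's actual code: convertFloat and its shift parameter are hypothetical names, with shift standing for the X in float(X) and a bare float treated as a shift of 0.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// convertFloat is a hypothetical helper, not the plugin's real implementation.
// It turns an SNMP value into a float64 and moves the decimal point
// shift places to the left (shift == 0 models a bare "float").
func convertFloat(v interface{}, shift int) (float64, error) {
	var f float64
	switch vt := v.(type) {
	case int:
		f = float64(vt)
	case uint:
		f = float64(vt)
	case string:
		parsed, err := strconv.ParseFloat(vt, 64)
		if err != nil {
			return 0, err
		}
		f = parsed
	case []byte:
		parsed, err := strconv.ParseFloat(string(vt), 64)
		if err != nil {
			return 0, err
		}
		f = parsed
	default:
		return 0, fmt.Errorf("unsupported type %T", v)
	}
	return f / math.Pow10(shift), nil
}

func main() {
	loadavg, _ := convertFloat(234, 2)         // models conversion = "float(2)"
	tablefield3, _ := convertFloat("0.456", 0) // models conversion = "float"
	fmt.Println(loadavg, tablefield3)
}
```

With the sample data, the integer 234 under float(2) gives 2.34 and the string "0.456" under float gives 0.456, which matches the output above.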

Direction

The PR is incomplete. I wasn't sure if this would be accepted at all, and if it was, should it replace the existing SNMP plugin, or go in as an alternate plugin.
Depending on the responses, I can clean up the code a little bit, add more documentation, more tests, and take care of the other stuff like changelog & readme.

Required for all PRs:

  • CHANGELOG.md updated
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

@sparrc
Contributor

sparrc commented Jun 20, 2016

This looks great, thank you very much @phemmer, I think this will be a good improvement.

Leave it named snmp2 for now and I'll decide how best to go about maintaining both of them.

Does this plugin also support snmp v3? v1? if not, could it?

Please also add some documentation of each of the arguments, thanks!


const description = `Retrieves SNMP values from remote agents`
const sampleConfig = `
[[inputs.snmp2]]
[[inputs.snmp2]] doesn't need to be here, it gets added automatically

@phemmer
Contributor Author

phemmer commented Jun 22, 2016

Updated to address comments. Still WIP though (more cleanup, more tests, & documentation)

Does this plugin also support snmp v3? v1? if not, could it?

It didn't, but does now.

Are there any objections to having the tests start up a snmpd process if it's available on the host machine? The tests would be a lot more thorough if they could speak to a real snmp daemon.
I'd still have it run as many tests as possible if snmpd is not available, and it'd just skip the ones that do require it.

@sparrc
Contributor

sparrc commented Jun 22, 2016

@titilambert do you mind reviewing this as well?

@sparrc
Contributor

sparrc commented Jun 22, 2016

@phemmer I don't mind starting an snmpd process, if it's available.

Those tests should be skipped in "short" mode.
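
A rough Go sketch of that skip logic follows. The helper name snmpdTestsEnabled is hypothetical and not taken from the PR; testing.Short and exec.LookPath are the standard library calls a real test would use to feed it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// snmpdTestsEnabled is a hypothetical helper: tests that need a real snmpd
// run only when not in "short" mode and when an snmpd binary was found.
func snmpdTestsEnabled(short bool, snmpdPath string) bool {
	return !short && snmpdPath != ""
}

func main() {
	// Look for snmpd on PATH; treat any lookup error as "not available".
	path, err := exec.LookPath("snmpd")
	if err != nil {
		path = ""
	}
	fmt.Println("run snmpd-backed tests:", snmpdTestsEnabled(false, path))

	// Inside a test function this would typically look like:
	//   if !snmpdTestsEnabled(testing.Short(), path) {
	//       t.Skip("snmpd not available or -short set")
	//   }
}
```

Tests that do not need the daemon would skip this check and always run.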

@titilambert
Contributor

titilambert commented Jun 22, 2016

@sparrc As you guessed, unfortunately, I've been really, really busy for a few months (and I will be for one or two more months :/ )
Of course, this plugin is simpler (it's hard to make something more complex than the old one :D).
But I think it's better to have only one plugin for one thing. Maybe it's better to try to merge both (I don't know how for now).

I will try a quick review today.
But quickly, about the unit tests: why @phemmer do you ship binaries (snmpd, snmpget) inside the git repo? CircleCI permits you to use a Docker container. Also, you can use snmpsim (http://snmpsim.sourceforge.net/), which is really cool for unit tests.

@phemmer
Contributor Author

phemmer commented Jun 22, 2016

why @phemmer do you ship binaries

Because whoops :-)

@titilambert
Contributor

@phemmer I just don't get why you wrote something new. Why not patch the first one?
Maybe the best thing to do is to open a new discussion (GitHub issue) and list all use cases.
Then try to define a config structure that is simpler and more powerful.
And make it!
@phemmer @sparrc what do you think?

@sparrc
Contributor

sparrc commented Jun 24, 2016

@titilambert there are quite a few issues and feature requests for the current SNMP plugin. The current plugin is rather complicated and that makes it difficult for me to maintain or have any hope of adding features. If it's difficult for me, then I can only imagine the difficulties for someone in the open-source community.

@phemmer
Contributor Author

phemmer commented Jun 24, 2016

@titilambert largely because of what @sparrc mentioned. Changing the config syntax requires changing the structs, and once you change those, you have to change the code that uses them, and the simpler syntax I proposed was radically different from the existing syntax, which means radically different code. And on top of the config, being able to support some of the things proposed also meant pretty radical change. Sometimes it's easier to just start from scratch (and less buggy in the end).

We use SNMP very heavily where I work, and because the existing SNMP plugin doesn't work for us, we needed to write a new one anyway. The plan was to propose this upstream, and if it gets merged then great, but otherwise we'd just use the plugin internally.

@rgelobterFH

I would agree that while the current SNMP plugin is great and works, it has limitations and can be very confusing to get it initially configured. I think it's for that reason a lot of people are still using influxsnmp with grafana/influxdb instead of telegraf. One thing that is nice about influxsnmp is that it was understood and meant for network devices. It knew you wanted to collect ifHCInOctets, ifHCOutOctets, etc. and you could specify specific interfaces if you wanted to vs having to collect on every interface regardless.

I'd personally like to see more flexibility long term in the SNMP plugin.

@sunnos9

sunnos9 commented Jun 27, 2016

Hello guys, new to using the snmp plugin for telegraf. I need to query snmp tables and preferably add them as tables into InfluxDB, I came across this discussion, what is the suggested path now?

@phemmer
Contributor Author

phemmer commented Jun 27, 2016

I still need to do cleanup, tests & docs. Probably won't get to it until Wednesday at the earliest. Worst case, this weekend as it's a nice 3 day weekend, so plenty of time.

@sunnos9

sunnos9 commented Jun 27, 2016

thank you @phemmer , will this go upstream as snmp or snmp2, will this be part of the next telegraf stable build?

@sparrc
Contributor

sparrc commented Jun 27, 2016

@sunnos9 that's yet to be determined, I wouldn't expect this in a stable build for at least a month.

@kaey
Contributor

kaey commented Jul 2, 2016

I've started testing this plugin on our network and here are a few things:

  1. The default value of 50 in gs.MaxRepetitions (https://godoc.org/github.com/soniah/gosnmp#GoSNMP) causes the SNMP response packet to exceed the 1500-byte MTU on some OIDs. This revealed a bug in our network config, but I suspect that people would want to reduce packet fragmentation, so I'd suggest making the MaxRepetitions field configurable.
  2. Storing a string value in a field, rather than a tag, is currently done incorrectly: since the value is of type []byte it doesn't get wrapped in quotes, which results in an error from influxdb. The fix is trivial:
 func fieldConvert(conv string, v interface{}) interface{} {
        if conv == "" {
+               switch vt := v.(type) {
+               case []byte:
+                       return string(vt)
+               }
                return v
        }
  3. https://github.com/influxdata/telegraf/pull/1389/files#diff-e352e9fa6539e0f5dea40acef9c2e8d6R135
    You can set all default values on struct initialization; toml will not override them if they are not present in the config file, so there is no need for those -1 hacks. This will allow you to remove some ifs from getConnection.
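
The point about defaults can be sketched in Go as below. To stay within the standard library, this sketch swaps in encoding/json as a stand-in for the TOML decoder; the behaviour shown (keys absent from the input leave pre-set fields untouched) is the same idea. The struct, its field names, and the Retries default of 3 are assumptions for illustration; only the MaxRepetitions default of 50 comes from the discussion above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// agentConfig is a hypothetical config struct with illustrative field names.
type agentConfig struct {
	Retries        int `json:"retries"`
	MaxRepetitions int `json:"max_repetitions"`
}

// loadConfig starts from defaults and lets the decoder overwrite only the
// keys that are actually present in the input.
func loadConfig(raw []byte) (agentConfig, error) {
	cfg := agentConfig{Retries: 3, MaxRepetitions: 50} // defaults set up front
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// Only max_repetitions is given, so Retries keeps its default.
	cfg, err := loadConfig([]byte(`{"max_repetitions": 10}`))
	fmt.Println(cfg.Retries, cfg.MaxRepetitions, err)
}
```

Because an explicit zero in the input still overwrites the default, no sentinel such as -1 is needed to tell "unset" apart from "set to 0".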

I will need to work with ~5k hosts, so I'll try to make this plugin query agents concurrently and will report back after broader testing.

@phemmer
Contributor Author

phemmer commented Jul 3, 2016

Did a bunch of code cleanup (including support for max repetitions, and byteslice fields), and added some more tests.

I think this is about ready for formal review.
Though a couple outstanding points for discussion:

  1. Plugin name - I don't like the name "snmp2". I think it's likely to be confused to be in some relation with SNMP protocol version 2. Maybe "snmp_ng"?
  2. name vs name_override - The plugin has a name parameter, which sets the measurement name. On the top level fields, this isn't needed as name_override does the same thing. However on tables it is needed as name_override has no effect on them. So I left name on the top level for consistency. What do we want to do here?
    • Leave as is
    • Remove name on the top level fields
    • Remove name on the top level fields & rename name to name_override on the tables?

(And yes, still have to update changelog & readme, and squash commits. Will do as very last thing)

@kaey
Contributor

kaey commented Jul 3, 2016

Thanks for the fixes. Some other issues I've encountered:

  1. After tonight's cron jobs, the plugin started emitting the error "Out of order response" for all hosts. Restarting telegraf returned everything to normal.
    You need to close the connection gs if gatherTable returned an error.
  2. From the error log it was unclear which host failed. My suggestion is to prefix errors from getConnection and gatherTable with the agent URL and change the errors join string from ", " to "\n".
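
A minimal Go sketch of the second suggestion, prefixing each error with its agent and joining with newlines. The agentError type, joinAgentErrors helper, and errTimeout sample error are hypothetical names, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errTimeout is a sample error used only for this illustration.
var errTimeout = errors.New("request timeout")

// agentError pairs an error with the agent that produced it (hypothetical type).
type agentError struct {
	Agent string
	Err   error
}

// joinAgentErrors prefixes every error with its agent address and joins
// them with "\n" so each failing host gets its own log line.
func joinAgentErrors(errs []agentError) error {
	if len(errs) == 0 {
		return nil
	}
	lines := make([]string, 0, len(errs))
	for _, e := range errs {
		lines = append(lines, fmt.Sprintf("agent %s: %v", e.Agent, e.Err))
	}
	return errors.New(strings.Join(lines, "\n"))
}

func main() {
	err := joinAgentErrors([]agentError{
		{Agent: "10.0.0.1:161", Err: errTimeout},
		{Agent: "10.0.0.2:161", Err: errTimeout},
	})
	fmt.Println(err)
}
```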

@phemmer
Contributor Author

phemmer commented Jul 3, 2016

After tonight's cron jobs, the plugin started emitting the error "Out of order response" for all hosts. Restarting telegraf returned everything to normal.

Likely some sort of bug in the soniah/gosnmp library (that's what's generating the error). Unfortunately, without sticking a ton of debugging code in the library, the only thing I can think of to figure it out is a packet capture. Is that something you'd be able to do? Get a packet capture of a few of the requests & responses?

From the error log it was unclear which host failed. My suggestion is to prefix errors from getConnection and gatherTable with the agent URL and change the errors join string from ", " to "\n".

Agree with the prefix thing. However this does raise a good point. I think what I'm going to do is just print the errors as they are received instead of collecting them and dumping them all to the telegraf core to print out. Plugins with partial success/failure is unfortunately something telegraf doesn't handle very well.

@kaey
Contributor

kaey commented Jul 5, 2016

Likely some sort of bug in the soniah/gosnmp library.

I understand; the problem could be even lower, in the OS network layer. That's why I suggested closing the socket on error.

Patch looks like this

diff --git a/plugins/inputs/snmp2/snmp.go b/plugins/inputs/snmp2/snmp.go
index bfe16c7..ac717e6 100644
--- a/plugins/inputs/snmp2/snmp.go
+++ b/plugins/inputs/snmp2/snmp.go
@@ -217,10 +217,13 @@ func (s *Snmp2) Gather(acc telegraf.Accumulator) error {
        }
        if err := s.gatherTable(acc, gs, t, false); err != nil {
            errs = append(errs, err)
+           s.closeConnection(agent)
        }
        for _, t := range s.Tables {
            if err := s.gatherTable(acc, gs, t, true); err != nil {
                errs = append(errs, err)
+               s.closeConnection(agent)
+               break
            }
        }
    }
@@ -340,6 +343,7 @@ type snmpConnection interface {
    //BulkWalkAll(string) ([]gosnmp.SnmpPDU, error)
    Walk(string, gosnmp.WalkFunc) error
    Get(oids []string) (*gosnmp.SnmpPacket, error)
+   Close() error
 }

 // gosnmpWrapper wraps a *gosnmp.GoSNMP object so we can use it as a snmpConnection.
@@ -347,9 +351,14 @@ type gosnmpWrapper struct {
    *gosnmp.GoSNMP
 }

+func (gsw gosnmpWrapper) Close() error {
+   return gsw.Conn.Close()
+}
+
 func (gsw gosnmpWrapper) Host() string {
    return gsw.Target
 }
+
 func (gsw gosnmpWrapper) Walk(oid string, fn gosnmp.WalkFunc) error {
    if gsw.Version == gosnmp.Version1 {
        return gsw.GoSNMP.Walk(oid, fn)
@@ -358,6 +367,15 @@ func (gsw gosnmpWrapper) Walk(oid string, fn gosnmp.WalkFunc) error {
    }
 }

+func (s *Snmp2) closeConnection(agent string) error {
+   if gs, ok := s.connectionCache[agent]; ok {
+       delete(s.connectionCache, agent)
+       return gs.Close()
+   }
+
+   return nil
+}
+
 // getConnection creates a snmpConnection (*gosnmp.GoSNMP) object and caches the
 // result using `agent` as the cache key.
 func (s *Snmp2) getConnection(agent string) (snmpConnection, error) {

I've captured a dump, but didn't find any anomalies, I can send it to you privately if you want to take a look.

I think what I'm going to do is just print the errors as they are received

SGTM

@phemmer
Contributor Author

phemmer commented Jul 5, 2016

I've captured a dump, but didn't find any anomalies, I can send it to you privately if you want to take a look.

That'd be great if you could. patrick.hemmer@gmail

@kirillkovalenko

Is there an ETA for this plugin?

@phemmer
Contributor Author

phemmer commented Jul 8, 2016

@kaey found the cause of the Out of order response errors. It is indeed a bug in the gosnmp library (raised gosnmp/gosnmp#68 for it). Thank you very much for the packet capture you sent, that is how I was able to find the bug.

I'll work on a patch for the issue and submit it upstream (in the next day or two, not tonight).
I'll also put in some code in this plugin to get a new connection on error.

Also after looking at the use case you sent me, I think there's another feature that would be highly beneficial to include. An inherit_tags = [...] option on the table config, which will inherit the specified top-scope tags. This is useful for example if you want to fetch the remote agent's host name, and use that as a tag in all the table measurements.
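
As a sketch of what such an option could do (assuming tags are plain string maps; inheritTags is a hypothetical helper, and letting a row's own tags win on a name clash is an assumption, not something decided in this thread):

```go
package main

import "fmt"

// inheritTags builds the tag set for one table row: it copies the named
// top-scope tags, then adds the row's own tags. Hypothetical helper.
func inheritTags(topTags, rowTags map[string]string, names []string) map[string]string {
	out := make(map[string]string, len(rowTags)+len(names))
	for _, name := range names {
		if v, ok := topTags[name]; ok {
			out[name] = v
		}
	}
	for k, v := range rowTags {
		out[k] = v // assumption: the row's own tags take precedence
	}
	return out
}

func main() {
	top := map[string]string{"hostname": "baz"}
	row := map[string]string{"tablefield1": "foo"}
	fmt.Println(inheritTags(top, row, []string{"hostname"}))
}
```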

Also due to the multiple errors per interval thing, I think I'm going to make this PR wait for the resolution to #1446 (which I am also working on).

@Gelob

Gelob commented Jul 8, 2016

@kirillkovalenko at least a month out before appearing in a stable build. #1389 (comment)

@lizaoreo

lizaoreo commented Jul 9, 2016

I've got my dashboard buildout on hold until then :) Excited for this new SNMP plugin.

@phemmer
Contributor Author

phemmer commented Jul 10, 2016

Ok, I've pushed an update which reconnects on any error from gosnmp, and includes the inherit_tags feature.

I've also submitted a PR on gosnmp (gosnmp/gosnmp#69) to fix the underlying issue. Until that PR is merged, I would recommend setting Retries = 0. Otherwise you can end up with a lot of network spam until enough dups are encountered to raise an error and reset the connection.

Still haven't done anything about prefixing errors with the agent address. Waiting on #1446 for that.

@sparrc
Contributor

sparrc commented Jul 19, 2016

@phemmer is this ready to go from your perspective (besides naming)? Could you write up a README as well?

@phemmer
Contributor Author

phemmer commented Jul 19, 2016

Kinda. I was going to wait for #1446 to address some logging issues. But I could somewhat address them without that, and then fix it properly in the PR for #1446.

There are 2 outstanding questions though:

  1. Naming. Do we want to call it snmp2, or something like snmp_ng instead? I just worry that snmp2 will confuse it with SNMP protocol version 2.
  2. The name parameter. The plugin has a name parameter which controls the metric name. This kinda duplicates the name_override parameter, but I think is necessary at least for tables because otherwise all the measurements (all the tables) would have the same name (maybe that's how it should be?). Do we want to leave it as is? Remove the name parameter from the top level (non-table) attributes? Something else?

@sparrc
Contributor

sparrc commented Jul 19, 2016

don't worry about the name parameter. It's OK if it does the same thing in certain cases as name_override. I don't think that all tables should be named the same thing, that should either be settable or dynamic.

@3fr61n

3fr61n commented Aug 23, 2016

I understand your point of view; however, in my opinion, depending on the scenario one approach is more efficient than the other: if you are interested in the majority of the entries of the table, a snmpbulk is the best option, and in the other case a snmp get fits better

Regarding the Cisco link, it's a little tricky, because if SNMP polling is really harmless, why do they recommend (further down in that link) reducing the data to be polled using views, or configuring the server to specify OIDs? ;)

The problem on the router is not only the SNMP agent; sometimes the data is stored/handled by other processes inside the router, so inter-process interaction begins, and depending on the amount of interaction, CPU spikes could arise

BTW: Thanks for your prompt answer 👍

@kaey
Contributor

kaey commented Aug 23, 2016

Also SNMP will never cause traffic performance degradation on a router.

This is incorrect. While there is separation between routing engine (RE) and packet forwarding engine (PFE), daemons, such as bgpd, snmpd and sshd, run on a single RE, thus noise from one will affect the other.

and in the other case a snmp get fits better

snmpget requires a round trip for every single metric. I don't think there's a situation where get is more efficient than bulkwalk, unless you query 5 metrics vs 500.

@phemmer
Contributor Author

phemmer commented Aug 23, 2016

This is incorrect. While there is separation between routing engine (RE) and packet forwarding engine (PFE), daemons, such as bgpd, snmpd and sshd, run on a single RE, thus noise from one will affect the other.

Correct, I was referring to routing traffic. The other services might see the impact from snmp usage, but the snmp usage will be very brief (and a few thousand interfaces is very unlikely to cause any noticeable spike in cpu), and thus have no effective impact.

@3fr61n

3fr61n commented Aug 23, 2016

@kaey In a test doing continuous polling to any table inside jnxBgpM2Peer, when you have more than 4k sessions, you'll see constant high cpu usage from routing daemons, and if you push further you can even affect the sessions.

Regarding

snmpget requires a round trip for every single metric. I don't think there's a situation where get is more efficient than bulkwalk, unless you query 5 metrics vs 500.

I totally agree, and in the case of interfaces for service providers this is basically what happens: you have many customers/services on subinterfaces, but you are only interested in the physical interface statistics.

@sparrc
Contributor

sparrc commented Aug 23, 2016

@3fr61n So does an SNMP get solve your use-case? I believe using the snmp.field config does a simple SNMP get. @phemmer correct me if I'm wrong.

@phemmer
Contributor Author

phemmer commented Aug 23, 2016

@sparrc snmp.field only does a get if it's in the top level config. I.E. we're fetching a single value. All table metrics are fetched using bulk.

@3fr61n

3fr61n commented Aug 23, 2016

@sparrc you are correct; however, the drawback of using snmp.field (besides what @sparrc mentioned) is that you need to know the table index in advance in order to build the full OID. This approach does not scale because the index could change from time to time.

So it would be nice to have a table approach that, after an initial get_bulk, gathers all indexes and (after filtering) then does gets for specific fields of the table.
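
The second half of that idea, turning discovered indexes into targeted get OIDs, could look roughly like this in Go. Everything here is a hypothetical sketch: the row type, buildGetOIDs, and physicalOnly are illustrative names, and treating any interface name containing "." as a subinterface is an assumption about naming on the device.

```go
package main

import (
	"fmt"
	"strings"
)

// row is one entry discovered by the initial bulk walk of a name column.
type row struct {
	Index string // table index, e.g. the ifIndex
	Name  string // value of the name column, e.g. the interface name
}

// physicalOnly is an example filter; assumes subinterfaces contain a ".".
func physicalOnly(name string) bool {
	return !strings.Contains(name, ".")
}

// buildGetOIDs returns column+"."+index for every row that passes keep,
// giving the full OIDs to request with plain gets.
func buildGetOIDs(rows []row, columns []string, keep func(string) bool) []string {
	var oids []string
	for _, r := range rows {
		if !keep(r.Name) {
			continue
		}
		for _, col := range columns {
			oids = append(oids, col+"."+r.Index)
		}
	}
	return oids
}

func main() {
	rows := []row{{"1", "ge-0/0/0"}, {"2", "ge-0/0/0.0"}}
	cols := []string{".1.3.6.1.2.1.31.1.1.1.6", ".1.3.6.1.2.1.31.1.1.1.10"}
	fmt.Println(buildGetOIDs(rows, cols, physicalOnly))
}
```

Since indexes can change over time, the discovery walk would need to be repeated periodically rather than done once.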

@StianOvrevage
Contributor

GREAT work on another SNMP plugin! The open-source world is sorely lacking anything that is both configurable and scalable and performant.

Regarding the discussion about snmpwalk and snmpget.

snmpget requires a round trip for every single metric. I don't think there's a situation where get is more efficient than bulkwalk, unless you query 5 metrics vs 500.

That is INCORRECT. Let me prove it:

By issuing snmpget -m ALL -v2c -c COMMUNITY HOST IF-MIB::ifInMulticastPkts.1 IF-MIB::ifOutMulticastPkts.1 IF-MIB::ifInBroadcastPkts.1 IF-MIB::ifOutBroadcastPkts.1 IF-MIB::ifHCInOctets.1 IF-MIB::ifHCInUcastPkts.1 IF-MIB::ifHCOutOctets.1

I get all these metrics in one go:

IF-MIB::ifInMulticastPkts.1 = Counter32: 0
IF-MIB::ifOutMulticastPkts.1 = Counter32: 0
IF-MIB::ifInBroadcastPkts.1 = Counter32: 0
IF-MIB::ifOutBroadcastPkts.1 = Counter32: 0
IF-MIB::ifHCInOctets.1 = Counter64: 0
IF-MIB::ifHCInUcastPkts.1 = Counter64: 0
IF-MIB::ifHCOutOctets.1 = Counter64: 0

AND before you think there is some magic happening with the snmpget tool. This actual tcpdump shows that it does not:

18:06:27. IP 10.0.0.4.39562 > host.snmp: C=community GetRequest(133) 31.1.1.1.2.1 31.1.1.1.4.1 31.1.1.1.3.1 31.1.1.1.5.1 31.1.1.1.6.1 31.1.1.1.7.1 31.1.1.1.10.1
18:06:28 IP host.snmp > 10.0.0.4.39562: C=community GetResponse(140) 31.1.1.1.2.1=0 31.1.1.1.4.1=0 31.1.1.1.3.1=0 31.1.1.1.5.1=0 31.1.1.1.6.1=0 31.1.1.1.7.1=0 31.1.1.1.10.1=0

So it is perfectly possible to make snmpgets much more efficient than the one-metric-one-request that some of you assume is the default/only way :-D

I did this back a couple of years ago with collectd and observed reduced CPU usage on both server and switch when combining a few known groups of metrics (such as interface metrics which tend to be standard). Wrote about it here: http://www.peritusconsulting.no/articles/2014-06-02-next-generation-monitoring-using-opentsdb.html#Collection (The 2200MHz to 1800MHz on collectd is from combining a few different data type and definitions so that even more metrics are pulled for every SNMP request).
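
A small Go sketch of grouping OIDs so that each group goes out in a single get request. batchOIDs is a hypothetical helper; the batch size is an illustrative parameter, and a suitable value depends on the agent and on packet size limits. Each batch could then be passed to a call shaped like the Get(oids []string) method shown in the snmpConnection interface earlier in this thread.

```go
package main

import "fmt"

// batchOIDs splits oids into consecutive groups of at most size entries.
// It returns nil when size is not positive. Hypothetical helper.
func batchOIDs(oids []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(oids); start += size {
		end := start + size
		if end > len(oids) {
			end = len(oids)
		}
		batches = append(batches, oids[start:end])
	}
	return batches
}

func main() {
	oids := []string{
		".1.3.6.1.2.1.31.1.1.1.2.1",
		".1.3.6.1.2.1.31.1.1.1.4.1",
		".1.3.6.1.2.1.31.1.1.1.3.1",
		".1.3.6.1.2.1.31.1.1.1.5.1",
		".1.3.6.1.2.1.31.1.1.1.6.1",
		".1.3.6.1.2.1.31.1.1.1.7.1",
		".1.3.6.1.2.1.31.1.1.1.10.1",
	}
	for i, b := range batchOIDs(oids, 3) {
		fmt.Printf("request %d carries %d OIDs\n", i+1, len(b))
	}
}
```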

@3fr61n

3fr61n commented Aug 24, 2016

Good catch :)

@wlcx

wlcx commented Aug 24, 2016

I'm getting errors collecting data from an APC UPS over SNMPv1 - my config is

[[inputs.snmp]]
  agents = ["<snipped ip>"]
  version = 1
  community = "public"
  name = "ups"

  [[inputs.snmp.field]]
    name = "name"
    oid = "1.3.6.1.2.1.1.5"
    is_tag = true

  [[inputs.snmp.field]]
    name = "output_load"
    oid = "PowerNet-MIB::upsAdvOutputLoad"

and telegraf.log displays errors like:

2016/08/24 11:45:50 ERROR: {"error":"partial write:\nunable to parse 'ups,agent_host=<snipped ip>,host=vault,name=\u003cnil\u003e output_load= 1472035210000000000': missing field value

Both OIDs resolve properly and are retrievable via snmpwalk

@StianOvrevage
Contributor

@wlcx You might want to hide your public IP in the snippet since I'm guessing you are also using private as the read-write community and hence (firewall permitting) anyone might be able to shut down your UPS now remotely :)

@wlcx

wlcx commented Aug 24, 2016

@StianOvrevage read-only and behind a firewall, but snipped nonetheless :)

@3fr61n

3fr61n commented Aug 24, 2016

@wlcx Did you try to add a '.' at the beginning of your oid?

 [[inputs.snmp.field]]
    name = "name"
    oid = "1.3.6.1.2.1.1.5"  <----- ".1.3.6.1.2.1.1.5"
    is_tag = true

@phemmer
Contributor Author

phemmer commented Aug 24, 2016

Leading dot is optional.
@wlcx can you provide the snmpwalk output? At least the portion that covers this value?

@wlcx

wlcx commented Aug 24, 2016

sure.

snmpwalk -Os -c public -v1 <snipped ip> 1.3.6.1.2.1.1.5
iso.3.6.1.2.1.1.5.0 = STRING: "Derrick"

P.S. In case you were wondering, I have no idea why the UPS is called Derrick :p

@phemmer
Contributor Author

phemmer commented Aug 24, 2016

Ah, you need to use 1.3.6.1.2.1.1.5.0 in the config, not 1.3.6.1.2.1.1.5.
I'm guessing the same for your output_load field, PowerNet-MIB::upsAdvOutputLoad.0.

I'll look into logging a more useful message, such as no such object at OID $foo

@wlcx

wlcx commented Aug 24, 2016

Ah, spot on, that works perfectly. I wonder if it would be worth defaulting to an instance of 0 if one is not given - this must be what snmpwalk etc do.

Thanks all for the help!

@phemmer
Contributor Author

phemmer commented Aug 24, 2016

Comparing to snmpwalk isn't a fair assessment as that utility does not do the same thing. The purpose of snmpwalk is to retrieve multiple values, we only want 1 value. If you use snmpget instead you'll see the same behavior.

@lizaoreo

I've been excitedly waiting for this to merge. Now that it's in, I've got a dumb, somewhat off topic question. How can I get this on telegraf now without waiting for the official 1.0 release?

@phemmer
Contributor Author

phemmer commented Aug 25, 2016

@lizaoreo It should be in the nightlies (I haven't confirmed though): https://influxdata.com/downloads/

Edit: Looks like we have an -rc1, probably in there too

@sparrc
Contributor

sparrc commented Aug 25, 2016

We have 1.0 release candidates available which include this new SNMP plugin; they are available here: https://github.com/influxdata/telegraf#installation

@Gelob

Gelob commented Sep 4, 2016

@3fr61n can you give your config example where tagdrop is working for you? Thanks!

@3fr61n

3fr61n commented Sep 5, 2016

@Gelob

Here is an example where I just gather physical interface stats and drop any sub-interface stats

'''
[[inputs.snmp]]
interval = "300s"
agents = [ "<x.x.x.x>:161" ]
version = 2
community = ""

name = "snmp_data"
[[inputs.snmp.field]]
name = "hostname"
oid = ".1.3.6.1.2.1.1.5.0"
is_tag = true
[[inputs.snmp.field]]
name = "uptime"
oid = ".1.3.6.1.2.1.1.3.0"

[[inputs.snmp.table]]
name = "interfaces"
inherit_tags = [ "hostname" ]
[inputs.snmp.tagdrop]
ifDescr = [ "."]
'''

And the output from the test is

'''
root@052f081c68de:/etc/telegraf# telegraf -test -config my.conf

  • Plugin: snmp, Collection 1
  • Internal: 5m0s

    snmp_data,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0 uptime=459175805i 1473089545000000000
    interfaces,agent_host=xxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=pime ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=pimd ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=ce1-2/0/2 ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    interfaces,agent_host=xxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=lsi ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=ge-3/2/1 ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    interfaces,agent_host=xxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifDescr=pfh-4/0/0 ifInOctets=0i,ifOutOctets=0i 1473089549000000000
    '''

@Gelob

Gelob commented Sep 5, 2016

@3fr61n thanks. I'm having a terrible time getting tagdrop to work. Even if I put a wildcard in I still see results in my test output. My goal is to try to get rid of the interfaces with no description (ifAlias below) but I can't even seem to get rid of anything with my tagdrop config in just testing it with the wildcard. Do you see anything odd I have going on?

[[inputs.snmp]]
  agents = [ "1.1.1.1", "1.1.1.2"]
  version = 2
  community = ""
  interval = "10s"
  name = "bwStats"
  [[inputs.snmp.field]]
    name = "hostname"
    oid = ".1.3.6.1.2.1.1.5.0"
    is_tag = true
  [[inputs.snmp.table]]
    name = "bwStats"
    inherit_tags = [ "hostname" ]
  [[inputs.snmp.tagdrop]]
   ifAlias = [" "]
#also tried
#ifAlias = ["*"]
    [[inputs.snmp.table.field]]
      name = "ifAlias"
      oid = ".1.3.6.1.2.1.31.1.1.1.18"
      is_tag = true
        [[inputs.snmp.table.field]]
        name = "ifHCInOctets"
        oid = ".1.3.6.1.2.1.31.1.1.1.6"
        [[inputs.snmp.table.field]]
        name = "ifHCOutOctets"
        oid = ".1.3.6.1.2.1.31.1.1.1.10"
        [[inputs.snmp.table.field]]
        name = "ifHCInUcastPkts"
        oid = ".1.3.6.1.2.1.31.1.1.1.7"
        [[inputs.snmp.table.field]]
        name = "ifHCOutUcastPkts"
        oid = ".1.3.6.1.2.1.31.1.1.1.11"
        [[inputs.snmp.table.field]]
        name = "ifInErrors"
        oid = ".1.3.6.1.2.1.2.2.1.14"
        [[inputs.snmp.table.field]]
        name = "ifOutErrors"
        oid = ".1.3.6.1.2.1.2.2.1.20"

I snipped some of the output lines

* Plugin: snmp, Collection 1
* Internal: 10s
> bwStats,agent_host=1.1.1.1,hostname=edge01 ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01,ifAlias=XO\ 10GG\ CIR\ CktID ifHCInOctets=813194263i,ifHCInUcastPkts=14991788i,ifHCOutOctets=6619529230850i,ifHCOutUcastPkts=7054570727i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01 ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01 ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01 ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01 ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000
> bwStats,agent_host=1.1.1.1,hostname=edge01,ifAlias= ifHCInOctets=0i,ifHCInUcastPkts=0i,ifHCOutOctets=0i,ifHCOutUcastPkts=0i,ifInErrors=0i,ifOutErrors=0i 1473094905000000000

@3fr61n

3fr61n commented Sep 5, 2016

@Gelob

I think you have a syntax issue; please check my config below

'''

[[inputs.snmp]]
interval = "60s"
agents = [ "172.30.137.212:161" ]
version = 2
community = "public"

name = "snmp_data"
[[inputs.snmp.field]]
name = "hostname"
oid = ".1.3.6.1.2.1.1.5.0"
is_tag = true
[[inputs.snmp.field]]
name = "uptime"
oid = ".1.3.6.1.2.1.1.3.0"

[[inputs.snmp.table]]
name = "interfaces"
inherit_tags = [ "hostname" ]
[inputs.snmp.tagdrop] <--- you have double bracket instead of single bracket
ifAlias = [ ""] <-- I think you have a 'blank space'

[[inputs.snmp.table.field]]
  name = "ifAlias"
  oid = ".1.3.6.1.2.1.31.1.1.1.18"
  is_tag = true


[[inputs.snmp.table.field]]
  name = "ifHCInOctets"
  oid = ".1.3.6.1.2.1.31.1.1.1.6"
[[inputs.snmp.table.field]]
  name = "ifHCOutOctets"
  oid = ".1.3.6.1.2.1.31.1.1.1.10"

'''

root@052f081c68de:/etc/telegraf# telegraf -test -config my.conf

  • Plugin: snmp, Collection 1
  • Internal: 1m0s

    snmp_data,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0 uptime=459894673i 1473096735000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ tf-mx960-2\ ge-1/2/0 ifHCInOctets=587134925i,ifHCOutOctets=628032583i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias= ifHCInOctets=173216943i,ifHCOutOctets=195843599i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ ex4600-1\ ge-0/0/12 ifHCInOctets=100416466i,ifHCOutOctets=0i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ tf-m320-1-so-2/2/0 ifHCInOctets=908499298i,ifHCOutOctets=835561372i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ tf-mx960-2\ ge-1/2/1 ifHCInOctets=0i,ifHCOutOctets=0i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ ex4600\ ge-0/0/13 ifHCInOctets=4179898327i,ifHCOutOctets=24018649693i 1473096738000000000
    interfaces,agent_host=xxxx,host=052f081c68de,hostname=PE2-tf-m120-1-re0,ifAlias=to\ tf-mx480-3\ xe-3/3/0 ifHCInOctets=439825082i,ifHCOutOctets=611439616i 1473096738000000000

@Gelob

Gelob commented Sep 5, 2016

Thanks, that double bracket was it. 👍 Now I just need to get the glob match for the space


Successfully merging this pull request may close these issues.

SNMP plugin doesn't write proper instance value