This repo contains a number of logstash plugins used at Etsy.
```
filter {
  hashid { }
}
```
`hashid` generates a SHA1 hash of the message field and uses it to set the `_id` used by Elasticsearch.
This makes it possible to replay whole logs and avoid duplicate entries in Elasticsearch.
It comes at a cost of efficiency in Elasticsearch.
By default, Elasticsearch uses Flake IDs to make searching for results more predictable.
Using truly random IDs removes this predictability. In many cases the hit at search time isn't terrible
and is hardly noticeable, with quite a positive upside for log replay.
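The idea can be sketched in a few lines of Ruby. This is an illustration of the approach, not the plugin's actual code: hashing the message field deterministically means a replayed line produces the same `_id`, so Elasticsearch overwrites the existing document instead of indexing a duplicate.

```ruby
require 'digest/sha1'

# Derive a deterministic document _id from the message field.
# Replaying the same log line yields the same _id, so Elasticsearch
# updates the existing document rather than creating a duplicate.
def hash_id(message)
  Digest::SHA1.hexdigest(message)
end
```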
```
filter {
  derrick { }
}
```
`derrick` parses the `http_headers` field and looks for data generated by Derrick to decode.
Derrick is a handy packet capture utility that combines multiple packets from a TCP stream and prints
them as a single line to a file.
You can call `derrick` like this to capture traffic:

```
/usr/bin/derrick -i eth0 -m -f "port 80" > /tmp/derrick.log
```
Then ship those logs to logstash.
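A minimal pipeline for shipping that file might look like the following. This is a sketch, not a configuration from this repo; the `file` input is stock logstash, and any options beyond those shown are assumptions:

```
input {
  file { path => "/tmp/derrick.log" }
}
filter {
  derrick { }
}
```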
```
filter {
  splitkv {
    field_split => "&"
    value_split => "="
  }
}
```
For very well formed fields that use a consistent field separator and a
consistent key/value separator throughout, `splitkv` offers significant performance
improvements over the default `kv` plugin.
It only works on fields where the entire set of data can be split consistently.
An example of this would be an HTTP query string:
```
foo1=bar1&foo2=bar2&foo3=bar3&foo4=bar4
```
The plugin first calls `split('&')` on the whole string, followed by `split('=')` on each of the resulting elements.
By default, the field separator is a blank space and the key/value separator is `=`.
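The two-pass split above can be sketched in plain Ruby. This is an illustration of the technique, not the plugin's source; the function name and keyword arguments are invented for the example:

```ruby
# One split on the field separator, then one split per element on the
# key/value separator -- cheaper than the regex scanning a generic kv
# parser has to do.
def split_kv(data, field_split: " ", value_split: "=")
  data.split(field_split)
      .map { |pair| pair.split(value_split, 2) }
      .to_h
end
```

Because every element is split the same way, a single malformed pair (one with no key/value separator) would break the whole field, which is why the data must be consistently formed.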