Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new line protocol parser and serializer, influxdb output #3924

Merged
merged 7 commits into from
Mar 28, 2018

Conversation

danielnelson
Copy link
Contributor

In this pull request I introduce a new line protocol parser and serializer, as well as an updated influxdb output that uses the serializer. Some minor changes are made to the telegraf.Metric interface as well.

Line protocol is no longer the internal represention of the metrics type, the data is stored in a fully parsed and type converted manor. Despite this in my testing there is an performance improvement across both going from line protocol to line protocol, as well as with Telegraf generated metrics going to line protocol, the main reason for this is the ragel based parser. Additionally, the sort order of tags is fixed, which should help on the InfluxDB side.

Ragel is used to compile the parser state machine, the generated file is committed so that there is no build requirement on having ragel. In the future I would love to have this be done at build time so that no generated files are committed.

closes #3452
closes #3631
closes #3648
closes #3291

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@danielnelson danielnelson added this to the 1.6.0 milestone Mar 22, 2018
@danielnelson danielnelson requested a review from desa March 23, 2018 00:16
@danielnelson danielnelson mentioned this pull request Mar 23, 2018
3 tasks
@danielnelson
Copy link
Contributor Author

@phemmer This is a big one, so I won't ask you to do a full review, but I'd grateful for any feedback you have time to provide. I'm excited about the prospect of converting other parsers to use ragel in the future (statsd parsing comes to mind).

Copy link
Contributor

@phemmer phemmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be a good time to add uint support as well.
influxdata/influxdb#8835

Other than that, I've only taken a brief glance through the code. Haven't had a chance to actually run it.

## Compress each HTTP request payload using GZIP.
# content_encoding = "gzip"
## HTTP Content-Encoding for write request body, one of "gzip" or "identity".
# content_encoding = "identity"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might do something like or "identity" (meaning no encoding). I actually had to look up "identity" as I wasn't familiar with the definition.

serializer := config.Serializer
if serializer == nil {
s := influx.NewSerializer()
s.SetMaxLineBytes(config.MaxPacketSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A "udp packet" includes both headers and body. MaxPacketSize is being used to control just the body size. So it's a misnomer.

@danielnelson
Copy link
Contributor Author

@ragzilla has been working on a follow up patch adding uint support: ragzilla@3064edc#commitcomment-28296437

We have to be a little careful as uint support is not generally available in InfluxDB and any field type changes results in an error.

@danielnelson danielnelson merged commit 1c0f63a into master Mar 28, 2018
@danielnelson danielnelson deleted the metrics-rewrite branch March 28, 2018 00:30
case float32:
fields[k] = float64(val)
continue
case float64:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for opening an old issue but this check was blocking a lot of garbage from being attempted to be written to our output. Is there a different place that this same kind of filtering can be done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe what you are seeing now in a new issue?

@stevenh
Copy link

stevenh commented Feb 24, 2020

I've just identified this as the cause of a major performance regression as detailed here:
#7070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants