Performance deterioration after upgrading to 1.1 #7827

Closed
abadyan-vonage opened this issue Jan 12, 2017 · 3 comments · Fixed by #7832

Comments

@abadyan-vonage

abadyan-vonage commented Jan 12, 2017

Bug report

InfluxDB 1.1 on CentOS 7, hosted on EC2 (c4.xlarge).
Writes are done over HTTP.
Our current model includes about 30 databases, with roughly 120 write ops (points) and 60 query ops.
This is after limiting the number of connections to 300 because we had memory issues (we report from about 450 servers once a minute).
We also use Kapacitor, which adds to the queries and writes, and Grafana queries the DB as well.
Until a couple of days ago we were using 1.0 and decided to upgrade to 1.1 for the performance improvements.
Unfortunately, our performance seriously deteriorated:
[screenshot]

I don't have the performance dumps, I'm afraid, because I had to downgrade as this was a production issue, but I do have the data in the _internal db so if you want me to query something let me know.

@jwilder
Contributor

jwilder commented Jan 12, 2017

Could you share your _internal db?

mark-rushakoff added a commit that referenced this issue Jan 13, 2017
This leak seems to have been introduced in 8aa224b,
present in 1.1.0 and 1.1.1.

When points were parsed from HTTP payloads, their tags and fields
referred to subslices of the request body; if any tag set introduced a
new series, those tags were then stored in the in-memory series
index objects, preventing the HTTP body from being garbage collected. If
there were no new series in the payload, the request body would be
garbage collected as usual.

Now, we clone the tags before we store them in the index. This is an
imperfect fix because the Point still holds references to the original
tags, and the Point's field iterator also refers to the payload buffer.
However, the current write code path does not retain references to the
Point or its fields; and this change will likely be obsoleted when TSI
is introduced.

This change likely fixes #7827, #7810, #7778, and perhaps others.
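For readers following along, here is a minimal Go sketch of the retention pattern the commit describes. The `seriesIndex` type and its methods are illustrative stand-ins, not InfluxDB's actual code: storing a subslice of the request body in a long-lived index keeps the whole body reachable, while cloning the tag bytes first lets the body be garbage collected.

```go
package main

import "fmt"

// seriesIndex stands in for a long-lived in-memory series index.
type seriesIndex struct {
	tagValues map[string][]byte
}

// addLeaky stores the tag value slice as-is. Because the subslice shares
// the request body's backing array, the entire body stays reachable for
// as long as the index holds this slice.
func (idx *seriesIndex) addLeaky(key string, val []byte) {
	idx.tagValues[key] = val
}

// addCloned copies the bytes into a fresh allocation, so the index no
// longer pins the request body and it can be garbage collected.
func (idx *seriesIndex) addCloned(key string, val []byte) {
	cp := make([]byte, len(val))
	copy(cp, val)
	idx.tagValues[key] = cp
}

func main() {
	// Pretend this is a large HTTP write payload in line protocol.
	body := []byte("cpu,host=server01 value=0.64 1484217600000000000")

	idx := &seriesIndex{tagValues: make(map[string][]byte)}

	// Parsing yields a subslice that points into body's backing array.
	hostTag := body[4:17] // "host=server01"

	idx.addLeaky("host", hostTag)  // retains the full payload
	idx.addCloned("host", hostTag) // overwrites it with an independent copy

	fmt.Printf("indexed tag: %s\n", idx.tagValues["host"])
}
```

The fix described in the commit is in the same spirit as `addCloned` here: the tags are copied at the point where they enter the series index, and since the write path does not retain the Point itself, the payload can then be collected.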
@abadyan-vonage
Author

@jwilder the entire _internal db weighs 80 MB when exported and compressed with tar.gz.
I can upload it somewhere if you want, but let me know if it's relevant.

@abadyan-vonage
Author

I'm not sure the above pull request fixes the issue, as the number of series was pretty fixed:
[screenshot: series count over time]
After the upgrade the number of series jumped up for some reason, but stayed pretty constant afterwards.

gunnaraasen pushed a commit that referenced this issue Feb 16, 2017