Performance deterioration after upgrading to 1.1 #7827

Closed
abadyan-vonage opened this issue Jan 12, 2017 · 3 comments · Fixed by #7832

Comments

@abadyan-vonage

abadyan-vonage commented Jan 12, 2017

Bug report

InfluxDB 1.1 on CentOS 7, hosted on EC2 (c4.xlarge).
Writes are done over HTTP.
Our current model includes about 30 databases, with roughly 120 write ops (points) and 60 query ops.
This is after limiting the number of connections to 300 because we had memory issues (we report from about 450 servers once a minute).
We also use Kapacitor, which adds to the queries and writes, and Grafana queries the DB as well.
Until a couple of days ago we were using 1.0 and decided to upgrade to 1.1 for the performance improvements.
Unfortunately, our performance seriously deteriorated:
[screenshot]

I don't have the performance dumps, I'm afraid, because I had to downgrade as this was a production issue, but I do have the data in the _internal db so if you want me to query something let me know.

@jwilder
Contributor

jwilder commented Jan 12, 2017

Could you share your _internal db?

mark-rushakoff added a commit that referenced this issue Jan 13, 2017
This leak seems to have been introduced in 8aa224b,
present in 1.1.0 and 1.1.1.

When points were parsed from HTTP payloads, their tags and fields
referred to subslices of the request body; if any tag set introduced a
new series, those tags were then stored in the in-memory series
index objects, preventing the HTTP body from being garbage collected. If
there were no new series in the payload, the request body would be
garbage collected as usual.

Now, we clone the tags before we store them in the index. This is an
imperfect fix because the Point still holds references to the original
tags, and the Point's field iterator also refers to the payload buffer.
However, the current write code path does not retain references to the
Point or its fields; and this change will likely be obsoleted when TSI
is introduced.

This change likely fixes #7827, #7810, #7778, and perhaps others.
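For readers following along, here is a minimal Go sketch of the retention pattern the commit describes. The `seriesIndex` type and its methods are illustrative stand-ins, not InfluxDB's actual code: storing a subslice of the request body in a long-lived index keeps the whole body reachable, while cloning the tag bytes first lets the body be garbage collected.

```go
package main

import "fmt"

// seriesIndex stands in for a long-lived in-memory series index.
type seriesIndex struct {
	tagValues map[string][]byte
}

// addLeaky stores the tag value slice as-is. Because the subslice shares
// the request body's backing array, the entire body stays reachable for
// as long as the index holds this slice.
func (idx *seriesIndex) addLeaky(key string, val []byte) {
	idx.tagValues[key] = val
}

// addCloned copies the bytes into a fresh allocation, so the index no
// longer pins the request body and it can be garbage collected.
func (idx *seriesIndex) addCloned(key string, val []byte) {
	cp := make([]byte, len(val))
	copy(cp, val)
	idx.tagValues[key] = cp
}

func main() {
	// Pretend this is a large HTTP write payload in line protocol.
	body := []byte("cpu,host=server01 value=0.64 1484217600000000000")

	idx := &seriesIndex{tagValues: make(map[string][]byte)}

	// Parsing yields a subslice that points into body's backing array.
	hostTag := body[4:17] // "host=server01"

	idx.addLeaky("host", hostTag)  // retains the full payload
	idx.addCloned("host", hostTag) // overwrites it with an independent copy

	fmt.Printf("indexed tag: %s\n", idx.tagValues["host"])
}
```

The fix described in the commit is in the same spirit as `addCloned` here: the tags are copied at the point where they enter the series index, and since the write path does not retain the Point itself, the payload can then be collected.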
@abadyan-vonage
Author

@jwilder the entire _internal db weighs 80 MB when exported and compressed with tar.gz.
I can upload it somewhere if you want, but let me know if it's relevant.

@abadyan-vonage
Author

I'm not sure the above pull request fixes the issue, as the number of series was pretty fixed:
[screenshot: series count over time]
After the upgrade the number of series jumped up for some reason, but stayed pretty constant afterwards.

gunnaraasen pushed a commit that referenced this issue Feb 16, 2017