
error compacting TSM files: cannot allocate memory #6975

Closed

denis-arruda opened this issue Jul 7, 2016 · 28 comments

@denis-arruda

denis-arruda commented Jul 7, 2016

Bug report

System info: InfluxDB 0.13.0-1, Linux softsw69 2.6.34.10-0.6-desktop #1 SMP PREEMPT 2011-12-13 18:27:38 +0100 x86_64 x86_64 x86_64 GNU/Linux (SUSE Linux)

Steps to reproduce:

After almost two days of writing around 5,000 metrics per second, InfluxDB crashed. When I start the InfluxDB process again, I get this error in the log:

[cacheloader] 2016/07/07 10:40:44 reading file /home/influxdb/wal/performance/default/2/_02089.wal, size 10485900
[cacheloader] 2016/07/07 10:40:44 reading file /home/influxdb/wal/_internal/monitor/4/_00002.wal, size 0
[shard] 2016/07/07 10:40:44 /home/influxdb/data/_internal/monitor/4 database index loaded in 1.725057ms
[store] 2016/07/07 10:40:44 /home/influxdb/data/_internal/monitor/4 opened in 1.04092455s
[cacheloader] 2016/07/07 10:40:46 reading file /home/influxdb/wal/performance/default/2/_02090.wal, size 10487370
[cacheloader] 2016/07/07 10:40:48 reading file /home/influxdb/wal/performance/default/2/_02091.wal, size 10487395
[cacheloader] 2016/07/07 10:40:50 reading file /home/influxdb/wal/performance/default/2/_02092.wal, size 6489923
[cacheloader] 2016/07/07 10:40:51 reading file /home/influxdb/wal/performance/default/2/_02093.wal, size 10486224
[cacheloader] 2016/07/07 10:40:53 reading file /home/influxdb/wal/performance/default/2/_02094.wal, size 10486823
[cacheloader] 2016/07/07 10:40:54 reading file /home/influxdb/wal/performance/default/2/_02095.wal, size 10486442
[cacheloader] 2016/07/07 10:40:56 reading file /home/influxdb/wal/performance/default/2/_02096.wal, size 6498702
[cacheloader] 2016/07/07 10:40:57 reading file /home/influxdb/wal/performance/default/2/_02097.wal, size 10487287
[cacheloader] 2016/07/07 10:40:59 reading file /home/influxdb/wal/performance/default/2/_02098.wal, size 10487150
[cacheloader] 2016/07/07 10:41:01 reading file /home/influxdb/wal/performance/default/2/_02099.wal, size 10486054
[cacheloader] 2016/07/07 10:41:04 reading file /home/influxdb/wal/performance/default/2/_02100.wal, size 6571972
[cacheloader] 2016/07/07 10:41:05 reading file /home/influxdb/wal/performance/default/2/_02101.wal, size 971433
[cacheloader] 2016/07/07 10:41:05 reading file /home/influxdb/wal/performance/default/2/_02102.wal, size 0
[tsm1] 2016/07/07 10:41:05 beginning full compaction of group 0, 4 TSM files
[tsm1] 2016/07/07 10:41:05 compacting full group (0) /home/influxdb/data/performance/default/2/000000336-000000006.tsm (#0)
[tsm1] 2016/07/07 10:41:05 compacting full group (0) /home/influxdb/data/performance/default/2/000000416-000000005.tsm (#1)
[tsm1] 2016/07/07 10:41:05 compacting full group (0) /home/influxdb/data/performance/default/2/000000464-000000005.tsm (#2)
[tsm1] 2016/07/07 10:41:05 compacting full group (0) /home/influxdb/data/performance/default/2/000000496-000000005.tsm (#3)
[tsm1] 2016/07/07 10:41:05 error compacting TSM files: cannot allocate memory
[tsm1] 2016/07/07 10:41:05 beginning level 1 compaction of group 0, 2 TSM files
[tsm1] 2016/07/07 10:41:05 compacting level 1 group (0) /home/influxdb/data/performance/default/2/000000521-000000001.tsm (#0)
[tsm1] 2016/07/07 10:41:05 compacting level 1 group (0) /home/influxdb/data/performance/default/2/000000522-000000001.tsm (#1)
[tsm1] 2016/07/07 10:41:05 beginning level 3 compaction of group 0, 4 TSM files
[tsm1] 2016/07/07 10:41:05 compacting level 3 group (0) /home/influxdb/data/performance/default/2/000000500-000000003.tsm (#0)
[tsm1] 2016/07/07 10:41:05 compacting level 3 group (0) /home/influxdb/data/performance/default/2/000000504-000000003.tsm (#1)
[tsm1] 2016/07/07 10:41:05 compacting level 3 group (0) /home/influxdb/data/performance/default/2/000000508-000000003.tsm (#2)
[tsm1] 2016/07/07 10:41:05 compacting level 3 group (0) /home/influxdb/data/performance/default/2/000000512-000000003.tsm (#3)
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xce4740, 0x16)
/usr/local/go/src/runtime/panic.go:547 +0x90
runtime.sysMap(0xc84ef00000, 0x100000, 0x456100, 0x10efab8)

What is happening?

I know there is still space on disk.

@carbolymer

This is the same issue: #5440
InfluxDB is using a ridiculously large amount of RAM.

@jwilder
Contributor

jwilder commented Jul 18, 2016

@denis-arruda Can you test with a current nightly and include the full stack trace as a gist? fatal error: runtime: out of memory indicates you ran out of memory, but the stack trace does not have enough information to indicate where the allocation failed.

@denis-arruda
Author

denis-arruda commented Jul 18, 2016

I cannot test it now.

Meanwhile, I am sending the complete log for version 0.13.0:

data.log.txt

@jwilder
Contributor

jwilder commented Jul 18, 2016

@denis-arruda That trace shows a lot of writes blocked up in the WAL and a compaction run which might have triggered the OOM. The backed up writes are likely the cause of the memory issue though. What kind of disks do you have? Can you run iostat -xzd 1 and attach the output? Your disks might be saturated.

#7024 has a fix that may help you. It will be in tonight's nightly build.
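For reference, a minimal way to capture the requested stats (a sketch, assuming the sysstat package is installed and roughly a minute of samples is enough to attach):

# sample extended device stats once per second, 60 samples, save to a file
iostat -xzd 1 60 > iostat.txt

Consistently high %util and await values in that output would suggest the disks are saturated.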

@denis-arruda
Author

The command you asked for returns "command not found" on my server.

I don't know the disk type. I know the server I am using is a virtual machine on VMware.

@jwilder
Contributor

jwilder commented Jul 18, 2016

You would need to install the sysstat package on your system if iostat is not available.
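For example, installing it would typically look like this (a sketch; the package manager depends on the distro, e.g. SUSE as in the original report or Raspbian as in later comments):

# SUSE / openSUSE
sudo zypper install sysstat

# Debian / Ubuntu / Raspbian
sudo apt-get install sysstat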

@ericb48

ericb48 commented Apr 10, 2017

I have the same messages but InfluxDB doesn't crash. It runs on a Scaleway ARM server.

Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z beginning optimize compaction of group 0, 8 TSM files engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z beginning optimize compaction of group 0, 8 TSM files engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000033-000000005.tsm (#0) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000033-000000005.tsm (#0) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000065-000000005.tsm (#1) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000065-000000005.tsm (#1) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000065-000000006.tsm (#2) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000065-000000006.tsm (#2) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000081-000000004.tsm (#3) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000081-000000004.tsm (#3) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000081-000000005.tsm (#4) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000081-000000005.tsm (#4) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000097-000000004.tsm (#5) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000097-000000004.tsm (#5) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000113-000000004.tsm (#6) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000113-000000004.tsm (#6) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000129-000000004.tsm (#7) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z compacting optimize group (0) /data/influxdb/data/telegraf/one_year/197/000000129-000000004.tsm (#7) engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z error compacting TSM files: cannot allocate memory engine=tsm1
Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z error compacting TSM files: cannot allocate memory engine=tsm1

@think-free

Same here on Raspberry Pi 3 + Raspbian; got these two errors:

error adding new TSM files from snapshot: cannot allocate memory engine=tsm1
error writing snapshot: cannot allocate memory engine=tsm1

free -m :

             total       used       free     shared    buffers     cached
Mem:           925        835         89         46         38        351
-/+ buffers/cache:        445        479
Swap:           99          3         96

Influx :

12497 influxdb 20 0 2079960 365808 7948 S 5.9 38.6 167:52.15 influxd

@Lenbok

Lenbok commented Sep 16, 2017

Also getting the same error as @think-free on my Raspberry Pi 3.

Influx had been happily running for ages, then all of a sudden started spamming the logs with the two error messages (and the database in /var/lib/influxdb started rapidly expanding), eventually running out of disk space. After clearing up the logs and rebooting, it seems to be running fine, although it looks like the last few days of data are gone :-(.

Perhaps there is a memory leak somewhere?

@mvillarejo

Same behaviour here

@jpetermans

jpetermans commented Dec 1, 2017

Getting similar messages hitting the log every second. Raspberry Pi 3, Influx 1.3.7. My log files are growing faster than my databases at this stage. Any direction on a fix?

Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z beginning level 3 compaction of group 0, 4 TSM files engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z compacting level 3 group (0) /var/lib/influxdb/data/collectd/annual/59/000000004-000000003.tsm (#0) engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z compacting level 3 group (0) /var/lib/influxdb/data/collectd/annual/59/000000009-000000003.tsm (#1) engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z compacting level 3 group (0) /var/lib/influxdb/data/collectd/annual/59/000000013-000000003.tsm (#2) engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z compacting level 3 group (0) /var/lib/influxdb/data/collectd/annual/59/000000017-000000003.tsm (#3) engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z error compacting TSM files: cannot allocate memory engine=tsm1
Nov 30 22:36:40 3jas11 influxd[14240]: [I] 2017-12-01T06:36:40Z beginning level 2 compaction of group 0, 3 TSM files engine=tsm1

@Lenbok

Lenbok commented Dec 9, 2017

Mine started doing this again (continuously for the last three days before I noticed). Restarting influxdb didn't help. On the off chance, I tried updating influxdb (from 1.3.5 to 1.4.2), and things were looking hopeful when influxd restarted (i.e. it didn't error straight after startup). A few minutes later I got a couple of similar-looking error messages which don't sound good, although it's not continuing to spam the logs, so maybe influxdb has got itself into a happier place:

Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z beginning full compaction, 3 TSM files engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/13/000000016-000000004.tsm (#0) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/13/000000032-000000004.tsm (#1) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/13/000000032-000000005.tsm (#2) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z beginning full compaction, 3 TSM files engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000004.tsm (#1) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000005.tsm (#2) engine=tsm1
Dec  9 14:37:32 nodered influxd[30010]: [I] 2017-12-09T01:37:32Z Sending usage statistics to usage.influxdata.com
Dec  9 14:40:53 nodered influxd[30010]: [I] 2017-12-09T01:40:53Z error replacing new TSM files: cannot allocate memory engine=tsm1
Dec  9 14:42:32 nodered influxd[30010]: [I] 2017-12-09T01:42:32Z error replacing new TSM files: cannot allocate memory engine=tsm1

@Lenbok

Lenbok commented Dec 9, 2017

Nope, it's not really any better than before. A few hours after the upgrade, influxdb suddenly started using 200% CPU (on my 4-core Pi 3), doing constant I/O and producing error messages in the log (although much less frequently than prior to the upgrade):

[screenshots: CPU and I/O usage graphs]

Dec  9 18:07:31 nodered influxd[30010]: [I] 2017-12-09T05:07:31Z Retention policy shard deletion check commencing. service=retention
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z beginning full compaction, 3 TSM files engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000004.tsm (#1) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z beginning full compaction, 7 TSM files engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000005.tsm (#2) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000036-000000003.tsm (#1) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000040-000000003.tsm (#2) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000044-000000003.tsm (#3) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000046-000000002.tsm (#4) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000047-000000001.tsm (#5) engine=tsm1
Dec  9 18:37:16 nodered influxd[30010]: [I] 2017-12-09T05:37:16Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000047-000000002.tsm (#6) engine=tsm1
Dec  9 18:37:31 nodered influxd[30010]: [I] 2017-12-09T05:37:31Z Retention policy shard deletion check commencing. service=retention
Dec  9 18:41:32 nodered influxd[30010]: [I] 2017-12-09T05:41:32Z error replacing new TSM files: cannot allocate memory engine=tsm1
Dec  9 18:41:34 nodered influxd[30010]: [I] 2017-12-09T05:41:34Z beginning full compaction, 3 TSM files engine=tsm1
Dec  9 18:41:34 nodered influxd[30010]: [I] 2017-12-09T05:41:34Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 18:41:34 nodered influxd[30010]: [I] 2017-12-09T05:41:34Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000004.tsm (#1) engine=tsm1
Dec  9 18:41:34 nodered influxd[30010]: [I] 2017-12-09T05:41:34Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000005.tsm (#2) engine=tsm1
Dec  9 18:41:41 nodered influxd[30010]: [I] 2017-12-09T05:41:41Z error replacing new TSM files: cannot allocate memory engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z beginning full compaction, 7 TSM files engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000036-000000003.tsm (#1) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000040-000000003.tsm (#2) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000044-000000003.tsm (#3) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000046-000000002.tsm (#4) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000047-000000001.tsm (#5) engine=tsm1
Dec  9 18:41:43 nodered influxd[30010]: [I] 2017-12-09T05:41:43Z compacting full /var/lib/influxdb/data/sensors/autogen/11/000000047-000000002.tsm (#6) engine=tsm1
Dec  9 18:46:03 nodered influxd[30010]: [I] 2017-12-09T05:46:03Z error replacing new TSM files: cannot allocate memory engine=tsm1
Dec  9 18:46:05 nodered influxd[30010]: [I] 2017-12-09T05:46:05Z beginning full compaction, 3 TSM files engine=tsm1
Dec  9 18:46:05 nodered influxd[30010]: [I] 2017-12-09T05:46:05Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000032-000000005.tsm (#0) engine=tsm1
Dec  9 18:46:05 nodered influxd[30010]: [I] 2017-12-09T05:46:05Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000004.tsm (#1) engine=tsm1
Dec  9 18:46:05 nodered influxd[30010]: [I] 2017-12-09T05:46:05Z compacting full /var/lib/influxdb/data/sensors/autogen/12/000000048-000000005.tsm (#2) engine=tsm1
Dec  9 18:46:15 nodered influxd[30010]: [I] 2017-12-09T05:46:15Z error replacing new TSM files: cannot allocate memory engine=tsm1

Edit: BTW, memory profile doesn't look like it is running low, unless it's trying to grab a whole bunch at once:
[screenshot: memory usage graph]

@Lenbok

Lenbok commented Feb 6, 2018

And it's doing it again...

Feb  6 16:54:58 nodered influxd[332]: [I] 2018-02-06T03:54:58Z beginning full compaction, 4 TSM files engine=tsm1
Feb  6 16:54:58 nodered influxd[332]: [I] 2018-02-06T03:54:58Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000016-000000004.tsm (#0) engine=tsm1
Feb  6 16:54:58 nodered influxd[332]: [I] 2018-02-06T03:54:58Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000021-000000003.tsm (#1) engine=tsm1
Feb  6 16:54:58 nodered influxd[332]: [I] 2018-02-06T03:54:58Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000026-000000003.tsm (#2) engine=tsm1
Feb  6 16:54:58 nodered influxd[332]: [I] 2018-02-06T03:54:58Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000027-000000001.tsm (#3) engine=tsm1
Feb  6 16:56:54 nodered influxd[332]: [I] 2018-02-06T03:56:54Z error replacing new TSM files: cannot allocate memory engine=tsm1
Feb  6 16:56:56 nodered influxd[332]: [I] 2018-02-06T03:56:56Z beginning full compaction, 4 TSM files engine=tsm1
Feb  6 16:56:56 nodered influxd[332]: [I] 2018-02-06T03:56:56Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000016-000000004.tsm (#0) engine=tsm1
Feb  6 16:56:56 nodered influxd[332]: [I] 2018-02-06T03:56:56Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000021-000000003.tsm (#1) engine=tsm1
Feb  6 16:56:56 nodered influxd[332]: [I] 2018-02-06T03:56:56Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000026-000000003.tsm (#2) engine=tsm1
Feb  6 16:56:56 nodered influxd[332]: [I] 2018-02-06T03:56:56Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000027-000000001.tsm (#3) engine=tsm1
Feb  6 16:58:50 nodered influxd[332]: [I] 2018-02-06T03:58:50Z error replacing new TSM files: cannot allocate memory engine=tsm1
Feb  6 16:58:52 nodered influxd[332]: [I] 2018-02-06T03:58:52Z beginning full compaction, 4 TSM files engine=tsm1
Feb  6 16:58:52 nodered influxd[332]: [I] 2018-02-06T03:58:52Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000016-000000004.tsm (#0) engine=tsm1
Feb  6 16:58:52 nodered influxd[332]: [I] 2018-02-06T03:58:52Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000021-000000003.tsm (#1) engine=tsm1
Feb  6 16:58:52 nodered influxd[332]: [I] 2018-02-06T03:58:52Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000026-000000003.tsm (#2) engine=tsm1
Feb  6 16:58:52 nodered influxd[332]: [I] 2018-02-06T03:58:52Z compacting full /var/lib/influxdb/data/sensors/autogen/21/000000027-000000001.tsm (#3) engine=tsm1
Feb  6 17:00:49 nodered influxd[332]: [I] 2018-02-06T04:00:49Z error replacing new TSM files: cannot allocate memory engine=tsm1

@jwilder any tips on how to sort this out?

@Redferne

Having a similar issue.

Apr 25 17:55:10 hassbian influxd[4783]: ts=2018-04-25T15:55:10.364336Z lvl=info msg="Error replacing new TSM files" log_id=07gFbOnW000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=07gN~yU0000 op_name=tsm1_compact_group error="cannot allocate memory"

Started out of the blue; it had been running without a hitch for over a year.
InfluxDB v1.5.2 (git: 1.5 02d7d4f)

Please advise.

@dgnorton
Contributor

@Redferne are you still seeing this issue?

@Redferne

Omg, forgot about this. No, my problem was caused by running influxd on a 32-bit system and the database getting bigger than x GByte. Would have been nice to get a warning though ☺️
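For anyone wondering whether they are in the same situation (a 32-bit influxd binary plus a data directory that has grown large), a quick check might look like this (a sketch; the data path assumes the default Linux package layout):

# is the binary / kernel 32-bit?
file $(which influxd)
uname -m

# how big has the data directory grown?
du -sh /var/lib/influxdb/data/*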

@dgnorton
Contributor

@Redferne thanks. Going to close this for now. We'll address it in the future if it turns out to be a frequent problem for users.

@Lenbok

Lenbok commented Oct 17, 2018

Just for future reference for anyone coming across this -- the problem went away for me on my Raspberry Pi 3 after I swapped to a larger/faster SD card. Possibly once compactions were triggered they caused more I/O than the SD card could keep up with, and the I/O kept accumulating faster than it could be serviced.
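If anyone wants to confirm the same theory on their own Pi before swapping cards, watching the I/O wait column during a compaction is one rough check (a sketch using standard tools; the values are only indicative):

# 'wa' is the percentage of CPU time spent waiting on I/O;
# persistently high values while compactions run would point at a saturated SD card
vmstat 1 30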

@Lenbok

Lenbok commented Dec 12, 2018

@dgnorton It seems I spoke too soon. Mine has been doing this again since 3rd December. This is so frustrating; I think I'm just going to delete influxdb altogether.

@gs4711

gs4711 commented Dec 19, 2018

Suddenly started to occur with no visible reason:

influxd[19485]: ts=2018-12-19T12:54:01.007492Z lvl=info msg="Error replacing new TSM files" log_id=0CTdI0HW000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0CTdJ3~l000 op_name=tsm1_compact_group error="cannot allocate memory"
influxd[19485]: ts=2018-12-19T12:54:02.969238Z lvl=info msg="Aborted compaction" log_id=0CTdI0HW000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0CTdKTr0000 op_name=tsm1_compact_group error="compaction in progress: open /var/lib/influxdb/data/openhab/openhab_retention/1673/000000276-000000003.tsm.tmp: file exists"
The existing files do not point to any possible reason:

-rw-r--r--  1 influxdb influxdb  6118411 Dec 13 21:14 000000131-000000004.tsm
-rw-r--r--  1 influxdb influxdb  6177033 Dec 18 06:52 000000260-000000004.tsm
-rw-r--r--  1 influxdb influxdb   402900 Dec 16 15:03 000000268-000000002.tsm
-rw-r--r--  1 influxdb influxdb   402065 Dec 16 19:59 000000276-000000002.tsm
-rw-r--r--  1 influxdb influxdb 12771613 Dec 19 13:54 000000276-000000003.tsm.tmp
-rw-r--r--  1 influxdb influxdb    31047 Dec 11 00:00 fields.idx

Removing the latest TSM files solves this issue only for some days. During the occurrence of the error there is more than enough free memory available.

Any hints?

@think-free

think-free commented Dec 19, 2018 via email

@gs4711

gs4711 commented Dec 19, 2018

DB size? You are running on an RPi, right? I had this problem; moving the database to a 64-bit system solved the issue ...

1.2G /var/lib/influxdb/data
24K /var/lib/influxdb/meta
1.5M /var/lib/influxdb/wal

Hmm, you're pointing in a direction which I really do not like; still hoping that a developer raises a hand and tells us that this is not the case!

@ryan-williams

ryan-williams commented Dec 22, 2018

I'm suddenly seeing this on a (32-bit) Raspberry Pi 3 B+, consistent with the hypothesis that it started when one DB (collectd) reached 1 GB:

$ du -sh /var/lib/influxdb/data/*
1.1G	/var/lib/influxdb/data/collectd
14M	/var/lib/influxdb/data/_internal
204M	/var/lib/influxdb/data/temps

Influx was running fine for weeks, but is now in a crash loop every few minutes, generating a ton of log spam.

Here's a histogram of the msg values from the logs above:

  16 msg="Compacting file"
  18 msg="Opened shard"
  38 msg="Opened file"
7521 msg="Reading file"

7512 of the 7521 Reading file messages are for the collectd database, relating to files spanning basically all successive integers from /var/lib/influxdb/wal/collectd/autogen/22/_10769.wal to /var/lib/influxdb/wal/collectd/autogen/22/_18305.wal.
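For reference, a rough pipeline that produces a histogram like the one above, assuming the messages end up in /var/log/syslog as in the excerpt below:

grep influxd /var/log/syslog | grep -o 'msg="[^"]*"' | sort | uniq -c | sort -n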

The process seems to die during compaction, with the messages:

Dec 21 23:23:19 rpi2 influxd[21364]: ts=2018-12-21T23:23:19.859433Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0CWl~kg0000 engine=tsm1 trace_id=0CWm70Sl000 op_name=tsm1_cache_snapshot error="cannot allocate memory"
Dec 21 23:23:19 rpi2 influxd[21364]: ts=2018-12-21T23:23:19.896133Z lvl=info msg="Cache snapshot (end)" log_id=0CWl~kg0000 engine=tsm1 trace_id=0CWm70Sl000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=13668.832ms
Dec 21 23:23:19 rpi2 influxd[21364]: ts=2018-12-21T23:23:19.983874Z lvl=info msg="Error writing snapshot" log_id=0CWl~kg0000 engine=tsm1 error="cannot allocate memory"
Dec 21 23:23:19 rpi2 influxd[21364]: ts=2018-12-21T23:23:19.984015Z lvl=info msg="Cache snapshot (start)" log_id=0CWl~kg0000 engine=tsm1 trace_id=0CWm7rBl000 op_name=tsm1_cache_snapshot op_event=start
Dec 21 23:23:23 rpi2 influxd[21364]: runtime: out of memory: cannot allocate 8192-byte block (694747136 in use)
Dec 21 23:23:23 rpi2 influxd[21364]: fatal error: out of memory

followed by a bunch of stack trace/heap dump. Overall memory use on the system doesn't seem particularly high when this is happening (and there's plenty of swap space available), so I'm not sure why the 8 KB memory allocation is failing.

I worked around this issue by wiping out / backing up the existing collectd database:

sudo service influxdb stop
sudo mv /var/lib/influxdb/data/collectd ~/collectd
sudo service influxdb start
influx -execute 'create database collectd'

After this, my collectd database seems fine, and it even has about 30 hours of historical data, which I guess was being buffered by influx in some fashion while the compaction crash-loop was occurring.

Of course, this means I can only ever have Influx save the last few weeks of my collectd metrics, and I have to watch for the 1GB limit and keep "backing up" the collectd db, which is obviously not ideal.
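A crude way to watch for that limit is a small cron script along these lines (a sketch only; the ~900 MB threshold and the collectd data path are assumptions based on the numbers above):

#!/bin/sh
# warn via syslog when the collectd database directory nears 1 GB
LIMIT_KB=900000
SIZE_KB=$(du -sk /var/lib/influxdb/data/collectd | cut -f1)
if [ "$SIZE_KB" -gt "$LIMIT_KB" ]; then
    logger -t influxdb-size "collectd DB at ${SIZE_KB} KB, approaching the size where compactions started failing"
fi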

Cross-linking #10486 which seems like a canonical discussion of this issue.

@fluffynukeit

FYI I have an open PR in #12362 that is the successor to #10486.

@wollew

wollew commented Apr 8, 2019

It is awfully quiet in #12362; what's InfluxData's policy on including it in any release?

@fluffynukeit

I haven't seen any response from the Influx team regarding the PR. I made it a PR off of branch 1.8 because I don't want to adopt version 2.0 any time soon, but I think 2.0 is what the Influx team is focusing on right now. Honestly, it wouldn't be hard for someone familiar with the 2.0 codebase to port the fixes into 2.0, assuming they aren't in there already. I'll request an update on the PR thread.

@mumrau

mumrau commented Mar 16, 2021

2.0 won't ever support 32-bit systems, so we're kind of at a dead end, aren't we?
