error compacting TSM files: cannot allocate memory #6975
This is the same issue: #5440
@denis-arruda Can you test with a current nightly and include the full stack trace as a gist?
I cannot test it right now. Meanwhile, I am sending the complete log for version 0.13.0:
@denis-arruda That trace shows a lot of writes blocked up in the WAL and a compaction run, which might have triggered the OOM. The backed-up writes are likely the cause of the memory issue, though. What kind of disks do you have, and can you run the command I mentioned? #7024 has a fix that may help you; it will be in tonight's nightly build.
The command you asked about returns "command not found" on my server. I don't know the disk type; I only know the server is a virtual machine in VMware.
You would need to install the package that provides that command.
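Assuming the command in question is an iostat-style disk check (an assumption on my part; on Debian/Raspbian it ships in the sysstat package), a rough sketch of what to install and run looks like this:

```bash
# Assumption: the check being asked for is an iostat-style disk report,
# provided by the sysstat package on Debian/Raspbian.
sudo apt-get install sysstat

# Per-device utilisation, queue sizes and wait times: 5-second samples, 3 reports.
iostat -dx 5 3

# Memory and swap picture at the same time.
free -m
vmstat 5 3
```

Sharing that output alongside the stack trace makes it much easier to tell whether the disk simply cannot keep up with the write load.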
I have the same messages, but InfluxDB doesn't crash. Running on a Scaleway ARM server. Apr 10 11:06:29 scw-c24a7e influxd[11431]: [I] 2017-04-10T09:06:29Z beginning optimize compaction of group 0, 8 TSM files engine=tsm1
Same here on Raspberry Pi 3 + Raspbian, got these two errors:
free -m :
Influx :
Also getting the same error as @think-free on my Raspberry Pi 3. Influx had been running happily for ages, then all of a sudden started spamming the logs with the two error messages (and the database in /var/lib/influxdb started rapidly expanding), eventually running out of disk space. After clearing up the logs and rebooting, it seems to be running fine, although it looks like the last few days of data are gone :-(. Perhaps there is a memory leak somewhere?
Same behaviour here.
Getting similar messages hitting the log every second. Raspberry Pi 3, Influx 1.3.7. My log files are growing faster than my databases at this stage. Any direction on a fix?
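While waiting on a proper fix, reclaiming the log space itself is straightforward; the sketch below assumes a stock Raspbian setup where logs end up in /var/log and/or the systemd journal, which may not match every install:

```bash
# Find which log files are actually growing.
sudo du -sh /var/log/* | sort -h | tail

# If they are plain syslog files, truncate the worst offenders in place
# (truncation keeps the file handle valid for rsyslog).
sudo truncate -s 0 /var/log/syslog /var/log/daemon.log

# If logging goes through the systemd journal instead, cap its size.
sudo journalctl --vacuum-size=200M
```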
Mine started doing this again (continuously for the last three days before I noticed). Restarting InfluxDB didn't help. On an off chance I tried updating InfluxDB (from 1.3.5 to 1.4.2), and things were looking hopeful when influxd restarted (i.e. it didn't error straight after startup). A few minutes later I got a couple of similar-looking error messages which don't sound good, although it's not continuing to spam the logs, so maybe InfluxDB has got itself into a happier place:
And it's doing it again...
@jwilder any tips on how to sort this out?
Having a similar issue.
Started out of the blue after running without a hitch for over a year. Please advise.
@Redferne Are you still seeing this issue?
Omg, forgot about this. No, my problem was caused by running influxd on a 32-bit system and the database growing bigger than x GByte. A warning would have been nice, though.
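For anyone wanting to check whether they are in the same situation, a quick sketch (paths assume a default package install) that shows whether the running influxd is a 32-bit build and how much data each database holds:

```bash
# Is the userland / influxd binary 32-bit or 64-bit?
getconf LONG_BIT
file "$(command -v influxd)"

# How big is each database on disk? (default data directory assumed)
du -sh /var/lib/influxdb/data/*
```

On a 32-bit build the process has far less virtual address space to mmap TSM files into, which fits the "cannot allocate memory" error appearing once the data grows.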
@Redferne Thanks. Going to close this for now. We'll address it in the future if it turns out to be a frequent problem for users.
Just for future reference for anyone coming across this: the problem went away for me on my Raspberry Pi 3 after I swapped to a larger / faster SD card. Possibly once compactions were triggered they caused more I/O than the SD card could keep up with, and the I/O kept accumulating faster than it could be serviced.
@dgnorton It seems I spoke too soon. Mine started doing this again on 3rd December. This is so frustrating, I think I'm just going to delete InfluxDB altogether.
Suddenly started to occur with no visible reason:
Removing the latest TSM files solves this issue only for some days. During the occurrence of the error there is more than enough free memory available. Any hints?
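A narrower variant of that cleanup, removing only the leftover temporary compaction files rather than whole TSM files, would look roughly like the sketch below. Stop the daemon and back up /var/lib/influxdb first; this works around the symptom, not the underlying allocation failure:

```bash
# Sketch: clear leftover temporary compaction files instead of whole TSM files.
sudo systemctl stop influxdb

# Inspect what would be removed, then remove it.
sudo find /var/lib/influxdb/data -name '*.tsm.tmp' -print
sudo find /var/lib/influxdb/data -name '*.tsm.tmp' -delete

sudo systemctl start influxdb
```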
DB size? You are running on an RPi, right? I had this problem; moving the database to a 64-bit system solved the issue...
On Wed, 19 Dec 2018 at 14:02, Guenther Schreiner (<[email protected]>) wrote:
… Suddenly started to occur with no visible reason:
influxd[19485]: ts=2018-12-19T12:54:01.007492Z lvl=info msg="Error replacing new TSM files" log_id=0CTdI0HW000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0CTdJ3~l000 op_name=tsm1_compact_group error="cannot allocate memory"
influxd[19485]: ts=2018-12-19T12:54:02.969238Z lvl=info msg="Aborted compaction" log_id=0CTdI0HW000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0CTdKTr0000 op_name=tsm1_compact_group error="compaction in progress: open /var/lib/influxdb/data/openhab/openhab_retention/1673/000000276-000000003.tsm.tmp: file exists"
The existing files do not point to any obvious cause:
-rw-r--r-- 1 influxdb influxdb 6118411 Dec 13 21:14 000000131-000000004.tsm
-rw-r--r-- 1 influxdb influxdb 6177033 Dec 18 06:52 000000260-000000004.tsm
-rw-r--r-- 1 influxdb influxdb 402900 Dec 16 15:03 000000268-000000002.tsm
-rw-r--r-- 1 influxdb influxdb 402065 Dec 16 19:59 000000276-000000002.tsm
-rw-r--r-- 1 influxdb influxdb 12771613 Dec 19 13:54 000000276-000000003.tsm.tmp
-rw-r--r-- 1 influxdb influxdb 31047 Dec 11 00:00 fields.idx
Removing the latest TSM files solves this issue only for some days. During the occurrence of the error there is more than enough free memory available. Any hints?
1.2G /var/lib/influxdb/data
Hmm, you're pointing in a direction I really do not like. Still hoping that a developer raises a hand and tells us that this is not the case!
I'm suddenly seeing this on a (32-bit) Raspberry Pi 3 B+, consistent with the hypothesis that it started once one DB grew too large.
influx was running fine for weeks, but is now in a crash loop every few minutes, generating a ton of logspam. A histogram of the log messages shows that 7512 of the 7521 entries are the same error. The process seems to die during compaction, with the "cannot allocate memory" messages followed by a bunch of stacktrace/heapdump. Overall memory use on the system doesn't seem particularly high when this is happening (and there's plenty of swap space available), so I'm not sure why the 8 KB memory allocation is failing.
I worked around this issue by wiping out / backing up the existing data for the affected database. After this, my instance started cleanly again. Of course, this means I can only ever have Influx keep the last few weeks of my data.
Cross-linking #10486, which seems like a canonical discussion of this issue.
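In case it helps anyone else, the shape of that workaround is roughly the sketch below. The database name is a placeholder, the paths assume a default package install, and moving these directories aside discards that database's data, so copy them somewhere safe first:

```bash
# Sketch of the workaround: move the problem database's data and WAL aside
# so influxd can start with a clean slate. "mydb" is a placeholder name.
sudo systemctl stop influxdb
sudo mv /var/lib/influxdb/data/mydb /var/lib/influxdb/data/mydb.bak
sudo mv /var/lib/influxdb/wal/mydb  /var/lib/influxdb/wal/mydb.bak
sudo systemctl start influxdb
```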
It is awfully quiet in #12362. What is InfluxData's policy on including fixes like that in a release?
I haven't seen any response from the Influx team regarding the PR. I made it a PR off the 1.8 branch because I don't want to adopt version 2.0 any time soon, but I think 2.0 is what the Influx team is focusing on right now. Honestly, it wouldn't be hard for someone familiar with the 2.0 codebase to port the fixes into 2.0, assuming they aren't in there already. I'll request an update on the PR thread.
2.0 won't ever support 32-bit systems, so we're kind of at a dead end, aren't we?
Bug report
System info: InfluxDB 0.13.0-1, Linux softsw69 2.6.34.10-0.6-desktop #1 SMP PREEMPT 2011-12-13 18:27:38 +0100 x86_64 x86_64 x86_64 GNU/Linux (Suse Linux)
Steps to reproduce:
After almost two days of writing around 5,000 metrics per second, InfluxDB crashes. When I start the InfluxDB process again, I get this error in the log:
What is happening?
I know that there is still space on disk.
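For anyone trying to reproduce a comparable write rate, a stripped-down sketch using the HTTP line-protocol /write endpoint is below; the database, measurement and tag names are made up, and this is not necessarily how the original write load was generated:

```bash
# Sketch: sustained batched writes against the /write endpoint.
# "loadtest" and the "cpu" measurement are placeholders.
curl -s -XPOST 'http://localhost:8086/query' \
  --data-urlencode 'q=CREATE DATABASE loadtest'

# Each pass sends a batch of 5000 points; this only roughly approximates
# a 5,000 points-per-second write rate.
while true; do
  for i in $(seq 1 5000); do
    echo "cpu,host=host$((i % 100)) value=$RANDOM"
  done | curl -s -XPOST 'http://localhost:8086/write?db=loadtest' --data-binary @-
done
```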