Large database support for 32bit systems #12362

fluffynukeit · 2019-03-05T16:54:25Z

Briefly describe your proposed changes:

Rebased/mergeable
Tests pass
[] http/swagger.yml updated (if modified Go structs or API) <- is this relevant to 1.8 branch?
Sign CLA (if not already signed)

This PR implements a fix to allow 32bit systems to have large databases. Most of the discussion on #10486 still applies, but the implemented fix is different than my original attempt, and much cleaner, I think.

The implementation makes changes to the mmapAccessor in reader.go and supporting files.

Changes the role of the accessor interface. The interface is modified to only be relevant to accessing the block data of the TSM file (previously m.b), and not other things like access locking.
One implementation of the new interface, mmapAccessor (same name as previously) implements the existing method of mmapping TSM block data into the process address space.
Another implementation of the new interface, seekAccessor, uses seek/read operations to access the TSM block data. This eliminates much of the mmapping, preventing out of memory errors on large databases.
A new configuration property tsm-use-seek, default false, can be specified to use the seek accessor instead of the mmap accessor.
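
To make the split described above concrete, here is a minimal sketch of the idea, not the code from this PR. The names `blockAccessor`, `memAccessor`, `readBlock`, `newAccessor` and `demo` are hypothetical, and the in-memory variant uses a plain byte slice as a portable stand-in for a real mmap. The point is only that callers ask for a block by offset and size, and a flag decides whether the bytes come from memory or from a positioned file read.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// blockAccessor is a hypothetical stand-in for the reworked accessor
// interface: it only knows how to return the bytes of one block.
type blockAccessor interface {
	readBlock(offset int64, size int) ([]byte, error)
}

// memAccessor mimics the mmap approach: the whole file is addressable as one
// byte slice (a plain slice here, not a real mmap, to keep the sketch portable).
type memAccessor struct{ b []byte }

func (m *memAccessor) readBlock(offset int64, size int) ([]byte, error) {
	if offset < 0 || size < 0 || offset+int64(size) > int64(len(m.b)) {
		return nil, io.ErrUnexpectedEOF
	}
	return m.b[offset : offset+int64(size)], nil
}

// seekAccessor reads each block from the file on demand, so only the
// requested block occupies process memory.
type seekAccessor struct{ f *os.File }

func (s *seekAccessor) readBlock(offset int64, size int) ([]byte, error) {
	if offset < 0 || size < 0 {
		return nil, io.ErrUnexpectedEOF
	}
	buf := make([]byte, size)
	// ReadFull fails if fewer than size bytes are available at offset.
	if _, err := io.ReadFull(io.NewSectionReader(s.f, offset, int64(size)), buf); err != nil {
		return nil, err
	}
	return buf, nil
}

// newAccessor plays the role of the tsm-use-seek switch in this sketch.
func newAccessor(path string, useSeek bool) (blockAccessor, func() error, error) {
	if useSeek {
		f, err := os.Open(path)
		if err != nil {
			return nil, nil, err
		}
		return &seekAccessor{f: f}, f.Close, nil
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return nil, nil, err
	}
	return &memAccessor{b: b}, func() error { return nil }, nil
}

// demo writes a small temp file and reads the same block through both accessors.
func demo() (string, string, error) {
	f, err := os.CreateTemp("", "tsm-sketch-*")
	if err != nil {
		return "", "", err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("headerBLOCKfooter"); err != nil {
		f.Close()
		return "", "", err
	}
	if err := f.Close(); err != nil {
		return "", "", err
	}
	out := make([]string, 0, 2)
	for _, useSeek := range []bool{false, true} {
		acc, closeFn, err := newAccessor(f.Name(), useSeek)
		if err != nil {
			return "", "", err
		}
		b, err := acc.readBlock(6, 5)
		closeFn()
		if err != nil {
			return "", "", err
		}
		out = append(out, string(b))
	}
	return out[0], out[1], nil
}

func main() {
	viaMem, viaSeek, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(viaMem, viaSeek)
}
```

In this sketch the seek-based variant allocates a fresh buffer per block, which trades some copying for not needing the whole file in the address space.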

I ran the same tests described in #10486. With tsm-use-seek = false, I encountered the same out of memory errors when trying to run compaction jobs at the 3.6GB file limit:

3.6G    /LASERINFO/influxdb/data/stress/autogen/
174.2M  /LASERINFO/influxdb/data/stress/autogen/2
371.6M  /LASERINFO/influxdb/data/stress/autogen/3
839.7M  /LASERINFO/influxdb/data/stress/autogen/4
851.2M  /LASERINFO/influxdb/data/stress/autogen/5
768.5M  /LASERINFO/influxdb/data/stress/autogen/6
635.5M  /LASERINFO/influxdb/data/stress/autogen/7
8.0K    /LASERINFO/influxdb/data/stress/autogen/8

I then killed influxd, set tsm-use-seek=true, and continued the test. The compaction jobs that had failed started running immediately, and eventually the total size of the DB far exceeded the 3.6 GB limit. Note again that, as in #10486, the peak VM memory usage is less than 1 GB.

root@plnx_arm:~# du -hs /LASERINFO/influxdb/data/stress/autogen/ && du -hs /LASERINFO/influxdb/data/stress/autogen/* && cat /proc/1153/status && tail -n 4 /LASERINFO/dan_influxlog_seek
5.2G    /LASERINFO/influxdb/data/stress/autogen/
284.7M  /LASERINFO/influxdb/data/stress/autogen/10
325.7M  /LASERINFO/influxdb/data/stress/autogen/11
416.6M  /LASERINFO/influxdb/data/stress/autogen/12
438.7M  /LASERINFO/influxdb/data/stress/autogen/13
448.8M  /LASERINFO/influxdb/data/stress/autogen/14
452.3M  /LASERINFO/influxdb/data/stress/autogen/15
420.4M  /LASERINFO/influxdb/data/stress/autogen/16
136.5M  /LASERINFO/influxdb/data/stress/autogen/17
174.2M  /LASERINFO/influxdb/data/stress/autogen/2
371.6M  /LASERINFO/influxdb/data/stress/autogen/3
419.6M  /LASERINFO/influxdb/data/stress/autogen/4
425.3M  /LASERINFO/influxdb/data/stress/autogen/5
404.4M  /LASERINFO/influxdb/data/stress/autogen/6
415.3M  /LASERINFO/influxdb/data/stress/autogen/7
8.2M    /LASERINFO/influxdb/data/stress/autogen/8
221.9M  /LASERINFO/influxdb/data/stress/autogen/9
Name:   influxd
Umask:  0022
State:  S (sleeping)
Tgid:   1153
Ngid:   0
Pid:    1153
PPid:   939
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0
VmPeak:   924972 kB <----------------------- less than 1GB
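
For anyone who wants to repeat this check without reading the status dump by eye, the following is a small sketch that pulls the VmPeak figure out of a /proc status file. The helper name `parseVmPeakKB` is made up for this example, and the sketch assumes a Linux system where /proc/self/status exists.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmPeakKB extracts the VmPeak value (in kB) from the text of a
// /proc/<pid>/status file. It returns an error if no VmPeak line is present.
func parseVmPeakKB(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "VmPeak:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmPeak not found")
}

func main() {
	// Linux only: inspect the current process; use /proc/<pid>/status for influxd.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	kb, err := parseVmPeakKB(string(data))
	if err != nil {
		fmt.Println("cannot parse status:", err)
		return
	}
	fmt.Printf("VmPeak: %d kB (%.2f GiB)\n", kb, float64(kb)/(1024*1024))
}
```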

I was also able to query influxd using the influx client while running the test without any problems. In my previous fix attempt, querying (even if no data was being written) could cause memory faults and kill the process.

This fix should help ease a consistent pain point with InfluxDB running on 32bit systems (like RPi or Zynq), in which it works great for long periods of time but then hits a wall at a limit of around 3.6 GB (depending on the system).

fluffynukeit · 2019-04-08T15:40:00Z

Hi, Influx team. Is there any plan to merge this PR into the 1.8 branch? I think there is sufficient community interest. If something is holding you back from bringing it in, please advise me on what you'd need to see before getting it merged. Thanks.

pinkynrg · 2019-05-31T16:29:08Z

Would anyone be so kind as to reply, so we all have an idea of what the Influx team plans to do about this issue?

timhallinflux · 2019-06-06T18:16:25Z

@fluffynukeit -- wow, you really dug in here and we appreciate the contribution! I'm wondering if you would be open to discussing this in a bit more detail? DM me... tim at influxdata.com

fluffynukeit · 2019-06-06T18:36:26Z

@timhallinflux Will do.

r00sta · 2019-06-17T12:49:22Z

@fluffynukeit @timhallinflux any news on if this will be merged into 1.8?

fluffynukeit · 2019-06-17T15:54:54Z

Tim contacted me privately with some questions regarding this PR. That was 11 days ago. I responded to the questions promptly but have not heard back since then, so honestly I don't know for sure what InfluxData is planning to do with this PR or 32 bit support. That's my broad disclaimer for what follows.

I don't think it's appropriate for me to copy/paste a private exchange, so I'll merely try to summarize the salient points from my POV. If I were to extrapolate based on my exchange with Tim, my guess is that Influx will not be merging these changes into version 1.8, if ever. The reason is that they believe the changes in my PR present a lot of risk to the whole influx community, and they are not equipped to test 32 bit platforms to the same rigor as they do 64 bit platforms in order to mitigate that risk. Getting all that testing infrastructure set up would take a lot of work that they can't take on right now because they are focused on the 2.0 release. This could possibly change if there was a significant commercial interest in getting better 32 bit support, but I could not offer any such business opportunity to him from my side. I tried to argue that various IoT applications, like the ones we are almost all dealing with on 32 bit platforms, present a good market opportunity to Influx because Influx's features are particularly suited for such small platforms (aside from the mmap constraint this PR tries to remedy). I also argued that they wouldn't need to test 32 bit platforms exhaustively; they could merely verify that this PR does not break 64 bit systems, and relegate 32 bit to second-class status. I think we would all accept that in lieu of doing nothing. My hope is that they are still considering my input, but it has indeed been a long silence since then (silence toward the community regarding this issue was one thing I mentioned I hope they improve on).

My advice is simply that 32 bit users on InfluxDB should assume we are on our own until we hear something concrete from Influx. I, however, switched companies recently and am no longer working on an application that uses Influx, so I won't be able to make changes or adjustments to this PR in response to Influx's feedback. Anyone else is 100% welcome to carry the torch instead, of course.

aemondis · 2019-09-08T11:22:05Z

For anyone still following this particular issue, specifically when running on an IoT device such as a Raspberry Pi with Raspbian (a 32 bit OS): if you get this dreaded error, the only solution is to load up the database on another 64-bit device just to run the compaction periodically, then transfer it back. It somewhat defeats the benefits of a small IoT device, but it very much seems that InfluxDB are just not interested, at this time, in the next wave of IoT capabilities that are beginning to emerge. In short... don't waste too much time trying to get InfluxDB running on a 32-bit platform with even a moderate size DB, as it will inevitably fail with this issue.

Thanks to @fluffynukeit for his fantastic work in hunting out the root cause of the issue, as it was no doubt a tough one to find. I hope that InfluxDB change their mind in the future on this issue, but the lack of any feedback or merge suggests this fix is probably dead in the water, unless someone has the time and commitment to make a fork for the IoT/32-bit community.

truekinetix · 2019-09-18T16:04:57Z

It is one thing not to want to take this change (that would save us) for business reasons, but why would InfluxData just ignore fluffynukeit and the other 32-bit users of Influx?

And will InfluxData now put a warning on their web pages, so future implementers will know what to expect if they choose to use InfluxDB for their devices?

stale · 2019-12-17T16:33:33Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

godidog · 2019-12-18T08:47:59Z

It is stale because Influx are not going to address the issue.
It is one thing to make a business decision not to support 32-bit systems.
But it is quite another to choose to ignore the issue and make no comment on it.
And there is still no warning that you can compile influx for 32-bit systems but it is unusable on them.
Poor show, Influx Data, poor show.

JsBergbau · 2020-01-08T11:19:45Z

@fluffynukeit: Thank you very much for your work in investigating this issue. I will also run into that limit in about two years if nothing changes by then.

There is now a 64Bit kernel for RaspberryPI available https://www.raspberrypi.org/forums/viewtopic.php?t=250730
Focus is mainly on RPI4 but it also runs on a RPI3

Would enabling the 64Bit kernel resolve this issue? I guess not, because the provided influxdb package is still 32 Bit.
If influxdb is not willing to include your fix, perhaps it is easier to compile for the 64 Bit Raspbian kernel. Or perhaps work could be done at the Raspbian foundation to include a 64 Bit influxdb in their official repository, since it solves the problems of the 32 Bit version.

aemondis · 2020-01-08T13:40:41Z

@JsBergbau I am running the 64-bit kernel on a RPI4 with 4GB RAM, and can confirm that it does indeed resolve the issue. I suspect the issue is more a limitation of the OS's ability to address large chunks of memory, as opposed to within the process itself.

fluffynukeit · 2020-01-08T15:49:39Z

@JsBergbau In my application, I was not using an RPi but a Xilinx Zynq. I am no longer with the company working on that project, though. I will forward the possibility to the current team, however. Thanks.

mattelacchiato · 2020-05-25T04:49:11Z

This is really sad since most IoT applications are indeed running on 32bit...

simonvetter · 2020-06-14T16:16:49Z

I can confirm these commits fix the issue on 32-bit systems (ARM, odroid C1 running armbian).

I started hitting the same "out of memory" compaction issue as everyone else as soon as my DB came close to 3GB. I've re-applied those commits on top of the 1.8 release branch, rebuilt influxDB and my compaction issues went away.

This particular instance has been running without issue for about 10 days now. I haven't noticed any increase in CPU/memory usage, memory leak, or even impact on performance.

I've pushed my branch at https://github.com/simonvetter/influxdb/tree/1.8.0+big_db_32bit for anyone interested.

Thanks @fluffynukeit for the nice work!

aemondis · 2020-06-16T13:29:36Z

Awesome work @simonvetter for pulling together a 32-bit branch for Influx with that fix. I remember going through that solution by @fluffynukeit and wishing the InfluxDB team would even bother to read threads such as this, as it's an elegantly simple fix for a major issue impacting a portion of their user base. The fix is of low risk (it only affects anything IF you enable the option!), and could potentially even scale up the capability of the 32 or 64 bit deployments immensely in certain scenarios.

It boggles the mind that InfluxDB don't even respond to all the fantastic analysis and solutions in this thread. The best we can hope for is it helps others who stumble upon this thread.

godidog · 2020-06-16T13:41:48Z

Normal business decision: Not accepting a bugfix because it might affect something else somewhere (possibly).

Unreasonable behaviour: Ignoring a bugfix that affects many many users, AND the discussion of that bugfix, and abandoning an ongoing discussion with the developer.

Very disappointing, influxdata, this does not inspire trust in your company and it led to my company abandoning influx for our consumer device.

"Run away !"

mattelacchiato · 2020-06-17T07:11:10Z

I've migrated to VictoriaMetrics for now. Works like a charm, even with "large" databases =)

tablatronix · 2021-01-05T04:41:22Z

This is absurd. Is there at least a failure mode we can enable for this, so tsm doesn't just retry over and over and kill your env?
Fail fatal, or a retry limit?

I would not merge this, it is too large a change, but if these methods can be refactored first or wrapped, then this PR might get smaller and get attention?

bgilmore-iot · 2021-06-24T16:21:34Z

Hey everyone- My colleague Russ just pointed me at this thread. I’m (brand) new to InfluxData and am the Product Manager responsible for IOT. I’m currently putting together the go-forward strategy and greatly appreciate the info outlined here. Would welcome direct conversation on this and any other IoT related challenges or requirements - community slack works well or feel free to send me an email. Looking forward to collaborating! Thanks so much,

Brian Gilmore

simonvetter · 2021-08-12T19:48:27Z

Rebased on top of 1.8.9, code available at https://github.com/simonvetter/influxdb/tree/1.8.9+big_db_32bit .

As usual, tests pass and I haven't noticed any issue on my test systems.

See my comment above for build instructions and configuration.

Napfton · 2021-09-29T06:30:01Z

> Rebased on top of 1.8.9, code available at https://github.com/simonvetter/influxdb/tree/1.8.9+big_db_32bit .
>
> As usual, tests pass and I haven't noticed any issue on my test systems.
>
> See my comment above for build instructions and configuration.

I am failing with

can't load package: package cmd/influx_tsm/*.go is not in GOROOT (/usr/local/go/src/cmd/influx_tsm/*.go)

on this line

go build -o build/influx_tsm cmd/influx_tsm/*.go

From the git log it says influx_tsm got removed, so I guess we can skip that step as well?

EDIT: successful build and run without the _tsm line.

simonvetter · 2021-10-02T14:32:56Z

Yeah, influx_tsm was removed in 1.8.9, so

go build -o build/influx_tsm cmd/influx_tsm/*.go

should be omitted from my build instructions.

jensb · 2021-10-26T19:11:47Z

So, if my IoT platform is a Raspbi 4b with 4GB RAM,

to run a supported version of Influx with a database > 3GB, I have to run Influx v2?
Does this mean that I have to install a 64bit OS, since activating the 64bit kernel in Raspbian is not sufficient?

Also, will the RAM requirements of Influx 2 be lower than of Influx 1 with larger datasets?

Or will I be better off applying this patch, and staying with 1.8?
(In this case, @simonvetter, would you rebase your patch on 1.8.10, which was released recently?)

simonvetter · 2021-10-27T17:03:12Z

Yep, I just rebased on top of 1.8.10, code available at https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit .

As usual, tests pass and I haven't noticed any issue on my systems so far.

TL;DR build instructions:

git clone https://github.com/simonvetter/influxdb.git
cd influxdb
git checkout origin/1.8.10+big_db_32bit
mkdir build
# target 32-bit ARM architectures, use GOARCH=386 for 32-bit Intel/AMD
export GOARCH=arm
go build -o build/influx_stress cmd/influx_stress/*.go
go build -o build/influx_tools cmd/influx_tools/*.go
go build -o build/influx_inspect cmd/influx_inspect/*.go
go build -o build/influxd cmd/influxd/main.go
go build -o build/influx cmd/influx/main.go

Your 32-bit binaries will be in the build directory, feel free to move them anywhere you like.

I suggest using the following settings in your influxdb config file:

[data]
    tsm-use-seek = true # use the seek accessor (what this whole PR is about)
    index-version = "tsi1" # use on-disk index files instead of keeping them in-memory
    max-concurrent-compactions = 1 # avoid running multiple compactions at once

[coordinator]
  query-timeout = "60s" # kill runaway queries (you may want to adjust the value depending on your hardware and query set)

jensb · 2021-10-28T22:15:55Z

Great! I crosscompiled it on an Intel i5 desktop, installed it, and boom - memory requirements were down from 3.6G to just about 1G, resident size between 400M and 800M. Running without issues so far.

Thank you!!

Napfton · 2021-10-30T11:04:02Z

Sadly, ever since I started using this custom code for influxdb, I have been getting system freezes/crashes once a week on a raspi that had been running for 2 years without a single reboot.

Logs are (to me) inconclusive.

Version 2 of influxdb cannot be used on 32 bit systems, right?

jensb · 2021-10-30T12:54:54Z

No issues so far on my Raspi 4b (4GB, 32GB flash).
What is your model?

Napfton · 2021-10-30T13:24:35Z

3B+. I'm going to switch to a 64bit kernel soon to mitigate this.

tablatronix · 2021-10-30T15:00:13Z

I switched to the Raspbian 64bit OS a while ago and it's been good.

simonvetter · 2021-11-01T00:19:32Z

If your hardware is 64bit capable then you should definitely switch to a 64bit kernel and use influx 2.x releases.

I've had a few instances running this branch for a few days without issue. Let me know if you can get your hands on a crash log and I'll investigate.

jensb · 2021-11-01T06:22:54Z

I am planning to do this as well, but in the process I will also switch to SSD storage (no uSD card, it's getting too slow), because going 64bit will mean having to reinstall everything anyway.
The problem is that during this transition, our house (read: my family) will have to live without various switches and features. The Raspi running Influx is an integral part of our smart home concept. This definitely needs the right timing. :-)
The big question is whether on the new system I will keep Influx, or use the opportunity to switch to a real database (Timescale DB based on Postgres). There are a few queries and analyses that Flux doesn't seem to be able to do at all. We'll see ...

chrisuki · 2021-12-15T20:49:55Z

@simonvetter Thanks a lot. I could build that on my Raspberry PI 4B, and no more errors at the moment.
EDIT 21.12.2021: After a few days, unfortunately, the "Cannot allocate memory" error is back...

OZ1SEJ · 2022-01-13T14:52:19Z

After compiling, when I run $ sudo /home/pi/influxdb/build/influxd I start getting memory errors (Raspberry Pi 3B, fresh install) and the program ultimately terminates.

lesam · 2022-03-08T18:29:17Z

We no longer support 32-bit systems in the mainline repo - there is a community fork that maintains 32 bit compatibility.

See also https://www.influxdata.com/blog/influxdb-oss-and-enterprise-roadmap-update-from-influxdays-emea/ , https://github.com/simonvetter/influxdb/ .

Closing this PR as it will not be merged.

baerengraben · 2022-06-01T10:16:47Z

@simonvetter: Applied this fix on a 32bit system (hc2) with armbian. It worked like a charm. Thank you!!!

simonvetter · 2022-06-01T16:41:03Z

Thanks, I'm glad it helped!

lmarceg · 2023-01-23T18:46:59Z

@simonvetter , you rock!
I have a raspi 3b and I was facing the same OOM error; and now, BOOM! It all works!
I just have a couple of comments and one question:

I didn't have git so I easily installed it with apt
I didn't have go, so I installed golang with apt, but the version you get from there is too old and three binaries will fail to build for this reason. Therefore I suggest looking up how to download the latest version from the Go website and using that one before compiling the repo

That said, my question is the following: now that I have all those files in the build folder, which one can I substitute?
Can I overwrite influx, influxd, influx_inspect, stress and tools or will I lose something in the DB if I do like that?
I tested influxd and it all worked, not sure I need also the other commands.
But before causing a disaster when I am 99% done, I think I'd better ask.

Thanks again!
Luca

raoulbhatia · 2023-04-02T06:46:12Z

FYI I created an updated Docker image for influxdb-arm32 based on the following code trail.
(based on the latest influxdb:1.8.10 Docker image with Debian 11/Bullseye)

I published the Docker image at https://hub.docker.com/repository/docker/raoulbhatia/influxdb-arm32

Feedback welcome

thomasjungblut · 2023-06-18T13:28:26Z

In case anyone needs a newly rebuilt version of the branch with golang 1.20.5 and without dockerhub login:
https://quay.io/repository/thomasjungblut/influxdb-arm32?tab=info

podman pull quay.io/thomasjungblut/influxdb-arm32:287e3ed6

thalesmaoa · 2023-10-11T21:52:13Z

If I simply change the core image, will it work, or do I have to remap my database?

Thx @simonvetter , @raoulbhatia and @thomasjungblut .

EDIT: It worked.
https://gist.github.com/thalesmaoa/a707257ddb0113b7b343fae3ca608199

albertoanta · 2024-06-12T09:21:26Z

> FYI I created an updated Docker image for influxdb-arm32 based on the following code trail. (based on the latest influxdb:1.8.10 Docker image with Debian 11/Bullseye)
> 1. https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit
> 2. https://github.com/terjesannum/docker-influxdb-arm32
> 3. https://github.com/noelleehk/docker-influxdb-arm32
> 4. https://github.com/raoulbhatia/docker-influxdb-arm32/
>
> I published the Docker image at https://hub.docker.com/repository/docker/raoulbhatia/influxdb-arm32
>
> Feedback welcome

Thanks a lot. My odroid xu4 began to throw memory allocation errors in the influxdb container. I just deployed the patched image/container and everything is OK. The memory footprint is also very small.

fluffynukeit added 5 commits February 27, 2019 15:01

95dccc7 Changed mmapAccessor to just accessor. Modified blockAccessor interface to soon work for block byte data.

ebf10c3 Moved mmap functions into blockAccessor interface.

578ddfd Added tsm-use-seek option for enabling seek-based blockAccessor, not yet implemented.

570d084 Added seekAccessor, which uses fadvise also. All tests pass for both mmapAccessor and seekAccessor. Seems to have comparable performance.

aa257f7 Added tsm-use-seek and description to sample config file.

This was referenced Mar 5, 2019

Support for large databases on 32 bit embedded systems #10486

Closed

error compacting TSM files: cannot allocate memory #6975

Closed

wollew mentioned this pull request Apr 10, 2019

Compaction crash loops and data loss on Raspberry Pi 3 B+ under minimal load #11339

Closed

stale bot added the wontfix label Dec 17, 2019

stale bot removed the wontfix label Dec 18, 2019

JsBergbau mentioned this pull request Jan 20, 2020

influxdb persistence: enable time resolution of 1 second openhab/openhab1-addons#5933

Open

JsBergbau mentioned this pull request Feb 7, 2022

Please add support for Prometheus and VictoriaMetrics database: Support for PromQL and MetricsQL dbeaver/dbeaver#15412

Open

lesam closed this Mar 8, 2022

Setcover added a commit to Setcover/smarthome that referenced this pull request Mar 27, 2023

32bit bugfix for influxdb

59be744

See influxdata/influxdb#12362

Setcover added a commit to Setcover/smarthome that referenced this pull request Mar 27, 2023

32bit bugfix for influxdb

7e79a8c

See influxdata/influxdb#12362
