Large database support for 32bit systems #12362

Closed
wants to merge 5 commits into from


fluffynukeit

Closes #10486

Briefly describe your proposed changes:

  • Rebased/mergeable
  • Tests pass
  • [] http/swagger.yml updated (if modified Go structs or API) <- is this relevant to 1.8 branch?
  • Sign CLA (if not already signed)

This PR implements a fix to allow 32-bit systems to have large databases. Most of the discussion on #10486 still applies, but the implemented fix differs from my original attempt and is, I think, much cleaner.

The implementation makes changes to the mmapAccessor in reader.go and supporting files.

  • Changes the role of the accessor interface. The interface is modified to only be relevant to accessing the block data of the TSM file (previously m.b), and not other things like access locking.
  • One implementation of the new interface, mmapAccessor (same name as previously) implements the existing method of mmapping TSM block data into the process address space.
  • Another implementation of the new interface, seekAccessor, uses seek/read operations to access the TSM block data. This eliminates much of the mmapping, preventing out of memory errors on large databases.
  • A new configuration property tsm-use-seek, default false, can be specified to use the seek accessor instead of the mmap accessor.

I ran the same tests described in #10486. With tsm-use-seek = false, I encountered the same out of memory errors when trying to run compaction jobs at the 3.6GB file limit:

3.6G    /LASERINFO/influxdb/data/stress/autogen/
174.2M  /LASERINFO/influxdb/data/stress/autogen/2
371.6M  /LASERINFO/influxdb/data/stress/autogen/3
839.7M  /LASERINFO/influxdb/data/stress/autogen/4
851.2M  /LASERINFO/influxdb/data/stress/autogen/5
768.5M  /LASERINFO/influxdb/data/stress/autogen/6
635.5M  /LASERINFO/influxdb/data/stress/autogen/7
8.0K    /LASERINFO/influxdb/data/stress/autogen/8

I then killed influxd, set tsm-use-seek=true, and continued the test. Compaction jobs that had failed started running immediately, and eventually the total size of the DB far exceeded the 3.6 GB limit. As in #10486, the peak virtual memory usage (VmPeak) stays below 1 GB.

root@plnx_arm:~# du -hs /LASERINFO/influxdb/data/stress/autogen/ && du -hs /LASERINFO/influxdb/data/stress/autogen/* && cat /proc/1153/status && tail -n 4 /LASERINFO/dan_influxlog_seek
5.2G    /LASERINFO/influxdb/data/stress/autogen/
284.7M  /LASERINFO/influxdb/data/stress/autogen/10
325.7M  /LASERINFO/influxdb/data/stress/autogen/11
416.6M  /LASERINFO/influxdb/data/stress/autogen/12
438.7M  /LASERINFO/influxdb/data/stress/autogen/13
448.8M  /LASERINFO/influxdb/data/stress/autogen/14
452.3M  /LASERINFO/influxdb/data/stress/autogen/15
420.4M  /LASERINFO/influxdb/data/stress/autogen/16
136.5M  /LASERINFO/influxdb/data/stress/autogen/17
174.2M  /LASERINFO/influxdb/data/stress/autogen/2
371.6M  /LASERINFO/influxdb/data/stress/autogen/3
419.6M  /LASERINFO/influxdb/data/stress/autogen/4
425.3M  /LASERINFO/influxdb/data/stress/autogen/5
404.4M  /LASERINFO/influxdb/data/stress/autogen/6
415.3M  /LASERINFO/influxdb/data/stress/autogen/7
8.2M    /LASERINFO/influxdb/data/stress/autogen/8
221.9M  /LASERINFO/influxdb/data/stress/autogen/9
Name:   influxd
Umask:  0022
State:  S (sleeping)
Tgid:   1153
Ngid:   0
Pid:    1153
PPid:   939
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0
VmPeak:   924972 kB <----------------------- less than 1GB
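
For anyone who wants to repeat this check programmatically rather than by reading the output of cat, here is a minimal Go sketch that extracts VmPeak from the text of a /proc/<pid>/status file. The helper name vmPeakKB is hypothetical, and the sample string is abbreviated from the output above.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// vmPeakKB extracts the VmPeak value (in kB) from the text of a
// /proc/<pid>/status file. The helper name is hypothetical.
func vmPeakKB(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// A matching line looks like: "VmPeak:   924972 kB"
		if len(fields) >= 2 && fields[0] == "VmPeak:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("VmPeak not found")
}

func main() {
	sample := "Name:   influxd\nPid:    1153\nVmPeak:   924972 kB\n"
	kb, err := vmPeakKB(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// 1 GiB is 1<<20 kB.
	fmt.Printf("peak virtual memory: %d kB (under 1 GiB: %t)\n", kb, kb < 1<<20)
}
```

On a live system the input would come from reading /proc/<pid>/status for the influxd process instead of the sample string.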

I was also able to query influxd using the influx client while running the test without any problems. In my previous fix attempt, querying (even if no data was being written) could cause memory faults and kill the process.

This fix should help ease a consistent pain point with InfluxDB running on 32-bit systems (like RPi or Zynq), on which it works great for long periods of time but then hits a wall at a size limit of, e.g., 3.6 GB (depending on the system).

@fluffynukeit
Author

Hi, Influx team. Is there any plan to merge this PR into the 1.8 branch? I think there is sufficient community interest. If something is holding you back from bringing it in, please advise me on what you'd need to see before getting it merged. Thanks.

@pinkynrg

Would anyone be so kind as to reply, so we all have an idea of what the Influx team plans to do about this issue?

@timhallinflux
Contributor

@fluffynukeit -- wow, you really dug in here and we appreciate the contribution! I'm wondering if you would be open to discussing this in a bit more detail? DM me... tim at influxdata.com

@fluffynukeit
Author

@timhallinflux Will do.

@r00sta

r00sta commented Jun 17, 2019

@fluffynukeit @timhallinflux any news on if this will be merged into 1.8?

@fluffynukeit
Author

Tim contacted me privately with some questions regarding this PR. That was 11 days ago. I responded to the questions promptly but have not heard back since then, so honestly I don't know for sure what InfluxData is planning to do with this PR or 32-bit support. That's my broad disclaimer for what follows.

I don't think it's appropriate for me to copy/paste a private exchange, so I'll merely try to summarize the salient points from my POV. If I were to extrapolate based on my exchange with Tim, my guess is that Influx will not be merging these changes into version 1.8, if ever. The reason is that they believe the changes in my PR present a lot of risk to the whole influx community, and they are not equipped to test 32 bit platforms to the same rigor as they do 64 bit platforms in order to mitigate that risk. Getting all that testing infrastructure set up would take a lot of work that they can't take on right now because they are focused on the 2.0 release. This could possibly change if there was a significant commercial interest in getting better 32 bit support, but I could not offer any such business opportunity to him from my side. I tried to argue that various IoT applications, like the ones we are almost all dealing with on 32 bit platforms, present a good market opportunity to Influx because Influx's features are particularly suited for such small platforms (aside from the mmap constraint this PR tries to remedy). I also argued that they wouldn't need to test 32 bit platforms exhaustively; they could merely verify that this PR does not break 64 bit systems, and relegate 32 bit to second-class status. I think we would all accept that in lieu of doing nothing. My hope is that they are still considering my input, but it has indeed been a long silence since then (silence toward the community regarding this issue was one thing I mentioned I hope they improve on).

My advice is simply that 32 bit users on InfluxDB should assume we are on our own until we hear something concrete from Influx. I, however, switched companies recently and am no longer working on an application that uses Influx, so I won't be able to make changes or adjustments to this PR in response to Influx's feedback. Anyone else is 100% welcome to carry the torch instead, of course.

@aemondis

aemondis commented Sep 8, 2019

For anyone still following this issue, specifically when running on an IoT device such as a Raspberry Pi with Raspbian (a 32-bit OS): if you get this dreaded error, the only solution is to load the database on a 64-bit device to run the compaction periodically, then transfer it back. That somewhat defeats the benefits of a small IoT device, but it very much seems that InfluxDB is just not interested, at this time, in the next wave of IoT capabilities that are beginning to emerge. In short, don't waste too much time trying to get InfluxDB running on a 32-bit platform with even a moderate-size DB, as it will inevitably fail with this issue.

Thanks to @fluffynukeit for his fantastic work in hunting down the root cause of the issue, as it was no doubt a tough one to find. I hope that InfluxDB changes its mind on this issue in the future, but the lack of any feedback or merge suggests this fix is probably dead in the water, unless someone has the time and commitment to make a fork for the IoT/32-bit community.

@truekinetix

It is one thing not to want to take this change (that would save us) for business reasons, but why would Influx data just ignore fluffynukeit and the other 32-bit users of Influx?

And will Influx Data now put a warning on their web pages, so future implementers will know what to expect if they choose to use InfluxDB for their devices?

@stale

stale bot commented Dec 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 17, 2019
@godidog

godidog commented Dec 18, 2019

It is stale because Influx are not going to address the issue.
It is one thing to make a business decision not to support 32-bit systems.
But it is quite another to choose to ignore the issue and make no comment on it.
And there is still no warning that you can compile Influx for 32-bit systems but it is unusable on them.
Poor show, Influx Data, poor show.

@stale stale bot removed the wontfix label Dec 18, 2019
@JsBergbau

@fluffynukeit: Thank you very much for your work in investigating this issue. I will also run into that limit in about two years if nothing changes by then.

There is now a 64Bit kernel for RaspberryPI available https://www.raspberrypi.org/forums/viewtopic.php?t=250730
Focus is mainly on RPI4 but it also runs on a RPI3

Would enabling the 64Bit kernel resolve this issue? I guess not, because the provided influxdb package is still 32 Bit.
If influxdb is not willing to include your fix, perhaps it is easier to compile for the 64 Bit Raspbian kernel. Or perhaps work could be done at the Raspbian foundation to include a 64 Bit influxdb in their official repository, since that would solve the problems of the 32 Bit version.

@aemondis

aemondis commented Jan 8, 2020

@JsBergbau I am running the 64-bit kernel on a RPI4 with 4GB RAM, and can confirm that it does indeed resolve the issue. I suspect the issue is more a limitation of the OS's ability to address large chunks of memory, as opposed to within the process itself.

@fluffynukeit
Author

@JsBergbau In my application, I was not using an RPi but a Xilinx Zynq. I am no longer with the company working on that project, though. I will forward the possibility to the current team, however. Thanks.

@mattelacchiato

This is really sad since most IoT applications are indeed running on 32bit...

@simonvetter

I can confirm these commits fix the issue on 32-bit systems (ARM, odroid C1 running armbian).

I started hitting the same "out of memory" compaction issue as everyone else as soon as my DB came close to 3GB. I've re-applied those commits on top of the 1.8 release branch, rebuilt influxDB and my compaction issues went away.

This particular instance has been running without issue for about 10 days now. I haven't noticed any increase in CPU/memory usage, memory leak, or even impact on performance.

I've pushed my branch at https://github.com/simonvetter/influxdb/tree/1.8.0+big_db_32bit for anyone interested.

Thanks @fluffynukeit for the nice work!

@aemondis

Awesome work @simonvetter for pulling together a 32-bit branch for Influx with that fix. I remember going through that solution by @fluffynukeit and wishing the InfluxDB team would even bother to read threads such as this, as it's an elegantly simple fix for a major issue impacting a portion of their user base. The fix is of low risk (it only affects anything IF you enable the option!), and could potentially even scale up the capability of the 32 or 64 bit deployments immensely in certain scenarios.

It boggles the mind that InfluxDB don't even respond to all the fantastic analysis and solutions in this thread. The best we can hope for is it helps others who stumble upon this thread.

@godidog

godidog commented Jun 16, 2020

Normal business decision: Not accepting a bugfix because it might affect something else somewhere ( possibly ).

Unreasonable behaviour: Ignoring a bugfix that affects many many users, AND the discussion of that bugfix, and abandoning an ongoing discussion with the developer.

Very disappointing, influxdata, this does not inspire trust in your company and it led to my company abandoning influx for our consumer device.

"Run away !"

@mattelacchiato

I've migrated to VictoriaMetrics for now. Works like a charm, even with "large" databases =)

@tablatronix

tablatronix commented Jan 5, 2021

This is absurd. Is there at least a failure mode we can enable for this, so TSM doesn't just retry over and over and kill your environment?
A fatal failure, or a retry limit?

I would not merge this; it is too large a change. But if these methods can be refactored or wrapped first, then this PR might get smaller and get attention.

@bgilmore-iot

Hey everyone- My colleague Russ just pointed me at this thread. I’m (brand) new to InfluxData and am the Product Manager responsible for IOT. I’m currently putting together the go-forward strategy and greatly appreciate the info outlined here. Would welcome direct conversation on this and any other IoT related challenges or requirements - community slack works well or feel free to send me an email. Looking forward to collaborating! Thanks so much,

Brian Gilmore

@simonvetter

Rebased on top of 1.8.9, code available at https://github.com/simonvetter/influxdb/tree/1.8.9+big_db_32bit .

As usual, tests pass and I haven't noticed any issue on my test systems.

See my comment above for build instructions and configuration.

@Napfton

Napfton commented Sep 29, 2021

Rebased on top of 1.8.9, code available at https://github.com/simonvetter/influxdb/tree/1.8.9+big_db_32bit .

As usual, tests pass and I haven't noticed any issue on my test systems.

See my comment above for build instructions and configuration.

I am failing with

can't load package: package cmd/influx_tsm/.go is not in GOROOT (/usr/local/go/src/cmd/influx_tsm/.go)

on this line

go build -o build/influx_tsm cmd/influx_tsm/*.go

From the git log it says influx_tsm got removed, so I guess we can skip that step as well?

EDIT: successful build and run without the _tsm line.

@simonvetter

Yeah, influx_tsm was removed in 1.8.9, so

go build -o build/influx_tsm cmd/influx_tsm/*.go

should be omitted from my build instructions.

@jensb

jensb commented Oct 26, 2021

So, if my IoT platform is a Raspi 4b with 4GB RAM,

  • to run a supported version of Influx with a database > 3GB, I have to run Influx v2?
  • Does this mean that I have to install a 64bit OS, since activating the 64bit kernel in Raspbian is not sufficient?

Also, will the RAM requirements of Influx 2 be lower than of Influx 1 with larger datasets?

Or will I be better off applying this patch and staying with 1.8?
(In this case, @simonvetter, would you rebase your patch on 1.8.10, which was released recently?)

@simonvetter

Yep, I just rebased on top of 1.8.10, code available at https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit .

As usual, tests pass and I haven't noticed any issue on my systems so far.

TL;DR build instructions:

git clone https://github.com/simonvetter/influxdb.git
cd influxdb
git checkout origin/1.8.10+big_db_32bit
mkdir build
# target 32-bit ARM architectures, use GOARCH=386 for 32-bit Intel/AMD
export GOARCH=arm
go build -o build/influx_stress cmd/influx_stress/*.go
go build -o build/influx_tools cmd/influx_tools/*.go
go build -o build/influx_inspect cmd/influx_inspect/*.go
go build -o build/influxd cmd/influxd/main.go
go build -o build/influx cmd/influx/main.go

Your 32-bit binaries will be in the build directory, feel free to move them anywhere you like.

I suggest using the following settings in your influxdb config file:

[data]
    tsm-use-seek = true # use the seek accessor (what this whole PR is about)
    index-version = "tsi1" # use on-disk index files instead of keeping them in-memory
    max-concurrent-compactions = 1 # avoid running multiple compactions at once

[coordinator]
    query-timeout = "60s" # kill runaway queries (you may want to adjust the value depending on your hardware and query set)

@jensb

jensb commented Oct 28, 2021

Great! I crosscompiled it on an Intel i5 desktop, installed it, and boom - memory requirements were down from 3.6G to just about 1G, resident size between 400M and 800M. Running without issues so far.

Thank you!!

@Napfton

Napfton commented Oct 30, 2021

Sadly, since I started using this custom code for influxdb, I have been getting system freezes/crashes once a week on a raspi that ran for 2 years without a single reboot.

Logs are (to me) inconclusive.

Version 2 of influxdb cannot be used on 32-bit systems, right?

@jensb

jensb commented Oct 30, 2021

No issues so far on my Raspi 4b (4GB, 32GB flash).
What is your model?

@Napfton

Napfton commented Oct 30, 2021

3B+. I'm going to switch to a 64-bit kernel soon to mitigate this.

@tablatronix

I switched to the Raspbian 64-bit OS a while ago and it's been good.

@simonvetter

If your hardware is 64bit capable then you should definitely switch to a 64bit kernel and use influx 2.x releases.

I've had a few instances running this branch for a few days without issue. Let me know if you can get your hands on a crash log and I'll investigate.

@jensb

jensb commented Nov 1, 2021

I am planning to do this as well, but in the process I will also switch to SSD storage (no uSD card, it's getting too slow), because going 64-bit will mean having to reinstall everything anyway.
The problem is that during this transition, our house (read: my family) will have to live without various switches and features. The Raspi running Influx is an integral part of our smart home concept. This definitely needs the right timing. :-)
The big question is whether on the new system I will keep Influx, or use the opportunity to switch to a real database (Timescale DB based on Postgres). There are a few queries and analyses that Flux doesn't seem to be able to do at all. We'll see ...

@chrisuki

chrisuki commented Dec 15, 2021

@simonvetter Thanks a lot. I could build that on my Raspberry PI 4B, and no more errors at the moment.
EDIT 21.12.2021: After a few days, unfortunately, the "Cannot allocate memory" error is back...

@OZ1SEJ

OZ1SEJ commented Jan 13, 2022

After compiling, when I run $ sudo /home/pi/influxdb/build/influxd I start getting memory errors (Raspberry Pi 3B, fresh install) and the program ultimately terminates.

@lesam
Contributor

lesam commented Mar 8, 2022

We no longer support 32-bit systems in the mainline repo - there is a community fork that maintains 32 bit compatibility.

See also https://www.influxdata.com/blog/influxdb-oss-and-enterprise-roadmap-update-from-influxdays-emea/ , https://github.com/simonvetter/influxdb/ .

Closing this PR as it will not be merged.

@lesam lesam closed this Mar 8, 2022
@baerengraben

@simonvetter: I applied this fix on a 32-bit system (hc2) with armbian, and it worked like a charm. Thank you!!!

@simonvetter

Thanks, I'm glad it helped!

@lmarceg

lmarceg commented Jan 23, 2023

@simonvetter , you rock!
I have a raspi 3b and I was facing the same OOM error; and now, BOOM! It all works!
I just have a couple of comments and one question:

  • I didn't have git so I easily installed it with apt
  • I didn't have Go, so I installed golang with apt, but the version you get from there is too old, and three binaries fail to build for this reason. I therefore suggest looking up how to download the latest version from the website and using that one to compile the repo

That said, my question is the following: now that I have all those files in the build folder, which ones can I substitute?
Can I overwrite influx, influxd, influx_inspect, stress and tools, or will I lose something in the DB if I do that?
I tested influxd and it all worked; I'm not sure I also need the other commands.
But before causing a disaster when I am 99% done, I think I'd better ask.

Thanks again!
Luca

Setcover added a commit to Setcover/smarthome that referenced this pull request Mar 27, 2023
@raoulbhatia

raoulbhatia commented Apr 2, 2023

FYI I created an updated Docker image for influxdb-arm32 based on the following code trail.
(based on the latest influxdb:1.8.10 Docker image with Debian 11/Bullseye)

  1. https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit
  2. https://github.com/terjesannum/docker-influxdb-arm32
  3. https://github.com/noelleehk/docker-influxdb-arm32
  4. https://github.com/raoulbhatia/docker-influxdb-arm32/

I published the Docker image at https://hub.docker.com/repository/docker/raoulbhatia/influxdb-arm32

Feedback welcome

@thomasjungblut

In case anyone needs a newly rebuilt version of the branch with golang 1.20.5 and without dockerhub login:
https://quay.io/repository/thomasjungblut/influxdb-arm32?tab=info

podman pull quay.io/thomasjungblut/influxdb-arm32:287e3ed6

@thalesmaoa

thalesmaoa commented Oct 11, 2023

If I simply change the core image, will it work, or do I have to remap my database?

Thx @simonvetter , @raoulbhatia and @thomasjungblut .

EDIT: It worked.
https://gist.github.com/thalesmaoa/a707257ddb0113b7b343fae3ca608199

@albertoanta

albertoanta commented Jun 12, 2024

FYI I created an updated Docker image for influxdb-arm32 based on the following code trail. (based on the latest influxdb:1.8.10 Docker image with Debian 11/Bullseye)

1. https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit

2. https://github.com/terjesannum/docker-influxdb-arm32

3. https://github.com/noelleehk/docker-influxdb-arm32

4. https://github.com/raoulbhatia/docker-influxdb-arm32/

I published the Docker image at https://hub.docker.com/repository/docker/raoulbhatia/influxdb-arm32

Feedback welcome
Thanks a lot. My odroid xu4 began to throw memory allocation errors in the influxdb container. I just deployed the patched image/container and everything is OK. The memory footprint is also very small.
