Large database support for 32bit systems #12362
…ce to soon work for block byte data.
…mmapAccessor and seekAccessor. Seems to have comparable performance.
Hi, Influx team. Is there any plan to merge this PR into the 1.8 branch? I think there is sufficient community interest. If something is holding you back from bringing it in, please advise me on what you'd need to see before getting it merged. Thanks. |
Would anyone be so kind as to reply, so we all have an idea of what the Influx team plans to do about this issue? |
@fluffynukeit -- wow, you really dug in here and we appreciate the contribution! I'm wondering if you would be open to discussing this in a bit more detail? DM me... tim at influxdata.com |
@timhallinflux Will do. |
@fluffynukeit @timhallinflux any news on if this will be merged into 1.8? |
Tim contacted me privately with some questions regarding this PR. That was 11 days ago. I responded to the questions promptly but have not heard back since then, so honestly I don't know for sure what InfluxData is planning to do with this PR or 32 bit support. That's my broad disclaimer for what follows. I don't think it's appropriate for me to copy/paste a private exchange, so I'll merely try to summarize the salient points from my POV. If I were to extrapolate based on my exchange with Tim, my guess is that Influx will not be merging these changes into version 1.8, if ever. The reason is that they believe the changes in my PR present a lot of risk to the whole influx community, and they are not equipped to test 32 bit platforms with the same rigor as they do 64 bit platforms in order to mitigate that risk. Getting all that testing infrastructure set up would take a lot of work that they can't take on right now because they are focused on the 2.0 release. This could possibly change if there was a significant commercial interest in getting better 32 bit support, but I could not offer any such business opportunity to him from my side. I tried to argue that various IoT applications, like the ones we are almost all dealing with on 32 bit platforms, present a good market opportunity to Influx because Influx's features are particularly suited for such small platforms (aside from the mmap constraint this PR tries to remedy). I also argued that they wouldn't need to test 32 bit platforms exhaustively; they could merely verify that this PR does not break 64 bit systems, and relegate 32 bit to second-class status. I think we would all accept that in lieu of doing nothing. My hope is that they are still considering my input, but it has indeed been a long silence since then (silence toward the community regarding this issue was one thing I mentioned I hope they improve on).
My advice is simply that 32 bit users on InfluxDB should assume we are on our own until we hear something concrete from Influx. I, however, switched companies recently and am no longer working on an application that uses Influx, so I won't be able to make changes or adjustments to this PR in response to Influx's feedback. Anyone else is 100% welcome to carry the torch instead, of course. |
For anyone still following this particular issue, specifically when running on an IoT device such as a Raspberry Pi with Raspbian (a 32 bit OS): if you get this dreaded error, the only solution is to load the database on another 64-bit device just to run the compaction periodically, then transfer it back. It somewhat defeats the benefits of a small IoT device, but it very much seems that InfluxDB are just not interested, at this time, in the next wave of IoT capabilities that are beginning to emerge. In short... don't waste too much time trying to get InfluxDB running on a 32-bit platform with even a moderate size DB, as it will inevitably fail with this issue. Thanks to @fluffynukeit for his fantastic work in hunting out the root cause of the issue, as it was no doubt a tough one to find. I hope that InfluxDB change their mind in the future on this issue, but the lack of any feedback or merge suggests this fix is probably dead in the water, unless someone has the time and commitment to make a fork for the IoT/32-bit community. |
It is one thing not to want to take this change (that would save us) for business reasons, but why would InfluxData just ignore fluffynukeit and the other 32-bit users of Influx? And will InfluxData now put a warning on their web pages, so future implementers will know what to expect if they choose to use InfluxDB for their devices? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
It is stale because Influx are not going to address the issue. |
@fluffynukeit: Thank you very much for your work in investigating this issue. I will also run into that limit in about two years if nothing changes till then. There is now a 64-bit kernel for Raspberry Pi available: https://www.raspberrypi.org/forums/viewtopic.php?t=250730 Would enabling the 64-bit kernel resolve this issue? I guess not, because the provided influxdb package is still 32-bit. |
@JsBergbau I am running the 64-bit kernel on a RPI4 with 4GB RAM, and can confirm that it does indeed resolve the issue. I suspect the issue is more a limitation of the OS's ability to address large chunks of memory, as opposed to within the process itself. |
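For anyone unsure which case applies to them, here is a minimal Go sketch (not part of this PR; the helper names `pointerBits` and `addressSpaceGiB` are made up for illustration) that reports the pointer width a binary was built for and the resulting upper bound on its virtual address space. A 32-bit process cannot address more than 4 GiB in total, and the kernel typically keeps part of that range for itself, which is in line with limits such as the 3.6 GB mentioned in this thread. Build it with the same GOARCH as your influxd to see what that build would be limited to.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"unsafe"
)

// pointerBits returns the pointer width, in bits, of the architecture this
// binary was compiled for (32 for GOARCH=arm or 386, 64 for arm64 or amd64).
func pointerBits() int {
	return int(unsafe.Sizeof(uintptr(0))) * 8
}

// addressSpaceGiB returns 2^bits bytes expressed in GiB. This is only the
// theoretical ceiling; the share actually usable by a process depends on
// the kernel and is smaller.
func addressSpaceGiB(bits int) float64 {
	return math.Ldexp(1, bits-30)
}

func main() {
	bits := pointerBits()
	fmt.Printf("GOARCH=%s, pointer width=%d bits\n", runtime.GOARCH, bits)
	fmt.Printf("upper bound on virtual address space: %.0f GiB\n", addressSpaceGiB(bits))
}
```

Cross-compiling with `GOARCH=arm` and running the result on the target reports a pointer width of 32 and a 4 GiB ceiling, whichever kernel is installed. |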
@JsBergbau In my application, I was not using an RPi but a Xilinx Zynq. I am no longer with the company working on that project, though. I will forward the possibility to the current team, however. Thanks. |
This is really sad since most IoT applications are indeed running on 32bit... |
I can confirm these commits fix the issue on 32-bit systems (ARM, odroid C1 running armbian). I started hitting the same "out of memory" compaction issue as everyone else as soon as my DB came close to 3GB. I've re-applied those commits on top of the 1.8 release branch, rebuilt influxDB and my compaction issues went away. This particular instance has been running without issue for about 10 days now. I haven't noticed any increase in CPU/memory usage, memory leak, or even impact on performance. I've pushed my branch at https://github.com/simonvetter/influxdb/tree/1.8.0+big_db_32bit for anyone interested. Thanks @fluffynukeit for the nice work! |
Awesome work @simonvetter for pulling together a 32-bit branch for Influx with that fix. I remember going through that solution by @fluffynukeit and wishing the InfluxDB team would even bother to read threads such as this, as it's an elegantly simple fix for a major issue impacting a portion of their user base. The fix is of low risk (it only affects anything IF you enable the option!), and could potentially even scale up the capability of the 32 or 64 bit deployments immensely in certain scenarios. It boggles the mind that InfluxDB don't even respond to all the fantastic analysis and solutions in this thread. The best we can hope for is it helps others who stumble upon this thread. |
Normal business decision: Not accepting a bugfix because it might affect something else somewhere ( possibly ). Unreasonable behaviour: Ignoring a bugfix that affects many many users, AND the discussion of that bugfix, and abandoning an ongoing discussion with the developer. Very disappointing, influxdata, this does not inspire trust in your company and it led to my company abandoning influx for our consumer device. "Run away !" |
I've migrated to VictoriaMetrics for now. Works like a charm, even with "large" databases =) |
This is absurd, is there at least a failure mode we can enable for this, so tsm doesn't just retry over and over and kill your env? I would not merge this, it is too large a change, but if these methods can be refactored first or wrapped, then this PR might get smaller and get attention ? |
Hey everyone- My colleague Russ just pointed me at this thread. I’m (brand) new to InfluxData and am the Product Manager responsible for IOT. I’m currently putting together the go-forward strategy and greatly appreciate the info outlined here. Would welcome direct conversation on this and any other IoT related challenges or requirements - community slack works well or feel free to send me an email. Looking forward to collaborating! Thanks so much, Brian Gilmore |
Rebased on top of 1.8.9, code available at https://github.com/simonvetter/influxdb/tree/1.8.9+big_db_32bit . As usual, tests pass and I haven't noticed any issue on my test systems. See my comment above for build instructions and configuration. |
The build fails for me with: can't load package: package cmd/influx_tsm/.go is not in GOROOT (/usr/local/go/src/cmd/influx_tsm/.go) on this line: go build -o build/influx_tsm cmd/influx_tsm/*.go From the git log it says influx_tsm got removed, so I guess we can skip that step as well? EDIT: successful build and run without the _tsm line. |
Yeah, influx_tsm was removed in 1.8.9, so go build -o build/influx_tsm cmd/influx_tsm/*.go should be omitted from my build instructions. |
So, if my IoT platform is a Raspbi 4b with 4GB RAM,
Also, will the RAM requirements of Influx 2 be lower than of Influx 1 with larger datasets? Or will I be better off applying this patch, and staying with 1.8? |
Yep, I just rebased on top of 1.8.10, code available at https://github.com/simonvetter/influxdb/tree/1.8.10+big_db_32bit . As usual, tests pass and I haven't noticed any issue on my systems so far. TL;DR build instructions:
git clone https://github.com/simonvetter/influxdb.git
cd influxdb
git checkout origin/1.8.10+big_db_32bit
mkdir build
# target 32-bit ARM architectures, use GOARCH=386 for 32-bit Intel/AMD
export GOARCH=arm
go build -o build/influx_stress cmd/influx_stress/*.go
go build -o build/influx_tools cmd/influx_tools/*.go
go build -o build/influx_inspect cmd/influx_inspect/*.go
go build -o build/influxd cmd/influxd/main.go
go build -o build/influx cmd/influx/main.go
Your 32-bit binaries will be in the build directory, feel free to move them anywhere you like. I suggest using the following settings in your influxdb config file:
[data]
tsm-use-seek = true # use the seek accessor (what this whole PR is about)
index-version = "tsi1" # use on-disk index files instead of keeping them in-memory
max-concurrent-compactions = 1 # avoid running multiple compactions at once
[coordinator]
query-timeout = "60s" # kill runaway queries (you may want to adjust the value depending on your hardware and query set) |
Great! I crosscompiled it on an Intel i5 desktop, installed it, and boom - memory requirements were down from 3.6G to just about 1G, resident size between 400M and 800M. Running without issues so far. Thank you!! |
Sadly, since I started using this custom code for influxdb, I have been getting system freezes/crashes once a week on a raspi that had been running for 2 years without a single reboot. Logs are (to me) inconclusive. Version 2 of influxdb cannot be used on 32 bit systems, right? |
No issues so far on my Raspi 4b (4GB, 32GB flash). |
3B+ going to switch to a 64bit kernel to mitigate soon. |
I switched to the Raspbian 64bit OS a while ago and it's been good |
If your hardware is 64bit capable then you should definitely switch to a 64bit kernel and use influx 2.x releases. I've had a few instances running this branch for a few days without issue. Let me know if you can get your hands on a crash log and i'll investigate. |
I am planning to do this as well, but in the process I will also switch to SSD storage (no uSD card, it's getting too slow) because going 64bit will mean having to reinstall everything anyway. |
@simonvetter Thanks a lot. I could build that on my Raspberry PI 4B, and no more errors at the moment. |
After compiling, when I run |
We no longer support 32-bit systems in the mainline repo - there is a community fork that maintains 32 bit compatibility. See also https://www.influxdata.com/blog/influxdb-oss-and-enterprise-roadmap-update-from-influxdays-emea/ , https://github.com/simonvetter/influxdb/ . Closing this PR as it will not be merged. |
@simonvetter : Did this fix on a 32bit system (hc2 ) with armbian. => worked like a charm. Thank you!!! |
Thanks, I'm glad it helped! |
@simonvetter , you rock!
That said, my question is the following: now that I have all those files in the build folder, which one can I substitute? Thanks again! |
FYI I created an updated Docker image for influxdb-arm32 based on the following code trail.
I published the Docker image at https://hub.docker.com/repository/docker/raoulbhatia/influxdb-arm32 Feedback welcome |
In case anyone needs a newly rebuilt version of the branch with golang 1.20.5 and without dockerhub login:
|
If I simply change the core image, will it work, or do I have to remap my database? Thx @simonvetter , @raoulbhatia and @thomasjungblut . EDIT: It worked. |
|
Closes #10486
Briefly describe your proposed changes:
This PR implements a fix to allow 32bit systems to have large databases. Most of the discussion on #10486 still applies, but the implemented fix is different than my original attempt, and much cleaner, I think.
The implementation makes changes to the mmapAccessor in reader.go and supporting files.
tsm-use-seek, default false, can be specified to use the seek accessor instead of the mmap accessor.
I ran the same tests described in #10486. With tsm-use-seek = false, I encountered the same out of memory errors when trying to run compaction jobs at the 3.6GB file limit:
I then killed influxd, set tsm-use-seek=true, then continued the test. Compaction jobs that failed started running immediately, and eventually the total size of the DB far exceeded the 3.6 GB limit. See again, as in #10486, the peak VM memory usage is less than 1 GB.
I was also able to query influxd using the influx client while running the test without any problems. In my previous fix attempt, querying (even if no data was being written) could cause memory faults and kill the process.
This fix should help ease a consistent pain point with InfluxDB running on 32bit systems (like RPi or Zynq), in which it works great for long periods of time but then hits a wall at a limit of e.g. 3.6 GB (depending on system).