beam.smp chewing CPU on v1.31.0 on arm #4307
Seems to be caused by RabbitMQ (quick Google search), which is not used by TeslaMate. Could that be it? |
@swiffer There's nothing else running on the box other than teslamate, and the processes in question look like they're coming from docker:
I do have |
OK, RabbitMQ is based on Erlang, and beam.smp is the Erlang VM, so that's where it's coming from. I wonder if #4296 is causing this. |
@leewillis77 can you try setting ERL_MAX_PORTS?
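A minimal sketch of what that could look like in the standard docker-compose.yml (the value below is only an example, not a recommendation, and whether it changes anything for this issue is exactly what needs testing):

services:
  teslamate:
    environment:
      # Example only: cap the number of ports the Erlang VM may open.
      - ERL_MAX_PORTS=1024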
Hi, thanks for the suggestion. I added the following under
It didn't make any difference - was that the correct way to make the change? E.g.
The thread you linked to seems to be concerned with high memory usage, but the problem I'm experiencing is high CPU, not high memory. |
@leewillis77 yes, you are right, sorry. Could one of you try ghcr.io/teslamate-org/teslamate/teslamate:pr-4300? That is the latest image before the Elixir/Erlang upgrade. |
I would if I knew how :) |
Just replace the image in the docker compose file, see here. |
I tried that, actually:
|
okay, please try again with
the contrib doc had an issue... |
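In other words, the only change needed in docker-compose.yml is the image line; a sketch with the corrected path (confirmed working in the next comment), assuming the stock compose file from the docs:

services:
  teslamate:
    # Temporarily pin the pre-upgrade test build instead of the release tag.
    image: ghcr.io/teslamate-org/teslamate:pr-4300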
beam.smp CPU usage is minimal (1.3%) with ghcr.io/teslamate-org/teslamate:pr-4300 vs > 100% as originally reported with the latest release.
|
@leewillis77 - thanks for testing. I guess someone else needs to step in, but this seems to be caused by the Erlang/Elixir update, @JakobLichterfeld. For now it should be fine to stay on pr-4300 until a fix is on the horizon. |
Out of curiosity - could you check pr-4303 as well, just to make sure whether it's caused by the dependency updates or by Erlang? |
beam.smp CPU is high (over 100%) with pr-4303. |
I'm unfortunately not particularly knowledgeable about the erlang ecosystem, but looking back at the |
Same issue for me. I switched to pr-4300 and it fixed the problem for now. |
It sounds like in that linked issue the process is just hanging, not using lots of CPU. |
BEAM is a VM, so we will probably have problems debugging this with system tools; I think you will need BEAM tools. One such tool is the Erlang observer: https://blog.sequinstream.com/how-we-used-elixirs-observer-to-hunt-down-bottlenecks/

Having said that, it may be fiddly to get this working in a Docker container. Either the observer needs X access somehow (e.g. c0b/docker-elixir#19 (comment)), or you need to run a local BEAM that connects to the remote BEAM (https://stackoverflow.com/questions/42164298/using-the-erlang-observer-app-with-a-remote-elixir-phoenix-server-inside-docker). Either way it is likely to be fiddly.

Also: it seems that our nix flake doesn't compile in the observer. I tried some things, but so far they have all failed.

Later: oh wait, this fixes it:

diff --git a/mix.exs b/mix.exs
index d159eefd..afa1c98a 100644
--- a/mix.exs
+++ b/mix.exs
@@ -27,7 +27,7 @@ defmodule TeslaMate.MixProject do
   def application do
     [
       mod: {TeslaMate.Application, []},
-      extra_applications: [:logger, :runtime_tools]
+      extra_applications: [:logger, :runtime_tools, :wx, :observer]
     ]
   end

(solution copied from https://stackoverflow.com/a/76760603) |
I can reproduce the high CPU usage. Apart from that, the system runs fine with multiple Docker containers on a Raspberry Pi 3B+ and is currently logging a charge fine. |
Thanks for your investigation!
--> PR: #4311 |
Perhaps we can disable BEAM's busy waiting: https://stressgrid.com/blog/beam_cpu_usage/ |
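If scheduler busy waiting is the culprit, the flags from that post could be passed to the VM through the compose file; a sketch under the assumption that the container's BEAM picks up ERL_FLAGS at startup (the flag values are the ones discussed in the blog post):

services:
  teslamate:
    environment:
      # Turn off busy waiting for normal, dirty-CPU and dirty-IO schedulers.
      - ERL_FLAGS=+sbwt none +sbwtdcpu none +sbwtdio none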
Maybe a problem with Raspbian and the latest code? What version are people using? Bookworm? Bullseye? Or older? It could also be an ARM-specific issue with the latest BEAM. |
I propose that somebody create a pull request with the change to the Dockerfile reverted:
We test that. Assuming that test succeeds, we then test:
We should test the following also, but I suspect we can't because this is not a valid option:
My suspicion is that we will find otp-26 good and otp-27 bad, but it would be good to prove this is indeed the case. On the other hand, if the problem is the Elixir version, not the Erlang version, we should probably try 1.17.3 as well. |
Bullseye here. Have not updated my Pi in months (do not judge me, ha). TeslaMate is the only thing running on it. Like others, after updating to 1.31.0, beam.smp shot up and chewed through CPU. |
Updated my comment with results on amd64 - it works without an increase in CPU time there. I guess having some exploration builds, as proposed by brianmay, is the best way to quickly see what is introducing the issue? |
I am on Bullseye too. |
Good point, I hadn't thought of that.
I will be the somebody :-) (it will be a bit tricky to get a CI build, as we prevent it when there are changes to the Dockerfile). |
Could someone affected try these three, please?
Tried these three as we know that 1.16.2-otp-26 is working and 1.17.2-otp-27 is not. @JakobLichterfeld - I've been faster 😝 |
I also checked the latest official release again; there the load is jumping between 90 and 140%, which is more than I can see in the 4316 testing release! |
OK, so it looks like OTP 27 is causing the issue on arm. @baresel could you provide more details about your environment and hardware (OS, Docker version, hardware)? |
OS Version: "Raspbian GNU/Linux 11 (bullseye)", Kernel: Linux 6.1.21-v8+ |
Same here, 4314, 4315 are fine, 4316 has high load. OS Version: "Raspbian GNU/Linux 11 (bullseye)", Kernel: Linux 6.1.70-v7+ |
Hello, 4314, 4315 are fine, 4316 has high load. OS Version: "Raspbian GNU/Linux 12 (bookworm)", aarch64 |
I assume the OTP 27 docker image was built with different compiler flags. I suggest a patch release downgrading to |
What I do find interesting (but maybe completely unrelated): RabbitMQ does not support OTP 27, as it has significant performance regressions in their use cases (https://www.rabbitmq.com/docs/which-erlang#erlang-27-support). However, we seem to be affected on aarch64 only. |
@JakobLichterfeld you mean |
OK, that's a reason to stay with |
why not |
Yeah, sorry for the typo :-D (edited above to avoid confusion)
Yeah, we will need to update the docs, the CI, the nix flake, etc. as well. |
I will open a branch for it. |
I had a theory that maybe the problem was that otp-27 doesn't work correctly with Bullseye. But I think the above tests clearly show Bookworm has the same issue. Just a detail I found interesting. |
OTP 27 has various optimizations; I wonder if one of these broke things for arm: https://www.erlang.org/blog/optimizations/ |
Possibly this is not arm specific, but you notice it more on the slower processors. |
Actually a better explanation than mine, as
Yeah, good point; currently I do not want to dive too deep into this rabbit hole. |
Is there an existing issue for this?
What happened?
I've upgraded to v1.31.0 and after the upgrade I noticed that there's a beam.smp process chewing up CPU time:
The system is idle currently (car is asleep, and neither teslamate nor grafana are open on any devices).
Expected Behavior
System is idle with no excessive CPU hogging.
Steps To Reproduce
No response
Relevant log output
Screenshots
No response
Additional data
No response
Type of installation
Docker
Version
v1.31.0