We're experiencing a memory leak on one of our Nerves devices that seems to be related to SystemRegistry's use of ETS tables. See plot below for memory from the last 14 days. We're not 100% sure of the root cause, but it appears to be triggered by frequent uevent messages. We will try to generate some more data on this over the next couple of days. I'm happy to provide any info you need, so please let me know.
Environment

elixir -v: 1.8.1

Additional information about your host, target hardware or environment that may help
We're running Nerves on an Intel Compute Stick (STK1A32SC) using a custom system based on nerves_system_x86_64. Our Nerves application starts Docker, and Docker manages a couple of containers.
Whenever a Docker container is started it creates a new virtual network interface. And when a faulty container is restarted over and over and over again, that generates a lot of virtual network interfaces 😄 and thus a lot of uevent messages.
Current behavior
After noticing the rising memory use on a device, we decided to investigate. Using htop we found that the BEAM was using about 55% of RAM. On a freshly booted device, the BEAM uses about 5% and it usually stabilises around 12% of RAM. The device has 2 GB of RAM.
Comparing the memory usage reported by :erlang.memory/0 on the unhealthy node with that of a healthy node, we found the following:
| allocator | unhealthy device (bytes) | healthy device (bytes) | comparison (unhealthy / healthy) |
| --- | --- | --- | --- |
| atom | 594561 | 586369 | 1.01 |
| atom_used | 580202 | 559360 | 1.04 |
| binary | 1920088 | 2032104 | 0.94 |
| code | 13592541 | 13274957 | 1.02 |
| ets | 296828960 | 37491096 | 7.92 |
| processes | 61425888 | 46360584 | 1.32 |
| processes_used | 61424824 | 46359520 | 1.32 |
| system | 321617232 | 61810016 | 5.2 |
| total | 383043120 | 108170600 | 3.54 |
Seeing that ETS seemed to be using the most memory, we sorted all ETS tables by their memory usage (using the :memory stat reported by :ets.info/1) and found that the heavy hitters were tables owned by SystemRegistry. We then looked at dmesg to see if we could find anything odd, and found repeating patterns like this:
[Wed Oct 16 11:29:54 2019] veth0122b00: renamed from eth0
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:54 2019] device veth59b04b9 left promiscuous mode
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered blocking state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:29:55 2019] device veth398d220 entered promiscuous mode
[Wed Oct 16 11:29:55 2019] IPv6: ADDRCONF(NETDEV_UP): veth398d220: link is not ready
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered blocking state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered forwarding state
[Wed Oct 16 11:29:55 2019] eth0: renamed from veth69a99ec
[Wed Oct 16 11:29:55 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth398d220: link becomes ready
[Wed Oct 16 11:33:11 2019] veth69a99ec: renamed from eth0
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] device veth398d220 left promiscuous mode
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered blocking state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:33:11 2019] device veth17ce7b4 entered promiscuous mode
[Wed Oct 16 11:33:11 2019] IPv6: ADDRCONF(NETDEV_UP): veth17ce7b4: link is not ready
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered blocking state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered forwarding state
[Wed Oct 16 11:33:11 2019] eth0: renamed from veth964b38b
[Wed Oct 16 11:33:11 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth17ce7b4: link becomes ready
[Wed Oct 16 11:36:27 2019] veth964b38b: renamed from eth0
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] device veth17ce7b4 left promiscuous mode
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered blocking state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered disabled state
[Wed Oct 16 11:36:27 2019] device vethb123857 entered promiscuous mode
[Wed Oct 16 11:36:27 2019] IPv6: ADDRCONF(NETDEV_UP): vethb123857: link is not ready
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered blocking state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered forwarding state
[Wed Oct 16 11:36:28 2019] eth0: renamed from veth7beac03
[Wed Oct 16 11:36:28 2019] IPv6: ADDRCONF(NETDEV_CHANGE): vethb123857: link becomes ready
Approximately every 3 minutes, a virtual network interface was being added. This coincides with a periodic restart of a faulty container.
Based on this, we hypothesize that frequent and repeated addition and removal of network interfaces is a problem for SystemRegistry. To test our hypothesis (and system resilience 😄), we decided to delete all ETS tables owned by SystemRegistry to see how this would affect memory usage:
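Roughly, the cleanup looked like the sketch below (not the exact commands we used; matching the tables on the registered name of the owning process is an assumption about how SystemRegistry names its processes, and the whole thing is obviously destructive, so don't run it on a device you care about):

```elixir
# Find every ETS table whose owner looks like a SystemRegistry process and
# drop it. Note that :ets.delete/1 from the shell only succeeds for tables
# the shell is allowed to write to (e.g. :public tables); otherwise it raises.
:ets.all()
|> Enum.filter(fn tab ->
  with owner when is_pid(owner) <- :ets.info(tab, :owner),
       {:registered_name, name} when is_atom(name) <-
         Process.info(owner, :registered_name) do
    name |> Atom.to_string() |> String.contains?("SystemRegistry")
  else
    _ -> false
  end
end)
|> Enum.each(&:ets.delete/1)

# Check the effect on total memory afterwards.
:erlang.memory(:total)
```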
Total memory usage dropped from 383043120 to 88041888 bytes, roughly a 4x reduction. (And a little later the system rebooted to recover itself - hurray.)
I don't know what is causing the slow leak of memory in this case. When the Docker container is restarted, the virtual network interface is removed from the system (it doesn't show up in ip addr), but for some reason the SystemRegistry ETS tables keep growing.
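For anyone debugging something similar, the registry contents can also be inspected directly (a sketch; the [:state, :network_interface] path is an assumption based on how networking libraries commonly publish interface data into SystemRegistry, so adjust it for your setup):

```elixir
# Dump the global SystemRegistry state and pull out the interface subtree.
# The key layout under :state is assumed and may differ per system.
SystemRegistry.match(:_)
|> get_in([:state, :network_interface])
```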
Expected behavior
Memory usage should be stable regardless of how many times an interface is added and removed.