ETS memory leak #41

Open

trarbr opened this issue Oct 16, 2019 · 1 comment
trarbr commented Oct 16, 2019

We're experiencing a memory leak on one of our Nerves devices that seems to be related to SystemRegistry's use of ETS tables. See the plot below for memory usage over the last 14 days. We're not 100% sure about the cause of the issue, but it seems to be triggered by frequent uevent messages. We will try to generate some more data on this over the next couple of days. I am happy to provide any info you need if I can, so please let me know.

[Screenshot: BEAM memory usage on the device over the last 14 days]

Environment

  • Elixir version (elixir -v): 1.8.1
  • Nerves environment (mix nerves.env --info):
|nerves_bootstrap| Environment Package List

  Pkg:         kit_x86_64
  Vsn:         1.9.0
  Type:        system
  BuildRunner: {Nerves.Artifact.BuildRunners.Docker, []}

  Pkg:         nerves_toolchain_ctng
  Vsn:         1.5.0
  Type:        toolchain_platform
  BuildRunner: {nil, []}

  Pkg:         nerves_toolchain_x86_64_unknown_linux_gnu
  Vsn:         1.1.0
  Type:        toolchain
  BuildRunner: {Nerves.Artifact.BuildRunners.Local, []}

  Pkg:         nerves_system_br
  Vsn:         1.7.1
  Type:        system_platform
  BuildRunner: {nil, []}

|nerves_bootstrap| Loadpaths Start

Nerves environment
  MIX_TARGET:   intel_STK1A32SC
  MIX_ENV:      dev

|nerves_bootstrap| Environment Variable List
  target:     intel_STK1A32SC
  toolchain:  /Users/troels/.nerves/artifacts/nerves_toolchain_x86_64_unknown_linux_gnu-darwin_x86_64-1.1.0
  system:     /Users/troels/.nerves/artifacts/kit_x86_64-portable-1.9.0
  • Additional information about your host, target hardware or environment that may help:

We're running Nerves on an Intel Compute Stick (STK1A32SC) using a custom system based on nerves_system_x86_64. Our Nerves application starts Docker, and Docker manages a couple of containers.

Whenever a Docker container is started, it creates a new virtual network interface. And when a faulty container is restarted over and over and over again, that generates a lot of virtual network interfaces 😄 and thus a lot of uevent messages.

Current behavior

After noticing the rising memory use on a device, we decided to investigate. Using htop we found that the BEAM was using about 55% of RAM. On a freshly booted device, the BEAM uses about 5% and it usually stabilises around 12% of RAM. The device has 2 GB of RAM.
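
As a side note, the BEAM's share of RAM can be roughly approximated from within the shell as well; a sketch assuming the 2 GB mentioned above (the BEAM's OS-level RSS is typically somewhat higher than what :erlang.memory/0 reports):

iex> Float.round(:erlang.memory(:total) / (2 * 1024 * 1024 * 1024) * 100, 1)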

Comparing the memory usage reported by :erlang.memory/0 on the unhealthy node with that of a healthy node, we found the following (values in bytes):

allocator        unhealthy device   healthy device   unhealthy / healthy
atom                       594561           586369   1.01
atom_used                  580202           559360   1.04
binary                    1920088          2032104   0.94
code                     13592541         13274957   1.02
ets                     296828960         37491096   7.92
processes                61425888         46360584   1.32
processes_used           61424824         46359520   1.32
system                  321617232         61810016   5.20
total                   383043120        108170600   3.54
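
The ratio column is just element-wise division of the two keyword lists returned by :erlang.memory/0; a minimal sketch of the arithmetic (the ratio helper and the variable names are ours):

iex> ratio = fn unhealthy, healthy ->
...>   for {allocator, bytes} <- unhealthy do
...>     {allocator, Float.round(bytes / healthy[allocator], 2)}
...>   end
...> end
iex> ratio.(unhealthy_memory, healthy_memory)[:ets]
7.92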

Seeing that ETS seemed to be using the most memory, we sorted all ETS tables by their memory usage (using the :memory stat reported by :ets.info/1) and found that the heavy hitters were tables owned by SystemRegistry.
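
A ranking like that can be reproduced with something along these lines (a minimal sketch; note that :ets.info/1 reports :memory in words, not bytes, so multiply by :erlang.system_info(:wordsize) to compare with :erlang.memory/0):

iex> top_tables =
...>   :ets.all()
...>   |> Enum.map(&:ets.info/1)
...>   # tables can vanish between :ets.all/0 and :ets.info/1
...>   |> Enum.reject(&(&1 == :undefined))
...>   |> Enum.sort_by(fn info -> -info[:memory] end)
...>   |> Enum.take(5)
...>   |> Enum.map(fn info -> {info[:name], info[:memory]} end)

We then looked at dmesg to see if we could find anything odd, and found repeating patterns like this: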

[Wed Oct 16 11:29:54 2019] veth0122b00: renamed from eth0
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:54 2019] device veth59b04b9 left promiscuous mode
[Wed Oct 16 11:29:54 2019] br-884c2e680dd2: port 8(veth59b04b9) entered disabled state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered blocking state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:29:55 2019] device veth398d220 entered promiscuous mode
[Wed Oct 16 11:29:55 2019] IPv6: ADDRCONF(NETDEV_UP): veth398d220: link is not ready
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered blocking state
[Wed Oct 16 11:29:55 2019] br-884c2e680dd2: port 8(veth398d220) entered forwarding state
[Wed Oct 16 11:29:55 2019] eth0: renamed from veth69a99ec
[Wed Oct 16 11:29:55 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth398d220: link becomes ready
[Wed Oct 16 11:33:11 2019] veth69a99ec: renamed from eth0
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] device veth398d220 left promiscuous mode
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth398d220) entered disabled state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered blocking state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:33:11 2019] device veth17ce7b4 entered promiscuous mode
[Wed Oct 16 11:33:11 2019] IPv6: ADDRCONF(NETDEV_UP): veth17ce7b4: link is not ready
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered blocking state
[Wed Oct 16 11:33:11 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered forwarding state
[Wed Oct 16 11:33:11 2019] eth0: renamed from veth964b38b
[Wed Oct 16 11:33:11 2019] IPv6: ADDRCONF(NETDEV_CHANGE): veth17ce7b4: link becomes ready
[Wed Oct 16 11:36:27 2019] veth964b38b: renamed from eth0
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] device veth17ce7b4 left promiscuous mode
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(veth17ce7b4) entered disabled state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered blocking state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered disabled state
[Wed Oct 16 11:36:27 2019] device vethb123857 entered promiscuous mode
[Wed Oct 16 11:36:27 2019] IPv6: ADDRCONF(NETDEV_UP): vethb123857: link is not ready
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered blocking state
[Wed Oct 16 11:36:27 2019] br-884c2e680dd2: port 8(vethb123857) entered forwarding state
[Wed Oct 16 11:36:28 2019] eth0: renamed from veth7beac03
[Wed Oct 16 11:36:28 2019] IPv6: ADDRCONF(NETDEV_CHANGE): vethb123857: link becomes ready

Approximately every 3 minutes, a virtual network interface was being added. This coincides with a periodic restart of a faulty container.

Based on this, we hypothesize that frequent, repeated addition and removal of network interfaces is a problem for SystemRegistry. To test our hypothesis (and system resilience 😄), we decided to delete all ETS tables owned by SystemRegistry and see how this would affect memory usage:

iex> :ets.all() |> Enum.map(fn table -> :ets.info(table) end) |> Enum.filter(fn table -> to_string(table[:name]) |> String.starts_with?("Elixir.SystemRegistry") end) |> Enum.each(fn table -> :ets.delete(table[:id]) end)
:ok
iex> :erlang.memory() |> Enum.sort_by(fn {_allocator, size} -> size end)
[
  atom_used: 580285,
  atom: 594561,
  ets: 1062752,
  binary: 1991296,
  code: 13598749,
  system: 25211040,
  processes_used: 62829632,
  processes: 62830848,
  total: 88041888
]

So total memory usage dropped from 383043120 bytes to 88041888 bytes, roughly a 4x reduction. (And a little later the system rebooted to recover itself - hurray!)

I don't know what is causing the slow memory leak in this case. When the Docker container is restarted, the virtual network interface is removed from the system (it no longer shows up in ip addr). But for some reason, the SystemRegistry ETS tables keep growing.
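
To confirm where the growth is happening, the tables can be snapshotted before and after a few container restarts; a minimal sketch (the sample helper is ours, and :memory is again in words):

iex> sample = fn ->
...>   :ets.all()
...>   |> Enum.map(&:ets.info/1)
...>   |> Enum.reject(&(&1 == :undefined))
...>   |> Enum.filter(fn info ->
...>     to_string(info[:name]) |> String.starts_with?("Elixir.SystemRegistry")
...>   end)
...>   |> Enum.map(fn info -> {info[:name], info[:size], info[:memory]} end)
...> end
iex> before = sample.()
iex> # ... let the faulty container restart a few times ...
iex> later = sample.()

Comparing before and later per table should show whether the object count (:size) grows, or only the memory per object.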

Expected behavior

Memory usage should remain stable regardless of how many times an interface is added and removed.

@fhunleth
Contributor

😱
