
server memory leak #1662

Closed
totaam opened this issue Oct 17, 2017 · 12 comments

totaam commented Oct 17, 2017

Issue migrated from trac ticket #1662

component: server | priority: critical | resolution: worksforme

2017-10-17 00:19:04: nathan-renniewaldock created the issue


It looks like there's a memory leak somewhere. After running firefox for 19 days, xpra was using over 8GB RAM.

$ ps -eo rss,etime,cmd | grep :64
8912696 19-06:49:25 /usr/bin/python /usr/bin/xpra --bind-tcp=127.0.0.1:15065 --no-daemon --tcp-auth=file:filename=/home/nathan/.winswitch/server/sessions/64/session.pass --systemd-run=no start :64
38588 19-06:49:23 /usr/lib/xorg/Xorg-for-Xpra-:64 -noreset -novtswitch -nolisten tcp +extension GLX +extension RANDR +extension RENDER -auth /home/nathan/.Xauthority -logfile /run/user/1002/xpra/Xorg.:64.log -configdir /home/nathan/.xpra/xorg.conf.d -config /etc/xpra/xorg.conf -depth 24 :64

$ xpra info :64 | grep memory
server.total-memory=50642325504
threads.memory.children.idrss=0
threads.memory.children.inblock=73656
threads.memory.children.isrss=0
threads.memory.children.ixrss=0
threads.memory.children.majflt=285
threads.memory.children.maxrss=9292524
threads.memory.children.minflt=419532
threads.memory.children.msgrcv=0
threads.memory.children.msgsnd=0
threads.memory.children.nivcsw=3749
threads.memory.children.nsignals=0
threads.memory.children.nswap=0
threads.memory.children.nvcsw=2449
threads.memory.children.oublock=16
threads.memory.children.stime=11
threads.memory.children.utime=8
threads.memory.server.idrss=0
threads.memory.server.inblock=23201440
threads.memory.server.isrss=0
threads.memory.server.ixrss=0
threads.memory.server.majflt=412756
threads.memory.server.maxrss=10516220
threads.memory.server.minflt=25829853
threads.memory.server.msgrcv=0
threads.memory.server.msgsnd=0
threads.memory.server.nivcsw=51333893
threads.memory.server.nsignals=0
threads.memory.server.nswap=0
threads.memory.server.nvcsw=222115335
threads.memory.server.oublock=14216
threads.memory.server.stime=5686
threads.memory.server.utime=129036

Ubuntu 17.04 x64, xpra 2.1.2-16903

Currently running glxgears with XPRA_DETECT_LEAKS=1. Anything else I can do to help track this down?

totaam commented Oct 17, 2017

Can you try turning off as many features as you can (sound forwarding, etc) to see if that helps?
Do you need to have any screen activity to trigger it? Does the window have to be shown? Or does it leak no matter what?
Reproducing with glxgears would help.
It would also be useful to see if using mmap (local connection) still leaks.
Another interesting test would be to run the server with XPRA_SCROLL_ENCODING=0 xpra start ... and see if that helps.

totaam commented Nov 3, 2017

I can reproduce it with gtkperf -a in a loop.

totaam commented Nov 4, 2017

Watching the server memory usage with xpra info | grep server.maxrss= while running ./tests/xpra/test_apps/simulate_console_user.py in an xterm, the value goes up steadily, by roughly 0.2 to 2KB/s.
This also happens with mmap.
When re-connecting with a new client, the increase only resumes once memory usage has climbed back to the point where it left off with the previous client.
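
For reference, a minimal sketch of this kind of polling (the display name, interval and output format here are arbitrary choices, not xpra code):

import subprocess
import time

def get_maxrss(display=":64"):
    # "xpra info" prints key=value lines; pick out the server's maxrss counter
    out = subprocess.check_output(["xpra", "info", display])
    for line in out.decode("utf-8", "replace").splitlines():
        if line.startswith("threads.memory.server.maxrss="):
            return int(line.split("=", 1)[1])
    return -1

previous = get_maxrss()
while True:
    time.sleep(10)
    current = get_maxrss()
    print("maxrss=%i (%+i)" % (current, current - previous))
    previous = current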

First I had to fix memleak debugging (XPRA_DETECT_MEMLEAKS=1 xpra start ..), which broke with newer(?) versions of numpy: r17300. (r17302 also helps debugging.)

Then I found a leak in the protocol layer, so "xpra info" would leak yet more memory while I was trying to find the real leak... fixed in r17299.
And then another leak in the window source class fixed in r17301.

Both of those should be backported.
I'll let it run for a few hours more to see if there are more leaks to be found...
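
For reference, the general technique behind this kind of memleak detection is a before/after diff of per-type object counts; a minimal sketch (not xpra's actual XPRA_DETECT_MEMLEAKS implementation):

import gc
from collections import defaultdict

def count_types():
    # force a collection first so transient garbage doesn't show up as a leak
    gc.collect()
    counts = defaultdict(int)
    for obj in gc.get_objects():
        counts[type(obj).__name__] += 1
    return counts

before = count_types()
# ... exercise the code path suspected of leaking ...
after = count_types()
for name, count in sorted(after.items()):
    delta = count - before.get(name, 0)
    if delta > 0:
        print("%8i : %s" % (delta, name))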

totaam commented Nov 4, 2017

2017-11-04 17:22:52: antoine uploaded file memleak-examples.patch (1.1 KiB)

An example patch enabling memleak debugging for the classes that seemed to cause problems.

totaam commented Nov 5, 2017

There are still some small leaks, so:

  • r17307: improves leak debugging, can generate graphs using objgraph (see the sketch after this list)
  • r17306 + r17308: potential leaks of protocol instances - not sure if these should be backported (a little too intrusive)
  • r17309, r17310: code refactoring; r17311: minor bug fix
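
For reference, a minimal objgraph session of the kind r17307 enables (which object to chart is an illustrative guess, and rendering the graph requires graphviz):

import objgraph

objgraph.show_growth(limit=10)      # baseline: record current per-type counts
# ... exercise the code path suspected of leaking ...
objgraph.show_growth(limit=10)      # print only the types that grew since then

# chart the reference chain keeping a suspect object alive:
suspect = objgraph.by_type("list")[-1]
objgraph.show_backrefs([suspect], max_depth=5, filename="backrefs.png")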

What makes this particularly difficult is that the leak debugging slows things down dramatically and blocks the main thread, so it can cause things to get backed up so much that they look like leaks when they're not.

Another problem is the "traceback reference cycle problem"
(Exception leaks in Python 2 and 3).
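
For anyone unfamiliar with it: saving the traceback in a local variable creates a frame -> local -> traceback -> frame cycle, which keeps every local in that frame alive until the cycle collector runs. A minimal illustration:

import sys

def leaky():
    big = "x" * (1024 * 1024)       # pinned by the cycle below
    try:
        raise ValueError("boom")
    except ValueError:
        exc_type, exc_value, tb = sys.exc_info()
        # "tb" references this frame, and this frame references "tb"
        return exc_value

def careful():
    try:
        raise ValueError("boom")
    except ValueError:
        exc_type, exc_value, tb = sys.exc_info()
        try:
            pass                    # ... use tb here ...
        finally:
            del tb                  # break the cycle explicitly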

And more importantly, we're still leaking somewhere as this gets printed every time the leak detection code runs (always exactly the same leak count):

leaks: count : object
      15 :                             cell :   1469 matches
      14 :                            tuple :   4117 matches
      13 :                            frame :   1017 matches
       2 :                             list :   4145 matches

totaam commented Nov 5, 2017

Turning off the ping feature reduces the leaks. It also looks like generating network traffic (ie: moving the mouse around) causes more leaking.

I suspect that this comes from the non-blocking socket timeouts, like this one shown at debug level:

untilConcludes(\
    <bound method SocketConnection.is_active of unix-domain socket:/run/user/1000/xpra/desktop-3>, \
    <bound method SocketConnection.can_retry of unix-domain socket:/run/user/1000/xpra/desktop-3>, \
    <built-in method recv of _socket.socket object at 0x7f6ab0585b90>, \
    (65536,), {}) timed out, retry=socket.timeout
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/xpra/net/bytestreams.py", line 101, in untilConcludes
    return f(*a, **kw)
timeout: timed out
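
The pattern in question is a retry loop along these lines (a simplified sketch - the real untilConcludes lives in xpra/net/bytestreams.py and differs in detail):

import socket

def until_concludes(is_active, can_retry, f, *a, **kw):
    while is_active():
        try:
            return f(*a, **kw)
        except socket.timeout:
            # every timed-out recv() lands here; holding on to the exception
            # info (e.g. via sys.exc_info()) would pin the whole frame stack
            if not can_retry():
                raise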

totaam commented Nov 6, 2017

Lots of related changes, but the main leak is still there...

totaam commented Nov 6, 2017

r17328 (+r17330 fixup) fixes a leak caused by logging.
The alternative fix would be to add a kwargs option to not track the loggers when we know we're not going to be re-using them.
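
For context, named loggers are never released: logging.getLogger() registers every name in the module-level manager forever, so creating one logger per short-lived object leaks. A minimal illustration (the per-window class here is hypothetical):

import logging

class Source(object):
    def __init__(self, wid):
        # a unique name per instance means a permanent loggerDict entry:
        self.log = logging.getLogger("window-%i" % wid)

for wid in range(1000):
    Source(wid)
print(len(logging.Logger.manager.loggerDict))   # grows and never shrinks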

Dumping all the cell objects (matched by type string, since there does not seem to be a Python type exposed for it), the recurring entries seem to be:

2017-11-06 17:31:17,250 [355] '<cell at 0x7f248ea09fa0: list object at 0x7f2486445518>': '[{\'__setattr__\': <slot wrapper \'__setattr__\ ..  124: <type \'set\'>}, (VideoSubregion(None),)]'
2017-11-06 17:31:17,250 [356] '<cell at 0x7f248ea09ef8: type object at 0x7f24bb122c60>': "<type 'frame'>"
2017-11-06 17:31:17,250 [357] '<cell at 0x7f248ea09e18: tuple object at 0x55c31868b020>': '(<frame object at 0x7f24bb20c790>, <frame objec .. 5c319abbd50>, <frame object at 0x7f2470003610>)'
2017-11-06 17:31:17,250 [358] '<cell at 0x7f248ea09ec0: dict object at 0x7f248e514050>': "{1: <type 'list'>, 4: <type 'cell'>}"

Not sure where they're from yet... could even be the leak debugging code itself.
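
For reference, a dump like the one above boils down to walking the garbage collector's objects and matching on the type name (a minimal sketch; Python 2 exposes no cell type, Python 3.8+ has types.CellType):

import gc

gc.collect()
for obj in gc.get_objects():
    if type(obj).__name__ == "cell":
        try:
            contents = repr(obj.cell_contents)[:80]
        except ValueError:
            contents = "<empty cell>"   # accessing an empty cell raises ValueError
        print("%r: %s" % (obj, contents))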

totaam commented Nov 7, 2017

Left "xpra info" running in a loop for 4 hours and those leaks are definitely gone.
However, gtkperf -a still causes another leak - and a pretty big one. At least now we can measure things without causing further misleading leaks:

  • r17332: more leak avoidance in exception handling
  • r17333: more thorough and reliable cleanup of window-source objects
  • r17334: dumps all known frames on SIGUSR2 (a minimal sketch follows)
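
A frame dump hook of that kind boils down to something like this minimal sketch (r17334's actual code may differ):

import signal
import sys
import traceback

def dump_frames(signum, frame):
    # print the current stack of every thread in the process
    for tid, f in sys._current_frames().items():
        print("thread %#x:" % tid)
        traceback.print_stack(f)

signal.signal(signal.SIGUSR2, dump_frames)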

totaam commented Nov 10, 2017

2017-11-10 16:24:55: antoine uploaded file leak-show-lists.patch (2.2 KiB)

Show the lists that leak and their backrefs (applies to r17356).

totaam commented Nov 10, 2017

More improvements:

  • r17349: switch to using pympler for object tracking (see the sketch after this list)
  • r17356: do leak detection in a thread (so we don't lock up the server)
  • r17352: use structured data for the "xpra info" response (flatten client side)
  • r17350: don't record fds if we don't need them
  • r17345 + r17347: keep track of timers, and cancel them when no longer needed (ie: during cleanup)
  • r17348 + r17351: avoid using closures
  • r17346: make object tracking more legible
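
For reference, the pympler tracking boils down to summary diffing; a minimal sketch:

from pympler import tracker

tr = tracker.SummaryTracker()   # baseline snapshot of all live objects
# ... exercise the code path suspected of leaking ...
tr.print_diff()                 # print per-type count and size deltas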

This is hard...

totaam commented Nov 11, 2017

Related improvements: r17358 + r17360: avoid churn

I think the leaks are gone (at least the big ones); it just takes a very long time for the maxrss value to settle at its high-water mark, probably because of memory fragmentation.

It would be worth playing with MALLOC_MMAP_THRESHOLD_ to validate this assumption, but I've already spent far too much time on this ticket.
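
For anyone who wants to try it: glibc reads this as an environment variable, and allocations above the threshold are served by mmap and returned to the OS on free instead of fragmenting the heap. The 128KB value below is an arbitrary choice:

$ MALLOC_MMAP_THRESHOLD_=131072 xpra start :64 ...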

@nathan-renniewaldock: can I close this?
