Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible VM Memory Leak #24129

Closed
azenla opened this issue Aug 19, 2015 · 25 comments
Closed

Possible VM Memory Leak #24129

azenla opened this issue Aug 19, 2015 · 25 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P1 A high priority bug; for example, a single project is unusable or has many test failures
Milestone

Comments

@azenla
Copy link
Contributor

azenla commented Aug 19, 2015

We have found a possible memory leak in the Dart VM. I have created a ZIP file with the stuff needed to reproduce this issue.

ZIP file: http://files.directcode.io/dart-leak-broker.zip

Steps:

  1. Download and extract the ZIP file.
  2. Run dart --observe broker.dart
  3. Run dart responder.dart
  4. Run dart requester.dart

You will need to run these commands in 3 terminal sessions at the same time.

If you look at the Observatory on broker.dart, you will see the memory use is large. However, when looking at the memory usage of the process on the Operating System side, it's much higher.
In one of our tests, our heap was 32mb. However, the process was actually using 700mb of memory.

Source: https://github.com/IOT-DSA/sdk-dslink-dart

@iposva-google iposva-google added P1 A high priority bug; for example, a single project is unusable or has many test failures area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Aug 19, 2015
@johnmccutchan
Copy link
Contributor

Probably unrelated but I fixed a leak in ThreadRegistry yesterday (a Monitor was leaked for every isolate). Fix is here: https://github.com/dart-lang/sdk/blob/master/runtime/vm/thread_registry.cc#L30

@sethladd
Copy link
Contributor

@kaendfinger can you add the OS and the Dart VM version you were using? Thanks!

@azenla
Copy link
Contributor Author

azenla commented Aug 19, 2015

@sethladd Sure

OS: Linux x64 (Ubuntu 14.04 Server on Amazon EC2)
Dart VM: v1.11.3

I also reproduced it on a v1.13.0 Git Build at commit e2f6739.

@sethladd
Copy link
Contributor

Thanks @kaendfinger . Any chance you can try 1.12-dev ?

@azenla
Copy link
Contributor Author

azenla commented Aug 19, 2015

@sethladd Sure, I should be able to try it in a few hours. I'll report the result ASAP.

@azenla
Copy link
Contributor Author

azenla commented Aug 19, 2015

@sethladd I don't believe I am seeing the leak in the 1.12-dev. I'm getting a very stable memory use from the system stats. While I was investigating, I wanted to know what the relationship between what Observatory reports for memory usage and what the system reports for memory usage is. Should I expect these numbers to be close together?

@zanderso
Copy link
Member

I was able to reproduce using 1.12.0-dev.5.5. I will continue investigating.

@zanderso
Copy link
Member

@kaendfinger could you explain a little about what this program is doing? Is it just an http server responding to requests from the responder and requester, or is there also something else going on that could give us a hint? Thanks!

@azenla
Copy link
Contributor Author

azenla commented Aug 19, 2015

@zanderso After running it for a bit longer, it started producing the leak again.

Broker is an HTTP server that allows WebSocket connections and allows each WebSocket client to communicate with each other using our protocol.
Responder is a WebSocket client that communicates using our protocol and provides data.
Requester is a WebSocket client that communicates using our protocol and subscribes to a data stream from the Responder. With some additional caching and other details like that. It measures how many data updates it receives in 1 second. All of the protocol is in JSON.

@zanderso
Copy link
Member

@kodandersson Please investigate the proper deallocation of the InterruptableThreadState objects.

@johnmccutchan
Copy link
Contributor

Related to this- an isolate's ThreadRegistry remembers every thread that an isolate ever runs on. This list is never pruned, so even after a thread exits- the ThreadRegistry is holding onto a ThreadRegistry::Entry for each thread that entered the isolate. We could get into a bad state where an isolate is run on more and more threads and memory usage will grow by a (small) amount for each thread.

@iposva-google
Copy link
Contributor

@kaendfinger:

While I was investigating, I wanted to know what the relationship between what Observatory reports for memory usage and what the system reports for memory usage is. Should I expect these numbers to be close together?

The Observatory reports the Dart heap capacity in use, separately for the new and old generations. There are C++ based allocations in the VM within your process which are not being exposed in the Observatory, but are included in the system reports. The difference between the two numbers comes from VM internal data structures, networking buffers, thread stacks, ...

@azenla
Copy link
Contributor Author

azenla commented Aug 20, 2015

@iposva-google Thanks for the explanation.

@mit-mit
Copy link
Member

mit-mit commented Aug 20, 2015

Tagging for 1.12 consideration

@mit-mit mit-mit added this to the 1.12 milestone Aug 20, 2015
@kodandersson
Copy link
Contributor

This fixes leaks related to StoreBuffer:
https://codereview.chromium.org/1309603003/

kodandersson added a commit that referenced this issue Aug 21, 2015
…locks.

Add StoreBuffer::ShutDown to delete global cache of free blocks on VM shutdown.

BUG=#24129
[email protected]

Review URL: https://codereview.chromium.org//1309603003 .
whesse added a commit that referenced this issue Aug 24, 2015
Cherry-picks and associated conflict resolution for:
664742f
d0b03ee
3aa62d0
74e0278
028c995

=====

Keep StoreBuffer::global_mutex_ and global_empty_ alive.

At Dart::CleanUp time, there may still be threads around that access their store buffer, so we need to keep these around until @zanderso lands the clean shutdown CL.

We should stop leaking them once we cleanly stop all threads at VM shutdown (see issue 23844).

BUG=#24169
[email protected]

Review URL: https://codereview.chromium.org//1305123003 .

Conflicts:
	runtime/vm/dart.cc

Fix reuse of free store buffer blocks from the global cache of free blocks.

Add StoreBuffer::ShutDown to delete global cache of free blocks on VM shutdown.

BUG=#24129
[email protected]

Review URL: https://codereview.chromium.org//1309603003 .

Conflicts:
	runtime/vm/dart.cc

- Prevent interrupting the NULL thread when the isolate is   exiting from within a thread.

BUG=

Review URL: https://codereview.chromium.org//1307833002 .

- Initialize fields used in the profiler interrupts.

BUG=
[email protected]

Review URL: https://codereview.chromium.org//1304043003 .

Completely remove InterruptableThreadState

- InterruptableThreadState is gone.
- Moved InterruptableThreadState fields directly into Thread.
- Iterate over all threads in an isolate when profiling.
- Still only sample the mutator thread.

Fix ThreadRegistry leak

- When deleting a Thread, iterate over all isolates and remove it from the isolate's thread registry.

[email protected]

Review URL: https://codereview.chromium.org//1293253005 .

Conflicts:
	runtime/vm/isolate.h
	runtime/vm/thread.cc
	runtime/vm/thread.h
	runtime/vm/thread_interrupter.cc
	runtime/vm/thread_interrupter_android.cc
	runtime/vm/thread_interrupter_linux.cc
	runtime/vm/thread_interrupter_macos.cc

- Fix leak in TimelineEventRecorder.
- Fix leak in ThreadRegistry.

patch from issue 1305353003 at patchset 1 (http://crrev.com/1305353003#ps1)
@whesse
Copy link
Contributor

whesse commented Aug 24, 2015

I believe that this issue is fixed, and the fixes were merged into 1.12.0-dev.5.9. Please verify and close this issue.

@azenla
Copy link
Contributor Author

azenla commented Aug 24, 2015

@whesse Ok I will check them out ASAP.

@dglogik
Copy link

dglogik commented Aug 25, 2015

The memory leak is verified as fixed. Thank you very much. There is another issue with same test perhaps unrelated; however we're seeing very high CPU usage and observatory is showing 0% CPU utilization.

@azenla
Copy link
Contributor Author

azenla commented Aug 25, 2015

@johnmccutchan I believe this would be in your area. To elaborate on the CPU Usage at 0% thing, we are actually getting a Pi Chart that says 100% Idle, even though it's doing constant work.

Also, when I open a CPU Profile, it's empty.

Pi Chart
CPU Profile

@johnmccutchan
Copy link
Contributor

What program can I use to reproduce this?

@azenla
Copy link
Contributor Author

azenla commented Aug 25, 2015

@johnmccutchan This is profiling broker.dart
If you would like, here is the live thing: http://benchmark.iot-dsa.org:8181/#/vm

@johnmccutchan
Copy link
Contributor

@kaendfinger I need a local reproducible so I can diagnose the problem

@whesse
Copy link
Contributor

whesse commented Aug 25, 2015

Can we start a different bug for the CPU usage issue - this 1.12 milestone issue about the VM memory leak should be closed. Post a link to the new issue to this bug, when it is created.

@whesse whesse closed this as completed Aug 25, 2015
@johnmccutchan
Copy link
Contributor

@kaendfinger The CPU profile issue is likely a known bug with programs that mostly process events quickly. We schedule a profile tick and the isolate finishes processing messages before it is sampled. A local reproduction would be helpful to use while I figure out the correct heurestic for this type of program.

@azenla
Copy link
Contributor Author

azenla commented Aug 25, 2015

@johnmccutchan I will open a new issue for this in just a little bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P1 A high priority bug; for example, a single project is unusable or has many test failures
Projects
None yet
Development

No branches or pull requests

9 participants