task-related memory leaks #6597

simonster · 2014-04-21T23:52:37Z

This is split off from #6567, since the memory leak seems to be a separate issue from the crash. That leak happened because getnext_tasklet got leaked. In fact, any closure that gets used in a sync/async block seems to get leaked. That is pretty easy to reproduce:

function f()
    a = zeros(2^28)
    g() = a
    @sync @async g()
    nothing
end
f()
gc()

results in Julia holding on to 2GB. The leak happens in part because current_task().last is still around, and that holds a reference to the task, which holds a reference to g.

I tried setting current_task().last = current_task(), but Julia still seems to hold on to the memory, I think because g is referenced from Main's constant_table. This does not leak:

function f()
    a = zeros(2^28)
    g = ()->a
    @sync @async g()
    current_task().last = current_task()
    nothing
end
f()
gc()

The text was updated successfully, but these errors were encountered:

amitmurthy · 2014-11-11T12:23:47Z

I don't know if it is related, but the following code also causes a leak.

    @schedule begin
        put!(RemoteRef(), ones(10^8))
        while true
         sleep(1.0)
        end 
    end

When the task exits, the memory is freed, but as long as the task is around, no amount of calling gc() helps.

amitmurthy · 2014-12-12T11:39:22Z

Just

@schedule begin
  ones(10^8)
  while true
    gc()
    sleep(1.0)
  end
end

does not release the unreferenced temporary array.

vtjnash · 2014-12-16T18:50:42Z

In the second two examples, the async function has not returned yet. This is an example of a place where an early free of the gc slot would be desirable, but not (fully) implemented yet.

vtjnash · 2014-12-16T18:54:30Z

@JeffBezanson in finish_task, should we NULL the task->start reference (Or perhaps from start_task)?

simonster · 2014-12-17T05:47:47Z

This seems to leak with no task involved:

function f()
    a = zeros(2^28)
    g() = a
    (()->g())()
    nothing
end

amitmurthy · 2014-12-17T06:16:03Z

Yes, and only the first invocation results in a leak. Repeat invocations are garbage collected.

amitmurthy · 2014-12-17T06:32:32Z

Redefining f() results in a new leak though - every definition and the first invocation results in a leak.

amitmurthy · 2015-02-24T01:17:21Z

Bump. https://groups.google.com/d/msg/julia-users/q39vyGQF4Fs/pmbBdsiP6kQJ

I think this might be related to much of the parallel memory leak issues opened here. I have no idea on how to proceed to fix this. Any pointers?

amitmurthy · 2015-03-30T04:44:45Z

Bump again. Can we target this for 0.4 instead of 0.4.1 ? Memory leaks in parallel code keep getting reported (latest one - JuliaParallel/DistributedArrays.jl#11) and I suspect it is due to this.

ViralBShah · 2015-03-30T10:09:16Z

Changed to 0.4.0.

carnaval · 2015-04-03T18:34:42Z

So if I'm reading things correctly, this suffices to reproduce :

function f()
    a = alloc()
    g() = a
    nothing
end
f()

The allocated a is reachable through the environment of a Function which is stored as the unspecialized field of the lambda data of g, which ends up in the constant_table of Main. I'm not sure why since of course this instance of a is only valid in the env of the first closure instanciation for g.

simonster · 2015-04-03T19:07:00Z

@carnaval At least in my tests, it seems you have to call g to get the leak. It may be that a is added to the constant table when the type inferred AST for g is compressed.

carnaval · 2015-04-03T19:08:19Z

Yes sorry, that's what I get from typing the example from memory. You do have to call g() before returning from f.

vtjnash · 2015-04-17T00:11:50Z

the problem comes from the following lines of gf.c:

    if (newmeth->linfo != NULL && newmeth->linfo->sparams == jl_null) {
        // when there are no static parameters, one unspecialized version
        // of a function can be shared among all cached specializations.
        if (method->linfo->unspecialized == NULL) {
            method->linfo->unspecialized =
                jl_instantiate_method(method, jl_null);
            gc_wb(method->linfo, method->linfo->unspecialized);
        }    
        newmeth->linfo->unspecialized = method->linfo->unspecialized;
        gc_wb(newmeth->linfo, newmeth->linfo->unspecialized);
    }

in particular, this is resulting in method->env being copied to linfo->unspecialized->env the first time the lambda got called (where linfo is in the module constant_table due to ast compression)

i believe the comment is incomplete and that an unspecialized version of the function should also not be shared if method->env != jl_null @JeffBezanson

an unspecialized function should not share it's env variable with other closures. this sets env to NULL in those cases to prevent invalid access, and creates a new function with the proper env when attempting to call the unspecialized code.

timholy · 2015-04-22T22:13:12Z

Really nice, @vtjnash! I wonder if this will also plug the frequently-reported parallel leaks.

an unspecialized function should not share it's env variable with other closures. this sets env to NULL in those cases to prevent invalid access, and creates a new function with the proper env when attempting to call the unspecialized code.

cberzan · 2016-07-27T21:50:48Z

This issue is still present. If you paste the snippet originally posted into a julia REPL, there is no leak. But if you place the snippet in a file and execute the file, there is a leak.

To reproduce, put the following in example.jl:

function f()
    a = zeros(2^28)
    g() = a
    @sync @async g()
    nothing
end
f()
gc()
print("Check top now...")
readline()

And then run julia example.jl. You will see that julia never releases the 2GB.

Also, the current_task().last = current_task() workaround does not seem to help. Are there any other known workarounds?

@simonster Could you confirm on your end, and reopen this issue?

julia> versioninfo()
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

carnaval · 2016-07-27T22:54:35Z

Seems to be kept alive by a reference on the stack of another unrelated task (looks like the gc message one).
I'm assuming it's because of the way the scheduler works that a finished task can be referenced in the wait function in another task for a little while, although I'm not very familiar with this thing.
Still in vaccations so no time for deep investigation but I couldn't resist having a look :-)

vtjnash · 2016-07-28T05:44:34Z

Not exactly the same issue as OP, but seems similar enough. We can repro at the repl with the following code:

julia> function f()
           t = @schedule nothing
           finalizer(t, t->Core.println("finalized t"))
           wait(t)
           nothing
       end
f (generic function with 1 method)

julia> begin
       t2 = @schedule wait()
       f()
       gc() # this should run the finalizer for t, but doesn't
       end

it'll run the finalizer if we do any of the following:

run schedule(t2); yield() (we don't care what t2 does afterwards, it doesn't need to exit)
set t2 = nothing
set t2.stkbuf = C_NULL; t2.ssize = 0

Edit: ah, I see. This task was started from the now-blocked task, so it lives in the local variable frame there. We need to instead provide a way to transfer ownership of this gc reference from the local frame to jl_yieldto.

Using an extra indirection through a Ref allows jl_switch_task to ensure that a task-switch in wait() won't hold onto the Task in a gc-root after switching to it, since popping a Task from the Workqueue shouldn't cause it to get a gc-root in the current Task. fix #6597

Using an extra indirection through a Ref allows jl_switch_task to ensure that a task-switch in wait() won't hold onto the Task in a gc-root after switching to it, since popping a Task from the Workqueue shouldn't cause it to get a gc-root in the current Task. fix JuliaLang#6597

since save_stack can allocate, need to delay clearing the gc-root for the new task until between the allocation and the stack copy (bug introduced by the fix for #6597)

simonster mentioned this issue Apr 23, 2014

worker crashes on pmap with large array on Mac #6567

Closed

ihnorton added the bug label May 5, 2014

jakebolewski added this to the 0.4 milestone Oct 28, 2014

amitmurthy mentioned this issue Nov 7, 2014

DArray : memory not fully recovered upon gc() #8912

Closed

amitmurthy mentioned this issue Dec 13, 2014

unnecessary gc root? #9338

Closed

simonster mentioned this issue Dec 16, 2014

references to interpolated values persist in modules #9375

Closed

ViralBShah modified the milestones: 0.4.0, 0.4.1 Mar 30, 2015

timholy mentioned this issue Apr 14, 2015

pmap memory leak #10818

Closed

vtjnash closed this as completed in 3bbc5fc Apr 22, 2015

simonster mentioned this issue Jun 1, 2015

SharedArray not being garbage collected #9348

Closed

vtjnash reopened this Jul 28, 2016

vtjnash removed this from the 0.4.0 milestone Jul 28, 2016

vtjnash mentioned this issue Jul 13, 2017

prevent wait() from gc-rooting the next task #22786

Merged

vtjnash closed this as completed in #22786 Jul 18, 2017

vtjnash added a commit that referenced this issue Aug 7, 2017

avoid early deletion of current_task gc-root

b679648

since save_stack can allocate, need to delay clearing the gc-root for the new task until between the allocation and the stack copy (bug introduced by the fix for #6597)

vtjnash mentioned this issue Aug 7, 2017

avoid early deletion of current_task gc-root #23166

Merged

amitmurthy mentioned this issue Feb 22, 2018

Need to invoke gc explicitly to collect reassigned local variables. #26101

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

task-related memory leaks #6597

task-related memory leaks #6597

simonster commented Apr 21, 2014

amitmurthy commented Nov 11, 2014

amitmurthy commented Dec 12, 2014

vtjnash commented Dec 16, 2014

vtjnash commented Dec 16, 2014

simonster commented Dec 17, 2014

amitmurthy commented Dec 17, 2014

amitmurthy commented Dec 17, 2014

amitmurthy commented Feb 24, 2015

amitmurthy commented Mar 30, 2015

ViralBShah commented Mar 30, 2015

carnaval commented Apr 3, 2015

simonster commented Apr 3, 2015

carnaval commented Apr 3, 2015

vtjnash commented Apr 17, 2015

timholy commented Apr 22, 2015

cberzan commented Jul 27, 2016

carnaval commented Jul 27, 2016

vtjnash commented Jul 28, 2016 •

edited

Loading

task-related memory leaks #6597

task-related memory leaks #6597

Comments

simonster commented Apr 21, 2014

amitmurthy commented Nov 11, 2014

amitmurthy commented Dec 12, 2014

vtjnash commented Dec 16, 2014

vtjnash commented Dec 16, 2014

simonster commented Dec 17, 2014

amitmurthy commented Dec 17, 2014

amitmurthy commented Dec 17, 2014

amitmurthy commented Feb 24, 2015

amitmurthy commented Mar 30, 2015

ViralBShah commented Mar 30, 2015

carnaval commented Apr 3, 2015

simonster commented Apr 3, 2015

carnaval commented Apr 3, 2015

vtjnash commented Apr 17, 2015

timholy commented Apr 22, 2015

cberzan commented Jul 27, 2016

carnaval commented Jul 27, 2016

vtjnash commented Jul 28, 2016 • edited Loading

vtjnash commented Jul 28, 2016 •

edited

Loading