Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

task-related memory leaks #6597

Closed
simonster opened this issue Apr 21, 2014 · 18 comments · Fixed by #22786
Closed

task-related memory leaks #6597

simonster opened this issue Apr 21, 2014 · 18 comments · Fixed by #22786
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@simonster
Copy link
Member

This is split off from #6567, since the memory leak seems to be a separate issue from the crash. That leak happened because getnext_tasklet got leaked. In fact, any closure that gets used in a sync/async block seems to get leaked. That is pretty easy to reproduce:

function f()
    a = zeros(2^28)
    g() = a
    @sync @async g()
    nothing
end
f()
gc()

results in Julia holding on to 2GB. The leak happens in part because current_task().last is still around, and that holds a reference to the task, which holds a reference to g.

I tried setting current_task().last = current_task(), but Julia still seems to hold on to the memory, I think because g is referenced from Main's constant_table. This does not leak:

function f()
    a = zeros(2^28)
    g = ()->a
    @sync @async g()
    current_task().last = current_task()
    nothing
end
f()
gc()
@amitmurthy
Copy link
Contributor

I don't know if it is related, but the following code also causes a leak.

    @schedule begin
        put!(RemoteRef(), ones(10^8))
        while true
         sleep(1.0)
        end 
    end

When the task exits, the memory is freed, but as long as the task is around, no amount of calling gc() helps.

@amitmurthy
Copy link
Contributor

Just

@schedule begin
  ones(10^8)
  while true
    gc()
    sleep(1.0)
  end
end

does not release the unreferenced temporary array.

@vtjnash
Copy link
Member

vtjnash commented Dec 16, 2014

In the second two examples, the async function has not returned yet. This is an example of a place where an early free of the gc slot would be desirable, but not (fully) implemented yet.

@vtjnash
Copy link
Member

vtjnash commented Dec 16, 2014

@JeffBezanson in finish_task, should we NULL the task->start reference (Or perhaps from start_task)?

@simonster
Copy link
Member Author

This seems to leak with no task involved:

function f()
    a = zeros(2^28)
    g() = a
    (()->g())()
    nothing
end

@amitmurthy
Copy link
Contributor

Yes, and only the first invocation results in a leak. Repeat invocations are garbage collected.

@amitmurthy
Copy link
Contributor

Redefining f() results in a new leak though - every definition and the first invocation results in a leak.

@amitmurthy
Copy link
Contributor

Bump. https://groups.google.com/d/msg/julia-users/q39vyGQF4Fs/pmbBdsiP6kQJ

I think this might be related to much of the parallel memory leak issues opened here. I have no idea on how to proceed to fix this. Any pointers?

@amitmurthy
Copy link
Contributor

Bump again. Can we target this for 0.4 instead of 0.4.1 ? Memory leaks in parallel code keep getting reported (latest one - JuliaParallel/DistributedArrays.jl#11) and I suspect it is due to this.

@ViralBShah ViralBShah modified the milestones: 0.4.0, 0.4.1 Mar 30, 2015
@ViralBShah
Copy link
Member

Changed to 0.4.0.

@carnaval
Copy link
Contributor

carnaval commented Apr 3, 2015

So if I'm reading things correctly, this suffices to reproduce :

function f()
    a = alloc()
    g() = a
    nothing
end
f()

The allocated a is reachable through the environment of a Function which is stored as the unspecialized field of the lambda data of g, which ends up in the constant_table of Main. I'm not sure why since of course this instance of a is only valid in the env of the first closure instanciation for g.

@simonster
Copy link
Member Author

@carnaval At least in my tests, it seems you have to call g to get the leak. It may be that a is added to the constant table when the type inferred AST for g is compressed.

@carnaval
Copy link
Contributor

carnaval commented Apr 3, 2015

Yes sorry, that's what I get from typing the example from memory. You do have to call g() before returning from f.

@vtjnash
Copy link
Member

vtjnash commented Apr 17, 2015

the problem comes from the following lines of gf.c:

    if (newmeth->linfo != NULL && newmeth->linfo->sparams == jl_null) {
        // when there are no static parameters, one unspecialized version
        // of a function can be shared among all cached specializations.
        if (method->linfo->unspecialized == NULL) {
            method->linfo->unspecialized =
                jl_instantiate_method(method, jl_null);
            gc_wb(method->linfo, method->linfo->unspecialized);
        }    
        newmeth->linfo->unspecialized = method->linfo->unspecialized;
        gc_wb(newmeth->linfo, newmeth->linfo->unspecialized);
    }    

in particular, this is resulting in method->env being copied to linfo->unspecialized->env the first time the lambda got called (where linfo is in the module constant_table due to ast compression)

i believe the comment is incomplete and that an unspecialized version of the function should also not be shared if method->env != jl_null @JeffBezanson

vtjnash added a commit that referenced this issue Apr 21, 2015
an unspecialized function should not share it's env variable with other closures.
this sets env to NULL in those cases to prevent invalid access,
and creates a new function with the proper env when attempting to call the unspecialized code.
@timholy
Copy link
Member

timholy commented Apr 22, 2015

Really nice, @vtjnash! I wonder if this will also plug the frequently-reported parallel leaks.

vtjnash added a commit that referenced this issue Apr 27, 2015
an unspecialized function should not share it's env variable with other closures.
this sets env to NULL in those cases to prevent invalid access,
and creates a new function with the proper env when attempting to call the unspecialized code.
mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
an unspecialized function should not share it's env variable with other closures.
this sets env to NULL in those cases to prevent invalid access,
and creates a new function with the proper env when attempting to call the unspecialized code.
@cberzan
Copy link

cberzan commented Jul 27, 2016

This issue is still present. If you paste the snippet originally posted into a julia REPL, there is no leak. But if you place the snippet in a file and execute the file, there is a leak.

To reproduce, put the following in example.jl:

function f()
    a = zeros(2^28)
    g() = a
    @sync @async g()
    nothing
end
f()
gc()
print("Check top now...")
readline()

And then run julia example.jl. You will see that julia never releases the 2GB.

Also, the current_task().last = current_task() workaround does not seem to help. Are there any other known workarounds?

@simonster Could you confirm on your end, and reopen this issue?

julia> versioninfo()
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

@carnaval
Copy link
Contributor

Seems to be kept alive by a reference on the stack of another unrelated task (looks like the gc message one).
I'm assuming it's because of the way the scheduler works that a finished task can be referenced in the wait function in another task for a little while, although I'm not very familiar with this thing.
Still in vaccations so no time for deep investigation but I couldn't resist having a look :-)

@vtjnash vtjnash reopened this Jul 28, 2016
@vtjnash
Copy link
Member

vtjnash commented Jul 28, 2016

Not exactly the same issue as OP, but seems similar enough. We can repro at the repl with the following code:

julia> function f()
           t = @schedule nothing
           finalizer(t, t->Core.println("finalized t"))
           wait(t)
           nothing
       end
f (generic function with 1 method)

julia> begin
       t2 = @schedule wait()
       f()
       gc() # this should run the finalizer for t, but doesn't
       end

it'll run the finalizer if we do any of the following:

  • run schedule(t2); yield() (we don't care what t2 does afterwards, it doesn't need to exit)
  • set t2 = nothing
  • set t2.stkbuf = C_NULL; t2.ssize = 0

Edit: ah, I see. This task was started from the now-blocked task, so it lives in the local variable frame there. We need to instead provide a way to transfer ownership of this gc reference from the local frame to jl_yieldto.

@vtjnash vtjnash removed this from the 0.4.0 milestone Jul 28, 2016
vtjnash added a commit that referenced this issue Jul 13, 2017
Using an extra indirection through a Ref allows jl_switch_task to ensure that a task-switch
in wait() won't hold onto the Task in a gc-root after switching to it,
since popping a Task from the Workqueue shouldn't cause it to get a gc-root in
the current Task.

fix #6597
jeffwong pushed a commit to jeffwong/julia that referenced this issue Jul 24, 2017
Using an extra indirection through a Ref allows jl_switch_task to ensure that a task-switch
in wait() won't hold onto the Task in a gc-root after switching to it,
since popping a Task from the Workqueue shouldn't cause it to get a gc-root in
the current Task.

fix JuliaLang#6597
vtjnash added a commit that referenced this issue Aug 7, 2017
since save_stack can allocate, need to delay clearing the gc-root
for the new task until between the allocation and the stack copy

(bug introduced by the fix for #6597)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants