Memory leak in Base #50345
Doing it with for loops indeed does not lead to a memory leak:

```julia
[begin
    forecast_samples = [randn(30) for i in 1:10_000]
    sums = [sum([forecast_samples[s][i] for s in eachindex(forecast_samples)]) for i in 1:30]
    GC.gc(); Base.gc_live_bytes() / 2^20
end for i in 1:50]
```

This is then my workaround.
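Another way to sidestep the adjoint entirely might be to concatenate the plain vector of vectors with `hcat` and sum along the other dimension; a sketch, untested against this issue, that should produce the same 30 sums as the `mapslices` variant below:

```julia
forecast_samples = [randn(30) for i in 1:10_000]
M = reduce(hcat, forecast_samples)  # 30×10_000 Matrix via the specialized concatenation path
sums = vec(sum(M, dims=2))          # same 30 sums as the mapslices/vcat version
```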
Could be a duplicate of #49545, where the GC decides not to do a full collection.
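For reference, `GC.gc` takes a `full::Bool` argument, so the collection heuristic can be bypassed when measuring; a minimal sketch of the measurement idiom used later in this thread:

```julia
GC.gc(true)                   # force a full collection
GC.gc(false)                  # follow with a quick (incremental) collection
Base.gc_live_bytes() / 2^20   # live heap in MiB
```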
`true` is the default, according to the documentation, and setting it explicitly still shows a memory leak of the same size.

EDIT: Now I understand that you are saying one should run both!

```julia
[begin
    forecast_samples = [randn(30) for i in 1:10_000]
    sums = vec(mapslices(sum, reduce(vcat, forecast_samples'), dims=1))
    GC.gc(true); GC.gc(false); Base.gc_live_bytes() / 2^20
end for i in 1:50]
```
I think the example can be reduced to:

```julia
const forecast_samples = [randn(30) for i in 1:10_000];
const forecast_samples_adj = [randn(30) for i in 1:10_000]';

function f(arr)
    reduce(vcat, arr)
    GC.gc(true)
    GC.gc(false)
    return Base.gc_live_bytes() / 2^20
end
```

Without adjoint:

```julia
julia> [f(forecast_samples) for i in 1:50]
50-element Vector{Float64}:
 10.236428260803223
 10.23690128326416
 10.23690128326416
  ⋮
 10.23690128326416
 10.23690128326416
```

With adjoint:

```julia
julia> [f(forecast_samples_adj) for i in 1:50]
50-element Vector{Float64}:
 10.919855117797852
 11.124429702758789
 11.328531265258789
  ⋮
 20.71720314025879
 20.92130470275879
```
I can reproduce the leak on master, but I can't find what the difference is. The GC is finding the memory, because it counts it as live, but I'm not sure what is going on; the `reduce` function is leaking some memory somewhere.
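One guess at why the adjoint matters (an assumption about the dispatch path, not a confirmed diagnosis): `reduce(vcat, ...)` has a specialized method for a `Vector` of arrays, whereas the 1×N `Adjoint` wrapper is an `AbstractMatrix`, so it presumably falls back to the generic pairwise fold, which allocates an intermediate array at every step:

```julia
forecast = [randn(30) for _ in 1:5]
reduce(vcat, forecast)   # Vector of Vectors: specialized method, one 150-element result
reduce(vcat, forecast')  # 1×5 Adjoint of row vectors: generic fold, builds a 5×30 Matrix step by step
```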
I can reproduce on Mac M2. @barucden I also noticed that the "without adjoint" version also allocates, but only on the first iteration (perhaps the result array itself?):

```julia
julia> [f(forecast_samples) for i in 1:50] |> diff
49-element Vector{Float64}:
 0.0004730224609375
 0.0
 0.0
 ⋮
 0.0
 0.0

julia> [f(forecast_samples_adj) for i in 1:50] |> diff
49-element Vector{Float64}:
 0.2045745849609375
 0.2041015625
 0.2041015625
 ⋮
 0.2041015625
 0.2041015625

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M2 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 6 virtual cores
```
One potential way of understanding this would be to use a heap snapshot; Google Chrome has the ability to diff two snapshots.
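A sketch of that workflow (Julia 1.9+; the file names are illustrative): take a snapshot before and after the leaking loop, then load both files in Chrome DevTools under Memory and use the comparison view.

```julia
using Profile

f(forecast_samples_adj)  # warm-up run so compilation allocations don't pollute the diff
Profile.take_heap_snapshot("before.heapsnapshot")
[f(forecast_samples_adj) for _ in 1:50]  # let the leak accumulate
Profile.take_heap_snapshot("after.heapsnapshot")
```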
So I did that, and I couldn't find anything, though I might have been misunderstanding how the tool works.
Run with different versions of Julia

test.jl:

```julia
const forecast_samples_adj = [randn(30) for i in 1:10_000]';

function f(arr)
    reduce(vcat, arr)
    GC.gc(true)
    GC.gc(false)
    return Base.gc_live_bytes() / 2^20
end

a = [f(forecast_samples_adj) for i in 1:50]
println("START=$(a[1])\nEND =$(a[end])\n")
```

Run with valgrind

Test script (tested with master 02f80c6, Julia Version 1.10.0-DEV.1607), gc.jl:

```julia
const forecast_samples_adj = [randn(30) for i in 1:10_000]';

function f(arr)
    reduce(vcat, arr)
    GC.gc(true)
    GC.gc(false)
    return Base.gc_live_bytes() / 2^20
end

count = parse(Int, ARGS[1]);
a = [f(forecast_samples_adj) for i in 1:count]
print("run=$count times\nSTART=$(a[1])\nEND =$(a[end])\n")
```

Valgrind output: raw logs at https://gist.github.com/inkydragon/12a26f5ab5acfd5fb93a76862ee493ca

In the last test, maybe Julia did a garbage collection before exiting?
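One way to make that explicit (a sketch, not something the script above does): register the collections in an `atexit` hook, so any GC work at shutdown happens deliberately before valgrind inspects the remaining heap.

```julia
atexit() do
    GC.gc(true)   # full collection just before the process exits
    GC.gc(false)  # quick pass to sweep anything the full one freed
end
```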
Any chance this could be an issue with the memory accounting code (and the GC not reporting live_bytes accurately)? FWIW, we've seen a case in #54275 recently and, internally, we've been struggling with memory accounting bugs at RAI (e.g. negative live_bytes in some workloads).
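One cheap cross-check along those lines (a sketch; note `Sys.maxrss` is a high-water mark, so it can confirm growth but never shrinkage): compare the drift in `gc_live_bytes` against resident memory. If only the former moves, the accounting is the more likely suspect.

```julia
before = (live = Base.gc_live_bytes(), rss = Sys.maxrss())
[f(forecast_samples_adj) for _ in 1:50]
after = (live = Base.gc_live_bytes(), rss = Sys.maxrss())

(live_MiB = (after.live - before.live) / 2^20,
 rss_MiB  = (after.rss  - before.rss)  / 2^20)
```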
Hi there,

I found a small example that reproduces a memory leak, which kept me busy for several days. (Hoping to find a workaround soon.)

Running the following in the Julia REPL several times will show an increase in memory usage (about 200 KB per run). For sure this example is not yet minimal, but at least it is super tiny. (Tested on Julia 1.9.0 and 1.9.1.)
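A sketch of the snippet for reference, assuming the `mapslices` variant quoted in the comments above is the one meant:

```julia
forecast_samples = [randn(30) for i in 1:10_000]
sums = vec(mapslices(sum, reduce(vcat, forecast_samples'), dims=1))
GC.gc(); Base.gc_live_bytes() / 2^20  # live heap in MiB, grows on each run
```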
EDIT: Just realized that gc_live_bytes is quite noisy, hence it is better to run the above a couple of times, e.g. in a comprehension over 50 runs as in the comments above.