Iterating a SortedDict causes memory allocation #93
@cberzan, thanks for the issue. @StephenVavasis might have some ideas, since he wrote the code. Iteration in Julia is done with the `start`/`next`/`done` protocol. I just took a quick look, and it looks like the allocation happens in `nexthelper`.
I'll admit that I don't have a clear understanding of Julia's memory allocation behavior. In C/C++, if a loop body allocates a stack variable, e.g.,

```c
for (int i = 0; i < n; i++) {
    int j;   /* loop-local stack variable */
    /* ... use j ... */
}
```

then the code does not pay a price for allocating `j` on every iteration; the compiler reserves a spot on the stack for `j` once and for all at the beginning of the loop. How about Julia? If a loop body creates a fixed-size variable (e.g. an Int or a small immutable object), is there memory allocation on each iteration? If so, how does one reduce or minimize this overhead?

By the way, the reason for `nexthelper` is that the code supports a large number of looping constructs. Disjoint start/next/done routines for each possible loop construct would have resulted in a lot of repetitive code, so I tried to abstract out as much commonality as I could between the various start/next/done routines.

-- Steve Vavasis
@StephenVavasis This is true of Julia when variables are "bits types" (which you can check at the command line using `isbits`).
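To make this distinction concrete, here is a sketch; the `SemiToken`/`Token` types below are illustrative stand-ins (not the package's actual definitions), using the Julia 0.3-era `immutable` syntax and the `isbits` check:

```julia
# Illustrative types, not DataStructures.jl's actual definitions.
immutable SemiToken          # a bare Int wrapped in a single-field immutable
    address::Int
end

immutable Token              # the same Int plus a reference to the container
    container::Vector{Int}
    semi::SemiToken
end

println(isbits(Int))        # true:  plain data, can live on the stack
println(isbits(SemiToken))  # true:  still a bits type
println(isbits(Token))      # false: holds a reference, so it is heap-allocated
```

Only the bits types avoid per-iteration heap allocation when used as iteration state.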
@simonster, thanks for the explanation. Just so everyone understands the issue: a 'token' is an immutable object that contains two fields: an integer (wrapped in another single-field immutable called a semitoken) and a pointer to the container.

Right now, I don't see how to fix the problem raised by @cberzan. The issue is that the iteration state somehow needs to remember the sorted container over which the iteration is taking place, with a pointer I suppose. If somehow I could split the iteration state into two pieces, one part that is not reallocated on each loop iteration and contains the pointer, and another part that contains the index information (Ints), then everything would be OK. But I don't see how to make this happen in the start/next/done framework of Julia: the state of the loop apparently needs to be recreated on each iteration, and there is no place to hide some permanent state information.

@cberzan, you could improve your performance by calling the lower-level non-exported routine `nextloc0` from balancedTree.jl instead of using the documented container loops, but this would make your code fragile, because I or someone else might rewrite these routines in a future release. Possibly I should expose these inner routines via a documented interface?

Question for @simonster: is there any plan to update the Julia core so that fixed-size objects go on the stack instead of the heap? How far in the future is that?

-- Steve
By the way, does anyone know whether the built-in Dict suffers from this issue (that iteration with start/next/done requires heap allocation for the iteration state)? I found the relevant code in dict.jl difficult to understand because of calls to C routines and the use of `reinterpret`.
It appears that iterating a built-in Dict also causes memory allocation.
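One way to observe this is with `@allocated` from Base; the example below is a sketch of such a measurement, not the script originally posted:

```julia
# Sum the values of a Dict and report bytes allocated per call.
function sumvals(d)
    s = 0
    for (k, v) in d      # iteration state is created on each pass through the loop
        s += v
    end
    s
end

d = [i => i for i = 1:100]      # Julia 0.3-era Dict construction
sumvals(d)                      # run once so compilation is not measured
println(@allocated sumvals(d))  # nonzero if the iteration state is heap-allocated
```

A nonzero number here indicates the iteration state is being boxed on the heap rather than kept in registers or on the stack.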
@StephenVavasis Do you need to store the container in the state? The object being iterated over is passed as the first argument to `next` and `done`. There are plans to allocate objects on the stack when references cannot escape (JuliaLang/julia#8134), but we're not there yet.
I implemented this change in the branch SortedContainers. I did not yet create a pull request for this commit (there is a pull request for an earlier commit on this branch), but I did push the commit to my personal repository: https://github.com/StephenVavasis/DataStructures.jl, branch SortedContainers.

In more detail, the change implemented is as follows. SCIterationState now holds two integers (instead of two integers and a pointer). The various non-token iterations no longer construct tokens, only semitokens (Ints). I saw a small improvement (0-10%) in my own timing tests. I am doubtful that these changes will help @cberzan's timing tests much, because of the structure of his tests (keys are strings; iteration object constructed many times), but if he wishes to try, he can carry out a Pkg.clone(..) and Pkg.checkout(..) from my repository. I think these changes might lead to a more substantial improvement in a loop over a container with many entries (instead of just 2) and in which the keys and values are bits types (rather than strings), but I don't have a test like that yet.
I ran a test with integer key/value pairs and with a large SortedDict using my latest code (which removes the unnecessary pointers from SCIterationState and nexthelper) versus an older version, and I observed a speedup of a factor of 3! So this is a very useful improvement. Thanks to @cberzan for pointing out the performance issue and to @simonster for explaining the issue. My code is below. I tested it with the statement `@time sumsquares1.sumsquares(80000,10000)`:

```julia
module sumsquares1

import DataStructures
using DataStructures.SortedDict

function sumsquares(m, n)
    u = SortedDict(Dict{Int,Int}())
    for j = 1:m
        u[j] = j * j
    end
    for trial = 1:n
        sum1 = 0
        for item in u
            sum1 += item[2]
        end
        @assert sum1 == m * (m + 1) * (2m + 1) / 6
    end
end

end
```
@StephenVavasis Thanks; that was fast! Using your branch, I'm seeing a ~30% speedup in my code, and a ~40% reduction in bytes allocated. (I mean my actual code, not the examples posted in this issue.) Also thanks to @simonster for pointing out that some tuples are bits types and some are not. I will try to rewrite my code to use bits types so that iterating dicts can be done without heap allocation.
Dear Readers,

Recall the issue with tokens in the SortedDict data structure: they are fixed-size immutable objects consisting of a wrapped Int and a pointer to a SortedDict, but they are nonetheless allocated on the heap and thus cause a significant loss of performance. I carried out more experimenting and discovered that if I define tokens to be tuples of (pointer, int) instead of a two-field immutable, then this performance problem goes away. See the example code below.

A few questions about this: (1) Is it a good idea? Does it lead to readable code? It would mean that the old token interface would have to be deprecated in favor of tokens as tuples. (2) Can anyone explain why tuples do not suffer from the same problem as immutables? I would like to understand the issue to make sure I don't accidentally step into another pitfall.

Thanks,

The following code creates a sorted dict in which the key i (an integer) is associated with the value i^2 (also an integer). Then it iterates over the sorted dict repeatedly and sums the squares using tokens explicitly (instead of a for-loop). This code uses the proposed new token interface in which a token is a tuple (pointer, IntSemiToken). Timing tests indicate that this code is much faster and allocates far less memory than the old token code in which a token is a two-field immutable.

```julia
function ssq4(m, n)
    u = SortedDict(Dict{Int,Int}())
    for j = 1:m
        u[j] = j * j
    end
    for trial = 1:n
        sum1 = 0
        (u, s) = startof(u)
        while !isequal((u, s), pastendtoken(u))
            sum1 += deref_value((u, s))
            (u, s) = advance((u, s))
        end
        @assert sum1 == m * (m + 1) * (2m + 1) / 6
    end
end
```
Sorry to follow up on my own post, but here is some additional weirdness with using tuples: if I give the tuple a name `t` in the above code, then all of a sudden the run-time jumps by 50% and the memory allocation skyrockets. Why? Here is the same code in which the tuple has a name:

```julia
function ssq5(m, n)
    u = SortedDict(Dict{Int,Int}())
    for j = 1:m
        u[j] = j * j
    end
    for trial = 1:n
        sum1 = 0
        t = startof(u)
        while !isequal(t, pastendtoken(u))
            sum1 += deref_value(t)
            t = advance(t)
        end
        @assert sum1 == m * (m + 1) * (2m + 1) / 6
    end
end
```
@StephenVavasis, you should add triple backticks around code that you paste into GitHub, so that it is formatted properly, but also because unquoted macros will send a notification to users with the same name (in this case, the `@time` and `@assert` macros in your pasted code). (Bonus: if you use "```julia" as the first line, you'll get syntax highlighting.)
I can't claim that I understand the intricacies of the SortedDict. However, as @simonster suggested above, is there a reason to carry around the dict in the state? For that, the other argument of the iterator functions should be used. For instance, the normal Dict iterator functions do not carry a reference to the dict in the state, as that reference is passed in the first argument:

```julia
start(t::Dict) = skip_deleted(t, 1)
done(t::Dict, i) = i > length(t.vals)
next(t::Dict, i) = ((t.keys[i], t.vals[i]), skip_deleted(t, i+1))

function skip_deleted(h::Dict, i)
    L = length(h.slots)
    while i <= L && !isslotfilled(h, i)
        i += 1
    end
    return i
end
```

Translated to your code, this might look like:

```julia
start(m::SortedDict) = (nextloc0(m.bt, 1), 2)

function next(m::SortedDict, state)
    dt, t, ni = nexthelper(m, state)
    (dt.k, dt.d), ni
end

done(m::SortedDict, state) = state[1] == state[2]

function nexthelper(m::SortedDict, state)
    sn = state[1]
    (sn < 3 || !(sn in m.bt.useddatacells)) && throw(BoundsError())
    m.bt.data[sn], sdtoken_construct(m, sn), (nextloc0(m.bt, sn), state[2])
end
```

Note how the key and value iterators are defined:

```julia
immutable KeyIterator{T<:Associative}
    dict::T
end

immutable ValueIterator{T<:Associative}
    dict::T
end

start(v::Union(KeyIterator,ValueIterator)) = start(v.dict)
done(v::Union(KeyIterator,ValueIterator), state) = done(v.dict, state)

function next(v::KeyIterator, state)
    n = next(v.dict, state)
    n[1][1], n[2]
end

function next(v::ValueIterator, state)
    n = next(v.dict, state)
    n[1][2], n[2]
end
```

So it iterates through the dict just like the ordinary iterator, except that in `next` it only returns the key or value.
For ordinary for-loops, I have already fixed the start/done/next functions following @simonster's suggestion, so the 'token' overhead issue goes away. The issue remains, however, in situations in which a token is explicitly needed by the user of the package.

When a token is explicitly needed, the question is whether the package should provide a token that indexes the item+container or just the item (in which case the user keeps track of the container). I have called the latter a 'semitoken' in my documentation. In some earlier rounds of discussion with @kmsquire, we decided that providing an index to an item+container would be more user-friendly and elegant. It is, for example, the way that iterators work in C++ standard containers. Of course, we didn't know about this performance overhead issue when we made that decision.

So far I haven't heard from anyone who can explain why there is overhead with tuples if you bind a name to them (as opposed to binding names to their components), so I opened a discussion on that topic on the julia-dev newsgroup here: https://groups.google.com/forum/#!topic/julia-dev/uQ1DsGGSrD0

I'm not sure how to proceed with the sorted-container package development: drop the token interface (because of performance problems) and provide only a semitoken interface? Provide both interfaces at the cost of more code and documentation complexity? Or provide only the token interface (for elegance) and hope that the performance issue is fixed in a future version of Julia?
Sorry, I somehow missed the updates in your SortedContainers branch.
Closed via #787 |
Is there a way to iterate through a SortedDict without causing memory allocation? (I'm using a SortedDict in an inner loop, and most of the time is spent in the `next` function used for iterating.) Here is a small example:

I'm running Julia 0.3.4 with `--track-allocation=all`. I'm seeing allocations on the `for item in dict` line in my script, and in the `nexthelper` function in `sortedDict.jl`. It seems that 320 bytes are allocated at every iteration.

If you think this is doable and point me in the right direction, I will consider contributing a pull request :)
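The small example mentioned above did not survive in this copy of the thread; the following is a sketch of what such a test might look like (assuming DataStructures.jl is installed; run with `julia --track-allocation=all script.jl`):

```julia
using DataStructures

function iterate_test(n)
    dict = SortedDict(Dict{Int,Int}())
    for i = 1:n
        dict[i] = i
    end
    total = 0
    for trial = 1:1000
        for item in dict        # allocation would be reported on this line
            total += item[2]
        end
    end
    total
end

iterate_test(100)
```

After the run, the generated `.mem` files show per-line allocation counts, which is how the 320 bytes per iteration figure above would have been observed.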