Memory leak? #66
Aren't you trying to allocate a 74 GB matrix (1e5 x 1e5 Float64s)?
The main "mystery" is why the output allocation would succeed initially but then continue to consume memory as the array is used... which happens because the kernel cheats and doesn't actually allocate any memory for pages until they're used, at which point the kernel gives your process its own real RAM-backed page for that memory.
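For reference, the output size follows directly from the element count; a quick back-of-the-envelope check in the Julia REPL:

```julia
julia> n = 100_000;                      # number of observations

julia> n^2 * sizeof(Float64) / 2^30      # bytes in an n × n Float64 matrix, in GiB
74.50580596923828
```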
@KristofferC, yes, you are correct. My desktop has large resources, so I thought I could make it work, and since the initial allocation was succeeding I was confused, but @StefanKarpinski's explanation makes sense. Do any of you have recommendations for handling a problem of this size? Splitting it up, map-reduce style?
To be clear, do you have 100,000 variables with 500 observations, or 500 variables with 100,000 observations? IIRC this package uses a different convention from most other software and stores variables as rows.
500 variables, 100,000 observations |
OK, so try with the transposed matrix. :-)
I'm not sure I follow that. I need the pairwise distance between observations... and I believe this package uses column-to-column computations. If the distance is symmetric, I could try computing only the triangular part of the matrix.
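For reference, a tiny illustration of that column-wise convention, assuming Distances.jl's `pairwise` with the `dims` keyword (the keyword may not exist in older versions, where columns are treated as observations by default):

```julia
using Distances

X = rand(500, 1_000)                  # 500 variables (rows) × 1,000 observations (columns)
D = pairwise(Euclidean(), X, dims=2)  # distances between columns
size(D)                               # (1000, 1000); with 100,000 columns this becomes 100,000 × 100,000
```

Note that transposing the input does not shrink the output: the result is always observations × observations.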
Yes, but what I mean is that maybe your data matrix is transposed relative to what the package expects. EDIT: Sorry, I misread, so that's not a rows vs. columns issue. So the full matrix really does need ~74 GB.
Float32 is an idea, but before sacrificing precision... is it possible to work with memory-mapped matrices or something like that?
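One possibility along those lines, as a sketch only: memory-map the output onto a file using Julia's `Mmap` standard library, optionally with `Float32` elements (the `pairwise!`/`dims` call and the file name below are assumptions, not something this package sets up for you):

```julia
using Mmap, Distances

n = 100_000
io = open("distances.bin", "w+")            # backing file on disk
D = Mmap.mmap(io, Matrix{Float32}, (n, n))  # ~37 GiB backed by the file rather than RAM

X = rand(Float32, 500, n)
pairwise!(D, Euclidean(), X, dims=2)        # fills the memory-mapped output in place

Mmap.sync!(D)                               # flush dirty pages to disk
close(io)
```

Expect this to slow down badly once the matrix no longer fits in RAM, since the kernel will be paging to disk constantly.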
Thanks all for the answers. I'll close this issue, since it's not a leak problem. If anyone has any further recommendations, they're greatly appreciated.
See
For this sort of computation, you generally need to figure out a clever way to avoid doing all the pairwise distance computations. It's hard to imagine that you need all of the pairs of distances, so there may be some way to avoid doing most of the computation. One probabilistic approach is to use locality-sensitive hashing to decide which pairs to look at (only the ones in the same hash bucket). There are various other exact and approximate approaches (see this wikipedia page for starters).
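A rough sketch of that LSH idea using random-projection (sign) hashing; the bit width, the single hash table, and the helper names are all illustrative simplifications:

```julia
using Distances, Random

# Random-projection LSH: observations are the columns of X (d × n).
function lsh_buckets(X::AbstractMatrix; nbits::Int = 16, rng = Random.default_rng())
    d, n = size(X)
    P = randn(rng, nbits, d)       # nbits random hyperplanes through the origin
    signs = P * X .> 0             # nbits × n sign pattern per observation
    buckets = Dict{UInt64, Vector{Int}}()
    for j in 1:n
        key = UInt64(0)
        for b in 1:nbits
            key = (key << 1) | UInt64(signs[b, j])
        end
        push!(get!(buckets, key, Int[]), j)
    end
    return buckets
end

# Only compute distances between observations that landed in the same bucket.
function bucketed_pairs(X::AbstractMatrix; kwargs...)
    pairs = Tuple{Int, Int, Float64}[]
    for idxs in values(lsh_buckets(X; kwargs...))
        for a in 1:length(idxs), b in a+1:length(idxs)
            i, j = idxs[a], idxs[b]
            push!(pairs, (i, j, euclidean(view(X, :, i), view(X, :, j))))
        end
    end
    return pairs
end
```

With `nbits` around 16–20, each bucket stays small for 100,000 observations, at the cost of missing pairs that hash into different buckets; in practice you would repeat this with several independent hash tables.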
Hi,
I'm computing pairwise distances for a large matrix (500 × 100,000) and the memory footprint keeps growing indefinitely. I am pre-allocating the output matrix, so I don't think that should be happening. In fact, the process eventually gets killed by the kernel (after using 60 GB+ of memory)... I suspect a memory leak. The code I'm running looks something like
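A minimal sketch of that kind of pre-allocated call, assuming `pairwise!` with `Euclidean()` (the metric, the variable names, and the `dims` keyword are assumptions, not the original code):

```julia
using Distances

X = rand(500, 100_000)                        # 500 variables × 100,000 observations
D = Matrix{Float64}(undef, 100_000, 100_000)  # pre-allocated 100,000 × 100,000 output
pairwise!(D, Euclidean(), X, dims=2)          # RSS climbs toward ~74.5 GiB as pages are first touched
```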
Any insights? I haven't profiled for memory leaks in Julia before, but I'll try to see if I can help with more specifics.
Thanks!