Not all device memory freed #37
Thanks for reporting this, Frank. I'll look into it this coming week. We're currently doing a large rewrite of much of the library (removing all global variables), which will make it much easier to prevent memory leaks.
Ok, got around to looking at this. This memory is freed when endBlas() is called, which is invoked when endQuda() is called. These buffers represent a small amount of storage used for reductions, which should have only a minor impact on calculations. What are you wanting? The option to be able to free some GPU memory, but not to do a complete endQuda()?
Thanks for looking into this. initQuda() is called at Chroma initialization time, and endQuda() in turn should be called at the end. That means some (small) amount of memory stays allocated between individual QUDA inversions, which is fine in principle. What worries me a bit is device memory fragmentation. The objects that are allocated between individual inversions, i.e. with QDP++, are rather large (propagators, etc.). These objects need contiguous memory regions, and even small allocated fragments might make it impossible to allocate such an object, so memory is not used optimally. There are ways around this: one might think of having separate memory domains on the device for small and large objects, respectively, but this is not implemented yet. One workaround that occurs to me: do you think it's safe to call initBlas/endBlas each time an inversion starts/ends? If I understand you correctly, this should make sure that these memory fragments are correctly freed before leaving the QUDA inverter.
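A minimal sketch of the call pattern being proposed here, as it might look on the Chroma side. The wrapper function, its arguments, and the header names are only illustrative assumptions; initQuda()/endQuda() would still bracket the whole Chroma run as before:

```cpp
#include <quda.h>       // invertQuda(), QudaInvertParam
#include <blas_quda.h>  // initBlas(), endBlas() (header name assumed)

// Hypothetical wrapper on the Chroma side: (re)allocate QUDA's blas reduction
// buffers only for the duration of a single solve, so that QDP++ sees as much
// contiguous device memory as possible in between inversions.
void doQudaInversion(QudaInvertParam &inv_param, void *spinorOut, void *spinorIn)
{
  initBlas();                                   // allocate the reduction buffers
  invertQuda(spinorOut, spinorIn, &inv_param);  // run the solver as usual
  endBlas();                                    // free the buffers before returning to QDP++
}
```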
Yes, it should be safe to call endBlas and then initBlas in between solvers. Of course, things will go bad if endBlas is called and a solver is then called...
Just realized that it's not so straightforward to call endBlas from Chroma. There are name clashes: e.g. "Complex" is defined in QDP and aliased to the global namespace, and you have a "Complex" type as well...
Ok, this has motivated me to do something I've been planning for a while: to create a quda namespace. As a first step, all I have done is move the blas creation/destruction functions into the namespace, e.g. quda::initBlas, etc. This is pushed to master. Can you tell me what conflicts remain, and I'll make the necessary changes to fix them? I won't move everything into the namespace quite yet, as it would take too long. This will be an evolutionary process...
blas_cuda.h still uses "Complex" from the global namespace. If you could move this declaration to your new namespace, we should be fine.
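For illustration, a minimal sketch of the kind of clash and how the namespace resolves it; the concrete typedefs are placeholders, not QUDA's or QDP++'s actual definitions:

```cpp
#include <complex>

// QUDA side: after the change, the alias lives inside namespace quda,
// so it no longer collides with anything in the global namespace.
namespace quda {
  typedef std::complex<double> Complex;   // placeholder for QUDA's type
}

// QDP++/Chroma side: keeps its own unqualified "Complex" at global scope.
typedef std::complex<float> Complex;      // placeholder for QDP's type

int main() {
  Complex a(1.0f, 2.0f);        // resolves to the QDP-side alias
  quda::Complex b(3.0, 4.0);    // QUDA's alias, explicitly qualified
  return 0;
}
```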
I've moved this to the namespace now (commit 3de6e8f). Hopefully this closes this issue.
No more name clashes! Now (with endBlas/initBlas) there is a segfault in invertQuda. It seems it's not safe to call endBlas and then initBlas again and expect everything to stay fine. I investigated this further: (at Chroma init) calling just initQuda and nothing else works fine. But calling initQuda(); endBlas(); initBlas(); then crashes in invertQuda. Any other side effects? Here is the backtrace (no debug symbols):
#0 0x00007fffe92e2130 in ?? () from /usr/lib/libcuda.so
That fixed it: No more memory leaks now! |
When using the QUDA clover inverter within Chroma, some device memory areas remain allocated after the inversion. This might be okay if QUDA were the only part of the program that accesses the GPU. However, there is ongoing work to extend QDP++ to use the GPU(s) as well. Thus, when using the QDP++ extension along with QUDA in the same Chroma run, device memory remains allocated after exiting the QUDA inverter and cannot be used in the remainder of Chroma, e.g. sink smearing, hadspec, etc.
A thin CUDA layer inserted into QUDA produced a dump of the allocation history made during a QUDA Clover inversion:
0: 0x200300000 524288 1 blas_quda.cu:108
1: 0x200380000 1048576 1 blas_quda.cu:114
2: 0x200480000 1572864 1 blas_quda.cu:120
These are the locations where cudaMalloc was called without a matching cudaFree.
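For reference, a hedged sketch of how such a thin tracking layer could be built; the wrapper names, the map, and the report format here are assumptions, not the layer actually used to produce the dump above:

```cpp
#include <cstdio>
#include <cstddef>
#include <map>
#include <string>
#include <cuda_runtime.h>

// Hypothetical wrapper: record every cudaMalloc with its call site and drop
// the record on cudaFree; anything left in the map at shutdown is a leak.
static std::map<void*, std::string> live_allocs;

static cudaError_t trackedMalloc(void **ptr, size_t size,
                                 const char *file, int line)
{
  cudaError_t err = cudaMalloc(ptr, size);
  if (err == cudaSuccess) {
    char where[256];
    std::snprintf(where, sizeof(where), "%zu bytes at %s:%d", size, file, line);
    live_allocs[*ptr] = where;
  }
  return err;
}

static cudaError_t trackedFree(void *ptr)
{
  live_allocs.erase(ptr);
  return cudaFree(ptr);
}

// Used in place of cudaMalloc so the call site is captured automatically.
#define TRACKED_MALLOC(p, s) trackedMalloc((p), (s), __FILE__, __LINE__)

// Call at program exit: prints one line per outstanding allocation
// (index, pointer, size, call site), similar to the dump above.
static void reportLeaks()
{
  int i = 0;
  for (const auto &a : live_allocs)
    std::printf("%d: %p  %s\n", i++, a.first, a.second.c_str());
}
```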
(Master branch of QUDA pulled today, Sep 30 10am CET. Single GPU version.)