Not all device memory freed #37

Closed
fwinter opened this issue Sep 30, 2011 · 10 comments

@fwinter
Member

fwinter commented Sep 30, 2011

When using the QUDA clover inverter within Chroma, some device memory areas remain allocated after the inversion. This might be okay if QUDA were the only part of the program that accesses the GPU. However, there is ongoing work to extend QDP++ to use the GPU(s) as well. Thus, when using the QDP++ extension along with QUDA in the same Chroma run, device memory remains allocated after exiting the QUDA inverter and cannot be used in the remainder of Chroma, e.g. sink smear, hadspec, etc.

A thin CUDA layer inserted into QUDA produced a dump of the allocation history during the QUDA clover inverter:

0: 0x200300000 524288 1 blas_quda.cu:108
1: 0x200380000 1048576 1 blas_quda.cu:114
2: 0x200480000 1572864 1 blas_quda.cu:120

Each entry refers to a location where cudaMalloc was called without a later call to cudaFree.
(Master branch of QUDA pulled today, Sep 30 10am CET. Single GPU version.)
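
A tracking layer of the kind described above could look roughly like the following. This is only a minimal sketch, not the layer that was actually used: `AllocTracker`, `AllocRecord` and `TRACKED_MALLOC` are hypothetical names, and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so that it compiles and runs without a GPU. The dump prints index, address, size and call site only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Hypothetical tracking layer; not part of QUDA or the CUDA runtime.
// std::malloc/std::free stand in for cudaMalloc/cudaFree.
struct AllocRecord {
    std::size_t bytes;   // size of the allocation
    std::string where;   // "file:line" of the allocating call
};

class AllocTracker {
public:
    // Allocate and remember where the request came from.
    void* allocate(std::size_t bytes, const char* file, int line) {
        void* p = std::malloc(bytes);
        if (p) {
            live_[p] = AllocRecord{bytes, std::string(file) + ":" + std::to_string(line)};
        }
        return p;
    }

    // Free and forget the record.
    void release(void* p) {
        live_.erase(p);
        std::free(p);
    }

    // Everything that was allocated but not yet released.
    std::vector<AllocRecord> leaks() const {
        std::vector<AllocRecord> out;
        for (const auto& kv : live_) out.push_back(kv.second);
        return out;
    }

    // Print one line per live allocation: index, address, size, call site.
    void dump() const {
        int i = 0;
        for (const auto& kv : live_) {
            std::printf("%d: %p %zu %s\n", i++, kv.first,
                        kv.second.bytes, kv.second.where.c_str());
        }
    }

private:
    std::map<void*, AllocRecord> live_;
};

// Records the call site automatically.
#define TRACKED_MALLOC(tracker, bytes) (tracker).allocate((bytes), __FILE__, __LINE__)
```

Calling `dump()` after the inverter returns would then list every allocation that was never released, together with the source location that requested it.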

@maddyscientist
Member

Thanks for reporting this Frank. I'll look into it this coming week.

We're currently doing a large rewrite of much of the library (removing all global variables) which will make it much easier to prevent memory leaks.

@ghost ghost assigned maddyscientist Oct 2, 2011
@maddyscientist
Member

Ok, got around to looking at this. This memory is freed when endBlas() is called, which happens when endQuda() is called. These buffers are a small amount of storage used for reductions, which should have only a minor impact on calculations.

What are you wanting? The option to free some GPU memory without doing a complete endQuda()?

@fwinter
Member Author

fwinter commented Oct 12, 2011

Thanks for looking into this. initQuda() is called at Chroma initialization time, and endQuda() in turn should be called at the end. That means some (small) amount of memory stays allocated across individual QUDA inversions, which is fine in principle. What worries me a bit is device memory fragmentation. The objects that are allocated between individual inversions, i.e. with QDP++, are rather large (propagators, etc.). These objects need contiguous memory regions, and even small allocated fragments might make it impossible to allocate such an object. Thus memory is not used optimally. There are ways around this: one might think of having separate memory domains on the device for small and large objects, respectively. But this is not implemented yet.
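
The fragmentation concern can be illustrated with a toy model. The sketch below assumes a heap made of equally sized blocks with first-fit contiguous allocation; `ToyHeap` is a hypothetical class for illustration only, and the real CUDA allocator may behave differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a device heap: a row of equally sized blocks with
// first-fit contiguous allocation. Illustrative only.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t blocks) : used_(blocks, false) {}

    // Returns the start index of a contiguous run of n free blocks,
    // or -1 if no such run exists.
    long allocate(std::size_t n) {
        if (n == 0 || n > used_.size()) return -1;
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == n) {
                std::size_t start = i + 1 - n;
                for (std::size_t j = start; j <= i; ++j) used_[j] = true;
                return static_cast<long>(start);
            }
        }
        return -1;
    }

    // Marks n blocks starting at start as free again.
    void release(std::size_t start, std::size_t n) {
        for (std::size_t j = start; j < start + n && j < used_.size(); ++j) {
            used_[j] = false;
        }
    }

    // Total number of free blocks, contiguous or not.
    std::size_t freeBlocks() const {
        std::size_t count = 0;
        for (bool u : used_) {
            if (!u) ++count;
        }
        return count;
    }

private:
    std::vector<bool> used_;
};
```

With a heap of 10 blocks, allocating 4, then 1, then 4 blocks and releasing only the two large allocations leaves 9 blocks free, yet a request for 6 contiguous blocks fails because the single small block in the middle splits the free space into runs of 4 and 5.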

One workaround that occurs to me: do you think it's safe to call initBlas/endBlas each time an inversion starts/ends? If I understand you correctly, this should make sure that these memory fragments are freed before leaving the QUDA inverter.

@maddyscientist
Member

Yes, it should be safe to call endBlas and then initBlas in between solvers. Of course things will go bad if endBlas is called and a solver is then called...
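
One way to keep the init/end calls paired, so that a solver never runs after the buffers have been freed, is a scope guard. The following is a sketch under assumptions: the `blas_stub` namespace, `BlasScope` and `runSolver` are hypothetical stand-ins so the example compiles without QUDA. It also assumes that the buffers are not already allocated when the guard is constructed and that the real library tolerates repeated initialization, neither of which is established here.

```cpp
#include <cassert>

// Stand-ins for the library calls discussed above. The namespace and
// the flag are hypothetical; they only model "buffers exist or not".
namespace blas_stub {
    inline bool& initialized() { static bool flag = false; return flag; }
    inline void initBlas() { initialized() = true; }   // would allocate the reduction buffers
    inline void endBlas()  { initialized() = false; }  // would free them
}

// Scope guard: the buffers exist only while a guard object is alive.
class BlasScope {
public:
    BlasScope() { blas_stub::initBlas(); }
    ~BlasScope() { blas_stub::endBlas(); }
    BlasScope(const BlasScope&) = delete;
    BlasScope& operator=(const BlasScope&) = delete;
};

// Hypothetical solver entry point: reports whether the buffers it
// needs were available when it was called.
inline bool runSolver() { return blas_stub::initialized(); }
```

Wrapping each inversion in a block that starts by declaring a `BlasScope` would free the buffers automatically on exit, including on early returns.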

@fwinter
Member Author

fwinter commented Oct 12, 2011

Just realized that it's not so straightforward to call endBlas from Chroma. There are name clashes. E.g. "Complex" is defined in QDP and aliased into the global namespace, and you have a "Complex" type as well...

@fwinter fwinter closed this as completed Oct 12, 2011
@fwinter fwinter reopened this Oct 12, 2011
@maddyscientist
Member

Ok, this has motivated me to do something I've been planning for a while: to create a quda namespace. As a first step, all I have done is move the blas creation / destruction functions into the namespace, e.g., quda::initBlas, etc.

This is pushed to master. Can you tell me what conflicts you have remaining, and I'll make the necessary changes to fix this?

I won't move everything into the namespace quite yet, as it would take too long. This will be an evolutionary process....
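
The effect of moving a clashing name into a namespace can be sketched as follows. The namespaces `lib_a` and `lib_b` and the function `twice` are hypothetical stand-ins, not actual QDP++ or QUDA code, and the underlying types are assumptions chosen only to make the two definitions differ.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Two libraries that each define a type called Complex.
namespace lib_a {
    typedef std::complex<float> Complex;
}
namespace lib_b {
    typedef std::complex<double> Complex;
    inline Complex twice(const Complex& z) { return z + z; }
}

// One library exports its name to the global scope.
using lib_a::Complex;

// Because lib_b keeps its type inside its own namespace, both can
// coexist: the unqualified Complex is lib_a's type, and lib_b's type
// has to be named explicitly as lib_b::Complex.
static_assert(std::is_same<Complex, std::complex<float> >::value,
              "global Complex is lib_a's type");
static_assert(!std::is_same<Complex, lib_b::Complex>::value,
              "the two Complex types are distinct");
```

If `lib_b` also declared its `Complex` at global scope with a different underlying type, the two declarations would conflict at compile time, which is the kind of clash reported above.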

@fwinter
Member Author

fwinter commented Oct 13, 2011

blas_cuda.h still uses "Complex" from global namespace. If you could move this declaration to your new namespace we should be fine.

@maddyscientist
Member

I've moved this to the namespace now (commit 3de6e8f). Hopefully this closes this issue.

@fwinter
Member Author

fwinter commented Oct 13, 2011

No more name clashes! Now (with endBlas/initBlas) there is a segfault in invertQuda. It seems it's not safe to call endBlas and then initBlas again and hope everything stays fine. I investigated this further: (at Chroma init) calling just initQuda and nothing else works fine. But calling initQuda(); endBlas(); initBlas(); then crashes in invertQuda. Any other side effects? Here is the backtrace (no debug symbols):

#0 0x00007fffe92e2130 in ?? () from /usr/lib/libcuda.so
#1 0x00007fffe92baeae in ?? () from /usr/lib/libcuda.so
#2 0x00007fffe92c691b in ?? () from /usr/lib/libcuda.so
#3 0x00007fffe92bce9c in ?? () from /usr/lib/libcuda.so
#4 0x00007fffe929ce41 in ?? () from /usr/lib/libcuda.so
#5 0x00007fffe92a0bc8 in ?? () from /usr/lib/libcuda.so
#6 0x00007fffe9293244 in ?? () from /usr/lib/libcuda.so
#7 0x00007ffff784ade2 in ?? () from /usr/local/cuda/lib64/libcudart.so.4
#8 0x00007ffff786e824 in cudaMemcpy () from /usr/local/cuda/lib64/libcudart.so.4
#9 0x0000000001f00f48 in normCuda(cudaColorSpinorField const&) ()
#10 0x0000000001e4dd7a in invertQuda ()

@fwinter
Member Author

fwinter commented Oct 19, 2011

That fixed it: No more memory leaks now!
