[HSA] Implicitly resource leaking on kernelBufferMap #48
Comments
@yan-ming The work in the hsa_async_copy branch is mostly about removing hsa_memory_copy() and switching to hsa_amd_memory_copy_async(), so it is most likely orthogonal to your proposed changes here.
@whchung I see. Although the leak here doesn't really hurt performance, I still feel it would be better to have this patch applied in HCC. Would it be okay with you if I propose a PR with the patch above to the
Sure thing.
Hi @whchung,

I'm going through the HCC runtime looking for other opportunities to improve overall performance for ported applications. After some profiling experiments, I noticed that LaunchKernelWithDynamicGroupMemoryAsync takes a noticeable portion of the time, and the cost of std::map operations seems to be one main reason for that.

Looking closer, I noticed that the map used for recording the kernel-buffer dependency chain might play a part here. In HSAQueue::Push(), map elements are created implicitly, and there are two std::for_each calls that traverse the whole map, each taking O(map.size()) time. The map is only released at a very late stage of the HCC runtime.

I came up with the following patch to erase those used map elements. This does reduce the map size, but I didn't actually see an obvious performance gain in my application. Perhaps we need a bigger refactoring of the kernel-buffer dependency handling.

I know you have another ongoing hsa_async_copy branch to leverage the AMD APIs for async copy operations, so I'm wondering if you have any further plans for async kernels here.

The patch doesn't cause any regressions on my end.
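For readers unfamiliar with the pattern described above, here is a minimal sketch of how a std::map can grow implicitly and how erasing used entries keeps full traversals short. It is not the HCC code or the proposed patch: DependencyMap, recordUse, hasPendingNaive, hasPendingLookup, countPending, and releaseBuffer are all hypothetical names chosen for illustration, and the key and value types are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the HCC runtime types.
using BufferKey = const void*;
using KernelId = int;
using DependencyMap = std::map<BufferKey, std::vector<KernelId>>;

// Records that `kernel` uses `buffer`. operator[] default-constructs an
// empty vector when the key is absent, so the map grows implicitly.
void recordUse(DependencyMap& deps, BufferKey buffer, KernelId kernel) {
    assert(buffer != nullptr);
    deps[buffer].push_back(kernel);
}

// Naive query: operator[] inserts an empty entry for an unknown buffer,
// so even a read-only question makes the map larger.
bool hasPendingNaive(DependencyMap& deps, BufferKey buffer) {
    return !deps[buffer].empty();
}

// Query through find(): never inserts, so the map can be taken by const reference.
bool hasPendingLookup(const DependencyMap& deps, BufferKey buffer) {
    auto it = deps.find(buffer);
    return it != deps.end() && !it->second.empty();
}

// Walks every entry with std::for_each, so the cost is O(deps.size())
// no matter how many entries are still relevant.
std::size_t countPending(const DependencyMap& deps) {
    std::size_t total = 0;
    std::for_each(deps.begin(), deps.end(),
                  [&total](const DependencyMap::value_type& entry) {
                      total += entry.second.size();
                  });
    return total;
}

// Erases the entry for `buffer` once it is no longer needed.
// Returns the number of entries removed (0 or 1).
std::size_t releaseBuffer(DependencyMap& deps, BufferKey buffer) {
    return deps.erase(buffer);
}
```

In this sketch, calling hasPendingNaive on an unseen buffer leaves a stale empty entry behind, which every later countPending call still has to visit. Calling releaseBuffer after a buffer's kernels have completed keeps the map close to the size of the live working set.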