[Feat] Add new setting num_of_buckets_per_alloc from HKV beta 12. #433
Description
What's new
Add a new setting, num_of_buckets_per_alloc, introduced in HKV beta 12. It may improve memory-access performance. It also cuts down the unnecessary BFC reallocation messages shown to the user on a CUDA OOM.
The setting tries to prevent billions of HKV buckets from each allocating a small piece of memory, which can make the BFC allocator re-chunk frequently.
In beta 11, more than 10,000 such messages might be printed. Now there are only about 1,000.
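To illustrate why grouping buckets helps, here is a hypothetical sketch (not HKV's actual allocator code) that only counts allocation requests. If buckets are carved out of blocks holding num_of_buckets_per_alloc buckets each, N buckets need ceil(N / num_of_buckets_per_alloc) requests instead of N. The helper names `allocation_requests` and `bucket_location`, the contiguous block layout, and the bucket count used in the demo are assumptions for illustration only.

```python
import math

BUCKET_BYTES = 2048 + 128  # size of one HKV bucket as quoted in this PR


def allocation_requests(num_buckets: int, buckets_per_alloc: int) -> int:
    """Count allocator calls when buckets are carved out of blocks of
    `buckets_per_alloc` buckets each (1 means one call per bucket).
    Hypothetical model, not HKV's implementation."""
    if num_buckets < 0 or buckets_per_alloc < 1:
        raise ValueError("need num_buckets >= 0 and buckets_per_alloc >= 1")
    return math.ceil(num_buckets / buckets_per_alloc)


def bucket_location(bucket_id: int, buckets_per_alloc: int) -> tuple[int, int]:
    """Hypothetical contiguous layout: return (block index, byte offset of
    the bucket inside that block)."""
    block, slot = divmod(bucket_id, buckets_per_alloc)
    return block, slot * BUCKET_BYTES


if __name__ == "__main__":
    n = 1_000_000  # illustrative bucket count, not a measured table size
    print(allocation_requests(n, 1))    # one request per bucket: 1000000
    print(allocation_requests(n, 512))  # grouped requests: 1954
    print(bucket_location(1000, 512))   # (1, 1061888)
```

Fewer, larger requests give the BFC allocator far fewer small chunks to split and merge, which is the effect the setting is aiming for.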
Why choose 512
According to https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/,
which states: "In our experiments, a memory page is set to be 2 MB, which is the largest page size at which GPU MMU can operate." and "128-byte aligned access ensures that the CPU-GPU link and system DRAM are used efficiently."
One bucket in HKV is 2048+128 bytes, so 2MB/(2048+128)B ≈ 963.76 buckets fit in one 2 MB page. Rounding down to the nearest power of two gives 512, which we choose as the default num_of_buckets_per_alloc.
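The arithmetic above can be checked with a short sketch. The page size, bucket size, and alignment come from the text; treating 512 as "the largest power of two not exceeding the buckets per page" is our reading of how the default was picked, and the helper `largest_power_of_two_at_most` is hypothetical.

```python
PAGE_BYTES = 2 * 1024 * 1024  # 2 MB, largest GPU MMU page size per the cited blog
BUCKET_BYTES = 2048 + 128     # size of one HKV bucket as stated in this PR
ALIGN_BYTES = 128             # access alignment recommended by the cited blog


def largest_power_of_two_at_most(x: float) -> int:
    """Return the largest power of two p with p <= x (assumes x >= 1)."""
    p = 1
    while p * 2 <= x:
        p *= 2
    return p


buckets_per_page = PAGE_BYTES / BUCKET_BYTES
num_of_buckets_per_alloc = largest_power_of_two_at_most(buckets_per_page)
bytes_per_alloc = num_of_buckets_per_alloc * BUCKET_BYTES

print(f"buckets per 2 MB page: {buckets_per_page:.2f}")       # 963.76
print(f"num_of_buckets_per_alloc: {num_of_buckets_per_alloc}")  # 512
print(f"bytes per allocation: {bytes_per_alloc}")               # 1114112
print(f"fits in one page: {bytes_per_alloc <= PAGE_BYTES}")     # True
print(f"bucket is 128-byte aligned: {BUCKET_BYTES % ALIGN_BYTES == 0}")  # True
```

Because 2176 = 17 × 128, every bucket in such a block starts on a 128-byte boundary (assuming the block itself is 128-byte aligned), and 512 buckets (about 1.06 MB) stay within a single 2 MB page.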
Otherwise, the BFC allocator would emit a large volume of messages like these:
Also [fix]: add the missing Bucketize class to the DE Keras Horovod demo.
Type of change
Checklist:
How Has This Been Tested?
Ran a large model with a large batch size using HKV.