[Feat] Add new setting num_of_buckets_per_alloc from HKV beta 12. #433

MoFHeka · 2024-06-14T19:42:37Z

Description

What's new

Add the new setting num_of_buckets_per_alloc from HKV beta 12. It may improve memory access performance. It also reduces the amount of unnecessary BFC allocator information printed to the user on a CUDA OOM.
The goal is to prevent billions of HKV buckets from each allocating a small piece of memory, which can make the BFC allocator re-chunk frequently.
With beta 11, an OOM could print more than 10,000 info lines. Now it prints only about 1,000.
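To illustrate why batching buckets cuts down the number of allocator chunks, here is a minimal sketch. It assumes each allocation request covers num_of_buckets_per_alloc buckets; the helper name and the bucket count are hypothetical and are not taken from HKV or from this PR.

```python
def allocation_count(total_buckets: int, buckets_per_alloc: int) -> int:
    """Number of allocation requests needed to back `total_buckets` buckets.

    Assumption: one request covers `buckets_per_alloc` buckets, and the last
    request may be only partially filled.
    """
    if total_buckets < 0 or buckets_per_alloc < 1:
        raise ValueError("total_buckets must be >= 0 and buckets_per_alloc >= 1")
    # Integer ceiling division, avoids floating-point rounding.
    return (total_buckets + buckets_per_alloc - 1) // buckets_per_alloc


# Hypothetical table size, chosen only for illustration.
hypothetical_buckets = 1_000_000

one_by_one = allocation_count(hypothetical_buckets, 1)
batched = allocation_count(hypothetical_buckets, 512)
print(one_by_one, batched)
```

Fewer, larger requests mean fewer chunks for the BFC allocator to track and to list when it dumps its state.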

Why choose 512

According to https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/,
"In our experiments, a memory page is set to be 2 MB, which is the largest page size at which GPU MMU can operate." and "128-byte aligned access ensures that the CPU-GPU link and system DRAM are used efficiently."
One bucket in HKV takes 2048+128 bytes, so a 2 MB page holds 2MB/(2048+128)B ≈ 963.76 buckets. Rounding down to a power of two gives 512, which we choose as the default num_of_buckets_per_alloc.
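The calculation above can be checked with a short sketch. The page size and bucket size come from the description. Rounding down to a power of two is an assumption about how 963.76 becomes 512, and the helper names are hypothetical.

```python
PAGE_BYTES = 2 * 1024 * 1024   # 2 MB page, per the NVIDIA blog post quoted above
BUCKET_BYTES = 2048 + 128      # one HKV bucket, per the PR description


def buckets_per_page(page_bytes: int = PAGE_BYTES,
                     bucket_bytes: int = BUCKET_BYTES) -> float:
    """How many buckets fit in one page (fractional)."""
    return page_bytes / bucket_bytes


def floor_power_of_two(n: int) -> int:
    """Largest power of two that is <= n (assumed rounding rule)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n.bit_length() - 1)


ratio = buckets_per_page()
default = floor_power_of_two(int(ratio))
print(f"{ratio:.2f}")             # 963.76
print(default)                    # 512
print(default * BUCKET_BYTES)     # 1114112 bytes per allocation
```

With 512 buckets, one allocation is 1,114,112 bytes and fits inside a single 2 MB page; the next power of two, 1024 buckets, would need 2,228,224 bytes and would not.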

On OOM, the BFC allocator prints a massive amount of information like this:

2024-06-04 00:47:52.356200: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d827a00 of size 2304 next 24266
2024-06-04 00:47:52.356203: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828300 of size 2304 next 24267
2024-06-04 00:47:52.356207: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828c00 of size 2304 next 24268
2024-06-04 00:47:52.356210: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829500 of size 2304 next 24269
2024-06-04 00:47:52.356214: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d829e00 of size 2304 next 24270
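To see how many of these lines belong to per-bucket allocations, a dump like the one above can be summarized with a small script. This is a minimal sketch that only assumes the "InUse at <hex address> of size <bytes>" shape visible in the lines above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "InUse at <hex address> of size <bytes>" part of a log line.
LINE_RE = re.compile(r"InUse at (?P<addr>[0-9a-f]+) of size (?P<size>\d+)")


def summarize_inuse(lines):
    """Count InUse chunks per size and collect the gaps between neighbours."""
    sizes = Counter()
    addrs = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            sizes[int(m["size"])] += 1
            addrs.append(int(m["addr"], 16))
    # Address difference between consecutive chunks, in bytes.
    strides = {b - a for a, b in zip(addrs, addrs[1:])}
    return sizes, strides


sample = [
    "2024-06-04 00:47:52.356200: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d827a00 of size 2304 next 24266",
    "2024-06-04 00:47:52.356203: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828300 of size 2304 next 24267",
    "2024-06-04 00:47:52.356207: I tensorflow/core/common_runtime/bfc_allocator.cc:1083] InUse at 7fad2d828c00 of size 2304 next 24268",
]
sizes, strides = summarize_inuse(sample)
print(sizes)    # Counter({2304: 3})
print(strides)  # {2304}
```

Each chunk is 2304 bytes and the chunks are packed back to back. That size is consistent with a 2048+128 = 2176 byte bucket rounded up to a 256-byte multiple, although the rounding rule is an assumption here and is not stated in the PR.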

Also [fix] the missing Bucketize class in the DE Keras Horovod demo.

Type of change

Checklist:

I've properly formatted my code according to the guidelines
- By running yapf
- By running clang-format
This PR addresses an already submitted issue for TensorFlow Recommenders-Addons
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works

How Has This Been Tested?

Ran a big model with a big batch size using HKV.

It may improve memory access performance. It also reduces the amount of unnecessary BFC allocator information printed to the user on a CUDA OOM.

rhdong · 2024-06-16T07:48:30Z

demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py

@@ -184,6 +184,25 @@ def embedding_out_split(embedding_out_concat, input_split_dims):
  return embedding_out


+class Bucketize(tf.keras.layers.Layer):


It looks unused

It is used here:

recommenders-addons/demo/dynamic_embedding/movielens-1m-keras-with-horovod/movielens-1m-keras-with-horovod.py

Line 375 in 743cc4b

input_tensor = Bucketize(

rhdong

LGTM

MoFHeka added 2 commits June 15, 2024 03:34

[feat] Add new setting num_of_buckets_per_alloc from HKV beta 12.

9a55bed

It may improve memory access performance. It also reduces the amount of unnecessary BFC allocator information printed to the user on a CUDA OOM.

[fix] Missing Bucketize class in DE keras horovod demo.

743cc4b

MoFHeka requested a review from rhdong as a code owner June 14, 2024 19:42

rhdong reviewed Jun 16, 2024

rhdong approved these changes Jun 17, 2024

rhdong merged commit abfb531 into tensorflow:master Jun 17, 2024
42 checks passed
