[Done] Memory Management: Buddy Allocator #2674
Conversation
  munlock(p, size);
  }
  free(p);
}

#ifndef PADDLE_ONLY_CPU

- void* GPUAllocator::Alloc(size_t size) {
+ void* GPUAllocator::Alloc(size_t& index, size_t size) {
In order to support fallback allocation when the standard allocation fails, we need the index parameter so that the buddy allocator knows which method to use to release the memory.
What is index? I vaguely remember that in Majel, the index here is the device ID, but in our design, we have a GPUAllocator instance for each GPU?
Because index denotes which system allocator has been used.
* Free will be added soon
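A minimal sketch of the idea, with hypothetical stand-in helpers (in the real code the two paths would be something like device memory first and a pinned-host fallback second): Alloc reports through index which path served the request, and Free dispatches on that stored index.

#include <cstddef>
#include <cstdlib>

// Stand-ins for the two allocation paths; the real allocator would use
// device memory first and a fallback (e.g. pinned host memory) second.
static void* PrimaryAlloc(size_t size) { return std::malloc(size); }
static void PrimaryFree(void* p) { std::free(p); }
static void* FallbackAlloc(size_t size) { return std::malloc(size); }
static void FallbackFree(void* p) { std::free(p); }

class GPUAllocatorSketch {
 public:
  // `index` is an output parameter recording which path served the
  // request, so the buddy allocator can release the block the same way.
  void* Alloc(size_t& index, size_t size) {
    if (void* p = PrimaryAlloc(size)) {
      index = 0;  // served by the standard path
      return p;
    }
    index = 1;  // standard allocation failed: use the fallback path
    return FallbackAlloc(size);
  }

  void Free(void* p, size_t index) {
    if (index == 0) {
      PrimaryFree(p);
    } else {
      FallbackFree(p);
    }
  }
};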
cache_(system_allocator->UseGpu()),
system_allocator_(std::move(system_allocator)) {}

BuddyAllocator::~BuddyAllocator() {
Are there multiple instances of BuddyAllocator in one trainer?
BuddyAllocator is a singleton; there is one instance per GPU/CPU device.
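As a sketch of what "one instance per device" can look like (a hypothetical accessor; the names are not from this PR): allocators are created lazily, one per device ID, and the same instance is always returned.

#include <mutex>
#include <unordered_map>

class BuddyAllocator { /* ... as in this PR ... */ };

// Hypothetical accessor: one lazily created BuddyAllocator per device.
// Here -1 denotes the CPU and 0..N-1 denote GPUs (an assumption).
BuddyAllocator* GetBuddyAllocator(int device_id) {
  static std::mutex mtx;
  static std::unordered_map<int, BuddyAllocator*> instances;
  std::lock_guard<std::mutex> guard(mtx);
  auto it = instances.find(device_id);
  if (it == instances.end()) {
    it = instances.emplace(device_id, new BuddyAllocator).first;
  }
  return it->second;
}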
// Allocate a new maximum sized block
size_t index = 0;
void* p = system_allocator_->Alloc(index, max_chunk_size_);
Should we allocate more than max_chunk_size_ if there's not enough in the pool_, so that allocated memory is contiguous and introduces fewer memory fragments? Or could max_chunk_size_ be something like 1G to do that?
Yes, we can allocate a chunk size bigger than max_chunk_size_, but it will not be managed by the buddy allocator. You can check this line: https://github.com/PaddlePaddle/Paddle/pull/2674/files#diff-dd894d330dd6a0deb01afe3fe24b1752R59
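The branch in question looks roughly like this (a simplified sketch, not a verbatim copy of the diff; AllocFromPool is a hypothetical placeholder for the normal buddy path):

#include <cstddef>

struct SystemAllocatorIface {
  void* Alloc(size_t& index, size_t size);
};

class BuddyAllocatorSketch {
 public:
  void* Alloc(size_t size) {
    // Requests larger than max_chunk_size_ bypass the buddy pool:
    // they go straight to the system allocator and are never split,
    // merged, or tracked in pool_.
    if (size > max_chunk_size_) {
      size_t index = 0;
      return system_allocator_->Alloc(index, size);
    }
    return AllocFromPool(size);  // hypothetical: normal buddy path
  }

 private:
  void* AllocFromPool(size_t size);
  size_t max_chunk_size_;
  SystemAllocatorIface* system_allocator_;
};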
Well, I mean:
- shall we set max_chunk_size_ >= 1G so that subsequent alloc ops will be faster, or
- shall we allocate 10 * max_chunk_size_ in RefillPool for performance?
There are two situations here.
For GPU, it's bad to specify max_chunk_size_ >= 1G or 10 * max_chunk_size_. It's better to set max_chunk_size_ according to the current device's resources.
size_t GpuMaxChunkSize() {
size_t total = 0;
size_t available = 0;
GpuMemoryUsage(available, total);
// Reserving the rest memory for page tables, etc.
size_t reserving = (1 - FLAGS_fraction_of_gpu_memory_to_use) * total;
// If available less than minimum chunk size, no usable memory exists.
available = std::max(available, GpuMinChunkSize()) - GpuMinChunkSize();
// If available less than reserving, no usable memory exists.
size_t usable = std::max(available, reserving) - reserving;
return usable;
}
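As a concrete example (the numbers are illustrative, not from the PR): suppose FLAGS_fraction_of_gpu_memory_to_use is 0.9 on a 12 GB card with 11 GB currently available. Then reserving = 0.1 * 12 GB ≈ 1.2 GB and usable = max(11 GB, 1.2 GB) - 1.2 GB ≈ 9.8 GB, so the maximum chunk size adapts to what the device can actually spare.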
For CPU, again, a too-large memory chunk should not be managed by the buddy allocator; it's one-time usage.
size_t CpuMaxAllocSize() {
// For distributed systems, it requires configuring and limiting
// the fraction of memory to use.
return FLAGS_fraction_of_cpu_memory_to_use * CpuTotalPhysicalMemory();
}
size_t CpuMinChunkSize() {
  // The minimum chunk size allowed is 256 bytes.
  return 1 << 8;
}
size_t CpuMaxChunkSize() {
  // The maximum chunk size allowed is roughly 3% of CPU memory.
  return CpuMaxAllocSize() / 32;
}
For a 16 GB node, 3% means roughly 500 MB, which I think is good enough. FLAGS_fraction_of_cpu_memory_to_use is meant to be exposed to Kubernetes.
Great explanation! I totally agree with you!
Maybe a minimum chunk size of 4K is best for performance, because the default Linux memory page size is 4K.
@typhoonzero That's a good idea. But 4K means 4096 bytes -> 1024 floats; if we frequently allocate small chunks, like 256, 128, 64, or 32 floats, each of them will be padded to 4K. Doesn't that waste memory?
Probably use 4K for CPU. For GPU, maybe a default of 4K is not a good idea.
Yes, only for CPU.
Do we need some unit test cases to check that the buddy allocator is correctly splitting and merging memory blocks?
TEST(BuddyAllocator, CPUMultAlloc) {
paddle::platform::CPUPlace cpu;
std::vector<void *> ps;
ps.reserve(8);
for (auto size : {256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304}) {
ps.emplace_back(paddle::memory::Alloc(cpu, size));
}
for (auto p : ps) {
paddle::memory::Free(cpu, p);
}
}
@gangliao I saw this test. I mean to check the correct buddy size after a split, and the merged size after a merge, to verify the allocator's internal behavior. I'm not sure whether this is needed, though.
@typhoonzero Yeah, I guess this information is already in here.
For instance, a split:
512 + 536870400 = 536870912
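Reading those logged numbers: a 536870912-byte (512 MB) chunk is split into a 512-byte block handed out for the request plus a 536870400-byte remainder that returns to the pool; a later merge recombines the pair into the original 536870912 bytes.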
It's hard to review this PR; maybe take a look at this page, which explains how it works.
The website looks great!
TEST(BuddyAllocator, CPUMultAlloc) {
paddle::platform::CPUPlace cpu;
std::unordered_map<void *, size_t> ps;
size_t total_size = paddle::memory::Used(cpu);
EXPECT_EQ(total_size, 0UL);
for (auto size :
{128, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304}) {
ps[paddle::memory::Alloc(cpu, size)] = size;
// Buddy Allocator doesn't manage too large memory chunk
if (paddle::memory::Used(cpu) == total_size) continue;
size_t aligned_size = align(size, cpu);
total_size += aligned_size;
// check memory block is allocated and split correctly
EXPECT_EQ(total_size, paddle::memory::Used(cpu));
}
for (auto p : ps) {
// check each memory address is aligned
EXPECT_EQ(is_aligned(p.first), true);
paddle::memory::Free(cpu, p.first);
// Buddy Allocator doesn't manage too large memory chunk
if (paddle::memory::Used(cpu) == total_size) continue;
size_t aligned_size = align(p.second, cpu);
total_size -= aligned_size;
// check memory block is free and merged correctly
EXPECT_EQ(total_size, paddle::memory::Used(cpu));
}
}

I updated the memory test as the above code snippet.
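For reference, here is a plausible sketch of what the align and is_aligned helpers used by the test do (the exact definitions in the test file may differ; in particular, whether per-block metadata is added before rounding is an assumption here):

#include <cstddef>
#include <cstdint>

// Round `size` up to the next multiple of `alignment` (a power of two);
// e.g. rounding 100 up with alignment 256 gives 256.
inline size_t AlignUpSketch(size_t size, size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// True when pointer `p` sits on an `alignment`-byte boundary.
inline bool IsAlignedSketch(const void* p, size_t alignment) {
  return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}

In the test above, the alignment would presumably come from the minimum chunk size (256 bytes for CPU, per CpuMinChunkSize).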
The code LGTM!
Anyway, I think we need another LGTM, since this PR is important and big.
This PR is blocking my other work, so I will merge it first; any comments are welcome.
Please start the review from here, @wangkuiyi @typhoonzero.