Memory grows without limit #4112

Closed
toktarev opened this issue Jul 11, 2018 · 189 comments
Labels
abandoned-or-aged-out, waiting (Waiting for a response from the issue creator.)

Comments

@toktarev

toktarev commented Jul 11, 2018

Note: Please use Issues only for bug reports. For questions, discussions, feature requests, etc. post to dev group: https://www.facebook.com/groups/rocksdb.dev

Expected behavior

Process consumes about 10 megabytes

Actual behavior

Memory grows without limit

Steps to reproduce the behavior

Run this code:

https://pastebin.com/Ch8RhsSB

Sorry, RocksDB team, but this is a huge problem.

This is a trivial test, and I expect it to work like a finite state machine:
populate memory, flush, re-use memory.

Instead, I see memory growing.

@toktarev
Author

BTW: Block cache is totally disabled

@toktarev
Author

toktarev commented Jul 11, 2018

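#include <thread>
#include <cassert>
#include <chrono>
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <iostream>
#include <cinttypes>
#include "rocksdb/db.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/statistics.h"
#include "rocksdb/table.h"
#include <db/memtable.h>
#include "memtablefactory.h"
#include <memtable/inlineskiplist.h>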

using namespace std;
using namespace rocksdb;
using namespace std::chrono;

std::string kDBPath = "/repos/rocksdata";

template<typename T>
T swap_endian(T u) {
    static_assert(CHAR_BIT == 8, "CHAR_BIT != 8");
    union {
        T u;
        unsigned char u8[sizeof(T)];
    } source, dest;

    source.u = u;

    for (size_t k = 0; k < sizeof(T); k++)
        dest.u8[k] = source.u8[sizeof(T) - k - 1];

    return dest.u;
}

rocksdb::TableFactory *makeDictionaryTableFactory() {
    auto block_opts = rocksdb::BlockBasedTableOptions{};
    block_opts.checksum = ChecksumType::kCRC32c;
    block_opts.no_block_cache = true;
    return rocksdb::NewBlockBasedTableFactory(block_opts);
}

int main() {
system("rm -rf /repos/rocksdata/*");

DB *db;
Options options;
// Optimize RocksDB. This is the easiest way to get RocksDB to perform well
//options.IncreaseParallelism();
//options.OptimizeLevelStyleCompaction();
// create the DB if it's not already present

options.create_if_missing = true;
options.db_write_buffer_size = 10 * 1024 * 1024;
options.compression = CompressionType::kNoCompression;
options.statistics = rocksdb::CreateDBStatistics();
options.write_buffer_size = 10 * 1024 * 1024;

// open DB
Status s = DB::Open(options, kDBPath, &db);

if (!s.ok()) {
    std::cout << s.ToString();
}

assert(s.ok());

ColumnFamilyOptions cf_options{};
cf_options.table_factory.reset(makeDictionaryTableFactory());
cf_options.prefix_extractor.reset(rocksdb::NewNoopTransform());
cf_options.memtable_prefix_bloom_size_ratio = 0;
cf_options.write_buffer_size = 10 * 1024 * 1024;

std::string name("Name");
ColumnFamilyHandle *cf;
Status status = db->CreateColumnFamily(cf_options, name, &cf);

assert(status.ok());

u_int64_t *buffer = new u_int64_t[4];
char *pointer = reinterpret_cast<char *>(buffer);
WriteBatch writeBatch{};
u_int64_t max = 10000000;

Slice key(pointer, 32);
Slice value(reinterpret_cast<char *>(&max), 8);

uint64_t begin = (uint64_t) std::chrono::duration_cast<std::chrono::nanoseconds>(
        system_clock::now().time_since_epoch()).count();

for (u_int64_t i = 0; i < 10000000000; i++) {
    *(buffer) = swap_endian(i);
    *(buffer + 1) = i + 1;
    *(buffer + 2) = i + 2;
    *(buffer + 3) = i + 3;

    writeBatch.Put(cf, key, value);

    if (i % 1000 == 0) {
        Status s1 = db->Write(WriteOptions(), &writeBatch);
        assert(s1.ok());
        writeBatch.Clear();
    }

    if (i % 1000000 == 0) {
        uint64_t end = (uint64_t) std::chrono::duration_cast<std::chrono::nanoseconds>(
                system_clock::now().time_since_epoch()).count();
        double time = (end - begin) / 1e9;  // elapsed seconds; floating-point division
        double delta = i / time;
        std::cout << "Speed=" << std::to_string(delta) << "\n\n";
    }
}

db->DestroyColumnFamilyHandle(cf);
delete db;
return 0;

}

@toktarev
Author

The allocation tracker shows a memory leak in:

rocksdb::BlockFetcher::ReadBlockContents

+0x00 pushq %rbp
+0x01 movq %rsp, %rbp
+0x04 pushq %r15
+0x06 pushq %r14
+0x08 pushq %r13
+0x0a pushq %r12
+0x0c pushq %rbx
+0x0d subq $392, %rsp
+0x14 movq %rsi, %rbx
+0x17 movq %rdi, %r14
+0x1a movq 160(%rbx), %rax
+0x21 movq 8(%rax), %rax
+0x25 movq %rax, 248(%rbx)
+0x2c movq %rbx, %rdi
+0x2f callq "rocksdb::BlockFetcher::TryGetUncompressBlockFromPersistentCache()"
+0x34 testb %al, %al
+0x36 je "rocksdb::BlockFetcher::ReadBlockContents()+0x46"
+0x38 vpxor %xmm0, %xmm0, %xmm0
+0x3c vmovdqu %xmm0, (%r14)
+0x41 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x841"
+0x46 movq 8(%rbx), %rdi
+0x4a testq %rdi, %rdi
+0x4d je "rocksdb::BlockFetcher::ReadBlockContents()+0xaa"
+0x4f movq 160(%rbx), %rax
+0x56 movq (%rax), %rsi
+0x59 movq 8(%rax), %rdx
+0x5d addq $5, %rdx
+0x61 leaq 224(%rbx), %rcx
+0x68 callq "rocksdb::FilePrefetchBuffer::TryReadFromCache(unsigned long long, unsigned long, rocksdb::Slice*) const"
+0x6d testb %al, %al
+0x6f je "rocksdb::BlockFetcher::ReadBlockContents()+0xaa"
+0x71 movq 160(%rbx), %rax
+0x78 movq 8(%rax), %rax
+0x7c movq %rax, 248(%rbx)
+0x83 movq %rbx, %rdi
+0x86 callq "rocksdb::BlockFetcher::CheckBlockChecksum()"
+0x8b movl 208(%rbx), %eax
+0x91 testl %eax, %eax
+0x93 jne "rocksdb::BlockFetcher::ReadBlockContents()+0xc1"
+0x95 movb $1, 5264(%rbx)
+0x9c movq 224(%rbx), %rax
+0xa3 movq %rax, 240(%rbx)
+0xaa cmpb $0, 5264(%rbx)
+0xb1 je "rocksdb::BlockFetcher::ReadBlockContents()+0xea"
+0xb3 movl 208(%rbx), %eax
+0xb9 testl %eax, %eax
+0xbb je "rocksdb::BlockFetcher::ReadBlockContents()+0x67f"
+0xc1 movl %eax, (%r14)
+0xc4 movl 212(%rbx), %eax
+0xca movl %eax, 4(%r14)
+0xce movq 216(%rbx), %rdi
+0xd5 xorl %eax, %eax
+0xd7 testq %rdi, %rdi
+0xda je "rocksdb::BlockFetcher::ReadBlockContents()+0xe1"
+0xdc callq "rocksdb::Status::CopyState(char const*)"
+0xe1 movq %rax, 8(%r14)
+0xe5 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x841"
+0xea movq %rbx, %rdi
+0xed callq "rocksdb::BlockFetcher::TryGetCompressedBlockFromPersistentCache()"
+0xf2 testb %al, %al
+0xf4 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x67f"
+0xfa movq 248(%rbx), %rdi
+0x101 addq $5, %rdi
+0x105 cmpb $0, 184(%rbx)
+0x10c je "rocksdb::BlockFetcher::ReadBlockContents()+0x120"
+0x10e cmpq $4999, %rdi
+0x115 ja "rocksdb::BlockFetcher::ReadBlockContents()+0x120"
+0x117 leaq 264(%rbx), %rax
+0x11e jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x144"
+0x120 callq "DYLD-STUB$$operator new[](unsigned long)"
+0x125 movq 256(%rbx), %rdi
+0x12c movq %rax, 256(%rbx)
+0x133 testq %rdi, %rdi
+0x136 je "rocksdb::BlockFetcher::ReadBlockContents()+0x144"
+0x138 callq "DYLD-STUB$$operator delete"
+0x13d movq 256(%rbx), %rax
+0x144 movq %rax, 240(%rbx)
+0x14b leaq 1138286(%rip), %rdi
+0x152 callq *(%rdi)
+0x154 movzbl (%rax), %eax
+0x157 movb $1, %r12b
+0x15a cmpl $3, %eax
+0x15d ja "rocksdb::BlockFetcher::ReadBlockContents()+0x171"
+0x15f movzbl %al, %eax
+0x162 cmpl $2, %eax
+0x165 seta %r12b
+0x169 xorl %r15d, %r15d
+0x16c cmpl $3, %eax
+0x16f jb "rocksdb::BlockFetcher::ReadBlockContents()+0x179"
+0x171 callq "rocksdb::Env::Default()"
+0x176 movq %rax, %r15
+0x179 xorl %eax, %eax
+0x17b testb %r12b, %r12b
+0x17e je "rocksdb::BlockFetcher::ReadBlockContents()+0x18f"
+0x180 movq (%r15), %rax
+0x183 movq 248(%rax), %rax
+0x18a movq %r15, %rdi
+0x18d callq *%rax
+0x18f movq %rax, -424(%rbp)
+0x196 movq (%rbx), %rsi
+0x199 movq 160(%rbx), %rax
+0x1a0 movq (%rax), %rdx
+0x1a3 movq 248(%rbx), %rcx
+0x1aa addq $5, %rcx
+0x1ae leaq 224(%rbx), %r8
+0x1b5 movq 240(%rbx), %r9
+0x1bc leaq -56(%rbp), %r13
+0x1c0 movq %r13, %rdi
+0x1c3 callq "rocksdb::RandomAccessFileReader::Read(unsigned long long, unsigned long, rocksdb::Slice*, char*) const"
+0x1c8 leaq 208(%rbx), %r12
+0x1cf cmpq %r13, %r12
+0x1d2 je "rocksdb::BlockFetcher::ReadBlockContents()+0x223"
+0x1d4 movq -56(%rbp), %rax
+0x1d8 movl %eax, 208(%rbx)
+0x1de movl $0, -56(%rbp)
+0x1e5 shrq $32, %rax
+0x1e9 movl %eax, 212(%rbx)
+0x1ef movl $0, -52(%rbp)
+0x1f6 movq 216(%rbx), %rdi
+0x1fd testq %rdi, %rdi
+0x200 movq -424(%rbp), %r13
+0x207 je "rocksdb::BlockFetcher::ReadBlockContents()+0x20e"
+0x209 callq "DYLD-STUB$$operator delete"
+0x20e movq -48(%rbp), %rax
+0x212 movq %rax, 216(%rbx)
+0x219 movq $0, -48(%rbp)
+0x221 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x238"
+0x223 movq -48(%rbp), %rdi
+0x227 testq %rdi, %rdi
+0x22a movq -424(%rbp), %r13
+0x231 je "rocksdb::BlockFetcher::ReadBlockContents()+0x238"
+0x233 callq "DYLD-STUB$$operator delete"
+0x238 testq %r13, %r13
+0x23b je "rocksdb::BlockFetcher::ReadBlockContents()+0x25f"
+0x23d movq (%r15), %rax
+0x240 movq 248(%rax), %rax
+0x247 movq %r15, %rdi
+0x24a callq *%rax
+0x24c movq %rax, %r15
+0x24f subq %r13, %r15
+0x252 leaq 1137999(%rip), %rdi
+0x259 callq *(%rdi)
+0x25b addq %r15, 32(%rax)
+0x25f leaq 1138010(%rip), %rdi
+0x266 callq *(%rdi)
+0x268 movzbl (%rax), %eax
+0x26b cmpl $1, %eax
+0x26e jbe "rocksdb::BlockFetcher::ReadBlockContents()+0x2a1"
+0x270 movq 248(%rbx), %r15
+0x277 addq $5, %r15
+0x27b leaq 1137958(%rip), %rdi
+0x282 callq *(%rdi)
+0x284 movl $1, %ecx
+0x289 vmovq %rcx, %xmm0
+0x28e vmovq %r15, %xmm1
+0x293 vpunpcklqdq %xmm1, %xmm0, %xmm0
+0x297 vpaddq 16(%rax), %xmm0, %xmm0
+0x29c vmovdqu %xmm0, 16(%rax)
+0x2a1 movl (%r12), %eax
+0x2a5 testl %eax, %eax
+0x2a7 je "rocksdb::BlockFetcher::ReadBlockContents()+0x2d2"
+0x2a9 movl %eax, (%r14)
+0x2ac movl 212(%rbx), %eax
+0x2b2 movl %eax, 4(%r14)
+0x2b6 movq 216(%rbx), %rdi
+0x2bd xorl %eax, %eax
+0x2bf testq %rdi, %rdi
+0x2c2 je "rocksdb::BlockFetcher::ReadBlockContents()+0xe1"
+0x2c8 callq "rocksdb::Status::CopyState(char const*)"
+0x2cd jmp "rocksdb::BlockFetcher::ReadBlockContents()+0xe1"
+0x2d2 movq 248(%rbx), %rax
+0x2d9 addq $5, %rax
+0x2dd cmpq %rax, 232(%rbx)
+0x2e4 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x323"
+0x2e6 movq %rbx, %rdi
+0x2e9 callq "rocksdb::BlockFetcher::CheckBlockChecksum()"
+0x2ee movl (%r12), %eax
+0x2f2 testl %eax, %eax
+0x2f4 je "rocksdb::BlockFetcher::ReadBlockContents()+0x637"
+0x2fa movl %eax, (%r14)
+0x2fd movl 212(%rbx), %eax
+0x303 movl %eax, 4(%r14)
+0x307 movq 216(%rbx), %rdi
+0x30e xorl %eax, %eax
+0x310 testq %rdi, %rdi
+0x313 je "rocksdb::BlockFetcher::ReadBlockContents()+0xe1"
+0x319 callq "rocksdb::Status::CopyState(char const*)"
+0x31e jmp "rocksdb::BlockFetcher::ReadBlockContents()+0xe1"
+0x323 movq (%rbx), %rsi
+0x326 addq $8, %rsi
+0x32a leaq -312(%rbp), %rdi
+0x331 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::basic_string(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&)"
+0x336 leaq 1004900(%rip), %rdx
+0x33d leaq -312(%rbp), %rdi
+0x344 xorl %esi, %esi
+0x346 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::insert(unsigned long, char const*)"
+0x34b movq 16(%rax), %rcx
+0x34f movq %rcx, -272(%rbp)
+0x356 vmovdqu (%rax), %xmm0
+0x35a vmovdqa %xmm0, -288(%rbp)
+0x362 vpxor %xmm0, %xmm0, %xmm0
+0x366 vmovdqu %xmm0, (%rax)
+0x36a movq $0, 16(%rax)
+0x372 leaq 1004639(%rip), %rsi
+0x379 leaq -288(%rbp), %rdi
+0x380 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*)"
+0x385 movq 16(%rax), %rcx
+0x389 movq %rcx, -240(%rbp)
+0x390 vmovdqu (%rax), %xmm0
+0x394 vmovdqa %xmm0, -256(%rbp)
+0x39c vpxor %xmm0, %xmm0, %xmm0
+0x3a0 vmovdqu %xmm0, (%rax)
+0x3a4 movq $0, 16(%rax)
+0x3ac movq 160(%rbx), %rax
+0x3b3 movq (%rax), %rsi
+0x3b6 leaq -336(%rbp), %rdi
+0x3bd callq "DYLD-STUB$$std::__1::to_string(unsigned long long)"
+0x3c2 movzbl -336(%rbp), %edx
+0x3c9 testb $1, %dl
+0x3cc jne "rocksdb::BlockFetcher::ReadBlockContents()+0x3da"
+0x3ce leaq -335(%rbp), %rsi
+0x3d5 shrq %rdx
+0x3d8 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x3e8"
+0x3da movq -328(%rbp), %rdx
+0x3e1 movq -320(%rbp), %rsi
+0x3e8 leaq -256(%rbp), %rdi
+0x3ef callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*, unsigned long)"
+0x3f4 movq 16(%rax), %rcx
+0x3f8 movq %rcx, -208(%rbp)
+0x3ff vmovdqu (%rax), %xmm0
+0x403 vmovdqa %xmm0, -224(%rbp)
+0x40b vpxor %xmm0, %xmm0, %xmm0
+0x40f vmovdqu %xmm0, (%rax)
+0x413 movq $0, 16(%rax)
+0x41b leaq 1004698(%rip), %rsi
+0x422 leaq -224(%rbp), %rdi
+0x429 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*)"
+0x42e movq 16(%rax), %rcx
+0x432 movq %rcx, -176(%rbp)
+0x439 vmovdqu (%rax), %xmm0
+0x43d vmovdqa %xmm0, -192(%rbp)
+0x445 vpxor %xmm0, %xmm0, %xmm0
+0x449 vmovdqu %xmm0, (%rax)
+0x44d movq $0, 16(%rax)
+0x455 movq 248(%rbx), %rsi
+0x45c addq $5, %rsi
+0x460 leaq -360(%rbp), %rdi
+0x467 callq "DYLD-STUB$$std::__1::to_string(unsigned long)"
+0x46c movzbl -360(%rbp), %edx
+0x473 testb $1, %dl
+0x476 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x484"
+0x478 leaq -359(%rbp), %rsi
+0x47f shrq %rdx
+0x482 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x492"
+0x484 movq -352(%rbp), %rdx
+0x48b movq -344(%rbp), %rsi
+0x492 leaq -192(%rbp), %rdi
+0x499 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*, unsigned long)"
+0x49e movq 16(%rax), %rcx
+0x4a2 movq %rcx, -144(%rbp)
+0x4a9 vmovdqu (%rax), %xmm0
+0x4ad vmovdqa %xmm0, -160(%rbp)
+0x4b5 vpxor %xmm0, %xmm0, %xmm0
+0x4b9 vmovdqu %xmm0, (%rax)
+0x4bd movq $0, 16(%rax)
+0x4c5 leaq 1004540(%rip), %rsi
+0x4cc leaq -160(%rbp), %rdi
+0x4d3 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*)"
+0x4d8 movq 16(%rax), %rcx
+0x4dc movq %rcx, -112(%rbp)
+0x4e0 vmovdqu (%rax), %xmm0
+0x4e4 vmovdqa %xmm0, -128(%rbp)
+0x4e9 vpxor %xmm0, %xmm0, %xmm0
+0x4ed vmovdqu %xmm0, (%rax)
+0x4f1 movq $0, 16(%rax)
+0x4f9 movq 232(%rbx), %rsi
+0x500 leaq -384(%rbp), %rdi
+0x507 callq "DYLD-STUB$$std::__1::to_string(unsigned long)"
+0x50c movzbl -384(%rbp), %edx
+0x513 testb $1, %dl
+0x516 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x524"
+0x518 leaq -383(%rbp), %rsi
+0x51f shrq %rdx
+0x522 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x532"
+0x524 movq -376(%rbp), %rdx
+0x52b movq -368(%rbp), %rsi
+0x532 leaq -128(%rbp), %rdi
+0x536 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::append(char const*, unsigned long)"
+0x53b movq 16(%rax), %rcx
+0x53f movq %rcx, -80(%rbp)
+0x543 vmovdqu (%rax), %xmm0
+0x547 vmovdqa %xmm0, -96(%rbp)
+0x54c vpxor %xmm0, %xmm0, %xmm0
+0x550 vmovdqu %xmm0, (%rax)
+0x554 movq $0, 16(%rax)
+0x55c movzbl -96(%rbp), %eax
+0x560 testb $1, %al
+0x562 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x571"
+0x564 leaq -95(%rbp), %rcx
+0x568 movq %rcx, -72(%rbp)
+0x56c shrq %rax
+0x56f jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x57d"
+0x571 movq -80(%rbp), %rax
+0x575 movq %rax, -72(%rbp)
+0x579 movq -88(%rbp), %rax
+0x57d movq %rax, -64(%rbp)
+0x581 leaq 977072(%rip), %rax
+0x588 movq %rax, -400(%rbp)
+0x58f movq $0, -392(%rbp)
+0x59a leaq -72(%rbp), %rcx
+0x59e leaq -400(%rbp), %r8
+0x5a5 movl $2, %esi
+0x5aa xorl %edx, %edx
+0x5ac movq %r14, %rdi
+0x5af callq "rocksdb::Status::Status(rocksdb::Status::Code, rocksdb::Status::SubCode, rocksdb::Slice const&, rocksdb::Slice const&)"
+0x5b4 leaq -96(%rbp), %rdi
+0x5b8 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5bd leaq -384(%rbp), %rdi
+0x5c4 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5c9 leaq -128(%rbp), %rdi
+0x5cd callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5d2 leaq -160(%rbp), %rdi
+0x5d9 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5de leaq -360(%rbp), %rdi
+0x5e5 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5ea leaq -192(%rbp), %rdi
+0x5f1 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x5f6 leaq -224(%rbp), %rdi
+0x5fd callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x602 leaq -336(%rbp), %rdi
+0x609 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x60e leaq -256(%rbp), %rdi
+0x615 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x61a leaq -288(%rbp), %rdi
+0x621 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x626 leaq -312(%rbp), %rdi
+0x62d callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x632 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x841"
+0x637 cmpb $0, 77(%rbx)
+0x63b je "rocksdb::BlockFetcher::ReadBlockContents()+0x67f"
+0x63d movq 200(%rbx), %rax
+0x644 movq 8(%rax), %rdi
+0x648 testq %rdi, %rdi
+0x64b je "rocksdb::BlockFetcher::ReadBlockContents()+0x67f"
+0x64d movq (%rdi), %rax
+0x650 movq 32(%rax), %rax
+0x654 callq *%rax
+0x656 testb %al, %al
+0x658 je "rocksdb::BlockFetcher::ReadBlockContents()+0x67f"
+0x65a movq 160(%rbx), %rsi
+0x661 movq 200(%rbx), %rdi
+0x668 movq 240(%rbx), %rdx
+0x66f movq 248(%rbx), %rcx
+0x676 addq $5, %rcx
+0x67a callq "rocksdb::PersistentCacheHelper::InsertRawPage(rocksdb::PersistentCacheOptions const&, rocksdb::BlockHandle const&, char const*, unsigned long)"
+0x67f leaq 1136954(%rip), %rdi
+0x686 callq *(%rdi)
+0x688 movzbl (%rax), %eax
+0x68b movb $1, %r13b
+0x68e cmpl $3, %eax
+0x691 ja "rocksdb::BlockFetcher::ReadBlockContents()+0x6a5"
+0x693 movzbl %al, %eax
+0x696 cmpl $2, %eax
+0x699 seta %r13b
+0x69d xorl %r15d, %r15d
+0x6a0 cmpl $3, %eax
+0x6a3 jb "rocksdb::BlockFetcher::ReadBlockContents()+0x6ad"
+0x6a5 callq "rocksdb::Env::Default()"
+0x6aa movq %rax, %r15
+0x6ad xorl %r12d, %r12d
+0x6b0 testb %r13b, %r13b
+0x6b3 je "rocksdb::BlockFetcher::ReadBlockContents()+0x6c7"
+0x6b5 movq (%r15), %rax
+0x6b8 movq 248(%rax), %rax
+0x6bf movq %r15, %rdi
+0x6c2 callq *%rax
+0x6c4 movq %rax, %r12
+0x6c7 movq 224(%rbx), %rsi
+0x6ce movq 248(%rbx), %rdx
+0x6d5 movb (%rsi,%rdx), %al
+0x6d8 movb %al, 5265(%rbx)
+0x6de testb %al, %al
+0x6e0 je "rocksdb::BlockFetcher::ReadBlockContents()+0x786"
+0x6e6 movb 184(%rbx), %al
+0x6ec testb %al, %al
+0x6ee je "rocksdb::BlockFetcher::ReadBlockContents()+0x786"
+0x6f4 movq 16(%rbx), %rax
+0x6f8 movq 168(%rbx), %rcx
+0x6ff movl (%rax), %r8d
+0x702 movq 192(%rbx), %r9
+0x709 movq 176(%rbx), %rax
+0x710 movq %rax, (%rsp)
+0x714 leaq -416(%rbp), %r13
+0x71b movq %r13, %rdi
+0x71e callq "rocksdb::UncompressBlockContents(char const*, unsigned long, rocksdb::BlockContents*, unsigned int, rocksdb::Slice const&, rocksdb::ImmutableCFOptions const&)"
+0x723 leaq 208(%rbx), %rax
+0x72a cmpq %r13, %rax
+0x72d je "rocksdb::BlockFetcher::ReadBlockContents()+0x790"
+0x72f movq -416(%rbp), %rax
+0x736 movl %eax, 208(%rbx)
+0x73c movl $0, -416(%rbp)
+0x746 shrq $32, %rax
+0x74a movl %eax, 212(%rbx)
+0x750 movl $0, -412(%rbp)
+0x75a movq 216(%rbx), %rdi
+0x761 testq %rdi, %rdi
+0x764 je "rocksdb::BlockFetcher::ReadBlockContents()+0x76b"
+0x766 callq "DYLD-STUB$$operator delete"
+0x76b movq -408(%rbp), %rax
+0x772 movq %rax, 216(%rbx)
+0x779 movq $0, -408(%rbp)
+0x784 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x7a1"
+0x786 movq %rbx, %rdi
+0x789 callq "rocksdb::BlockFetcher::GetBlockContents()"
+0x78e jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x7a1"
+0x790 movq -408(%rbp), %rdi
+0x797 testq %rdi, %rdi
+0x79a je "rocksdb::BlockFetcher::ReadBlockContents()+0x7a1"
+0x79c callq "DYLD-STUB$$operator delete"
+0x7a1 cmpl $0, 208(%rbx)
+0x7a8 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x7f0"
+0x7aa cmpb $0, 5264(%rbx)
+0x7b1 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x7f0"
+0x7b3 cmpb $0, 77(%rbx)
+0x7b7 je "rocksdb::BlockFetcher::ReadBlockContents()+0x7f0"
+0x7b9 movq 200(%rbx), %rax
+0x7c0 movq 8(%rax), %rdi
+0x7c4 testq %rdi, %rdi
+0x7c7 je "rocksdb::BlockFetcher::ReadBlockContents()+0x7f0"
+0x7c9 movq (%rdi), %rax
+0x7cc movq 32(%rax), %rax
+0x7d0 callq *%rax
+0x7d2 testb %al, %al
+0x7d4 jne "rocksdb::BlockFetcher::ReadBlockContents()+0x7f0"
+0x7d6 movq 200(%rbx), %rdi
+0x7dd movq 160(%rbx), %rsi
+0x7e4 movq 168(%rbx), %rdx
+0x7eb callq "rocksdb::PersistentCacheHelper::InsertUncompressedPage(rocksdb::PersistentCacheOptions const&, rocksdb::BlockHandle const&, rocksdb::BlockContents const&)"
+0x7f0 movl 208(%rbx), %eax
+0x7f6 movl %eax, (%r14)
+0x7f9 movl 212(%rbx), %eax
+0x7ff movl %eax, 4(%r14)
+0x803 movq 216(%rbx), %rdi
+0x80a xorl %eax, %eax
+0x80c testq %rdi, %rdi
+0x80f je "rocksdb::BlockFetcher::ReadBlockContents()+0x816"
+0x811 callq "rocksdb::Status::CopyState(char const*)"
+0x816 movq %rax, 8(%r14)
+0x81a testq %r12, %r12
+0x81d je "rocksdb::BlockFetcher::ReadBlockContents()+0x841"
+0x81f movq (%r15), %rax
+0x822 movq 248(%rax), %rax
+0x829 movq %r15, %rdi
+0x82c callq *%rax
+0x82e movq %rax, %rbx
+0x831 subq %r12, %rbx
+0x834 leaq 1136493(%rip), %rdi
+0x83b callq *(%rdi)
+0x83d addq %rbx, 48(%rax)
+0x841 movq %r14, %rax
+0x844 addq $392, %rsp
+0x84b popq %rbx
+0x84c popq %r12
+0x84e popq %r13
+0x850 popq %r14
+0x852 popq %r15
+0x854 popq %rbp
+0x855 retq
+0x856 movq %rax, %r13
+0x859 testq %r12, %r12
+0x85c je "rocksdb::BlockFetcher::ReadBlockContents()+0x8da"
+0x85e movq (%r15), %rax
+0x861 movq 248(%rax), %rax
+0x868 movq %r15, %rdi
+0x86b callq *%rax
+0x86d movq %rax, %rbx
+0x870 subq %r12, %rbx
+0x873 leaq 1136430(%rip), %rdi
+0x87a callq *(%rdi)
+0x87c addq %rbx, 48(%rax)
+0x880 movq %r13, %rdi
+0x883 callq "DYLD-STUB$$_Unwind_Resume"
+0x888 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x8d7"
+0x88a jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x8d7"
+0x88c movq %rax, %rdi
+0x88f callq "__clang_call_terminate"
+0x894 movq %rax, %r13
+0x897 cmpq $0, -424(%rbp)
+0x89f je "rocksdb::BlockFetcher::ReadBlockContents()+0x8da"
+0x8a1 movq (%r15), %rax
+0x8a4 movq 248(%rax), %rax
+0x8ab movq %r15, %rdi
+0x8ae callq *%rax
+0x8b0 movq %rax, %r14
+0x8b3 subq -424(%rbp), %r14
+0x8ba leaq 1136359(%rip), %rdi
+0x8c1 callq *(%rdi)
+0x8c3 addq %r14, 32(%rax)
+0x8c7 movq %r13, %rdi
+0x8ca callq "DYLD-STUB$$_Unwind_Resume"
+0x8cf movq %rax, %rdi
+0x8d2 callq "__clang_call_terminate"
+0x8d7 movq %rax, %r13
+0x8da movq %r13, %rdi
+0x8dd callq "DYLD-STUB$$_Unwind_Resume"
+0x8e2 movq %rax, %rdi
+0x8e5 callq "__clang_call_terminate"
+0x8ea movq %rax, %rdi
+0x8ed callq "__clang_call_terminate"
+0x8f2 movq %rax, %r13
+0x8f5 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x9a2"
+0x8fa movq %rax, %r13
+0x8fd jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x996"
+0x902 movq %rax, %r13
+0x905 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x98a"
+0x90a movq %rax, %r13
+0x90d jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x97e"
+0x90f movq %rax, %r13
+0x912 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x972"
+0x914 movq %rax, %r13
+0x917 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x966"
+0x919 movq %rax, %r13
+0x91c jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x95a"
+0x91e movq %rax, %r13
+0x921 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x94e"
+0x923 movq %rax, %r13
+0x926 jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x945"
+0x928 movq %rax, %r13
+0x92b jmp "rocksdb::BlockFetcher::ReadBlockContents()+0x939"
+0x92d movq %rax, %r13
+0x930 leaq -96(%rbp), %rdi
+0x934 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x939 leaq -384(%rbp), %rdi
+0x940 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x945 leaq -128(%rbp), %rdi
+0x949 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x94e leaq -160(%rbp), %rdi
+0x955 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x95a leaq -360(%rbp), %rdi
+0x961 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x966 leaq -192(%rbp), %rdi
+0x96d callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x972 leaq -224(%rbp), %rdi
+0x979 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x97e leaq -336(%rbp), %rdi
+0x985 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x98a leaq -256(%rbp), %rdi
+0x991 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x996 leaq -288(%rbp), %rdi
+0x99d callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x9a2 leaq -312(%rbp), %rdi
+0x9a9 callq "DYLD-STUB$$std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >::~basic_string()"
+0x9ae movq %r13, %rdi
+0x9b1 callq "DYLD-STUB$$_Unwind_Resume"

@siying
Contributor

siying commented Jul 11, 2018

Can you paste the full call stack of the allocation?

@toktarev
Author

No, this is all the information I have.

I am also trying to find the allocation in the code.

@toktarev
Author

void BlockFetcher::GetBlockContents() {
  if (slice_.data() != used_buf_) {
    // the slice content is not the buffer provided
    *contents_ = BlockContents(Slice(slice_.data(), block_size_), false,
                               compression_type);
  } else {
    // page is uncompressed, the buffer either stack or heap provided
    if (got_from_prefetch_buffer_ || used_buf_ == &stack_buf_[0]) {
      heap_buf_ = std::unique_ptr<char[]>(new char[block_size_]);  // <------- This is a leak
      memcpy(heap_buf_.get(), used_buf_, block_size_);
    }
    *contents_ = BlockContents(std::move(heap_buf_), block_size_, true,
                               compression_type);
  }
}

@toktarev
Author

I've found the reason for the leak.

It is not exactly a leak, but a lot of memory is allocated and never released:

const int table_cache_size = (mutable_db_options_.max_open_files == -1)
                                 ? TableCache::kInfiniteCapacity
                                 : mutable_db_options_.max_open_files - 10;
table_cache_ = NewLRUCache(table_cache_size,
                           immutable_db_options_.table_cache_numshardbits);

All allocated records are stored in this cache.
mutable_db_options_.max_open_files is equal to -1, so table_cache_size = 4M.

I set the size of the cache to zero, and memory no longer grows.
RocksDB stores TableReader objects in this cache.
Each TableReader keeps a Block object, and each Block holds about 350K of char data.
Because they all stay in the cache, the Blocks are never released,
and RocksDB just keeps adding new entries to the cache and allocating new blocks.
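
The behaviour described above can be illustrated with a small, self-contained model. This is only a sketch and not RocksDB code: `ToyTableCache` and `PinnedAfterFlushes` are hypothetical names, the value `0x400000` for `kInfiniteCapacity` is an assumption about `TableCache::kInfiniteCapacity`, and the toy cache counts entries (one per flushed file) rather than bytes, which is how the quoted snippet appears to size the table cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Assumption: mirrors TableCache::kInfiniteCapacity (a count of entries).
constexpr int kInfiniteCapacity = 0x400000;

// Same expression as the snippet quoted in the comment above.
int TableCacheCapacity(int max_open_files) {
  return max_open_files == -1 ? kInfiniteCapacity : max_open_files - 10;
}

// Toy LRU keyed by file number. Each entry stands in for a TableReader
// that keeps `bytes_per_reader` bytes of index block alive.
class ToyTableCache {
 public:
  ToyTableCache(std::size_t capacity, std::size_t bytes_per_reader)
      : capacity_(capacity), bytes_per_reader_(bytes_per_reader) {}

  void Open(std::uint64_t file_number) {
    auto it = index_.find(file_number);
    if (it != index_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // refresh recency
      return;
    }
    lru_.push_front(file_number);
    index_[file_number] = lru_.begin();
    while (lru_.size() > capacity_) {  // evict least recently used readers
      index_.erase(lru_.back());
      lru_.pop_back();
    }
  }

  std::size_t PinnedBytes() const { return lru_.size() * bytes_per_reader_; }

 private:
  std::size_t capacity_;
  std::size_t bytes_per_reader_;
  std::list<std::uint64_t> lru_;
  std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};

// Bytes still held by cached readers after `flushes` new files were opened.
std::size_t PinnedAfterFlushes(int max_open_files, int flushes,
                               std::size_t bytes_per_reader) {
  assert(flushes >= 0);
  const int cap = TableCacheCapacity(max_open_files);
  ToyTableCache cache(static_cast<std::size_t>(cap > 0 ? cap : 0),
                      bytes_per_reader);
  for (int f = 0; f < flushes; ++f) {
    cache.Open(static_cast<std::uint64_t>(f));
  }
  return cache.PinnedBytes();
}
```

With `max_open_files == -1` the toy cache never evicts within any realistic number of flushes, so the pinned bytes grow linearly with the number of files; with a small `max_open_files` the pinned bytes stay bounded at `(max_open_files - 10)` readers.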

@koldat
Contributor

koldat commented Jul 16, 2018

Hi @siying. I am having exactly the same issue, and it is a serious one: the cache simply keeps growing. The call stack, captured using jemalloc, is in the attachments:

[Attached images: "out1250", "leak"]

My configuration is:

2018/07/16-08:57:30.518138 7fa622ed4700            table_factory options:   flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x7fa61f618b58)
  cache_index_and_filter_blocks: 0
  cache_index_and_filter_blocks_with_high_priority: 0
  pin_l0_filter_and_index_blocks_in_cache: 0
  index_type: 0
  hash_index_allow_collision: 1
  checksum: 1
  no_block_cache: 0
  block_cache: 0x7fa5f9b21ce0
  block_cache_name: LRUCache
  block_cache_options:
    capacity : 268435456
    num_shard_bits : 4
    strict_capacity_limit : 0
    high_pri_pool_ratio: 0.000
  block_cache_compressed: (nil)
  persistent_cache: (nil)
  block_size: 4096
  block_size_deviation: 10
  block_restart_interval: 16
  index_block_restart_interval: 1
  metadata_block_size: 4096
  partition_filters: 0
  use_delta_encoding: 1
  filter_policy: nullptr
  whole_key_filtering: 1
  verify_compression: 0
  read_amp_bytes_per_bit: 0
  format_version: 2
  enable_index_compression: 1

Is there a workaround that does not require disabling the cache?

@koldat
Contributor

koldat commented Jul 16, 2018

There is this bugfix message:
5.14.1 (6/20/2018)
Fix block-based table reader pinning blocks throughout its lifetime, causing memory usage increase.

Is it related to this issue? I was not able to find any change related to block table between 5.14.0 and 5.14.1. What changeset fixes this bug?

@toktarev
Author

toktarev commented Jul 16, 2018

I ran the same benchmark on the latest RocksDB version
(git checkout tags/v5.14.2 -b b14).

I still see memory growing:

Bytes Used Count Symbol Name
188.32 MB 67.8% 36907 rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)
188.32 MB 67.8% 36900 rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)
188.15 MB 67.7% 34992 rocksdb::DBImpl::BackgroundCallFlush()
188.13 MB 67.7% 34876 rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*)
188.13 MB 67.7% 34876 rocksdb::DBImpl::FlushMemTableToOutputFile(rocksdb::ColumnFamilyData*, rocksdb::MutableCFOptions const&, bool*, rocksdb::JobContext*, rocksdb::LogBuffer*)
188.11 MB 67.7% 34865 rocksdb::FlushJob::Run(rocksdb::LogsWithPrepTracker*, rocksdb::FileMetaData*)
187.99 MB 67.7% 34818 rocksdb::FlushJob::WriteLevel0Table()
187.99 MB 67.7% 34815 rocksdb::BuildTable(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::TableCache*, rocksdb::InternalIterator*, std::__1::unique_ptr<rocksdb::InternalIterator, std::__1::default_deleterocksdb::InternalIterator >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::__1::vector<std::__1::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::__1::default_deleterocksdb::IntTblPropCollectorFactory >, std::__1::allocator<std::__1::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::__1::default_deleterocksdb::IntTblPropCollectorFactory > > > const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, std::__1::vector<unsigned long long, std::__1::allocator >, unsigned long long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long long, unsigned long long, rocksdb::Env::WriteLifeTimeHint)
187.99 MB 67.7% 34815 rocksdb::TableCache::NewIterator(rocksdb::ReadOptions const&, rocksdb::EnvOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::RangeDelAggregator*, rocksdb::TableReader**, rocksdb::HistogramImpl*, bool, rocksdb::Arena*, bool, int)
187.99 MB 67.7% 34815 rocksdb::TableCache::FindTable(rocksdb::EnvOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, rocksdb::Cache::Handle**, bool, bool, rocksdb::HistogramImpl*, bool, int, bool)
187.94 MB 67.6% 34182 rocksdb::TableCache::GetTableReader(rocksdb::EnvOptions const&, rocksdb::InternalKeyComparator const&, rocksdb::FileDescriptor const&, bool, unsigned long, bool, rocksdb::HistogramImpl*, std::__1::unique_ptr<rocksdb::TableReader, std::__1::default_deleterocksdb::TableReader >, bool, int, bool, bool)
187.82 MB 67.6% 31650 rocksdb::BlockBasedTableFactory::NewTableReader(rocksdb::TableReaderOptions const&, std::__1::unique_ptr<rocksdb::RandomAccessFileReader, std::__1::default_deleterocksdb::RandomAccessFileReader >&&, unsigned long long, std::__1::unique_ptr<rocksdb::TableReader, std::__1::default_deleterocksdb::TableReader >*, bool) const
187.82 MB 67.6% 31650 rocksdb::BlockBasedTable::Open(rocksdb::ImmutableCFOptions const&, rocksdb::EnvOptions const&, rocksdb::BlockBasedTableOptions const&, rocksdb::InternalKeyComparator const&, std::__1::unique_ptr<rocksdb::RandomAccessFileReader, std::__1::default_deleterocksdb::RandomAccessFileReader >&&, unsigned long long, std::__1::unique_ptr<rocksdb::TableReader, std::__1::default_deleterocksdb::TableReader >, bool, bool, int)
185.75 MB 66.8% 1899 rocksdb::BlockBasedTable::CreateIndexReader(rocksdb::FilePrefetchBuffer*, rocksdb::BlockBasedTable::IndexReader**, rocksdb::InternalIterator*, int)
185.75 MB 66.8% 1899 rocksdb::BinarySearchIndexReader::Create(rocksdb::RandomAccessFileReader*, rocksdb::FilePrefetchBuffer*, rocksdb::Footer const&, rocksdb::BlockHandle const&, rocksdb::ImmutableCFOptions const&, rocksdb::InternalKeyComparator const*, rocksdb::BlockBasedTable::IndexReader**, rocksdb::PersistentCacheOptions const&)
185.73 MB 66.8% 1266 rocksdb::(anonymous namespace)::ReadBlockFromFile(rocksdb::RandomAccessFileReader*, rocksdb::FilePrefetchBuffer*, rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, std::__1::unique_ptr<rocksdb::Block, std::__1::default_deleterocksdb::Block >*, rocksdb::ImmutableCFOptions const&, bool, rocksdb::Slice const&, rocksdb::PersistentCacheOptions const&, unsigned long long, unsigned long)
185.68 MB 66.8% 633 rocksdb::BlockFetcher::ReadBlockContents()
185.68 MB 66.8% 633 operator new(unsigned long)

@koldat
Contributor

koldat commented Jul 18, 2018

My solution was to follow the recommendation here: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters (I have a huge database and need a small memory footprint).

Table files are held in a table cache whose capacity is driven by max_open_files. With a big value, memory grew a lot over time, because my index is around 5MB per file (and I had 5k files of 200MB each). With a smaller value, query performance went down. A two-level index helps here, because its direct impact on the table cache is small and the index partitions are cached in the block cache, whose size can be controlled. I have also increased the block size, which made my index even smaller.

@toktarev your test does not set max_open_files, so the table cache is effectively unbounded (4M files). That is probably why memory just grows. Try setting max_open_files=50 (just as a test). I guess it will then stop growing after some time.
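
As a back-of-the-envelope check of the numbers above (about 5MB of index per file, about 5k files), the sketch below estimates how much index memory open table files could hold for a given max_open_files. It is a rough model and not RocksDB code: the function name is hypothetical, and it assumes every open file keeps its whole index resident outside the block cache and that -1 means "no limit".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Rough upper bound on the index memory held by open table files.
// Assumptions of this sketch: each open file keeps its full index in memory,
// and max_open_files == -1 means every file may stay open.
std::uint64_t estimate_pinned_index_bytes(std::uint64_t num_files,
                                          std::uint64_t index_bytes_per_file,
                                          std::int64_t max_open_files) {
    assert(max_open_files == -1 || max_open_files > 0);
    std::uint64_t open_files = num_files;
    if (max_open_files > 0) {
        open_files = std::min<std::uint64_t>(
            num_files, static_cast<std::uint64_t>(max_open_files));
    }
    return open_files * index_bytes_per_file;
}
```

Under these assumptions, 5000 files with 5MB of index each come to about 25GB when the limit is -1, and to about 250MB when max_open_files=50.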

@toktarev
Author

toktarev commented Jul 18, 2018

@koldat it is clear that max_open_files reduces the corresponding cache capacity.

But if I set it to a small value and add Get operations to the benchmark, I see considerable performance degradation after some time.

@koldat
Contributor

koldat commented Jul 18, 2018

@toktarev sure, you will see performance degradation. You have to choose how you want to achieve your result. In the ideal case everything fits in memory; in other cases you have to tune. What I am doing:

  1. I try to keep my table files big, but not huge (not multiplying the file size between levels).
  2. I always try to time-partition my data (prefix keys with a time partition such as the day number).
  3. I delete old data using delete files in range (because of 1 and 2 it works perfectly). Regular delete operations kill seek performance due to tombstones.
  4. Queries against partitioned data hit the caches better (the data is collocated).
  5. I have started to use a second-level index. With a big number of files you will eventually end up in a situation where the indexes no longer fit in memory. A two-level index helps here, and you can control the index cache size.

Can you please run the test (max_open_files=50) and confirm that it is not a memory leak? (Original issue)
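
The time partitioning from points 2 to 4 can be sketched as follows. This is only an illustration under assumptions: the key layout (a 4-byte big-endian day number followed by a record id) and both function names are hypothetical, not something the comment specifies. The point is that with a bytewise key order, all records of one day form a single contiguous key range that can be dropped as a whole.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Builds a key of the form [4-byte big-endian day number][record id].
// Big-endian encoding makes a bytewise comparison order keys by day first,
// so one day's records sit next to each other.
std::string make_time_partitioned_key(std::uint32_t day,
                                      const std::string& record_id) {
    std::string key;
    key.reserve(4 + record_id.size());
    for (int shift = 24; shift >= 0; shift -= 8) {
        key.push_back(static_cast<char>((day >> shift) & 0xFF));
    }
    key += record_id;
    return key;
}

// Half-open key range [begin, end) that covers one whole day.
struct KeyRange {
    std::string begin;
    std::string end;
};

KeyRange day_range(std::uint32_t day) {
    assert(day != UINT32_MAX);  // day + 1 must not overflow
    return KeyRange{make_time_partitioned_key(day, ""),
                    make_time_partitioned_key(day + 1, "")};
}
```

A range produced by day_range is what one would hand to a range-based delete once that day has expired.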

@toktarev
Author

toktarev commented Jul 18, 2018

@koldat sure, I'll test it a bit later (a bit busy right now).

I don't understand why RocksDB keeps file blocks in cache.
There is no great need for this.

I'd like to ask them to revise their memory management and free the memory allocated for blocks after each Flush or Read operation.

These are 360K cached blocks which just consume memory for nothing.

@koldat
Contributor

koldat commented Jul 18, 2018

@toktarev It makes a lot of sense to keep it in memory. Let me describe it with an example:

You are calling a GET operation:

  1. It checks the memtable. If the key is there, return.
  2. It checks all L0 files whose range covers the key (see the per-table steps below). If the key is there, return.
  3. It checks the L1 files in range. If the key is there, return.
  4. It checks the L2 files in range. If the key is there, return.
  5. etc.

Every table check for a GET has to do the following. Let me describe the worst-case scenario (nothing opened or cached).

  1. Open the file and read the index. The index describes all blocks in the file and is binary-searched. If it is not kept in memory, every operation slows down. That is the memory being consumed. Sometimes the index is big (5M), because more blocks make it bigger. If it had to be loaded for every operation, performance would suffer.
  2. It finds the correct block. This block is cached and deserialized. Then it does a local search within the block for the requested key.
  3. The value is returned.

Now imagine you have to do this for all table files that could possibly contain that key. That is a lot of operations.
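
The lookup order described above can be modelled with a small toy. This is a sketch only: ToyLsm, SortedRun and toy_get are hypothetical stand-ins, not RocksDB classes, and a std::map plays the role of both the memtable and a table file. It counts how many files have to be probed, which is the work that a resident index makes cheap.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One sorted run of key/value pairs: the memtable or a single table file.
typedef std::map<std::string, std::string> SortedRun;

struct ToyLsm {
    SortedRun memtable;
    // levels[0] holds the L0 files (newest first), levels[1] the L1 files, ...
    std::vector<std::vector<SortedRun> > levels;
};

struct GetResult {
    bool found;
    std::string value;
    std::size_t files_probed;  // files whose index and data had to be consulted
};

GetResult toy_get(const ToyLsm& db, const std::string& key) {
    GetResult result = {false, std::string(), 0};
    // Step 1: the memtable.
    SortedRun::const_iterator hit = db.memtable.find(key);
    if (hit != db.memtable.end()) {
        result.found = true;
        result.value = hit->second;
        return result;
    }
    // Steps 2..n: every level from the top down, only files whose range fits.
    for (std::size_t level = 0; level < db.levels.size(); ++level) {
        const std::vector<SortedRun>& files = db.levels[level];
        for (std::size_t f = 0; f < files.size(); ++f) {
            const SortedRun& file = files[f];
            if (file.empty() || key < file.begin()->first ||
                file.rbegin()->first < key) {
                continue;  // key lies outside this file's key range
            }
            ++result.files_probed;
            hit = file.find(key);
            if (hit != file.end()) {
                result.found = true;
                result.value = hit->second;
                return result;
            }
        }
    }
    return result;
}
```

In this model a key that lives deep in the tree is found only after several files have been probed, and each probe would need that file's index.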

The solution I used (a two-level index) seems brilliant (for my use case!). The difference is that only the first-level index is loaded into memory. It is much smaller, so you can keep a much higher number of files open. The second level (the index partitions) then goes through the block cache with its standard caching and eviction algorithms. That means that if you have enough memory for caching, it is used and lookups are fast. When you start to starve for memory, performance goes down.
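
A toy model of that two-level lookup, as a sketch under assumptions: ToyTwoLevelIndex is a hypothetical type, a std::map stands in for an index block, and each index entry maps the last key it covers to its target. Only top_level has to stay in memory; in a real store the partitions would be fetched through the block cache.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps "last key covered" to a target, so lower_bound(key) finds the first
// entry that can contain the key.
typedef std::map<std::string, std::size_t> IndexBlock;

struct ToyTwoLevelIndex {
    IndexBlock top_level;                // small, resident: last key -> partition
    std::vector<IndexBlock> partitions;  // stand-in for on-disk index partitions
    std::size_t partition_reads;         // how many partitions were consulted

    ToyTwoLevelIndex() : partition_reads(0) {}

    // Returns true and sets *data_block if some data block may hold the key.
    bool find_data_block(const std::string& key, std::size_t* data_block) {
        assert(data_block != NULL);
        IndexBlock::const_iterator part = top_level.lower_bound(key);
        if (part == top_level.end()) return false;  // past the last partition
        ++partition_reads;  // a real store would go through the block cache
        const IndexBlock& partition = partitions[part->second];
        IndexBlock::const_iterator block = partition.lower_bound(key);
        if (block == partition.end()) return false;
        *data_block = block->second;
        return true;
    }
};
```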

RocksDB is mostly about tuning. There is no single ideal way to use it. But if you use it correctly, there cannot be anything faster. On the other hand, incorrect usage can make it really bad. Personally, I do not see anything wrong with the memory management now that I have started to understand what it does, provided the test proves there is no leak.

Anyway kudos to authors!

@toktarev
Author

toktarev commented Jul 18, 2018

There is no leak.
There is just memory consumption.
@koldat
Thanks for the explanation.

But I still see at least one problem:

I can't use https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters
for hash-optimized MemTables (optimized for lookup); at least I didn't find out how to do it.

@toktarev
Author

My custom application loads a huge amount of data into RocksDB (about 70G).

After some time I see:

-rw-r--r-- 1 ubuntu ubuntu 118410341 Jul 18 15:19 000121.sst
-rw-r--r-- 1 ubuntu ubuntu 118412305 Jul 18 15:19 000122.sst
-rw-r--r-- 1 ubuntu ubuntu 118412022 Jul 18 15:19 000123.sst
-rw-r--r-- 1 ubuntu ubuntu 118410456 Jul 18 15:19 000124.sst
-rw-r--r-- 1 ubuntu ubuntu 118411486 Jul 18 15:19 000125.sst
-rw-r--r-- 1 ubuntu ubuntu 118413017 Jul 18 15:19 000126.sst
-rw-r--r-- 1 ubuntu ubuntu 118408698 Jul 18 15:19 000127.sst
-rw-r--r-- 1 ubuntu ubuntu 118413935 Jul 18 15:19 000128.sst
-rw-r--r-- 1 ubuntu ubuntu 118414146 Jul 18 15:20 000129.sst
-rw-r--r-- 1 ubuntu ubuntu 118412573 Jul 18 15:20 000130.sst
-rw-r--r-- 1 ubuntu ubuntu 118415362 Jul 18 15:20 000131.sst
-rw-r--r-- 1 ubuntu ubuntu 118410121 Jul 18 15:20 000132.sst
-rw-r--r-- 1 ubuntu ubuntu 118410383 Jul 18 15:20 000133.sst
-rw-r--r-- 1 ubuntu ubuntu 118414743 Jul 18 15:20 000134.sst
-rw-r--r-- 1 ubuntu ubuntu 118410111 Jul 18 15:20 000135.sst
-rw-r--r-- 1 ubuntu ubuntu 31155716 Jul 18 15:20 000136.sst
-rw-r--r-- 1 ubuntu ubuntu 133695647 Jul 18 15:20 000137.sst

And the process consumes more than 70G of RAM.

The corresponding column family is optimized for lookup, and Partitioned Filters and Index don't work there.

@toktarev
Author

toktarev commented Jul 18, 2018

I see only 17 files and 70G of RAM consumed.

@koldat
Contributor

koldat commented Jul 18, 2018

It is hard to say. Anyway, it looks strange, because the listed files are around 2GB in total. That would mean compression produces files 35 times smaller (strange). The memory use also correlates strangely with your data size. Are you sure that you do not have an issue (memory leak) in your application?
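
The arithmetic behind that estimate can be checked directly from the listing. The sketch below only sums the 17 file sizes shown earlier in the thread; the function names are hypothetical, and taking 70G as 70 * 10^9 bytes is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes in bytes of the 17 .sst files from the directory listing above.
static const std::uint64_t kSstSizes[] = {
    118410341ULL, 118412305ULL, 118412022ULL, 118410456ULL, 118411486ULL,
    118413017ULL, 118408698ULL, 118413935ULL, 118414146ULL, 118412573ULL,
    118415362ULL, 118410121ULL, 118410383ULL, 118414743ULL, 118410111ULL,
    31155716ULL,  133695647ULL};

std::uint64_t total_sst_bytes() {
    const std::size_t count = sizeof(kSstSizes) / sizeof(kSstSizes[0]);
    assert(count == 17);
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) total += kSstSizes[i];
    return total;
}

// How many times larger the reported memory use is than the data on disk.
double ram_to_disk_ratio(std::uint64_t ram_bytes) {
    return static_cast<double>(ram_bytes) /
           static_cast<double>(total_sst_bytes());
}
```

The files add up to a little over 1.9GB, so 70G of RAM is roughly 36 times the on-disk size, in line with the "35 times" above.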

@yuslepukhin
Contributor

Our profiling indicates there is a huge leak accumulating within a couple of days, with a process growing up to 60 GB a day. The process usually recycles. I am seeing multiple stacks that share a common theme of a BlockFetcher, trying to insert the block into the cache, etc. The stacks below occur during Get()/MultiGet(), at the end of compaction, etc. The first stack below is the most prolific, racking up many GBs. My suspicion is that it may have something to do with the recent LRU cache changes, though I am not sure. It does not happen in 5.6.1.

ntdll!RtlpAllocateHeap+1DE9 (d:\rs1\minkernel\ntos\rtl\heap.c, 6734)
ntdll!RtlpAllocateHeapInternal+727 (d:\rs1\minkernel\ntos\rtl\heap.c, 2021)
ucrtbase!_malloc_base+36 (d:\rs1\minkernel\crts\ucrt\src\appcrt\heap\malloc_base.cpp, 34)
RocksDBStore!operator new+31 (f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp, 19)
RocksDBStore!rocksdb::ZSTD_Uncompress+65 (d:\vsts\agent_1\_work\1\s\util\compression.h, 1044)
RocksDBStore!rocksdb::UncompressBlockContentsForCompressionType+27D (d:\vsts\agent_1\_work\1\s\table\format.cc, 354)
RocksDBStore!rocksdb::UncompressBlockContents+3B (d:\vsts\agent_1\_work\1\s\table\format.cc, 390)
RocksDBStore!rocksdb::BlockFetcher::ReadBlockContents+552 (d:\vsts\agent_1\_work\1\s\table\block_fetcher.cc, 230)
RocksDBStore!rocksdb::`anonymous namespace'::ReadBlockFromFile+12D (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 87)
RocksDBStore!rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache+3A9 (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 1748)
RocksDBStore!rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>+19D (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 1624)
RocksDBStore!rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter>::InitDataBlock+2DE (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 2103)
RocksDBStore!rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter>::FindKeyForward+D8 (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 2125)

ntdll!RtlpAllocateHeap+1DE9 (d:\rs1\minkernel\ntos\rtl\heap.c, 6734)
ntdll!RtlpAllocateHeapInternal+727 (d:\rs1\minkernel\ntos\rtl\heap.c, 2021)
ucrtbase!_malloc_base+36 (d:\rs1\minkernel\crts\ucrt\src\appcrt\heap\malloc_base.cpp, 34)
RocksDBStore!operator new+31 (f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp, 19)
RocksDBStore!rocksdb::BlockFetcher::GetBlockContents+A4 (d:\vsts\agent_1\_work\1\s\table\block_fetcher.cc, 172)
RocksDBStore!rocksdb::BlockFetcher::ReadBlockContents+5CA (d:\vsts\agent_1\_work\1\s\table\block_fetcher.cc, 237)
RocksDBStore!rocksdb::`anonymous namespace'::ReadBlockFromFile+12D (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 87)
RocksDBStore!rocksdb::BlockBasedTable::MaybeLoadDataBlockToCache+3A9 (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 1748)
RocksDBStore!rocksdb::BlockBasedTable::NewDataBlockIterator<rocksdb::DataBlockIter>+19D (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 1624)
RocksDBStore!rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter>::InitDataBlock+2DE (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 2103)
RocksDBStore!rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter>::FindKeyForward+D8 (d:\vsts\agent_1\_work\1\s\table\block_based_table_reader.cc, 2125)

@yuslepukhin
Contributor

@siying @ajkr This is real. There is something going on with cached blocks.

@siying
Contributor

siying commented Jul 23, 2018

@koldat which release or commit are you running on while seeing the leak?
@yuslepukhin same question

CC @maysamyabandeh. One of the previous heap profiles shows the partitioned index took most of the memory.

@siying
Contributor

siying commented Jul 23, 2018

@yuslepukhin Get()/MultiGet() is not supposed to go this path. Maybe it's the compaction path? Can you paste the whole stack so that we can figure out where it is from?

@yuslepukhin
Contributor

@siying Thanks for responding.
The commit I am using is de98fd8, however, the leak shows up in earlier commits such as 1f32dc7

There are multiple stacks that show in my profiling. Let me gather the top and I will post it here.

@maysamyabandeh Partitioned index also caught my attention.

@yuslepukhin
Contributor

This covers practically all of it.

TopStacksPosted.txt

@toktarev
Author

I confirm

RocksDBStore!rocksdb::UncompressBlockContentsForCompressionType+27D (d:\vsts\agent_1\_work\1\s\table\format.cc, 354)
RocksDBStore!rocksdb::UncompressBlockContents+3B (d:\vsts\agent_1\_work\1\s\table\format.cc, 390)
RocksDBStore!rocksdb::BlockFetcher::ReadBlockContents+552 (d:\vsts\agent_1\_work\1\s\table\block_fetcher.cc, 230)

This causes a lot of allocations and problems

@koldat
Contributor

koldat commented Jul 23, 2018

I am using the 5.12 branch with our JNI modifications. I can say that the two-level index may "hide" the leak, because the memory growth is not noticeable (so I thought it was incorrect usage). Most of the memory was taken by compaction, seek and seekPrev (see picture). Seek hits ReadBlockContents, as does compaction.

I guess that maybe index cache does not evict indices for files that are deleted.

@toktarev
Author

We switched all column families to a two-level index.

We still observe memory growth under intensive read operations.

@siying
Contributor

siying commented Jul 23, 2018

@koldat what's your block cache size? Data blocks from a deleted file are not necessarily being deleted immediately. Eventually they will be evicted based on LRU. If your actual block cache usage is larger than capacity, then it's a problem. Otherwise, it is still expected.

@siying
Contributor

siying commented Jul 23, 2018

For those who run on C++, or who can get the block cache size some other way: what are your readings of Cache::GetCapacity(), Cache::GetUsage() and Cache::GetPinnedUsage()?
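
One way to read those three numbers, as a hedged sketch: the first rule follows the comment above (usage at or below capacity is expected, usage above capacity is a problem), while the split into "pinned" and "evictable" cases is a heuristic of this sketch and not something stated in the thread. The enum and function names are hypothetical; the inputs are the values returned by the three Cache calls, in bytes.

```cpp
#include <cstdint>

enum CacheVerdict {
    kWithinCapacity,        // usage <= capacity: expected behaviour
    kOverCapacityPinned,    // over capacity, pinned entries alone fill the cache
    kOverCapacityEvictable  // over capacity although evictable entries remain
};

// capacity, usage and pinned are the readings of GetCapacity(), GetUsage()
// and GetPinnedUsage() respectively.
CacheVerdict judge_block_cache(std::uint64_t capacity,
                               std::uint64_t usage,
                               std::uint64_t pinned) {
    if (usage <= capacity) return kWithinCapacity;
    // Assumption: pinned entries cannot be evicted, so if they already reach
    // the capacity, the excess is explained by whoever holds them.
    if (pinned >= capacity) return kOverCapacityPinned;
    return kOverCapacityEvictable;
}
```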

@toktarev
Author

Actually, it is a very big surprise to me that a C++ application has such problems with memory.

I can understand a Java application dying from frequent GCs and so on...

But a C++ application written in Silicon Valley, hmmm...

@toktarev
Author

@gaojieliu
you should set MALLOC_ARENA_MAX=2 and check
--Do you mind to share the additional tuning of RocksDB configs?
I can't

Is it a commercial secret or you don't know for sure?
If it is a secret (why?) or anybody knows how to fix it please don't hesitate to contact me by mail

[email protected]

Hello Oleg.

I can't share all the steps we took (commercial secret).
But all the necessary pieces were described in this issue.

@toktarev
Author

@koldat
https://github.com/facebook/rocksdb/pull/7148/files

please take a look at this change.
It is really possible that RocksDB will just hang on a lock.

Of course it is not a big deal, but the quality of software = a set of well-written "not a big deal" pieces.

@koldat
Contributor

koldat commented Feb 11, 2021

I am sorry, but I do not know anything about your "lock" issue, and it is not related to this one, so I would not mix them. I was commenting on memory usage, which I also struggled with when I started (due to my lack of knowledge). As you have your own view on that, I am not able to help more here, as I am just a happy user of this library. If someone wants help with memory configuration, please create a new issue with fresh data. I am unsubscribing from this one.

@toktarev
Author

@koldat, OK, up to you.
---As you have your own view on that
I don't need any more help; as the author of this issue, I just keep receiving new complaints.

Good luck !

@olegloktev

Thank you guys for trying to help.
I've tried this config and, as far as I can see, memory keeps growing slowly as it did before:

use jemalloc and config like this:
cache_index_blocks=true
cache_index_and_filter_blocks_with_high_priority=true
pin_l0_filter_and_index_blocks_in_cache=false
partition_filters = true
index_type = rocksdb::BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch

my process's RSS holds at an almost stable size (CentOS 8, kernel 4.18.0)

but on CentOS 7 (kernel 3.10.0) memory still grows

I've also tried to play with these settings: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters#how-to-use-it
both with that configuration and with the cache settings set to false. I'm not sure if it helps; it looks like it doesn't.
Unfortunately, it takes a long time to see whether my changes have any impact on memory consumption, and in my case it is not easy to measure the data streams to compare memory usage.
I keep searching; any further information is appreciated.

@toktarev
Author

Oleg, did you try glibc + MALLOC_ARENA_MAX=2 ?

@olegloktev

Oleg, did you try glibc + MALLOC_ARENA_MAX=2 ?

Yes, I saw some decrease in memory consumption, but it didn't help in general.
After some experimenting I concluded that in my case the "issue" was caused by the relatively small DB size: RocksDB assumed the DB might grow significantly and didn't want to free memory. What helped me get stable memory consumption is OptimizeForSmallDb, which is recommended for DBs smaller than 1 GB, which is my case. But I still keep MALLOC_ARENA_MAX=2 too, because it helps keep memory consumption lower.

@linas

linas commented Apr 13, 2021

FYI, this appears to be a duplicate of #3216 (I posted some notes there) I can add that the memory leak appears to be linearly associated with a file descriptor leak. I can count file descriptors by saying lsof -p pid |grep sst |wc and watch that number go up. I can force compaction to occur by closing the DB and reopening, and reclosing. After this, lsof shows that most of these files have been deleted: that is, lsof -p pid |grep sst |grep deleted |wc is now large.

Resolved: See my own comment #4112 (comment) immediately below.

(In my case, the compacted size of my DB is 500 MBytes. Continuous editing of this DB will blow up RAM use to 200 GBytes in an hour or two. There appears to be about 50 MBytes of RAM use per sst file, which is the same as the average sst file size ... Running with the default ulimit -n 1024 will cause rdb to get an error on max open files at around 50 GBytes of RAM usage. Increasing to ulimit -n 4096 allows processing to continue, and RAM usage to grow. My best guess is that each sst file is memory mapped, and that, even upon compaction/deletion, this memory mapping remains and eats RAM. That is, this is NOT a malloc bug! This appears to be a memory-mapping, file-handle management bug. At least, to me. ... and it turned out to be my bug. This is associated with my failure to delete the iterator returned by NewIterator().)

@linas

linas commented Apr 13, 2021

Update: after this comment: #3216 (comment) I came to realize that I have made a complete newbie mistake in my c++ code: rocks iterators are NOT smart pointers that self-delete when they go out of scope. They must be explicitly deleted! Upon fixing this complete-newbie mistake, all my disk and RAM usage problems go away! Wow!

I suggest that anyone else reading this take a good close look at their iterators, and review <rocksdb/db.h> for anything else that needs an explicit delete.
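
The ownership pattern behind that fix can be shown with a stand-in type. This is a sketch, not RocksDB code: ToyIterator and new_toy_iterator are hypothetical and only mimic an API that returns a heap-allocated iterator the caller must delete. The live counter makes it visible when an instance is released.

```cpp
#include <memory>

// Stand-in for an iterator handed out as a raw owning pointer.
struct ToyIterator {
    static int live;  // number of instances currently alive
    ToyIterator() { ++live; }
    ~ToyIterator() { --live; }
    ToyIterator(const ToyIterator&) = delete;
    ToyIterator& operator=(const ToyIterator&) = delete;
};
int ToyIterator::live = 0;

// Mimics an API whose result must be deleted by the caller.
ToyIterator* new_toy_iterator() { return new ToyIterator(); }

// Wrapping the raw pointer right away means it cannot be forgotten.
int scan_with_unique_ptr() {
    std::unique_ptr<ToyIterator> it(new_toy_iterator());
    // ... iterate ...
    return ToyIterator::live;  // the iterator is still alive here
}   // unique_ptr deletes the iterator when the scope ends
```

With the real library the same idea would be to hold the pointer returned by NewIterator() in a std::unique_ptr<rocksdb::Iterator>, so it is released before the DB is closed.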

@czs007

czs007 commented Oct 8, 2021

hi guys, any update on this?

@ajkr
Contributor

ajkr commented Nov 18, 2021

hi guys, any update on this?

I am not sure what you are referring to specifically. We are working on limiting memory usage and updates on that will be in the release notes ("HISTORY.md").

@ajkr closed this as completed Nov 18, 2021
@toktarev
Author

Thank you Facebook team.

My complaints were not useless.

@cculianu
Contributor

cculianu commented Nov 18, 2021

Just FYI -- My application https://github.com/cculianu/fulcrum , which makes heavy use of rocksdb, runs great on Linux. But on macOS and on Windows it leaks memory. On Linux memory usage is rock solid at 800-900MB even if the process is up for a month. On Windows or macOS it grows and grows, so much that after a few days it's at 7-9 GB and still growing. I'm pretty sure the leak is inside rocksdb. The rest of the app is tight. Sadly, I lack the tools on macOS or Windows to actually run things like valgrind, etc.

So yes -- kindly do search for leaks on macOS and/or Windows. I suspect you have plenty.

@toktarev
Author

We had problems when running on Linux as well, but only after 1-2 weeks of stress testing.

@toktarev
Author

I suspect that the root cause is a bad interaction with the internals of the libc allocator, which caused problems with fragmentation.

At least the fix using MALLOC_ARENA_MAX points to this.

Please, correct me if I am wrong.
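
For a C++ process, the arena limit can also be requested from code instead of through the environment variable. The sketch below swaps in glibc's mallopt(M_ARENA_MAX, ...) for the MALLOC_ARENA_MAX=2 setting used in this thread; that substitution and the function name are assumptions of this example, and on other C libraries the function does nothing.

```cpp
#include <cstdlib>   // on glibc this also makes __GLIBC__ visible
#if defined(__GLIBC__)
#include <malloc.h>  // mallopt and M_ARENA_MAX are glibc-specific
#endif

#if defined(__GLIBC__)
const bool kHasGlibcMalloc = true;
#else
const bool kHasGlibcMalloc = false;
#endif

// Asks glibc malloc to use at most max_arenas arenas. Meant to be called
// early, before worker threads start allocating.
// Returns true if the allocator accepted the setting, false otherwise
// (including on platforms without glibc).
bool limit_malloc_arenas(int max_arenas) {
#if defined(__GLIBC__)
    return mallopt(M_ARENA_MAX, max_arenas) == 1;
#else
    (void)max_arenas;
    return false;
#endif
}
```

Fewer arenas mean fewer separate heaps that can each hold on to partly free memory, which fits the fragmentation hypothesis above, at the cost of more lock contention between threads.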

@cculianu
Contributor

cculianu commented Nov 18, 2021

Well, I also tried with and without jemalloc on both Windows and macOS. Note that macOS, I believe, has a superior allocator to the standard glibc one; I rarely see fragmentation issues on macOS.

I have no idea if it's fragmentation or something else... really I do not. Curious that on Linux it's fine.

@areyohrahul

Hi @toktarev, were you able to find any solution for this?

I'm using RocksJni (8.5.4) and I'm seeing constant growth in memory. I have checked everything mentioned here.

My DB size on disk is 5 GB. Block cache size is 10 GB with strict limits. Memtables should take not more than 1GB.
Max open files is 100, though my DB has only 70-80 files in total.

I tried changing the MALLOC_ARENA_MAX config. Then I tried using jemalloc / tcmalloc as well. But nothing worked.

I'm also closing all RocksObject instances in code. From my observations, the issue is in the read flow only.

I've been stuck on this for weeks now. Can someone please help?

@cculianu
Contributor

cculianu commented Oct 22, 2023

For what it’s worth I’ve had great success by using jemalloc along with the following ENV var to get jemalloc to not use thread-local caches for allocated RAM (which wastes memory since each thread has effectively its own heap)

MALLOC_CONF=tcache:false

@toktarev
Author

toktarev commented Oct 22, 2023

My DB size on disk is 5 GB. Block cache size is 10 GB with strict limits. Memtables should take not more than 1GB.

Hi @areyohrahul

I understand your pain and remember my own pain when I worked on this problem.
We closed this problem using the MALLOC_ARENA_MAX config.

But I don't know why it is not a solution for you.

I can give you two pieces of advice here:

  1. If you are a programmer, deepen your expertise (what happens during memory allocation and when it happens, from the OS point of view).
  2. If you are a manager, hire a smart programmer with deep systems expertise.

Otherwise you will just experience pain and be reading coffee grounds.

@areyohrahul

Hi @cculianu, how did you end up using jemalloc? Did you have to compile RocksDB again with some special flag? Or did you just change some setting for Java?

@cculianu
Contributor

cculianu commented Oct 24, 2023

Oh I'm a C++ guy.. I don't use Java. There is a way to "force" the java runtime process to use jemalloc though without its knowledge.. via some LD_PRELOAD magic. If you install jemalloc, you can use LD_PRELOAD to force it. See this guide:

https://github.com/jemalloc/jemalloc/wiki/Getting-Started

You can try that -- this assumes the java runtime uses malloc() internally, though, and doesn't do some crazy stuff like calling the OS's sbrk() system call itself..

@areyohrahul

Thanks @cculianu, this helps a lot. Will try this out.

@areyohrahul

Hey @toktarev, I tried changing the MALLOC_ARENA_MAX config but it didn't work. Can you share how you analysed the source of the problem and how exactly you changed it?

Also, do you use JNI or CPP?

@toktarev
Author

@areyohrahul

JNI (Java + CPP).

I remember that MALLOC_ARENA_MAX fixed memory growth in long runs.
