
Provide simple option for limiting total memory usage #1268

Closed
gonzojive opened this issue Mar 21, 2020 · 17 comments
Labels
area/documentation Documentation related issues. kind/enhancement Something could be better. status/accepted We accept to investigate or work on it.

Comments

@gonzojive

What version of Go are you using (go version)?

$ go version
go version go1.14 linux/amd64

What version of Badger are you using?

v1.6.0

Does this issue reproduce with the latest master?

As far as I know, yes.

What are the hardware specifications of the machine (RAM, OS, Disk)?

32 GB RAM
AMD Ryzen 9 3900X 12-core, 24-Thread
1 TB Samsung SSD

What did you do?

Used the default options to populate a table with about 1000 key/val pairs where each value is roughly 30MB.

The badger database directory is 101GB according to du. There are 84 .vlog files.

When I start my server up, it quickly consumes 10 GB of RAM and dies due to OOM. dmesg output:

[654397.093709] Out of memory: Killed process 15281 (taskserver) total-vm:20565228kB, anon-rss:12610116kB, file-rss:0kB, shmem-rss:0kB
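For reference, the write path was essentially the following (a minimal sketch; the path, keys, and value generation are illustrative, using the v1.6 API):

package main

import (
	"crypto/rand"
	"fmt"
	"log"

	"github.com/dgraph-io/badger"
)

func main() {
	// Open with default options, as in the report above.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-repro"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write ~1000 values of roughly 30 MB each.
	val := make([]byte, 30<<20)
	if _, err := rand.Read(val); err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 1000; i++ {
		key := []byte(fmt.Sprintf("key-%04d", i))
		if err := db.Update(func(txn *badger.Txn) error {
			return txn.Set(key, val)
		}); err != nil {
			log.Fatal(err)
		}
	}
}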

What did you expect to see?

I would expect the database to provide a simple option to limit memory usage to an approximate cap.

What did you see instead?

  1. The recommended mechanism of tweaking a many-dimensional parameter space is confusing and hasn't worked for me.

  2. The memory-related parameters are not explained in much detail. For example, the docstring for options.MemoryMap doesn't indicate roughly how expensive MemoryMap is compared to FileIO.

  3. I haven't managed to successfully reduce memory usage using the following parameters:

func opts(dbPath string) badger.Options {
	return badger.DefaultOptions(dbPath).
		WithValueLogLoadingMode(options.FileIO).
		WithTableLoadingMode(options.FileIO).
		WithNumMemtables(1)
}

I can create an example program if the issue is of interest.

@jarifibrahim jarifibrahim added the area/documentation Documentation related issues. label Mar 22, 2020
@Kleissner

Kleissner commented Mar 26, 2020

Agree. We use it for a simple key-value lookup with a couple of billion records (database directory is 700 GB).

It uses about 200 GB of RAM, which is unacceptable. The culprits are memory-mapped files, according to Process Explorer.
Good thing we have a lot of RAM, but there should be an easy, well-defined way to set a maximum memory limit.

@gonzojive
Author

I am attempting to restore from backup and running out of memory. Here's a memory profile:
[memory profile screenshot: memprofile25s]

latest options:

func badgerOpts(dbPath string) badger.Options {
	return badger.DefaultOptions(dbPath).
		WithValueLogLoadingMode(options.FileIO).
		WithTableLoadingMode(options.FileIO).
		WithNumMemtables(1).
		WithCompression(options.Snappy).
		WithKeepL0InMemory(false).WithLogger(&gLogger{})
}

@jarifibrahim
Contributor

@gonzojive I will have a look at the high memory usage. What is the size of your data directory and the size of the backup file?

@gonzojive
Author

gonzojive commented Mar 26, 2020

The backup file is 100GB (99,969,891,936 bytes)

In this case, backup.go's Load function seems to be a major offender. It does not account for the size of the values at all. Added logging shows huge key/value accumulation and no flushing:

I0326 09:16:43.884962    5195 taskstorage.go:147] not flushing with 1269 entries, 73.2K key size, 3.2G combined size, 9.6M limit

I'm guessing there are many places where value size is not accounted for when making memory management decisions.

@gonzojive
Author

gonzojive commented Mar 26, 2020

I modified backup.go to flush when accumulated key + value size exceeds 100 MB. I can send a pull request for this at some point.
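Roughly, the change looks like the sketch below (the threshold constant and helper names are illustrative, not the actual backup.go code):

// flushThresholdBytes is illustrative; the change hard-codes a similar value
// rather than exposing a configurable option.
const flushThresholdBytes = 100 << 20 // 100 MB

// loadAndFlush sketches the accumulate-then-flush loop: entries are buffered
// until their combined key+value size crosses the byte threshold, then the
// batch is written out and the counters reset.
func loadAndFlush(entries []entry, flush func([]entry) error) error {
	var batch []entry
	var batchBytes int
	for _, e := range entries {
		batch = append(batch, e)
		batchBytes += len(e.key) + len(e.value) // count values, not just keys
		if batchBytes >= flushThresholdBytes {
			if err := flush(batch); err != nil {
				return err
			}
			batch, batchBytes = nil, 0
		}
	}
	if len(batch) > 0 {
		return flush(batch)
	}
	return nil
}

type entry struct{ key, value []byte }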

Before backup.go modifications the process consumes memory until the OS kills it:

[screenshot from 2020-03-26 09-32-01: memory climbs until the OS kills the process]

After:
(it's basically flat at 21 GB), but I didn't manage to grab a screenshot because of Ubuntu/GNOME flakiness.

When I set the threshold to 500 MB instead of 100 MB, memory usage still causes a crash for some reason.

gonzojive added a commit to gonzojive/badger that referenced this issue Mar 27, 2020
The previous behavior only accounted for key size. For databases where keys are
small (e.g., URLs) and values are much larger (megabytes), OOM errors were easy
to trigger during restore operations.

This change does not set the threshold used for flushing elegantly: a const is
used instead of a configurable option.

Related to dgraph-io#1268, but not a full fix.
@stale

stale bot commented Apr 25, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Apr 25, 2020
@gonzojive
Author

Although I have a fix for the backup restoration issue, this issue as a whole has not been addressed.

I'm not aware of what causes badger to take up the amount of memory that it does. Understanding that seems like the first step towards introducing a flag for setting a fixed memory limit. Could someone from the badger team weigh in?

@stale stale bot removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Apr 25, 2020
@jarifibrahim
Contributor

I'm not aware of what causes badger to take up the amount of memory that it does. Understanding that seems like the first step towards introducing a flag for setting a fixed memory limit. Could someone from the badger team weigh in?

The amount of memory being used depends on your DB options. For instance, each table has a bloom filter and these bloom filters are kept in memory. Each bloom filter takes up 5 MB of memory. So if you have 100 GB of data, that means you have (100 * 1000 / 64) = 1562 tables, and 1562 * 5 MB is about 7.8 GB. So your bloom filters alone would take up 7.8 GB of memory. We have a separate cache in badger v2 to reduce the memory used by bloom filters.

Another thing that might affect memory usage is the table loading mode. If you set the table loading mode to FileIO, memory usage should drop, but your reads will be very slow.
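To make the arithmetic above concrete, here is the same estimate as a tiny helper (the ~64 MB-per-table and ~5 MB-per-bloom-filter figures are the assumptions stated above):

// bloomFilterMemoryMB estimates bloom-filter memory for a given data size,
// using the figures from the comment above: roughly one table per 64 MB of
// data and roughly 5 MB of in-memory bloom filter per table.
func bloomFilterMemoryMB(dataGB int) int {
	tables := dataGB * 1000 / 64 // 100 GB of data => ~1562 tables
	return tables * 5            // ~1562 tables * 5 MB => ~7810 MB (~7.8 GB)
}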

@stale

stale bot commented May 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label May 27, 2020
@jarifibrahim jarifibrahim added the status/accepted We accept to investigate or work on it. label May 27, 2020
@stale stale bot removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label May 27, 2020
@jarifibrahim jarifibrahim added the kind/enhancement Something could be better. label May 27, 2020
@gonzojive
Author

Perhaps something else to keep in mind when tracking down memory-hogging issues: the Go memory profile doesn't seem to capture the full extent of memory usage.

Here is a screenshot that shows the system's accounting (12.7 GB) vs Go's accounting (84.34 MB).
[screenshot: system vs. Go memory accounting]

@gonzojive
Author

gonzojive commented May 30, 2020

Here are the runtime.Memstats for a similar process to the screenshot above.

edit: It could be that the OS is not reclaiming memory freed by Go as discussed in this bug: golang/go#14521. However, I'm not sure how to confirm this. Badger also makes low-level system calls that might not be tracked by the above memory profiles (mmap).

# runtime.MemStats
# Alloc = 8994773352
# TotalAlloc = 142559328392
# Sys = 19054750096
# Lookups = 0
# Mallocs = 173259
# Frees = 161722
# HeapAlloc = 8994773352
# HeapSys = 18450841600
# HeapIdle = 9454788608
# HeapInuse = 8996052992
# HeapReleased = 3498221568
# HeapObjects = 11537
# Stack = 4063232 / 4063232
# MSpan = 135320 / 180224
# MCache = 8680 / 49152
# BuckHashSys = 1528578
# GCSys = 596795800
# OtherSys = 1291510
# NextGC = 4058104544
# LastGC = 1590858449129344940
# PauseNs = [720764 12044 12475 7990262 7809221 10238422 11080910 26676777 11483433 50231078 15615060 6761507 9593387 21339990 30935701 48671278 33504211 28017768 14602732 6955500 39479912 9759023 76460498 15589806 25668442 15236919 15399 8833874 12794857 58970453 10943138 13950377 16293396 32542175 8410993 13622020 12043529 32008031 12635226 15306547 27373405 13418150 23685828 68681901 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# PauseEnd = [1590858409431780297 1590858409480216799 1590858414430519349 1590858419259323466 1590858419275574116 1590858419335061131 1590858419470989711 1590858419820272840 1590858419912902151 1590858420513245024 1590858422213604632 1590858423000299952 1590858424082382764 1590858424291966196 1590858424705682993 1590858425123361202 1590858425954499123 1590858427235619883 1590858427997664669 1590858429465166747 1590858429700411478 1590858429977301044 1590858430767177012 1590858432254219118 1590858432726467896 1590858434046992645 1590858434430874744 1590858434750260245 1590858434836953495 1590858435420921176 1590858436777906262 1590858437837892960 1590858438434330473 1590858439224690949 1590858439618659050 1590858440508708633 1590858441142277899 1590858442053297406 1590858443890553903 1590858444739211063 1590858446473432441 1590858447066636466 1590858447807950895 1590858449129344940 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# NumGC = 44
# NumForcedGC = 7
# GCCPUFraction = 0.034290922233429534
# DebugGC = false

@gonzojive gonzojive changed the title Reducing memory usage is complicated Provide simple option for limiting total memory usage May 30, 2020
@gonzojive
Author

On the other hand, sometimes memory usage is quite high and there is a lot of allocation activity.

I can't find a way to force the OS to reclaim the memory freed by Go, which seems to use MADV_FREE on recent Linux versions (https://golang.org/src/runtime/mem_linux.go). It would be helpful to force the OS to reclaim such memory to get a more accurate picture of what's going on.

in use: [profile screenshot]

allocs: [profile screenshot]

@gonzojive
Author

gonzojive commented May 31, 2020

In my case, it would help if PrefetchValues had an option to restrict prefetches based on value byte size, not number of values. Perhaps IteratorOptions could become something like:

// IteratorOptions is used to set options when iterating over Badger key-value
// stores.
//
// This package provides DefaultIteratorOptions which contains options that
// should work for most applications. Consider using that as a starting point
// before customizing it for your own needs.
type IteratorOptions struct {
	// Indicates whether we should prefetch values during iteration and store them.
	PrefetchValues bool
	// How many KV pairs to prefetch while iterating. Valid only if PrefetchValues is true.
	PrefetchSize int
	// If non-zero, specifies the maximum number of bytes to prefetch while
	// prefetching iterator values. This will overrule the PrefetchSize option
	// if the values fetched exceed the configured value.
	PrefetchBytesSize int
	Reverse           bool // Direction of iteration. False is forward, true is backward.
	AllVersions       bool // Fetch all valid versions of the same key.

	// The following option is used to narrow down the SSTables that iterator picks up. If
	// Prefix is specified, only tables which could have this prefix are picked based on their range
	// of keys.
	Prefix      []byte // Only iterate over this given prefix.
	prefixIsKey bool   // If set, use the prefix for bloom filter lookup.

	InternalAccess bool // Used to allow internal access to badger keys.
}

Even better would be a database-wide object for restricting memory use to a strict cap.
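To illustrate the intent, usage might look something like this (hypothetical only; PrefetchBytesSize is the field proposed above and does not exist in the released badger API):

// Hypothetical sketch: PrefetchBytesSize is the proposed field from the
// struct above, not part of the released badger API.
func iterateWithByteCap(db *badger.DB) error {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = true
	opts.PrefetchSize = 100            // still capped by entry count...
	opts.PrefetchBytesSize = 256 << 20 // ...and, under the proposal, by total value bytes

	return db.View(func(txn *badger.Txn) error {
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			// process it.Item() here
		}
		return nil
	})
}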

@jarifibrahim
Contributor

jarifibrahim commented Jun 1, 2020

@gonzojive How big are your values? The memory profile you shared shows that y.Slice was holding 15 GB of data. That's unusual unless you have a big value.

I can't find a way to force the OS to reclaim the memory freed by Go, which seems to use MADV_FREE on recent Linux versions (https://golang.org/src/runtime/mem_linux.go). It would be helpful to force the OS to reclaim such memory to get a more accurate picture of what's going on.

debug.FreeOSMemory() (https://golang.org/pkg/runtime/debug/#FreeOSMemory) is what you're looking for.

From https://golang.org/pkg/runtime/,

    // HeapIdle minus HeapReleased estimates the amount of memory
    // that could be returned to the OS, but is being retained by
    // the runtime so it can grow the heap without requesting more
    // memory from the OS. If this difference is significantly
    // larger than the heap size, it indicates there was a recent
    // transient spike in live heap size.
    HeapIdle uint64

So HeapIdle - HeapReleased in your case is

>>> (9454788608-3498221568) >> 20
5680

which is about 5.6 GB. That's the amount of memory the Go runtime is holding on to.
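For completeness, a small sketch of reading those numbers at runtime and asking the runtime to hand memory back to the OS:

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	// Heap memory the Go runtime is retaining but could return to the OS.
	fmt.Printf("retained: %d MB\n", (ms.HeapIdle-ms.HeapReleased)>>20)

	// Ask the runtime to return as much memory to the OS as possible.
	debug.FreeOSMemory()

	runtime.ReadMemStats(&ms)
	fmt.Printf("after FreeOSMemory: %d MB retained\n", (ms.HeapIdle-ms.HeapReleased)>>20)
}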

@gonzojive
Author

In this case, the values are 25 MB or more. The memory usage was from prefetching 100 values for each request, and many requests are run in parallel. Limiting prefetching fixed the specific issue I was having, but the general feature request remains open.
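For anyone hitting the same problem, the workaround amounts to shrinking the prefetch window with the existing options, e.g. (numbers illustrative):

// With ~25 MB values, the default PrefetchSize of 100 can pin gigabytes per
// iterator across parallel requests; a small prefetch window bounds that.
func newSmallPrefetchIterator(txn *badger.Txn) *badger.Iterator {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = true
	opts.PrefetchSize = 4 // instead of the default 100; tune to value size
	return txn.NewIterator(opts)
}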

@jarifibrahim
Contributor

Ah, that makes sense. Thanks for debugging it, @gonzojive. The feature request still remains open.

@minhaj-shakeel

GitHub issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

