
[BUG]: badger v3 memory leak #1841

Closed
RyouZhang opened this issue Dec 15, 2022 · 5 comments
Assignees
Labels
kind/bug Something is broken.

Comments

@RyouZhang

RyouZhang commented Dec 15, 2022

What version of Badger are you using?

No response

What version of Go are you using?

go 1.19.4

Have you tried reproducing the issue with the latest release?

None

What is the hardware spec (RAM, CPU, OS)?

a container scheduled by k8s

8GB 4Core X86 Linux

What steps will reproduce the bug?

Run the program under "Additional information"; memory grows until the process OOMs.

Expected behavior and actual result.

No response

Additional information

package main

import (
	"fmt"
	"time"

	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/options"
	"github.com/google/uuid"
)

func main() {

	ch := make(chan bool)

	db, err := badger.Open(badger.DefaultOptions("/pv_data/ryou.zhang/temp/data").
		WithCompression(options.None).
		WithIndexCacheSize(256<<20).
		WithBlockCacheSize(0))
	if err != nil {
		panic(err)
	}
	defer db.Close()


	for i := 0; i < 100; i++ {
		go func() {
			for {
				key := uuid.NewString()
				raw := make([]byte, 1024*64)

				txn := db.NewTransaction(true)
				txn.SetEntry(badger.NewEntry([]byte(key), raw))
				txn.Commit()

				// txn := db.NewTransactionAt(uint64(time.Now().UnixNano()), true)
				// txn.SetEntry(badger.NewEntry([]byte(key), raw))
				// txn.Commit()
				<-time.After(1 * time.Millisecond)
			}
		}()
	}
	go func() {
		for {
			m := db.IndexCacheMetrics()
			fmt.Println("cost:", (m.CostAdded()-m.CostEvicted())/1024/1024,
				"MB item:", m.KeysAdded()-m.KeysEvicted())
			<-time.After(1000 * time.Millisecond)
		}
	}()
	<-ch
}

With the code above, v3 OOMs, but v2 is fine.

@RyouZhang RyouZhang added the kind/bug Something is broken. label Dec 15, 2022
@epolar

epolar commented Feb 22, 2023

Does v3 have a memory usage problem? On v3 I hit OOM, but v2 is fine.
Also, WithValueLogFileSize does not seem to work correctly.

@harshil-goel
Contributor

harshil-goel commented Apr 11, 2023

@RyouZhang Thanks a lot for the script. We have started to look into the issue now. I have a couple of questions:

  1. Could you explain your use case a bit?
  2. You are checking and setting index cache metrics, but the index cache only seems to be used when encryption is enabled. Are you running badger with encryption on? In that case, you would also need to set the block cache.
  3. Are you specifically worried about a memory leak in one of our caches or just badger in general?
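For reference, a minimal sketch of how the two caches are configured in badger v3. Option names are from the v3 API; the path and sizes are placeholders, and the comments reflect the point above (the index cache only matters with encryption):

```go
package main

import (
	badger "github.com/dgraph-io/badger/v3"
	"github.com/dgraph-io/badger/v3/options"
)

func main() {
	// Without encryption the index cache is unused; the block cache is
	// what bounds table-block memory (and is required when compression
	// or encryption is enabled). Sizes here are illustrative only.
	opts := badger.DefaultOptions("/tmp/badger-data").
		WithCompression(options.None).
		WithBlockCacheSize(256 << 20). // 256 MB block cache
		WithIndexCacheSize(100 << 20)  // consulted only with encryption on

	db, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```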

@Zach-Johnson
Contributor

@harshil-goel I believe I have a similar issue. I don't have a minimal reproducible example, but the behavior described here looks to be the same. My badger settings:

numMemTables: 5
numLevelZeroTables: 5
numLevelZeroTablesStall: 15
numCompactors: 2
maxTableSizeMB: 1
valueLogFileSizeMB: 1000
memTableSizeMB: 64
levelSizeMultiplier: 10
detectConflicts: false
compactL0OnClose: true

The application has a single-threaded writer; there is only ever one writer and one transaction at a time. I've observed memory usage grow slowly over the course of weeks, with occasional OOM kills by k8s once it hits ~64 GB. I've observed this both during periods when the overall size of the badger store is growing and when it's shrinking, so it does not seem to be tied to the total size of the store. Let me know if there is other info that might be useful.
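The settings above could be expressed with badger v3's options API roughly as follows. This is a sketch: option names are assumed from the v3 API, the path is a placeholder, and maxTableSizeMB is mapped to BaseTableSize as its closest v3 equivalent:

```go
package main

import (
	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	opts := badger.DefaultOptions("/path/to/data").
		WithNumMemtables(5).
		WithNumLevelZeroTables(5).
		WithNumLevelZeroTablesStall(15).
		WithNumCompactors(2).
		WithBaseTableSize(1 << 20).       // maxTableSizeMB: 1
		WithValueLogFileSize(1000 << 20). // valueLogFileSizeMB: 1000
		WithMemTableSize(64 << 20).       // memTableSizeMB: 64
		WithLevelSizeMultiplier(10).
		WithDetectConflicts(false).
		WithCompactL0OnClose(true)

	db, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```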

@harshil-goel
Contributor

harshil-goel commented May 8, 2023

Hey guys, I tried running v2 through v4. What happened is that we changed the default configurations going from v2 to v3/v4. The new defaults increase memory usage in exchange for better performance; this was done because dgraph uses badger's default configuration. I tried running v4 with the older settings and my machine didn't crash for about 3 hours straight (it crashes within 2 minutes without these configuration changes).
Configurations

  1. NumCompactors: 4 -> 2
  2. NumLevelZeroTablesStall: 15 -> 10
  3. SyncWrites: false -> true

NumCompactors defines how many compactions we run at any given time. Increasing it increases the number of compactions happening concurrently. This increases memory usage, but it moves data through badger faster, which allows badger to accept more writes quickly.

Level 0 stall limit: We allow a certain amount of data to accumulate in badger's L0 before writes stall. This data is "unclean" and needs to be compacted as soon as possible. Allowing more L0 tables permits more ingestion and more time for data to be moved out, but it also increases memory usage, since more data is allowed to stay inside badger rather than on disk.

SyncWrites: This flag makes sure the WAL file in badger is synced before a transaction is actually committed. It is a safety feature that protects data in case of kernel panics and sudden power outages: every write to badger is fully persisted before we commit. As you can imagine, this slows things down, which also gives badger enough time to move things out of memory onto disk.

All of these changes trade performance for lower memory usage. With the new configs I was getting around 6 GB of memory usage by badger on average. Please let me know if this makes sense. I will close out the ticket as there is no actual memory leak. Please try out these configurations; if they don't work for you, please re-open the ticket / tag me here.
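The three reverted defaults above could be applied like this. A minimal sketch, assuming the badger v3/v4 options API; the path is a placeholder:

```go
package main

import (
	badger "github.com/dgraph-io/badger/v3"
)

func main() {
	// Revert the three defaults discussed above to their v2-era values.
	opts := badger.DefaultOptions("/path/to/data").
		WithNumCompactors(2).            // default 4 in v3/v4
		WithNumLevelZeroTablesStall(10). // default 15 in v3/v4
		WithSyncWrites(true)             // default false in v3/v4

	db, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer db.Close()
}
```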

@mangalaman93
Contributor

This is the PR that changed NumCompactors: #1574

5 participants