
Potential memory leak in oracle #1238

Closed
popbones opened this issue Mar 3, 2020 · 8 comments
Labels
area/performance Performance related issues. kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. status/stale The issue hasn't had activity for a while and it's marked for closing.

Comments

@popbones

popbones commented Mar 3, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.4 darwin/amd64

What version of Badger are you using?

github.com/dgraph-io/badger v1.6.0

Does this issue reproduce with the latest master?

Not tested yet

What are the hardware specifications of the machine (RAM, OS, Disk)?

OS: macOS 10.15.2 19C57 x86_64
Host: MacBookPro15,1
Kernel: 19.2.0
Uptime: 22 hours, 36 mins
Packages: 177 (port)
Shell: fish 3.0.2
Resolution: 2880x1800, 5120x2880@2x
DE: Aqua
WM: Quartz Compositor
WM Theme: Blue (Dark)
Terminal: iTerm2
Terminal Font: SFMono-Regular 13
CPU: Intel i9-9880H (16) @ 2.30GHz
GPU: Intel UHD Graphics 630, Radeon Pro 560X
Memory: 11015MiB / 16384MiB



What did you do?

Basically, we have a command-line program that reads from a file and, for each line, does a check-and-set operation using Badger as our database.

The snippet for the check-and-set looks like this:

func (b *BadgerStore) get(ctx context.Context, ids []string) (map[string]uint64, error) {
	result := map[string]uint64{}
	notFound := make([]string, 0, len(ids))

	err := b.db.Update(func(txn *badger.Txn) error {
		valueByteBuff := make([]byte, 8)
		for _, id := range ids {
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}

			item, err := txn.Get([]byte(id))
			if err == badger.ErrKeyNotFound {
				notFound = append(notFound, id)
				continue
			}
			if err != nil {
				return err
			}
			value, err := item.ValueCopy(valueByteBuff)
			if err != nil {
				return err
			}
			i, err := parseUint64(value)
			if err != nil {
				return err
			}
			result[id] = i
		}

		if len(notFound) == 0 {
			return nil
		}

		for _, id := range notFound {
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
			}

			// The result of b.rando.Get() is not used in this snippet; it is
			// discarded here only so the example compiles.
			if _, err := b.rando.Get(); err != nil {
				return err
			}

			v := randUint64()

			if err := txn.Set([]byte(id), uint64ToBytes(v)); err != nil {
				return err
			}
			result[id] = v
		}

		return nil
	})
	return result, err
}

This is all fairly standard, but the program does things in batches and concurrently (say, 8 goroutines and 1000 keys per call). For large inputs, say >100M records, we find that memory usage keeps growing and we hit an OOM if swap is not enabled (tested on an EC2 box with about 30 GB of RAM).

The heap profile indicates that most of the in-use space is allocated in newCommitTs, https://github.com/dgraph-io/badger/blob/master/txn.go#L188 to be exact.
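For anyone trying to reproduce this, here is a minimal sketch of how such a heap profile can be captured with Go's standard net/http/pprof endpoint (the port is arbitrary and runImport is a hypothetical placeholder for the workload above):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Expose the profiling endpoints on a side port for the lifetime of the job.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	runImport() // hypothetical placeholder for the check-and-set workload
}

func runImport() {
	// ... read the input file and call BadgerStore.get in batches ...
}

While the job is running, go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap shows which call sites hold the in-use bytes; in our case the top entry is newCommitTs.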

What did you expect to see?

Memory should be released more promptly after the transaction has finished.

What did you see instead?

Memory continues to grow even after many transactions have finished.

My thoughts

It seems to me the commits map only gets released when there are no more references to the oracle, according to https://github.com/dgraph-io/badger/blob/master/txn.go#L86

In our case, because of the concurrency, that is rather unlikely to happen, so the commits map just hangs around far longer than it should.

Memory usage is much more stable when we run 1 goroutine with 8K keys per call.
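To illustrate the pattern I'm describing (this is a toy sketch, not Badger's actual oracle code): a refcounted structure whose map is only reset once no transaction holds a reference will, under constantly overlapping transactions, keep every entry resident.

package sketch

import "sync"

// toyOracle is NOT Badger's implementation; it only mimics the lifecycle the
// issue describes: committed fingerprints pile up in a map that is cleared
// only when the reference count drops to zero.
type toyOracle struct {
	mu      sync.Mutex
	refs    int
	commits map[uint64]uint64 // conflict fingerprint -> commit timestamp
}

func newToyOracle() *toyOracle {
	return &toyOracle{commits: make(map[uint64]uint64)}
}

func (o *toyOracle) addRef() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.refs++
}

func (o *toyOracle) decrRef() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.refs--
	if o.refs == 0 {
		// Only reached when zero transactions are in flight. With 8 goroutines
		// constantly opening new transactions this almost never happens, so
		// the map keeps growing.
		o.commits = make(map[uint64]uint64)
	}
}

func (o *toyOracle) recordCommit(fingerprint, commitTs uint64) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.commits[fingerprint] = commitTs
}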

@jarifibrahim jarifibrahim added area/performance Performance related issues. kind/enhancement Something could be better. priority/P2 Somehow important but would not block a release. labels Mar 4, 2020
@jarifibrahim
Contributor

Hey @popbones, I'm aware of this issue but I don't have a fix for it yet. @damz was kind enough to send a PR to fix this (#1198), but we haven't been able to merge it yet because I am supposed to make some changes to transactions which might affect his PR.

I will try to find some time and fix this.

@jarifibrahim
Contributor

@popbones if you're running 8 transactions in parallel, you will end up with multiple conflicts. I don't completely understand which part of your program is supposed to run in parallel, but if the parallelism is there to insert data into badger faster, you might want to use a single badger.WriteBatch, which will internally use multiple goroutines to speed up the insertions.

Also, it is expensive to create a transaction again and again. Every time a transaction is created, we wait for some (worst case all) of the preceding transactions to complete.
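A minimal sketch of the WriteBatch approach for write-only ingestion (the path, key format, and upstream import path github.com/dgraph-io/badger are illustrative; the exact API may differ slightly between Badger versions):

package main

import (
	"encoding/binary"
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-writebatch-demo"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// One WriteBatch for the whole ingestion: Badger batches and commits the
	// writes internally, so there is no per-call transaction bookkeeping and
	// no SSI conflict detection between your own writes.
	wb := db.NewWriteBatch()
	defer wb.Cancel()

	for i := uint64(0); i < 1000000; i++ {
		key := []byte(fmt.Sprintf("key-%d", i))
		val := make([]byte, 8)
		binary.BigEndian.PutUint64(val, i)
		if err := wb.Set(key, val); err != nil {
			log.Fatal(err)
		}
	}
	if err := wb.Flush(); err != nil { // Flush blocks until everything is committed
		log.Fatal(err)
	}
}

Note that a WriteBatch cannot read, so it only fits the pure-insert part of a workload; a check-and-set still needs a transaction.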

@popbones
Author

Thanks for the reply. The program performs the get function posted in the description concurrently. The get function basically tries to do a check-and-update operation on a batch of keys, so I'm not sure how to do the check part with badger.WriteBatch without explicit locking.

Could you please elaborate on the multiple conflicts part of your comment? I was under the impression that transactions are safe to use concurrently. Are they not?
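From what I can tell from the docs, commits of conflicting read-write transactions return badger.ErrConflict and the caller is expected to retry. A minimal retry wrapper, assuming the upstream import path github.com/dgraph-io/badger and an arbitrary attempt limit, might look like this:

package main

import (
	"errors"
	"log"

	badger "github.com/dgraph-io/badger"
)

// updateWithRetry re-runs fn when db.Update fails with ErrConflict, which is
// what overlapping read-modify-write transactions can return at commit time.
// The limit of 5 attempts is arbitrary.
func updateWithRetry(db *badger.DB, fn func(txn *badger.Txn) error) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		err = db.Update(fn)
		if !errors.Is(err, badger.ErrConflict) {
			return err // nil, or an error that retrying will not fix
		}
	}
	return err
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-retry-demo"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = updateWithRetry(db, func(txn *badger.Txn) error {
		// read-modify-write on a key, as in the get() snippet above
		return txn.Set([]byte("k"), []byte("v"))
	})
	if err != nil {
		log.Fatal(err)
	}
}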

@muXxer
Contributor

muXxer commented Mar 24, 2020

We also have the same problem: several batchWriters running in parallel.
RAM allocated in newCommitTs keeps growing until the app crashes with an OOM.

Any idea when #1198 will be merged?
Badger v2.0.1

@stale

stale bot commented Apr 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label Apr 23, 2020
@popbones
Author

Just trying to keep things tracked:

#1275 contains an updated version of PR #1198.

@stale stale bot removed the status/stale The issue hasn't had activity for a while and it's marked for closing. label Apr 23, 2020
@stale

stale bot commented May 23, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status/stale The issue hasn't had activity for a while and it's marked for closing. label May 23, 2020
@jarifibrahim
Contributor

Fixed via #1275
