
db ignores files handle limit (during shutdown) #21877

Closed

holiman opened this issue Nov 19, 2020 · 9 comments

@holiman (Contributor) commented Nov 19, 2020

This happens while shutting down the node, as the database is being closed:

archive@archive:~$ lsof -p 18496 > lsof.1
archive@archive:~$ lsof -p 18496 > lsof.2
archive@archive:~$ lsof -p 18496 > lsof.3
archive@archive:~$ wc -l lsof.1
298017 lsof.1
archive@archive:~$ wc -l lsof.2
321139 lsof.2
archive@archive:~$ wc -l lsof.3
1000227 lsof.3

The db keeps opening ldb files:

archive@archive:~$ cat lsof.1 | grep ".ldb" | wc -l
297967
archive@archive:~$ cat lsof.2 | grep ".ldb" | wc -l
321100
archive@archive:~$ cat lsof.3 | grep ".ldb" | wc -l
1000184

To clarify: despite geth's file handles having been limited to ~500K, the process has over 1M file handles open!

This is during shutdown, so there's no block processing happening. In fact there never was any, since I told it to shut down before it started importing blocks.

A tangent about networking

Eventually, this causes networking to fail, which incidentally
explains why ssh sessions get torn down (tracked in another ticket).

It also causes a "temporary error" in p2p to happen, which causes a spin-loop:

archive@archive:~$ cat dump2.log | grep "Temporary read error" | wc -l
12832728

I've added a change to fix the spin-loop effect (on https://github.com/holiman/go-ethereum/tree/archivefix, commit fd53776).

Back to the root problem

Essentially the problem is:

  • If the database is huge (archive node), closing the db seems to trigger
    something (compaction?) that does not respect the given limit on handles,
    and which eventually causes an OOM.

The actual number of ldb files is close to 3M:

archive@archive:~$ ls -l ~/geth_data/geth/chaindata/ | wc -l
2923415

I eventually killed it. Probably the relevant parts of the stack:

goroutine 40 [runnable]:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Put(0xc000136fc0, 0xcf1ce4f400, 0x11ef, 0x200a)
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:158 +0x184
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readRawBlock(0xcc32ce6f70, 0x0, 0x11ea, 0xdc91e48a1ce63501, 0x8, 0xc022323198, 0x713402, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:589 +0x551
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readBlock(0xcc32ce6f70, 0x0, 0x11ea, 0x1, 0x0, 0x71bb63, 0x12ed060)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:603 +0x55
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readBlockCached(0xcc32ce6f70, 0x0, 0x11ea, 0xccea270001, 0xccea278090, 0xc022323438, 0x7183f0, 0xccea278090, 0x44ac6a)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:648 +0x1fe
github.com/syndtr/goleveldb/leveldb/table.(*Reader).getDataIter(0xcc32ce6f70, 0x0, 0x11ea, 0x0, 0x1, 0x0, 0x3)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:765 +0x5b
github.com/syndtr/goleveldb/leveldb/table.(*Reader).getDataIterErr(0xcc32ce6f70, 0x0, 0x11ea, 0x0, 0x1, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:780 +0x160
github.com/syndtr/goleveldb/leveldb/table.(*indexIter).Get(0xcc32c76a20, 0xcc32d3e5a0, 0x709501)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:507 +0x2b0
github.com/syndtr/goleveldb/leveldb/iterator.(*indexedIterator).setData(0xcc32cd1080)
	github.com/syndtr/[email protected]/leveldb/iterator/indexed_iter.go:39 +0x41
github.com/syndtr/goleveldb/leveldb/iterator.(*indexedIterator).First(0xcc32cd1080, 0xcf1cee4060)
	github.com/syndtr/[email protected]/leveldb/iterator/indexed_iter.go:88 +0xfe
github.com/syndtr/goleveldb/leveldb/iterator.(*mergedIterator).First(0xca950b5a00, 0xc000197560)
	github.com/syndtr/[email protected]/leveldb/iterator/merged_iter.go:72 +0x8a
github.com/syndtr/goleveldb/leveldb/iterator.(*mergedIterator).Next(0xca950b5a00, 0x15ae420)
	github.com/syndtr/[email protected]/leveldb/iterator/merged_iter.go:157 +0x322
github.com/syndtr/goleveldb/leveldb.(*tableCompactionBuilder).run(0xc000234640, 0xc00019d170, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:442 +0x31a
github.com/syndtr/goleveldb/leveldb.(*DB).compactionTransact(0xc000282000, 0x1320774, 0xb, 0x15895a0, 0xc000234640)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:186 +0x181
github.com/syndtr/goleveldb/leveldb.(*DB).tableCompaction(0xc000282000, 0xc000160000, 0xc000120100)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:580 +0x63f
github.com/syndtr/goleveldb/leveldb.(*DB).tableAutoCompaction(0xc000282000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:644 +0x54
github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(0xc000282000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:863 +0x2ef
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:155 +0x582

goroutine 93 [select]:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).drain(0xc000137340)
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:209 +0x134
created by github.com/syndtr/goleveldb/leveldb/util.NewBufferPool
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:240 +0x176

goroutine 94 [select, 6 minutes]:
github.com/syndtr/goleveldb/leveldb.(*session).refLoop(0xc2a5d13ef0)
	github.com/syndtr/[email protected]/leveldb/session_util.go:189 +0x5f9
created by github.com/syndtr/goleveldb/leveldb.newSession
	github.com/syndtr/[email protected]/leveldb/session.go:93 +0x2b4

goroutine 95 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:91 +0xcd
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:148 +0x40c

goroutine 96 [select, 2 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).mpoolDrain(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_state.go:101 +0xf6
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:149 +0x42e

goroutine 97 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:836 +0x25d
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:155 +0x582

goroutine 98 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:773 +0x150
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:156 +0x5a4

goroutine 279 [semacquire, 24 minutes]:
sync.runtime_Semacquire(0xc000282188)
	runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc000282180)
	sync/waitgroup.go:130 +0x64
github.com/syndtr/goleveldb/leveldb.(*DB).Close(0xc000282000, 0xc2be3b3d80, 0xc2be1c6060)
	github.com/syndtr/[email protected]/leveldb/db.go:1175 +0x1ad
github.com/ethereum/go-ethereum/ethdb/leveldb.(*Database).Close(0xc000218100, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/ethdb/leveldb/leveldb.go:150 +0xb1
github.com/ethereum/go-ethereum/core/rawdb.(*freezerdb).Close(0xc0001f6420, 0xc0004db740, 0xc0001f6440)
	github.com/ethereum/go-ethereum@/core/rawdb/database.go:48 +0x8a
github.com/ethereum/go-ethereum/node.(*closeTrackingDB).Close(0xc0001f6440, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:606 +0x90
github.com/ethereum/go-ethereum/eth.(*Ethereum).Stop(0xc000136c40, 0xc2be1a1700, 0x206d800)
	github.com/ethereum/go-ethereum@/eth/backend.go:546 +0xcc
github.com/ethereum/go-ethereum/node.(*Node).stopServices(0xc0000a5040, 0xc292fd6d50, 0x1, 0x1, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:282 +0xbd
github.com/ethereum/go-ethereum/node.(*Node).Close(0xc0000a5040, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:204 +0x13f
created by github.com/ethereum/go-ethereum/cmd/utils.StartNode.func1
	github.com/ethereum/go-ethereum@/cmd/utils/cmd.go:76 +0x184
@holiman (Contributor, Author) commented Nov 19, 2020

cc @rjl493456442 any ideas?

@rjl493456442 (Member) commented:

Regarding the fdlimit, I'm not sure that Geth fails to respect the limit; rather, we hand it a very high number:

// makeDatabaseHandles raises out the number of allowed file handles per process
// for Geth and returns half of the allowance to assign to the database.
func makeDatabaseHandles() int {
	limit, err := fdlimit.Maximum()
	if err != nil {
		Fatalf("Failed to retrieve file descriptor allowance: %v", err)
	}
	raised, err := fdlimit.Raise(uint64(limit))
	if err != nil {
		Fatalf("Failed to raise file descriptor allowance: %v", err)
	}
	return int(raised / 2) // Leave half for networking and other stuff
}

@rjl493456442 (Member) commented:

Leveldb keeps handles open for all "referenced" sstables, so the reason we have 3M open files is simply that the database has 3M ldb files.

Should we set a hard cap on the open-file cache? The fd cache is really useful for reads, since it caches table metadata and avoids reopening the same files over and over. The downside is that during shutdown we then have to close all 3M files, which I guess takes a long time.

@rjl493456442 (Member) commented:

https://github.com/syndtr/goleveldb/blob/master/leveldb/db.go#L1175

Obviously, it's waiting for all the background tasks to finish (e.g. closing the fd cache).

@holiman (Contributor, Author) commented Nov 20, 2020

More info

WARN [11-19|11:37:53.616] Sanitizing cache to Go's GC limits       provided=16384 updated=10721
DEBUG[11-19|11:37:53.616] Sanitizing Go's GC trigger               percent=20
INFO [11-19|11:37:53.620] Maximum peer count                       ETH=50 LES=0 total=50
INFO [11-19|11:37:53.620] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
DEBUG[11-19|11:37:53.621] FS scan times                            list="76.036µs" set="3.839µs" diff="4.2µs"
INFO [11-19|11:37:53.623] Set global gas cap                       cap=25000000
INFO [11-19|11:37:53.623] Allocated trie memory caches             clean=5.23GiB dirty=0.00B
INFO [11-19|11:37:53.623] Allocated cache and file handles         database=/home/archive/geth_data/geth/chaindata cache=5.23GiB handles=524288

That message is printed in ethdb/leveldb/leveldb.go, so 524288 is what we pass to leveldb as the OpenFilesCacheCapacity, which is thus being ignored. That seems like a bug to me.
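For reference, the options geth hands to goleveldb look roughly like this (a paraphrased fragment of ethdb/leveldb/leveldb.go from that era, written from memory, so field values may not match the tree exactly):

```go
// Fragment, not a complete program: "handles" is the 524288 from the
// log line above, passed as OpenFilesCacheCapacity, so the cap is in
// fact communicated to leveldb.
db, err := leveldb.OpenFile(file, &opt.Options{
	OpenFilesCacheCapacity: handles,
	BlockCacheCapacity:     cache / 2 * opt.MiB,
	WriteBuffer:            cache / 4 * opt.MiB,
	Filter:                 filter.NewBloomFilter(10),
})
```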

@holiman (Contributor, Author) commented Nov 20, 2020

Some more info... The number of open file descriptors grows until the OS won't hand out any more. That doesn't cause leveldb to crash, but memory consumption grows to ~24GB.

The initial boot said:

WARN [11-20|10:00:56.198] Sanitizing cache to Go's GC limits       provided=16384 updated=10721

So we're apparently at 2.4x the memory that we gave it

Open ldb files:
1048548
MB memory used:
22685
Open ldb files:
1048548
MB memory used:
23677
Open ldb files:
1048548
MB memory used:
24568
Open ldb files:
1048548
MB memory used:
24887
Open ldb files:
1048548
MB memory used:
24919
Open ldb files:
1048548
MB memory used:
24831

script:

pid=6861
while :
do
  echo "Open ldb files:"
  lsof -p $pid | grep ldb | wc -l
  echo "MB memory used:"
  ps -o rss= $pid | awk '{printf "%.0f\n", $1 / 1024}'
  sleep 10  # avoid hammering lsof in a tight loop
done

Again: it has not imported a single block, just start + stop, and the shutdown sequence has now been running for at least 15 minutes.

@holiman (Contributor, Author) commented Nov 20, 2020

Shutdown finished after 18m (not OOM-reaped this time).

So the next question might be whether we can prevent the memory blowup during compaction?

@holiman (Contributor, Author) commented Nov 20, 2020

On the archive node:

archive@archive:~$ iostat -d -x
Linux 5.4.0-53-generic (archive) 	11/20/20 	_x86_64_	(4 CPU)

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
fd0              0.00      0.00     0.00   0.00 12361.80     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop0            0.00      0.01     0.00   0.00    0.32     9.04    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.02      0.03     0.00   0.00    0.41     1.38    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.03     0.00   0.00    0.73    21.57    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.03     0.00   0.00    0.53    18.06    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop4            0.00      0.01     0.00   0.00    0.50     9.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop5            0.00      0.01     0.00   0.00    1.34     4.95    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop6            0.00      0.00     0.00   0.00    0.00     1.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
md0            124.89   1135.50     0.00   0.00    0.00     9.09   27.65    139.54     0.00   0.00    0.00     5.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              6.34     60.88     1.71  21.22    1.00     9.60    0.69     22.05     4.90  87.73    8.82    32.15    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.10
sdb             10.11     73.28     1.71  14.46    0.69     7.25    0.65     21.64     4.84  88.13    9.02    33.22    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.48
sdc             42.11    410.92     1.89   4.30    0.33     9.76    0.73     22.66     5.02  87.36    9.03    31.22    0.00      0.00     0.00   0.00    0.00     0.00    0.01   3.17
sdd              4.04     43.34     1.87  31.70    1.53    10.74    0.71     22.58     5.01  87.56    9.13    31.72    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.86
sde             40.32    399.20     1.68   4.00    0.32     9.90    0.65     22.20     4.98  88.49    8.88    34.26    0.00      0.00     0.00   0.00    0.00     0.00    0.01   3.12
sdf              6.43     63.07     1.81  21.99    1.00     9.81    0.69     22.59     5.03  87.88    8.92    32.56    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.13
sdg             10.11     73.43     1.68  14.26    0.70     7.27    0.65     22.09     4.95  88.39    9.06    33.97    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.50
sdh              3.87     41.22     1.71  30.66    1.71    10.66    0.68     22.22     4.96  88.00    8.92    32.86    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.88

Compared with #21648 (comment):

On our AWS node, rkB/s is 8712:

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1        735.15   82.32   8712.76   8506.39     3.10    53.63   0.42  39.45    0.14    1.64   0.08    11.85   103.34   0.39  31.69

On the archive node, rkB/s is 1135:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md0            124.89   1135.50     0.00   0.00    0.00     9.09   27.65    139.54     0.00   0.00    0.00     5.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00

@karalabe removed the area:db label May 3, 2023
@holiman (Contributor, Author) commented Oct 11, 2023

Closing this; leveldb is on the way out and no longer the default

@holiman closed this as completed Oct 11, 2023