
db ignores files handle limit (during shutdown) #21877

Closed

holiman opened this issue Nov 19, 2020 · 9 comments

@holiman (Contributor) commented Nov 19, 2020

This happens while shutting down the node, as the database is being closed:

archive@archive:~$ lsof -p 18496 > lsof.1
archive@archive:~$ lsof -p 18496 > lsof.2
archive@archive:~$ lsof -p 18496 > lsof.3
archive@archive:~$ wc -l lsof.1
298017 lsof.1
archive@archive:~$ wc -l lsof.2
321139 lsof.2
archive@archive:~$ wc -l lsof.3
1000227 lsof.3

The db keeps opening ldb files:

archive@archive:~$ cat lsof.1 | grep ".ldb" | wc -l
297967
archive@archive:~$ cat lsof.2 | grep ".ldb" | wc -l
321100
archive@archive:~$ cat lsof.3 | grep ".ldb" | wc -l
1000184

To clarify: despite geth's file handles having been limited to ~500K, the process has over 1M file handles open!

This is during shutdown, so there's no block processing happening. In fact there never was any, since I told it to shut down before it started importing blocks.

A tangent about networking

Eventually, this causes networking to fail, which incidentally
explains why ssh sessions get torn down (tracked in another ticket).

It also causes a "temporary error" in p2p to happen, which causes a spin-loop:

archive@archive:~$ cat dump2.log | grep "Temporary read error" | wc -l
12832728

I've added a change to fix the spin-loop effect (on https://github.com/holiman/go-ethereum/tree/archivefix, commit fd53776).

Back to the root problem

Essentially the problem is:

  • If the database is huge (archive node), closing the db seems to trigger
    something (compaction?) that does not respect the given limit on handles,
    and which eventually causes an OOM.

The actual number of ldb files is close to 3M:

archive@archive:~$ ls -l ~/geth_data/geth/chaindata/ | wc -l
2923415

I eventually killed it. Probably the relevant parts of the stack:

goroutine 40 [runnable]:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).Put(0xc000136fc0, 0xcf1ce4f400, 0x11ef, 0x200a)
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:158 +0x184
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readRawBlock(0xcc32ce6f70, 0x0, 0x11ea, 0xdc91e48a1ce63501, 0x8, 0xc022323198, 0x713402, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:589 +0x551
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readBlock(0xcc32ce6f70, 0x0, 0x11ea, 0x1, 0x0, 0x71bb63, 0x12ed060)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:603 +0x55
github.com/syndtr/goleveldb/leveldb/table.(*Reader).readBlockCached(0xcc32ce6f70, 0x0, 0x11ea, 0xccea270001, 0xccea278090, 0xc022323438, 0x7183f0, 0xccea278090, 0x44ac6a)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:648 +0x1fe
github.com/syndtr/goleveldb/leveldb/table.(*Reader).getDataIter(0xcc32ce6f70, 0x0, 0x11ea, 0x0, 0x1, 0x0, 0x3)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:765 +0x5b
github.com/syndtr/goleveldb/leveldb/table.(*Reader).getDataIterErr(0xcc32ce6f70, 0x0, 0x11ea, 0x0, 0x1, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:780 +0x160
github.com/syndtr/goleveldb/leveldb/table.(*indexIter).Get(0xcc32c76a20, 0xcc32d3e5a0, 0x709501)
	github.com/syndtr/[email protected]/leveldb/table/reader.go:507 +0x2b0
github.com/syndtr/goleveldb/leveldb/iterator.(*indexedIterator).setData(0xcc32cd1080)
	github.com/syndtr/[email protected]/leveldb/iterator/indexed_iter.go:39 +0x41
github.com/syndtr/goleveldb/leveldb/iterator.(*indexedIterator).First(0xcc32cd1080, 0xcf1cee4060)
	github.com/syndtr/[email protected]/leveldb/iterator/indexed_iter.go:88 +0xfe
github.com/syndtr/goleveldb/leveldb/iterator.(*mergedIterator).First(0xca950b5a00, 0xc000197560)
	github.com/syndtr/[email protected]/leveldb/iterator/merged_iter.go:72 +0x8a
github.com/syndtr/goleveldb/leveldb/iterator.(*mergedIterator).Next(0xca950b5a00, 0x15ae420)
	github.com/syndtr/[email protected]/leveldb/iterator/merged_iter.go:157 +0x322
github.com/syndtr/goleveldb/leveldb.(*tableCompactionBuilder).run(0xc000234640, 0xc00019d170, 0x0, 0x0)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:442 +0x31a
github.com/syndtr/goleveldb/leveldb.(*DB).compactionTransact(0xc000282000, 0x1320774, 0xb, 0x15895a0, 0xc000234640)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:186 +0x181
github.com/syndtr/goleveldb/leveldb.(*DB).tableCompaction(0xc000282000, 0xc000160000, 0xc000120100)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:580 +0x63f
github.com/syndtr/goleveldb/leveldb.(*DB).tableAutoCompaction(0xc000282000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:644 +0x54
github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(0xc000282000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:863 +0x2ef
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:155 +0x582

goroutine 93 [select]:
github.com/syndtr/goleveldb/leveldb/util.(*BufferPool).drain(0xc000137340)
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:209 +0x134
created by github.com/syndtr/goleveldb/leveldb/util.NewBufferPool
	github.com/syndtr/[email protected]/leveldb/util/buffer_pool.go:240 +0x176

goroutine 94 [select, 6 minutes]:
github.com/syndtr/goleveldb/leveldb.(*session).refLoop(0xc2a5d13ef0)
	github.com/syndtr/[email protected]/leveldb/session_util.go:189 +0x5f9
created by github.com/syndtr/goleveldb/leveldb.newSession
	github.com/syndtr/[email protected]/leveldb/session.go:93 +0x2b4

goroutine 95 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).compactionError(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:91 +0xcd
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:148 +0x40c

goroutine 96 [select, 2 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).mpoolDrain(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_state.go:101 +0xf6
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:149 +0x42e

goroutine 97 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).tCompaction(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:836 +0x25d
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:155 +0x582

goroutine 98 [select, 24 minutes]:
github.com/syndtr/goleveldb/leveldb.(*DB).mCompaction(0xc0000d2000)
	github.com/syndtr/[email protected]/leveldb/db_compaction.go:773 +0x150
created by github.com/syndtr/goleveldb/leveldb.openDB
	github.com/syndtr/[email protected]/leveldb/db.go:156 +0x5a4

goroutine 279 [semacquire, 24 minutes]:
sync.runtime_Semacquire(0xc000282188)
	runtime/sema.go:56 +0x42
sync.(*WaitGroup).Wait(0xc000282180)
	sync/waitgroup.go:130 +0x64
github.com/syndtr/goleveldb/leveldb.(*DB).Close(0xc000282000, 0xc2be3b3d80, 0xc2be1c6060)
	github.com/syndtr/[email protected]/leveldb/db.go:1175 +0x1ad
github.com/ethereum/go-ethereum/ethdb/leveldb.(*Database).Close(0xc000218100, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/ethdb/leveldb/leveldb.go:150 +0xb1
github.com/ethereum/go-ethereum/core/rawdb.(*freezerdb).Close(0xc0001f6420, 0xc0004db740, 0xc0001f6440)
	github.com/ethereum/go-ethereum@/core/rawdb/database.go:48 +0x8a
github.com/ethereum/go-ethereum/node.(*closeTrackingDB).Close(0xc0001f6440, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:606 +0x90
github.com/ethereum/go-ethereum/eth.(*Ethereum).Stop(0xc000136c40, 0xc2be1a1700, 0x206d800)
	github.com/ethereum/go-ethereum@/eth/backend.go:546 +0xcc
github.com/ethereum/go-ethereum/node.(*Node).stopServices(0xc0000a5040, 0xc292fd6d50, 0x1, 0x1, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:282 +0xbd
github.com/ethereum/go-ethereum/node.(*Node).Close(0xc0000a5040, 0x0, 0x0)
	github.com/ethereum/go-ethereum@/node/node.go:204 +0x13f
created by github.com/ethereum/go-ethereum/cmd/utils.StartNode.func1
	github.com/ethereum/go-ethereum@/cmd/utils/cmd.go:76 +0x184
@holiman (Contributor, Author) commented Nov 19, 2020

cc @rjl493456442 any ideas?

@rjl493456442 (Member) commented:

Regarding the fdlimit, I'm not sure that Geth fails to respect the limit; rather, we hand it a very high number:

// makeDatabaseHandles raises out the number of allowed file handles per process
// for Geth and returns half of the allowance to assign to the database.
func makeDatabaseHandles() int {
	limit, err := fdlimit.Maximum()
	if err != nil {
		Fatalf("Failed to retrieve file descriptor allowance: %v", err)
	}
	raised, err := fdlimit.Raise(uint64(limit))
	if err != nil {
		Fatalf("Failed to raise file descriptor allowance: %v", err)
	}
	return int(raised / 2) // Leave half for networking and other stuff
}

@rjl493456442 (Member) commented:

Leveldb keeps handles open for all "referenced" sstables, so the reason we have 3M open files is simply that the database has 3M ldb files.

Should we set a hard cap on the open-file cache? The fd cache is really useful for reads, since it caches table metadata and avoids reopening the same files over and over. The downside is that during shutdown we then have to close all 3M files, which I guess takes a long time.

@rjl493456442 (Member) commented:

https://github.com/syndtr/goleveldb/blob/master/leveldb/db.go#L1175

Obviously, it's waiting for all the background tasks to finish (e.g. closing the fd cache).

@holiman (Contributor, Author) commented Nov 20, 2020

More info

WARN [11-19|11:37:53.616] Sanitizing cache to Go's GC limits       provided=16384 updated=10721
DEBUG[11-19|11:37:53.616] Sanitizing Go's GC trigger               percent=20
INFO [11-19|11:37:53.620] Maximum peer count                       ETH=50 LES=0 total=50
INFO [11-19|11:37:53.620] Smartcard socket not found, disabling    err="stat /run/pcscd/pcscd.comm: no such file or directory"
DEBUG[11-19|11:37:53.621] FS scan times                            list="76.036µs" set="3.839µs" diff="4.2µs"
INFO [11-19|11:37:53.623] Set global gas cap                       cap=25000000
INFO [11-19|11:37:53.623] Allocated trie memory caches             clean=5.23GiB dirty=0.00B
INFO [11-19|11:37:53.623] Allocated cache and file handles         database=/home/archive/geth_data/geth/chaindata cache=5.23GiB handles=524288

That message is printed in ethdb/leveldb/leveldb.go, so 524288 is what we pass to leveldb as the OpenFilesCacheCapacity, which is thus being ignored. That seems like a bug to me.
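For reference, the options geth hands to goleveldb look roughly like this (a paraphrased fragment of ethdb/leveldb/leveldb.go from that era, written from memory, so field values may not match the tree exactly):

```go
// Fragment, not a complete program: "handles" is the 524288 from the
// log line above, passed as OpenFilesCacheCapacity, so the cap is in
// fact communicated to leveldb.
db, err := leveldb.OpenFile(file, &opt.Options{
	OpenFilesCacheCapacity: handles,
	BlockCacheCapacity:     cache / 2 * opt.MiB,
	WriteBuffer:            cache / 4 * opt.MiB,
	Filter:                 filter.NewBloomFilter(10),
})
```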

@holiman (Contributor, Author) commented Nov 20, 2020

Some more info... The number of open file descriptors grows until the OS won't hand out any more. That doesn't cause leveldb to crash, but memory consumption grows to ~24GB.

The initial boot said:

WARN [11-20|10:00:56.198] Sanitizing cache to Go's GC limits       provided=16384 updated=10721

So we're apparently at 2.4x the memory that we gave it

Open ldb files:
1048548
MB memory used:
22685
Open ldb files:
1048548
MB memory used:
23677
Open ldb files:
1048548
MB memory used:
24568
Open ldb files:
1048548
MB memory used:
24887
Open ldb files:
1048548
MB memory used:
24919
Open ldb files:
1048548
MB memory used:
24831

script:

pid=6861
while :
do
  echo "Open ldb files:"
  lsof -p $pid | grep ldb | wc -l
  echo "MB memory used:"
  ps -o rss= $pid | awk '{printf "%.0f\n", $1 / 1024}'
  sleep 10  # avoid hammering lsof in a tight loop
done

Again: it has not imported a single block, just start + stop, and the shutdown sequence has now been running for at least 15 minutes.

@holiman (Contributor, Author) commented Nov 20, 2020

Shutdown finished after 18m (not OOM-reaped this time).

So the next question might be whether we can prevent the memory blowup during compaction?

@holiman (Contributor, Author) commented Nov 20, 2020

On the archive node:

archive@archive:~$ iostat -d -x
Linux 5.4.0-53-generic (archive) 	11/20/20 	_x86_64_	(4 CPU)

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
fd0              0.00      0.00     0.00   0.00 12361.80     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop0            0.00      0.01     0.00   0.00    0.32     9.04    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop1            0.02      0.03     0.00   0.00    0.41     1.38    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop2            0.00      0.03     0.00   0.00    0.73    21.57    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop3            0.00      0.03     0.00   0.00    0.53    18.06    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop4            0.00      0.01     0.00   0.00    0.50     9.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop5            0.00      0.01     0.00   0.00    1.34     4.95    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
loop6            0.00      0.00     0.00   0.00    0.00     1.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
md0            124.89   1135.50     0.00   0.00    0.00     9.09   27.65    139.54     0.00   0.00    0.00     5.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda              6.34     60.88     1.71  21.22    1.00     9.60    0.69     22.05     4.90  87.73    8.82    32.15    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.10
sdb             10.11     73.28     1.71  14.46    0.69     7.25    0.65     21.64     4.84  88.13    9.02    33.22    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.48
sdc             42.11    410.92     1.89   4.30    0.33     9.76    0.73     22.66     5.02  87.36    9.03    31.22    0.00      0.00     0.00   0.00    0.00     0.00    0.01   3.17
sdd              4.04     43.34     1.87  31.70    1.53    10.74    0.71     22.58     5.01  87.56    9.13    31.72    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.86
sde             40.32    399.20     1.68   4.00    0.32     9.90    0.65     22.20     4.98  88.49    8.88    34.26    0.00      0.00     0.00   0.00    0.00     0.00    0.01   3.12
sdf              6.43     63.07     1.81  21.99    1.00     9.81    0.69     22.59     5.03  87.88    8.92    32.56    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.13
sdg             10.11     73.43     1.68  14.26    0.70     7.27    0.65     22.09     4.95  88.39    9.06    33.97    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.50
sdh              3.87     41.22     1.71  30.66    1.71    10.66    0.68     22.22     4.96  88.00    8.92    32.86    0.00      0.00     0.00   0.00    0.00     0.00    0.01   0.88

Compared with #21648 (comment):

On our AWS node, rkB/s is 8712:

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
nvme0n1        735.15   82.32   8712.76   8506.39     3.10    53.63   0.42  39.45    0.14    1.64   0.08    11.85   103.34   0.39  31.69

On the archive node, rkB/s is 1135:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md0            124.89   1135.50     0.00   0.00    0.00     9.09   27.65    139.54     0.00   0.00    0.00     5.05    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00

@karalabe removed the area:db label May 3, 2023
@holiman (Contributor, Author) commented Oct 11, 2023

Closing this; leveldb is on the way out and no longer the default

@holiman closed this as completed Oct 11, 2023