Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression 1.6 to 2.0.2 #1254

Closed
dougdonohoe opened this issue Mar 9, 2020 · 30 comments
Closed

Performance regression 1.6 to 2.0.2 #1254

dougdonohoe opened this issue Mar 9, 2020 · 30 comments
Assignees
Labels
area/performance Performance related issues. kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.

Comments

@dougdonohoe
Copy link

What version of Go are you using (go version)?

go version go1.12.7 darwin/amd64

What version of Badger are you using?

2.0.2 (upgrading from 1.6.0)

Does this issue reproduce with the latest master?

Haven't tried.

What are the hardware specifications of the machine (RAM, OS, Disk)?

GCP 8 CPU (Intel Haswell), 32 GB RAM, 750 GB local ssd

What did you do?

Running code which extracts data from Kafka and saves to Badger DB. I'm running on exact same hardware, disk and my code against exact same Kafka topic.

What did you expect to see?

Better or equal performance with Badger 2.

What did you see instead?

Severe slowdown after writing ~1,461,000 records. See below

1.6.0 performance:

Performance in 1.6.0 takes about 300-400ms to extract 1000 messages.

  Up to offset 1453000 Time[330ms] Events[1453001] UrisCreated[1975] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T07:35:07.000] EstTimeToFinish[4h17m58s]
  Up to offset 1454000 Time[360ms] Events[1454001] UrisCreated[1954] PathsCreated[0] Bytes[11.2 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T11:31:43.000] EstTimeToFinish[4h18m1s]
  Up to offset 1455000 Time[340ms] Events[1455001] UrisCreated[1969] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T15:33:31.000] EstTimeToFinish[4h18m4s]
  Up to offset 1456000 Time[360ms] Events[1456001] UrisCreated[1789] PathsCreated[0] Bytes[13.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T20:46:14.000] EstTimeToFinish[4h18m7s]
  Up to offset 1457000 Time[320ms] Events[1457001] UrisCreated[1720] PathsCreated[0] Bytes[13.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T06:56:07.000] EstTimeToFinish[4h18m9s]
  Up to offset 1458000 Time[300ms] Events[1458001] UrisCreated[1736] PathsCreated[1] Bytes[10.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T18:40:17.000] EstTimeToFinish[4h18m9s]
badger 2020/02/17 15:10:15 DEBUG: Flushing memtable, mt.size=194491818 size of flushChan: 0
badger 2020/02/17 15:10:15 DEBUG: Storing value log head: {Fid:1 Len:45 Offset:87078740}
  Up to offset 1459000 Time[380ms] Events[1459001] UrisCreated[2140] PathsCreated[0] Bytes[11.4 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T21:04:18.000] EstTimeToFinish[4h18m13s]
  Up to offset 1460000 Time[370ms] Events[1460001] UrisCreated[1776] PathsCreated[0] Bytes[10.4 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T00:02:01.000] EstTimeToFinish[4h18m17s]
badger 2020/02/17 15:10:16 DEBUG: Flushing memtable, mt.size=119942867 size of flushChan: 0
badger 2020/02/17 15:10:16 DEBUG: Storing value log head: {Fid:1 Len:45 Offset:87168065}
  Up to offset 1461000 Time[430ms] Events[1461001] UrisCreated[1753] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T06:01:21.000] EstTimeToFinish[4h18m23s]
  Up to offset 1462000 Time[370ms] Events[1462001] UrisCreated[1779] PathsCreated[0] Bytes[10.5 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T16:45:03.000] EstTimeToFinish[4h18m26s]
  Up to offset 1463000 Time[360ms] Events[1463001] UrisCreated[1735] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T20:10:04.000] EstTimeToFinish[4h18m29s]
  Up to offset 1464000 Time[370ms] Events[1464001] UrisCreated[1664] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T23:03:44.000] EstTimeToFinish[4h18m33s]
  Up to offset 1465000 Time[350ms] Events[1465001] UrisCreated[1732] PathsCreated[0] Bytes[10.2 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T02:38:13.000] EstTimeToFinish[4h18m35s]
  Up to offset 1466000 Time[380ms] Events[1466001] UrisCreated[1825] PathsCreated[0] Bytes[10.6 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T06:12:39.000] EstTimeToFinish[4h18m39s]
  Up to offset 1467000 Time[360ms] Events[1467001] UrisCreated[1868] PathsCreated[0] Bytes[11.1 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T10:08:51.000] EstTimeToFinish[4h18m42s]
  Up to offset 1468000 Time[380ms] Events[1468001] UrisCreated[1920] PathsCreated[1] Bytes[11.3 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T13:54:45.000] EstTimeToFinish[4h18m46s]
  Up to offset 1469000 Time[350ms] Events[1469001] UrisCreated[1875] PathsCreated[0] Bytes[11.5 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T17:20:47.000] EstTimeToFinish[4h18m48s]
  Up to offset 1470000 Time[350ms] Events[1470001] UrisCreated[1767] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T20:41:05.000] EstTimeToFinish[4h18m51s]
  Up to offset 1471000 Time[340ms] Events[1471001] UrisCreated[1768] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T23:51:59.000] EstTimeToFinish[4h18m54s]
  Up to offset 1472000 Time[370ms] Events[1472001] UrisCreated[1758] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-14T03:28:45.000] EstTimeToFinish[4h18m57s]

2.0.2 performance:

Notice that at approximately offset 1462000 (1,462,000 records), things start slowing down from a rate of 300-400ms per 1,000 records to 25-30 seconds per 1,000 records! It happens after the very first Flushing memtable debug message. If you look above, the Flushing happens at the exact same place, but things continue speedily after.

  Up to offset 1453000 Time[360ms] Events[1453001] UrisCreated[1975] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T07:35:07.000] EstTimeToFinish[4h19m33s]
  Up to offset 1454000 Time[330ms] Events[1454001] UrisCreated[1954] PathsCreated[0] Bytes[11.2 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T11:31:43.000] EstTimeToFinish[4h19m35s]
  Up to offset 1455000 Time[380ms] Events[1455001] UrisCreated[1969] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T15:33:31.000] EstTimeToFinish[4h19m39s]
  Up to offset 1456000 Time[320ms] Events[1456001] UrisCreated[1789] PathsCreated[0] Bytes[13.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-10T20:46:14.000] EstTimeToFinish[4h19m41s]
  Up to offset 1457000 Time[340ms] Events[1457001] UrisCreated[1720] PathsCreated[0] Bytes[13.0 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T06:56:07.000] EstTimeToFinish[4h19m43s]
  Up to offset 1458000 Time[310ms] Events[1458001] UrisCreated[1736] PathsCreated[1] Bytes[10.3 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T18:40:17.000] EstTimeToFinish[4h19m44s]
badger 2020/03/09 17:36:39 DEBUG: Flushing memtable, mt.size=194487650 size of flushChan: 0
badger 2020/03/09 17:36:39 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:74078864}
  Up to offset 1459000 Time[680ms] Events[1459001] UrisCreated[2140] PathsCreated[0] Bytes[11.4 MiB] TotalBytes[11.7 GiB] Date[2014-11-11T21:04:18.000] EstTimeToFinish[4h20m0s]
  Up to offset 1460000 Time[500ms] Events[1460001] UrisCreated[1776] PathsCreated[0] Bytes[10.4 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T00:02:01.000] EstTimeToFinish[4h20m8s]
badger 2020/03/09 17:36:40 DEBUG: Flushing memtable, mt.size=119942767 size of flushChan: 0
badger 2020/03/09 17:36:40 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:74168111}
  Up to offset 1461000 Time[430ms] Events[1461001] UrisCreated[1753] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T06:01:21.000] EstTimeToFinish[4h20m14s]
  Up to offset 1462000 Time[4.74s] Events[1462001] UrisCreated[1779] PathsCreated[0] Bytes[10.5 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T16:45:03.000] EstTimeToFinish[4h23m6s]
  Up to offset 1463000 Time[14.45s] Events[1463001] UrisCreated[1735] PathsCreated[0] Bytes[11.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T20:10:04.000] EstTimeToFinish[4h32m12s]
  Up to offset 1464000 Time[19.38s] Events[1464001] UrisCreated[1664] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[11.8 GiB] Date[2014-11-12T23:03:44.000] EstTimeToFinish[4h44m27s]
  Up to offset 1465000 Time[24.52s] Events[1465001] UrisCreated[1732] PathsCreated[0] Bytes[10.2 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T02:38:13.000] EstTimeToFinish[4h59m59s]
  Up to offset 1466000 Time[27.25s] Events[1466001] UrisCreated[1825] PathsCreated[0] Bytes[10.6 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T06:12:39.000] EstTimeToFinish[5h17m15s]
  Up to offset 1467000 Time[31.8s] Events[1467001] UrisCreated[1868] PathsCreated[0] Bytes[11.1 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T10:08:51.000] EstTimeToFinish[5h37m24s]
  Up to offset 1468000 Time[32.87s] Events[1468001] UrisCreated[1920] PathsCreated[1] Bytes[11.3 MiB] TotalBytes[11.8 GiB] Date[2014-11-13T13:54:45.000] EstTimeToFinish[5h58m12s]
  Up to offset 1469000 Time[28.9s] Events[1469001] UrisCreated[1875] PathsCreated[0] Bytes[11.5 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T17:20:47.000] EstTimeToFinish[6h16m27s]
  Up to offset 1470000 Time[27.58s] Events[1470001] UrisCreated[1767] PathsCreated[0] Bytes[11.3 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T20:41:05.000] EstTimeToFinish[6h33m49s]
  Up to offset 1471000 Time[30.04s] Events[1471001] UrisCreated[1768] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-13T23:51:59.000] EstTimeToFinish[6h52m44s]
  Up to offset 1472000 Time[34.09s] Events[1472001] UrisCreated[1758] PathsCreated[0] Bytes[10.8 MiB] TotalBytes[11.9 GiB] Date[2014-11-14T03:28:45.000] EstTimeToFinish[7h14m13s]

I tried the same with compression disabled and saw similar results. The options I'm using are DefaultOptions with the following tweaks:

	actualOpts := opts.
		WithMaxTableSize(256 << 20). // max size 256M
		WithSyncWrites(false).       // don't sync writes for faster performance
		WithCompression(options.None)

I literally just started on the 2.0 migration today. I'm running the same code I've been running for 6 months.

@dougdonohoe
Copy link
Author

dougdonohoe commented Mar 9, 2020

FYI, this is the code that is the subject of my GopherCon talk. The extract program shown above successfully extracts 57+ million records in under 4 hours using Badger 1.6, at a fairly consistent 200-400ms per 1000 (variability mostly due to size of messages coming in from Kafka).

@dougdonohoe
Copy link
Author

cpu-2.0.pdf

I profiled 5000 records.

@dougdonohoe
Copy link
Author

cpu-2.0-10k.pdf

Here's 10,000 records. Seems to be spending a crazy amount of time doing JSON.

@shekarm shekarm added area/performance Performance related issues. status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it labels Mar 10, 2020
@jarifibrahim
Copy link
Contributor

hey @dougdonohoe, the bloom filters stored in the table seem to take up most of the time in your profile (that's where the json comes from). Can you show me the output of https://godoc.org/github.com/dgraph-io/badger#DB.CacheMetrics ? The bloom filters are loaded on DB startup into the cache and read from the cache.

The default size of the cache is 1 GB so it should have enough space to hold a lot of bloom filters.

@dougdonohoe
Copy link
Author

When do you want me to fetch the metrics? After db startup, or during processing? It seems like things get slow after some processing?

Also, what is a bloom filter? Is this new in 2.0? Is it necessary?

@connorgorman
Copy link
Contributor

@jarifibrahim It's also worth noting when I ran the benchmarks in #1248 and set WithBloomFalsePositive(1), then the performance is much more similar to 1.6, which is the opposite of what I would expect 🤔. These two tickets are probably describing the same issue

@dougdonohoe
Copy link
Author

Here are stats at beginning and every 1000 records:

Resuming at 1488518
    Metrics Uri -> Id: hit: 7 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 285409351 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 9 hit-ratio: 0.78
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
Starting at offset 1488518 (stopping at 57861676)
  Up to offset 1489000 Time[320ms] Events[483] UrisCreated[843] PathsCreated[0] Bytes[4.9 MiB] TotalBytes[4.9 MiB] Date[2014-11-18T20:26:28.000] EstTimeToFinish[10h26m23s]
    Metrics Uri -> Id: hit: 8867 miss: 1939 keys-added: 112 keys-updated: 4 keys-evicted: 87 cost-added: 3969453119 cost-evicted: 2967954575 sets-dropped: 0 sets-rejected: 1824 gets-dropped: 0 gets-kept: 10688 gets-total: 10806 hit-ratio: 0.82
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1490000 Time[3.57s] Events[1483] UrisCreated[1692] PathsCreated[0] Bytes[10.1 MiB] TotalBytes[15.0 MiB] Date[2014-11-18T23:32:29.000] EstTimeToFinish[1d17h11m18s]
    Metrics Uri -> Id: hit: 26382 miss: 5513 keys-added: 621 keys-updated: 44 keys-evicted: 590 cost-added: 23562448981 cost-evicted: 22560280233 sets-dropped: 0 sets-rejected: 4849 gets-dropped: 0 gets-kept: 31680 gets-total: 31895 hit-ratio: 0.83
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1491000 Time[11.23s] Events[2483] UrisCreated[1759] PathsCreated[0] Bytes[10.0 MiB] TotalBytes[25.1 MiB] Date[2014-11-19T03:08:05.000] EstTimeToFinish[3d23h28m18s]
    Metrics Uri -> Id: hit: 44444 miss: 9875 keys-added: 2393 keys-updated: 212 keys-evicted: 2364 cost-added: 97537818522 cost-evicted: 96529564427 sets-dropped: 0 sets-rejected: 7271 gets-dropped: 0 gets-kept: 53952 gets-total: 54319 hit-ratio: 0.82
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1492000 Time[13.37s] Events[3483] UrisCreated[1794] PathsCreated[0] Bytes[10.4 MiB] TotalBytes[35.5 MiB] Date[2014-11-19T06:44:24.000] EstTimeToFinish[5d8h10m5s]
    Metrics Uri -> Id: hit: 62548 miss: 14569 keys-added: 4699 keys-updated: 436 keys-evicted: 4670 cost-added: 190059690588 cost-evicted: 189058857901 sets-dropped: 0 sets-rejected: 9431 gets-dropped: 0 gets-kept: 76544 gets-total: 77117 hit-ratio: 0.81
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1493000 Time[17.91s] Events[4483] UrisCreated[1703] PathsCreated[0] Bytes[10.9 MiB] TotalBytes[46.4 MiB] Date[2014-11-19T10:32:37.000] EstTimeToFinish[6d18h8m35s]
    Metrics Uri -> Id: hit: 80595 miss: 19733 keys-added: 7669 keys-updated: 681 keys-evicted: 7641 cost-added: 321878498560 cost-evicted: 320872996455 sets-dropped: 0 sets-rejected: 11384 gets-dropped: 0 gets-kept: 99648 gets-total: 100328 hit-ratio: 0.80
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1494000 Time[19.81s] Events[5483] UrisCreated[1832] PathsCreated[0] Bytes[11.5 MiB] TotalBytes[57.8 MiB] Date[2014-11-19T14:22:37.000] EstTimeToFinish[7d21h9m29s]
    Metrics Uri -> Id: hit: 99285 miss: 25263 keys-added: 11129 keys-updated: 991 keys-evicted: 11103 cost-added: 473194758795 cost-evicted: 472175812950 sets-dropped: 0 sets-rejected: 13144 gets-dropped: 0 gets-kept: 123712 gets-total: 124548 hit-ratio: 0.80
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1495000 Time[22.25s] Events[6483] UrisCreated[1704] PathsCreated[0] Bytes[10.3 MiB] TotalBytes[68.2 MiB] Date[2014-11-19T17:51:05.000] EstTimeToFinish[8d21h42m57s]
    Metrics Uri -> Id: hit: 116792 miss: 30802 keys-added: 14618 keys-updated: 1338 keys-evicted: 14592 cost-added: 641467305114 cost-evicted: 640625227204 sets-dropped: 0 sets-rejected: 14847 gets-dropped: 0 gets-kept: 146496 gets-total: 147594 hit-ratio: 0.79
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1496000 Time[21.75s] Events[7483] UrisCreated[1740] PathsCreated[0] Bytes[10.7 MiB] TotalBytes[78.8 MiB] Date[2014-11-19T21:10:39.000] EstTimeToFinish[9d14h40m30s]
    Metrics Uri -> Id: hit: 133648 miss: 35706 keys-added: 17810 keys-updated: 1656 keys-evicted: 17786 cost-added: 784596422933 cost-evicted: 783594107743 sets-dropped: 0 sets-rejected: 16241 gets-dropped: 0 gets-kept: 168064 gets-total: 169354 hit-ratio: 0.79
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1497000 Time[17.96s] Events[8483] UrisCreated[1605] PathsCreated[0] Bytes[9.7 MiB] TotalBytes[88.6 MiB] Date[2014-11-20T00:08:51.000] EstTimeToFinish[9d20h38m24s]
    Metrics Uri -> Id: hit: 148810 miss: 40094 keys-added: 20814 keys-updated: 1938 keys-evicted: 20788 cost-added: 927603233586 cost-evicted: 926599719379 sets-dropped: 0 sets-rejected: 17343 gets-dropped: 0 gets-kept: 187392 gets-total: 188904 hit-ratio: 0.79
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67
  Up to offset 1498000 Time[21.57s] Events[9483] UrisCreated[1731] PathsCreated[0] Bytes[10.3 MiB] TotalBytes[98.9 MiB] Date[2014-11-20T04:38:47.000] EstTimeToFinish[10d7h18m24s]
    Metrics Uri -> Id: hit: 165977 miss: 45144 keys-added: 24296 keys-updated: 2195 keys-evicted: 24273 cost-added: 1079730938686 cost-evicted: 1078713292082 sets-dropped: 0 sets-rejected: 18654 gets-dropped: 0 gets-kept: 209472 gets-total: 211121 hit-ratio: 0.79
    Metrics Id -> Uri: hit: 4 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 162555144 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 6 hit-ratio: 0.67

Note that I have two tables I'm listing metrics for.

  • "Uri -> Id" table maps a NYT URI to a unique Id (an integer generated via a sequence). Our URIs are something like nyt://article/c178e916-68b1-547e-aba6-2f48f9e15221.
  • "Id -> Uri" is a reverse lookup of Id back to Uri

In our application, these tables will have 23+ million entries. We have other tables that have 59+ million entries. I would be worried if the cache has to contain all keys to get 1.6 level performance.

@dougdonohoe
Copy link
Author

dougdonohoe commented Mar 10, 2020

The access pattern that causes this is as follows:

  • Inserting a stream of values with new keys (our Uris)
  • Interspersed are lookups of existing values.

Psuedo code for this has 3mm new keys being created, and for each new key doing 10 random lookups of previous keys.

After about 2022000 records (after 1st flush), things start getting really slow.


random     = rand.New(rand.NewSource(1))

for i := 0; i < 3000000; i++ {
    process(i)
} 

func process(num int) {
	uri := getUri(num)
	id := IdForString(uri)

	// randomly lookup previous URIs
	if num > 100 {
		for i := 0; i < 10; i++ {
			uri2 := getUri(random.Intn(num-1))
			id2 := IdForString(uri2)
		}
	}
}

func getUri(num int) string {
	return fmt.Sprintf("nyt://article/some-long-string-with-%d", num);
}

func IdForString(s string) uint64 {
	var id uint64
	var ok bool
	var err error

	if id, ok = h.GetId(s); !ok {
		// doesn't exist, get next
		id, err = h.seq.Next()
		if err != nil {
			panic(fmt.Errorf("unable to get next sequence for %s", s))
		}
		h.db.SetStringLong(s, id)
		h.rev.SetLongString(id, s)

		h.Created++ // track how many we've created
	}
	return id
}

Under the covers, GetId(s) eventually calls:

func (db *KeyValueDB) getBytesBytes(key []byte, dst []byte) ([]byte, bool) {
	var found bool
	err := db.DB.View(func(txn *badger.Txn) error {
		item, e2 := txn.Get(key)
		if e2 == badger.ErrKeyNotFound {
			return nil
		}
		handle(e2)

		var e3 error
		dst, e3 = item.ValueCopy(dst)
		handle(e3)

		found = true

		return nil
	})
	handle(err)
	return dst, found
}
  Up to 1995000 Time[30ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
badger 2020/03/10 13:47:52 DEBUG: Flushing memtable, mt.size=180555689 size of flushChan: 0
badger 2020/03/10 13:47:52 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:67896750}
  Up to 1996000 Time[120ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 1997000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 1998000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 1999000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
badger 2020/03/10 13:47:52 DEBUG: Flushing memtable, mt.size=119939459 size of flushChan: 0
badger 2020/03/10 13:47:52 DEBUG: Storing value log head: {Fid:1 Len:32 Offset:67963956}
  Up to 2000000 Time[70ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2001000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2002000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2003000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2004000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2005000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2006000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2007000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2008000 Time[40ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2009000 Time[160ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 3534 miss: 1709 keys-added: 215 keys-updated: 1 keys-evicted: 202 cost-added: 14997382987 cost-evicted: 14011379302 sets-dropped: 0 sets-rejected: 57 gets-dropped: 2624 gets-kept: 2560 gets-total: 5243 hit-ratio: 0.67
    Metrics Id -> Uri: hit: 0 miss: 0 keys-added: 0 keys-updated: 0 keys-evicted: 0 cost-added: 0 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 0 hit-ratio: 0.00
  Up to 2010000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 24388 miss: 11695 keys-added: 489 keys-updated: 2 keys-evicted: 472 cost-added: 29726509290 cost-evicted: 28722313375 sets-dropped: 0 sets-rejected: 724 gets-dropped: 8000 gets-kept: 28032 gets-total: 36083 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2011000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 45217 miss: 21676 keys-added: 1607 keys-updated: 25 keys-evicted: 1594 cost-added: 75517705659 cost-evicted: 74508572132 sets-dropped: 0 sets-rejected: 7911 gets-dropped: 8000 gets-kept: 58816 gets-total: 66893 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2012000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 66060 miss: 31657 keys-added: 4133 keys-updated: 71 keys-evicted: 4117 cost-added: 171955707386 cost-evicted: 170936279280 sets-dropped: 0 sets-rejected: 27454 gets-dropped: 8000 gets-kept: 89600 gets-total: 97717 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2013000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 86884 miss: 41628 keys-added: 4578 keys-updated: 71 keys-evicted: 4564 cost-added: 186698283633 cost-evicted: 185689653950 sets-dropped: 0 sets-rejected: 36980 gets-dropped: 8000 gets-kept: 120256 gets-total: 128512 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2014000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 107700 miss: 51601 keys-added: 5087 keys-updated: 71 keys-evicted: 5073 cost-added: 202789763492 cost-evicted: 201801163401 sets-dropped: 0 sets-rejected: 46444 gets-dropped: 8000 gets-kept: 151104 gets-total: 159301 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2015000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 128508 miss: 61557 keys-added: 5489 keys-updated: 71 keys-evicted: 5471 cost-added: 214000832067 cost-evicted: 212981577542 sets-dropped: 0 sets-rejected: 55998 gets-dropped: 8000 gets-kept: 181824 gets-total: 190065 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2016000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 149271 miss: 71489 keys-added: 6124 keys-updated: 71 keys-evicted: 6101 cost-added: 232825118426 cost-evicted: 231814048409 sets-dropped: 0 sets-rejected: 65295 gets-dropped: 8000 gets-kept: 212608 gets-total: 220760 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2017000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 170062 miss: 81453 keys-added: 6506 keys-updated: 71 keys-evicted: 6494 cost-added: 243422806841 cost-evicted: 242410842065 sets-dropped: 0 sets-rejected: 74877 gets-dropped: 8000 gets-kept: 243328 gets-total: 251515 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2018000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 190831 miss: 91409 keys-added: 7025 keys-updated: 71 keys-evicted: 7014 cost-added: 258538705104 cost-evicted: 257554653809 sets-dropped: 0 sets-rejected: 84314 gets-dropped: 8000 gets-kept: 274112 gets-total: 282240 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2019000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 211574 miss: 101345 keys-added: 7228 keys-updated: 71 keys-evicted: 7214 cost-added: 264261259105 cost-evicted: 263249641507 sets-dropped: 0 sets-rejected: 94047 gets-dropped: 8000 gets-kept: 304768 gets-total: 312919 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2020000 Time[50ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 232325 miss: 111296 keys-added: 7694 keys-updated: 76 keys-evicted: 7680 cost-added: 280093360327 cost-evicted: 279148465753 sets-dropped: 0 sets-rejected: 103527 gets-dropped: 8000 gets-kept: 335488 gets-total: 343621 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2021000 Time[620ms] UrisCreated[1000]
    Metrics Uri -> Id: hit: 253042 miss: 121263 keys-added: 8357 keys-updated: 91 keys-evicted: 8340 cost-added: 305322910298 cost-evicted: 304312798292 sets-dropped: 0 sets-rejected: 112816 gets-dropped: 8000 gets-kept: 366272 gets-total: 374305 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2022000 Time[1.72s] UrisCreated[1000]
    Metrics Uri -> Id: hit: 273690 miss: 131241 keys-added: 9396 keys-updated: 125 keys-evicted: 9381 cost-added: 350441699376 cost-evicted: 349444000765 sets-dropped: 0 sets-rejected: 121721 gets-dropped: 8000 gets-kept: 396800 gets-total: 404931 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2023000 Time[4.3s] UrisCreated[1000]
    Metrics Uri -> Id: hit: 294275 miss: 141329 keys-added: 10760 keys-updated: 201 keys-evicted: 10743 cost-added: 413984940546 cost-evicted: 412981108474 sets-dropped: 0 sets-rejected: 130369 gets-dropped: 8000 gets-kept: 427392 gets-total: 435604 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2024000 Time[4.84s] UrisCreated[1000]
    Metrics Uri -> Id: hit: 314760 miss: 151458 keys-added: 12401 keys-updated: 306 keys-evicted: 12384 cost-added: 496855829705 cost-evicted: 495849904940 sets-dropped: 0 sets-rejected: 138752 gets-dropped: 8000 gets-kept: 457856 gets-total: 466218 hit-ratio: 0.68
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2025000 Time[5.61s] UrisCreated[1000]
    Metrics Uri -> Id: hit: 335135 miss: 161656 keys-added: 14591 keys-updated: 452 keys-evicted: 14578 cost-added: 619874731722 cost-evicted: 618866903295 sets-dropped: 0 sets-rejected: 146611 gets-dropped: 8000 gets-kept: 488256 gets-total: 496791 hit-ratio: 0.67
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00
  Up to 2026000 Time[9.22s] UrisCreated[1000]
    Metrics Uri -> Id: hit: 355442 miss: 171937 keys-added: 17136 keys-updated: 636 keys-evicted: 17123 cost-added: 764663412397 cost-evicted: 763665670676 sets-dropped: 0 sets-rejected: 154166 gets-dropped: 8000 gets-kept: 518592 gets-total: 527379 hit-ratio: 0.67
    Metrics Id -> Uri: hit: 0 miss: 2 keys-added: 3 keys-updated: 0 keys-evicted: 0 cost-added: 437194123 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 0 gets-total: 2 hit-ratio: 0.00

I'm willing to share actual code, but not in this public ticket.

@jarifibrahim jarifibrahim added status/accepted We accept to investigate or work on it. and removed status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it labels Mar 11, 2020
@jarifibrahim jarifibrahim added kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. labels Mar 11, 2020
@jarifibrahim
Copy link
Contributor

Hey @dougdonohoe, I've raised #1256 to fix this.
Major performance improvement comes from #1256 and #1257 also improves the performance.

Can you please run your application against #1257 and let me know how it goes?

@dougdonohoe
Copy link
Author

Sure. Can you remind me how to get this version using go get?

@jarifibrahim
Copy link
Contributor

Here you go.

# Ensure you have go modules on
$  go mod init foo                                                                                                                                                     go: creating new go.mod: module foo
$ go get -v github.com/dgraph-io/badger/v2@1c4e4fcf20a9516126e2265ab8382c2dbb782b73                                                                                   
go: finding github.com 1c4e4fcf20a9516126e2265ab8382c2dbb782b73
go: finding github.com/dgraph-io 1c4e4fcf20a9516126e2265ab8382c2dbb782b73
go: finding github.com/dgraph-io/badger/v2 1c4e4fcf20a9516126e2265ab8382c2dbb782b73
go: finding github.com/dgraph-io/badger 1c4e4fcf20a9516126e2265ab8382c2dbb782b73
go: downloading github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200312120824-1c4e4fcf20a9
go: extracting github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200312120824-1c4e4fcf20a9
github.com/dgraph-io/badger/v2/options
github.com/dgraph-io/badger/v2/trie
github.com/dgraph-io/badger/v2/pb
github.com/dgraph-io/badger/v2/y
github.com/dgraph-io/badger/v2/skl
github.com/dgraph-io/badger/v2/table
github.com/dgraph-io/badger/v2
# Ensure you have the correct commit
$ go list -m all | grep badger                                                     
github.com/dgraph-io/badger/v2 v2.0.1-rc1.0.20200312120824-1c4e4fcf20a9

@dougdonohoe
Copy link
Author

I ran it, and performance is better (possibly slightly slower than 1.6, but hard to gauge). Unfortunately, at first flush i got this error:

badger 2020/03/12 15:28:40 ERROR: writeRequests: Create value log file. Path=/mnt/disks/sorting/data/badger/KafkaInfo/000002.vlog. Error=open /mnt/disks/sorting/data/badger/KafkaInfo/000002.vlog: file exists
PANIC unable to flush batch: Create value log file. Path=/mnt/disks/sorting/data/badger/KafkaInfo/000002.vlog. Error=open /mnt/disks/sorting/data/badger/KafkaInfo/000002.vlog: file exists
goroutine 239 [running]:

Never seen an error like this before.

@dougdonohoe
Copy link
Author

I'm re-running from fresh start to see if the above repeats.

@jarifibrahim
Copy link
Contributor

@dougdonohoe Can you check how many vlog files you have? Badger keeps track of the max vlog file id it knows about. This error means you have a vlog file with id equal to max vlog file id.

@dougdonohoe
Copy link
Author

dougdonohoe commented Mar 12, 2020

In Badger 2.0, the *.vlog files seem to be named differently. They are no longer sequentially increasing monotonically. Is this expected?

000000.vlog
000001.vlog
000002.vlog
000003.vlog
000018.vlog
000019.vlog
000020.vlog
000021.vlog
000022.vlog
000023.vlog
000024.vlog
000025.vlog
000026.vlog

Notice the jump from 3 to 18. In 1.6, I have files from 000000.vlog to 000057.vlog (the list above is from a process that is still running, so not sure where it will end up).

@jarifibrahim
Copy link
Contributor

@dougdonohoe The files could be missing because they got garbage collected.

The error you mentioned before is a strange one. The code here

badger/value.go

Line 893 in b13b927

vlog.maxFid = uint32(fid)
is supposed to find the max fid and then start creating file from that ID onwards. In your case, badger was trying to create file from id=2 which already exists. Can you show me db start up logs? That might have some more details.

@jarifibrahim
Copy link
Contributor

As for the performance, the code in #1257 is definitely faster than v1.6.0
with #1257 read bench tool (badger/cmd/read_bench.go) read 3.3 GB in 1 minute

grep elapsed read-new.txt | tail
Time elapsed: 50s, bytes read: 2.9 GB, speed: 58 MB/sec, entries read: 18703030, speed: 374060/sec
Time elapsed: 51s, bytes read: 2.9 GB, speed: 57 MB/sec, entries read: 19013253, speed: 372808/sec
Time elapsed: 52s, bytes read: 3.0 GB, speed: 57 MB/sec, entries read: 19316184, speed: 371465/sec
Time elapsed: 53s, bytes read: 3.0 GB, speed: 57 MB/sec, entries read: 19615634, speed: 370106/sec
Time elapsed: 54s, bytes read: 3.1 GB, speed: 57 MB/sec, entries read: 19863860, speed: 367849/sec
Time elapsed: 55s, bytes read: 3.1 GB, speed: 57 MB/sec, entries read: 20224828, speed: 367724/sec
Time elapsed: 56s, bytes read: 3.2 GB, speed: 57 MB/sec, entries read: 20595347, speed: 367774/sec
Time elapsed: 57s, bytes read: 3.2 GB, speed: 57 MB/sec, entries read: 20967933, speed: 367858/sec
Time elapsed: 58s, bytes read: 3.3 GB, speed: 57 MB/sec, entries read: 21335377, speed: 367851/sec
Time elapsed: 59s, bytes read: 3.3 GB, speed: 57 MB/sec, entries read: 21672915, speed: 367337/sec

For v1.6.0, it read 2.8 GB

grep elapsed read-1.6.txt | tail
Time elapsed: 50s, bytes read: 2.4 GB, speed: 48 MB/sec, entries read: 14321488, speed: 286429/sec
Time elapsed: 51s, bytes read: 2.4 GB, speed: 48 MB/sec, entries read: 14625924, speed: 286782/sec
Time elapsed: 52s, bytes read: 2.5 GB, speed: 48 MB/sec, entries read: 14912857, speed: 286785/sec
Time elapsed: 53s, bytes read: 2.5 GB, speed: 48 MB/sec, entries read: 15159867, speed: 286035/sec
Time elapsed: 54s, bytes read: 2.6 GB, speed: 48 MB/sec, entries read: 15476701, speed: 286605/sec
Time elapsed: 55s, bytes read: 2.6 GB, speed: 48 MB/sec, entries read: 15778034, speed: 286873/sec
Time elapsed: 56s, bytes read: 2.7 GB, speed: 48 MB/sec, entries read: 16077903, speed: 287105/sec
Time elapsed: 57s, bytes read: 2.7 GB, speed: 48 MB/sec, entries read: 16378622, speed: 287344/sec
Time elapsed: 58s, bytes read: 2.8 GB, speed: 48 MB/sec, entries read: 16681041, speed: 287604/sec
Time elapsed: 59s, bytes read: 2.8 GB, speed: 48 MB/sec, entries read: 16976639, speed: 287739/sec

@dougdonohoe
Copy link
Author

@jarifibrahim - sorry, I deleted the data and started again. It's working so far (aside from the odd numbering of vlog files noted above). What I'm doing in this phase is mostly writing. I'll have better numbers to compare to 1.6 when it finishes (takes 4-5 hours).

@dougdonohoe
Copy link
Author

It happened again, after several hours:

badger 2020/03/12 18:30:53 ERROR: writeRequests: Create value log file. Path=/mnt/disks/sorting/data/badger/KafkaInfo/000018.vlog. Error=open /mnt/disks/sorting/data/badger/KafkaInfo/000018.vlog: file exists
PANIC unable to flush batch: Create value log file. Path=/mnt/disks/sorting/data/badger/KafkaInfo/000018.vlog. Error=open /mnt/disks/sorting/data/badger/KafkaInfo/000018.vlog: file exists
goroutine 306 [running]:

Here are all the files:

-rw------- 1 donohoe donohoe 238M Mar 12 16:51 000000.vlog
-rw------- 1 donohoe donohoe 256M Mar 12 16:58 000001.vlog
-rw------- 1 donohoe donohoe 235M Mar 12 17:03 000002.vlog
-rw------- 1 donohoe donohoe 221M Mar 12 17:08 000003.vlog
-rw------- 1 donohoe donohoe 198M Mar 12 17:11 000004.vlog
-rw------- 1 donohoe donohoe 276M Mar 12 17:16 000005.vlog
-rw------- 1 donohoe donohoe 299M Mar 12 17:23 000006.vlog
-rw------- 1 donohoe donohoe  57M Mar 12 17:44 000007.sst
-rw------- 1 donohoe donohoe 262M Mar 12 17:29 000007.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 17:44 000008.sst
-rw------- 1 donohoe donohoe 268M Mar 12 17:35 000008.vlog
-rw------- 1 donohoe donohoe 267M Mar 12 17:44 000009.vlog
-rw------- 1 donohoe donohoe 287M Mar 12 17:58 000010.vlog
-rw------- 1 donohoe donohoe 263M Mar 12 18:07 000011.vlog
-rw------- 1 donohoe donohoe 208M Mar 12 18:09 000012.vlog
-rw------- 1 donohoe donohoe 207M Mar 12 18:11 000013.vlog
-rw------- 1 donohoe donohoe 401M Mar 12 18:19 000014.vlog
-rw------- 1 donohoe donohoe 248M Mar 12 18:24 000015.vlog
-rw------- 1 donohoe donohoe 242M Mar 12 18:27 000016.vlog
-rw------- 1 donohoe donohoe 277M Mar 12 18:30 000017.vlog
-rw------- 1 donohoe donohoe 181M Mar 12 16:47 000018.vlog
-rw------- 1 donohoe donohoe 177M Mar 12 16:48 000019.vlog
-rw------- 1 donohoe donohoe 192M Mar 12 16:49 000020.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 16:51 000021.vlog
-rw------- 1 donohoe donohoe 229M Mar 12 16:55 000022.vlog
-rw------- 1 donohoe donohoe 207M Mar 12 16:59 000023.vlog
-rw------- 1 donohoe donohoe 222M Mar 12 17:02 000024.vlog
-rw------- 1 donohoe donohoe 211M Mar 12 17:05 000025.vlog
-rw------- 1 donohoe donohoe 224M Mar 12 17:09 000026.vlog
-rw------- 1 donohoe donohoe 229M Mar 12 17:13 000027.vlog
-rw------- 1 donohoe donohoe 232M Mar 12 17:16 000028.vlog
-rw------- 1 donohoe donohoe 230M Mar 12 17:21 000029.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 16:49 000030.sst
-rw------- 1 donohoe donohoe 230M Mar 12 17:25 000030.vlog
-rw------- 1 donohoe donohoe 226M Mar 12 17:28 000031.vlog
-rw------- 1 donohoe donohoe 259M Mar 12 17:32 000032.vlog
-rw------- 1 donohoe donohoe 228M Mar 12 17:36 000033.vlog
-rw------- 1 donohoe donohoe 235M Mar 12 17:41 000034.vlog
-rw------- 1 donohoe donohoe 249M Mar 12 17:46 000035.vlog
-rw------- 1 donohoe donohoe 269M Mar 12 17:52 000036.vlog
-rw------- 1 donohoe donohoe 274M Mar 12 17:57 000037.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 17:25 000038.sst
-rw------- 1 donohoe donohoe 307M Mar 12 18:00 000038.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 18:06 000039.vlog
-rw------- 1 donohoe donohoe 243M Mar 12 18:11 000040.vlog
-rw------- 1 donohoe donohoe 256M Mar 12 18:17 000041.vlog
-rw------- 1 donohoe donohoe 204M Mar 12 18:20 000042.vlog
-rw------- 1 donohoe donohoe 218M Mar 12 18:23 000043.vlog
-rw------- 1 donohoe donohoe 218M Mar 12 18:27 000044.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:31 000045.vlog
-rw------- 1 donohoe donohoe 2.9M Mar 12 18:11 000046.sst
-rw------- 1 donohoe donohoe 219M Mar 12 18:34 000046.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 18:11 000047.sst
-rw------- 1 donohoe donohoe 219M Mar 12 18:37 000047.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 18:11 000048.sst
-rw------- 1 donohoe donohoe 213M Mar 12 18:41 000048.vlog

What would you like to know next?

@jarifibrahim
Copy link
Contributor

@dougdonohoe How are you inserting the data? Stream writer?

@dougdonohoe
Copy link
Author

dougdonohoe commented Mar 12, 2020

I'm using badger.WriteBatch. This smells like a race condition bug. Error occurs when calling Flush().

@dougdonohoe
Copy link
Author

In other news, I ran my test program to create 22,000,000 entries (with random lookups done for each one). In 1.6, it ran in 25m22s. In 2.0.2 with patch, it ran in 20m40s. So that is a positive sign.

OTOH, I was trying to compare the re-world performance of my Kafka extraction program, but it died due to the above filename bug. At the time of the crash, it had been running for 1h45m. At the approximately same time when last run using 1.6, it took 1h30m. So it appears to be 16% slower in 2.x. Note that this code is predominately write-focused.

@dougdonohoe
Copy link
Author

I don't know much about the code, but I noticed this:

1.6 value.go:1013

			newid := atomic.AddUint32(&vlog.maxFid, 1)

2.0 value.go:1405

			newid := vlog.maxFid + 1

Seems suspicious to have removed the atomic add.

@dougdonohoe
Copy link
Author

I spent a little more time trying to understand the Badger code base. It's a lot to come up to speed on. It is obvious the maxFid logic changed substantially from 1.6 to 2.0 so my point above may not be relevant.

However, one thing I don't understand in 2.0 is why the *.vlog file timestamps aren't monotonically increasing.

In 1.6, ls by timestamp shows timestamps increasing along with file id.

$ ls -ltrh *.vlog
-rw-rw-r-- 1 donohoe donohoe 250M Feb 17 15:08 000000.vlog
-rw-rw-r-- 1 donohoe donohoe 268M Feb 17 15:13 000001.vlog
-rw-rw-r-- 1 donohoe donohoe 247M Feb 17 15:17 000002.vlog
-rw-rw-r-- 1 donohoe donohoe 232M Feb 17 15:21 000003.vlog
-rw-rw-r-- 1 donohoe donohoe 209M Feb 17 15:23 000004.vlog
-rw-rw-r-- 1 donohoe donohoe 288M Feb 17 15:28 000005.vlog
-rw-rw-r-- 1 donohoe donohoe 310M Feb 17 15:33 000006.vlog
-rw-rw-r-- 1 donohoe donohoe 273M Feb 17 15:38 000007.vlog
-rw-rw-r-- 1 donohoe donohoe 280M Feb 17 15:43 000008.vlog
-rw-rw-r-- 1 donohoe donohoe 279M Feb 17 15:51 000009.vlog
-rw-rw-r-- 1 donohoe donohoe 299M Feb 17 16:04 000010.vlog
-rw-rw-r-- 1 donohoe donohoe 274M Feb 17 16:12 000011.vlog
-rw-rw-r-- 1 donohoe donohoe 220M Feb 17 16:14 000012.vlog
-rw-rw-r-- 1 donohoe donohoe 218M Feb 17 16:17 000013.vlog
-rw-rw-r-- 1 donohoe donohoe 412M Feb 17 16:24 000014.vlog
-rw-rw-r-- 1 donohoe donohoe 259M Feb 17 16:28 000015.vlog
-rw-rw-r-- 1 donohoe donohoe 253M Feb 17 16:30 000016.vlog
-rw-rw-r-- 1 donohoe donohoe 288M Feb 17 16:33 000017.vlog
-rw-rw-r-- 1 donohoe donohoe 246M Feb 17 16:36 000018.vlog
-rw-rw-r-- 1 donohoe donohoe 188M Feb 17 16:37 000019.vlog
-rw-rw-r-- 1 donohoe donohoe 187M Feb 17 16:38 000020.vlog
-rw-rw-r-- 1 donohoe donohoe 225M Feb 17 16:39 000021.vlog
-rw-rw-r-- 1 donohoe donohoe 245M Feb 17 16:41 000022.vlog
-rw-rw-r-- 1 donohoe donohoe 223M Feb 17 16:44 000023.vlog
-rw-rw-r-- 1 donohoe donohoe 220M Feb 17 16:47 000024.vlog
-rw-rw-r-- 1 donohoe donohoe 228M Feb 17 16:50 000025.vlog
-rw-rw-r-- 1 donohoe donohoe 228M Feb 17 16:53 000026.vlog
-rw-rw-r-- 1 donohoe donohoe 243M Feb 17 16:56 000027.vlog
-rw-rw-r-- 1 donohoe donohoe 243M Feb 17 16:59 000028.vlog
-rw-rw-r-- 1 donohoe donohoe 243M Feb 17 17:02 000029.vlog
-rw-rw-r-- 1 donohoe donohoe 244M Feb 17 17:05 000030.vlog
-rw-rw-r-- 1 donohoe donohoe 238M Feb 17 17:09 000031.vlog
-rw-rw-r-- 1 donohoe donohoe 244M Feb 17 17:12 000032.vlog
-rw-rw-r-- 1 donohoe donohoe 267M Feb 17 17:15 000033.vlog
-rw-rw-r-- 1 donohoe donohoe 240M Feb 17 17:18 000034.vlog
-rw-rw-r-- 1 donohoe donohoe 249M Feb 17 17:23 000035.vlog
-rw-rw-r-- 1 donohoe donohoe 275M Feb 17 17:27 000036.vlog
-rw-rw-r-- 1 donohoe donohoe 282M Feb 17 17:32 000037.vlog
-rw-rw-r-- 1 donohoe donohoe 303M Feb 17 17:35 000038.vlog
-rw-rw-r-- 1 donohoe donohoe 311M Feb 17 17:38 000039.vlog
-rw-rw-r-- 1 donohoe donohoe 276M Feb 17 17:43 000040.vlog
-rw-rw-r-- 1 donohoe donohoe 253M Feb 17 17:47 000041.vlog
-rw-rw-r-- 1 donohoe donohoe 226M Feb 17 17:50 000042.vlog
-rw-rw-r-- 1 donohoe donohoe 235M Feb 17 17:53 000043.vlog
-rw-rw-r-- 1 donohoe donohoe 229M Feb 17 17:56 000044.vlog
-rw-rw-r-- 1 donohoe donohoe 230M Feb 17 17:59 000045.vlog
-rw-rw-r-- 1 donohoe donohoe 231M Feb 17 18:03 000046.vlog
-rw-rw-r-- 1 donohoe donohoe 231M Feb 17 18:06 000047.vlog
-rw-rw-r-- 1 donohoe donohoe 230M Feb 17 18:09 000048.vlog
-rw-rw-r-- 1 donohoe donohoe 274M Feb 17 18:14 000049.vlog
-rw-rw-r-- 1 donohoe donohoe 231M Feb 17 18:17 000050.vlog
-rw-rw-r-- 1 donohoe donohoe 230M Feb 17 18:21 000051.vlog
-rw-rw-r-- 1 donohoe donohoe 231M Feb 17 18:24 000052.vlog
-rw-rw-r-- 1 donohoe donohoe 230M Feb 17 18:27 000053.vlog
-rw-rw-r-- 1 donohoe donohoe 273M Feb 17 18:32 000054.vlog
-rw-rw-r-- 1 donohoe donohoe 290M Feb 17 18:37 000055.vlog
-rw-rw-r-- 1 donohoe donohoe 292M Feb 17 18:44 000056.vlog
-rw-rw-r-- 1 donohoe donohoe 164M Feb 18 03:09 000057.vlog

However in 2.0, it looks like .vlog files are accessed again after they have been written. In my use case, I'm only setting one value for a particular key, so no need to remove/touch previous .vlog files. Yet:

$ ls -ltrh *.vlog
-rw------- 1 donohoe donohoe 181M Mar 12 16:47 000018.vlog
-rw------- 1 donohoe donohoe 177M Mar 12 16:48 000019.vlog
-rw------- 1 donohoe donohoe 192M Mar 12 16:49 000020.vlog
-rw------- 1 donohoe donohoe 238M Mar 12 16:51 000000.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 16:51 000021.vlog
-rw------- 1 donohoe donohoe 229M Mar 12 16:55 000022.vlog
-rw------- 1 donohoe donohoe 256M Mar 12 16:58 000001.vlog
-rw------- 1 donohoe donohoe 207M Mar 12 16:59 000023.vlog
-rw------- 1 donohoe donohoe 222M Mar 12 17:02 000024.vlog
-rw------- 1 donohoe donohoe 235M Mar 12 17:03 000002.vlog
-rw------- 1 donohoe donohoe 211M Mar 12 17:05 000025.vlog
-rw------- 1 donohoe donohoe 221M Mar 12 17:08 000003.vlog
-rw------- 1 donohoe donohoe 224M Mar 12 17:09 000026.vlog
-rw------- 1 donohoe donohoe 198M Mar 12 17:11 000004.vlog
-rw------- 1 donohoe donohoe 229M Mar 12 17:13 000027.vlog
-rw------- 1 donohoe donohoe 276M Mar 12 17:16 000005.vlog
-rw------- 1 donohoe donohoe 232M Mar 12 17:16 000028.vlog
-rw------- 1 donohoe donohoe 230M Mar 12 17:21 000029.vlog
-rw------- 1 donohoe donohoe 299M Mar 12 17:23 000006.vlog
-rw------- 1 donohoe donohoe 230M Mar 12 17:25 000030.vlog
-rw------- 1 donohoe donohoe 226M Mar 12 17:28 000031.vlog
-rw------- 1 donohoe donohoe 262M Mar 12 17:29 000007.vlog
-rw------- 1 donohoe donohoe 259M Mar 12 17:32 000032.vlog
-rw------- 1 donohoe donohoe 268M Mar 12 17:35 000008.vlog
-rw------- 1 donohoe donohoe 228M Mar 12 17:36 000033.vlog
-rw------- 1 donohoe donohoe 235M Mar 12 17:41 000034.vlog
-rw------- 1 donohoe donohoe 267M Mar 12 17:44 000009.vlog
-rw------- 1 donohoe donohoe 249M Mar 12 17:46 000035.vlog
-rw------- 1 donohoe donohoe 269M Mar 12 17:52 000036.vlog
-rw------- 1 donohoe donohoe 274M Mar 12 17:57 000037.vlog
-rw------- 1 donohoe donohoe 287M Mar 12 17:58 000010.vlog
-rw------- 1 donohoe donohoe 307M Mar 12 18:00 000038.vlog
-rw------- 1 donohoe donohoe 279M Mar 12 18:06 000039.vlog
-rw------- 1 donohoe donohoe 263M Mar 12 18:07 000011.vlog
-rw------- 1 donohoe donohoe 208M Mar 12 18:09 000012.vlog
-rw------- 1 donohoe donohoe 243M Mar 12 18:11 000040.vlog
-rw------- 1 donohoe donohoe 207M Mar 12 18:11 000013.vlog
-rw------- 1 donohoe donohoe 256M Mar 12 18:17 000041.vlog
-rw------- 1 donohoe donohoe 401M Mar 12 18:19 000014.vlog
-rw------- 1 donohoe donohoe 204M Mar 12 18:20 000042.vlog
-rw------- 1 donohoe donohoe 218M Mar 12 18:23 000043.vlog
-rw------- 1 donohoe donohoe 248M Mar 12 18:24 000015.vlog
-rw------- 1 donohoe donohoe 242M Mar 12 18:27 000016.vlog
-rw------- 1 donohoe donohoe 218M Mar 12 18:27 000044.vlog
-rw------- 1 donohoe donohoe 277M Mar 12 18:30 000017.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:31 000045.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:34 000046.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:37 000047.vlog
-rw------- 1 donohoe donohoe 226M Mar 12 18:41 000048.vlog
-rw------- 1 donohoe donohoe 256M Mar 12 18:46 000049.vlog
-rw------- 1 donohoe donohoe 218M Mar 12 18:49 000050.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:53 000051.vlog
-rw------- 1 donohoe donohoe 219M Mar 12 18:56 000052.vlog
-rw------- 1 donohoe donohoe 237M Mar 12 19:00 000053.vlog
-rw------- 1 donohoe donohoe 273M Mar 12 19:05 000054.vlog
-rw------- 1 donohoe donohoe 280M Mar 12 19:11 000055.vlog
-rw------- 1 donohoe donohoe 275M Mar 12 19:18 000056.vlog
-rw------- 1 donohoe donohoe 110M Mar 12 19:21 000057.vlog

Also notice file 0000018.vlog is the least recently modified ... and it is the one where the bug happened. This seems like a clue.

I'd like to dig in more, but I have to do some day job stuff :-)

@dougdonohoe
Copy link
Author

Note that both occurrence of this bug happened well after startup. To summarize previous info:

  • Using WithSyncWrites(false)
  • Using badger.WriteBatch with a batch size of 1000

I've been looking through the code a bit more, and it must be failing in the toDisk() anonymous function. In write:

			newid := vlog.maxFid + 1
			y.AssertTruef(newid > 0, "newid has overflown uint32: %v", newid)
			newlf, err := vlog.createVlogFile(newid)

AFAIK, aside from open, the only place that vlog.maxFid is changed is when vlog.createVlogFile is called. The only place that is called is in write. So the only way you'd get the error I'm seeing if if a multiple toDisk() are running concurrently, right?

@dougdonohoe
Copy link
Author

FYI: I've moved on to other things and have put on hold our upgrade to 2.0. This intermittent issue where the vlog files already exist is a huge blocker for us. I've seen it happen twice, but it takes a non-deterministic amount of time for it to occur. In 1.6, I never had any issues running long running imports before (some over 24 hours long).

@jarifibrahim
Copy link
Contributor

Hey @dougdonohoe, do you some specific time/pattern between the missing vlog file issues? I tried doing the same thing using the badger benchmark write -m 1000 command which inserts 1000 million keys in badger and didn't see any missing file/file already exists issue.

If there's any script/test program that you can share which can help me reproduce the issue it would be great. I haven't been able to reproduce the failure and that's why we haven't fixed it yet.

As for the vlog.maxFid code here

badger/value.go

Line 1405 in 91c31eb

newid := vlog.maxFid + 1

It is a sequential call and that's why it should be okay to not have any locks there. Earlier we were using atomics but they weren't really needed.

So the only way you'd get the error I'm seeing if if a multiple toDisk() are running concurrently, right?

Writes to the vlog file should always be serial. We cannot write concurrently to the vlog file which is the write-ahead log file.

@minhaj-shakeel
Copy link

Github issues have been deprecated.
This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

drawing

@dougdonohoe
Copy link
Author

Issue 1187 appears to be what changes this code. Making a note in case I have to revisit this.

@jarifibrahim
Copy link
Contributor

The file already exists issue mentioned in #1254 (comment) was fixed by #1434 (it's not yet released).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Performance related issues. kind/enhancement Something could be better. priority/P0 Critical issue that requires immediate attention. status/accepted We accept to investigate or work on it.
Development

No branches or pull requests

5 participants