
Corruption: Out-of-order key insertion into table #41242

Closed
neeral opened this issue Oct 1, 2019 · 9 comments

Comments

@neeral
Contributor

neeral commented Oct 1, 2019

Describe the problem
Combination of fatal error messages and warnings.

  1. Fatal
E190930 07:33:54.478068 1 cli/error.go:230  cockroach server exited with error: failed to start store: checking for leaked raft entries for <no-attributes>=/var/lib/cockroachdb: Corruption: Out-of-order key insertion into table
Error: cockroach server exited with error: failed to start store: checking for leaked raft entries for <no-attributes>=/var/lib/cockroachdb: Corruption: Out-of-order key insertion into table
  2. Warnings
W191001 00:07:17.983745 783 server/server.go:1640  [n8,hb] writing last up timestamp: Corruption: Out-of-order key insertion into table

At first this seemed related to #37427, but that's specific to Windows.

To Reproduce
Observed on a long-running test cluster after upgrading cockroach on that cluster to a version that contained some new commits on top of 2.1.7.

The new commits included

Environment:

  • CockroachDB version 2.1.7
  • Server OS: Linux/Ubuntu
  • Filesystem: ext4
@petermattis
Collaborator

Running cockroach debug pebble sstable layout (that tool is only present on master builds of cockroach) on the store dump provided by @neeral identified 144941336.sst as an sstable containing out-of-order keys. Sstables should always contain keys in sorted order, but this invariant was not being checked until cockroachdb/rocksdb#48. Violating it can cause a wide variety of badness. The nature of the invariant makes it almost impossible to violate through normal usage of RocksDB: a violation requires a bug in RocksDB itself.
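
For illustration only, here is a minimal Go sketch of the invariant that cockroachdb/rocksdb#48 enforces while an sstable is being written. It is not the actual RocksDB check; bytes.Compare stands in for the real key comparator.

package main

import (
	"bytes"
	"fmt"
)

// checkOrdered verifies the sstable-building invariant: every key added must
// compare greater than or equal to the previously added key.
func checkOrdered(keys [][]byte) error {
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i-1], keys[i]) > 0 {
			return fmt.Errorf("out-of-order key insertion: %q after %q (record %d)",
				keys[i], keys[i-1], i)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkOrdered([][]byte{[]byte("a"), []byte("b"), []byte("c")})) // <nil>
	fmt.Println(checkOrdered([][]byte{[]byte("a"), []byte("c"), []byte("b")})) // out-of-order key insertion
}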

Logs provided by @neeral show that the node containing this corrupted sstable was upgraded at:

I190926 11:40:51.948579 34978 storage/engine/rocksdb.go:714  closing rocksdb instance at "/var/lib/cockroachdb"
...
I190926 11:41:11.504676 1 cli/start.go:1161  CockroachDB CCL v2.1.7-80-g0acaa1f (x86_64-unknown-linux-gnu, built 2019/09/20 16:08:16, go1.10.7)

So the server was shut down at 11:40:51.948579 and restarted at 11:41:11.504676. The change time on 144941336.sst shows it was last written to at 11:40:51.942139226, a few milliseconds before the server shut down. The timing is very suspicious, but I don't have a theory as to how that could cause the corruption that is seen. This is good evidence that the upgrade did not cause the corruption, but merely revealed its presence. Note that the corruption is very targeted, not random, which makes it very unlikely to be some sort of memory-stomping bug.

The corruption itself isn't merely out-of-order keys in 144941336.sst, but a block of duplicated keys. The records between 7284-8655 in that sstable are duplicated at 8656-10028. The size of the duplicated data is ~15KB, which is very close to the size of the other data blocks in 144941336.sst and can be explained by the compression ratio seen on this data in conjunction with the target block size of 32KB.
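
For anyone repeating this analysis on another dump, a hypothetical sketch of the comparison involved (the actual inspection used cockroach debug pebble sstable layout; the record values below are made up):

package main

import "fmt"

// rangesEqual reports whether records[a:b] and records[c:d] hold identical
// entries, the kind of check used to confirm that one span of records is a
// verbatim copy of an earlier span (indexes here are 0-based and half-open).
func rangesEqual(records []string, a, b, c, d int) bool {
	if b-a != d-c {
		return false
	}
	for i := 0; i < b-a; i++ {
		if records[a+i] != records[c+i] {
			return false
		}
	}
	return true
}

func main() {
	recs := []string{"k1=v1", "k2=v2", "k3=v3", "k2=v2", "k3=v3"}
	// Records 1-2 reappear verbatim at 3-4, analogous to 7284-8655
	// reappearing at 8656-10028 in 144941336.sst.
	fmt.Println(rangesEqual(recs, 1, 3, 3, 5)) // true
}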

#37427 fixed a bug with the same out-of-order key symptoms, but the root cause was Windows specific: the cache key for sstable blocks was allowing a block for an old (now deleted) sstable to be retrieved as the block for a new sstable. The nature of this bug made it likely for the corruption to happen for block 0. We no longer have the input sstables for the compaction that generated 144941336.sst. It is possible there is another bug related to the block cache, though that is purely speculation at this point. There are also a whole variety of other possible ways out-of-order keys can be generated. We need to figure out a way to narrow down the suspects.
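
To make the #37427 failure mode concrete, here is a simplified sketch. The real block cache key is built from the file's unique ID (which on Linux is derived, among other things, from the inode number and the i_generation value discussed below) plus a block offset; the struct here is an illustrative approximation, not the actual encoding.

package main

import "fmt"

// blockCacheKey is a simplified stand-in for the identifier used to look up
// sstable blocks in the shared block cache.
type blockCacheKey struct {
	inode      uint64
	generation uint32
	offset     uint64
}

func main() {
	cache := map[blockCacheKey][]byte{}

	// Old sstable: inode 267154, generation 0x1234, block at offset 0.
	oldKey := blockCacheKey{inode: 267154, generation: 0x1234, offset: 0}
	cache[oldKey] = []byte("block 0 of OLD (since-deleted) sstable")

	// The old file is deleted and a new sstable happens to reuse the same
	// inode. If the filesystem also hands out a colliding generation number,
	// the new file's cache key is identical to the old one...
	newKey := blockCacheKey{inode: 267154, generation: 0x1234, offset: 0}

	// ...and a read of the new sstable's block 0 is served stale data.
	fmt.Printf("%s\n", cache[newKey])
}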

@neeral
Contributor Author

neeral commented Oct 3, 2019

What are the next steps here @petermattis @ajkr ?

@petermattis
Collaborator

@neeral We're chasing down a lead that perhaps #37427 could be a problem here. What version of Linux are you running? The behavior of i_generation seems to differ depending on the version of Linux: sometimes it increments, sometimes it is a random 32-bit number. The latter is problematic, as collisions can occur much more quickly than with the incrementing style.
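
A rough back-of-the-envelope sketch of why random generation numbers are worse: if each reuse of an inode draws an independent random 32-bit generation, a repeat becomes a birthday problem, whereas an incrementing counter would need 2^32 reuses to wrap. The numbers below are illustrative only.

package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the birthday-problem probability that at least
// two of n independently chosen random 32-bit generation numbers collide:
// p ≈ 1 - exp(-n*(n-1) / (2 * 2^32)).
func collisionProb(n float64) float64 {
	return 1 - math.Exp(-n*(n-1)/(2*math.Pow(2, 32)))
}

func main() {
	for _, n := range []float64{1e3, 1e4, 1e5, 1e6} {
		fmt.Printf("n=%.0e  p(collision) ≈ %.6f\n", n, collisionProb(n))
	}
}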

@petermattis
Collaborator

Even knowing the version of Linux might not be sufficient. Can you run the program below on one of the machines in your cluster and provide the output:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <name>", os.Args[0])
	}
	name := os.Args[1]

	// Repeatedly create and remove the same file so the inode is likely to be
	// reused, printing the inode number and its generation each time.
	for i := 0; i < 10; i++ {
		f, err := os.Create(name)
		if err != nil {
			log.Fatal(err)
		}

		info, err := f.Stat()
		if err != nil {
			log.Fatal(err)
		}

		stat := info.Sys().(*syscall.Stat_t)

		// FS_IOC_GETVERSION returns the inode's i_generation value, whose
		// behavior (incrementing vs. random) is what we're trying to determine.
		const FS_IOC_GETVERSION = 0x80087601
		version, err := unix.IoctlGetInt(int(f.Fd()), FS_IOC_GETVERSION)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%d %d\n", stat.Ino, version)

		if err := f.Close(); err != nil {
			log.Fatal(err)
		}
		if err := os.Remove(name); err != nil {
			log.Fatal(err)
		}
	}
}

@neeral
Contributor Author

neeral commented Oct 3, 2019

Linux version: 4.15.0-58 with some patches

Output of the program:

$ ./cockroach41242 foobar
Started cockroach41242
267154 3938375710
267154 2125668378
267154 3920946183
267154 3526488056
267154 2510883779
267154 205604375
267154 393398785
267154 2656558149
267154 3520081686
267154 2378905790

@neeral
Contributor Author

neeral commented Oct 16, 2019

@petermattis any update here?

Is this related to #41394 ?

@petermattis
Collaborator

@neeral Apologies for not closing the loop. Yes, #41394 is the fix for this bug.

@itsbilal Is there anything left to do here?

@itsbilal
Contributor

Nope, this is it - that's the fix for the bug on 2.1, and it will go out in the next 2.1 point release. For 2.0, the fix is #41541. Thanks and sorry for not closing the loop!

@neeral
Contributor Author

neeral commented Oct 19, 2019

Thanks Peter and Bilal.

@neeral neeral closed this as completed Oct 19, 2019