Fix block cache ID uniqueness on Windows #58

ajkr · 2019-09-23T19:33:37Z

Since we do not evict a file's blocks from block cache before that file
is deleted, we require that a file's cache ID prefix be both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.
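To make the failure mode concrete, here is a minimal standalone sketch, assuming a full block cache key is the file's cache ID prefix followed by a varint-encoded block offset (modeled loosely on RocksDB's scheme). `AppendVarint64`, `MakeCacheKey`, `BlockCache`, and `Lookup` are hypothetical names for illustration, not the RocksDB API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Append `v` to `dst` as a little-endian base-128 varint (a standalone
// reimplementation of the usual varint scheme, not RocksDB's own function).
void AppendVarint64(std::string* dst, uint64_t v) {
  assert(dst != nullptr);
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Hypothetical full cache key: the file's cache ID prefix, then the offset.
std::string MakeCacheKey(const std::string& file_prefix, uint64_t block_offset) {
  assert(!file_prefix.empty());
  std::string key = file_prefix;
  AppendVarint64(&key, block_offset);
  return key;
}

// Toy block cache shared by all files: full cache key -> block contents.
using BlockCache = std::unordered_map<std::string, std::string>;

// Returns the cached block for (prefix, offset), or an empty string on a miss.
std::string Lookup(const BlockCache& cache, const std::string& prefix,
                   uint64_t block_offset) {
  auto it = cache.find(MakeCacheKey(prefix, block_offset));
  return it == cache.end() ? std::string() : it->second;
}
```

Because nothing evicts a deleted file's entries, a new file that is handed the same prefix gets a hit on `Lookup(cache, reused_prefix, offset)` for any offset the old file had cached, and that hit returns the old file's block.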

We noticed this when running on FAT32, where compaction was writing
out-of-order keys because it read obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.

Test Plan: we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change.


ajkr · 2019-09-23T19:34:02Z

Upstream PR is facebook#5844. No conflicts backporting :).

petermattis · 2019-09-23T20:28:25Z

port/win/io_win.cc

+  // Returning 0 is safe as it causes the table reader to generate a unique ID.
+  // This is suboptimal for performance as it prevents multiple table readers
+  // for the same file from sharing cached blocks, but at least it's safe.
+  return 0;
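The comment above says that returning 0 makes the table reader generate its own unique ID. A minimal sketch of that fallback, assuming the reader asks the file for an ID and, on 0, draws a fresh one from a process-wide counter (a stand-in for whatever ID source the cache provides). `IdSource`, `WindowsFileSketch`, and `CachePrefixFor` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical ID source: a process-wide counter, so no two calls in one
// process ever return the same value.
class IdSource {
 public:
  uint64_t NewId() { return next_.fetch_add(1, std::memory_order_relaxed) + 1; }

 private:
  std::atomic<uint64_t> next_{0};
};

// Hypothetical file handle whose unique-ID query always declines, like the
// patched Windows code path that returns 0.
struct WindowsFileSketch {
  size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const { return 0; }
};

// Builds a cache key prefix for one table reader: use the file's own ID when
// the file provides one, otherwise fall back to a fresh ID from `ids`.
std::string CachePrefixFor(const WindowsFileSketch& file, IdSource* ids) {
  assert(ids != nullptr);
  char buf[32];
  size_t n = file.GetUniqueId(buf, sizeof(buf));
  if (n > 0) {
    return std::string(buf, n);
  }
  // Varint-encode the fresh ID so it can serve as a key prefix.
  uint64_t id = ids->NewId();
  std::string prefix;
  while (id >= 0x80) {
    prefix.push_back(static_cast<char>((id & 0x7f) | 0x80));
    id >>= 7;
  }
  prefix.push_back(static_cast<char>(id));
  return prefix;
}
```

Every reader gets a prefix that was never used before in the process, which is safe. The cost is that two readers of the same file never share cached blocks.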


Isn't the unique-ID problem independent of MinGW? I was imagining we'd change GetUniqueIdFromFile to always return 0 on Windows, regardless of how the binary was built.

I don't know. Non-MinGW builds on a modern platform should have _WIN32_WINNT != _WIN32_WINNT_VISTA, which means they use the code in the #else branch below. I read a bit about the FILE_ID_INFO::FileId used there and could not determine whether it is reusable. At least there was no documentation explicitly stating that these IDs can be reused, as there was for BY_HANDLE_FILE_INFORMATION::nFileIndex{High,Low}.

MinGW always uses this branch of the conditional compilation due to https://github.com/facebook/rocksdb/blob/2367656b6c19048d76037d24025ef0caab136866/CMakeLists.txt#L414-L416.

https://stackoverflow.com/questions/1866454/unique-file-identifier-in-windows

Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.

In the FAT file system, the file ID is generated from the first cluster of the containing directory and the byte offset within the directory of the entry for the file. Some defragmentation products change this byte offset. (Windows in-box defragmentation does not.) Thus, a FAT file ID can change over time. Renaming a file in the FAT file system can also change the file ID, but only if the new file name is longer than the old one.

This all suggests to me that the problem is not MinGW specific.
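The FAT rule quoted above explains how reuse happens. If a file ID is just (directory's first cluster, byte offset of the directory entry), then deleting a file and creating a new one in the freed entry gives the new file the old ID. A toy model follows. It assumes fixed 32-byte directory entries and lowest-free-slot allocation, and `ToyFatDirectory` is hypothetical. Real FAT allocation policy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model of the quoted FAT file ID: the directory's first cluster combined
// with the byte offset of the file's directory entry.
class ToyFatDirectory {
 public:
  ToyFatDirectory(uint32_t first_cluster, size_t num_slots)
      : first_cluster_(first_cluster), names_(num_slots) {}

  // Creates `name` in the lowest free slot; returns its file ID, or
  // std::nullopt if the directory is full.
  std::optional<uint64_t> Create(const std::string& name) {
    assert(!name.empty() && "an empty name marks a free slot");
    for (size_t slot = 0; slot < names_.size(); ++slot) {
      if (names_[slot].empty()) {
        names_[slot] = name;
        return FileId(slot);
      }
    }
    return std::nullopt;
  }

  // Deletes `name`, freeing its directory entry. Returns false if absent.
  bool Remove(const std::string& name) {
    assert(!name.empty());
    for (std::string& entry : names_) {
      if (entry == name) {
        entry.clear();
        return true;
      }
    }
    return false;
  }

 private:
  static constexpr uint64_t kEntrySize = 32;  // assumed entry size in bytes

  // High 32 bits: directory's first cluster; low 32 bits: entry byte offset.
  uint64_t FileId(size_t slot) const {
    return (uint64_t{first_cluster_} << 32) | (slot * kEntrySize);
  }

  uint32_t first_cluster_;
  std::vector<std::string> names_;
};
```

IDs are unique among files that exist at the same time, but nothing stops a deleted file's ID from being reused. That is exactly the gap between "unique" and "non-reusable" that this PR is about.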

I don't know which file ID they're referring to. The one here is eight bytes, we know it has a problem, and it's only used when the compiler is MinGW (or if somebody somehow compiles for Vista, but I doubt that'll happen). The one below is sixteen bytes, we don't know whether it's problematic, and it'll be used on modern non-MinGW builds.
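Whichever width is used, the cache prefix is a pure function of the file system's ID, so it inherits any reuse the file system allows. The sketch below uses portable stand-in structs, not the real Windows types. It assumes the prefix is built by varint-encoding the volume serial number followed by the file ID halves, which is how the surrounding code appears to work. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Stand-in modeled on BY_HANDLE_FILE_INFORMATION: 8-byte file index.
struct LegacyFileInfo {
  uint32_t volume_serial;
  uint32_t file_index_high;
  uint32_t file_index_low;
};

// Stand-in modeled on FILE_ID_INFO: 16-byte file identifier.
struct ModernFileIdInfo {
  uint64_t volume_serial;
  std::array<uint8_t, 16> file_id;
};

// Append `v` to `dst` as a little-endian base-128 varint.
void AppendVarint64(std::string* dst, uint64_t v) {
  assert(dst != nullptr);
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

std::string PrefixFromLegacy(const LegacyFileInfo& info) {
  std::string p;
  AppendVarint64(&p, info.volume_serial);
  AppendVarint64(&p, info.file_index_high);
  AppendVarint64(&p, info.file_index_low);
  return p;
}

std::string PrefixFromModern(const ModernFileIdInfo& info) {
  static_assert(sizeof(ModernFileIdInfo::file_id) == 2 * sizeof(uint64_t),
                "file ID must split into two 64-bit halves");
  std::string p;
  AppendVarint64(&p, info.volume_serial);
  uint64_t halves[2];
  std::memcpy(halves, info.file_id.data(), sizeof(halves));  // avoids aliasing UB
  AppendVarint64(&p, halves[0]);
  AppendVarint64(&p, halves[1]);
  return p;
}
```

Widening the ID from 8 to 16 bytes reduces accidental overlap between different live files. It does nothing if the file system hands a deleted file's ID to a new one, and that open question is why this PR does not simply trust the 16-byte path.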

I'll find somebody from Microsoft to help. It's hard to believe it's broken beyond the cases of building for Vista or using an obsolete file system, but who knows.

It's still unclear, so I'll return 0 in all cases on Windows to get this landed in our fork (note this makes no difference compared to the original PR for Cockroach's use case, and I do not think these changes should be upstreamed).


petermattis

LGTM

I'm following along with the upstream conversation, but I think this is the right move for our fork.

Picks up cockroachdb/rocksdb#58.

We found a corruption caused by multiple FAT32 files being assigned the same block cache key prefix. We don't know the extent to which this problem affects other filesystems or other Windows file ID generation mechanisms. We decided to turn off the reliance on filesystem for generating cache keys on Windows. Instead we use randomization per table reader. This would cause a performance penalty for use cases that open multiple table readers per file, but I believe cockroach is not such a use case.

Fixes cockroachdb#40918, fixes cockroachdb#40950.

Release justification: Prevents corruption on some Windows filesystems

Release note: None
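The performance penalty mentioned above comes from each reader drawing its own prefix, so a block cached by one reader is invisible to another reader of the same file. A minimal sketch, assuming a random 64-bit prefix per reader (as the commit message describes) and a toy string-keyed cache. `RandomPrefixReader` and `BlockCache` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <string>
#include <unordered_map>

// Toy block cache shared by all table readers: cache key -> block contents.
using BlockCache = std::unordered_map<std::string, std::string>;

// Hypothetical table reader that draws a random 64-bit cache key prefix when
// it is opened, instead of deriving one from a file system file ID.
class RandomPrefixReader {
 public:
  explicit RandomPrefixReader(std::mt19937_64* rng) : prefix_((*rng)()) {}

  // Reads the block at `offset` through `cache`. Returns true on a cache hit;
  // on a miss, "loads" the block and inserts it under this reader's prefix.
  bool ReadBlock(uint64_t offset, BlockCache* cache) const {
    assert(cache != nullptr);
    const std::string key =
        std::to_string(prefix_) + ":" + std::to_string(offset);
    if (cache->count(key) > 0) {
      return true;
    }
    (*cache)[key] = "block@" + std::to_string(offset);
    return false;
  }

 private:
  uint64_t prefix_;
};
```

A single long-lived reader per file still gets full cache reuse. Only a second reader opened on the same file pays for a cold cache, which matches the reasoning that Cockroach's usage is not affected much.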

41018: c-deps: bump rocksdb for unique cache IDs on Windows r=ajkr a=ajkr

Picks up cockroachdb/rocksdb#58. We found a corruption caused by multiple FAT32 files assigned the same block cache key prefix. We don't know the extent to which this problem affects other filesystems or other Windows file ID generation mechanisms. We decided to turn off the reliance on filesystem for generating cache keys on Windows. Instead we use randomization per table reader. This would cause a performance penalty for use cases that open multiple table readers per file, but I believe cockroach is not such a use case.

Fixes #40918, fixes #40950.

Release justification: Prevents corruption on some Windows filesystems

Release note: None

41020: util/log: fix GC of secondary loggers r=petermattis a=knz

Fixes #40974. This is a subset of #40993 suitable for 19.2 and backport to 19.1.

Release justification: bug fix

Release note (bug fix): CockroachDB will now properly remove excess secondary log files (SQL audit logging, statement execution logging, and RocksDB events).

Co-authored-by: Andrew Kryczka
Co-authored-by: Raphael 'kena' Poss

petermattis reviewed Sep 23, 2019


ajkr force-pushed the 6.2.1-fix-mingw-cache-id branch from 3742f32 to 0047833 September 24, 2019 01:21

ajkr changed the title from "Fix block cache ID uniqueness for MinGW builds" to "Fix block cache ID uniqueness on Windows" Sep 24, 2019

petermattis approved these changes Sep 24, 2019


ajkr merged commit 217d7a1 into cockroachdb:crl-release-6.2.1 Sep 24, 2019

ajkr mentioned this pull request Sep 24, 2019

c-deps: bump rocksdb for unique cache IDs on Windows cockroachdb/cockroach#41018

Merged

ajkr mentioned this pull request Sep 27, 2019

Fix block cache ID uniqueness on Windows #60

Merged
