Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix block cache ID uniqueness on Windows #58

Merged
merged 1 commit into from
Sep 24, 2019

Conversation

ajkr
Copy link

@ajkr ajkr commented Sep 23, 2019

Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.

We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.

Test Plan: we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change


This change is Reviewable

@ajkr
Copy link
Author

ajkr commented Sep 23, 2019

Upstream PR is facebook#5844. No conflicts backporting :).

// Returning 0 is safe as it causes the table reader to generate a unique ID.
// This is suboptimal for performance as it prevents multiple table readers
// for the same file from sharing cached blocks, but at least it's safe.
return 0;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the unique-ID problem independent of MINGW. I was imagining we'd change GetUniqueIdFromFile to always return 0 on Windows, regardless of how the binary was built.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. non-MinGW on a modern platform should have _WIN32_WINNT != _WIN32_WINNT_VISTA which means it uses the code in the #else below. I read a bit about the FILE_ID_INFO::FileId used there and could not determine whether it is reusable or not. At least there wasn't documentation explicitly stating IDs can be reused like there was for BY_HANDLE_FILE_INFORMATION::nFileIndex{High,Low}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/questions/1866454/unique-file-identifier-in-windows

Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.

In the FAT file system, the file ID is generated from the first cluster of the containing directory and the byte offset within the directory of the entry for the file. Some defragmentation products change this byte offset. (Windows in-box defragmentation does not.) Thus, a FAT file ID can change over time. Renaming a file in the FAT file system can also change the file ID, but only if the new file name is longer than the old one.

This all suggests to me that the problem is not MinGW specific.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know which file ID they're referring to. The one here is eight bytes, we know it has a problem, and it's only used when compiler is MinGW (or if somehow somebody compiles on Vista but I doubt that'll happen). The one below is sixteen bytes, we don't know if it's problematic, and it'll be used on modern non-MinGW builds.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll find somebody from Microsoft to help. It's hard to believe it's broken beyond the case of build for Vista || obsolete file system, but who knows.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still unclear so I'll return 0 in all cases Windows to get this landed to our fork (note this makes no difference compared to the original PR for cockroach's use case, and I do not think these changes should be upstreamed).

Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.

We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.

Test Plan: we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change
@ajkr ajkr force-pushed the 6.2.1-fix-mingw-cache-id branch from 3742f32 to 0047833 Compare September 24, 2019 01:21
@ajkr ajkr changed the title Fix block cache ID uniqueness for MinGW builds Fix block cache ID uniqueness on Windows Sep 24, 2019
Copy link

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I'm following along the upstream conversation, but I think this is the right move for our fork.

@ajkr ajkr merged commit 217d7a1 into cockroachdb:crl-release-6.2.1 Sep 24, 2019
ajkr added a commit to ajkr/cockroach that referenced this pull request Sep 24, 2019
Picks up cockroachdb/rocksdb#58.

We found a corruption caused by multiple FAT32 files assigned the same
block cache key prefix. We don't know the extent to which this problem
affects other filesystems or other Windows file ID generation mechanisms.
We decided to turn off the reliance on filesystem for generating cache
keys on Windows. Instead we use randomization per table reader. This
would cause a performance penalty for use cases that open multiple table
readers per file, but I believe cockroach is not such a use case.

Fixes cockroachdb#40918, fixes cockroachdb#40950.

Release justification: Prevents corruption on some Windows filesystems

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Sep 24, 2019
41018: c-deps: bump rocksdb for unique cache IDs on Windows r=ajkr a=ajkr

Picks up cockroachdb/rocksdb#58.

We found a corruption caused by multiple FAT32 files assigned the same
block cache key prefix. We don't know the extent to which this problem
affects other filesystems or other Windows file ID generation mechanisms.
We decided to turn off the reliance on filesystem for generating cache
keys on Windows. Instead we use randomization per table reader. This
would cause a performance penalty for use cases that open multiple table
readers per file, but I believe cockroach is not such a use case.

Fixes #40918, fixes #40950.

Release justification: Prevents corruption on some Windows filesystems

Release note: None

41020: util/log: fix GC of secondary loggers r=petermattis a=knz

Fixes #40974.

This is a subset of #40993 suitable for 19.2 and backport to 19.1.

Release justification: bug fix

Release note (bug fix): CockroachDB will now properly remove excess
secondary log files (SQL audit logging, statement execution logging,
and RocksDB events).

Co-authored-by: Andrew Kryczka <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants