-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix block cache ID uniqueness on Windows #58
Fix block cache ID uniqueness on Windows #58
Conversation
Upstream PR is facebook#5844. No conflicts backporting :). |
// Returning 0 is safe as it causes the table reader to generate a unique ID. | ||
// This is suboptimal for performance as it prevents multiple table readers | ||
// for the same file from sharing cached blocks, but at least it's safe. | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the unique-ID problem independent of MINGW. I was imagining we'd change GetUniqueIdFromFile
to always return 0 on Windows, regardless of how the binary was built.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know. non-MinGW on a modern platform should have _WIN32_WINNT != _WIN32_WINNT_VISTA
which means it uses the code in the #else
below. I read a bit about the FILE_ID_INFO::FileId
used there and could not determine whether it is reusable or not. At least there wasn't documentation explicitly stating IDs can be reused like there was for BY_HANDLE_FILE_INFORMATION::nFileIndex{High,Low}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MinGW always uses this branch of the conditional compilation due to https://github.com/facebook/rocksdb/blob/2367656b6c19048d76037d24025ef0caab136866/CMakeLists.txt#L414-L416.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://stackoverflow.com/questions/1866454/unique-file-identifier-in-windows
Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
In the FAT file system, the file ID is generated from the first cluster of the containing directory and the byte offset within the directory of the entry for the file. Some defragmentation products change this byte offset. (Windows in-box defragmentation does not.) Thus, a FAT file ID can change over time. Renaming a file in the FAT file system can also change the file ID, but only if the new file name is longer than the old one.
This all suggests to me that the problem is not MinGW specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know which file ID they're referring to. The one here is eight bytes, we know it has a problem, and it's only used when compiler is MinGW (or if somehow somebody compiles on Vista but I doubt that'll happen). The one below is sixteen bytes, we don't know if it's problematic, and it'll be used on modern non-MinGW builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll find somebody from Microsoft to help. It's hard to believe it's broken beyond the case of build for Vista || obsolete file system
, but who knows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still unclear so I'll return 0
in all cases Windows to get this landed to our fork (note this makes no difference compared to the original PR for cockroach's use case, and I do not think these changes should be upstreamed).
Since we do not evict a file's blocks from block cache before that file is deleted, we require a file's cache ID prefix is both unique and non-reusable. However, the Windows functionality we were relying on only guaranteed uniqueness. That meant a newly created file could be assigned the same cache ID prefix as a deleted file. If the newly created file had block offsets matching the deleted file, full cache keys could be exactly the same, resulting in obsolete data blocks returned from cache when trying to read from the new file. We noticed this when running on FAT32 where compaction was writing out of order keys due to reading obsolete blocks from its input files. The functionality is documented as behaving the same on NTFS, although I wasn't able to repro it there. Test Plan: we had a reliable repro of out-of-order keys on FAT32 that was fixed by this change
3742f32
to
0047833
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm following along the upstream conversation, but I think this is the right move for our fork.
Picks up cockroachdb/rocksdb#58. We found a corruption caused by multiple FAT32 files assigned the same block cache key prefix. We don't know the extent to which this problem affects other filesystems or other Windows file ID generation mechanisms. We decided to turn off the reliance on filesystem for generating cache keys on Windows. Instead we use randomization per table reader. This would cause a performance penalty for use cases that open multiple table readers per file, but I believe cockroach is not such a use case. Fixes cockroachdb#40918, fixes cockroachdb#40950. Release justification: Prevents corruption on some Windows filesystems Release note: None
41018: c-deps: bump rocksdb for unique cache IDs on Windows r=ajkr a=ajkr Picks up cockroachdb/rocksdb#58. We found a corruption caused by multiple FAT32 files assigned the same block cache key prefix. We don't know the extent to which this problem affects other filesystems or other Windows file ID generation mechanisms. We decided to turn off the reliance on filesystem for generating cache keys on Windows. Instead we use randomization per table reader. This would cause a performance penalty for use cases that open multiple table readers per file, but I believe cockroach is not such a use case. Fixes #40918, fixes #40950. Release justification: Prevents corruption on some Windows filesystems Release note: None 41020: util/log: fix GC of secondary loggers r=petermattis a=knz Fixes #40974. This is a subset of #40993 suitable for 19.2 and backport to 19.1. Release justification: bug fix Release note (bug fix): CockroachDB will now properly remove excess secondary log files (SQL audit logging, statement execution logging, and RocksDB events). Co-authored-by: Andrew Kryczka <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]>
Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.
We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.
Test Plan: we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change
This change is![Reviewable](https://camo.githubusercontent.com/1541c4039185914e83657d3683ec25920c672c6c5c7ab4240ee7bff601adec0b/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)