-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partially implement cache hash API in zig #4635
Conversation
The C API should only be copied in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work so far. I gave a couple suggestions to consider at your leisure
lib/std/cache_hash.zig
Outdated
pub const CacheHash = struct { | ||
alloc: *Allocator, | ||
blake3: Blake3, | ||
manifest_dir: []const u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet this could be improved by using a std.fs.Dir
handle rather than a string path name
lib/std/cache_hash.zig
Outdated
const decoder = base64.Base64Decoder.init(base64_alphabet, base64_pad_char); | ||
const BIN_DIGEST_LEN = 32; | ||
|
||
pub const CacheHashFile = struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The full path to this struct is std.cache_hash.CacheHashFile
so you can remove redundancy by renaming this to simply File
. To avoid clashing with fs.File
I would suggest removing const File = fs.File;
and instead using fs.File
when you need that one.
Similar comment for functions which have cache_
in their name.
2e5adc1
to
e6ed5d9
Compare
There are a few pieces missing from the zig std library that are needed to do a full port of the c cache_hash API:
Should I just put all the relevant code in cache_hash, or should I make separate pull requests that add this functionality to the zig std library? |
For these things let's go the std lib route. |
lib/std/cache_hash.zig
Outdated
debug.assert(self.manifest_file == null); | ||
|
||
const val_type = @TypeOf(val); | ||
switch (@typeInfo(val_type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I wonder if this can be simplified with something like:
self.blake3.update(mem.asBytes(&val));`
Plus some checks for zero-sized types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good idea, I'll try it out. There will also need to be a check for boolean values, because booleans are handled differently (or least were in the C version, but I see no reason to change that).
I think I'll also try to detect fs.File
types and have special logic for them.
88bb407
to
0cdaa72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this at a point where it could be merged as is and then improved? Let me know when you're ready for a review.
This PR is pretty close, just a few things to add. I think the biggest thing I need to add is logic for adding files after the hash has been generated ( |
240a9cb
to
69699ba
Compare
Okay, I've changed const src_file_idx = try cache_hash.addFile("src/foo.txt");
if (cache_hash.hit()) |hash| {
// cache hit
} else {
const src_contents = cache_hash.files.items[src_file_idx].?;
// compile stuff
} I'm not sure this is the best API to have. Any feedback welcome! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! And thanks for your patience on how long it has taken for me to review it.
I gave some areas to look into improving - let me know when you're ready for round 2.
fn is_problematic_timestamp(file_mtime_ns: i64) bool { | ||
const now_ms = time.milliTimestamp(); | ||
const file_mtime_ms = @divFloor(file_mtime_ns, time.millisecond); | ||
return now_ms == file_mtime_ms; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this has the same behavior as the c++ code:
static bool is_problematic_timestamp(const OsTimeStamp *fs_clock) {
OsTimeStamp wall_clock = os_timestamp_calendar();
// First make all the least significant zero bits in the fs_clock, also zero bits in the wall clock.
if (fs_clock->nsec == 0) {
wall_clock.nsec = 0;
if (fs_clock->sec == 0) {
wall_clock.sec = 0;
} else {
wall_clock.sec &= (-1ull) << ctzll(fs_clock->sec);
}
} else {
wall_clock.nsec &= (-1ull) << ctzll(fs_clock->nsec);
}
return wall_clock.nsec == fs_clock->nsec && wall_clock.sec == fs_clock->sec;
}
std.time.milliTimeStamp
is not quite what we want here. It looks like std.time.milliTimeStamp
is too high level, and std.time
does not actually offer what we need: a cross-platform high resolution calendar timestamp. This will require a new function added to std.time
. I believe that std.time.milliTimeStamp
could then be reworked to call this other function, and do a small amount of logic on top of it.
Basically, move type specific code into their own functions instead of making `add` a giant function responsible for everything.
Now that the memory leak mentioned in ziglang#4656 has been fixed.
It was causing a segfault on `mipsel` architecture, not sure why other architectures weren't affected.
Instead of releasing the manifest file when an error occurs, it is only released when when `CacheHash.release` is called. This maps better to what a zig user expects when they do `defer cache_hash.release()`.
It checks whether the cache will respond correctly to inputs that don't initially depend on filesystem state. In that case, we have to check for the existence of a manifest file, instead of relying on reading the list of entries to tell us if the cache is invalid.
If a user doesn't care that the manifest failed to be written, they can simply ignore it. The program will still work; that particular cache item will simply not be cached.
This makes it possible for the user to retrieve the contents of the file without running into data races.
It occured when the manifest file was manually edited to include an extra file. Now it will simply copy the file name in the manifest file
People using the API as intended would never trigger this assertion anyway, but if someone has a non standard use case, I see no reason to make the program panic.
69699ba
to
4a5151c
Compare
@leroycep I'm juggling a few things so I might not be paying close attention on this. Let me know when there is somewhere I can focus my attention to unblock you on this. |
Will close #4311
This pull request translates the C++
cache_hash
API into zig, understd.cache_hash
. Currently it only supports adding files before hitting the cache, doesn't implement the many differentcache_*
functions, doesn't check files' modification times, and has a memory leak somewhere.It's a start.
cache_*
functionscache_buf
(will probably merge intocache_val
)cache_file
cache_val(var)
Not sure if this is the way to go, or if I should copy the C API and make a function for every typeCacheHash.add(value)
CacheHash.addFile(path)
CacheHash.addFile(path)
but after the initial hash has been computedcache_hit
std.time
addFile*
functions