-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subdirectory support broken due to wrong tree hash verification #281
Comments
Right, that makes sense and that was an oversight when support was initially added. It looks like if we want to do this right, we'd need to implement tree hash computation from scratch, as I don't think the Git CLI provides a way to do so. |
I am not sure that this is a solution though: The problem is in Line 311 in 7ac5878
which is called from new_versions and which takes a dict of "version" -> "tree hash" with all Julia package versions that have been registered within the last lookback days. _filter_map_versions is then supposed to convert the tree hash to the corresponding commit hash.
But that's where I see the issue that we cannot easily work around: This conversion only works if there is a unique tree hash -> commit hash mapping. However, for a subdirectory package, the tree hash of the subdirectory does not change if the subdirectory contents are left untouched. That is, there are potentially a lot of git commits that have a common tree hash for (some of) their subdirectories. To summarize, unless I am missing something (which might very well be the case), there is just no way to uniquely identify a git commit through a tree hash of a subdirectory. Therefore, I think to retain the ability for TagBot to tag subdirectory packages (actually: to actually make it work in the first place), the whole version identification mechanism has to be rethought (and reimplemented). Since the Julia registry does not record commit hashes for its versions in the repository (only tree hashes), literally the only place where we have a unique connection between a Julia package version and its commit hash is currently the Thus, we would either need to convince the Julia registry to start recording commit hashes for new package versions together with the tree hash. Alternatively, we could limit the subdirectory tagging (or TagBot in its entirety) to work only with versions that have been registered through the JuliaRegistrator, which creates the PRs with the commit hash information in an expected format. |
To double check: is this check a consistency check (verify TagBot is running on the right commit, which it should be doing bc of some other mechanism), OR is this to figure out what commit to run on at all (which commit to tag)? If it is a consistency check, then for subdirs my thought would be: as long as the commit in question indeed contains the right subdir hash (even if there are other commits that also contain that subdir), then we are consistent and the check can pass. |
In general, a little bit of both. In Lines 311 to 334 in 7ac5878
TagBot first tries to get the commit hash from the registry PR using _commit_sha_from_registry_pr in line 316. It currently fails here because of the tree hash mismatch (as described above), but otherwise would be able to get the commit hash we need.
However, if the PR approach fails, TagBot falls back to using However, we could follow your idea,
and do the following: This would make the |
I've created #282 which implements the suggested solution. |
TL;DR
If I am not mistaken, the current implementation for supporting subdirectory tagging is broken due to an apple-with-oranges-style comparison of the commit tree hash and the package tree hash, which are not identical for subdirectory tags.
Long version
The setup
After #279 had been merged, we tried to use TagBot to tag a new release of LibTrixi.jl, which lives in the
LibTrixi.jl
subdirectory of https://github.com/trixi-framework/libtrixi. Correspondingly, we have configured TagBot withsubdir: LibTrixi.jl
.The error message
However, during its run, TagBot quit early with the following messages:
(see this log message).
The origin
The "Tree SHA ... does not match" error comes from the following snippet:
TagBot/tagbot/action/repo.py
Lines 258 to 263 in 7ac5878
where
m[1]
holds the first 32 characters of the commit hash that has been extracted from the PR body of the corresponding PR to https://github.com/JuliaRegistries/General. In our case, it got the commit hash6e615c91364595e8ad1feca67ebfbb39
from JuliaRegistries/General#91262 (comment).As an additional safety measure (or maybe it is required to catch ambiguities, but I cannot see where they might come from), in line 259 the tree hash of the registered commit (
commit.commit.tree.sha
) is compared against the tree hash stored in the Julia registry for the corresponding package version. In our case, the version isv0.1.2
, and TagBot extracted the package tree hash7911434c8b808550e3bd6c321dd50da91b128a9e
from this file.The bug (?)
And this is where the problem is located: The package tree hash
7911...
is obtained by computing the git tree hash of the package directory, which in our case is the subdirectoryLibTrixi.jl
. The commit tree hash, against it is compared here, however, is computed on the entire repository.To quickly verify that they are different, you can check the different strings from the libtrixi repo:
which will output something like
Obviously, this comparison of the commit tree hash and the package tree hash is bound to fail whenever the package lives in a subdirectory of the repository.
The solution
Currently, I see no obvious way out of this: Relying on the tree hash to determine the appropriate commit hash works only if the tree hash was computed for the entire repository, as otherwise there is no unique tree hash -> commit hash relation anymore. Since we are not interested in the content here (in which case that would be fine) but in the unique commit, this mechanism does fundamentally not work for subdirectory tags.
The only viable option I see atm is to remove the fallback mechanism for the commit identification and solely rely on the registry PR information. If this is not an option, one could consider doing this only for subdirectory tags and to leave the existing machinery in place for full-repo tags. Removing the (currently usuable) subdirectory support is imho not an option.
What are your thoughts on this @christopher-dG @IanButterworth (and maybe @hannahilea)?
cc @bgeihe
The text was updated successfully, but these errors were encountered: