-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Improve command_static_url repack archive logic #591
Conversation
f85c429
to
0cfc615
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the thoroughly documented PR, this makes sense to me!
I also appreciate the effort to make this backwards compatible, can you help me understand which cases this might break?
It looks like the only cases where we might no longer repack are:
-
If orig has no suffixes and dest is something other than
.tar.zst
.. However havingdest
be something other than.tar.zst
is already not supported, so this case is fully backwards compatible. -
If orig doesn't end in
tar.*
,.zip
or.tgz
AND dest ends with the same last suffix that orig does. But again, becausedest
MUST end withtar.zstd
, I think the only case where we might break something is iforig
ends with.zstd
but not.tar.zstd
.
Am I understanding that right? If so, I agree that this doesn't need to be considered a breaking change (maybe it still is technically.. but the edge case is so tiny I don't think it should matter).
I do have a couple nits, would you mind pushing a quick fix for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm going to request changes for the nits just to make expectations clear :)
The other thought I had, is maybe we should just break backwards compat and solve this properly, e.g by forcing the user to be explicit about whether they want a repackage or not. I don't love how complicated this all is :) That said, this PR clearly fixes some buggy cases, so is a step in the right direction and I'm happy to take it in the meantime. |
0cfc615
to
f2736bb
Compare
UpdateI've fixed all of the nits.
Regarding case 1) I agree with your reasoning here. Regarding case 2) If However, with the new code, it would rename I don't want to introduce problematic behavior of a false-positive fetch task that should fail. I could add an extra case for this in the logic, or we could try to ad Ultimately, I agree with your comment that all of this should be explicit, rather than detected. Though, I think that's better deserving of a follow-up PR. I'd be happy to file an issue for it. |
af0a9f1
to
d95db3b
Compare
This is a little gross, but based on the discussion regarding case 2) above, I've added this logic: if orig.suffixes[-1:] == [".zst"] and dest.suffixes[-2:] == [".tar", ".zst"]:
# If we have a source ".zst" file and the destination is a ".tar.zst" file then a repack is necessary.
return True It's a bit misleading because the repack will fail, but that's better than renaming in this case and failing later. I don't like this solution, and I'm happy to try to add But at this point I'd like to open this back up for discussion. My preference would be to take the time to do this correctly, rather than rush on a decision here. |
Alternatively, we could probably simplify the logic by just saying that if the That would probably be the simplest thing to do in the near-term that ensures no breakage. Let me know how you would like to move forward and I'll make it happen! |
d95db3b
to
b8e2530
Compare
Okay, actually I like this logic more. I changed it to this. Please let me know your thoughts. |
This patch improves the logic of static URL fetch tasks to better be able to determine if a file simply needs to be renamed, or if it is an archive that needs to be repackaged.
b8e2530
to
4cf03e8
Compare
Okay, sorry for the stream of comments here! I've updated all the logic to what I believe is a sufficient algorithm, and I added several new test cases. To help further verify, I temporarily changed the function to this: def should_repack_archive(orig: pathlib.Path, dest: pathlib.Path, strip_components=0, add_prefix="") -> bool:
if orig != dest or strip_components or add_prefix:
return True
return False This is effectively the old logic. Then I ran the current set of test cases, and the only ones that failed are these cases:
The first two are cases that will never happen in practice, since the file will already be renamed if the suffixes match. However, I felt it important the logic within this function be able to stand on its own, and these are valid detections of cases that should be a rename rather than a repack. The second two are the edge cases for which I wrote this patch in the first place, which we know will be different behavior. To me that is a pretty good indicator that we haven't regressed anything with this PR. |
Whatever you think is easiest, probably special casing it?
Agreed, let's not worry about this now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like this approach!
Longer term we can still make this all more explicit, but it's probably not a high priority.
Description
This patch improves the logic of static URL fetch tasks to better be able to determine if a file simply needs to be renamed, or if it is an archive that needs to be repackaged.
Background
I was attempting to add the following fetch task to Firefox CI:
The issue is that the downloaded artifact (
dl_path
) is9ee26e91-9b52-44ba-8d30-c0230dd587b2.bin
, however the desired artifact-name (path
) ismodel.esen.intgemm.alphas.bin
If you look at the log file for this fetch task in the current state of the code, you will see:
The Problem
The problem lies in these lines of code where we compare the suffixes for the file. If the suffixes don't match exactly, then we treat it as an archive that needs to be repacked.
I added some extra logging (prefixed by
!!!
), and you can see that my original file only has the suffix of.bin
, but the detination file has the suffixes of['.esen', '.intgemm', '.alphas', '.bin']
. This causes it to be unnecessarily treated as an archive, which fails the fetch.Now, this isn't the only case of a suffix mismatch that should be a rename. Consider this example.
In this case we have the file
python-3.8.10-amd64.exe
, which has the detected suffixes of['.8', '.10-amd64', '.exe']
due to the semantic version being included in the file name. This fetch task happens to be working in production because we are not renaming the artifact and the suffixes happen to match. However, if we were to try to doartifact-name: python.exe
, we would run into the same situation where the suffixes mismatch (['.8', '.10-amd64', '.exe']
vs.['.exe']
) and the.exe
file would be treated as an archive, which would result in the fetch task failing.The Solution
I have attempted to amend the code in a way that is both backward compatible with all of the current fetch tasks, and that is forward compatible with the naming conventions that I need to support with multiple suffixes. This code attempts to determine whether a file simply needs to be renamed, or whether it is an archive that needs to be repacked.
You can see a working group of fetch tasks here:
https://treeherder.mozilla.org/jobs?repo=try&revision=1cd59eed16adb9711a73e4d27cf253ffe3b3e100
I've also added some extra pytests to this repository.
The Outcome
My hope is that we can get these changes approved and published as a dot release, then re-vendored into the Firefox source tree so that I can land my new fetch tasks as soon as possible.