Add support for overriding the default hash format (new --hash-format argument) #3709
Adding to 4.1 release plans. Subject to change.
This change adds support for a new --hash-format parameter that can be used to override the default hash format used by SCons. The default remains MD5, but this allows consumers to opt into SHA1, SHA256, or any other hash algorithm offered by their implementation of hashlib.
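As a rough illustration of what overriding the hash format amounts to, here is a minimal sketch that uses only hashlib. The names `DEFAULT_HASH_FORMAT` and `hash_signature` are hypothetical and are not taken from the SCons code in this PR.

```python
import hashlib
from typing import Optional

# Assumption for illustration: the default stays MD5, as stated above.
DEFAULT_HASH_FORMAT = "md5"


def hash_signature(data: bytes, hash_format: Optional[str] = None) -> str:
    """Return the hex digest of data using the requested algorithm.

    Falls back to DEFAULT_HASH_FORMAT when no override is given.
    """
    name = (hash_format or DEFAULT_HASH_FORMAT).lower()
    # algorithms_available lists what this build of hashlib can provide.
    if name not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash format: {hash_format!r}")
    return hashlib.new(name, data).hexdigest()


if __name__ == "__main__":
    print(hash_signature(b"hello"))            # default (MD5)
    print(hash_signature(b"hello", "sha256"))  # explicit override
```

Variable-length algorithms such as shake_128 need a length argument for hexdigest() and are not handled by this sketch.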
Fixes one sider issue and a code error that broke some tests
1. Remove unnecessary writes of f1.in and f2.in. The former is already in the dir fixture and the latter is unused.
2. Untabify SConstruct file.
Rebasing off of master put my CHANGES.txt addition in an old release. This commit moves it up to the correct section.
cf61dc0 to a1bd89e
Yesterday's md5_chunksize PR conflicted with this one and the start of the branch was a bit old, so I have rebased this PR off of master.
I stumbled across a trap in this transition, a hidden assumption that a signature is always 32 bytes. Just adding here so it's recorded somewhere. Look for:
A quick grep suggests there's only one of these.
This function is in convert_old_entry, which is called here:
I could change the check from "== 32" to ">= 32", but I don't believe this code would operate on anything other than MD5 hashes, because it is only called on very old .sconsign files.
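To make the 32-character assumption concrete, here is a small sketch that reports hex digest lengths for the algorithms mentioned in this PR. The helper name `hex_digest_length` is made up for illustration.

```python
import hashlib


def hex_digest_length(name: str) -> int:
    """Length of the hex digest string for a fixed-size hashlib algorithm."""
    # digest_size is in bytes; each byte becomes two hex characters.
    return hashlib.new(name).digest_size * 2


if __name__ == "__main__":
    for name in ("md5", "sha1", "sha256"):
        print(name, hex_digest_length(name))
```

Of these three, only MD5 produces a 32-character hex digest, so a "== 32" check matches MD5 signatures only, while ">= 32" would also accept SHA1 and SHA256 digests.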
Hmmm, I wonder if we can ditch that logic entirely. Many of these "old" things are 10+ years old.
So if this is going to be pursued, it's time to find out whether this approach is okay with the maintainer, or whether a hash-format-detecting approach is needed (possibly with a one-time conversion tool to "upgrade" old sconsigns).
I think I've suggested this before: if the hash isn't md5, then the sconsign file name should have the hash type in it. Otherwise we go down the rabbit hole of trying to figure out which hash was used in an existing sconsign file, or of creating a new format that records the hash type somewhere in the header of the sconsign file. We can add logic to check for each filename when opening, use one if it already exists, and otherwise fall back to the default if nothing is specified?
I'm having trouble understanding how such a check would happen. Can you help me understand what the benefit of this would be? Would the downside be that such a check unnecessarily causes rebuilds when switching hash formats if you are using a timestamp decider?
Also, two more questions:
Can I just respond to question 2 above by saying "it's horrid"? I had thought about changing things to use multihash (https://pymultihash.readthedocs.io/en/latest/), but it might be this is a case of YAGNI, and just having a
Ok, I think I figured out how to do this. I'll push an update in the next hour if I can validate that I haven't broken SConsign-related tests.
This was requested in the code review. The sconsign database file name is still .sconsign.dblite if the hash format is not overridden, but if it is, the name will be something like .sconsign_sha256.dblite.
PR is updated to change the sconsign db file name if the hash format is overridden. Functional test now validates that the db name is correct if the hash format is overridden either by --hash-format or by SCons.Util.set_hash_format().
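The naming rule described here could be sketched roughly as follows. The function `sconsign_db_name` and its parameters are hypothetical, and treating only "no override" as the MD5 case is an assumption. The thread only says the overridden name is "something like" .sconsign_sha256.dblite.

```python
from typing import Optional


def sconsign_db_name(hash_format: Optional[str] = None,
                     base: str = ".sconsign") -> str:
    """Build the sconsign database file name for a given hash format.

    Assumption: None means the hash format was not overridden, in which
    case the traditional name is kept.
    """
    if hash_format is None:
        return base + ".dblite"
    # Embed the lower-cased algorithm name so databases do not collide.
    return f"{base}_{hash_format.lower()}.dblite"


if __name__ == "__main__":
    print(sconsign_db_name())          # .sconsign.dblite
    print(sconsign_db_name("sha256"))  # .sconsign_sha256.dblite
```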
Looks good to me.
This is the same question I have with other work in progress: if multiple signature DBs are detected but no directive to help pick one is detected, how should SCons behave? Default to using .dblite always, or pick a newer one if it exists, since it's presumably "intentional" to have created it? I can see arguments both ways.
I think it should walk the possibilities and take the first in order.
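One way to read "walk the possibilities and take the first in order" is sketched below. The function `pick_sconsign_db` is hypothetical, and the candidate order shown in the usage lines is an assumption, since the thread does not settle it.

```python
import os
from typing import Callable, Iterable, Optional


def pick_sconsign_db(candidates: Iterable[str],
                     exists: Callable[[str], bool] = os.path.exists
                     ) -> Optional[str]:
    """Return the first candidate database path that exists, else None.

    The exists parameter is injectable so the logic can be exercised
    without touching the filesystem.
    """
    for path in candidates:
        if exists(path):
            return path
    return None


if __name__ == "__main__":
    # Assumed preference order, for illustration only.
    order = [".sconsign.dblite",
             ".sconsign_sha1.dblite",
             ".sconsign_sha256.dblite"]
    print(pick_sconsign_db(order))
```

Returning None leaves the fallback decision to the caller, for example creating a new database with the default name.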
Only thing left is to pick a hash format based on the sconsign database name.
1. Fix failure finding UserError.
2. Fix bad string formatting.
3. Add test case covering passing an invalid hash format.
4. Remove blake2b, as I haven't tested it. We can add it some day if people want it.
I did some diff updates and these two comments are the big ones left. For #1, I did add a deprecation warning and updated the documentation, but I am not sure if there are other changes I need to make. For #2, I am concerned that SConsign.Get_Database() is called too late. As far as I can tell, the SConsign database is lazy-loaded, so, for example, in a test, the first call to self.taskmaster.next_task() is what causes the database to be loaded (Node.get_stored_info() calls Dir.sconsign(), which then loads the database if it isn't already loaded). As a result, I would prefer to not do this one.
Seems like there's a flake8 warning from CI: SCons/Util.py:1526 do not use bare 'except' (E722). I'd say we can discuss whether an actual deprecation warning needs to be added to the code as a separate item, but let's see what Bill thinks...
Now that we have a much more limited selection of supported algorithms, I don't need the code to actually call the hash function in set_hash_format. This also means that I don't need a try/except and can instead use getattr().
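A minimal sketch of the getattr() idea, assuming an allow-list of md5, sha1 and sha256. The exact list in the PR is not quoted in this thread, and the names `ALLOWED_HASH_FORMATS` and `get_hash_constructor` are made up. The thread indicates SCons raises UserError for invalid formats. ValueError stands in for it here so the example stays self-contained.

```python
import hashlib

# Assumed allow-list for illustration; not copied from the PR.
ALLOWED_HASH_FORMATS = ("md5", "sha1", "sha256")


def get_hash_constructor(hash_format: str):
    """Look up a hashlib constructor by name without calling it."""
    name = hash_format.lower()
    if name not in ALLOWED_HASH_FORMATS:
        raise ValueError(
            f"hash format {hash_format!r} is not supported; "
            f"choose one of {', '.join(ALLOWED_HASH_FORMATS)}"
        )
    # getattr with a default avoids a try/except around the lookup.
    constructor = getattr(hashlib, name, None)
    if constructor is None:
        raise ValueError(f"this hashlib does not provide {name!r}")
    return constructor


if __name__ == "__main__":
    new_hash = get_hash_constructor("sha256")
    print(new_hash(b"hello").hexdigest())
```

Because the lookup never invokes the hash function, validation does not depend on whether a particular algorithm can actually be run at that point.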
IMHO this is great work. One question to resolve, and feel free to say push this to later, is about using the list of hashes from @grossag @mwichmann - thoughts?
Contributor Checklist:
CHANGES.txt (and read the README.rst)