Verification Code and Checksum computation updates (#43)
* On further review of the SPDX specification, updated the algorithm for computing a package verification code
* Replaced the function spdxchecksum() with ComputePackageVerificationCode() and ComputeFileChecksum() (see the migration sketch below)
* Resolved #40: Handling of symbolic links when computing the package verification code
* Resolved #29: Support checksum calculation on a single file
* Resolved #28: Use the Logging standard library to record all the files processed and their checksums
SamuraiAku authored Jan 30, 2024
1 parent d9b3b69 commit e9e9062
Showing 13 changed files with 109 additions and 36 deletions.
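
As a quick orientation before the per-file diffs, here is a minimal before/after sketch of the API change summarized above. The call signatures are taken from the diffs below; the paths and exclusion lists are placeholders rather than part of this commit.

```julia
using SPDX

# Before this commit (now removed): a single function returning a raw digest.
# checksum = spdxchecksum("SHA1", "/path/to/pkgdir", ["IgnoreThisFile.spdx.json"], [".git"])

# After this commit: one function per SPDX clause.
# Clause 7.9 - package verification code, returns an SpdxPkgVerificationCodeV2
verif_code = ComputePackageVerificationCode("/path/to/pkgdir",
                                            ["IgnoreThisFile.spdx.json"], [".git"])

# Clause 7.10 - checksum of a single file, returns an SpdxChecksumV2
file_cksum = ComputeFileChecksum("SHA256", "/path/to/package.tar.gz")
```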
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
# CHANGELOG

## New Version
* On further review of the SPDX specification, updated the algorithm for computing a package verification code
* Replaced the function spdxchecksum() with ComputePackageVerificationCode() and ComputeFileChecksum()
* Resolved [#40](https://github.com/SamuraiAku/SPDX.jl/issues/40): Handling of symbolic links when computing the package verification code
* Resolved [#29](https://github.com/SamuraiAku/SPDX.jl/issues/29): Support checksum calculation on a single file
* Resolved [#28](https://github.com/SamuraiAku/SPDX.jl/issues/28): Use the Logging standard library to record all the files processed and their checksums

## v0.3.2
* Add lots of tests to improve Code Coverage

1 change: 1 addition & 0 deletions Project.toml
@@ -10,6 +10,7 @@ JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
TimeZones = "f269a46b-ccf7-5d73-abea-4c690281aa53"
UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"

[compat]
DataStructures = "0.18"
23 changes: 20 additions & 3 deletions README.md
@@ -62,11 +62,28 @@ updatenamespace!(myDoc) # Updates only the UUID portion of the namespace

setcreationtime!(myDoc) # Sets document creation time to the local time, taking the local timezone into account

# Compute a verification code or checksum of a directory [Clauses 7.9, 7.10]
# Compute a verification code of a package directory [Clause 7.9]
# Returns object of type SpdxPkgVerificationCodeV2
# NOTES:
# Files that are excluded by name are included in the ExcludedFiles property
# Symbolic links are automatically excluded from the computation and included in the ExcludedFiles property, unless the links are inside an excluded directory or pattern
# Directories and excluded patterns (not shown in the example below) are NOT included in the ExcludedFiles property. The reasoning is that these are temporary or version-control locations that are not part of the released package.
#
# Example Call: Compute a verification code that ignores a specific file and a .git directory at the root level. A common usage pattern.
verif_code= ComputePackageVerificationCode("/path/to/pkgdir", ["IgnoreThisFile.spdx.json"], [".git"])
# Example Return:
# e0b4c73534bc495ebf43effa633b424d52899183 (excludes: IgnoreThisFile.spdx.json link_to_file)
# Logging:
# If the logging level is set to -100 or lower, then a full file listing will be logged along with the hash of each file for user review. See the documentation of Julia's standard logging facilities for details.


# Compute the checksum of a package tarball [Clause 7.10]
# Returns object of type SpdxChecksumV2
# Supported checksum algorithms are:
# ["SHA1", "SHA224", "SHA256", "SHA384", "SHA512", "SHA3-256", "SHA3-384", "SHA3-512"]
spdxchecksum("SHA1", "/path/to/dir", ["IgnoreThisFile.spdx.json"], [".git"]) # Compute a checksum that ignores a specific file and a .git directory at the root level. A common usage pattern.

file_cksum= ComputeFileChecksum("SHA256", "/path/to/package.tar.gz")
# Example Return:
# SHA256: 4b1dfe7b8886825527a362ee37244a665a32f68d9e7ca53c521dfec9ae8cd41a
```
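
The notes above mention an excluded-patterns argument that the example does not show, and a logging level of -100 that prints the per-file hash listing. The sketch below illustrates both, assuming only Julia's standard Logging module; the ConsoleLogger setup and the regex pattern are illustrative choices, not part of the package API.

```julia
using Logging
using SPDX

# A logger that lets through messages at level -100 and above, so the
# per-file hash listing emitted during the computation becomes visible.
debug_logger = ConsoleLogger(stderr, Logging.LogLevel(-100))

verif_code = with_logger(debug_logger) do
    # Positional arguments: root dir, excluded files, excluded dirs, excluded patterns.
    # The regex is only an example of an excluded pattern.
    ComputePackageVerificationCode("/path/to/pkgdir",
                                   ["IgnoreThisFile.spdx.json"], [".git"],
                                   [r"\.DS_Store$"])
end
```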
## SPDX Document Structure
1 change: 1 addition & 0 deletions src/SPDX.jl
@@ -8,6 +8,7 @@ using UUIDs
using TimeZones
using SHA
using Base.Filesystem
using Logging

#######################
Base.Bool(x::AbstractString)= parse(Bool, lowercase(x))
92 changes: 61 additions & 31 deletions src/checksums.jl
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: MIT

export spdxchecksum
export ComputePackageVerificationCode, ComputeFileChecksum

function spdxchecksum(algorithm::AbstractString, rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}= String[], excluded_dirlist::Vector{<:AbstractString}= String[], excluded_patterns::Vector{Regex}=Regex[])
function determine_checksum_algorithm(algorithm::AbstractString)
# Check to see if algorithm is in the list of supported algorithms, unsupported algorithms, or not recognized
# TODO: substitute "_" for "-" and other things to account for user typos
supported_algorithms= Set(["SHA1", "SHA224", "SHA256", "SHA384", "SHA512", "SHA3-256", "SHA3-384", "SHA3-512"])
@@ -12,42 +12,43 @@ function spdxchecksum(algorithm::AbstractString, rootdir::AbstractString, exclud
issubset(Set([algorithm]), unsupported_algorithms) && error("checksum(): The hash algorithm $(algorithm) is not supported by SPDX.jl")
issubset(Set([algorithm]), supported_algorithms) || error("checksum(): algorithm $(algorithm) is not recognized")

HashFunction, HashContext= (algorithm == "SHA1") ? (sha1, SHA1_CTX) :
(algorithm == "SHA224") ? (sha224, SHA224_CTX) :
(algorithm == "SHA256") ? (sha256, SHA256_CTX) :
(algorithm == "SHA384") ? (sha384, SHA384_CTX) :
(algorithm == "SHA512") ? (sha512, SHA256_CTX) :
(algorithm == "SHA3-256") ? (sha3_256, SHA3_256_CTX) :
(algorithm == "SHA3-384") ? (sha3_384, SHA3_384_CTX) :
(sha3_512, SHA3_512_CTX)
HashFunction= (algorithm == "SHA1") ? sha1 :
(algorithm == "SHA224") ? sha224 :
(algorithm == "SHA256") ? sha256 :
(algorithm == "SHA384") ? sha384 :
(algorithm == "SHA512") ? sha512 :
(algorithm == "SHA3-256") ? sha3_256 :
(algorithm == "SHA3-384") ? sha3_384 :
sha3_512

package_hash::Vector{UInt8}= spdxchecksum_sha(HashFunction, HashContext, rootdir, excluded_flist, excluded_dirlist, excluded_patterns)

return package_hash
return HashFunction
end

function spdxchecksum_sha(HashFunction::Function, HashContext::DataType, rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
flist_hash::Vector{Vector{UInt8}}= [file_hash(file, HashFunction) for file in getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns)]
flist_hash= sort(flist_hash)

ctx= HashContext()
for hash in flist_hash
SHA.update!(ctx, hash)
end

return SHA.digest!(ctx)
###############################
function spdxverifcode(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
ignored_files= String[]
flist_hash::Vector{String}= [file_hash(file, sha1) for file in getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files)]
flist_hash= sort(flist_hash)
combined_hashes= join(flist_hash)
return (sha1(combined_hashes), ignored_files)
end


###############################
file_hash(fpath::AbstractString, HashFunction::Function)= open(fpath) do f
return HashFunction(f)
hash= HashFunction(f)
@logmsg Logging.LogLevel(-100) "$(string(HashFunction))($fpath)= $(bytes2hex(hash))"
return bytes2hex(hash)
end


###############################
function getpackagefiles(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
return Channel{String}(chnl -> _getpackagefiles(chnl, rootdir, excluded_flist, excluded_dirlist, excluded_patterns))
function getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files)
return Channel{String}(chnl -> _getpackagefiles(chnl, rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files))
end

function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex}, ignored_files::Vector{String})
# On first call of this function put an absolute path on root and exclusion lists
isabspath(root) || (root= abspath(root))
all(isabspath.(excluded_flist)) || (excluded_flist= normpath.(joinpath.(root, excluded_flist)))
@@ -61,17 +62,46 @@ function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:A
for path in content
if isdir(path)
if any(excluded_dirlist .== path)
continue # Skip over excluded directories
@logmsg Logging.LogLevel(-100) "Skipping Directory $path"
elseif islink(path)
push!(ignored_files, path)
@logmsg Logging.LogLevel(-100) "Excluding symbolic link $path"
else
_getpackagefiles(chnl, path, excluded_flist, excluded_dirlist, excluded_patterns) # Descend into the directory and get the files there
_getpackagefiles(chnl, path, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files) # Descend into the directory and get the files there
end
elseif any(excluded_flist .== path)
continue # Skip over excluded files
@logmsg Logging.LogLevel(-100) "Excluding File $path"
push!(ignored_files, path)
elseif any(occursin.(excluded_patterns, path))
continue # Skip files that match one of the excluded patterns
@logmsg Logging.LogLevel(-100) "Ignoring $path which matches an excluded pattern" pattern_regexes= excluded_patterns
elseif islink(path)
@logmsg Logging.LogLevel(-100) "Excluding symbolic link $path"
push!(ignored_files, path) # Any link that passes the previous checks is part of the deployed code and its exclusion from the computation needs to be noted
else
push!(chnl, path) # Put the file path in the channel
push!(chnl, path) # Put the file path in the channel. Then block until it is taken
end
end
return nothing
end
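
The file walk above hands paths to the hashing loop through an unbuffered Channel: each push! blocks until the consumer takes the path, so directory traversal and hashing proceed in lockstep without materializing the full file list. A stripped-down sketch of that pattern, with placeholder file names standing in for the real traversal:

```julia
# Producer: pushes paths into the channel; on an unbuffered channel each
# push! blocks until the consumer takes the value.
produce_paths(chnl) = foreach(p -> push!(chnl, p), ["a.txt", "b.txt", "c.txt"])

# Consumer: iterating the Channel pulls one path at a time, mirroring how
# spdxverifcode consumes getpackagefiles.
for path in Channel{String}(produce_paths)
    println("would hash ", path)
end
```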


###############################
function ComputePackageVerificationCode(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}= String[], excluded_dirlist::Vector{<:AbstractString}= String[], excluded_patterns::Vector{Regex}=Regex[])
@logmsg Logging.LogLevel(-50) "Computing Verification Code at: $rootdir" excluded_flist= excluded_flist excluded_dirlist= excluded_dirlist excluded_patterns= excluded_patterns
package_hash, ignored_files= spdxverifcode(rootdir, excluded_flist, excluded_dirlist, excluded_patterns)
ignored_files= relpath.(ignored_files, rootdir)
verif_code= SpdxPkgVerificationCodeV2(bytes2hex(package_hash), ignored_files)
@logmsg Logging.LogLevel(-50) "Verification Code= $(string(verif_code))"
return verif_code
end


###############################
function ComputeFileChecksum(algorithm::AbstractString, filepath::AbstractString)
@logmsg Logging.LogLevel(-50) "Computing File Checksum on $filepath"
HashFunction= determine_checksum_algorithm(algorithm)
fhash= file_hash(filepath, HashFunction)
checksum_obj= SpdxChecksumV2(algorithm, fhash)
@logmsg Logging.LogLevel(-50) string(checksum_obj)
return checksum_obj
end
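
Condensed from the new spdxverifcode above, the Clause 7.9 computation amounts to: take the SHA-1 of each included file, hex-encode it, sort the hex strings, join them, and SHA-1 the joined string. A standalone sketch under those assumptions follows; the file list is a placeholder, and it mirrors rather than replaces the package implementation.

```julia
using SHA

# Per-file SHA-1 digests as lowercase hex strings, sorted, joined,
# then hashed once more with SHA-1.
function verification_code_sketch(files::Vector{String})
    file_hashes = sort([bytes2hex(open(sha1, f)) for f in files])
    return bytes2hex(sha1(join(file_hashes)))
end

# Example with hypothetical files:
# verification_code_sketch(["file1.txt", "file2.txt"])
```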
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -5,6 +5,7 @@ using Test
using JSON
using Dates
using TimeZones
using SHA

@testset "Bool check" begin
@test Bool(" True ")
14 changes: 12 additions & 2 deletions test/test_checksums.jl
@@ -1,4 +1,14 @@
@testset "checksums" begin
checksum= spdxchecksum("SHA1", pkgdir(SPDX), String["SPDX.spdx.json"], String[".git"])
@test checksum isa Vector{UInt8} # No good way to independently verify that the calculation is correct.
verifcode= ComputePackageVerificationCode(pkgdir(SPDX), String["SPDX.spdx.json"], String[".git"])
@test verifcode isa SpdxPkgVerificationCodeV2 # No good way to independently verify that the calculation is correct.
@test issubset(["SPDX.spdx.json"], verifcode.ExcludedFiles)

checksum= ComputeFileChecksum("SHA256", joinpath(pkgdir(SPDX), "Project.toml"))
@test checksum isa SpdxChecksumV2
@test checksum.Hash == open(joinpath(pkgdir(SPDX), "Project.toml")) do f
return bytes2hex(sha256(f))
end

linktest_code= ComputePackageVerificationCode(joinpath(pkgdir(SPDX), "test", "test_package"))
@test issetequal(linktest_code.ExcludedFiles, ["dir_link", "file_link", "src/bad_link"])
end
1 change: 1 addition & 0 deletions test/test_package/dir_link
1 change: 1 addition & 0 deletions test/test_package/file1.txt
@@ -0,0 +1 @@
This is a file.
1 change: 1 addition & 0 deletions test/test_package/file2.txt
@@ -0,0 +1 @@
This is also a file
1 change: 1 addition & 0 deletions test/test_package/file_link
1 change: 1 addition & 0 deletions test/test_package/src/bad_link
1 change: 1 addition & 0 deletions test/test_package/src/file3.txt
@@ -0,0 +1 @@
This is a file as well.
