Verification Code and Checksum computation updates #43

Merged: 15 commits, Jan 30, 2024
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,12 @@
# CHANGELOG

## New Version
* After further review of the SPDX specification, updated the algorithm for computing a package verification code
* Replaced the function spdxchecksum() with ComputePackageVerificationCode() and ComputeFileChecksum()
* Resolved [#40](https://github.com/SamuraiAku/SPDX.jl/issues/40): Handling of symbolic links when computing the package verification code
* Resolved [#29](https://github.com/SamuraiAku/SPDX.jl/issues/29): Support checksum calculation on a single file
* Resolved [#28](https://github.com/SamuraiAku/SPDX.jl/issues/28): Use the Logging standard library to record all the files processed and their checksums

## v0.3.2
* Add lots of tests to improve Code Coverage

1 change: 1 addition & 0 deletions Project.toml
@@ -10,6 +10,7 @@ JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
SHA = "ea8e919c-243c-51af-8825-aaa63cd721ce"
TimeZones = "f269a46b-ccf7-5d73-abea-4c690281aa53"
UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"

[compat]
DataStructures = "0.18"
23 changes: 20 additions & 3 deletions README.md
@@ -62,11 +62,28 @@ updatenamespace!(myDoc) # Updates only the UUID portion of the namespace

setcreationtime!(myDoc) # Sets document creation time to the local time, taking the local timezone into account

# Compute a verification code or checksum of a directory [Clauses 7.9, 7.10]
# Compute a verification code of a package directory [Clause 7.9]
# Returns object of type SpdxPkgVerificationCodeV2
# NOTES:
# Files that are excluded by name are included in the ExcludedFiles property
# Symbolic links are automatically excluded from the computation and included in the ExcludedFiles property, unless the links are inside an excluded directory or pattern
# Directories and excluded patterns (not shown in example below) are NOT included in the ExcludedFiles property, because these are temporary or version control locations that are not part of the released package.
#
# Example Call: Compute a verification code that ignores a specific file and a .git directory at the root level. A common usage pattern.
verif_code= ComputePackageVerificationCode("/path/to/pkgdir", ["IgnoreThisFile.spdx.json"], [".git"])
# Example Return:
# e0b4c73534bc495ebf43effa633b424d52899183 (excludes: IgnoreThisFile.spdx.json link_to_file)
# Logging:
# If the minimum logging level is set to -100 or lower, a full file listing is logged along with the hash of each file for user review. See the documentation of the Julia Logging standard library for details.


# Compute the checksum of a package tarball [Clause 7.10]
# Returns object of type SpdxChecksumV2
# Supported checksum algorithms are:
# ["SHA1", "SHA224", "SHA256", "SHA384", "SHA512", "SHA3-256", "SHA3-384", "SHA3-512"]
spdxchecksum("SHA1", "/path/to/dir", ["IgnoreThisFile.spdx.json"], [".git"]) # Compute a checksum that ignores a specific file and a .git directory at the root level. A common usage pattern.

file_cksum= ComputeFileChecksum("SHA256", "/path/to/package.tar.gz")
# Example Return:
# SHA256: 4b1dfe7b8886825527a362ee37244a665a32f68d9e7ca53c521dfec9ae8cd41a
```
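
The logging note in the example above can be tried out with the Logging standard library alone. The following is a minimal sketch, not part of the package: `demo_hash_log` is a hypothetical stand-in that emits one record at level -100, in the same way the README describes for the per-file hash listing. In real use the body of the `with_logger` block would be a call such as `ComputePackageVerificationCode(...)`.

```julia
using Logging

# Hypothetical stand-in for a package call; it only emits one record
# at the custom level -100, which is below Info (0) and Debug (-1000 is lower still).
function demo_hash_log(path::AbstractString)
    @logmsg Logging.LogLevel(-100) "sha1($path)= <hash would appear here>"
    return nothing
end

# Capture everything at level -100 or higher into a buffer.
# Writing to stderr instead of a buffer works the same way.
buf = IOBuffer()
with_logger(SimpleLogger(buf, Logging.LogLevel(-100))) do
    demo_hash_log("/path/to/pkgdir/file1.txt")
end
captured = String(take!(buf))
print(captured)
```

A logger whose minimum level is left at the default (`Info`) would silently drop these records, which is why the minimum level has to be lowered explicitly.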

## SPDX Document Structure
1 change: 1 addition & 0 deletions src/SPDX.jl
@@ -8,6 +8,7 @@ using UUIDs
using TimeZones
using SHA
using Base.Filesystem
using Logging

#######################
Base.Bool(x::AbstractString)= parse(Bool, lowercase(x))
92 changes: 61 additions & 31 deletions src/checksums.jl
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: MIT

export spdxchecksum
export ComputePackageVerificationCode, ComputeFileChecksum

function spdxchecksum(algorithm::AbstractString, rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}= String[], excluded_dirlist::Vector{<:AbstractString}= String[], excluded_patterns::Vector{Regex}=Regex[])
function determine_checksum_algorithm(algorithm::AbstractString)
# Check to see if algorithm is in the list of supported algorithms, unsupported algorithms, or not recognized
# TODO: substitute "_" for "-" and other things to account for user typos
supported_algorithms= Set(["SHA1", "SHA224", "SHA256", "SHA384", "SHA512", "SHA3-256", "SHA3-384", "SHA3-512"])
@@ -12,42 +12,43 @@ function spdxchecksum(algorithm::AbstractString, rootdir::AbstractString, exclud
issubset(Set([algorithm]), unsupported_algorithms) && error("checksum(): The hash algorithm $(algorithm) is not supported by SPDX.jl")
issubset(Set([algorithm]), supported_algorithms) || error("checksum(): algorithm $(algorithm) is not recognized")

HashFunction, HashContext= (algorithm == "SHA1") ? (sha1, SHA1_CTX) :
(algorithm == "SHA224") ? (sha224, SHA224_CTX) :
(algorithm == "SHA256") ? (sha256, SHA256_CTX) :
(algorithm == "SHA384") ? (sha384, SHA384_CTX) :
(algorithm == "SHA512") ? (sha512, SHA256_CTX) :
(algorithm == "SHA3-256") ? (sha3_256, SHA3_256_CTX) :
(algorithm == "SHA3-384") ? (sha3_384, SHA3_384_CTX) :
(sha3_512, SHA3_512_CTX)
HashFunction= (algorithm == "SHA1") ? sha1 :
(algorithm == "SHA224") ? sha224 :
(algorithm == "SHA256") ? sha256 :
(algorithm == "SHA384") ? sha384 :
(algorithm == "SHA512") ? sha512 :
(algorithm == "SHA3-256") ? sha3_256 :
(algorithm == "SHA3-384") ? sha3_384 :
sha3_512

package_hash::Vector{UInt8}= spdxchecksum_sha(HashFunction, HashContext, rootdir, excluded_flist, excluded_dirlist, excluded_patterns)

return package_hash
return HashFunction
end

function spdxchecksum_sha(HashFunction::Function, HashContext::DataType, rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
flist_hash::Vector{Vector{UInt8}}= [file_hash(file, HashFunction) for file in getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns)]
flist_hash= sort(flist_hash)

ctx= HashContext()
for hash in flist_hash
SHA.update!(ctx, hash)
end

return SHA.digest!(ctx)
###############################
function spdxverifcode(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
ignored_files= String[]
flist_hash::Vector{String}= [file_hash(file, sha1) for file in getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files)]
flist_hash= sort(flist_hash)
combined_hashes= join(flist_hash)
return (sha1(combined_hashes), ignored_files)
end


###############################
file_hash(fpath::AbstractString, HashFunction::Function)= open(fpath) do f
return HashFunction(f)
hash= HashFunction(f)
@logmsg Logging.LogLevel(-100) "$(string(HashFunction))($fpath)= $(bytes2hex(hash))"
return bytes2hex(hash)
end


###############################
function getpackagefiles(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
return Channel{String}(chnl -> _getpackagefiles(chnl, rootdir, excluded_flist, excluded_dirlist, excluded_patterns))
function getpackagefiles(rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files)
return Channel{String}(chnl -> _getpackagefiles(chnl, rootdir, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files))
end

function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex})
function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:AbstractString}, excluded_dirlist::Vector{<:AbstractString}, excluded_patterns::Vector{Regex}, ignored_files::Vector{String})
# On first call of this function put an absolute path on root and exclusion lists
isabspath(root) || (root= abspath(root))
all(isabspath.(excluded_flist)) || (excluded_flist= normpath.(joinpath.(root, excluded_flist)))
@@ -61,17 +62,46 @@ function _getpackagefiles(chnl, root::AbstractString, excluded_flist::Vector{<:A
for path in content
if isdir(path)
if any(excluded_dirlist .== path)
continue # Skip over excluded directories
@logmsg Logging.LogLevel(-100) "Skipping Directory $path"
elseif islink(path)
push!(ignored_files, path)
@logmsg Logging.LogLevel(-100) "Excluding symbolic link $path"
else
_getpackagefiles(chnl, path, excluded_flist, excluded_dirlist, excluded_patterns) # Descend into the directory and get the files there
_getpackagefiles(chnl, path, excluded_flist, excluded_dirlist, excluded_patterns, ignored_files) # Descend into the directory and get the files there
end
elseif any(excluded_flist .== path)
continue # Skip over excluded files
@logmsg Logging.LogLevel(-100) "Excluding File $path"
push!(ignored_files, path)
elseif any(occursin.(excluded_patterns, path))
continue # Skip files that match one of the excluded patterns
@logmsg Logging.LogLevel(-100) "Ignoring $path which matches an excluded pattern" pattern_regexes= excluded_patterns
elseif islink(path)
@logmsg Logging.LogLevel(-100) "Excluding symbolic link $path"
push!(ignored_files, path) # Any link that passes the previous checks is a part of the deployed code and its exclusion from the computation needs to be noted
else
push!(chnl, path) # Put the file path in the channel
push!(chnl, path) # Put the file path in the channel. Then block until it is taken
end
end
return nothing
end


###############################
function ComputePackageVerificationCode(rootdir::AbstractString, excluded_flist::Vector{<:AbstractString}= String[], excluded_dirlist::Vector{<:AbstractString}= String[], excluded_patterns::Vector{Regex}=Regex[])
@logmsg Logging.LogLevel(-50) "Computing Verification Code at: $rootdir" excluded_flist= excluded_flist excluded_dirlist= excluded_dirlist excluded_patterns= excluded_patterns
package_hash, ignored_files= spdxverifcode(rootdir, excluded_flist, excluded_dirlist, excluded_patterns)
ignored_files= relpath.(ignored_files, rootdir)
verif_code= SpdxPkgVerificationCodeV2(bytes2hex(package_hash), ignored_files)
@logmsg Logging.LogLevel(-50) "Verification Code= $(string(verif_code))"
return verif_code
end


###############################
function ComputeFileChecksum(algorithm::AbstractString, filepath::AbstractString)
@logmsg Logging.LogLevel(-50) "Computing File Checksum on $filepath"
HashFunction= determine_checksum_algorithm(algorithm)
fhash= file_hash(filepath, HashFunction)
checksum_obj= SpdxChecksumV2(algorithm, fhash)
@logmsg Logging.LogLevel(-50) string(checksum_obj)
return checksum_obj
end
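
The core of the updated verification code computation in `spdxverifcode()` is: take the SHA1 of each file, convert each hash to a hex string, sort the hex strings, concatenate them, and take the SHA1 of the result. The following is a minimal standalone sketch of just those steps. The helper names `sha1_hex` and `verification_code_sketch` are hypothetical, and the sketch leaves out the exclusion lists, symbolic link handling, and logging that the real function performs.

```julia
using SHA

# Hypothetical helper (not part of SPDX.jl): hex SHA1 of one file's contents.
sha1_hex(path::AbstractString) = open(io -> bytes2hex(sha1(io)), path)

# Hypothetical helper: combine per-file hashes following the steps above,
# with no exclusion lists or symbolic link handling.
function verification_code_sketch(files::Vector{String})
    file_hashes = sort([sha1_hex(f) for f in files])  # sort the hex strings
    return bytes2hex(sha1(join(file_hashes)))          # hash the concatenation
end

# Build a throwaway two-file "package" and hash it in both listing orders.
code_ab, code_ba = mktempdir() do dir
    a = joinpath(dir, "file1.txt")
    b = joinpath(dir, "file2.txt")
    write(a, "This is a file.")
    write(b, "This is also a file")
    (verification_code_sketch([a, b]), verification_code_sketch([b, a]))
end

println(code_ab)   # a 40-character hex string
```

Because the hex strings are sorted before they are joined, the result does not depend on the order in which the directory walk yields the files.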
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -5,6 +5,7 @@ using Test
using JSON
using Dates
using TimeZones
using SHA

@testset "Bool check" begin
@test Bool(" True ")
14 changes: 12 additions & 2 deletions test/test_checksums.jl
@@ -1,4 +1,14 @@
@testset "checksums" begin
checksum= spdxchecksum("SHA1", pkgdir(SPDX), String["SPDX.spdx.json"], String[".git"])
@test checksum isa Vector{UInt8} # No good way to independently verify that the calculation is correct.
verifcode= ComputePackageVerificationCode(pkgdir(SPDX), String["SPDX.spdx.json"], String[".git"])
@test verifcode isa SpdxPkgVerificationCodeV2 # No good way to independently verify that the calculation is correct.
@test issubset(["SPDX.spdx.json"], verifcode.ExcludedFiles)

checksum= ComputeFileChecksum("SHA256", joinpath(pkgdir(SPDX), "Project.toml"))
@test checksum isa SpdxChecksumV2
@test checksum.Hash == open(joinpath(pkgdir(SPDX), "Project.toml")) do f
return bytes2hex(sha256(f))
end

linktest_code= ComputePackageVerificationCode(joinpath(pkgdir(SPDX), "test", "test_package"))
@test issetequal(linktest_code.ExcludedFiles, ["dir_link", "file_link", "src/bad_link"])
end
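
The tests above compare the file checksum against a second call into the same SHA library. One possible complement, not used in this PR, is to hash a file with known contents and compare against a published test vector. The sketch below assumes a hypothetical helper `hex_digest` and uses the widely published SHA1 and SHA256 digests of the three-byte input "abc". It exercises only the SHA standard library, not `ComputeFileChecksum()` itself.

```julia
using SHA

# Published digests of the three-byte ASCII input "abc".
abc_sha1   = "a9993e364706816aba3e25717850c26c9cd0d89d"
abc_sha256 = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# Hypothetical helper (not part of SPDX.jl): hex digest of a file's contents.
hex_digest(hashfn::Function, path::AbstractString) = open(io -> bytes2hex(hashfn(io)), path)

# Write "abc" to a temporary file and compare both digests to the vectors.
sha1_ok, sha256_ok = mktempdir() do dir
    path = joinpath(dir, "abc.txt")
    write(path, "abc")   # no trailing newline
    (hex_digest(sha1, path) == abc_sha1, hex_digest(sha256, path) == abc_sha256)
end

println((sha1_ok, sha256_ok))
```

In a test suite the same comparison could be made against the `Hash` field of the object returned by `ComputeFileChecksum()`, assuming a fixture file with these contents were added.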
1 change: 1 addition & 0 deletions test/test_package/dir_link
1 change: 1 addition & 0 deletions test/test_package/file1.txt
@@ -0,0 +1 @@
This is a file.
1 change: 1 addition & 0 deletions test/test_package/file2.txt
@@ -0,0 +1 @@
This is also a file
1 change: 1 addition & 0 deletions test/test_package/file_link
1 change: 1 addition & 0 deletions test/test_package/src/bad_link
1 change: 1 addition & 0 deletions test/test_package/src/file3.txt
@@ -0,0 +1 @@
This is a file as well.
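
The `test_package` fixture contains a link to a directory, a link to a file, and a broken link. The sketch below rebuilds a similar layout in a temporary directory and classifies each entry. The helper `classify` is hypothetical and not part of SPDX.jl. It illustrates why a directory walk needs an explicit `islink` check: `isdir` and `isfile` follow links, so a link to a directory also reports as a directory, and a broken link reports as neither. The sketch assumes a platform where creating symbolic links is permitted.

```julia
# Hypothetical helper: classify a path, testing for a link first because
# isdir() and isfile() follow symbolic links.
function classify(path::AbstractString)
    islink(path) && return :symlink
    isdir(path)  && return :directory
    isfile(path) && return :file
    return :other
end

results, dir_link_isdir = mktempdir() do dir
    write(joinpath(dir, "file1.txt"), "This is a file.")
    mkdir(joinpath(dir, "src"))
    symlink(joinpath(dir, "file1.txt"), joinpath(dir, "file_link"))
    symlink(joinpath(dir, "src"), joinpath(dir, "dir_link"))
    symlink(joinpath(dir, "does_not_exist"), joinpath(dir, "src", "bad_link"))

    names = ["file1.txt", "src", "file_link", "dir_link", joinpath("src", "bad_link")]
    classes = Dict(name => classify(joinpath(dir, name)) for name in names)
    (classes, isdir(joinpath(dir, "dir_link")))
end

for (name, kind) in results
    println(name, " => ", kind)
end
```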