Resolved #34 - Option to make the package server the download location instead of the GitHub repo #35

Merged
merged 11 commits into from
Jul 19, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# CHANGELOG

## v0.1.12
* Resolved [#34](https://github.com/SamuraiAku/PkgToSoftwareBOM.jl/issues/34), Option to make the package server the download location instead of the GitHub repo

## v0.1.11
* Resolved [#18](https://github.com/SamuraiAku/PkgToSoftwareBOM.jl/issues/18), Put a package's git tree hash in the Download Location
* Pulled out some trailing whitespace in information fields
1 change: 1 addition & 0 deletions Project.toml
@@ -12,6 +12,7 @@ RegistryInstances = "2792f1a3-b283-48e8-9a74-f99dce5104f3"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
LicenseCheck = "726dbf0d-6eb6-41af-b36c-cd770e0f00cc"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
Downloads= "f43a241f-c20a-4ad4-852c-f6b1247861c6"

[compat]
SPDX = "0.4"
25 changes: 25 additions & 0 deletions README.md
@@ -188,6 +188,31 @@ writespdx(sbom, "path/to/package/source/MyPackageName.spdx.json")

One case that PkgToSoftwareBOM does not support properly today is when a previous version of the developer's package does not exist in the registry. In that case, the SBOM will list the path to the local copy of the package code, instead of the URL of the repository. This may be fixed in a later version.

## Optional Modes
`spdxCreationData()` accepts optional keywords that modify the contents of the SBOM in ways that are useful in particular situations.

### Use a package server as the DownloadLocation
The package developer's GitHub (or other) repository is the canonical source for the package code. By default, this repository is used to populate the field DownloadLocation in each package description.

In everyday use, however, very few people actually download from there. Instead, Pkg defaults to the package server maintained by JuliaLang (https://pkg.julialang.org) or to another package server specified by `ENV["JULIA_PKG_SERVER"]`. A package server maintains compressed tarballs of released source code for packages tracked by a registry. Arguably, an accurate SBOM should record the location the code was actually downloaded from.

Also, not every analyst will find it useful to be directed to the repository and then have to work out how to use git to extract the correct version. A direct download location may be easier for them.

To change the DownloadLocation to the package server, set the keyword `use_packageserver` when creating a `spdxCreationData` object:

```julia
spdxCreationData(use_packageserver= true)
```

When this keyword is set, PkgToSoftwareBOM checks whether each package has a valid package server URL and uses it if available. If the JuliaLang package server is used, the package Supplier field is updated to reflect that:
```
Organization: JuliaLang ()
```

If a valid package server URL cannot be determined, then the repository link will be used.

In all cases, the repository URL is documented in the HomePage field of the package description.
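
The selection rule described above can be sketched in a few lines. This is a minimal, standalone illustration and not the package's actual code: the helper names `normalize_server` and `download_location`, the example repository URL, and the tree hash are hypothetical. It assumes the package server URL has the form `<server>/package/<uuid>/<tree hash>` and that the fallback is the `git+<repo>@<tree hash>` form used for repository download locations.

```julia
using UUIDs

# Hypothetical helper: normalize a server name (add a scheme, drop a trailing slash).
# An empty string means no package server is configured.
function normalize_server(server::AbstractString)
    isempty(server) && return nothing
    startswith(server, r"\w+://") || (server = "https://$server")
    return String(rstrip(server, '/'))
end

# Hypothetical helper: choose the download location for one package.
# `registry_on_server` stands in for the check that the package's registry
# is actually served by the package server.
function download_location(repo_url::AbstractString, uuid::UUID, tree_hash::AbstractString;
                           server::AbstractString = "https://pkg.julialang.org",
                           registry_on_server::Bool = true)
    base = normalize_server(server)
    if isnothing(base) || !registry_on_server
        # No usable package server URL: fall back to the repository link
        return "git+$repo_url@$tree_hash"
    end
    return "$base/package/$uuid/$tree_hash"
end

# Illustrative values only: the repository URL and tree hash are made up
uuid = UUID("f43a241f-c20a-4ad4-852c-f6b1247861c6")
hash = "0123456789abcdef0123456789abcdef01234567"
repo = "https://github.com/example/Example.jl.git"

println(download_location(repo, uuid, hash))
println(download_location(repo, uuid, hash; registry_on_server = false))
```

The first call prints a package server URL and the second prints the repository form, mirroring the fallback behavior described above.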

## How does PkgToSoftwareBOM support multiple registries?

The majority of users and developers only ever use the General registry, and that is where PkgToSoftwareBOM looks for package information by default.
3 changes: 3 additions & 0 deletions src/PkgToSoftwareBOM.jl
@@ -11,6 +11,7 @@ using Artifacts
using RegistryInstances
using Base.BinaryPlatforms
using Logging
using Downloads

export spdxCreationData, spdxPackageInstructions

@@ -25,6 +26,7 @@ Base.@kwdef struct PackageRegistryInfo
packageURL::String
packageSubdir::String
packageTreeHash::Union{String, Nothing}
packageserverURL::Union{String, Nothing}

# It would be nice to add these fields, but first have to figure out how to resolve version ranges
#packageCompatibility::Dict{String, Any}
@@ -64,6 +66,7 @@ Base.@kwdef struct spdxCreationData
rootpackages::Dict{String, Base.UUID}= Pkg.project().dependencies
packageInstructions::Dict{UUID, spdxPackageInstructions}= Dict{UUID, spdxPackageInstructions}()
licenseScan::Bool= true
use_packageserver::Bool= false
end

include("Registry.jl")
96 changes: 89 additions & 7 deletions src/Registry.jl
@@ -2,15 +2,21 @@

###############################
# Think of a name that would be good fit for the Pkg API
function registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, registries::Vector{<:AbstractString})
function registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, registries::Vector{<:AbstractString}, use_packageserver::Bool)
if use_packageserver
server_registry_info= pkg_server_registry_info()
else
server_registry_info= nothing
end

if length(registries) == 1
return _registry_packagequery(packages, registries[1])
return _registry_packagequery(packages, registries[1], server_registry_info)
end

registry_pkg= Dict{UUID, Union{Nothing, Missing, PackageRegistryInfo}}()
querylist= packages
for reg in registries
reglist= _registry_packagequery(querylist, reg)
reglist= _registry_packagequery(querylist, reg, server_registry_info)
registry_pkg= merge(registry_pkg, reglist)
emptykeys= keys(filter(p-> isnothing(p.second) || ismissing(p.second), registry_pkg))
querylist= Dict{UUID, Pkg.API.PackageInfo}(k => packages[k] for k in emptykeys)
@@ -19,7 +25,7 @@ function registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, regist
end

###############################
function _registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, registry::AbstractString)
function _registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, registry::AbstractString, server_registry_info)
#Get the requested registry
active_regs= reachable_registries()
selected_registry= nothing
@@ -35,13 +41,24 @@ function _registry_packagequery(packages::Dict{UUID, Pkg.API.PackageInfo}, regis
end
println("""Using registry "$(selected_registry.name)" @ $(selected_registry.path)""")

registry_pkg= Dict{Base.UUID, Union{Nothing, Missing, PackageRegistryInfo}}(k => populate_registryinfo(k, packages[k], selected_registry) for k in keys(packages))
if isnothing(server_registry_info)
packageserver= nothing
else
server, registry_info = server_registry_info
if selected_registry.uuid in keys(registry_info)
packageserver= "$server/package"
else
packageserver= nothing
end
end

registry_pkg= Dict{Base.UUID, Union{Nothing, Missing, PackageRegistryInfo}}(k => populate_registryinfo(k, packages[k], selected_registry, packageserver) for k in keys(packages))

return registry_pkg
end

###############################
function populate_registryinfo(uuid::UUID, package::Pkg.API.PackageInfo, registry::RegistryInstance)
function populate_registryinfo(uuid::UUID, package::Pkg.API.PackageInfo, registry::RegistryInstance, packageserver::Union{String, Nothing})
package.is_tracking_repo && return nothing
is_stdlib(uuid) && return nothing

@@ -70,6 +87,8 @@ function populate_registryinfo(uuid::UUID, package::Pkg.API.PackageInfo, registr
tree_hash= haskey(registryPkgData.version_info, package.version) ? treehash(registryPkgData, package.version) : nothing
package.is_tracking_registry && string(tree_hash) !== package.tree_hash && error("Tree hash of $(package.name) v$(string(package.version)) does not match registry: $(string(package.tree_hash)) (Package) vs. $(treehash(registryPkgData, package.version)) (Registry)")

packageserverURL= isnothing(packageserver) ? nothing : packageserver * "/$(uuid)/$(package.tree_hash)"

pkgRegInfo= PackageRegistryInfo(;
registryName= registry.name,
registryURL= registry.repo,
@@ -80,8 +99,71 @@ function populate_registryinfo(uuid::UUID, package::Pkg.API.PackageInfo, registr
packageVersion= package.version,
packageURL= registryPkgData.repo,
packageSubdir= packageSubdir,
packageTreeHash= string(tree_hash)
packageTreeHash= string(tree_hash),
packageserverURL= packageserverURL
)

return pkgRegInfo
end

################################
## The code below has been copied from Julia Package Manager v1.10.4 and modified as needed
## https://github.com/JuliaLang/Pkg.jl/tree/v1.10.4
#
# Copyright (c) 2017-2021: Stefan Karpinski, Kristoffer Carlsson, Fredrik Ekre, David Varela, Ian Butterworth, and contributors:
# https://github.com/JuliaLang/Pkg.jl/graphs/contributors
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

function pkg_server()
server = get(ENV, "JULIA_PKG_SERVER", "https://pkg.julialang.org")
isempty(server) && return nothing
startswith(server, r"\w+://") || (server = "https://$server")
return rstrip(server, '/')
end

################################
function pkg_server_registry_info()
registry_info = Dict{UUID, Base.SHA1}()
server = pkg_server()
server === nothing && return nothing
tmp_path = tempname()
download_ok = false
try
f = retry(delays = fill(1.0, 3)) do
Downloads.download("$server/registries", tmp_path)
end
f()
download_ok = true
catch err
@warn "Could not download $server/registries, unable to fill in package server URLs" exception=err
end
download_ok || return nothing
open(tmp_path) do io
for line in eachline(io)
if (m = match(r"^/registry/([^/]+)/([^/]+)$", line)) !== nothing
uuid = UUID(m.captures[1]::SubString{String})
hash = Base.SHA1(m.captures[2]::SubString{String})
registry_info[uuid] = hash
end
end
end
Base.rm(tmp_path, force=true)
return server, registry_info
end
18 changes: 15 additions & 3 deletions src/packageInfo.jl
@@ -10,10 +10,22 @@ function resolve_pkgsource!(package::SpdxPackageV2, packagedata::Pkg.API.Package

if packagedata.is_tracking_registry
# Simplest and most common case is if you are tracking a registered package
package.DownloadLocation= SpdxDownloadLocationV2("git+$(registrydata.packageURL)@$(packagedata.tree_hash)$(isempty(registrydata.packageSubdir) ? "" : "#"*registrydata.packageSubdir)")
repo_download= SpdxDownloadLocationV2("git+$(registrydata.packageURL)@$(packagedata.tree_hash)$(isempty(registrydata.packageSubdir) ? "" : "#"*registrydata.packageSubdir)")

if isnothing(registrydata.packageserverURL)
package.DownloadLocation= repo_download
package.SourceInfo= "Download Location is supplied by the $(registrydata.registryName) registry:\n$(registrydata.registryURL)"
package.SourceInfo= package.SourceInfo * "\nThe hash supplied in Download Location is not the typical git commit hash. Instead it is a git tree hash. The easiest way to retrieve this version from the cloned repository is to use the command:\ngit archive --output=path/to/archive.tar <tree hash>"
else
package.DownloadLocation= SpdxDownloadLocationV2(registrydata.packageserverURL)
if startswith(registrydata.packageserverURL, "https://pkg.julialang.org/")
package.Supplier= SpdxCreatorV2("Organization", "JuliaLang", "")
else
package.Supplier= SpdxCreatorV2("NOASSERTION")
end
package.SourceInfo= "Download is a compressed tarball, supplied from a package server, rather than the package source repository."
end
package.HomePage= registrydata.packageURL
package.SourceInfo= "Source Code Location is supplied by the $(registrydata.registryName) registry:\n$(registrydata.registryURL)"
package.SourceInfo= package.SourceInfo * "\nThe hash supplied in Download Location is not the typical git commit hash. Instead it is a git tree hash. The easiest way to retrieve this version from the cloned repository is to use the command:\ngit archive --output=path/to/archive.tar <tree hash>"
elseif packagedata.is_tracking_repo
# Next simplest case is if you are directly tracking a repository
# TODO: Extract the subdirectory information if it exists. Can't find it in packagedata.
2 changes: 1 addition & 1 deletion src/spdxBuild.jl
@@ -17,7 +17,7 @@ sbom= generateSPDX(spdxCreationData(), ["PrivateRegistry", "General"]);
"""
function generateSPDX(docData::spdxCreationData= spdxCreationData(), sbomRegistries::Vector{<:AbstractString}= ["General"], envpkgs::Dict{Base.UUID, Pkg.API.PackageInfo}= Pkg.dependencies())
# Query the registries for package information
registry_packages= registry_packagequery(envpkgs, sbomRegistries)
registry_packages= registry_packagequery(envpkgs, sbomRegistries, docData.use_packageserver)

packagebuilddata= spdxPackageData(targetplatform= docData.TargetPlatform, packages= envpkgs, registrydata= registry_packages, packageInstructions= docData.packageInstructions, licenseScan= docData.licenseScan)

19 changes: 19 additions & 0 deletions test/runtests.jl
@@ -41,6 +41,20 @@ using Base.BinaryPlatforms
# Verify that the package in the SBOM did not choose that version for the SBOM
DataStructuresPkg= filter(p-> occursin("SPDXRef-DataStructures", p.SPDXID), sbom.Packages)
@test VersionNumber(DataStructuresPkg[1].Version) >= v"0.18"

# Use the package server for downloads
sbom_with_packageserver = generateSPDX(spdxCreationData(use_packageserver= true))
package_tree_hash= sbom.Packages[end-1].DownloadLocation.VCS_Tag
package_server_source= string(sbom_with_packageserver.Packages[end-1].DownloadLocation)
packageserver= PkgToSoftwareBOM.pkg_server() # Internal function
@test startswith(package_server_source, packageserver)
@test endswith(package_server_source, package_tree_hash)

# Try with a package server that doesn't exist
ENV["JULIA_PKG_SERVER"]= "https://pkg.nowhere.org"
sbom_with_packageserver = generateSPDX(spdxCreationData(use_packageserver= true))
@test sbom.Packages[end-1].DownloadLocation == sbom_with_packageserver.Packages[end-1].DownloadLocation
delete!(ENV, "JULIA_PKG_SERVER")
end

@testset "README.md examples: Developer" begin
@@ -173,6 +187,11 @@ using Base.BinaryPlatforms
@test sbom.Packages[idx].DownloadLocation == p.second.DownloadLocation
@test sbom.Packages[idx].HomePage == p.second.HomePage
end

## Regenerate the SBOM trying to use the package server. Since none of these packages are in the package server
# the download locations should be unchanged
sbom2= generateSPDX(spdxCreationData(rootpackages= filter(p-> (p.first in ["Dummy4"]), Pkg.project().dependencies), use_packageserver= true), ["DummyRegistry", "General"]);
@test issetequal(sbom.Packages, sbom2.Packages)
end

# Remove registry