Commit

Merge pull request #2825 from nexB/2098-top-level-packages
Add Package Instances #2691

This PR adds the PackageInstance class and functions to group package
manifests and package data as top level package instances.

Existing package data are ported to this new approach.

Reference: #2098 
Reference: #2691 
Reference: #2692
Reference: #2693
Reference: #2843 
Reference: #2652 
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne authored Mar 5, 2022
2 parents 1dc4b61 + 376abc6 commit e080f83
Showing 1,155 changed files with 47,981 additions and 34,234 deletions.
53 changes: 39 additions & 14 deletions CHANGELOG.rst
@@ -14,7 +14,7 @@ Important API changes:
instead under the ``venv`` subdirectory.

- Main package API function `get_package_infos` is now deprecated, and is
replaced by `get_package_manifests`.
replaced by `get_package_data`.

- The data structure of the JSON output has changed for copyrights, authors
and holders: we now use proper name for attributes and not a generic "value".
@@ -27,10 +27,18 @@ Important API changes:
as an option.

- The data structure of the JSON output has changed for packages: we now
return "package_manifests" package information at the manifest file-level
rather than "packages". There is a new top-level "packages" attribute
that contains each package instance that can be aggregating data from
multiple manifests for a single package instance.
return "package_data" package information at the manifest file-level
rather than "packages". This has all the data attributes of a "package_data"
field plus others: "package_uuid", "package_data_files" and "files".

- There is a new top-level "packages" attribute that contains package
instances that can be aggregating data from multiple manifests.

- There is a new top-level "dependencies" attribute that contains each dependency
instance; these can be standalone or related to a package.

- There is a new resource-level attribute "for_packages" which refers to packages
through package_uuids (pURL + uuid string).

- The data structure for HTML output has been changed to include emails and
urls under the "infos" object. Now HTML template will output holders,
@@ -136,17 +144,31 @@ Package detection:
- Yocto/BitBake .bb recipes.

- Major changes in packages detection and reporting, codebase-level attribute `packages`
with one or more "package_manifests" and files for the packages are reported.
with one or more `package_data` and files for the packages are reported.
The specific changes made are:

- The resource level attribute `packages` has been renamed to `package_manifests`,
as these are really package manifests that are being detected.
- The resource level attribute `packages` has been renamed to `package_data`,
as these are really package data that are being detected, and can be manifests,
lockfiles or other package data. This has all the data attributes of a `package_data`
field plus others: `package_uuid`, `package_data_files` and `files`.


- A new top-level attribute `packages` has been added which contains package
instances created from package_manifests detected in the codebase.
instances created from `package_data` detected in the codebase.

- A new codebase level attribute `dependencies` has been added which contains dependency
instances created from lockfiles detected in the codebase.

- A new codebase level attribute `packages` has been added which contains package
instances created from package_manifests detected in the codebase.
- The package attribute `root_path` has been deleted from `package_data` in favour
of the new format where there is no root conceptually, just a list of files for each
package.

- There is a new resource-level attribute `for_packages` which refers to packages
through package_uuids (pURL + uuid string).

- The package_data attribute `dependencies` (which is a list of DependentPackages),
now has a new attribute `resolved_package` having a package data mapping.
Also the `requirement` attribute here is renamed to `extracted_requirement`.


Outputs:
@@ -159,16 +181,19 @@ Outputs:
Output version
--------------

Scancode Data Output Version is now 2.0.0.
Scancode Data Output Version is now 3.0.0.

Changes:

- rename resource level attribute `packages` to `package_manifests`.
- rename resource level attribute `packages` to `package_data`.
- add top-level attribute `packages`.

- add top-level attribute `dependencies`.
- add resource-level attribute `for_packages`.
- remove `package_data` attribute `root_path`.

Documentation Update
~~~~~~~~~~~~~~~~~~~~~~~~

- Various documentation has been updated to reflect API changes and
correct minor documentation issues.
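
As a rough illustration of the output changes listed in this changelog, the sketch below groups scanned files by the package they belong to, using the new resource-level `for_packages` attribute. The sample data is entirely hypothetical: the top-level `files` key, the `path` key, the shape of the `package_uuid` strings and the `files_by_package` helper are assumptions made for illustration, not taken from actual ScanCode output.

```python
from collections import defaultdict

# Hypothetical, heavily trimmed scan result that uses the attribute names
# from the changelog. All values are made up for illustration.
scan = {
    "packages": [
        {"package_uuid": "pkg:npm/example@1.0.0?uuid=1234", "name": "example"},
    ],
    "dependencies": [],
    "files": [
        {
            "path": "example/package.json",
            "package_data": [{"type": "npm", "name": "example"}],
            "for_packages": ["pkg:npm/example@1.0.0?uuid=1234"],
        },
        {
            "path": "example/index.js",
            "package_data": [],
            "for_packages": ["pkg:npm/example@1.0.0?uuid=1234"],
        },
        {"path": "README", "package_data": [], "for_packages": []},
    ],
}


def files_by_package(scan_results):
    """Map each package_uuid to the paths of the files that refer to it."""
    grouped = defaultdict(list)
    for resource in scan_results.get("files", []):
        for package_uuid in resource.get("for_packages", []):
            grouped[package_uuid].append(resource["path"])
    return dict(grouped)


print(files_by_package(scan))
```

Files that belong to no package, such as `README` in this sample, simply do not appear in the result.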

6 changes: 3 additions & 3 deletions src/formattedcode/output_csv.py
@@ -194,7 +194,7 @@ def collect_keys(mapping, key_group):
collect_keys(url_info, 'url')
yield url_info

for package in scanned_file.get('package_manifests', []):
for package in scanned_file.get('package_data', []):
flat = flatten_package(package, path)
collect_keys(flat, 'package')
yield flat
@@ -229,7 +229,7 @@ def get_package_columns(_columns=set()):
if _columns:
return _columns

from packagedcode.models import Package
from packagedcode.models import PackageData

# exclude some columns for now that contain list of items
excluded_columns = {
@@ -252,7 +252,7 @@ def get_package_columns(_columns=set()):
'notice_url',
]

fields = Package.fields() + extra_columns
fields = PackageData.fields() + extra_columns
_columns = set(f for f in fields if f not in excluded_columns)
return _columns
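
Code that consumes scan results may meet any of the three file-level attribute names that appear in this changelog: `packages`, `package_manifests` and now `package_data`. A minimal sketch of a tolerant reader follows. The helper name `get_file_level_packages` and the sample entries are hypothetical, and whether the intermediate `package_manifests` name ever shipped in a release is not stated here.

```python
def get_file_level_packages(scanned_file):
    """
    Return the list of package mappings detected for one scanned file,
    whichever attribute name the scan output used.
    """
    # Newest name first; older names are tried only as a fallback.
    for key in ("package_data", "package_manifests", "packages"):
        if key in scanned_file:
            return scanned_file[key] or []
    return []


# Hypothetical file entries, made up for illustration.
new_style = {"path": "a/bower.json", "package_data": [{"type": "bower"}]}
old_style = {"path": "a/bower.json", "packages": [{"type": "bower"}]}
no_packages = {"path": "a/README"}

for entry in (new_style, old_style, no_packages):
    print(entry["path"], get_file_level_packages(entry))
```

This applies to file-level entries only: in the new output the top-level `packages` attribute holds package instances, which are a different structure.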

2 changes: 1 addition & 1 deletion src/formattedcode/output_cyclonedx.py
@@ -268,7 +268,7 @@ def from_package(cls, package):
properties.append(
CycloneDxProperty(
name='WARNING',
value=f'WARNING: component skipped in CycloneDX output: {self!r}'
value=f'WARNING: component skipped in CycloneDX output: {package!r}'
)
)

4 changes: 2 additions & 2 deletions src/formattedcode/output_html.py
@@ -154,7 +154,7 @@ def generate_output(results, version, template):

LICENSES = 'licenses'
COPYRIGHTS = 'copyrights'
PACKAGES = 'package_manifests'
PACKAGES = 'package_data'

# Create a flattened data dict keyed by path
for scanned_file in results:
@@ -207,7 +207,7 @@ def generate_output(results, version, template):
files = {
'license_copyright': converted,
'infos': converted_infos,
'package_manifests': converted_packages
'package_data': converted_packages
}

return template.generate(files=files, licenses=licenses, version=version)
4 changes: 2 additions & 2 deletions src/formattedcode/templates/html/template.html
@@ -233,7 +233,7 @@
</table>
{% endif %}

{% if files.package_manifests %}
{% if files.package_data %}
<table>
<caption>Package Information</caption>
<thead>
@@ -245,7 +245,7 @@
</tr>
</thead>
<tbody>
{% for path, data in files.package_manifests.items() %}
{% for path, data in files.package_data.items() %}
{% for row in data %}
<tr>
<td>{{ path }}</td>
65 changes: 39 additions & 26 deletions src/packagedcode/__init__.py
@@ -18,6 +18,7 @@
from packagedcode import debian
from packagedcode import conda
from packagedcode import cocoapods
from packagedcode import cran
from packagedcode import freebsd
from packagedcode import golang
from packagedcode import haxe
@@ -40,15 +41,15 @@

# Note: the order matters: from the most to the least specific
# Package classes MUST be added to this list to be active
PACKAGE_MANIFEST_TYPES = [
PACKAGE_DATA_CLASSES = [
rpm.RpmManifest,
debian.DebianPackage,

models.JavaJar,
jar_manifest.JavaManifest,
models.JavaEar,
models.JavaWar,
maven.MavenPomPackage,
maven.PomXml,
jar_manifest.IvyJar,
models.JBossSar,
models.Axis2Mar,
@@ -102,38 +103,50 @@
build.BuckPackage,
build.AutotoolsPackage,
conda.Condayml,
win_pe.WindowsExecutableManifest,
win_pe.WindowsExecutable,
readme.ReadmeManifest,
build.MetadataBzl,
msi.MsiInstallerPackage,
windows.MicrosoftUpdateManifest,
pubspec.PubspecYaml,
pubspec.PubspecLock,
build_gradle.BuildGradle,
cran.DescriptionFile,
build_gradle.BuildGradle
]

PACKAGE_MANIFESTS_BY_TYPE = {
(
cls.package_manifest_type
if isinstance(cls, models.PackageManifest)
else cls.default_type
): cls
for cls in PACKAGE_MANIFEST_TYPES

PACKAGE_INSTANCE_CLASSES = [
rpm.RpmPackage,
maven.MavenPackage,
npm.NpmPackage,
phpcomposer.PhpPackage,
haxe.HaxePackage,
cargo.RustPackage,
cocoapods.CocoapodsPackage,
opam.OpamPackage,
bower.BowerPackage,
freebsd.FreebsdPackage,
rubygems.RubyPackage,
pypi.PythonPackage,
golang.GoPackage,
nuget.NugetPackage,
chef.ChefPackage,
win_pe.WindowsPackage,
pubspec.PubspecPackage,
cran.CranPackage
]


PACKAGE_DATA_BY_TYPE = {
cls.default_type: cls
for cls in PACKAGE_DATA_CLASSES
}


PACKAGE_INSTANCES_BY_TYPE = {
cls.default_type: cls
for cls in PACKAGE_INSTANCE_CLASSES
}
# We cannot have two package classes with the same type
if len(PACKAGE_MANIFESTS_BY_TYPE) != len(PACKAGE_MANIFEST_TYPES):
seen_types = {}
for pmt in PACKAGE_MANIFEST_TYPES:
manifest = pmt()
assert manifest.package_manifest_type
seen = seen_types.get(manifest.package_manifest_type)
if seen:
msg = ('Invalid duplicated packagedcode.Package types: '
'"{}:{}" and "{}:{}" have the same type.'
.format(manifest.package_manifest_type, manifest.__name__, seen.package_manifest_type, seen.__name__,))
raise Exception(msg)
else:
seen_types[manifest.package_manifest_type] = manifest


def get_package_class(scan_data, default=models.Package):
@@ -159,7 +172,7 @@ def get_package_class(scan_data, default=models.Package):
if not ptype:
# basic type for default package types
return default
ptype_class = PACKAGE_MANIFESTS_BY_TYPE.get(ptype)
ptype_class = PACKAGE_DATA_BY_TYPE.get(ptype)
return ptype_class or default
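
A dict comprehension keyed on `default_type`, like the `*_BY_TYPE` mappings in this file, keeps only the last class seen for each type. If an explicit check for duplicated types is wanted, it could look roughly like the sketch below. The `build_registry_by_type` helper and the `Fake*` classes are hypothetical stand-ins, not part of packagedcode.

```python
def build_registry_by_type(classes):
    """
    Return a mapping of default_type -> class, raising ValueError when two
    classes declare the same default_type.
    """
    registry = {}
    for cls in classes:
        ptype = cls.default_type
        seen = registry.get(ptype)
        if seen is not None:
            raise ValueError(
                f'Duplicated package type {ptype!r}: '
                f'{seen.__name__} and {cls.__name__}'
            )
        registry[ptype] = cls
    return registry


# Hypothetical stand-ins for package classes, for illustration only.
class FakeBower:
    default_type = 'bower'


class FakeNpm:
    default_type = 'npm'


class OtherBower:
    default_type = 'bower'


print(build_registry_by_type([FakeBower, FakeNpm]))

try:
    build_registry_by_type([FakeBower, OtherBower])
except ValueError as e:
    print(e)
```

Whether several classes are meant to share one type in the real lists is not something this sketch assumes either way; it only makes the collision visible instead of silent.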


6 changes: 3 additions & 3 deletions src/packagedcode/about.py
@@ -29,7 +29,7 @@
# TODO: Override get_package_resource so it returns the Resource that the ABOUT file is describing

@attr.s()
class AboutPackage(models.Package):
class AboutPackageData(models.PackageData):

default_type = 'about'

@@ -44,13 +44,13 @@ def get_package_root(self, manifest_resource, codebase):


@attr.s()
class Aboutfile(AboutPackage, models.PackageManifest):
class Aboutfile(AboutPackageData, models.PackageDataFile):

file_patterns = ('*.ABOUT',)
extensions = ('.ABOUT',)

@classmethod
def is_manifest(cls, location):
def is_package_data_file(cls, location):
"""
Return True if the file at ``location`` is likely a manifest of this type.
"""
6 changes: 3 additions & 3 deletions src/packagedcode/alpine.py
@@ -27,7 +27,7 @@


@attr.s()
class AlpinePackage(models.Package, models.PackageManifest):
class AlpinePackage(models.PackageData, models.PackageDataFile):
extensions = ('.apk', 'APKBUILD')
default_type = 'alpine'

@@ -40,7 +40,7 @@ def compute_normalized_license(self):
return detected

def to_dict(self, _detailed=False, **kwargs):
data = models.Package.to_dict(self, **kwargs)
data = super().to_dict(**kwargs)
if _detailed:
#################################################
data['installed_files'] = [istf.to_dict() for istf in (self.installed_files or [])]
@@ -891,7 +891,7 @@ def D_dependencies_handler(value, dependencies=None, **kwargs):
dependency = models.DependentPackage(
purl=purl,
scope=scope,
requirement=requirement,
extracted_requirement=requirement,
is_resolved=is_resolved,
)
if dependency not in dependencies:
24 changes: 19 additions & 5 deletions src/packagedcode/bower.py
@@ -30,7 +30,7 @@


@attr.s()
class BowerPackage(models.Package):
class BowerPackageData(models.PackageData):

default_type = 'bower'

@@ -43,13 +43,13 @@ def compute_normalized_license(self):


@attr.s()
class BowerJson(BowerPackage, models.PackageManifest):
class BowerJson(BowerPackageData, models.PackageDataFile):

file_patterns = ('bower.json', '.bower.json')
extensions = ('.json',)

@classmethod
def is_manifest(cls, location):
def is_package_data_file(cls, location):
"""
Return True if the file at ``location`` is likely a manifest of this type.
"""
@@ -114,7 +114,7 @@ def recognize(cls, location):
models.DependentPackage(
purl=PackageURL(type='bower', name=dep_name).to_string(),
scope='dependencies',
requirement=requirement,
extracted_requirement=requirement,
is_runtime=True,
is_optional=False,
)
@@ -126,7 +126,7 @@ def recognize(cls, location):
models.DependentPackage(
purl=PackageURL(type='bower', name=dep_name).to_string(),
scope='devDependencies',
requirement=requirement,
extracted_requirement=requirement,
is_runtime=False,
is_optional=True,
)
@@ -145,6 +145,20 @@ def recognize(cls, location):
)


@attr.s()
class BowerPackage(BowerPackageData, models.Package):
"""
A Bower Package that is created out of one/multiple bower package
manifests and package-like data, with its files.
"""

@property
def manifests(self):
return [
BowerJson
]


def compute_normalized_license(declared_license):
"""
Return a normalized license expression string detected from a list of