Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metadata] Add initial providers and rules #1

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Yannic
Copy link
Collaborator

@Yannic Yannic commented Nov 29, 2024

This change introduces the new module @package_metadata in //metadata, which is split into two distinct concept groups:

  • the "core" concepts, defined in @package_metadata//:defs.bzl, and
  • secondary concepts like license metadata defined in subpackages of the Bazel module.

Core

The core of @package_metadata are two providers:

  • PackageMetadataInfo, which defines metadata about the package itself and uses a PURL to uniquely identify a package, and
  • PackageAttributeInfo, which defines attributes of a package (either "well-known" attributes like license info or custom attributes defined by the user) and is embedded in PackageMetadataInfo.

This design allows

  • us to group the different kinds of package metadata and possibly split ownership with domain experts (e.g., for licenses)
  • users to specify company-specific attributes without forking @package_metadata
  • users to override attributes of a package via a well-defined mechanism, e.g., if they disagree with the license a package should have (coming in a follow-up).

Differences to rules_license

The new rule set uses writes metadata and attributes to a file. This should reduce analysis memory if there are many packages and attributes in the build graph. It will also allow us to define a rule for importing package metadata and attributes from a file.

The metadata- and attribute files are simple JSON files.

metadata.json

{
  "label": "...",
  "purl": "...",
  "attributes": {
    "build.bazel.supply-chain.license": "bazel-out/...",
    "com.engflow.metadata.foo": "bazel-out/...",
  }
}

attributes.json

{
  "label": "...",
  "kind": "...",
  "attributes": {}
}

Example

load("@package_metadata//:defs.bzl", "package_metadata")
load("@package_metadata//licenses:defs.bzl", "license")
load("//bazel/rules/foo:defs.bzl", "foo_attribute")

package(default_package_metadata = [":metadata"])

package_metadata(
  name = "metadata",
  purl = "pkg:github/foo/bar@abcd",
  attributes = [
    ":license",
    ":foo",
  ],
  visibility = ["//visibility:public"],
)

license(
  name = "license",
  # ...
)

foo_attribute(
  name = "foo",
  # ...
)

Licenses

This will contain rules like license_kind and license similar to how they exist in rules_license today. This needs to be part of the @package_metadata Bazel module because @package_metadata will depend on it to declare its own license. We may want to create a "facade" Bazel module under //licenses to re-export what's defined here.

This change introduces the new module `@package_metadata` in `//metadata`, which is split into two distinct concept groups:

  - the "core" concepts, defined in `@package_metadata//:defs.bzl", and
  - secondary concepts like license metadata defined in subpackages of the Bazel module.

## Core

The core of `@package_metadata` are two providers:

  - `PackageMetadataInfo`, which defines metadata about the package itself and uses a [PURL](https://github.com/package-url/purl-spec) to uniquely identify a package, and
  - `PackageAttributeInfo`, which defines attributes of a package (either "well-known" attributes like license info or custom attributes defined by the user) and is embedded in `PackageMetadataInfo`.

This design allows

  - us to group the different kinds of package metadata and possibly split ownership with domain experts (e.g., for licenses)
  - users to specify company-specific attributes without forking `@package_metadata`
  - users to override attributes of a package via a well-defined mechanism, e.g., if they disagree with the license a package should have (coming in a follow-up).

### Differences to `rules_license`

The new rule set uses writes metadata and attributes to a file. This should reduce analysis memory if there are many packages and attributes in the build graph. It will also allow us to define a rule for importing package metadata and attributes from a file.

The metadata- and attribute files are simple JSON files.

#### `metadata.json`

```json
{
  "label": "...",
  "purl": "...",
  "attributes": {
    "build.bazel.supply-chain.license": "bazel-out/...",
    "com.engflow.metadata.foo": "bazel-out/...",
  }
}
```

#### `attributes.json`

```json
{
  "label": "...",
  "kind": "...",
  "attributes": {}
}
```

### Example

```starlark
load("@package_metadata//:defs.bzl", "package_metadata")
load("@package_metadata//licenses:defs.bzl", "license")
load("//bazel/rules/foo:defs.bzl", "foo_attribute")

package(default_package_metadata = [":metadata"])

package_metadata(
  name = "metadata",
  purl = "pkg:github/foo/bar@abcd",
  attributes = [
    ":license",
    ":foo",
  ],
  visibility = ["//visibility:public"],
)

license(
  name = "license",
  # ...
)

foo_attribute(
  name = "foo",
  # ...
)
```

## Licenses

This will contain rules like `license_kind` and `license` similar to how they exist in `rules_license` today. This needs to be part of the `@package_metadata` Bazel module because `@package_metadata` will depend on it to declare its own license. We may want to create a "facade" Bazel module under `//licenses` to re-export what's defined here.
@Yannic
Copy link
Collaborator Author

Yannic commented Nov 29, 2024

For package_metadata_import, I'm thinking of something like this:

package_metadata_import(
  name = "metadata",
  type = "my-company-metadata",
  file = "METADATA.asciipb",
)

Companies can then register a toolchain for generating metadata.json from METADATA.asciipb. There won't be a default toolchain implementation, so we don't get any dependencies from it.

@Yannic
Copy link
Collaborator Author

Yannic commented Nov 29, 2024

Similarly, we can define an optional toolchain to inject attribute overrides into the graph. e.g.,

package_metadata_overrides(
  name = "toolchain_impl",
  overrides = [
    ":foo_override",
  ],
)

package_metadata_override(
  name = "foo_override",
  overrides = [
    # Extra attributes to add to `metadata.json`. Will either override an existing attribute of the same kind, or insert if it doesn't exist.
    "//legal/licenses/overrides:foo",  # This is a regular `license` target
  ],
  removals = [
    # Removes all attributes of these kinds.
    "com.example.foo",
  ],
  filters = [
    # Only include attributes of these kinds.
    "com.example.bar",
    "build.bazel.supply-chain.license",
  ],
)

metadata/.gitignore Show resolved Hide resolved
metadata/MODULE.bazel Outdated Show resolved Hide resolved
metadata/private/providers/package_attribute_info.bzl Outdated Show resolved Hide resolved
metadata/private/providers/package_metadata_info.bzl Outdated Show resolved Hide resolved
metadata/private/rules/package_metadata.bzl Outdated Show resolved Hide resolved
metadata/private/rules/package_metadata.bzl Outdated Show resolved Hide resolved
@alexeagle
Copy link

Hey there, is anyone still working on this PR?

@Yannic
Copy link
Collaborator Author

Yannic commented Jan 14, 2025

Hey there, is anyone still working on this PR?

Yes. Things just got busy during the holiday season. Will pick this up soon

@Yannic
Copy link
Collaborator Author

Yannic commented Jan 20, 2025

PTAL

metadata/licenses/private/rules/license.bzl Outdated Show resolved Hide resolved
metadata/licenses/private/rules/license.bzl Outdated Show resolved Hide resolved
metadata/private/rules/package_metadata.bzl Outdated Show resolved Hide resolved
# Providers.

# Rules
license = _license
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the new convention is to avoid a defs.bzl file and instead have each rule be in its own bzl file. eg. //metadata/liceneses/rules:license.bzl would contain the license rule.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link

@alex-torok alex-torok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using rules_license to track third party license deps and have written a bit of additional tooling on top (specifically, missing license handling - bazelbuild/rules_license#165). I'll have a decent bit of rework to move the logic from analysis/rule-implementation into execution/code-implementation, but this design wouldn't block that from working. I like what you have here!

This design allows users to override attributes of a package via a well-defined mechanism, e.g., if they disagree with the license a package should have (coming in a follow-up).

I'm curious about how this will work. Can you give a bit of an explanation or preview?

),
PackageAttributeInfo(
metadata = metadata,
kind = "build.bazel.supply-chain.license",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs for kind suggest using "reverse DNS format (e.g., com.example.foo)". Should this format match the path of the repo containing the attribute? i.e since this is in the bazel-contrib/supply-chain repo, be something like com.github.bazel-contrib.supply-chain.license?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of putting well-known (builtin) attributes under bazel.build reverse DNS.

I'm still a bit in the fence about the format here: reverse-DNS is simple, but makes it hard to discover the schema of the attributes. We may want to make this a URL to the schema definition (e.g., JSON schema) instead.

metadata/licenses/private/rules/license.bzl Outdated Show resolved Hide resolved
metadata/private/providers/package_attribute_info.bzl Outdated Show resolved Hide resolved
metadata/private/providers/package_attribute_info.bzl Outdated Show resolved Hide resolved
metadata/private/rules/package_metadata.bzl Outdated Show resolved Hide resolved
metadata/licenses/private/rules/license.bzl Outdated Show resolved Hide resolved
@Yannic
Copy link
Collaborator Author

Yannic commented Jan 27, 2025

I'm curious about how this will work. Can you give a bit of an explanation or preview?

See #1 (comment) for an example.

I'm still a bit unclear whether to use a label or a (partial?) purl to identify the package the injected/overridden attributes should apply to. Label would be nicer IMO, but it'd require NODEP_LABEL, which isn't currently exposed to Starlark. Needs a bit more investigation unfortunately.

@Yannic
Copy link
Collaborator Author

Yannic commented Jan 29, 2025

This is ready for another round of review. PTAL

Copy link
Collaborator

@TheGrizzlyDev TheGrizzlyDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants