-
Notifications
You must be signed in to change notification settings - Fork 670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
schema: Use Validators map and prepare to extend beyond JSON Schema #403
Conversation
3b56626
to
bd8ec26
Compare
I expect the Travis error is due to the Dyn DDoS issues. We should kick Travis after those have been resolved. |
|
||
// DigestByte computes the digest of a blob using the requested | ||
// algorithm. | ||
func DigestByte(data []byte, algorithm string) (digest string, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If only there were an existing, well-tested package that provides a mature implementation of this functionality...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have a more specific suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking Is this still relevant? |
With image-tools split off into its own repository, the plan seems to be to keep all intra-blob JSON validation in this repository and to move all other validation (e.g. for layers or for walking Merkle trees) in image-tools [1]. All the non-validation logic currently in image/ is moving into image-tools as well [2]. Some requirements (e.g. multi-parameter checks like allowed OS/arch pairs [3]) are difficult to handle in JSON Schema but easy to handle in Go. And callers won't care if we're using JSON Schema or not; they just want to know if their blob is valid. This commit restructures intra-blob validation to ease the path going forward (although it doesn't actually change the current validation significantly). The old method: func (v Validator) Validate(src io.Reader) error is now a new Validator type: type Validator(blob io.Reader, descriptor *v1.Descriptor, strict bool) (err error) and instead of instantiating an old Validator instance: schema.MediaTypeImageConfig.Validate(reader) there's a Validators registry mapping from the media type strings to the appropriate Validator instance (which may or may not use JSON Schema under the hood). And there's a Validate function (with the same Validator interface) that looks up the appropriate entry in Validators for you so you have: schema.Validate(reader, descriptor, true) By using a Validators map, we make it easy for library consumers to register (or override) intra-blob validators for a particular type. Locations that call Validate(...) will automatically pick up the new validators without needing local changes. All of the old validation was based on JSON Schema, so currently all Validators values are ValidateJSONSchema. As the schema package grows non-JSON-Schema validation, entries will start to look like: var Validators = map[string]Validator{ v1.MediaTypeImageConfig: ValidateConfig, ... } although ValidateConfig will probably use ValidateJSONSchema internally. By passing through a descriptor, we get a chance to validate the digest and size (which we were not doing before). Digest and size validation for a byte array are also exposed directly (as ValidateByteDigest and ValidateByteSize) for use in validators that are not based on ValidateJSONSchema. Access to the digest also gives us a way to print specific error messages on failures. There is also a new 'strict' parameter to distinguish between compliant images (which should always pass when strict is false) and images that only use features which the spec requires implementations to support (which should only pass if strict is true). The current JSON Schemas are not strict, but the config/layer media type checks in ValidateManifest exercise this distinction. Also use go-digest for local hashing now that we're vendoring it. [1]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-12-21.01.log.html#l-71 [2]: opencontainers#337 [3]: https://tools.ietf.org/html/draft-fge-json-schema-validation-00#section-5.5.5 [4]: opencontainers#341 Signed-off-by: W. Trevor King <[email protected]>
Rebased onto master and fixed two lint errors with 3218e5b → f2b9500. |
Closing as this is really the wrong direction. One should be able to do this:
|
On Wed, Jan 25, 2017 at 03:40:26PM -0800, Stephen Day wrote:
Closing as this is really the wrong direction. One should be able to
do this:
```
if err := MediaTypeManifest.Validate(r); err != nil {
// handle errors
}
```
I think [1]:
schema.Validate(reader, &descriptor, true)
is far more convenient. And I think:
Validators[v1.MediaTypeManifest](reader, &descriptor, true)
is effectively the same as the your MediaTypeManifest.Validate(r),
except that:
* You need a descriptor. But that lets you write more useful error
messages, you'll almost always have a descriptor anyway, and they're
easy to build if you don't [2].
* There's a new ‘strict’ parameter, but that's fairly orthogonal to
whether you keep your validators in a map or not.
* It's a lot easier to extend the map approach (e.g. a third-party
consumer can plug in validators for their external media type).
[1]: https://github.com/opencontainers/image-spec/pull/403/files#diff-362cb712e1b17803354fd91472c8781bR123
[2]: https://github.com/opencontainers/image-spec/pull/403/files#diff-362cb712e1b17803354fd91472c8781bR118
|
@wking Why would validation take a descriptor? You already know the type. Just validate it. And since when did we have a If we want different levels of validation, create a type-switched validation train that can be pulled over the content:
Effectively, this allows you traverse a tree of validators, dispatching various types for each validator, while also allowing extension. This PR allows none of this and is pretty static. |
On Wed, Jan 25, 2017 at 04:56:45PM -0800, Stephen Day wrote:
@wking Why would validation take a descriptor?
From my topic post [1]:
By passing through a descriptor, we get a chance to validate the
digest and size (which we were not doing before). Digest and size
validation for a byte array are also exposed directly (as
ValidateByteDigest and ValidateByteSize) for use in validators that
are not based on ValidateJSONSchema. Access to the digest also gives
us a way to print specific error messages on failures.
That paragraph goes on to mention DigestByte, although that sentence
is stale since the rebase around #486 [2]. The analogous paragraph in
the commit message drops the stale line [3]. However, this is fairly
orthogonal to whether we use a map or not. I'm happy to punt it to
follow-up work if it's distracting here.
You already know the type. Just validate it.
What happens when the type isn't covered in the current Validator
constants? For example, #519,
application/vnd.oci.image.layer.v1.tar+gzip, whatever media type the
AppStream XML metadata [4] is, etc. With the map approach, a consumer
who is also interested in validating a new media type can just insert
a validator for their media type into the map, and existing validation
code needs no patching. With the current non-map approach, on the
other hand, image-tools maintainers will need to patch this switch [5]
after #519 lands.
And since when did we have a `strict` bool? That makes no
sense. Hasn't that kind of thing been considered bad practice for at
least a decade now?
I've been trying to get some maintainer feedback on this for a while
now (e.g. in opencontainers/image-tools#66). However, it seems
orthogonal to whether we use a map or not (I'd included it here to
future-proof the Validator type). I'm happy to punt it to follow-up
work if it's distracting here.
[1]: #403 (comment)
[2]: #403 (comment)
[3]: f2b9500
[4]: https://www.freedesktop.org/wiki/Distributions/AppStream/
[5]: https://github.com/opencontainers/image-tools/blob/121b8c90841b82a1c611d23f3c26bb4bda20ffa9/cmd/oci-image-validate/main.go#L180-L187
|
@wking The descriptor should already have been verified against the content. Collapsing these two layers in a HUGE design mistake and I have no more patience in arguing this point for point. There may be a few adjustments, but the suggested approach meets all of the requirements. I apologize for putting it this way, but either take direction from the maintainers or have your PRs closed. |
On Thu, Jan 26, 2017 at 01:17:06PM -0800, Stephen Day wrote:
There may be a few adjustments, but the suggested approach meets all
of the requirements.
Maybe I didn't explain the benefits of the map clearly enough.
Expanding on “With the map approach…” from [1], consider the following
scenarios:
Without a map:
1. #519 lands a TypeImageLayout validator.
2. image-tools bumps their image-spec dependency to pull in the #519
code.
3. image-tools updates this switch [2] to invoke
schema.TypeImageLayout.Validate(f) when the caller asks them to
validate oci-layout JSON (which has no media type?).
With a map:
1. #519 adds a Validators[v1.MediaTypeImageLayout] validator.
2. image-tools bumps their image-spec dependency to pull in the #519
code.
3. Relax, because image-tools does not have to change their
schema.Validate(f, …) call.
Comparing the two, the with-map version seems clearly superior,
because step (3) becomes a no-op.
Both of those would be the same process (modulo dependency bumping)
with any new type, and do not depend on who's adding the new
validator. For example, it could be image-tools which wants to add
the validator in step 1 (in which case there's no need for step 2), or
it could be a third-party (in which case step 2 would be “bump the
third-party dependency”).
[1]: #403 (comment)
[2]: https://github.com/opencontainers/image-tools/blob/121b8c90841b82a1c611d23f3c26bb4bda20ffa9/cmd/oci-image-validate/main.go#L180-L187
|
With image-tools split off into its own repository, the plan seems to be to keep all intra-blob JSON validation in this repository and to move all other validation (e.g. for layers or for walking Merkle trees) in image-tools. All the non-validation logic currently in
image/
is moving into image-tools as well.Some requirements (e.g. multi-parameter checks like allowed OS/arch pairs) are difficult to handle in JSON Schema but easy to handle in Go. And callers won't care if we're using JSON Schema or not; they just want to know if their blob is valid.
This commit restructures intra-blob validation to ease the path going forward (although it doesn't actually change the current validation significantly). The old method:
is now a new Validator type:
and instead of instantiating an old
Validator
instance:there's a
Validators
registry mapping from the media type strings to the appropriate Validator instance (which may or may not use JSON Schema under the hood). And there's a Validate function (with the same Validator interface) that looks up the appropriate entry in Validators for you so you have:By using a
Validators
map, we make it easy for library consumers to register (or override) intra-blob validators for a particular type. Locations that callValidate(…)
will automatically pick up the new validators without needing local changes.All of the old validation was based on JSON Schema, so currently all
Validators
values areValidateJSONSchema
. As theschema
package grows non-JSON-Schema validation, entries will start to look like:although
ValidateConfig
will probably useValidateJSONSchema
internally.By passing through a descriptor, we get a chance to validate the digest and size (which we were not doing before). Digest and size validation for a byte array are also exposed directly (as
ValidateByteDigest
andValidateByteSize
) for use in validators that are not based onValidateJSONSchema
. Access to the digest also gives us a way to print specific error messages on failures. In situations where you don't know the blob digest, the newDigestByte
will help you calculate it (for a byte array).There is also a new
strict
parameter to distinguish between compliant images (which should only pass when strict is false) and images that only use features which the spec requires implementations to support (which should pass regardless of strict). The current JSON Schemas are not strict, and I expect we'll soon gain Go code to handle the distinction (e.g. #341). So the presence ofstrict
in the Validator type is future-proofing our API and not exposing a currently-implemented feature.I've made the minimal sane changes to
cmd/
andimage/
, because we're dropping them from this repository (and continuing them in runtime-tools).