| status | title | creation-date | last-updated | authors |
|---|---|---|---|---|
| implemented | Object/Dictionary param and result types | 2021-07-14 | 2022-09-26 | |
- Summary
- Motivation
- Requirements
- Proposal
- Design Details
- Test Plan
- Design Evaluation
- Drawbacks
- Alternatives
  - Alternative #1: Introduce a `schema` section specifically for the type schema
  - Alternative #2: Create a wrapper for JSON Schema
  - Alternative #3: Create our own syntax just for dictionaries
- Upgrade & Migration Strategy
- Potential future work
- Implementation Pull request(s)
- References
Recommendation: read TEP-0076 (array support in results and indexing syntax) before this TEP, as this TEP builds on that one.
This TEP proposes adding object (aka dictionary) types to Tekton Task and Pipeline results and params, as well as adopting a small subset of JSONPath syntax (precedent in the array expansion syntax we are using) for accessing values in these objects.
This proposal is dependent on TEP-0076 (array support in results and indexing syntax): this TEP should follow the precedents set there, and if we decide not to support array results, we probably won't support object results either.
This proposal also includes adding support for limited use of JSON object schema to express the expected structure of the object.
Dictionary vs. object: The intended feature was supporting "dictionaries", but JSON Schema calls these "objects" so this proposal tries to use "objects" as the term for this type.
Tasks declare workspaces, params, and results, and these can be linked in a Pipeline, but external tools looking at these Pipelines cannot reliably tell when images are being built, or git repos are being used. Current ways around this are:
- PipelineResources (proposed to be deprecated in TEP-0074)
- Defining param names with special meaning (for example, Tekton Chains type hinting)
This proposal takes the param name type hinting a step further by introducing object types for results: allowing Tasks to define structured results. These structured results can be used to define interfaces, for example the values that are expected to be produced by Tasks which do `git clone`. This is possible with just string results, but without a grouping mechanism, all results and params are declared at the same level and it is not obvious which results are complying with an interface and which are specific to the task.
- Tidy up Task interfaces by allowing Tasks to group related parameters (similar to the long parameter list or too many parameters "code smell")
- Enable defining well-known structured interfaces for Tasks (typing), for example, defining the values that a Task should produce when it builds an image, so that other tools can interface with them (e.g. Tekton Chains type hinting)
- Take a step in the direction of allowing Tasks to have even more control over their params and results (see pipelines#1393) (e.g. one day providing built in validation for params as part of their declaration)
- Adding complete JSONPath syntax support
- Adding support for nesting, e.g. object params where the values are themselves objects or arrays (no reason we can't add that later but trying to keep it simple for now)
- Adding complete JSON schema syntax support
- Supporting use of the entire object in a Task or Pipeline field, i.e. as the value for a field that requires an object, the way we do for arrays. Keeping this out of scope because we would need to explicitly decide what fields we want to support this for.
- Grouping related params and results such that users and tools can make inferences about the tasks. For example allowing Tekton Chains and other tools to be able to understand what types of artifacts a Pipeline is operating on.
Tekton could also define known interfaces, to make it easier to build these tools. For example:
- Images - Information such as the URL and digest
- Wheels and other package formats (see below)
- Git metadata - e.g. the state of a git repo after a clone, pull, checkout, etc.
- Git repo configuration - e.g. the info you need to connect to a git repo including url, proxy configuration, etc.
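For instance, a minimal sketch (hypothetical, not a defined interface) of what the "Images" interface above could look like as an object result, using the schema syntax proposed in this TEP:

results:
  - name: image
    description: The built image
    type: object
    properties:
      url: {type: string}
      digest: {type: string}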
For example the upload-pypi Task defines 4 results which are meant to express the attributes of the built wheel:
results:
  - name: sdist_sha
    description: sha256 (and filename) of the sdist package
  - name: bdist_sha
    description: sha256 (and filename) of the bdist package
  - name: package_name
    description: name of the uploaded package
  - name: package_version
    description: version of the uploaded package
Some interesting things are already happening in this example: `sdist_sha` contains both the filename and the sha, an attempt to group related information. With object results, we could define the above with a bit more structure like this:
results:
  - name: sdist
    description: |
      The source distribution
      * sha: The sha256 of the contents
      * path: Path to the resulting tar.gz
    type: object
    properties:
      sha:
        type: string
      path:
        type: string
  - name: bdist
    description: |
      The built distribution
      * sha: The sha256 of the contents
      * path: Path to the resulting .whl file
    type: object
    properties:
      sha:
        type: string
      path:
        type: string
  - name: package
    description: |
      Details about the created package
      * name: The name of the package
      * version: The version of the package
    type: object
    properties:
      name:
        type: string
      version:
        type: string
Eventually, when we have nested support, we could define a `wheel` interface which contains all of the above.
- Grouping related params to create simpler interfaces, and allowing similar Tasks to easily define pieces of their interface. For example, the git-clone task has 15 parameters, and the git-rebase task has 10 parameters. Some observations:
  - Each has potential groupings which stand out, for example:
    - The git-rebase task could group the `PULL` and `PUSH` remote params; each object would need the same params, which could become an interface for "git remotes"
    - Potential groupings for the git-clone task stand out as well, for example the proxy configuration
  - On that note, since the git-rebase task is using git and accessing remote repos, it probably needs the same proxy configuration, so what they probably both need is some kind of interface for what values to provide when accessing a remote git repo
  - Other examples include okra-deploy, which is using param name prefixes to group related params (e.g. `ssh-`, `okra-vm`, `okra-token`).
- Must be possible to programmatically determine the structure of the object
- Must be possible for a result object to be empty
- Must be possible to use the object results of one Task in a Pipeline as the param of another Task in the pipeline which has an object param when the interface matches (more detail in matching interfaces and required vs optional keys)
- Must be possible to use one specific value in an object result of one Task in a Pipeline as the param of another Task
in the pipeline which has a string param
- If there is no value for the specified key, the Pipeline execution should fail (or we may be able to use TEP-0048 to specify a default value to use in that case instead)
- We would add support for object types for results and params, in addition to the existing string and array support
- Initially we would only support string values; eventually we can expand this to all values (string, array, object) (note that this is the case for arrays as well: we don't yet support arrays of arrays)
- Only string key types would be supported (which is the only key type supported by JSON)
- We would use JSON object schema syntax (as previously suggested - and see also why JSON Schema) to express the object structure
- To support object results, we would support writing JSON results to the results file, as described in TEP-0076 Array results and indexing (see "why json" in that proposal)
This feature would be considered alpha and would be optional (gated by the alpha flag).
- Defaulting to string types for values
- Adding additional JSON Schema properties
- Why JSON Schema?
- Matching interfaces and extra keys
- Required vs optional keys
As an optimization, we'd support defaulting to string types for values without the Task author needing to explicitly specify this. For example this more verbose specification:
- name: sdist
description: |
The source distribution
* sha: The sha256 of the contents
* path: Path to the resulting tar.gz
type: object
properties:
sha:
type: string
path:
type: string
Would be equivalent to:
- name: sdist
description: |
The source distribution
* sha: The sha256 of the contents
* path: Path to the resulting tar.gz
type: object
properties:
sha: { } # type:string is implied
path: { } # type:string is implied
This would be supported in recognition that:
a. Only string types would initially be supported
b. Even when other types are supported, the most common usage of objects will likely use string values
This proposal suggests adding a new `properties` section to param and result definitions. If we later support more JSON schema attributes such as `additionalProperties` and `required`, we'd also support them at the same level as the `properties` field here. (See Alternative #1 adding a schema section.)
(At that point we should also consider whether we want to adopt strict JSON schema syntax or if we want to support Open
API schema instead; see why JSON Schema.)
Assuming we move forward with using JSON to specify results (see "why json" in TEP-0076 Array results and indexing), we'll need a syntax that allows us to define schemas for JSON.
Since JSON schema was created for exactly that purpose, it seems like a reasonable choice (see original suggestion). OpenAPI schema objects are an interesting alternative which build on JSON schema and are already used by Kubernetes to publish schema validation for CRDs. The subset of JSON Schema that we are proposing to adopt in this TEP is compatible with OpenAPI schema (Open API supports JSON Schema Keywords), so at the point when we start proposing more JSON Schema support we should consider if we want to support the Open API schema variations instead.
The language CUE also provides a syntax for defining schemas; it allows using less text to express the same schema.
Pros:
- Less text required; much more succinct to express complex validation
- CUE can be used to validate JSON so if we decided to use CUE for expressing schemas, Tasks could still output results in JSON (i.e. no additional complication for Task authors when writing steps and no need for additional tools to generate results in the right format)
Cons:
- CUE is a superset of json; in order to express CUE within our existing yaml and json types, we'd need a way to encode CUE within those types
- CUE is intended for more than just expressing schemas, it is intended also for code generation, and expressing configuration. (Instead of embedding it into Tekton types, maybe a Tekton integration that takes advantage of all CUE has to offer would be to use CUE to generate Tekton types?)
- In the Open API schema comparison, the Open API version can be read without needing to learn a specific syntax (e.g. the difference between `max?: uint & <100` and the verbose Open API version, which specifies that max is an integer with minimum 0 and exclusive maximum 100).
- Our flexibility design standard recommends that when we need to pull in some other language, we prefer existing languages which are widely used and supported. Instead of being an early adopter of CUE we could wait until it is more popular and then consider using it (we can also delay this decision until we want to add more schema support).
An object provided by one Task (as a result) will be considered to match the object expected by another Task (as a param) if the object in the result contains at least the required keys declared by the param in the consuming Task. The extra keys will be ignored, allowing us to take a duck typing approach and maximize interoperability between tasks.
For example, imagine a Task which requires a param object that contains `url` and `httpsProxy`. Another task produces an object with those two keys in addition to other keys. It should be possible to pass the object with the additional keys directly to the Task which requires only `url` and `httpsProxy` without needing to modify the object in between.
If we expand our JSON Schema support in the future (and we have use cases to support it) we could allow Tasks to express if they would like to override this and not allow additional keys to be provided, via JSON Schema's `additionalProperties` field (which defaults to allowing additional properties).
When the Task writes more keys than it declares in its results:
- The TaskRun will succeed
- The additional keys will not be included in the TaskRun status
- The additional keys will not be available for variable replacement in a Pipeline; only the declared keys will be available
When a Task uses the result of a previous Task which declares more keys in addition to what the Task needs:
- This will be allowed
- The TaskRun will be created with only the keys that the Task needs; the rest will be ignored
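For example, here is a minimal sketch (hypothetical Task and param names) of the duck typing described above: the producing Task emits more keys than the consuming Task's param declares, and the extra key is dropped rather than causing an error:

tasks:
  - name: produce-git-config
    taskSpec:
      results:
        - name: repo
          type: object
          properties:
            url: {type: string}
            httpsProxy: {type: string}
            sshPort: {type: string} # extra key, ignored by the consumer below
      steps:
        ...
  - name: consume-git-config
    params:
      - name: repo
        value: $(tasks.produce-git-config.results.repo[*])
    taskSpec:
      params:
        - name: repo
          type: object
          properties: # declares only the keys it needs
            url: {type: string}
            httpsProxy: {type: string}
      steps:
        ...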
We have several options for how we define and handle required and optional keys:
- Initially, do not support the `required` field and imply that all keys are required (i.e. make the assumption in the controller but do not reflect it in the types themselves).
  - Pro: By not supporting the `required` field at all (initially) there will be no ambiguity or confusion in the initial version of this behavior. This approach lets us postpone answering the question of how to effectively support required vs. optional fields. We've been going back and forth on this issue, and by punting the question for now, we can let people use the feature and gather feedback before deciding (e.g. maybe we won't need to support optional fields at all)
  - Pro: We are already suggesting inferring some things, such as that `type: string` is implied for empty properties, so we're already setting a precedent for inferring some parts of the schema definition
  - Con: Anyone reading the JSON Schema in the Task would think the attributes are optional when they are not
- Embrace the optional-by-default behavior of JSON Schema; instead of validating the presence of "required" keys via jsonschema, we could do the validation based on the declared variable replacement. E.g. say a Task declares it needs an object called `foo` with an attribute called `bar`; at runtime, if the Task is provided with an object `foo` that doesn't have an attribute called `bar`, this will be okay unless the Task contains variable replacement that uses it (`$(params.foo.bar)`).
  - Pro: Allows us to use JSON Schema as-is and gives us the behavior we want.
  - Pro: Optional-by-default behavior will be useful if we pursue the future option of supporting pre-defined schemas: optional by default will allow Tasks to refer to schemas for their params, and additive changes can be made to those schemas without requiring all Tasks using them to be updated.
  - Con: Weird, because effectively we're implying `required` (assuming the fields are used, and if they aren't used, why are they there) even though the JSON Schema declaration says the fields are optional.
- Infer that all keys are required and add the `required` field via a mutating admission controller for all properties in the dictionary (unless a `required` field is already present).
  - Con: It would be strange to mutate instances that will be reused; in fact, mutations like this would interfere with efforts we might explore in the future to provide hashes of Tasks and Pipelines so folks can ensure they are running what they think they are running.
- Make our version of JSON Schema deviate from the official version: default to `required` and instead introduce syntax for `optional` (in fact, early versions of JSON Schema used `optional` instead of `required`)
- Create our own JSON Schema based syntax instead
- When using variable replacement with optional fields (i.e. fields that are not explicitly listed in the `required` stanza), if they are not provided at runtime, replace them with a default zero value (e.g. "" for string types)
  - TEP-0048 is exploring a similar feature, i.e. a way of providing defaults when values are not available when doing variable replacement. Suggest we let TEP-0048 progress, and if we decide to provide zero values for optional fields we follow a similar approach. If we support this we may also need a syntax to allow authors to check if a value has been set or is a default value.
This proposal suggests we use (1):
- Do not support the `required` field (at least initially)
- Imply that all fields are required (as if the `required` field was present and all object fields were listed).
For example, in the following param `gitrepo`, the `url` and `commitish` fields will both be required; if at runtime one or both of them are not present in the object provided for the `gitrepo` parameter, execution will fail (in the same way as it would for a parameter that is entirely missing and has no default specified):
spec:
params:
- name: gitrepo
type: object
properties:
url: {}
commitish: {}
The behavior will be as if the above param specification included `required: [url, commitish]`.
See also TEP-0076 Array results and indexing for notes and caveats specific to supporting json results.
- Since JSON only supports string types for keys (aka "names"), we would only support string types for keys
- What if a Pipeline tries to use an object key that doesn't exist in a param or result?
  - Some of this could be caught at creation time, i.e. pipeline params would declare the keys each object param will contain, and we could validate at creation time that those keys exist
  - For invalid uses that can only be caught at runtime (e.g. after fetching the task spec), the PipelineRun would fail (see the sketch below)
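For example, a minimal sketch (hypothetical names) of both cases:

spec:
  params:
    - name: gitrepo
      type: object
      properties:
        url: {}
  tasks:
    - name: uses-missing-keys
      params:
        - name: bad-param-ref
          # could be caught at creation time: the Pipeline declares no "branch" key for gitrepo
          value: $(params.gitrepo.branch)
        - name: bad-result-ref
          # may only be caught at runtime, once the producing task's spec has been fetched
          value: $(tasks.some-other-task.results.someobj.nosuchkey)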
See TEP-0076 Array results and indexing for risks and mitigations specific to expanding ArrayOrString to support more types, including size limitations caused by our implementation of results (pipelines#4012).
- Using the JSON Schema syntax to describe the structure of the object is a bit verbose but looking at our Alternatives this seems like the best option assuming we want to support more complex types in the future.
We would use JSON object schema syntax to declare the structure of the object.
Declaring defaults for parameters that are of type object would follow the pattern we have already established with array param defaults which is to declare the value in the expected format in yaml (which is treated as json by the k8s APIs).
When declaring defaults for object parameters, one can provide a value for all keys (example 1). It should also be allowed to only provide a value for a subset of keys in `default`, as long as the rest of the keys are provided with a value at run level (example 2). In example 2, the resolved `gitrepo` param will be `{"url": "abc.com", "path": "./mydir/", "commit": "sha123"}`. Since the run level provides a value for the key `url` that is also provided in `default`, the value from the run level takes precedence. Therefore, the value used for the key `url` at runtime will be "abc.com" instead of "default.com". However, there must be a value provided for all keys, either from the default or the run level value, because all keys declared in `properties` will be required keys. As such, example 3 is invalid because the key `path` is missing.
Example 1 (valid): all keys have a default value:
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: object-param-test-
spec:
taskSpec:
params:
- name: pull_remote
description: JSON Schema has no "description" fields, so we'd have to include documentation about the structure in this field
type: object
properties:
url: {
type: string
}
path: {
type: string
}
default:
url: https://github.com/somerepo
path: ./my/directory/
Example 2 (valid): all the keys not provided by the default are provided by the taskrun
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: object-param-test-
spec:
params:
- name: gitrepo
value:
url: "abc.com"
commit: "sha123"
taskSpec:
params:
- name: gitrepo
properties:
url: {type: string}
commit: {type: string}
path: {type: string}
default:
url: "default.com"
path: "./mydir/"
Example 3 (invalid): some keys (`path` in this example) are not provided with a value in either the default or the taskrun
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: object-param-test-
spec:
params:
- name: gitrepo
value:
commit: "sha123"
taskSpec:
params:
- name: gitrepo
properties:
url: {type: string}
commit: {type: string}
path: {type: string}
default:
url: "default.com"
Results example:
results:
- name: sdist
description: JSON Schema has no "description" fields, so we'd have to include documentation about the structure in this field
type: object
properties:
sha: {
type: string
}
path: {
type: string
}
As described in TEP-0076 Array results and indexing, we add support for writing json content to `/tekton/results/resultName`, supporting strings, objects and arrays of strings.
For example, say we want to emit a built image's url and digest in a dictionary called `image`:
- Write the following content to `$(results.image.path)`:

  `{"url": "gcr.io/somerepo/someimage", "digest": "a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09"}`
- This would be written to the pod termination message as escaped json, for example (with a string example included as well):
message: '[{"key":"image","value":"{\"url\": \"gcr.io\/somerepo\/someimage\", \"digest\": \"a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09\"}","type":"TaskRunResult"},{"key":"someString","value":"aStringValue","type":"TaskRunResult"}]'
- We would use the same ArrayOrString type (expanded to support dictionaries and, in TEP-0076, arrays) for task results, e.g. for the above example, the TaskRun would contain:
  taskResults:
    - name: someString
      value: aStringValue
    - name: image
      value:
        url: gcr.io/somerepo/someimage
        digest: a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09
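Putting these pieces together, a minimal sketch (hypothetical step name and image values) of a Task that declares and emits the `image` object result:

results:
  - name: image
    type: object
    properties:
      url: {type: string}
      digest: {type: string}
steps:
  - name: emit-image-result
    image: bash
    script: |
      # write the object result as json to the path Tekton provides for this result
      echo -n '{"url": "gcr.io/somerepo/someimage", "digest": "a61ed0bca213081b64be94c5e1b402ea58bc549f457c2682a86704dd55231e09"}' > $(results.image.path)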
We add support for the JSONPath subscript operator for accessing "child" members of the object by name in variable replacement.
For example:
apiVersion: tekton.dev/v1beta1
kind: Pipeline
...
spec:
params:
- name: gitrepo
type: object
properties:
url: {}
commitish: {}
...
tasks:
- name: notify-slack-before
params:
- name: message
value: "about to clone $(params.gitrepo.url) at $(params.gitrepo.commitish)"
- name: clone-git
      runAfter: [ 'notify-slack-before' ]
params:
- name: gitrepo
value: $(params.gitrepo[*])
results:
- name: cloned-gitrepo
type: object
properties:
url: {}
commitish: {}
taskSpec:
params:
- name: gitrepo
type: object
properties:
url: {}
commitish: {}
steps:
- name: do-the-clone
image: some-git-image
args:
- "-url=$(params.gitrepo.url)"
- name: notify-slack-after
params:
- name: message
value: "cloned $(tasks.cloned-git.results.cloned-gitrepo.url) at $(tasks.cloned-git.results.cloned-gitrepo.commitish)"
This proposal does not include adding support for any additional syntax (though it could be added in the future!).
When providing values for objects, Task and Pipeline authors can provide an entire object as a value only when the value is also an object (see matching interface and extra keys), by using the same `[*]` syntax used to provide entire arrays.
(Note that supporting this replacement for arbitrary Task and Pipeline fields is currently out of scope; this feature would apply to binding results to params only.)
In the above example:
params:
- name: gitrepo
value: $(params.gitrepo[*])
When providing values for strings, Task and Pipeline authors can access individual attributes of an object param; they cannot access the object as a whole (we could add support for this later).
In the above example, within a Pipeline Task:
value: "about to clone $(params.gitrepo.url) at $(params.gitrepo.commitish)"
In the above example, within a Task spec:
- "-url=$(params.gitrepo.url)"
When populating a string field, it would be invalid (at least initially) to attempt to do variable replacement on the entire `gitrepo` object (`$(params.gitrepo)`). If we choose to support this later, we could potentially replace the value with the json representation of the object directly.
Within the context of a Task, Task authors would continue to have access to the `path` attribute of a result (identified by name) and would not have any additional variable replacement access to the keys.
In the above example:
cat /place/with/actual/cloned/gitrepo.json > $(results.cloned-gitrepo.path)
`$(results.cloned-gitrepo.path)` refers to the path at which to write the `cloned-gitrepo` object as json (it does not refer to an attribute of `cloned-gitrepo` called `path`, and even if `cloned-gitrepo` had a `path` attribute, there would be no collision, because variable replacement for individual attributes of the result is not supported within the Task).
(See emitting object results for more details.)
Within the context of a Pipeline Task, the Pipeline author can refer to individual attributes of the object results of other Tasks.
In the above example:
value: "cloned $(tasks.cloned-git.results.cloned-gitrepo.url) at $(tasks.cloned-git.results.cloned-gitrepo.commitish)"
As with params, only access to individual keys will be allowed initially.
Tekton provides built-in variable replacements which may one day conflict with the keys in a dictionary. Existing variable replacements will not conflict (the closest candidate is `path`, as in `$(results.foo.path)`, but since this is only used in the context of writing Task results, there will be no conflict - see variable replacement with object params for more detail). But still, features we add in the future may conflict; for example, if we added an optional params feature, we might provide a variable replacement like `$(params.foo.bound)`. What if the `foo` param was an object with a key called `bound`?
For example:
results:
- name: foo
type: object
properties:
bound: {
type: string
}
TEP-0080 added support for using `[]` syntax instead of `.` to support variables which have names that include `.`. Resolving the ambiguity here would only require using `$(params.foo.bound)` to refer to the built-in variable replacement, and `$(params.foo["bound"])` to unambiguously refer to the key `bound` within `foo` (see the sketch after the list below). Other options include:
- Do not allow keys to be defined which conflict with variable replacement
  - Con: this means that a Task which is perfectly fine today might break tomorrow if we introduce new variable replacement - in the above example, `foo.bound` is perfectly valid today and would be allowed, but would suddenly stop working after the `bound` feature is introduced
- Let the defined object override the variable replacement; i.e. in this case the built-in replacement will not be available
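To illustrate the disambiguation (a hypothetical sketch, assuming a future built-in `bound` replacement existed):

params:
  - name: is-bound
    # hypothetical future built-in variable replacement
    value: 'is foo bound: $(params.foo.bound)'
  - name: bound-key
    # unambiguously the "bound" key within the foo object param
    value: 'value of key: $(params.foo["bound"])'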
Problem: It has been supported to have dots `.` in parameter names. But things get tricky when both a string/array parameter named `foo.bar` and an object parameter named `foo` with a key called `bar` are declared in the same taskSpec.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: object-param-test-
spec:
taskSpec:
params:
- name: foo
properties:
key1: {}
bar: {}
default:
key1: "val1"
bar: "val2"
- name: foo.bar
default: "tricky"
results:
- name: echo-output
steps:
- name: echo-params
image: bash
script: |
set -e
echo $(params.foo.bar) | tee $(results.echo-output.path)
Design: `$(params.foo.bar)` would be treated as a reference to the key `bar` of the object parameter `foo`. A similar design applies to result names, i.e. `$(tasks.task1.results.foo.bar)` refers to the key `bar` of the object result `foo` from `task1`.
Reason: Without this TEP-0075 implemented, `$(params.foo.bar)` would be treated by the validation webhook as an invalid reference for the parameter named `foo.bar`, since parameter names containing dots can only be referenced using bracket notation, i.e. `$(params["foo.bar"])`, because TEP-0080 added support for using `[]` syntax instead of dots `.` to support variables which have names that include dots `.` (this doc also mentioned this). Therefore, after this TEP-0075 is implemented, `$(params.foo.bar)` will naturally be treated as a reference to a key of an object parameter, without any conflicts.
Problem: If a parameter of object type is named `foo.bar` and has two keys `mykey` and `my.dot.key` like the example below, it would become confusing to users how to reference the whole object parameter itself, and how to reference its individual keys.
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
generateName: object-param-test-
spec:
taskSpec:
params:
- name: foo.bar
properties:
mykey: {}
my.dot.key: {}
- name: foo
properties:
bar: {}
anotherkey: {}
Design: For parameters of object type, using dots in their names or their key names is NOT allowed (at least in the initial implementation). Similarly, object result and key names shouldn't contain dots.
Reason: The motivation of TEP-0080 supporting dots in variable names is using domain-scoped names as a way for the domain owner to "own" the definition of what those parameters and results are for and how they will be used. For parameters of object type, the parameter name and its individual key names are already domain-scoped, which means there seems to be no need to also use dots in object parameter names or key names.
Alternatives: Allow users to use dots in object parameter names and key names. That means names containing dots must be referenced with brackets `[]`. Possible ways to reference individual keys:
1. Using brackets only: `$(params["my.object"]["mykey"])`
2. Mixing bracket and dot: `$(params["my.object"].mykey)`

NOTE: If the key name also contains dots, i.e. `my.key`, only #1 is allowed, i.e. `$(params["my.object"]["my.key"])`.
In addition to unit tests the implementation would include:
- At least one reconciler level integration test which includes an object param and an object result
- A tested example of a Task, used in a Pipeline, which:
- Declares an object param
- Emits an object result
- In the Pipeline, another Task consumes a specific value from the object
See also the TEP-0076 Array results and indexing design evaluation.
- Reusability:
  - Pro: PipelineResources provided a way to make it clear what artifacts a Pipeline/Task is acting on, but using them made a Task less reusable - this proposal introduces an alternative that does not involve making a Task less reusable (especially once coupled with TEP-0044)
- Simplicity
- Pro: Support for emitting object results builds on TEP-0076
- Pro: This proposal reuses the existing array or string concept for params
- Pro: This proposal continues the precedent of using JSONPath syntax in variable replacement
- Flexibility
- Pro: Improves support for structured interfaces which tools can rely on
- Con: Although there is a precedent for including JSONPath syntax, this is a step toward including more hard coded expression syntax in the Pipelines API (without the ability to choose other language options)
- Con: We're also introducing and committing to JSON Schema! Seems worth it for what we get, though
- Conformance
- Supporting this syntax would be part of the conformance surface; the JSON Schema syntax is a bit verbose for simple cases (but paves the way for the more complex cases we can support later, including letting Tasks express how to validate their own parameters)
See also TEP-0076 Array results and indexing for more drawbacks.
- In the current form, this sets a precedent for pulling in more schema support (why JSON Schema)
- This improves the ability to make assertions about Task interfaces but doesn't (explicitly) include workspaces at all. For example, say you had a “git” param for a pipeline containing a commitish and a url. This is likely going to be used to clone from git into a workspace, and there will be some guarantees that can be made about the data that ends up in the workspace (e.g. the presence of a .git directory). But there is no way of expressing the link between the workspace and the params.
Instead of embedding the details of the type in the param/result definition, we could introduce a new `schema` section.
For example, instead of the proposed syntax where we allow a new value `object` for `type` and we add `properties`:
params:
- name: someURL
type: string
- name: flags
type: array
- name: sdist
type: object
properties:
sha: { }
path: { }
We could add an explicit new `schema` section:
params:
- name: someURL
schema:
type: string
- name: flags
schema:
type: array
- name: sdist
schema:
type: object
properties:
sha: { }
path: { }
Pros
- It's very clear where the schema is defined; specifically, which part of the Task/Pipeline spec is considered JSON Schema. In the current proposal the JSON Schema is mixed in with our own fields (specifically `name`).
Cons
- For simple types this is more verbose - once we allow more complex objects the extra level of indentation won't be as noticeable, but when defining simple types it feels unnecessary.
- If we go this route, we'll have to grapple with the question of what to do with the existing `type` field. For example, if we add the syntax suggested here to the existing `type` syntax, their uses would look like:

  params:
    - name: someURL
      type: string
    - name: flags
      type: array
    - name: sdist
      schema:
        type: object
        properties:
          sha: { type: string }
          path: { type: string }
If we deprecated the type syntax and only used JSON Schema, the above would become very verbose:
params:
  - name: someURL
    schema:
      type: string
  - name: flags
    schema:
      type: array
  - name: sdist
    schema:
      type: object
      properties:
        sha: { type: string }
        path: { type: string }
We could also go one step further and create a syntax which we translate into JSON Schema inside the controller, but is customized to meet our needs, specifically:
- Add "descriptions" for keys
- Use required by default (aka provide `optional` syntax)
For example, instead of:
params:
- name: someURL
description: ""
schema:
type: string
- name: flags
description: ""
schema:
type: array
- name: sdist
description: ""
schema:
type: object
properties:
sha: {
type: string
}
path: {
type: string
}
We could have:
params:
- name: someURL
description: ""
type: string
- name: flags
description: ""
type: array
- name: sdist
description: ""
    type: dict # use dict instead of object
keys: # use keys instead of properties
sha: {
description: "" # add description section for each key
}
path: { }
optional: [ ] # add "optional" and have required by default
But the big differences are cosmetic (dict vs object, keys vs properties), and with the current proposal we could reasonably add our own `description` field into the existing property dictionaries (and `description` is already supported by Open API schema!).
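For example, a sketch of what adding a `description` to each property could look like (hypothetical; not part of this proposal):

results:
  - name: sdist
    type: object
    properties:
      sha: {
        type: string,
        description: "The sha256 of the contents"
      }
      path: {
        type: string,
        description: "Path to the resulting tar.gz"
      }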
We could make our own syntax, specific to dictionaries; however, if we later add more JSON Schema support we'd need to revisit this. For example:
results:
- name: sdist
type: dict
keys: [ 'sha', 'path' ]
This is clean and clear when we're talking about a dictionary of just strings, but what if we want to allow nesting, e.g. array or dict values?
- Add support for nested types right away
- Add complete JSON Schema support
- Use something other than JSON Schema (or our own syntax proposed in Alternative #1) for expressing the object/dictionary structure. It feels at this point like it's better to adopt a syntax that already handles these cases. (See why JSON Schema)
- Reduce the scope, e.g.:
  - Don't add support for grabbing individual items in a dictionary
    - We need this eventually for the feature to be useful
  - Don't let dictionaries declare their keys (e.g. avoid bringing in JSON Schema)
    - Without a programmatic way to know what keys a dictionary needs/provides, we actually take a step backward in clarity, i.e. a list of strings has more information than a dictionary where you don't know the keys
As mentioned in TEP-0076 Array results and indexing this update will be completely backwards compatible.
To make authoring time easier, we could support users declaring that a param or result is of type object, but not require them to specify anything further, including the keys they expect the object to support. For example inside of a Task you could do something like this:
spec:
params:
- name: grab-bag
type: object
steps:
- name: grab-specific-key-from-bag
image: ubuntu
script: |
      echo "$(params.grab-bag.magic-key)"
...
results:
- name: buncha-stuff
type: object
The use of `$(params.grab-bag.magic-key)` would imply that the `grab-bag` param is expected to have a key called `magic-key`, but this would not need to be specified in the schema.
The result `buncha-stuff` could include any keys (no keys would also be valid); no schema would be enforced at runtime.
Once Task and Pipeline authors are able to define object schemas for Tasks and Params, it would be very useful to:
- Allow definition reuse instead of having to copy and paste them around
- If we ship a known set of these with the Tekton Pipelines controller, they could define interfaces that tools such as Tekton Chains could rely on and build around.
We could add a way for users to define these, for example a new CRD such as:
apiVersion: tekton.dev/v1
kind: ParamSchema
metadata:
name: GitRepo
spec:
type: object # see question below
properties:
url: {
type: string # for now, all values are strings, so we could imply it
}
path: { } # example of implying type: string
(Or support something like the above via a ConfigMap.)
Which could be used in Tasks and Pipelines:
params:
- name: pull_remote
schemaRef: GitRepo
We could also pursue supporting JSON Schema refs.
(Thanks to @wlynch for the suggestion and above example!)
- tektoncd/pipeline#4786
- tektoncd/pipeline#4861
- tektoncd/pipeline#4867
- tektoncd/pipeline#4878
- tektoncd/pipeline#4883
- tektoncd/pipeline#4902
- tektoncd/pipeline#4904
- tektoncd/pipeline#5007
- tektoncd/pipeline#5083
- tektoncd/pipeline#5088
- tektoncd/pipeline#5090
- tektoncd/pipeline#5142
- tektoncd/pipeline#5144
- tektoncd/pipeline#5197
- tektoncd/pipeline#5222
- tektoncd/pipeline#5427