# RFC 158 - Artifact metadata
* Comments: [#158](https://github.com/taskcluster/taskcluster-rfcs/pull/158)
* Proposed by: @escapewindow

# Summary

The goal is to provide Artifact Integrity guarantees from the point the worker uploads an artifact to the point that someone downloads that artifact to use it. We can do this by:

1. adding SHA metadata to artifacts in the Queue,
2. ensuring that that metadata can't be modified once it's written, and
3. providing a download tool that queries the Queue for an artifact's location and SHA, downloads the artifact, and verifies that the downloaded SHA matches the SHA provided by the Queue.

## Motivation

This would improve robustness and security. By storing a SHA to verify on download, we can avoid corrupt downloads. By verifying the read-only SHA before we use an artifact, we can detect malicious tampering with artifacts at rest before we use them for a critical operation.

(This would obsolete the [artifacts section of the Chain of Trust artifact](https://scriptworker.readthedocs.io/en/latest/chain_of_trust.html#chain-of-trust-artifacts), and improve artifact integrity platform-wide. See the [Chain of Trust implications](#chain-of-trust-implications) section below.)

# Details

## Adding artifact metadata to the Queue

First, we add a `metadata` dictionary to the `S3ArtifactRequest` type. This is a dictionary to allow for flexibility of usage. The initial known keys should include:

```
ContentFilesize int64  `json:"contentFilesize"`
ContentSha256   string `json:"contentSha256"`
```

The sha256 field is required for Artifact Integrity. The filesize field is optional.
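Computing these fields on the worker side is straightforward. A sketch in Python (the function name and chunking strategy are illustrative, not part of the RFC):

```python
import hashlib
import os

def artifact_metadata(path, chunk_size=1024 * 1024):
    """Compute the metadata fields a worker could attach on upload.

    Reads the file in chunks so large artifacts don't need to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "contentSha256": digest.hexdigest(),       # required
        "contentFilesize": os.path.getsize(path),  # optional
    }
```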

A future entry may be `ContentSha256WorkerSignature`, if we solve worker identity, though we may abandon this in favor of worker [re]registration.

We can optionally add a `metadata` dictionary to the `ErrorArtifactRequest` and `RedirectArtifactRequest` types.

A new `Queue.getArtifactInfo` endpoint will return the artifact URL and metadata.
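The RFC does not pin down the response shape; a hypothetical `Queue.getArtifactInfo` response using the metadata keys above might look like this (the sha256 and filesize shown are those of an empty file; the URL is a placeholder):

```
{
  "url": "https://<backing-store-url-for-the-artifact>",
  "metadata": {
    "contentSha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    "contentFilesize": 0
  }
}
```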

## Ensuring that metadata can't be modified once it's written

* The Queue would expose no API that writes to this data.
* The backend storage (postgres) would be configured to disallow updates that would modify the data (so only select, insert, and delete; not update).
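At the storage layer, one way to enforce the second point is simply to never grant `UPDATE` to the queue service's database role. A sketch, with hypothetical table and role names:

```
-- Hypothetical table/role names: grant everything except UPDATE,
-- so artifact metadata rows can be created and expired but never altered.
GRANT SELECT, INSERT, DELETE ON queue_artifacts TO queue_service;
REVOKE UPDATE ON queue_artifacts FROM queue_service;
```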
> **Review comment:** I guess this was always technically possible with azure table storage, but the fact that we're going to support it now with postgres makes me very happy.

## Providing a download tool
> **Review thread:**
>
> **Comment:** Should there not be an upload tool too, that calculates the hashes and uploads them?
>
> **Reply:** I was thinking this would be part of the worker. We could certainly provide a shared library or tool for docker-worker, generic-worker, and scriptworker to all use the same code, but I was thinking we would implement it in node, go, and python. I'm not sure we need upload capability outside the workers, since we're trying to verify artifacts uploaded by workers, and not grant easy arbitrary uploads to everyone.
>
> **Comment:** The workers do, aiui.
>
> **Reply:** The worker is trusted, or we can't trust the artifact itself, let alone its sha.
>
> **Comment:** What I really mean is: is there a validation step after the artifact upload, checking that the SHA(s) provided in metadata match those of the uploaded file, or does this only happen when the artifact is downloaded? The advantage I see of validating the SHA(s) post-upload, if possible, is that taskcluster can refuse to store artifacts that claim to have one SHA but actually have a different one. I'm not sure if S3 provides any mechanism to support SHA validation without the entire artifact needing to be downloaded, but if it does (such as an HTTP header containing the SHA256/SHA512 in the HTTP PUT that uploads the artifact), it would be great to use it. If we would need to download the artifact, perhaps we could limit the check to random spot checks, or to artifacts that are explicitly required to be validated. Currently we make a taskcluster API call (…) so that the endpoint could be called multiple times during artifact upload to report progress on the upload state. If the connection to S3 dropped during upload, and the upload had to start again, this could all be made visible from the task inspector etc.
>
> **Reply:** Hm. I could see that being useful for a) saving storage costs for invalid artifacts, b) more end-to-end reliability, c) marking the correct task as failed, and so on. (For (c), without post-upload verification, the downstream tasks will fail when they try to verify their download; in certain graphs, this could be hundreds of downstream tasks failing. With post-upload verification, the task that failed to upload properly will fail, which is more accurate.) Those are all wins. However, aiui, S3 doesn't provide a way to verify checksums unless we downgrade to md5, so we would need to re-download and re-calculate the checksums. That may be costly enough in machine time that we need to determine whether this is worth it. This RFC seems to be pointing at the question of why blob artifacts were deprecated, and whether it makes more sense to revive them; I'm leaning towards blocking this RFC on that decision.

This tool queries the Queue for the artifact metadata, downloads the artifact, and verifies any shas. We should allow for optional and required metadata fields, and fail out if any required information is missing or if the sha doesn't match. We should be sure to measure the shas and filesizes on the right artifact state (e.g. after combining a multipart artifact, and not compressed unless the original artifact was compressed).

This tool should be usable as a commandline tool, or as a library that the workers can use.
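The core of such a tool is small. A minimal sketch in Python, assuming the caller has already obtained the artifact URL and metadata from the Queue (the function name and signature are illustrative, not a defined interface):

```python
import hashlib
import urllib.request

def download_and_verify(artifact_url, expected_sha256, dest_path,
                        expected_filesize=None, chunk_size=1024 * 1024):
    """Download an artifact and fail if its sha256 (required) or filesize
    (optional) does not match the metadata reported by the Queue."""
    digest = hashlib.sha256()
    size = 0
    with urllib.request.urlopen(artifact_url) as resp, \
            open(dest_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            size += len(chunk)
            out.write(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("sha256 mismatch: artifact may be corrupt or tampered with")
    if expected_filesize is not None and size != expected_filesize:
        raise ValueError("filesize mismatch")
    return dest_path
```

Hashing while streaming means the artifact is read exactly once; a caller that wants to avoid keeping a bad file around could download to a temporary path and rename only after verification succeeds.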
> **Review thread:**
>
> **Comment:** Will this also be included in Taskcluster users' workflows? i.e. QA downloading an artifact to test? If so, how will that be "enforced" or "promoted"?
>
> **Reply:** I think we would start with encouraging and recommending the tool, and making it easy to get and use. Once that starts happening, we could update user docs to mention the tool, and remove references to other ways to download things. But our main use case here is protecting the release pipeline: making sure no malicious artifact, worker, or task can result in us shipping something arbitrary. If we end up not enforcing manual use by humans, but those manual tasks never result in our shipping something we didn't expect to ship, we may be ok with that level of risk.
>
> **Comment:** Ok, sounds good.

Once we implement worker signatures in artifact metadata, the download tool will verify those signatures as well.
> **Review thread:**
>
> **Comment:** What gets signed by the worker? The artifact content + metadata, or just the metadata, or something else?
>
> **Reply:** It would sign the metadata. In this case, just the sha256 hash.
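The RFC leaves the signature scheme open (it depends on how worker identity is resolved). Purely to show the shape of the check (sign the sha256 metadata value, verify it on download), here is a sketch using HMAC with a shared secret as a stand-in; a real deployment would presumably use an asymmetric scheme tied to worker identity:

```python
import hashlib
import hmac

# Stand-in only: HMAC with a shared secret is NOT a worker identity scheme.
# It is used here solely to illustrate that the worker signs just the
# sha256 metadata value, and the download tool verifies that signature.
def sign_metadata(worker_key: bytes, content_sha256: str) -> str:
    return hmac.new(worker_key, content_sha256.encode(), hashlib.sha256).hexdigest()

def verify_metadata_signature(worker_key: bytes, content_sha256: str,
                              signature: str) -> bool:
    return hmac.compare_digest(sign_metadata(worker_key, content_sha256), signature)
```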

## Object Service

The future object service should be compatible with this proposal.

## Chain of Trust implications

As mentioned above, this RFC will deprecate the `artifacts` section of the Chain of Trust artifact, also known as the CoT artifact. The Chain of Trust has three primary guarantees:

1. the artifacts have not been modified at rest,
2. the workers which generated the artifacts are under our control, and
3. the tasks that the workers ran were generated from a trusted tree.

The CoT artifact has an `artifacts` section, containing artifact paths and checksums, to address guarantee #1. Once there is a platform-supported way to store and verify checksums for artifacts, we will no longer need to do so in the CoT artifact.

# Implementation

* TBD
* Implemented in Taskcluster version ...
> **Review thread:**
>
> **Comment:** What does "if we solve worker identity" mean?
>
> **Reply:** This is referring to #157, which is looking less likely to happen.