
# rfc 0158 - artifact metadata #158

Merged · 1 commit · Oct 16, 2020
1 change: 1 addition & 0 deletions README.md
@@ -57,6 +57,7 @@ See [mechanics](mechanics.md) for more detail.
| RFC#153 | [remove the email validation for metadata.owner](rfcs/0153-remove-email-validation-for-metadata-owner.md) |
| RFC#154 | [Migrate Taskcluster to postgres](rfcs/0154-Migrate-taskcluster-to-postgres.md) |
| RFC#155 | [Create an object service](rfcs/Create-object-service.md) |
| RFC#158 | [Artifact metadata](rfcs/0158-artifact-metadata.md) |
| RFC#163 | [ProjectId](rfcs/0163-project-id.md) |
| RFC#165 | [Anonymous scopes](rfcs/0165-Anonymous-scopes.md) |
| RFC#166 | [Sign Public S3 URLs](rfcs/0166-Sign-public-S3-urls.md) |
69 changes: 69 additions & 0 deletions rfcs/0158-artifact-metadata.md
@@ -0,0 +1,69 @@
# RFC 158 - Artifact metadata
* Comments: [#158](https://github.com/taskcluster/taskcluster-rfcs/pull/158)
* Proposed by: @escapewindow

# Summary

The goal is to provide Artifact Integrity guarantees from the point the worker uploads an artifact, to the point that someone downloads that artifact to use. We can do this by:

1. adding SHA metadata to artifacts in the Queue,
2. ensuring that this metadata can't be modified once it's written, and
3. providing a download tool that queries the Queue for an artifact's location and SHA, downloads the artifact, and verifies the downloaded SHA matches the SHA provided by the Queue.

## Motivation

This would improve robustness and security. By storing a SHA to verify on download, we can avoid corrupt downloads. By verifying the read-only SHA before we use an artifact, we can detect malicious tampering at rest before the artifact is used in a critical operation.

(This would obsolete the [artifacts section of the Chain of Trust artifact](https://scriptworker.readthedocs.io/en/latest/chain_of_trust.html#chain-of-trust-artifacts), and improve artifact integrity platform-wide. See the [Chain of Trust implications](#chain-of-trust-implications) section below.)

# Details

## Adding artifact metadata to the Queue

First, we add a `metadata` dictionary to the `S3ArtifactRequest` type; using a dictionary allows for flexibility of usage. The initial known keys should include:

```
ContentFilesize int64 `json:"contentFilesize"`
ContentSha256 string `json:"contentSha256"`
```

The sha256 field is required for Artifact Integrity. The filesize field is optional.
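
For illustration, the extended type might look like the following Go sketch. The `storageType`, `expires`, and `contentType` fields reflect the existing `S3ArtifactRequest` in the queue API; only the two metadata keys above are proposed here, and everything else is illustrative rather than normative.

```
package queue // hypothetical package name, for illustration only

import "time"

// ArtifactMetadata carries the proposed integrity metadata.
type ArtifactMetadata struct {
	ContentFilesize int64  `json:"contentFilesize,omitempty"` // optional
	ContentSha256   string `json:"contentSha256"`             // required for Artifact Integrity
}

// S3ArtifactRequest as it might look with the new dictionary attached.
type S3ArtifactRequest struct {
	StorageType string            `json:"storageType"` // "s3"
	Expires     time.Time         `json:"expires"`
	ContentType string            `json:"contentType"`
	Metadata    *ArtifactMetadata `json:"metadata,omitempty"`
}
```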

A future entry may be `ContentSha256WorkerSignature`, if we solve worker identity, though we may abandon this in favor of worker [re]registration.
> **Member:** What does "if we solve worker identity" mean?

> **Contributor Author:** This is referring to #157, which is looking less likely to happen.


We can optionally add a `metadata` dictionary to the `ErrorArtifactRequest` and `RedirectArtifactRequest` types.

A new `Queue.getArtifactInfo` endpoint will return the artifact URL and metadata.
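
The RFC specifies only that the URL and metadata are returned; a hypothetical response shape (field names assumed, not specified here) might be:

```
// Hypothetical response shape for Queue.getArtifactInfo; only the
// presence of the URL and metadata is specified by this proposal.
type ArtifactInfo struct {
	URL      string            `json:"url"`
	Metadata *ArtifactMetadata `json:"metadata,omitempty"`
}
```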

## Ensuring that metadata can't be modified once it's written


* The Queue would expose no API for modifying this data.
* The backend storage (Postgres) would be configured to reject updates that modify this data, so only select, insert, and delete are allowed, not update (a sketch follows the review thread below).
> **Contributor:** I guess this was always technically possible with Azure table storage, but the fact that we're going to support it now with Postgres makes me very happy.
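
As a sketch of how the second point might be enforced: the table and role names (`queue_artifacts`, `queue_service`) are hypothetical, and the real schema belongs to the RFC#154 Postgres migration, but the idea is simply to never grant the service role UPDATE.

```
// Sketch only: hypothetical grants for the queue's database role.
// Without the UPDATE privilege, artifact metadata rows are
// effectively immutable once inserted.
const artifactGrants = `
GRANT SELECT, INSERT, DELETE ON queue_artifacts TO queue_service;
REVOKE UPDATE ON queue_artifacts FROM queue_service;
`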


## Providing a download tool
> **Member:**
>
> Should there not be an upload tool too, that calculates the hashes, and uploads them?
> Would the tool support parallel multipart uploads?
> What validates that the SHA(s) are correct on upload?

> **Contributor Author:**
>
> > Should there not be an upload tool too, that calculates the hashes, and uploads them?
>
> I was thinking this would be part of the worker. We could certainly provide a shared library or tool so that docker-worker, generic-worker, and scriptworker all use the same code, but I was thinking we would implement it in Node, Go, and Python. I'm not sure we need upload capability outside the workers, since we're trying to verify artifacts uploaded by workers, not grant easy arbitrary uploads to everyone.
>
> > Would the tool support parallel multipart uploads?
>
> The workers do, AIUI.
>
> > What validates that the SHA(s) are correct on upload?
>
> The worker is trusted, or we can't trust the artifact itself, let alone its SHA.

> **Member:**
>
> What I really mean is: is there a validation step after the artifact upload that confirms the SHA(s) provided in the metadata match those of the uploaded file, or does this only happen when the artifact is downloaded?
>
> The advantage I see of validating the SHA(s) post-upload, if possible, is that Taskcluster can refuse to store artifacts that claim to have one SHA but actually have a different one. I'm not sure if S3 provides any mechanism to support SHA validation without the entire artifact needing to be downloaded, but if it does (such as an HTTP header containing the SHA256/SHA512 in the HTTP PUT that uploads the artifact), it would be great to use it. If we would need to download the artifact, perhaps we could limit the check to random spot checks, or to artifacts that are explicitly required to be validated.
>
> Currently we make a Taskcluster API call (queue.createArtifact) to request to upload an artifact, then negotiate the artifact upload with S3, but do not perform a second Taskcluster API call to confirm successful completion of the upload. I could see a solution whereby, after the upload is performed, the queue considers the artifact upload to be in progress, but requires a second API call to confirm that the artifact upload is completed. Addressing this would allow for smarter behaviour in the task inspector and other tools that display artifacts. It would even be reasonable for the second Taskcluster API call (e.g. queue.reportArtifactUploadProgress) to have a payload with something like:
>
> ```
> {
>   "uploadedBytes": 1349124,
>   "totalBytes": 344134232
> }
> ```
>
> so that the endpoint could be called multiple times during artifact upload to report progress on the upload state. If the connection to S3 dropped during upload, and the upload had to start again, this could all be made visible from the task inspector etc.
>
> This last suggestion is a little bit of scope creep, but if we are redesigning the artifact upload and download mechanics, it would be a good time to review whether a second API call would be useful here for reporting progress.
>
> The first point above is really just about finding a means to ensure that Taskcluster never claims to have an artifact with one SHA while actually storing an artifact with a different SHA (for example, due to data getting corrupted in transit).

> **Contributor Author:**
>
> Hm. I could see that being useful for (a) saving storage costs for invalid artifacts, (b) more end-to-end reliability, (c) marking the correct task as failed, and so on. (For (c), without post-upload verification, the downstream tasks will fail when they try to verify their download; in certain graphs, this could be hundreds of downstream tasks failing. With post-upload verification, the task that failed to upload properly will fail, which is more accurate.)
>
> Those are all wins. However, AIUI, S3 doesn't provide a way to verify checksums unless we downgrade to MD5, so we would need to re-download and re-calculate the checksums. That may be costly enough in machine time that we need to determine whether this is worth it.
>
> This RFC seems to be pointing at the question of why blob artifacts were deprecated, and whether it makes more sense to revive them; I'm leaning towards blocking this RFC on that decision.


This tool queries the Queue for the artifact metadata, downloads the artifact, and verifies any SHAs. We should allow for optional and required metadata fields, failing out if any required information is missing or if a SHA doesn't match. We should be sure to compute the SHAs and filesizes against the right artifact state (e.g. after combining a multipart artifact, and uncompressed unless the original artifact was compressed).
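
A minimal sketch of that flow, reusing the hypothetical `ArtifactInfo` shape above (retries, auth, and multipart handling omitted):

```
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"os"
)

// downloadAndVerify fetches the artifact and checks it against the
// Queue-provided metadata. Illustrative only.
func downloadAndVerify(info ArtifactInfo, dest string) error {
	if info.Metadata == nil || info.Metadata.ContentSha256 == "" {
		return fmt.Errorf("refusing download: required contentSha256 missing")
	}
	resp, err := http.Get(info.URL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()

	// Hash while writing so the artifact is only read once.
	h := sha256.New()
	n, err := io.Copy(io.MultiWriter(out, h), resp.Body)
	if err != nil {
		return err
	}
	// Filesize is optional metadata: only check it when provided.
	if info.Metadata.ContentFilesize != 0 && n != info.Metadata.ContentFilesize {
		return fmt.Errorf("filesize mismatch: got %d, want %d", n, info.Metadata.ContentFilesize)
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != info.Metadata.ContentSha256 {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, info.Metadata.ContentSha256)
	}
	return nil
}
```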

This tool should be usable as a command-line tool, or as a library that the workers can use.
> **Comment:** Will this also be included in Taskcluster users' workflows, i.e. QA downloading an artifact to test? If so, how will that be "enforced" or "promoted"?

> **Contributor Author:** I think we would start with encouraging and recommending the tool, and making it easy to get and use. Once that starts happening, we could update user docs to mention the tool, and remove references to other ways to download things. But our main use case here is protecting the release pipeline: making sure no malicious artifact, worker, or task can result in us shipping something arbitrary. If we end up not enforcing manual use by humans, but those manual tasks never result in our shipping something we didn't expect to ship, we may be ok with that level of risk.

> **Comment:** Ok, sounds good.


Once we implement worker signatures in artifact metadata, the download tool will verify those signatures as well.
> **Member:** What gets signed by the worker? The artifact content + metadata, or just the metadata, or something else?

> **Contributor Author:** It would sign the metadata. In this case, just the sha256 hash.
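
(Purely illustrative: no signature scheme is specified in this RFC, but if, say, an ed25519 worker key were used, verifying a signature over the sha256 metadata could look like the following sketch.)

```
package main

import "crypto/ed25519"

// Illustrative only: verify a hypothetical worker signature over the
// contentSha256 metadata value; the actual scheme is unspecified.
func verifyWorkerSignature(pub ed25519.PublicKey, contentSha256 string, sig []byte) bool {
	return ed25519.Verify(pub, []byte(contentSha256), sig)
}
```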


## Object Service

The future object service should be compatible with this proposal.

## Chain of Trust implications

As mentioned above, this RFC will deprecate the `artifacts` section of the Chain of Trust artifact, also known as the CoT artifact. The Chain of Trust has three primary guarantees:

1. the artifacts have not been modified at rest,
2. the workers which generated the artifacts are under our control, and
3. the tasks that the workers ran were generated from a trusted tree.

The CoT artifact has an `artifacts` section, containing artifact paths and checksums, to address guarantee #1. Once there is a platform supported way to store and verify checksums for artifacts, we will no longer need to do so in the CoT artifact.

# Implementation

* TBD
* Implemented in Taskcluster version ...
1 change: 1 addition & 0 deletions rfcs/README.md
@@ -45,6 +45,7 @@
| RFC#153 | [remove the email validation for metadata.owner](0153-remove-email-validation-for-metadata-owner.md) |
| RFC#154 | [Migrate Taskcluster to postgres](0154-Migrate-taskcluster-to-postgres.md) |
| RFC#155 | [Create an object service](Create-object-service.md) |
| RFC#158 | [Artifact metadata](0158-artifact-metadata.md) |
| RFC#163 | [ProjectId](0163-project-id.md) |
| RFC#165 | [Anonymous scopes](0165-Anonymous-scopes.md) |
| RFC#166 | [Sign Public S3 URLs](0166-Sign-public-S3-urls.md) |