artifact metadata #156

Closed
escapewindow opened this issue Feb 5, 2020 · 6 comments · Fixed by #158

Comments

@escapewindow
Contributor

escapewindow commented Feb 5, 2020

Artifact Metadata

The goal is to provide Artifact Integrity guarantees from the point the worker uploads an artifact, to the point that someone downloads that artifact to use. We can do this by:

  1. adding SHA metadata to artifacts in the Queue,
  2. ensuring that that metadata can't be modified once it's written, and
  3. providing a download tool that queries the Queue for an artifact's location and SHA, downloads the artifact, and verifies the downloaded SHA matches the SHA provided by the Queue.

Adding artifact metadata to the Queue

First, we add a metadata dictionary to the S3ArtifactRequest type. This is a dictionary to allow for flexibility of usage. The initial known keys would include:

ContentLength int64 `json:"contentLength"`
ContentSha256 string `json:"contentSha256"`
ContentSha512 string `json:"contentSha512"`

The sha256 field is required for Artifact Integrity. Releng has use cases for all 3 fields, so I'm proposing all 3.
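As a rough illustration of what a worker might compute before upload, here is a minimal sketch. The `ArtifactMetadata` type name and the `NewArtifactMetadata` helper are hypothetical; only the three field names and their JSON tags come from the proposal above, and a fixed struct is used here in place of the more flexible dictionary.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// ArtifactMetadata is a hypothetical name for the proposed metadata
// dictionary. Only the field names and JSON tags come from the proposal.
type ArtifactMetadata struct {
	ContentLength int64  `json:"contentLength"`
	ContentSha256 string `json:"contentSha256"`
	ContentSha512 string `json:"contentSha512"`
}

// NewArtifactMetadata (hypothetical helper) computes all three proposed
// fields from the raw artifact bytes. Hashes are lowercase hex strings.
func NewArtifactMetadata(content []byte) ArtifactMetadata {
	s256 := sha256.Sum256(content)
	s512 := sha512.Sum512(content)
	return ArtifactMetadata{
		ContentLength: int64(len(content)),
		ContentSha256: hex.EncodeToString(s256[:]),
		ContentSha512: hex.EncodeToString(s512[:]),
	}
}

func main() {
	m := NewArtifactMetadata([]byte("hello"))
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	// Prints the metadata as it might appear inside an artifact request.
	fmt.Println(string(out))
}
```

A real worker would likely stream large files through the hashers rather than hold them in memory; the in-memory version just keeps the sketch short.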

A future entry may be ContentSha256WorkerSignature, once we solve worker identity.

(Optionally we could also add a metadata dictionary to the ErrorArtifactRequest (error summary?) and RedirectArtifactRequest (live log socket info?) types, but it's not clear if we want or need those at this time.)

We could add a Queue.getArtifactInfo endpoint that returns the URL and metadata.

Ensuring that metadata can't be modified once it's written

I'm under the impression this will Just Work, given the nature of the Queue.

Providing a download tool

This is probably a thin wrapper around the taskcluster client library that gets the metadata of the artifact, downloads it, and verifies any shas. We should allow for optional and required metadata fields, and for failing out if any required information is missing or if a sha doesn't match. We should be sure to measure the shas and filesizes on the right artifact state: the reassembled file for a multipart artifact, and the uncompressed content unless the original artifact was itself compressed.

This tool should be usable as a command-line tool, or as a library that the workers can use.
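The verification step could look roughly like the sketch below. It does not call the taskcluster client library; `Expected`, `CopyAndVerify`, and `VerifyBytes` are hypothetical names, and the choice to treat sha256 as required and the length as optional is an assumption made for illustration. The source reader stands in for the HTTP response body of the download.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
)

// Expected holds the metadata the Queue would report (hypothetical type).
type Expected struct {
	ContentSha256 string // required; lowercase hex
	ContentLength int64  // optional; a negative value means "not provided"
}

// ErrMissingSha256 is returned when the required sha256 is absent.
var ErrMissingSha256 = errors.New("required contentSha256 is missing")

// CopyAndVerify streams src into dst while hashing it, then compares the
// observed size and sha256 against the expected metadata.
func CopyAndVerify(dst io.Writer, src io.Reader, exp Expected) error {
	if exp.ContentSha256 == "" {
		return ErrMissingSha256
	}
	h := sha256.New()
	// MultiWriter lets us hash and write in a single pass over the data.
	n, err := io.Copy(io.MultiWriter(dst, h), src)
	if err != nil {
		return err
	}
	if exp.ContentLength >= 0 && n != exp.ContentLength {
		return fmt.Errorf("size mismatch: got %d bytes, expected %d", n, exp.ContentLength)
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != exp.ContentSha256 {
		return fmt.Errorf("sha256 mismatch: got %s, expected %s", got, exp.ContentSha256)
	}
	return nil
}

// VerifyBytes is a convenience wrapper for in-memory content.
func VerifyBytes(content []byte, exp Expected) error {
	return CopyAndVerify(io.Discard, bytes.NewReader(content), exp)
}

func main() {
	exp := Expected{
		ContentSha256: "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
		ContentLength: 5,
	}
	fmt.Println("matching content:", VerifyBytes([]byte("hello"), exp))
	fmt.Println("tampered content:", VerifyBytes([]byte("hellp"), exp))
}
```

Because the check only happens after the copy finishes, a caller writing to disk would want to download to a temporary path and move the file into place only when `CopyAndVerify` returns nil.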

Once we implement worker signatures in artifact metadata, the download tool will verify those signatures as well.

Object Service

The future object service should be compatible with this proposal.


I can create an rfc once we come to an initial consensus here.

@djmitche
Contributor

djmitche commented Feb 5, 2020

@taskcluster/services-reviewers please share your feedback!

@escapewindow
Contributor Author

ContentLength int64 `json:"contentLength"`

I realized this may be ambiguous. There's the gzipped content length, the multipart upload content lengths, and the filesize on disk. We care about the filesize on disk, so perhaps filesize.
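To make the ambiguity concrete, here is a small sketch comparing the size of some content with the size of its gzipped form. `GzippedLength` is a hypothetical helper, and the sample content is made up for the demonstration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// GzippedLength (hypothetical helper) returns how many bytes the content
// occupies after gzip compression, i.e. what a Content-Length header would
// report if the artifact were served gzip-encoded.
func GzippedLength(content []byte) (int64, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(content); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed data and the gzip trailer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return int64(buf.Len()), nil
}

func main() {
	// Repetitive sample content, so the two sizes differ noticeably.
	content := bytes.Repeat([]byte("artifact "), 1000)
	gz, err := GzippedLength(content)
	if err != nil {
		panic(err)
	}
	fmt.Printf("filesize on disk: %d bytes\n", len(content))
	fmt.Printf("gzipped length:   %d bytes\n", gz)
}
```

The two numbers describe the same artifact, which is why the key name needs to say which one it records.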

@jvehent

jvehent commented Feb 19, 2020

I'm under the impression this will Just Work, given the nature of the Queue.

Can someone confirm this assumption?

@djmitche
Contributor

Can someone confirm this assumption?

It's something we should flesh out a little here. Two parts:

  • Queue would have no API to write to this data
  • The backend storage (postgres) would be configured to not allow updates that would modify the data (so only select, insert, delete, not update)
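The two points above can be sketched as an in-memory analogy: a store whose API offers insert, select, and delete, but no update. This is not the Queue's implementation and does not touch postgres; `MetadataStore` and its methods are hypothetical names used only to show the shape of the guarantee.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrExists is returned when metadata for an artifact was already written.
var ErrExists = errors.New("metadata already written")

// MetadataStore is a hypothetical write-once store. It deliberately has no
// Update method, mirroring "select, insert, delete, not update".
type MetadataStore struct {
	mu   sync.Mutex
	rows map[string]string // artifact name -> sha256
}

func NewMetadataStore() *MetadataStore {
	return &MetadataStore{rows: make(map[string]string)}
}

// Insert writes metadata once; a second write for the same artifact fails
// instead of overwriting the stored value.
func (s *MetadataStore) Insert(artifact, sha256 string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.rows[artifact]; ok {
		return ErrExists
	}
	s.rows[artifact] = sha256
	return nil
}

// Select returns the stored sha256, if any.
func (s *MetadataStore) Select(artifact string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.rows[artifact]
	return v, ok
}

// Delete removes the row entirely.
func (s *MetadataStore) Delete(artifact string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.rows, artifact)
}

func main() {
	s := NewMetadataStore()
	fmt.Println("first insert: ", s.Insert("public/build/target.zip", "abc123"))
	fmt.Println("second insert:", s.Insert("public/build/target.zip", "def456"))
	v, _ := s.Select("public/build/target.zip")
	fmt.Println("stored value: ", v)
}
```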

@jvehent

jvehent commented Feb 19, 2020

The backend storage (postgres) would be configured to not allow updates that would modify the data (so only select, insert, delete, not update)

Right, I remember the discussion point now. This isn't an immutable data structure so much as an append-only database enforced via Grants. 👍

@escapewindow
Contributor Author

I'm going to start working on restructuring this issue into an RFC.
