RFC for signing/verifying remotely referenced taskcluster.yml files #187

bhearsum · 2023-10-16T15:30:00Z

This is an addendum to #182. I'll note that the contents of the RFC only cover verification, because that's the only part that Taskcluster the platform cares about.

In the Firefox CI cluster, I expect that we'll be signing these through Autograph (most likely via https://github.com/mozilla-releng/adhoc-signing at first), and copying the signatures into wherever we publish the .taskcluster.yml files.

rfcs/0187-sign-taskcluster-yml-remote-references.md

petemoore

The RFC doesn't cover how the service obtains the key(s) to validate the signatures.

For a multitenant environment, I think it would be better for the repo to stipulate if it requires a signature, and which signing keys it accepts, rather than have a single global key that can be used for signing across the entire deployment, or a single set of keys that apply to all projects. This feels like it should be repo config, empowering the project users who the CI is for.

For example, the .taskcluster.yml in the repo that wants to include the shared .taskcluster.yml file could look like this:

---
version: 1
config-from:
  source: github.com/taskcluster/taskgraph/data/taskcluster-yml-github.yml@main
  signature:
    required: true
    source: taskgraph.sig
    accepted-keys:
      - ed25519: <base64 encoded key> 
      - ....
      - ....
context:
  project-name: mozillavpn
  scopes:
    - secrets:get:project/mozillavpn/*

I think this design is much more flexible and more transparent, and it puts control in the hands of the projects that use it. My concern with the platform deployment approach is that it assumes a Taskcluster deployment is controlled by a central team, blocks project teams when those staff are not available, and does not support multi-tenant environments. It is also more opaque: it is harder to troubleshoot why the wrong signing key might be in use, and harder to change the signing key(s) when they need updating, because they are hidden behind platform config and only visible to operational staff.

Having it in the .taskcluster.yml makes each .taskcluster.yml a little bigger, but the config in there is unlikely to change frequently. If a key is rotated, the change is much more visible: git history records what changed and who changed it, which gives you an audit trail. You can roll changes out gradually if required, or update all repos in one go with a script. This also supports the environment changing at Mozilla: if it stops being a single team that controls all the CI pipelines of the whole company, teams that need to move quickly can still adopt the same security approach. It is more flexible regarding changes to the organisation.
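
As a rough illustration of the repo-level approach above, here is a minimal Python sketch of how a service might evaluate the `signature` stanza once the YAML has been parsed. Everything in it is hypothetical: the function name `check_remote_reference`, the injected `verify` callable (a stand-in for a real Ed25519 check), and the dict layout, which simply mirrors the example config. It is not part of the RFC or of Taskcluster-GitHub.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical verifier signature: (public_key_b64, payload, signature) -> bool.
# A real implementation would perform an Ed25519 check here; this sketch only
# models the policy decision around it.
Verifier = Callable[[str, bytes, bytes], bool]


def check_remote_reference(
    signature_cfg: Mapping[str, object],
    payload: bytes,
    signature: Optional[bytes],
    verify: Verifier,
) -> bool:
    """Return True if the remote file may be used under the repo's own policy."""
    required = bool(signature_cfg.get("required", False))
    if signature is None:
        # No signature was published: fine only if the repo did not require one.
        return not required

    accepted: Sequence[Mapping[str, str]] = signature_cfg.get("accepted-keys", [])
    # accepted-keys is a list of single-entry mappings such as {"ed25519": "<key>"}.
    for entry in accepted:
        key = entry.get("ed25519")
        if key is not None and verify(key, payload, signature):
            return True
    # A signature was supplied but no accepted key validated it: reject,
    # even when signatures are optional.
    return False
```

One design choice worth noting in this sketch: a signature that is present but invalid is rejected even when `required` is false, so an optional signature can never make things less safe than no signature at all.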

petemoore · 2023-10-27T11:39:10Z

rfcs/0187-sign-taskcluster-yml-remote-references.md

+
+To accommodate integrity checks, Taskcluster-GitHub will require that any remotely referenced `.taskcluster.yml` files have an associated detached GPG signature which can be verified by a public GPG key that it has been configured with.
+
+Integrity checks will be on by default, but can be disabled by setting `allow-unsigned-remote-references` to `True`.


This is pretty releng-specific. Let's invert this and have the feature disabled by default.

bhearsum · 2023-10-27

My concern with this being off by default is that a misconfiguration would silently disable integrity checks, with no easy way to notice (things would just continue to work). Maybe I'm overconcerned about this, though? I've asked the SecOps folks for their opinion as well.
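
To make the misconfiguration concern concrete, here is a minimal Python sketch of the on-by-default behaviour described in the RFC text. The function name and the `settings` dict are hypothetical; only the `allow-unsigned-remote-references` setting name comes from the RFC draft.

```python
def remote_reference_allowed(signature_ok: bool, settings: dict) -> bool:
    """Decide whether a remotely referenced file may be processed."""
    # Only an explicit boolean True disables the check. A missing key, a
    # misspelled key, or a string such as "True" leaves integrity checks on.
    allow_unsigned = settings.get("allow-unsigned-remote-references") is True
    return signature_ok or allow_unsigned
```

With this shape, a typo in the setting name fails closed: unsigned references are rejected loudly rather than accepted silently, which is the failure mode the off-by-default variant would not give you.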

lotas · 2023-10-27T13:49:58Z

We were discussing this with Pete.. :)

The biggest question so far was what problem are we trying to solve? Protect what and from whom?

Some extra ideas that popped up: use scopes.

We can add a github:allow-includes scope and make the GitHub repo roles include it. If a repo needs this, you just add the scope. This way you stay flexible and aren't locked into deployment-level config.
Going further, you could add more control with scopes that name the allowed URLs, e.g. github:allow-includes:github.com/releng/baseline.
Just a thought.
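
A minimal Python sketch of how the scope idea above could be checked. It relies on the usual Taskcluster rule that a held scope ending in `*` satisfies any required scope sharing that prefix. The `github:allow-includes` scope is only the proposal from this comment, not an existing scope, and the helper names are hypothetical.

```python
from typing import Iterable


def satisfies(held: Iterable[str], required: str) -> bool:
    """Return True if any held scope satisfies the required scope."""
    for scope in held:
        if scope == required:
            return True
        # A trailing "*" acts as a prefix wildcard.
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def include_scope(url: str) -> str:
    """Build the (proposed, hypothetical) scope needed to include `url`."""
    return f"github:allow-includes:{url}"
```

For example, a repo role holding `github:allow-includes:github.com/releng/*` would satisfy `include_scope("github.com/releng/baseline")`, while a role with no such scope would not be allowed to include anything.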

bhearsum · 2023-11-13T20:01:52Z

> The RFC doesn't cover how the service obtains the key(s) to validate the signatures.

The current draft has them specified in Taskcluster-Github's config.yml. I can see that it is perhaps not specific enough though, maybe that's what you're referring to?

> For a multitenant environment, I think it would be better for the repo to stipulate if it requires a signature, and which signing keys it accepts, rather than have a single global key that can be used for signing across the entire deployment, or a single set of keys that apply to all projects. This feels like it should be repo config, empowering the project users who the CI is for.

We're getting close to a point where what I believe Firefox CI needs is nearly incompatible with what is wanted by Taskcluster in general. Specifically, I think we want all of the following for Firefox CI:

- Repositories should not be able to opt out of signature checks if they are using a remotely referenced .taskcluster.yml.
- Some (possibly all) repositories should not be able to specify their own keys. (I'm thinking of level 3 repositories here, where we are very strict about things that go into CI.)
- Some (possibly all) repositories should only be allowed to pull remotely referenced .taskcluster.yml files from locations specified by the Taskcluster-GitHub deployment. (This was a SecOps ask in the RRA.)

Many (all?) of these are quite at odds with what Taskcluster in general seems to want. I'm struggling to come up with a viable path forward here. I'm tempted to say that SecOps and the Taskcluster team need to work together to come up with it. I feel that I'm largely acting as an intermediary here.

> I think this design is much more flexible and more transparent, and it puts control in the hands of the projects that use it. My concern with the platform deployment approach is that it assumes a Taskcluster deployment is controlled by a central team, blocks project teams when those staff are not available, and does not support multi-tenant environments. It is also more opaque: it is harder to troubleshoot why the wrong signing key might be in use, and harder to change the signing key(s) when they need updating, because they are hidden behind platform config and only visible to operational staff.

> Having it in the .taskcluster.yml makes each .taskcluster.yml a little bigger, but the config in there is unlikely to change frequently. If a key is rotated, the change is much more visible: git history records what changed and who changed it, which gives you an audit trail. You can roll changes out gradually if required, or update all repos in one go with a script. This also supports the environment changing at Mozilla: if it stops being a single team that controls all the CI pipelines of the whole company, teams that need to move quickly can still adopt the same security approach. It is more flexible regarding changes to the organisation.

I understand what you're saying about flexibility, but we're not talking about something here that has no workarounds. If you want to include a .taskcluster.yml from a non-approved source, you would have two options:

- Talk to RelEng and either get that source added, or move the .taskcluster.yml to an already approved source.
- Live without the remote reference, and copy in the contents.

There is no hard block stopping work here in any case - you can always do whatever you want in the .taskcluster.yml in a repo you control.
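
As a rough sketch of the deployment-controlled restriction described above (repositories may only pull remote files from approved locations), the check could look something like the following. The function name, the shape of the `allowed` mapping, and the `location@ref` source format (borrowed from the example config earlier in this thread) are all assumptions, not anything the RFC specifies.

```python
from typing import Mapping, Sequence


def source_allowed(repo: str, source: str,
                   allowed: Mapping[str, Sequence[str]]) -> bool:
    """Return True if `repo` may reference the remote file at `source`."""
    # Drop the "@ref" suffix so only the location is compared.
    location = source.split("@", 1)[0]
    # Repos with no entry get an empty allowlist, so the check fails closed.
    for prefix in allowed.get(repo, ()):
        if location == prefix or location.startswith(prefix.rstrip("/") + "/"):
            return True
    return False
```

Matching on `prefix + "/"` rather than a bare string prefix avoids accepting a look-alike location such as `github.com/taskcluster/taskgraph-evil` when only `github.com/taskcluster/taskgraph` is approved.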

bhearsum · 2023-11-13T20:05:42Z

> We were discussing this with Pete.. :)

> The biggest question so far was what problem are we trying to solve? Protect what and from whom?

The goal is to ensure that the remote .taskcluster.yml that is processed was authored and published by a known good source. (To guard against man-in-the-middle attacks, compromised GitHub accounts, etc.)

> Some extra ideas that popped up: use scopes.

> We can add a github:allow-includes scope and make the GitHub repo roles include it. If a repo needs this, you just add the scope. This way you stay flexible and aren't locked into deployment-level config. Going further, you could add more control with scopes that name the allowed URLs, e.g. github:allow-includes:github.com/releng/baseline. Just a thought.

I'm not sure I fully understand this suggestion. Are you saying that these scopes would control which repositories remotely referenced .taskcluster.yml files could come from? If so, that seems like a reasonable alternative to mapping project repositories to these repos in the Taskcluster-GitHub configuration. (It doesn't solve the integrity checking part of this, but it does address another thing that SecOps wanted.)

ahal · 2023-11-13T21:59:15Z

I think the disconnect here is stemming from the fact that the Taskcluster team are approaching this with the lens of developers as the target users and a "hacker ethos" (empower them as much as possible).

I think normally that's the right approach, but in this case our aim is to lock things down, the opposite of empowering them. Think of it from the lens of selling Taskcluster to an enterprise user and the request makes a lot more sense. Enterprise users (and fxci) need controls to prevent footguns and security oopsies. I think Taskcluster is best suited for large enterprises, so IMO it makes a ton of sense to build these controls directly into the platform.

That's not to say we need to enforce these controls on anyone. Every instance can be free to use or not use them as they see fit.

With that in mind, @petemoore is there any compelling reason not to specify the keys as a deployment configuration?

ahal · 2023-11-13T22:08:31Z

Also, there's no reason they couldn't be configurable in both the deployment and the .taskcluster.yml if you wanted, but I don't think fxci would use the .taskcluster.yml version, so it would likely be a case of YAGNI.

bhearsum requested a review from a team as a code owner October 16, 2023 15:30

bhearsum requested review from lotas, petemoore and matt-boris and removed request for a team October 16, 2023 15:30

bhearsum force-pushed the 182-fix branch 2 times, most recently from c2fa32c to 077b489 Compare October 16, 2023 15:33

lotas reviewed Oct 18, 2023

rfcs/0187-sign-taskcluster-yml-remote-references.md (outdated)

lotas reviewed Oct 18, 2023

rfcs/0187-sign-taskcluster-yml-remote-references.md (outdated)

lotas reviewed Oct 18, 2023

rfcs/0187-sign-taskcluster-yml-remote-references.md (outdated)

bhearsum force-pushed the 182-fix branch from 077b489 to 784bea9 Compare October 19, 2023 18:29

RFC for signing/verifying remotely referenced taskcluster.yml files

50a2eb8

bhearsum force-pushed the 182-fix branch from 784bea9 to 50a2eb8 Compare October 24, 2023 13:37

bhearsum mentioned this pull request Oct 26, 2023

Allow remote references to .taskcluster.yml files processed by Taskcluster-GitHub taskcluster/taskcluster#6138

Open

bhearsum requested a review from lotas October 26, 2023 20:37

petemoore requested changes Oct 27, 2023
