Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supply chain security with embedded protoc binaries #575

Closed
benesch opened this issue Dec 22, 2021 · 9 comments · Fixed by #610
Closed

Supply chain security with embedded protoc binaries #575

benesch opened this issue Dec 22, 2021 · 9 comments · Fixed by #610

Comments

@benesch
Copy link

benesch commented Dec 22, 2021

We're thinking about using prost over at @MaterializeInc, but the supply chain security of the embedded protoc binaries has me very nervous.

Specifically: what sort of validation are you doing before you merge PRs to check that the binaries supplied by third parties are identical to the binaries published by google/protobuf? I did some spot checks and they look identical, but I didn't see anything in your CI pipeline that would assert this.

We'd honestly probably want to opt out of the bundled protoc deps entirely. I know that's already possible if you specify PROTOC, but I'd feel better if there were a way to not even bring the binaries into our dependency tree. Otherwise it seems too easy for accidents to happen; e.g., if that env var somehow gets unset.

How would y'all feel about adopting an approach similar to rust-protobuf, where the bundled binaries are kept in a separate crate (e.g., prost-build-vendored)? Using that crate could be enabled by default, but we could keep it off @MaterializeInc and lint for its absence using cargo deny, and I'd sleep a good bit easier at night.

I've also been toying with the idea of building something like protobuf-src for Rust (a la openssl-src) that builds libprotobuf from source and makes its protoc available to downstream libraries. That could be a nice alternative for prost-build to expose.

benesch added a commit to benesch/prost that referenced this issue Dec 22, 2021
This commit makes it possible to opt out of downloading the bundled
pre-compiled protoc binaries. This is useful for downstream users with
strict supply chain security requirements (see tokio-rs#575 for details).

The approach taken is to move the pre-compiled binaries to a new crate
called `protoc-bin`. prost-build depends on this crate only if the
`protoc-vendored-bin` feature is enabled (which it is by default).
While it was always possible to avoid the vendored binaries by
specifying the `PROTOC` environment variable, the feature is a bit more
foolproof; security-concious users can audit their Cargo.lock to ensure
that the protc-bin crate is never even downloaded.

This commit also adds a new feature called `protoc-vendored-src`, which
uses the new `protobuf-src` crate to compile a copy of protoc from
source on the fly. This feature makes `cargo build` seamless on all
platforms that Protobuf supports, not just those that protoc-bin
distributed pre-compiled binaries for, but has the downside of requiring
a C++ toolchain and increasing compile times.

If neither of the new vendored features are enabled, `prost-build` fails
unless `PROTOC` and `PROTOC_INCLUDE` are explicitly specified.
benesch added a commit to MaterializeInc/prost that referenced this issue Jan 16, 2022
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decsion:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 18, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/prost that referenced this issue Jan 19, 2022
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 19, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 19, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
benesch added a commit to MaterializeInc/materialize that referenced this issue Jan 19, 2022
It's time to switch from rust-protobuf to Prost. There are a few factors
in this decision:

  * The maintainer of rust-protobuf has announced that the project is
    essentially unmaintained for the time being [0].

  * Prost is the more popular Protobuf library across the ecosystem. In
    particular, the Tonic GRPC library uses Prost, and the Tokio tracing
    console uses Prost. These are both extremely important for
    Materialize Platform, which we expect to be built on top of Tonic
    and making heavy use of distributed tracing.

There have been been two major blockers to adopting Prost:

  1. Prost lacked dynamic message support. That was miraculously fixed
     last month with the release of a fantastic prost-reflect crate [1].

  2. Prost embeds `protoc` in its Git repository (!). See
     tokio-rs/prost#575. That is a disaster for supply chain security.
     This commit patches in a Materialize-specific fork of Prost that
     compiles `protoc` from source rather than using upstream's embedded
     binary blobs.

This commit excludes one crucial piece: converting the dynamic Protobuf
decoding in the `interchange` and `testdrive` crates from rust-protobuf
to prost-reflect. I'll do that migration in a future commit to avoid
inflating the size of this PR unnecessarily.

[0]: stepancheg/rust-protobuf#583 (comment)
[1]: https://www.reddit.com/r/rust/comments/rsg6gx/announcing_prostreflect_a_crate_for_protobuf/
@LucioFranco
Copy link
Member

Hey @benesch sorry for the delay on this I must have missed it somehow. Are you still interested in getting this working? I think the separate crate idea is good.

cc @jonhoo I think you've been interested in some of this stuff too.

@jonhoo
Copy link

jonhoo commented Feb 28, 2022

I think doing something along what's proposed here is a great idea!

@benesch
Copy link
Author

benesch commented Feb 28, 2022

No problem, @LucioFranco! Thanks for taking a look now. I actually whipped up a proposal in #576, but we need you or another maintainer to weigh in on how to expose the toggle of which protoc to use. The lack of mutually exclusive features in Cargo means there’s no great solution.

I’m also maintaining a fork of prost that forces compiling protoc from source: https://github.com/materializeinc/prost. You could upstream that approach in prost, too! But I figure you probably don’t want to force a C++ dependency on all users of prost.

@benesch
Copy link
Author

benesch commented Feb 28, 2022

By the way, if it’s easiest to sync on Discord or something, happy to find the time!

@LucioFranco
Copy link
Member

Yeah, let me try to catch up on everything so may be a bit slow then we can sync. I'd love for yall to not have to maintain a fork so I'd like to get this solved :)

@LucioFranco
Copy link
Member

Ok I spent sometime today talking to @jonhoo about this and how we also want to not ship compiled binaries internally here.

I think the proposed solution in #576 is a step in the right direction but I think we will want to avoid having two mutually exclusive feature flags. I think the direction we should head should actually be to not ship any pre-compiled binaries and only allow users to either provide a protoc (similar to how we support it now) or to compile it thus requiring users to have a C++ toolchain on hand.

So I think we should follow this sort of flow for obtaining a protoc:

  • If PROTOC_NO_VENDOR
    • excute from PATH or fail
  • else if prost-build/vendored feature is enabled or protoc is not found in PATH
    • compile protoc from source and execute from there
  • else
    • excute from PATH or fail

I believe this solves most of our issues, while this may increase build times for users that don't have a protoc installed on their systems, I do think this will overall improve the security posture of prost for all our customers.

@benesch
Copy link
Author

benesch commented Mar 24, 2022

I think the proposed solution in #576 is a step in the right direction but I think we will want to avoid having two mutually exclusive feature flags. I think the direction we should head should actually be to not ship any pre-compiled binaries and only allow users to either provide a protoc (similar to how we support it now) or to compile it thus requiring users to have a C++ toolchain on hand.

This sounds even better than my PR!

@LucioFranco
Copy link
Member

@benesch I am working on a PR for this, ill post it in here and tag you on it once its ready. Thanks again for getting this rolling!

@benesch
Copy link
Author

benesch commented Mar 24, 2022

Sounds wonderful! You are taking an even stronger stance on this than I had hoped, so I am delighted!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants