Surface crate security information in the UI #6397
Comments
I do see this callout to code hosts other than GitHub, which is good-- crates.io should aim to be less entwined with GitHub over time, rather than more. So given that, even though users must (currently) log in with GitHub, there is no requirement for crates to publish their repository on GitHub, or use Git at all, or specify any code repository at all. How are you envisioning for them to be included in all the checks planned? |
How do we expect the provenance checks to work over time? It feels bad if any of these are true:
In general I think focusing in on the over-time permanence of these checks would be great as we refine the proposal.
One thought that immediately came to mind here is that it'd be nice if we could give an easy ability for "confirming" publishes. Particularly if there's checks that are sensitive to the state at publish time, ideally an author can pre-publish in draft mode and then only release it to the wild when they're fully ready (perhaps even e.g. after docs.rs has finished docs building, in the future?). Currently there's not any kind of transient state - a release either exists or doesn't - and I think I've heard that this is a pain point for e.g. README rendering, so I imagine the new security stuff only adds to that. I'd also love to better understand our sense of how users are expected to interact with this. Most new releases get consumed automatically (e.g., cargo update, dependabot), maybe checked with something like cargo audit, and then folks are using them. Is the expectation that there will be an API for those to hit and query this information from? That might end up being a download-like API where outages are painful (i.e. break builds, potentially), so we should think about the load and design there early. |
(I think @walterhpearce is going to weigh in on the provenance side of things, since that's more his area of expertise than mine, although I do have some thoughts I might put down after that.)
Definitely. We're not at the point of doing a full design for this yet, but there have definitely been some early discussions around decoupling the end state of When we do get to the point of starting to think that through, I'll make sure we factor in the possibility of having other things like readme rendering and docs builds into the process.
Yes, I plan to add an API as part of this work. In the longer term, I think this also ties into other reliability work that's going on right now — improved observability and monitoring, zero downtime deployments for crates.io, and eventually disaster recovery. I'll give some thought to whether it's worth us persisting this information in some other, more durable form, like (but probably not actually) the crate index. (@Turbo87, @carols10cents, @jtgeibel, @mdtro: any thoughts?) |
I'd like to know more about the role of static checks in this hypothetical system; some examples would be nice. Relatively trivial checks like inspections of In particular, I'm wondering what security policies fall in the category of being acceptable to publish, but dangerous enough that they warrant a prominent warning. Perhaps displaying the checks not as pass/fail but more as informational items would make more sense. That also would help solve the real problem I have in mind, which is that this might just serve to push the burden of industry security practices onto open-source maintainers (something which is sure to generate controversy). I like the git provenance idea, so long as the implementation details can be sorted out in a way that doesn't prioritize GitHub as a hosting service. One way to get around the aforementioned issues might be to include the commit hash (or similar) in the publish API request. |
It's a little unclear to me how this would go from getting a |
In addition to the other points that have been raised, two things stood out to me:
Also, idle thought, but might it be useful to highlight if |
A little known feature of cargo is that the VCS information is actually already included within a crate when available. If it is a git folder being published, cargo embeds the active hash and branch of the folder during publish and stores it inside the crate file as
With a valid source repository, branch and commit hash, we are then able to clone that specific repo/hash/branch and verify the source matches the published crate. This allows us to check provenance as:
This is done by cloning the remote repository at the branch and commit and comparing the published crate against the source code; with a few exceptions where modification occurs on publish (Cargo.toml mainly), we then confirm they match with a side-by-side directory diff. This has the added benefit while we are doing bulk analysis across the ecosystem to find anomalies and malicious activity.
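The comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: both trees are modelled as in-memory maps from relative path to bytes rather than real checkouts, and the names `provenance_diff`, `Mismatch`, and the `rewritten_on_publish` allow-list are hypothetical, not part of cargo or crates.io.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One way in which the published crate and the repository checkout disagree.
#[derive(Debug, PartialEq, Eq)]
enum Mismatch {
    /// Present in the published crate but not in the repository checkout.
    OnlyInCrate(String),
    /// Present in both, but the bytes differ.
    ContentDiffers(String),
}

/// A file tree, modelled as relative path -> file contents.
type Tree = BTreeMap<String, Vec<u8>>;

/// Compares every file in the published crate against the source checkout.
/// Paths in `rewritten_on_publish` (a hypothetical allow-list, e.g. Cargo.toml)
/// are skipped because they are expected to change on publish.
/// Files that exist only in the repository are not reported, since a crate
/// normally ships a subset of the repository.
fn provenance_diff(
    published: &Tree,
    source: &Tree,
    rewritten_on_publish: &BTreeSet<&str>,
) -> Vec<Mismatch> {
    let mut mismatches = Vec::new();
    for (path, bytes) in published {
        if rewritten_on_publish.contains(path.as_str()) {
            continue;
        }
        match source.get(path) {
            None => mismatches.push(Mismatch::OnlyInCrate(path.clone())),
            Some(src) if src != bytes => {
                mismatches.push(Mismatch::ContentDiffers(path.clone()))
            }
            Some(_) => {}
        }
    }
    mismatches
}
```

An empty result would correspond to a passing check. Treating repository-only files as acceptable is a design choice in this sketch, not something the proposal specifies.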
This is a case we can address if it does get implemented; I'd see this as a case where we can, within our sandbox, execute the publish and compare with those results. We will need to maintain this to track any further modifications cargo makes to the crates on publish.
Supporting other VCS systems (CVS, SVN, etc.) will require feature additions to cargo, but the actual integration should be fairly trivial. The vast majority of published crates come from git, and we will address edge cases as they arise. We may also hit single-crate cases where it becomes a choice for the maintainer: publicize their code in a more accessible manner, or not receive the green checkmark for that specific check.
I am aware there are some major quirks between the major providers (hi Gitlab), and we want to support all the major providers as well as non-vendored git repositories (gitweb et al). We want to make sure most ways of hosting source code are provided and avoid any vendor lock-in.
You'll see that the Strategically, I see it as explicitly having the vast majority of these ecosystem health checks to be non-blocking; that is to say, a crate will not pass the checks but will still be published and available. The purpose of this Security Tab is to provide us a way to surface this information. I hope as we begin implementing and rolling out these checks maintainers will be motivated to migrate towards more secure publishing, conform to making source code public and other security best practices in the scans - who doesn't want a green-across-the-board release?
My plan on these scans is to initially cover that baseline of "relatively trivial" checks across the board; we are currently lacking even this much, and the onus is completely on end users. Many of these are trivial, repeatable and provide a baseline level of confidence across our ecosystem. The vast majority of these we consider "opt-in" in a fashion; crates will not be blocked or yanked - but you won't have passing checks within the security page, and I feel this is a happy medium of helping drive adoption while not being disruptive. We are using the provenance as the initial POC check for this. Other things I'm thinking about are:
I want to do the initial work for the project and community to take ownership of - I hope to craft a good baseline for us of the obvious and trivial checks we can perform, and have the infrastructure and framework built for the community to begin expanding on it and adding checks as we come to a consensus on other security best practices.
As I said above, I hope to leave the majority of these as completely optional regardless - it is on the maintainer of a crate to determine whether they want to bother conforming to any best practices. When it comes to more complex regulatory or compliance requirements, or static code analysis you are absolutely correct - we should leave that to the vendors for various users to meet their needs as they see fit. However, I think it's in the spirit of Rust to help give the community a baseline guarantee of safety from crates.io; with this system, we can protect the community from attacks that have been recently plaguing other repositories, and have the framework in place to keep up. |
For crates with a gitlab/github repo, crates.io could surface information from OpenSSF's https://securityscorecards.dev/ initiative. This creates a score and report from the result of many language-agnostic security checks. |
I want to raise a concern around the proposed provenance check of the source code and its interaction with cargo's current handling of cyclic dev-dependencies. It's obviously highly desirable that the published source code matches the corresponding tag in the git repository. Unfortunately it's not always possible to tag a meaningful version of the source code in a more complex workspace setup without running into cyclic dependency issues. Consider the following setup: You have a proc macro crate and a main crate. The proc macro crate provides some macros that are designed to be used with the main crate and rely on the main crate being available. A common solution here is to reexport the proc macro crate through the main crate, which adds the dependency |
do I understand correctly that this issue is meant as a sort of high-level "this is roughly what we have in mind" discussion? I'm asking because this issue mentions a lot of dedicated things and I fear that if we discuss these all in one issue we might lose track quite easily. I'm wondering if we should split this issue up into multiple issues that can be discussed independently and with less side-tracking. Here are a couple of my random thoughts:
I love the idea of having checks and reporting them. It's something I talked to some folks about over a year ago when it comes to things like However, should we generalize these checks rather than exclusively doing them as security, on a security focused page? An example of a general check is future-incompat to identify crates that won't work with modern Rust. I think within the Python community at one point, they discussed having basic quality checks on packages. I don't remember what happened to that. And with all that said, a high level concern I have is with how we present any of this. If we sound too definitive, people might put too much trust in all checks being green. On the other hand, I have concerns with how the wider Rust community has sometimes taken metrics like |
I'll catch up on the rest of this shortly, but repeating something I said in the crates.io meeting just now:
Yes, and I probably didn't do a good enough job of explaining this in my already over-long first post. My aim in writing this is mostly to lay out "hey, we think we have several months of work that we want to do, plus a longer term commitment around maintaining this, and since lots of it is going to integrate with crates.io, here is how we want to start on it and some context on where we're going" — primarily so that we can get high level feedback early in the process. (For example, if the crates.io team1 decided not to accept any of this work, we'd obviously want to know that ASAP, before committing significant resources into this.) Once we've got feedback in here (and what we've got so far is great; thank you all!), then the next step is going to be for me to break this up into concrete issues and PRs that incorporate that feedback and then work on it from there. I expect there'll be a project board to track all of that as well. Footnotes
It's great to see the community working to improve the state of knowledge about crates in general. Internally at Microsoft we are working on a system to enable persistent recording of human opinions about crates (both public and private), to help collect social knowledge and to flag outlier dependencies needing more scrutiny. My main architectural concern here is the tension between the notion of not publishing a crate until it has been checked, and the notion of an arbitrarily increasing number of extensible checks. I suggest instead that it may be better to use a more "microservice" model, where crate publishing happens prior to and separately from all the various crate checks, and the crate checks are semantically all For this reason, we probably also don't want to think in terms of any of these checks going into the crate repository, because that would create an increasing load on the crate repo as we increase the number of checks. (It's unfortunate because the crate repo is nicely immutable, but one could imagine a separate git repo that includes only crate-repo version IDs indexed against check results, so you'd have full provenance/history for check results.) It also may be useful to look at this proposal in terms of all known supply chain attacks, and how they each could or couldn't be covered by a possible check. I'm part way through this useful overview of the space: https://arxiv.org/abs/2304.05200 Some other checks I would personally want, that are security-adjacent at least, would include:
Actually looking at these, these are arguably more like metrics than checks. Because in practice, various organizations will have their own level of scrutiny or diligence over metrics like these. Which to me implies that we should be conservative in what checks we implement in crates.io itself, because they have no room for individual interpretation -- it's only crates.io's interpretation that matters. One specific example is we should probably not have a check based on cargo crev/vet records. There are many potential sources of such records, and many potential trust relationships that may or may not exist between review sources and crate consumers, so implementing a check that everyone thought was useful might be harder than just having people use that ecosystem as designed without really involving crates.io at all. So my highest-level feedback is:
Thanks again for going down this path. Edit: One other category of checks is "does this crate depend on any filesystem / networking / etc. types?" Or more generally "does this crate do any I/O?" Yoshua Wuyts has been considering capability security for Rust; being able to assess the functionalities of crates (gaining confidence that a crate version that claims to be purely algorithmic actually is, for instance) would be a potentially great way to prevent pushing versions that inject e.g. bitcoin miners or data exfiltrators. Of course it's much harder in crates that have a good reason for doing I/O, but at that point other capability/sandboxing techniques could apply, along with checks or metrics for whether particular crates are compatible with those techniques. Double edit: Really the key semantic question is "when does cargo fetch decide that a new version can be pulled?" Right now that is a binary decision based on whether it has been published. This whole conversation clarifies that mere publishing is only the start; one could imagine wanting to update to a new version only once it has been published and all checks/metrics run (not just the core pre-publishing ones) and particular organization-specific checks have passed. One could imagine organizations setting up proxy registries that mirror crates.io but that apply their own checking before exposing new crate versions, potentially hiding particular versions indefinitely if they don't pass key org-specific checks. |
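To make the proxy registry idea a little more concrete, here is a minimal sketch of the filtering step only. Everything in it is an assumption for illustration: the function name `exposed_versions`, the shape of the `passed` map, and the check names used in the example are hypothetical and do not correspond to any existing crates.io or cargo API.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the versions a hypothetical proxy registry would expose: those for
/// which every required check has been recorded as passed.
/// `passed` maps a version string to the names of the checks that passed for it.
/// A version with no recorded results at all stays hidden.
fn exposed_versions<'a>(
    published: &[&'a str],
    passed: &BTreeMap<&str, BTreeSet<&str>>,
    required: &[&str],
) -> Vec<&'a str> {
    published
        .iter()
        .copied()
        .filter(|version| {
            passed.get(*version).map_or(false, |checks| {
                required.iter().all(|name| checks.contains(*name))
            })
        })
        .collect()
}
```

For example, with `provenance` recorded for 1.0.0 and 1.0.1 but an organization-specific check recorded only for 1.0.0, requiring both checks would expose only 1.0.0. Hiding versions that have no results yet is the conservative choice here; an organization could equally decide to expose them with a warning.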
I would not be surprised if several of these subcomponents require an RFC. Each team gets to define exactly when an RFC is required, and I am not on the crates.io team. But a stable interface that third-party plug-ins are expected to interact with should probably go through public discussion of its design in some form. Similarly, any implied official endorsement of crates should probably also have a community discussion, if not of all the details that at least how decisions will be made about changing those details. As I recall, there was an RFC for changing the algorithm for crates search. |
Yeah, skimming this issue, it seems like most of the things would benefit going through the RFC process. |
Having a "Security/Audit/Checks" tab makes sense to me. I'd expect the contents of it to change over time as we learn more about what checks are most relevant. I'd like to see this proposal broken down into smaller pieces. The provenance checking alone could have its own RFC. NPM is currently starting something similar with package provenance based on proving that the package was published by a CI job with a verifiable OIDC token. The CI job that published the package is then linked to the package. Linking to a CI job rather than a git tag would help mitigate concerns such as the git tag changing after verification, and |
Since it wasn't explicitly mentioned, I wanted to add that surfacing the presence of known security vulnerabilities would be very valuable. This is something lib.rs already does, using the https://rustsec.org/ database maintained by an official Rust WG. Other potential data sources include GHSA (they import RustSec data) and OSV (they import both GHSA and RustSec), but IMO it should just use one data source that interoperates with others. |
I want to reiterate on @tofay 's point above. Please use the OpenSSF Scorecard - https://github.com/ossf/scorecard (and https://github.com/readme/guides/software-supply-chain-security), it has a growing list of security checks providing all the security information you need, including the OSV/GHSA vulnerability scans using OSV-Scanner supported in the too. For malicious package activity, please check out https://openssf.org/blog/2022/04/28/introducing-package-analysis-scanning-open-source-packages-for-malicious-behavior/ |
Also, SLSA compliance for crates will be great to surface in UI, check out https://openssf.org/press-release/2023/04/19/openssf-announces-slsa-version-1-0-release/ |
Also, check out how deps.dev surface this information about various crates, e.g. https://deps.dev/cargo/kubernetes. This is a free community service, so you can even reuse links or fetch data via deps.dev api - https://security.googleblog.com/2023/04/announcing-depsdev-api-critical.html |
+1 on leveraging existing tooling like Scorecard that already supports a lot of the use cases above, such as the presence of security policy. (We briefly chatted about this with Joel at the start of the year). Scorecard cron results are used by teams like CNCF and nodejs to monitor their repos. it's also used by pkg.go.dev website to show results for packages - see an example on the right there is a link "Open Source Insights" below "Details". The scorecard team is actively working with the "Open Source Insights" (aka deps.dev) to improve the UX. Here is the UX used for Scorecard badges today. Scorecard does not have ecosystem-specific checks yet, but it's something the team would be happy to add if needed. Let me know how we can help! |
GitHub organizations can have a single For example some (Java) Google projects are doing this: https://github.com/google/guava/security uses the Though I am not sure how widely used this feature is; I only stumbled upon it by accident. And also it can be confusing for users who explicitly look for a |
Background
As part of the Rust Foundation's security initiative, we'd like to surface information related to crate security more prominently within crates.io. Our initial focus is on supply chain security, so surfacing information relevant to provenance is key, but we would also like to rapidly start surfacing information relevant to the security of individual crate versions as well.
Items below that are related to open questions are linked and italicised.
tl;dr executive summary
We would like to add a new tab to the crate[1] page that surfaces security information relevant to the crate in an easy to digest form. In the initial version, this will include two things: the result of any checks run on the crate (the most recent version, in the case of the unversioned crate version), and, if present, the security policy in the SECURITY.md[2] file in the repo.
The initial check that would be added is almost certainly a provenance check[3], with other checks[4] based on static analysis of the repository and code to be added in short order.
Pictures
These are static mockups for now, although I'm going to wire up a prototype PR Real Soon Now™[5] to better explore this space.
(Obligatory disclaimer: I am not, nor do I pretend to be, a designer. I intend to iterate further on this; this is mostly just to show the shape of what I'm thinking, rather than the specifics of the design around typography, spacing, colours, etc.)
Crate nav bar
The styling is probably too dramatic, particularly in the failed check case, but you get the general idea. (And the "icons" are just Unicode right now, but would probably be replaced with a more appropriate SVG in due course.)
Security tab
I used ChatGPT to create a "plausible sounding security policy" for this screenshot. All errors are therefore… uh, someone else's. Promise.
(And, yes, the padding appears to be off just because of how Firefox chose to screenshot the relevant element.)
Security policies
We would discover the security policy using the same general heuristics as GitHub uses, with one addition to handle repositories with multiple crates:
1. SECURITY.md file at the same level as Cargo.toml?
2. docs (relative to the repository root)?
3. .github (relative to the repository root)?
This will obviously require Cargo assistance for anything beyond the first point, since those files may not be in the .crate tarball.
We may also want to add a new Cargo.toml field, similar to the existing readme field.
Checks
The idea here is much like general CI checks in GitHub, GitLab, or your favourite code host. Each check will represent a single pass/fail/skipped check for a specific crate version, with some level of content shown in the UI to indicate the result in more detail and contextualise what the check is actually showing.
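As a rough sketch of what a single check result might look like in code, assuming a three-state status plus a short piece of explanatory text: the type and field names below are hypothetical, and the rule used to collapse several results into one nav bar status is an assumption rather than something this proposal has decided.

```rust
/// Outcome of a single check for one crate version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckStatus {
    Pass,
    Fail,
    Skipped,
}

/// One check result as it might be stored and rendered (field names are hypothetical).
#[derive(Debug, Clone)]
struct CheckResult {
    name: String,
    status: CheckStatus,
    /// Short human-readable context shown alongside the result.
    summary: String,
}

/// Collapses per-check results into a single status for the crate nav bar:
/// any failure wins, otherwise any pass, otherwise everything was skipped.
fn overall_status(results: &[CheckResult]) -> CheckStatus {
    if results.iter().any(|r| r.status == CheckStatus::Fail) {
        CheckStatus::Fail
    } else if results.iter().any(|r| r.status == CheckStatus::Pass) {
        CheckStatus::Pass
    } else {
        CheckStatus::Skipped
    }
}

/// Renders one result as a single line of text for the security tab.
fn render_line(result: &CheckResult) -> String {
    let label = match result.status {
        CheckStatus::Pass => "pass",
        CheckStatus::Fail => "fail",
        CheckStatus::Skipped => "skipped",
    };
    format!("[{}] {}: {}", label, result.name, result.summary)
}
```

If checks end up being presented as informational items rather than pass/fail, as some of the feedback suggests, the `CheckStatus` enum is the part that would need to change.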
As mentioned above, the initial check this would roll out with would be a provenance check for the source code in the crate tarball.
It is anticipated that the initial work here would only involve checks run on a global basis. These checks would be facilitated (and, initially, developed) by the Rust Foundation in conjunction with the crates.io team. Over time, I expect this would open up to allow other collaborators within the Rust ecosystem to propose and implement other checks that make sense within the ecosystem.
It is also possible in the future that these checks may feed into a quarantine system where crates that fail key, high fidelity security checks require human review before being published. That is not in the scope of this proposal, however.
An open question here is whether crate authors should be able to define their own, crate-specific checks.
Operation
Without going into great detail just yet on the schema or API calls, here's how I anticipate this would work:
Open questions
Should we allow crate authors to include their own checks?
As discussed in the checks section, an additional possibility here would be to allow crate authors to define crate-specific checks they want to run when their crate is published, and then surface those results alongside the security checks that are run over all published crates.
What does this tab look like?
"Security" would be the obvious label, but if we allow non-security checks as well, then either a more generic name or two tabs might make more sense.
How do we discover the security policy?
I think the heuristic in the security policies section is probably uncontroversial.
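For reference, the lookup order from the security policies section can be sketched like this. It assumes the repository contents are available as a set of paths relative to the repository root; the function name `find_security_policy` and the `crate_dir` parameter are hypothetical, and treating the three locations as a strict first-match order is an assumption.

```rust
use std::collections::BTreeSet;

/// Returns the first candidate location that exists, in the order listed in the
/// security policies section: next to Cargo.toml, then docs/, then .github/.
/// `crate_dir` is the crate's directory relative to the repository root
/// (an empty string when the crate lives at the root).
fn find_security_policy(repo_files: &BTreeSet<String>, crate_dir: &str) -> Option<String> {
    let beside_manifest = if crate_dir.is_empty() {
        "SECURITY.md".to_string()
    } else {
        format!("{}/SECURITY.md", crate_dir.trim_end_matches('/'))
    };
    let candidates = vec![
        beside_manifest,
        "docs/SECURITY.md".to_string(),
        ".github/SECURITY.md".to_string(),
    ];
    candidates.into_iter().find(|path| repo_files.contains(path))
}
```

In a workspace with the crate under `crates/foo`, a `crates/foo/SECURITY.md` would win over a repository-wide `.github/SECURITY.md`, which matches the intent of the multi-crate addition.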
Whether we should add a new Cargo.toml field is a different question.[8] Because this is a relatively new "standard" from GitHub, there isn't really the diversity of possible locations that there is with READMEs: we can reasonably assume the file will be called SECURITY.md, and that it will be in Markdown format. However, I do wonder if there are organisations that would prefer to link out to a single security policy, rather than adding it to each repository.
My gut feeling is that a new field may make sense, but only if it also allows for URLs to be specified, with some sort of handling in the crates.io UI to then provide a useful looking link (presumably augmented with Open Graph metadata where available). Whether this is better than simply having a SECURITY.md in the repository that points to the external policy is unclear to me.
We need you!
Nothing in here is set in stone, or closed to discussion. I genuinely want your feedback!
Ideally, I'd like to have feedback within the next couple of weeks (so by early-to-mid May). I intend to do some prototyping work in parallel with the feedback session to continue to explore this space, but obviously no concrete decisions will be made until after that. I'm also more focused on the admin console work in #6353, since that's a higher priority for me right now, so there isn't a tonne of urgency on this just at the moment.
Nevertheless, I would love to hear from you!
Footnotes
1. For simplicity, I'm mostly talking in terms of crates in the UI, but I'm aware that this will also extend to crate version specific routes. Consider "crate" shorthand for both unless there's an explicit callout that something's different on a version route.
2. This is a GitHub initiative, announced back in 2019, to make it easier for authors to provide security policies for their projects in a standard place.
3. Or, put in plain English, does the content of the .crate file match what was tagged in the source repository?
4. This is a significant focus area for @walterhpearce in the near term, so I would expect the initial batch of follow up checks would likely come from the Foundation.
5. I'm also working on Prototype admin console #6353 simultaneously, and that's higher priority, so take this more as an expression of intention than a rock solid commitment.
6. Initially, I expect this will invoke a service or lambda in Rust Foundation managed infrastructure that can then fan out to whatever checks are available.
7. There's complexity that we definitely don't need to reproduce here, but a cut down version of the output field in GitHub check results is probably pretty close to what I'm thinking.
8. My thanks to @epage for pointing this out.