-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[discussion] POSIX Environment Variables #822
Comments
Currently we are setting the |
+1 to removing it. Does https://github.com/slsa-framework/slsa-github-generator/generic@v1 have a defined spec yet? Just making that clear in the spec would be useful. |
We don't have a "spec", just examples https://github.com/slsa-framework/slsa-github-generator/blob/main/internal/builders/generic/README.md#provenance-format For go builder we have some explanation in https://github.com/slsa-framework/slsa-github-generator/tree/main/internal/builders/go#buildconfig-format, but it's probably not enough to explain what is recorded, what is not, and maybe we should take the view of a re-builder? Great idea to have a spec, thanks! All the builders will have the same output, so it should be easy. What would you like to see captured in the specs? |
I think it's dangerous to start arbitrarily removing environment variables from provenance just because they're redundant on a specific platform (or set of platforms). What about build steps that run on Windows where PWD isn't a thing, and might validly be used for some user-defined purpose by such a build step? |
Note that we have not released our builders with I think you also need more context: we only record env variables the user asked us to set, and we don't record those that exist "by default" ( Recording all env variables is something we could do. In general we would not want to capture secrets, but in our case we don't have any. So the bigger question probably is: what should ideally be recorded? Everything, or those explicitly set by the user for the build? |
I think anything that could potentially affect the output of the build in any way needs to be recorded, e.g. PATH, LANG, LC_* could all have subtle effects on the output artifact |
I tend to agree with @shaunmlowry. I think we should set Obviously we separately endeavor to not include secrets in the recorded environment variables but these variables are those set when the command is executed (any set by the build command itself aren't visible to provenance generation. You'd have to include secrets in your environment set in your goreleaser config or something like that for it to be potentially included here. However, we could sanitize obvious ones like |
@shaunmlowry brings up a good point. I'm ok with some redundancy, especially in cases where someone might be doing something strange to normally read-only env vars. @ianlewis I'm not super familiar with how the reusable workflows work, but could someone configure it with a list of environment variables to not include in the output? Give it to the user what should/shouldn't be included in provenance. |
I would like to challenge the assumption that "logging everything" is better, and that "logging anything that affect the build" is better. Below are thought experiments with use cases of provenance: Re-builderIn order to re-build a project, a re-builder only needs the necessary steps the user provided. Default variables set by the OS, bash, etc are not needed. In fact, they are a source of noise: which env variables shold the builder need to set? Which are necessary? A complete list of env variables complicates a re-builder's job. Why the need to set Reproducible buildsThis is a special case of the above, where a re-builder wants to verify a reproducible builds. A build pipeline whose output differs based on arbitrary variables set by the OS can never be reproducible. So in this scenario, the full list of env variables does not provide obvious benefits and seems like noise. There are various "types" of reproducibility, but I don't know any of them account for arbitrary variables set by the OS. I'm not sure we should care either. I know some toolchains (cmake) let users tweak their build steps based on the machine they are built on, but arguably this is something that needs to be tackled by keeping the "machine image" around, like proposed in #23, and using the exact same machine to re-build. Incidence responseThis is the only scenario where more seems better. However, I doubt that catching a compromised machine thru its env variables is reliable. In practice, it requires list of files and their hashes, a snapshot of running process, etc. So I would argue that the benefit of reporting all env variables in this scenario is marginal, and not needed. I think a better way to tackle this problem is to keep a copy of the machine (tarball, container image) for ever, like SLSA prescribes for source requirement. Maybe this is something that should be added as a level 4 requirement for builders? A complimentary solution is to keep a log of the build slsa-framework/slsa#401, which could also come as a level 4 build requirement. So in general, I feel like reporting all env variables has only marginal benefits, and comes with additional complications for re-builders. There may be more nuances to what I describes above: maybe it depends on ecosystem / programming language, etc. We could even think of having the best of both worlds by having 2 types of env variables we report: provided by user, all on the machine. Let's keep this conversation going. I think it's important and how to report building steps is not documented anywhere. |
I agree that, while I argued for it's inclusion,
This is possible. Though the inputs to a workflow aren't structured (only strings, integers, etc.) maybe it could be supported via a config file that is included in the repo. I'm not sure we're at the point of considering this option yet though. |
This is effectively addressed by #1321. Please see that PR for a description of the changes. |
Background:
For reproducibility, build steps are included in provenance
buildConfig
by some of our workflows. These build steps include information to reproduce the command, such as the working directory and environment variables.Our build steps currently look like this:
POSIX defines some environment variables which mirror values set elsewhere. These include the
PWD
,USER
, andHOME
environment variables. Some programs may use the environment variables as a way to determine the current working directory current user for example. For that reason, we want our workflows to set some of these variables so that more builders, custom scripts, etc. will run properly.Some values, such as the current working directory, are set in a separate field from the environment variables and thus the information in the environment variables would be redundant.
Question
Should these environment variables be included in the build steps in the provenance? If so, which ones?
The text was updated successfully, but these errors were encountered: