Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pathmap for cross-machine determinism in PDBs #109

Closed
wants to merge 7 commits into from
Closed

Use pathmap for cross-machine determinism in PDBs #109

wants to merge 7 commits into from

Conversation

jrbeverly
Copy link
Contributor

The source path for files in embedded into the '.pdb' file that is created as part of the build process. You can see this in hello.pdb when building the basic example . If we want to have deterministic builds, these paths cannot be absolute and linking into the bazel sandbox.

Here is what a (cleaned up) embedded path looks like:

root/.cache/bazel/_bazel_root/1a06c90be1eefea8b933cc8429eff806/sandbox/processwrapper-sandbox/28/execroot__main__/basic/hello.cs

This PR sets the /pathmap property on compilation, using variable substitution in the dotnet wrapper to provide context of the path location. This requires parsing all of the arguments and params file to make the necessary variable substitutions.

The source path is replaced with a constant prefix based on the default path separator (/ \\).

Resolves #13

@jrbeverly jrbeverly added the enhancement New feature or request label Dec 10, 2019
@jrbeverly jrbeverly self-assigned this Dec 10, 2019
*
* @return Full path of the current working directory. Empty if cannot be set.
*/
std::string PWD() {
Copy link
Member

@j3parker j3parker Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you add -std=c++17 to the compile flags: https://en.cppreference.com/w/cpp/filesystem/current_path

# These are of the form: __BAZEL_{name}_

# pathmap
args.add("/pathmap:%s=%s" % ("__BAZEL_SANDBOX__", "__BAZEL_WORKSPACE__"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pass this arg at all (and why the string formatting?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this line with the idea in mind that we may need multiple /pathmap statements, which would be something like: args.add_all(pathmaps, map_each = _format_pathmap_arg). Although I have checked with our examples, I'll likely take a look at a couple more cases to see if we need anymore statements.

The processor variables (__BAZEL_<>_) are so the wrapper can substitute context variables (e.g. filesystem paths) into the command line parameters.

The strings are constant, and resolve to the following: /pathmap:__BAZEL_SANDBOX__=__BAZEL_WORKSPACE__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I can't think of a scenario where we'd pass multiple /pathmap. It's something that dotnet.exe supports but I don't think we'd have a use-case (for us we just need determinism, and one pathmap should be sufficient for that).

* @param subst Substitutions to make when replacing keywords.
* @return true if the substitutions were made; false otherwise.
*/
bool ResolveParamsFile(std::string file, SubstitutionMap subst) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so if we don't pass /pathmap in from Bazel we could just append it to argv rather than reaching into the file :)

There is the spawnv limit but since Windows paths are limited anyway it seems like it'd be guaranteed to fit along with the rest of our argv.

Copy link
Contributor Author

@jrbeverly jrbeverly Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean using an environment variable as an action switch (BAZEL_ACTION=ASSEMBLY_BUILD), which would conditionally append /pathmap to the arguments?

Depending on the implementation of #10 & #72 , we may be making use of other commands of dotnet (publish / build /. etc), which wouldn't support the /pathmap argument. So to append that argument, the wrapper would need to understand the command that is being executed.

I thought using variable substitutions would allow the least amount of friction in this case, as it wouldn't require the wrapper to understand the commands we executed, just how to do ${MY_VAR} in a cross-platform way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an addon to the above, the ideas that sort of pushed me towards using variable substitution were as follows:

  • I don't want the dotnet wrapper to try and understand the commands I am executing
  • I want to scrub away context dotnet requires in the build process (e.g. paths)
  • I want all the configuration about options/parameters/commands to remain in bazel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean using an environment variable as an action switch (BAZEL_ACTION=ASSEMBLY_BUILD), which would conditionally append /pathmap to the arguments?

I'd prefer it if we just did it always for now, rather than conditionally.

Depending on the implementation of #10 & #72 , we may be making use of other commands of dotnet (publish / build /. etc), which wouldn't support the /pathmap argument. So to append that argument, the wrapper would need to understand the command that is being executed.

Ok, let's cross that bridge when it comes to it. We can always have a single boolean switch that we pass to the wrapper if we need to, rather than the substitution thing. But let's not do anything prematurely.

I thought using variable substitutions would allow the least amount of friction in this case, as it wouldn't require the wrapper to understand the commands we executed, just how to do ${MY_VAR} in a cross-platform way.

I think it's fine for the wrapper to understand this. It's all in the same repo and the wrapper is useless outside of this repo. Let's just keep it simple while we can.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are at that bridge with the current proposed implementation for compilation of resource files (see #98). The internals of that PR are using dotnet build <csproj> to compile resource files using the dotnet executable. This is the case I was describing above.

};

// variable substitution of inputs + params file
for (int i = 1; i < argc; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll lose a lot of loops by just appending /pathmap rather than searching for it. It looks like we'd also support multiple of these kind of arguments but the motivation for that isn't clear to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't expect to support more variables, just those that help scrub away baked in context like paths.

Copy link
Member

@j3parker j3parker Dec 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all I'm suggesting is to replace this with "/pathmap:" + workspaceDir + "=" + dirSeparator (rather than passing a format string down through Starlark and then hunting for it in the args and param file... and obscuring what this stuff is for).

The only extra thing the exe knows about is how /pathmap is formatted which seems perfectly fine: the job of the wrapper is to execute dotnet.exe inside Bazel and that's just part of it.

But it's less code and it makes the wrapper less confusing (we're supporting a lot of extra unnecessary functionality here).

@jrbeverly jrbeverly marked this pull request as ready for review February 3, 2020 21:01
@jrbeverly jrbeverly closed this Feb 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use pathmap for cross-machine determinism
2 participants