Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable CC toolchain to use in-tree builtin includes #23615

Closed
wants to merge 2 commits into from

Conversation

kellyma2
Copy link

If the toolchain is composed in our build tree and we use an internal (relative) path for the builtin_sysroot that we pass to create_cc_toolchain_config_info, and we pass a value like %sysroot%/usr/include in cxx_builtin_include_directories then we will fail with missing dependency declarations.

Instead of only using the absolute paths for builtin includes, we can use all of them and do an additional matching check after we relativized the paths to the individual include items.

@github-actions github-actions bot added team-Rules-CPP Issues for C++ rules awaiting-review PR is awaiting review from an assigned reviewer labels Sep 13, 2024
If the toolchain is composed in our build tree and we use an
internal (relative) path for the `builtin_sysroot` that we pass to
`create_cc_toolchain_config_info`, and we pass a value like
`%sysroot%/usr/include` in `cxx_builtin_include_directories` then we
will fail with missing dependency declarations.

Instead of only using the absolute paths for builtin includes, we can
use all of them and do an additional matching check after we
relativized the paths to the individual include items.
@thomasvl thomasvl removed the request for review from a team September 13, 2024 15:57
@kellyma2
Copy link
Author

kellyma2 commented Oct 4, 2024

@lberki or @oquenchil any opinions on this?

@Wyverald Wyverald requested review from comius, trybka and pzembrod and removed request for lberki and oquenchil October 10, 2024 20:08
@pzembrod
Copy link
Contributor

Do I understand correctly: The key effect of this change is that all CcToolchain built_in_include_directories entries would be treated the same, regardless of whether they are absolute or relative paths?

@kellyma2
Copy link
Author

Yes, I believe so. In my case, specifically, I'm using a toolchain that we're constructing as part of our build, so the built-in includes show up in bazel-out

@trybka
Copy link
Contributor

trybka commented Oct 29, 2024

Sorry for the naïve question, but do you have a concrete example or sample toolchain? It seems quite odd to me to have output directories like bazel-out be present in include directories -- having outputs like this appear as inputs but with paths constructed outside of bazel seems potentially problematic, but perhaps I'm missing something simple. (Is this a two stage build? How are you deriving these paths?)

I am trying to dig into the code in question to better understand why this requirement exists in the first place...

@trybka
Copy link
Contributor

trybka commented Oct 29, 2024

I'm also wondering if something other than %sysroot% as a prefix would work better (e.g. %workspace%)

More info on the include dir prefixes here: https://cs.opensource.google/bazel/bazel/+/master:src/main/starlark/builtins_bzl/common/cc/cc_toolchain_provider_helper.bzl;l=103-122;drc=f2318d2cbf38d249d620dc99a11570843306a6fa

@kellyma2
Copy link
Author

Sorry for the naïve question, but do you have a concrete example or sample toolchain? It seems quite odd to me to have output directories like bazel-out be present in include directories -- having outputs like this appear as inputs but with paths constructed outside of bazel seems potentially problematic, but perhaps I'm missing something simple. (Is this a two stage build? How are you deriving these paths?)

tl;dr we're driving an internal hermetic toolchain that's derived from the contents of RPMs. It'll be hard to distill down into some thing trivial, but let me attempt to at least describe the details.

Longer version:

  1. We have a module extension that consumes the RPMs and synthesizes an internal repository for them, generates filegroup(), cc_library(), cc_import(), etc rules using the RPM inputs

  2. We have a separate module extension that can be used to generate the toolchain:

  • we define a sysroot using the RPM-import repository generated rules as inputs in the form of:
ext.sysroot(
   name="foo_sysroot",
   srcs = [
      ... a bunch of filegroups that specified the files from the RPMs ...
   ],
)

which ultimately synthesizes a repository to capture the sysroot

  • we define a toolchain where two of the things specified are (1) the sysroot, (2) the list of builtin include paths relative to the sysroot in the following form:
ext.toolchain(
  name = "bar_toolchain",
  sysroot = "@foo_sysroot//:sysroot",
  builtin_includes = ["%sysroot%/usr/include"],
)

The builtin_includes value is just getting plumbed directly into cxx_builtin_include_directories for create_cc_toolchain_config (via some indirection) and the path to @foo_sysroot//:sysroot gets shovelled in builtin_sysroot (which is ultimately what populates %sysroot%)

The /usr/include part of the builtin_includes is derived from the fact that we know from our RPM inputs that we stuck the headers that we're interested in into usr/include because we literally just explode the RPM file(s) into a tree somewhere. The relative path part of things comes from @foo_sysroot//:sysroot.

I am trying to dig into the code in question to better understand why this requirement exists in the first place...
I'm not clear on this either. The commit log didn't render me any particular insights and I'm still gaining familiarity with the bazel codebase.

@kellyma2
Copy link
Author

I'm also wondering if something other than %sysroot% as a prefix would work better (e.g. %workspace%)

The issue appears to be that it doesn't like relative paths which %workspace% is also going to give us.

@comius
Copy link
Contributor

comius commented Oct 30, 2024

cc @armandomontanez @matts1

@armandomontanez
Copy link

Backing up a bit to the most likely culprit just to double check: have you passed -no-canonical-prefixes (gcc/clang) and -fno-canonical-system-headers (gcc only) as compile flags in your toolchain, also making sure the headers are properly added to all_files and compiler_files? Once you've done that, you usually only need to allowlist truly absolute paths (e.g. /usr/include).

@armandomontanez
Copy link

Separately, I need to stare at this change longer to determine whether or not it's correct but at first glance I think it is.
Before landing this, I'd like to stamp out a minimal reproducer that we can check against (should be pretty easy with the new rule based toolchains). It just happens that in the vast majority of cases the root cause is actually forgetting to use those flags. Some compilers do not support or respect those -no-canonical-prefixes flags, so you do need an escape hatch like this to work with in-tree paths.

@comius
Copy link
Contributor

comius commented Nov 20, 2024

@kellyma2 Have you checked the flags?

@comius comius added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Nov 20, 2024
@kellyma2
Copy link
Author

Backing up a bit to the most likely culprit just to double check: have you passed -no-canonical-prefixes (gcc/clang) and -fno-canonical-system-headers (gcc only) as compile flags in your toolchain, also making sure the headers are properly added to all_files and compiler_files? Once you've done that, you usually only need to allowlist truly absolute paths (e.g. /usr/include).

The files are in all_files and compiler_files. A quick test with --copt fails - -no-canonical-prefixes causes gcc to not be able to find cc1plus and -fno-canonical-system-headers appears to do nothing helpful in this case. ie: it still complains about missing dependency declarations for all of the toolchain inputs. eg:

  'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/stdc-predef.h'
  'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/bits/wordsize.h'
  'bazel-out/k8-fastbuild/bin/external/_main~cc_toolchain~default_toolchain/sysroot/usr/include/features.h'

This is pretty much unavoidable, I think, because I'm extracting the toolchain components as part of the build.

@kellyma2
Copy link
Author

@kellyma2 Have you checked the flags?

Yes. Thanks for the followup.

@armandomontanez
Copy link

Related: #4605

@armandomontanez
Copy link

I've spent a bit of time unpacking this, and this looks like an interesting case where:

  • -no-canonical-prefixes is NOT in use.
  • -isystempath/to/toolchain/foo/include is manually passed to the toolchain.
  • path/to/toolchain/foo/include is listed in cxx_builtin_include_directories

So to unwind the stack:

  • When -no-canonical-prefixes is passed, all of this is a non-issue.
  • -isystempath/to/toolchain/foo/include causes the headers to be found via a relative path, which is picked up by parsing the .d make dependency file.
  • This patch makes cxx_builtin_include_directories match against those paths, except in a way that appears to specifically silence errors specifically emitted by unresolvablePathProblems.

The last bullet here is where I'm not confident this is what we want to do. unresolvablePathProblems is warning that the files that your toolchain is referencing aren't enumerated as files via the same paths they're included by, which suggests a sandbox leak. If all_files and compiler_files adds the files to the sandbox via the same relative path, in theory this error shouldn't ever pop up.

@kellyma2
Copy link
Author

After some back and forth with @armandomontanez we determined a few things:

(1) we had pieces of our toolchain (still) leaking in from the outside world due to ...
(2) ... not using the canonical-prefixes related arguments resulting in ...
(3) includes coming from somewhere outside of the sysroot

Once we solved that and the things that broke as a result it's no longer necessary to use cxx_builtin_include_directories to talk about those includes coming in which renders this change redundant.

@kellyma2 kellyma2 closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author team-Rules-CPP Issues for C++ rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants