Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Adding first-party sources compilation, linking, packaging, and running to the CC backend #16424

Closed

Conversation

sureshjoshi
Copy link
Member

@sureshjoshi sureshjoshi commented Aug 5, 2022

This draft PR is not even remotely close to merge-ready. It's been created to solicit feedback on best practices for dependency inference and internal API usage from the Pants maintainers, so I don't go too far with this backend while being completely in a bubble.

The desired scope of this PR is strictly compilation (without linking) on first-party code which occurs via the check goal. This code also allows discovering a Clang or GCC compiler already installed on the host system (no compilers are downloaded). Turns out I just kept typing past what the expected scope was intended to be.

This PR attempts to cover the main C/C++ first-party sources compilation and "build" workflow.

Specifically out-of-scope:

  • cc_test targets
  • cc_library as a dependency to any cc target
  • Third-party code of any shape

This code is being developed/tested on MacOS (x86_64) only, but will be tested on Linux when it's ready for a PR. Currently it's being tested using ./pants_from_sources fmt lint check examples/cc:: on the files here: https://github.com/sureshjoshi/pants-plugins/tree/76-cpp-compilation/examples/cc

[ci skip-rust]
[ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…al includes

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
…n't particularly like how it looks

- Feels like I'm missing something obvious

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
… but still pretty bad

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]


@rule_helper
async def _subdirectory_mapping(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the concept of SourceRoot + "include", which will become compiler include directories, but it lets us infer locations like core/foo.h located in: core/include/core/foo.h - which is a super standard way to locate public headers.

Those headers located in the src directory, next to the .c files are usually private to that library.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not particularly happy with the code here - feels like there should be a more elegant Pythonic solution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sureshjoshi
Copy link
Member Author

@tdyas We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

I see two options for that:

  1. Allow a field or something to specify compiler per (lib/binary/target)
  2. Split out CC into C and CPP targets

Thoughts? Am I missing an easier solution?


# Finally, check the alternative source roots
logger.debug(f"Checking alternatives for {include.path}")
subdir = cc_files_mapping.subdirectory_mapping.get(include.path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally in other areas of Pants, multiple sources for a match would be put together first and then checked for ambiguity. For my education, what does it mean for an include to be processed this way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the last case looking for dependencies, which wasn't handled alternatively.

This covers the cases where there is an include directory, and the headers are labeled something like core/foo.h - which aren't system headers, not relative headers, but the "other" headers (no idea what the technical term for this is - I would have called them namespaced headers, except namespace is its own thing).

for field_set in source_file_field_sets
)

# TODO: Should we pass targets? Or field sets? Or single source files?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or a union that maps extension(s) to a field set, which can then be used to request compilation.

src/python/pants/backend/cc/subsystems/toolchain.py Outdated Show resolved Hide resolved


@rule_helper
async def _executable_path(binary_names: Iterable[str], search_paths: Iterable[str]) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return the BinaryPath found instead of just the path str? E.g., would you want to make use of the fingerprint of the found tool to be able to invalidate artifacts built with the tool? (For example, by putting the fingerprint into any Process using that tool via a dummy environment variable. See the "tool ID" code in the Go backend.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, good call - that's probably a better approach. Changing GCC/Clang should invalidate anything built with the previous versions.

src/python/pants/backend/cc/target_types.py Show resolved Hide resolved
src/python/pants/backend/cc/util_rules/compile.py Outdated Show resolved Hide resolved
@tdyas
Copy link
Contributor

tdyas commented Aug 8, 2022

@tdyas We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

I see two options for that:

1. Allow a field or something to specify compiler per (lib/binary/target)

2. Split out CC into C and CPP targets

Thoughts? Am I missing an easier solution?

Just have the user specify C++ tool names for the C-related tool names option?

Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

@sureshjoshi
Copy link
Member Author

sureshjoshi commented Aug 8, 2022

Just have the user specify C++ tool names for the C-related tool names option?
Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

Wouldn't we have that anyways?

swift_sources, obj_c_sources, rust_sources?

My thought was splitting c_sources and cpp_sources instead of just cc_sources. So, the default filename extensions for cpp_sources is a superset of c_sources + additional ones.

EDIT: In hindsight - I'm not sure if splitting c and cpp would help, since they're just generators which create a single file that has to be handled in any case. ... So confused... This is wrinkling my brain...

@tdyas
Copy link
Contributor

tdyas commented Aug 10, 2022

Just have the user specify C++ tool names for the C-related tool names option?
Re (2), this would also imply creation of Objective-C and other targets per every language that can be handled by the GCC/LLVM toolchains.

Wouldn't we have that anyways?

swift_sources, obj_c_sources, rust_sources?

My thought was splitting c_sources and cpp_sources instead of just cc_sources. So, the default filename extensions for cpp_sources is a superset of c_sources + additional ones.

At least for languages handled by essentially the same toolchain, e.g., C/C++/Obj-C, it seems redundant to me for directories with different source files to end up with multiple targets if they were going to end up in the same library. At least for Rust, that's a whole different toolchain from gccclang so it makes sense to me there to mark it differently. (Also, as an aside, Rust is also different enough to model the targets not as "rust_sources" as per my Rust proposal here.) And I agree, no need to solve now, can leave it for further design discussions.

EDIT: In hindsight - I'm not sure if splitting c and cpp would help, since they're just generators which create a single file that has to be handled in any case. ... So confused... This is wrinkling my brain...

Same here. I'm not sure of what the "right" decision here is. Ideally we'd have no explicit targets but that is not what Pants supports currently. The choice is somewhat an arbitrary judgment call with no clear "right" answer.

@sureshjoshi
Copy link
Member Author

In thinking about the compiler vs extension problem, while thinking about it further and reviewing my repos - I think it's perfectly reasonable to default to the expected compilers, but provide an escape hatch via an optional target flag, or maybe even a subsystem option.

Maybe via a language field in the sources target - and then we could use the alternate compiler for that.

I thought we might run into some linkage problems down the road, but no, so long as headers are correctly externd, then C++ won't mangle names, and C/C++ compiled objects are safe to use in whatever permutations we want.

- Split out C and CPP extensions into variables
- Changed CompileCCSourceRequest to take a field_set
- Cleaned up some help docs
- Detecting C vs C++ and assigning appropriate toolchain executable
- Hardcoding the object name for the moment
- Hardcoding the system linker flags
- Not passing through user-generated linker flags
- Refers to only a C or C++ toolchain, not both
- Comes populated with defines, flags (from subsystem)
- Can be acquired by specifying the language, or inferred from target filename

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@sureshjoshi
Copy link
Member Author

If you add a compiler field, then you would inspect the transitive closure for the compiler field to see if C++ was ever used. If so, use C++ linker.

And a reminder that for the compiler field, I'd recommend something like an auto setting that is based on file extension. Then users can override it if they want.

Not trying to be dense here, but if this was set to "auto" for all sources, then how would we infer/determine if C++ was used from this field?

@Eric-Arellano
Copy link
Contributor

Eric-Arellano commented Aug 25, 2022

Not trying to be dense here, but if this was set to "auto" for all sources, then how would we infer/determine if C++ was used from this field?

"auto" means "base it off of file extension". But users can override it to the compiler they want. That is all abstracted away from the plugin code, which will do something like CompilerField.normalized_value(CCSourcesField) to get back the compiler value. Do that for the whole transitive closure. If you encounter C++ anywhere, then use that for the linker. Else C. Does that make sense?

@sureshjoshi
Copy link
Member Author

I guess I'm having a hard time envisioning where CompilerField.normalized_value(CCSourcesField) would get changed in the compilation pipeline.

@Eric-Arellano
Copy link
Contributor

I guess I'm having a hard time envisioning where CompilerField.normalized_value(CCSourcesField) would get changed in the compilation pipeline.

When compiling the file, you will consult the field to know which compiler to use. And then after compilation, you consult the transitive closure to determine which linker to use.

This is all meant to be a solution to your question:

We've discussed selecting the C or C++ compiler based on file extensions, but we would also need a workflow around compiling .c files using a C++ compiler (which I do in every non-embedded use case).

The easier thing would be to always use the same compiler for C vs. C++ and don't allow customizing it.

@sureshjoshi
Copy link
Member Author

sureshjoshi commented Aug 25, 2022

Talked to Eric in Slack to clarify - makes sense now, will reflect in codebase. Idea is that cc_source's compiler field is where the compilation language determination happens, either added in by the user, or filename-based ("").

Then, this field is used to determine what toolchain to pick up and actually perform compilation/linking with later on.

- Defaults to auto-determine language, can be fixed to c/c++
- Modified field sets for uniqueness (need to revisit with correct fields)
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@sureshjoshi sureshjoshi changed the title [internal] Adding first-party sources compilation to the CC backend [internal] Adding first-party sources compilation, linking, packaging, and running to the CC backend Sep 5, 2022
@sureshjoshi
Copy link
Member Author

sureshjoshi commented Sep 5, 2022

@tdyas @Eric-Arellano @stuhood I'm keeping this in draft while I go through and clear out TODOs, add more test coverage, and just apply some general cleanup and tweaks - which is when the PR will be ready for proper review.

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

References:

Note: See #16777 for a bikeshed discussion.

logger = logging.getLogger(__name__)


class CCOptions:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdyas As discussed in the other PR, these should probably be cxx instead of cpp (in order to not get confused with the pre-processor flags), and then maybe even add the pre-processor flags uniquely.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16777

help = """Options for downloaded CC toolchain support."""

c_executable = StrOption(
default="gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi-gcc",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the real defaults - just making life easier for testing.

help = """Options for system CC toolchain support."""

c_executable = StrListOption(
default=["clang", "gcc"],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defaults assume the user's system has their gcc/clang versions aliased to these names - which I think is a fair assumption. They can always be specified as needed to something like clang-15

@@ -51,11 +73,64 @@ class CCGeneratorFieldSet(FieldSet):
# -----------------------------------------------------------------------------------------------


class CCCompileFlagsField(StringSequenceField):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdyas @Eric-Arellano @stuhood
Big bikeshed on alias names - I've seen most permutations of flags/opts/options/some other things in the field. I like clear, concise, unambiguous... But that might almost be at a detriment because they look different to other tools.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16777



# TODO: Need this so building a binary doesn't build a library too
class CCContrivedField(BoolField):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed anymore, after I started expanding on library vs binary targets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I know this isn't any kind of solution, anyways, just had it in there so I could move past weird package redundancy while developing.

@tdyas
Copy link
Contributor

tdyas commented Sep 5, 2022

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

We should probably open a separate issue for the bikeshedding on naming. Easier to keep that conversation distinct from the two PRs.

@sureshjoshi
Copy link
Member Author

However, with @tdyas's cgo PR in the mix, it seems like a good time to bikeshed on target and subsystem naming, determining how consistent we want to keep names with other tools, and within Pants itself.

We should probably open a separate issue for the bikeshedding on naming. Easier to keep that conversation distinct from the two PRs.

orrr..... A discussion!!! #16777

@sureshjoshi
Copy link
Member Author

Note to reviewers: I'm going to close this one and open up a few smaller PRs, as this is getting unfairly large - especially with some modifications I haven't yet pushed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: CC CC backend-related issues category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants