Generation of cxx::bridge from existing C++ code #235
Comments
You got it -- this is almost exactly the kind of thing I had in mind in that paragraph. The "almost" is because the idea of scanning Rust code to find which C++ functions need to be made callable is not something I had considered (but very interesting). In my organization we'd have been fine using bindgen-style whitelist (this kind of thing, but in Buck). I'll loop in @abrown here since we have been poking at the concept of generation of cxx::bridge invocations recently in #228. One aspect of generation of cxx::bridge relevant to #228 is that we're free to explore making such a generator more customizable and/or opinionated than a straight one-to-one translation of C++ signatures to cxx::bridge signatures. For example it sounds like @abrown would be interested in a way to expose C++ constructors of possibly internal-pointer-y C++ types by automatically inserting UniquePtr in the right places to make them sensibly usable from Rust. I shared in that issue my considerations on not going for that kind of sugar in cxx directly, when this sort of generation of cxx::bridge is a possibility:
I haven't addressed your whole post but wanted to get back to you quickly; I'll try to respond in more detail in the next couple days. |
Great - glad I am not completely off in cloud cuckoo land. I'll look at #228 in detail as well. When you get a chance to reply more thoroughly (no hurry) - can you comment on how you would practically expect such a higher-level translator to fit in with cxx in terms of crate and module arrangements? For example, were I to attempt something like an |
I'll share the technical constraints as far as I understand them. By
let ret = cpp_call!(path::to::Function)(arg, arg);
// or maybe this; the difference isn't consequential
let ret = cpp_call!(path::to::Function(arg, arg));
where
One important constraint is that every procedural macro invocation is executed by Rust in isolation, being given access only to the input tokens of that call site and producing the output tokens for that call site. That means the various cpp_call invocations throughout a crate wouldn't be able to "collaborate" to produce one central #[cxx::bridge] module.
That's not to say we won't want some kind of cpp_call macro to mark C++ call sites; I'll come back to this. It just means it wouldn't be the full story, even as a procedural macro. There would still need to be something else crawling the crate to find cpp_call call sites in order to have visibility into all of them together.
When it comes to "crawling the crate", today procedural macros are not powerful enough to do this. There is no such thing as a crate-level procedural macro (i.e. like As such, the options are (1) don't crawl the crate, (2) crawl the crate using something that is not a procedural macro.
Option 1 could entail something like this:
// at the crate root
#[extern_c]
mod ffi {
pub use path::to::Function;
pub use some::namespace::*;
...
}
// in a submodule
let ret = crate::ffi::Function(arg, arg); // or crate::ffi::path::to::Function?
The way this would operate is: #[extern_c] is a procedural macro that expands to The rest is just like raw The downside is that like
Option 2 would be more like this:
// at the crate root
mod_ffi!();
// in a submodule
#[extern_c]
use path::to::Function;
let ret = Function(arg, arg);
// or without a `use`:
let ret = extern_c!(path::to::Function)(arg, arg);
In terms of how it expands, it would be quite similar to option 1. From a user's perspective in Rust, they get to "import" any item directly from C++ by writing an ordinary
In comparison to option 1, crawling the crate has the disadvantage that in the presence of macros and/or conditional compilation it isn't possible to have an accurate picture of what the exact set of input files is. Tools like
But depending on build system, this is a non-issue. I know in Buck we require the library author to declare what files constitute the crate, as a glob expression or as a list of files or as Python logic. It would be possible for us to use that same file list without attempting to trace the module hierarchy established by the source code. |
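As a rough illustration of the "crawl the crate with something that is not a procedural macro" idea, here is a minimal sketch of a standalone scanner that collects every C++ path named in a cpp_call!(...) call site across a list of source texts. The function name collect_cpp_calls is hypothetical, the file list is assumed to come from the build system, and the plain text scan is a stand-in: a real tool would presumably parse the Rust source properly and would still be blind to call sites produced by macros or conditional compilation.

```rust
use std::collections::BTreeSet;

/// Collects the C++ paths named in `cpp_call!(...)` invocations within one
/// source file's text. `cpp_call` is the hypothetical macro from the
/// discussion; this is a plain text scan, not a real Rust parser.
fn collect_cpp_calls(source: &str, found: &mut BTreeSet<String>) {
    const MARKER: &str = "cpp_call!(";
    let mut rest = source;
    while let Some(pos) = rest.find(MARKER) {
        rest = &rest[pos + MARKER.len()..];
        // The path runs until the first character that cannot be part of
        // an identifier or a `::` separator.
        let end = rest
            .find(|c: char| !(c.is_alphanumeric() || c == '_' || c == ':'))
            .unwrap_or(rest.len());
        let path = &rest[..end];
        if !path.is_empty() {
            found.insert(path.to_string());
        }
        rest = &rest[end..];
    }
}

fn main() {
    // Two "files" of one crate, standing in for the build system's file list.
    let files = [
        "let a = cpp_call!(path::to::Function)(1, 2);",
        "let b = cpp_call!(path::to::Function(3, 4));\nlet c = cpp_call!(other::Thing)();",
    ];
    let mut found = BTreeSet::new();
    for file in files {
        collect_cpp_calls(file, &mut found);
    }
    // Each distinct path is listed once, however many call sites use it.
    for path in &found {
        println!("{path}");
    }
}
```

Because the scanner sees all files together, it can emit one central list of needed shims, which is exactly what isolated macro invocations cannot do.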
Thanks, yes, that's what I was thinking. Thanks for taking the time to explain the current state of procedural macros; I appreciate it. Option 2 seems more powerful, as it doesn't require any sort of pre-declarations at all. Since both options require an extra external code generator, it feels preferable to me to start with option 2, until or unless it's proven to be impossible? The code generator would also need to know where to find C++ .h files, so there would also need to be an include_cpp!("base/feature_list.h")
let ret = extern_c!(base::feature_list::FeatureList::whatever)
This is beautifully similar to the way similar code would be written in C++ 👍 In the Rust build, Then the At the codegen stage,
In our case, right now, we don't need to list the .rs files that are pure Rust code, but we already do need to list the files which contain FFI (so we can pass them to How would you practically expect to structure this? We are now imagining these build stages:
Is there an argument that A few more questions/thoughts:
|
LGTM
I am open to this. We provide something almost like this already, in the form of https://docs.rs/cxx-build/0.3.4/cxx_build/ (see https://github.com/dtolnay/cxx/tree/0.3.4#cargo-based-setup). It's a Rust entrypoint for our C++ code generator, as opposed to Alternatively, it might be reasonable for your step 1 code generator to also do the I would call out that having a Rust API for the
This sounds fine to me. To the extent that any feature work would be required in |
Interesting, I hadn't thought of that. Insofar as I'd thought about this at all, I was probably imagining forking It looks like bindgen supports roughly the set of features we need but it's been an awfully long time since I've fiddled with it. |
Another aspect to discuss. Supposing (as you propose in #228) this higher-level code generator adds support for i.e. we want to be able to see C++ code like this:
class Foo {
public:
  Foo(std::string& label, uint32_t number);
  ...
};
and write Rust code like this:
let x = ffi::make_unique_Foo(CxxString::new("MyLabel"), 42);
// ideally, we could call this ffi::Foo::make_unique, but that's a detail
I believe the higher-level code generator would currently need to generate .cc as well as .rs code. We'd need some C++ generated like this:
std::unique_ptr<Foo> make_unique_Foo(std::string a, uint32_t b) {
  return std::make_unique<Foo>(a, b);
}
and then we'd want this to be represented in the
#[cxx::bridge]
mod ffi {
    extern "C" {
        type Foo;
        fn make_unique_Foo(label: CxxString, number: u32) -> cxx::UniquePtr<Foo>;
    }
}
There are three approaches. Which would you prefer?
|
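To make the "generate .cc as well as .rs" point concrete, here is a minimal sketch of a generator that emits both halves of a make_unique shim from a class name and a parameter list. Everything here is hypothetical (Param, cpp_shim, bridge_decl). The C++ and Rust type strings are supplied by hand and simply echo the example in the comment above; they are not checked against what cxx accepts, and a real generator would have to derive them from the parsed header.

```rust
/// One constructor parameter: its C++ type and the Rust-side type used in
/// the bridge. The pairing is supplied by hand in this sketch.
struct Param {
    name: &'static str,
    cpp_type: &'static str,
    rust_type: &'static str,
}

/// Emits the C++ shim that forwards to the constructor via std::make_unique.
fn cpp_shim(class: &str, params: &[Param]) -> String {
    let decls: Vec<String> = params
        .iter()
        .map(|p| format!("{} {}", p.cpp_type, p.name))
        .collect();
    let args: Vec<&str> = params.iter().map(|p| p.name).collect();
    format!(
        "std::unique_ptr<{class}> make_unique_{class}({}) {{\n  return std::make_unique<{class}>({});\n}}\n",
        decls.join(", "),
        args.join(", "),
    )
}

/// Emits the matching function declaration for the inside of the bridge module.
fn bridge_decl(class: &str, params: &[Param]) -> String {
    let decls: Vec<String> = params
        .iter()
        .map(|p| format!("{}: {}", p.name, p.rust_type))
        .collect();
    format!(
        "fn make_unique_{class}({}) -> cxx::UniquePtr<{class}>;",
        decls.join(", ")
    )
}

fn main() {
    let params = [
        Param { name: "label", cpp_type: "std::string", rust_type: "CxxString" },
        Param { name: "number", cpp_type: "uint32_t", rust_type: "u32" },
    ];
    print!("{}", cpp_shim("Foo", &params));
    println!("{}", bridge_decl("Foo", &params));
}
```

Driving both outputs from one parameter list keeps the C++ shim and the bridge declaration from drifting apart, which is the main reason to generate them in the same tool.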
Other thoughts happening here as I continue to think this through:
|
Re: collecting a whitelist of C++ types/functions used in Rust code, I took a hacky, yet reasonably successful approach: try to compile the Rust crate without any of the C++ bindings included and parse the |
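A minimal sketch of how such a whitelist might be harvested, assuming that what gets parsed is the compiler's unresolved-name diagnostics and that they look like "cannot find function `name` in module `ffi`". Both are assumptions: the comment above is cut off before saying what is parsed, the exact wording of diagnostics can differ between compiler versions, and missing_items is a hypothetical name.

```rust
/// Extracts names the compiler reported as missing from a given module.
/// Assumes diagnostics shaped like:
///   error[E0425]: cannot find function `name` in module `ffi`
fn missing_items(diagnostics: &str, module: &str) -> Vec<String> {
    let suffix = format!("` in module `{module}`");
    let mut names: Vec<String> = Vec::new();
    for line in diagnostics.lines() {
        // Only lines that end with the expected module suffix are of interest.
        if let Some(rest) = line.trim_end().strip_suffix(suffix.as_str()) {
            if !rest.contains("cannot find ") {
                continue;
            }
            // The missing name sits after the last remaining backtick.
            if let Some(tick) = rest.rfind('`') {
                let name = &rest[tick + 1..];
                if !name.is_empty() && !names.iter().any(|n| n.as_str() == name) {
                    names.push(name.to_string());
                }
            }
        }
    }
    names
}

fn main() {
    // Sample diagnostics written by hand for this sketch.
    let diagnostics = "\
error[E0425]: cannot find function `make_unique_Foo` in module `ffi`
error[E0412]: cannot find type `Foo` in module `ffi`
error[E0425]: cannot find function `make_unique_Foo` in module `ffi`
error[E0425]: cannot find value `x` in this scope";
    for name in missing_items(diagnostics, "ffi") {
        println!("{name}");
    }
}
```

Unlike a text scan of the sources, this approach lets the compiler do macro expansion and conditional compilation first, at the cost of depending on diagnostic text that is not a stable interface.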
OK, there's an early attempt at such a higher-level code generator here: https://github.com/google/autocxx. It currently depends on a (slight) fork of cxx, and a (gross, appalling, hacky, make-your-eyeballs-bleed) fork of bindgen. Comments most welcome! It's still at the stage where I'm throwing random commits at it, rather than having any kind of PR-based process, but if anyone wants to join in I can certainly grow up a bit. |
Update for anyone reading along here. autocxx no longer requires a fork of either bindgen or cxx. It is still, in every other way, "toy" code. It has a large number of test cases for individual cases of interop, but I suspect everything breaks horribly when it is asked to deal with a real C++ codebase. I hope to find out in the next couple of weeks. |
I'll close out this issue and we can keep the rest of the discussion on this topic in the autocxx repo. Thanks all! |
The cxx help says:
It is looking to me like that's exactly the model we may need in order to get permission to use Rust within the large codebase in which I work.
Could we use this issue to speculate about how this might work? It's not something I'm about to start hacking together, but it's conceivable that we could dedicate significant effort in the medium term, if we can become convinced of a solid plan.
Here are my early thoughts.
Note that C++ remains the undisputed ruler in our codebase, so all of this is based around avoiding any C++ changes whatsoever. That might not be practical.
Bits of cxx to keep unchanged
unsafe keyword is only used if you're doing something that actually is a bit iffy with C++ object lifetimes
cxxbridge tool.
Bits where we'll need to do something different
cxx::bridge section. Instead, declarations are read from C++ headers. A simple include_cpp! macro instead. ("Simple" in the user's sense, very much not in the implementation sense, where we'd need to replicate much of bindgen)
include_cpp! macro may pull in hundreds of header files with thousands of functions. The cxxbridge C++-generation tool would have to work incredibly hard for every .rs file if it generated wrappers for each one. So, instead, the call sites into C++ from Rust would probably need to be recognizable by that tool such that it could generate the minimum number of shims. (Procedural macro in statement position? cpp_call!?)
cxxbridge effort if hundreds of .rs files use the same include_cpp! macro(s). This might be purely a build-system thing; I haven't thought it through.
struct which can be represented in cxx::bridge (e.g. because it contains only unique_ptrs and ints) then said struct is fully defined such that it's accessible to both Rust and C++. If a struct is not representable in cxx then it is declared in the virtual cxx::bridge as an opaque type (and thus can only be referenced from unique_ptrs etc.)
#defines into rustc. I can't think of a better way to do this than to rely on the env! macro. Ugh.
cxx in localized places where it wraps up C++ function calls with an idiomatic Rust wrapper. This proposal involves instead allowing production Rust code to freely call C++ classes and functions. There may be some major leaps here to make this practical, which I might not have thought of (as I say, these are early thoughts...)
Next steps our side
cxx, we're in trouble. My hope is that this isn't the case, but work is required to figure this out.
Meanwhile though I wanted to raise this to find out if this is the sort of thing you were thinking, when you wrote that comment in the docs!
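The shared-versus-opaque rule for structs described above could be sketched roughly as follows. The Binding enum, the classify function and the allow-list in is_representable are all hypothetical; the real set of representable field types would be whatever cxx supports, and field types would come from parsed headers, not from strings.

```rust
/// How a C++ struct would appear in the generated (virtual) bridge.
#[derive(Debug, PartialEq)]
enum Binding {
    /// Every field is representable, so the struct is fully defined on both sides.
    Shared,
    /// At least one field is not representable; only usable behind a pointer.
    Opaque,
}

/// Hypothetical allow-list of field types, chosen only for this sketch.
fn is_representable(cpp_type: &str) -> bool {
    matches!(cpp_type, "int32_t" | "uint32_t" | "int64_t" | "uint64_t" | "bool")
        || (cpp_type.starts_with("std::unique_ptr<") && cpp_type.ends_with('>'))
}

/// Classifies a struct by the C++ types of its fields.
fn classify(field_types: &[&str]) -> Binding {
    if field_types.iter().all(|&t| is_representable(t)) {
        Binding::Shared
    } else {
        Binding::Opaque
    }
}

fn main() {
    // Only unique_ptrs and ints: can be a fully defined shared struct.
    println!("{:?}", classify(&["std::unique_ptr<Foo>", "uint32_t"]));
    // Contains a field outside the allow-list: falls back to an opaque type.
    println!("{:?}", classify(&["std::map<int, Foo>", "uint32_t"]));
}
```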