Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify and refactor the implement macro #3507

Merged
merged 2 commits into from
Feb 25, 2025

Conversation

sivadeilra
Copy link
Collaborator

@sivadeilra sivadeilra commented Feb 23, 2025

This improves COM object support.

The main focus is on improving the internal structure of the implement proc macro. I've broken the implement function into a bunch of small, focused functions. The implement proc macro first parses its input, creates an ImplementInputs struct which contains all of the information need to generate output. Then, it calls gen_all to generate the output, which calls gen_* to generate each specific output item. This makes the code much easier to understand, debug, and edit.

The codegen functions (the gen_* functions) now generate strongly-typed syn values, such as syn::Item. Each gen_* generates the tokens and then parses them into the strongly-typed value. This allows us to catch syntax errors far more quickly, and the parse_quote macro will panic at the site of any syntax error. This makes development on the increment macro a lot more productive.

Speaking of debugging, I've added some unit tests within the implement crate. They don't actually test anything, but that's not the point. The point of these unit tests is that you can very easily see the output of implement, without needing cargo-expand (which has its own problems). You can also step through the code in a debugger.

This PR contains some modest improvements to the code that is generated by implements:

  • Instead of generating a tuple of interface pointers, this code now generates fields for each interface pointer. Each interface pointer field is named after the interface and its index, such as interface2_ifoo. This makes it a lot easier to debug COM objects at runtime.
  • The IUnknownImpl trait is now an unsafe trait. There's no way to implement this trait without writing unsafe code.
  • The IUnknown::Release implementation now puts the "slow" path (refcount reaches zero) into a separate function. This keeps the fast path (refcount is not zero) as small and fast as possible. For objects with significant destructors, this helps.
  • Cosmetic: Renamed GetTrustLevel and such to use snake casing, where applicable.
  • Removed from_inner_ref on IUnknown. There is no need for this any more, since all COM interface method implementations are on Foo_Impl types now.
  • Changed the methods on WeakRefCount that can change the refcount to have unsafe signatures. Any change to a refcount could affect memory lifetime, so these operations need to be unsafe. This allows us to expose &WeakRefCount for ComObject<T>, which simplifies other things. The motivation here is to make it difficult to accidentally leave a memory safety hole (in safe code).
  • "Inlined" a little code that touches the reference count and simply directly used the refcount field.
  • Code that is generated for a specific interface now uses the quote_spanned or parse_quote_spanned macros, with a span that points to the interface name. This improves the error messages when the user forgets to provide an implementation of a COM interface, because the error message will point to the interface name.

@kennykerr
Copy link
Collaborator

At an initial reading this PR seems to be doing too much - it would be easier to reason about if this were broken up a bit more into more discrete changes that focused on individual things. Philosophical changes to WeakRefCount - which I'm not convinced I agree with - seem unrelated to the refactoring of the code generation of the implement macro. Then there are changes to ComObject and IUnknownImpl which looks good but again are not directly related to code generation in the implement macro. Also some renaming of methods which may be good for consistency but again unrelated and just leading to the impression of a large complex change. Given that there are around 900 LOC of new and somewhat intricate code involving proc macros and COM - both of which are complex topics, it would be beneficial to just make one change at a time if at all possible.

@kennykerr
Copy link
Collaborator

The main focus is on improving the internal structure of the implement proc macro. I've broken the implement function into a bunch of small, focused functions. The implement proc macro first parses its input, creates an ImplementInputs struct which contains all of the information need to generate output. Then, it calls gen_all to generate the output, which calls gen_* to generate each specific output item. This makes the code much easier to understand, debug, and edit.

The codegen functions (the gen_* functions) now generate strongly-typed syn values, such as syn::Item. Each gen_* generates the tokens and then parses them into the strongly-typed value. This allows us to catch syntax errors far more quickly, and the parse_quote macro will panic at the site of any syntax error. This makes development on the increment macro a lot more productive.

This sounds like a good meaty PR on its own. I like the sound of this.

Speaking of debugging, I've added some unit tests within the implement crate. They don't actually test anything, but that's not the point. The point of these unit tests is that you can very easily see the output of implement, without needing cargo-expand (which has its own problems). You can also step through the code in a debugger.

  • Code that is generated for a specific interface now uses the quote_spanned or parse_quote_spanned macros, with a span that points to the interface name. This improves the error messages when the user forgets to provide an implementation of a COM interface, because the error message will point to the interface name.

Macros seem like they have so much potential in Rust but lack so much in terms of debuggability and testability and contextual ignorance. It would be nice, for consistency to move this into the test_implement crate, but that probably makes it harder to keep these details private. It would also be nice to have some panic tests like this to ensure we maintain the usability.

  • Instead of generating a tuple of interface pointers, this code now generates fields for each interface pointer. Each interface pointer field is named after the interface and its index, such as interface2_ifoo. This makes it a lot easier to debug COM objects at runtime.

There might be an issue here if an implementation needs to implement both IVector<HSTRING> and IVector<IInspectable> for example. I encountered such edge cases with C++/WinRT.

  • The IUnknown::Release implementation now puts the "slow" path (refcount reaches zero) into a separate function. This keeps the fast path (refcount is not zero) as small and fast as possible. For objects with significant destructors, this helps.

Neat, we did this in C++/WinRT and it made big difference.

  • Cosmetic: Renamed GetTrustLevel and such to use snake casing, where applicable.

We may not be super consistent here but it helps to preserve the original API names to avoid confusion and make it easier to search the code base for related code.

  • Removed from_inner_ref on IUnknown. There is no need for this any more, since all COM interface method implementations are on Foo_Impl types now.

👍

  • Changed the methods on WeakRefCount that can change the refcount to have unsafe signatures. Any change to a refcount could affect memory lifetime, so these operations need to be unsafe. This allows us to expose &WeakRefCount for ComObject<T>, which simplifies other things. The motivation here is to make it difficult to accidentally leave a memory safety hole (in safe code).

I'm not sure I agree with this - the WeakRefCount itself is quite safe. This is also not consistent with Rust's convention with methods like slice::as_ptr for example which is completely safe even though it can often be used in very unsafe ways but its not as_ptr that's unsafe.

  • "Inlined" a little code that touches the reference count and simply directly used the refcount field.

👍

@sivadeilra
Copy link
Collaborator Author

There might be an issue here if an implementation needs to implement both IVector and IVector for example. I encountered such edge cases with C++/WinRT.

Fortunately, that won't cause a collision. Both interface chains will have a different index, and the generated ident contains the index as well as the name suffix. So this case would generate something like interface4_ivector and interface5_ivector. Not terribly useful for distinguishing them, but no worse than the previous approach which used tuples.

Speaking of tuples, we've been relying on a behavior that is not guaranteed. The compiler is allowed to reorder the fields within a tuple; tuples are repr(Rust), not repr(C). In our case the compiler never does, but nothing guarantees that. Generating one field per interface chain lets us guarantee the layout, rather than generating a single field whose type is a tuple.

How about this:

  • I'll revert the changes to IUnknownImpl and ComObject and propose those as separate PRs.
  • I'll keep the interface chain renaming. It's purely cosmetic and does not need changes in code outside of the macros.
  • And keep the gen_* split, of course.

@kennykerr
Copy link
Collaborator

The compiler is allowed to reorder the fields within a tuple; tuples are repr(Rust), not repr(C).

Even when the field is part of a #[repr(C)] struct? 😯

@sivadeilra
Copy link
Collaborator Author

I believe so, because the layout of types is determined by their definition, but not their context. In this case, we're getting lucky because these definitions are always "a sequence of non-nullable pointer-sized fields". But if we had a tuple like (u8, u64, u8), then I would expect the compiler to reorder it to (u64, u8, u8) to remove some alignment padding.

We're just getting lucky right now. ;)

Copy link
Collaborator

@dpaoliello dpaoliello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comments are on existing code which has been moved...

@sivadeilra sivadeilra force-pushed the user/ardavis/implement-gen branch from 47095b2 to 61e9b32 Compare February 24, 2025 22:14
@sivadeilra sivadeilra requested a review from Copilot February 24, 2025 23:03

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

crates/tests/libs/implement_core/src/com_object.rs:437

  • The removal of raw pointer assertions in the iunknown_identity test has reduced explicit verification of the canonical IUnknown pointer behavior. Consider reintroducing these assertions or alternative checks to ensure interface pointer integrity.
let app = MyApp::new(0);
@sivadeilra sivadeilra force-pushed the user/ardavis/implement-gen branch from c002135 to cc8115c Compare February 25, 2025 06:09
//! These tests are just a way to quickly run the `#[implement]` macro and see its output.
//! They don't check the output in any way.
//!
//! This exists because of some difficulties of running `cargo expand` against the `#[implement]`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just a comment about how to run a test and see the output - for example I run this:

 cargo test -p windows-implement --tests simple_type

But cargo test does not automatically display the stdout for println.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g.

cargo test -p windows-implement --tests simple_type -- --nocapture

Copy link
Collaborator

@kennykerr kennykerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much more manageable thanks!

@kennykerr kennykerr changed the title Improvements to COM object support Simplify and refactor the implement macro Feb 25, 2025
@kennykerr kennykerr merged commit e4ffdcf into microsoft:master Feb 25, 2025
34 checks passed
@sivadeilra sivadeilra deleted the user/ardavis/implement-gen branch February 26, 2025 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants