Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include Message Descriptor information #336

Closed
wants to merge 16 commits into from

Conversation

fdeantoni
Copy link

@fdeantoni fdeantoni commented May 25, 2020

A new trait MessageDescriptor which includes the following info about a message:

  • package_name which is taken from the package field in the proto file;
  • message_name which is the message name itself;
  • type_url which is "type.googleapis.com/" + package_name + "." + message_name;

The prefix "type.googleapis.com" is the default prefix for all protobuf types (see here).

Adding the MessageDescriptor trait will help with the packing and unpacking of messages in the Any type, but may be helpful in other situations as well, and can easily be extended to include more proto information in future (e.g. source file etc).

This pull request can address #277 completely as it will allow an external crate to provide all the functionalities without polluting the prost crate with additional dependencies. For an external crate that provides all the functionality described in #277, see prost-wkt.

A new trait `MessageProto` has been added which includes the following info about a message:
 - `package_name` which is taken from the `package` field in the proto file;
 - `message_name` which is the message name itself;
 - `type_url` which is "type.googleapis.com/" + package_name + "." + message_name;

 The prefix "type.googleapis.com" is the default prefix for all protobuf types (see [here](https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Any)).

 Adding the MessageProto trait will help with the packing and unpacking of messages in the `Any` type, but may be helpful in other situations as well, and can easily be extended to include more proto information in future (e.g. source file etc).
@fdeantoni fdeantoni changed the title Include Message Proto information Include Message Descriptor information May 27, 2020
@aljoscha
Copy link

aljoscha commented Jul 3, 2020

Is there any update on this PR? I recently stumbled across this because I need to work with Any. I'd really like to use Prost because I think it feels much more idiomatic but right now I'm forced to use protobuf.

@teburd
Copy link

teburd commented Aug 20, 2020

I'd really like to see this happen, along with perhaps some attribute describing the .proto real field name rather than the more friendly rust one, such as #[prost(package=..., message=..., field=...)] to effectively describe everything needed for prost-derive to possibly generate the reflection API

@fdeantoni
Copy link
Author

Hi @danburkert , as you suggested in your comment on issue 255, I broke up the requirement into smaller parts. This PR now only implements a single trait with three fields (message_name, package_name, and type_url) for each Message struct, and generates those structs from protobuf messages with a new package macro attribute (to provide the package_name field). It does not pull in any other dependencies, and should be fully backwards compatible with previous releases. Do let me know if this is ok or if you think more work is needed. I really like this library and hope this PR can be included!

@fdeantoni
Copy link
Author

Hi @danburkert, I was wondering if you have plans of adding some kind of message descriptor to Prost in future? Although this PR implements it, I hope any implementation that allows me to retrieve a message type_url can make into the project. Really love Prost and it is the only thing I'm missing from this great library!

Copy link

@jcaden jcaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good to me. I was wondering if they could be merged as they allow for serializing/Deserializing into/from Any type that is the only missing functionality for us in prost.

@haozibi
Copy link

haozibi commented Feb 25, 2021

WANT MERGE!!!

@jcaden
Copy link

jcaden commented Feb 25, 2021

Whie this is merged, I've managed to do a workaround without forking prost nor prost build. Basically I ask prost-build to left the files in a known directory and then get the path from the file name, using that information a rewrite the generated code adding package information. I also add an attribute to the generated types in order to include some methods. It's not the best solution ever, but at least I can carry on without forking.

The buidl is changed by this function:

/// Generates source from protocol buffer definitions and adds it information about the package
/// That can be used by the proto_package_name macro to generate metadata information
pub fn generate_interface_source<P>(
    out_file: P,
    protos: &[P],
    includes: &[P],
) -> Result<(), Box<dyn std::error::Error>>
where
    P: AsRef<Path>,
{
    let mut out_dir: PathBuf = env::var_os("OUT_DIR")
        .ok_or_else(|| Error::new(ErrorKind::Other, "OUT_DIR environment variable is not set"))
        .map(Into::into)
        .unwrap();

    out_dir.push("generated");

    // Ignoring create_dir error as it can already exist
    let _ = fs::create_dir(&out_dir);

    prost_build::Config::new()
        .out_dir(&out_dir)
        .type_attribute(".", "#[proto_package_name(PACKAGE_NAME)]")
        .compile_protos(protos, includes)?;

    fs::remove_file(&out_file)?;
    let mut generated_output = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(&out_file)?;

    for entry in fs::read_dir(&out_dir)? {
        let entry = entry?;
        let path = entry.path();
        if !path.is_dir() {
            let mut content = String::new();
            File::open(&path).unwrap().read_to_string(&mut content)?;
            let file_name = path.file_stem().unwrap().to_str().unwrap();

            let tokens = file_name.split(".");
            let mut tokens_len = 0;
            for module in tokens {
                writeln!(generated_output, "pub mod {} {{", module)?;
                tokens_len = tokens_len + 1;
            }

            writeln!(
                generated_output,
                "\nuse proto_package_name::proto_package_name;\n"
            )?;

            writeln!(
                generated_output,
                "static PACKAGE_NAME : &str = \"{}\";\n",
                file_name
            )?;

            generated_output.write_all(content.as_bytes())?;

            for _ in 0..tokens_len {
                writeln!(generated_output, "}}")?;
            }
        }
    }

    generated_output.sync_all()?;

    fs::remove_dir_all(&out_dir)?;

    Ok(())
}

The code of the macro looks like this:

#[derive(Debug)]
struct ProtocPackageNameParams {
    module_name_constant: Ident,
}

impl Parse for ProtocPackageNameParams {
    fn parse(input: ParseStream) -> Result<Self> {
        Ok(ProtocPackageNameParams {
            module_name_constant: input.parse()?,
        })
    }
}

#[proc_macro_attribute]
pub fn proto_package_name(attr: TokenStream, item: TokenStream) -> TokenStream {
    let item = parse_macro_input!(item as DeriveInput);
    let struct_name = &item.ident;
    let struct_name_string = quote!(#struct_name).to_string();
    let params = parse_macro_input!(attr as ProtocPackageNameParams);
    let module_name_constant = &params.module_name_constant;

    let item = TokenStream::from(quote!(
        #item impl #struct_name {
            pub fn package_name() -> &'static str{
                &#module_name_constant
            }

            pub fn message_name() -> &'static str {
                #struct_name_string
            }

            pub fn type_url() -> String {
                format!(
                    "type.googleapis.com/{}.{}",
                    Self::package_name(),
                    Self::message_name()
                )
            }
        }
    ));

    item
}

If there is interest I can create a project to implement this workaround.

@Khalilw1
Copy link

Hello guys any ETA on this. I really want to use Any without moving to another library (I really like Prost generation)

@mkatychev
Copy link

@jcaden that looks really interesting, do you mind providing a minimal example of the two snippets in action?

@jcaden
Copy link

jcaden commented Jul 22, 2021

@jcaden that looks really interesting, do you mind providing a minimal example of the two snippets in action?

Actually I found some cases where this is not working as expected when having multiple packages there were some name collisions, not sure if that's the best solution.

@teburd
Copy link

teburd commented Jul 23, 2021

So my start of this solutions is to effectively include and create a static with the message descriptors

https://github.com/openenergysolutions/openfmb-rs/blob/develop/openfmb-descriptors/src/lib.rs#L9

I then codegen from that crate and have all the metadata available at build and runtime that way.

@fdeantoni
Copy link
Author

That could work too but I think it is still cleaner to have the descriptor as part of the message via a trait. To me adding a MessageDescriptor trait seems like a relatively safe feature to add, or am I overlooking something important? Really hope the maintainers of Prost can help out and give their feedback...

@teburd
Copy link

teburd commented Aug 2, 2021

@fdeantoni I'd say implementing the metadata/descriptor bits as functions that pull out data from the static variable is the sanest route, that way you don't end up with circular references which Rust won't like.

The static is just step one, step two might be adding a trait and trait impls being created for each generated message type to get that descriptor data from the static.

@fdeantoni
Copy link
Author

Thanks for the feedback @BFrog! I'm definitely not against using a static to get the package information, and using a trait impl for each message to get that data. All I want really is just a way to reconstruct the package url from the message :) . However, I figured just placing this data in the trait would be fastest (and safest). In essence the only thing this PR really does is add the following trait:

pub trait MessageDescriptor: Message + 'static {
    fn message_name(&self) -> &'static str;
    fn package_name(&self) -> &'static str;
    fn type_url(&self) -> &'static str;
}

The derive macro then takes care of implementing this trait for each message, for example:

    impl ::prost::MessageDescriptor for Foo {
        fn message_name(&self) -> &'static str {
            "Foo"
        }
        fn package_name(&self) -> &'static str {
            "com.example.package"
        }
        fn type_url(&self) -> &'static str {
            "type.googleapis.com/com.example.package.Foo"
        }
    }

You mention circular references as a potential risk to this implementation though. Can you tell me more how/where this might happen? If there is a weakness in the design then definitely it should be addressed!

@teburd
Copy link

teburd commented Aug 3, 2021

@fdeantoni As soon as you start looking at the fields of a message you start running into issues

For example the protobuf api itself provides this, https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.message#Message.GetDescriptor.details

A Descriptor provides a way to introspect the fields of a message.

A FieldDescriptor has a method containing_type" call provides the Descriptor for the message type holding that field.

So you have a circular reference in Descriptor <-> FieldDescriptor in the GetDescriptor protobuf introspection API

You don't have to replicate the Descriptor API entirely, but it is really useful, especially when crafting custom code generators that maybe generate new types/functions/etc on top of the existing Message implementations.

@fdeantoni
Copy link
Author

Ok, but I think that is a much bigger change than what I'm proposing for this PR, which is really a very simple and dumb message descriptor trait containing only the package name and message name (to construct a url). The next step indeed could be adding a static reference to get the field descriptors, but that I think should be a separate PR.

Nonetheless, if the maintainers are willing to go with a much bigger change that implements the full Descriptor API comparable to the cpp version then that's even better, but I was hoping by keeping this PR small and simple (with potential to add features incrementally) it had more chance of being included. Hopefully the maintainers can weigh in here and suggest the best course of action.

pub trait MessageDescriptor: Message + 'static {
fn message_name(&self) -> &'static str;
fn package_name(&self) -> &'static str;
fn type_url(&self) -> &'static str;
Copy link

@neverchanje neverchanje Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may not need type_url to be included in the trait. Let's take a look on how C++ implements Any::PackFrom:

bool AnyMetadata::PackFrom(Arena* arena,
                           const Message& message,
                           StringPiece type_url_prefix) {
  type_url_->Set(
      &::PROTOBUF_NAMESPACE_ID::internal::GetEmptyString(),
      GetTypeUrl(message.GetDescriptor()->full_name(), type_url_prefix),
      arena);
  return message.SerializeToString(
      value_->Mutable(ArenaStringPtr::EmptyDefault{}, arena));
}

It uses an API called Descriptor::full_name which returns <package>.<message>, and finally generates the type_url via GetTypeUrl, which is hard-coded in the library and can not be extended like a trait.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type_url was just added for convenience really. If this stops the PR from being merged then I'm all for removing it! A third-party library that implements pack and unpack like prost-wkt can then implement it instead (with the prefix configurable and whatnot).

Copy link

@neverchanje neverchanje Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate your work on prost-wkt and I'm actually not one of the prost maintainers, so I couldn't and didn't intend to stop anybody. I left this comment only because I feel a minimum public trait is better and will not lead to incompatibility issues in the future.

Anyway, I'm really looking forward to this PR being merged, then we will not have to use a forked version of prost in prost-wkt any longer. I don't understand why such tiny PR has been blocked so long since last year. 😢

fn type_url(&self) -> &'static str {
#type_url
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the above methods should be associated functions instead. Someone will be able to use Foo::type_url() instead of constructing Foo and then calling the method.

For example this is how pack/unpack to Any will be implemented with associated functions:

fn pack_to_any<M>(msg: M) -> prost_types::Any
where
    M: prost::Message + prost::MessageDescriptor,
{
    prost_types::Any {
        type_url: M::type_url().to_owned(),
        value: msg.encode_to_vec(),
    }
}

fn unpack_from_any<M>(msg: prost_types::Any) -> Option<M>
where
    M: prost::Message + prost::MessageDescriptor + Default,
{
    if msg.type_url == M::type_url() {
        Some(M::decode(&msg.value[..]).ok()?)
    } else {
        None
    }
}

@oblique
Copy link

oblique commented Oct 1, 2021

I created a workaround example for packing/unpacking to Any that uses FileDescriptorSet in build.rs: https://github.com/oblique/prost-any-workaround

@fdeantoni
Copy link
Author

Thanks for the example @oblique. It looks like the prost maintainers will only allow this route to retrieve the package name from the proto files.

For anybody interested, I reworked prost-wkt to use FileDescriptorSets to retrieve the package_name instead and it works ok for my projects. As this is sufficient for my requirements, I will be closing this PR.

Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants