Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to override the representation of enums #1907

Open
nbigaouette opened this issue Oct 27, 2020 · 10 comments
Open

Add an option to override the representation of enums #1907

nbigaouette opened this issue Oct 27, 2020 · 10 comments

Comments

@nbigaouette
Copy link

nbigaouette commented Oct 27, 2020

In https://github.com/nbigaouette/onnxruntime-rs I use bindgen in onnxruntime-sys/build.rs to generate bindings on different platforms (macos, linux, windows).

The original C header file is https://github.com/microsoft/onnxruntime/blob/v1.5.2/include/onnxruntime/core/session/onnxruntime_c_api.h (which is quite large).

This header contains a couple of enums, for example OrtLoggingLevel that gets wrapper as u32: https://docs.rs/onnxruntime-sys/0.0.9/onnxruntime_sys/type.OrtLoggingLevel.html

But on Windows, I get i32s for them (docs.rs doesn't contain windows values, yet).

Because of this I have trouble creating an API that can work both on Windows an other platforms.

Input C/C++ Header

wrapper.h

#include "onnxruntime_c_api.h"

onnxruntime_c_api.h

typedef enum OrtLoggingLevel {
  ORT_LOGGING_LEVEL_VERBOSE,
  ORT_LOGGING_LEVEL_INFO,
  ORT_LOGGING_LEVEL_WARNING,
  ORT_LOGGING_LEVEL_ERROR,
  ORT_LOGGING_LEVEL_FATAL,
} OrtLoggingLevel;

Bindgen Invocation

On macOS:

$ bindgen onnxruntime-sys/wrapper.h --whitelist-type OrtLoggingLevel -- -I target/debug/build/onnxruntime-sys-4dd6b75a91cb40e3/out/onnxruntime/onnxruntime-osx-x64-1.5.2/include

On Windows:

$ bindgen onnxruntime-sys/wrapper.h --whitelist-type OrtLoggingLevel -- -I C:\cargo_target\debug\b
uild\onnxruntime-sys-6a0af3699f92fd5c\out\onnxruntime\onnxruntime-win-x64-1.5.2\include

Actual Results

On macOS:

/* automatically generated by rust-bindgen 0.55.1 */

pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_VERBOSE: OrtLoggingLevel = 0;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_INFO: OrtLoggingLevel = 1;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_WARNING: OrtLoggingLevel = 2;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_ERROR: OrtLoggingLevel = 3;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_FATAL: OrtLoggingLevel = 4;
pub type OrtLoggingLevel = ::std::os::raw::c_uint;

On Windows:

/* automatically generated by rust-bindgen 0.55.1 */

pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_VERBOSE: OrtLoggingLevel = 0;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_INFO: OrtLoggingLevel = 1;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_WARNING: OrtLoggingLevel = 2;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_ERROR: OrtLoggingLevel = 3;
pub const OrtLoggingLevel_ORT_LOGGING_LEVEL_FATAL: OrtLoggingLevel = 4;
pub type OrtLoggingLevel = ::std::os::raw::c_int;

Using --rustified-enum="*" I get:

❯ bindgen onnxruntime-sys/wrapper.h --rustified-enum="*" --whitelist-type OrtLoggingLevel -- -I target/debug/build/onnxruntime-sys-4dd6b75a91cb40e3/out/onnxruntime/onnxruntime-osx-x64-1.5.2/include
/* automatically generated by rust-bindgen 0.55.1 */

#[repr(u32)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum OrtLoggingLevel {
    ORT_LOGGING_LEVEL_VERBOSE = 0,
    ORT_LOGGING_LEVEL_INFO = 1,
    ORT_LOGGING_LEVEL_WARNING = 2,
    ORT_LOGGING_LEVEL_ERROR = 3,
    ORT_LOGGING_LEVEL_FATAL = 4,
}
PS Microsoft.PowerShell.Core\FileSystem::\\VBOXSVR\onnxruntime> bindgen onnxruntime-sys/wrapper.h --rustified-enum="*" --whitelist-type OrtLoggingLevel -- -I C:
\cargo_target\debug\build\onnxruntime-sys-6a0af3699f92fd5c\out\onnxruntime\onnxruntime-win-x64-1.5.2\include
/* automatically generated by rust-bindgen 0.55.1 */

#[repr(i32)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub enum OrtLoggingLevel {
    ORT_LOGGING_LEVEL_VERBOSE = 0,
    ORT_LOGGING_LEVEL_INFO = 1,
    ORT_LOGGING_LEVEL_WARNING = 2,
    ORT_LOGGING_LEVEL_ERROR = 3,
    ORT_LOGGING_LEVEL_FATAL = 4,
}

Expected Results

Note how macOS gives me a std::os::raw::c_uint and Windows gives me a std::os::raw::c_int (or #[repr(u32)] vs #[repr(i32)]) for the enum representation.

Why does bindgen generates different code on macOS/Linux and Windows? Can I tell bindgen to use std::os::raw::c_int everywhere? Or i64, or whatever else?

Thank you!

@emilio emilio changed the title C enums are either i32 or u32 depending on platform Add an option to override the representation of enums Nov 3, 2020
@emilio
Copy link
Contributor

emilio commented Nov 3, 2020

Because that's what the underlying platform is using, see #1361. So that part of the report is invalid (MSVC is weird here :/).

A callback or such to override the enum repr may be worth it as an option, but note that as soon as you do that (if you make it i64 for example) then structs containing that enum can break and so on, so it's a bit of a footgun. Repurposing this to track that feature request.

@ThatGeoGuy
Copy link

Bindgen seems to be doing the wrong thing based on the C standard.

I understand that making things i64 everywhere could be problematic. enums in C aren't 64-bits unless int is 64 bit (more on this later), so converting enum ordinals as i64 would be a bad idea.

On the other hand, MSVC is not the weird one in this case. It is the Unix behaviour that seems wrong. First, lets look at the C standard for what type enums should be:

In both cases you're looking for Section 6.7.2.2, paragraph 3 - Semantics. There it writes (and I quote directly here):

The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.

They always have type int in C, which is a signed type. So in all cases on x86_64 (which I assume is what we're targeting here) we would want bindgen to do the correct thing and treat these as i32. On other platforms this may change as int may be i16 on some older platforms. If such a platform existed that int was i64, then perhaps we might consider i64 a correct representation for that platform.

Beyond that, Modern C provides similar advice:

Takeaway 1.5.6.6 Enumeration constants are of type signed int

I'm not sure why u32 is being purported as "correct" behaviour by bindgen here. I think given the above (and the fact that enums can contain signed ordinals), bindgen should reconsider and represent everything as i32 constants on x86_64. If we're being a bit more particular, it should instead look at what type signed int actually compares to, and use that.

@tgross35
Copy link
Contributor

tgross35 commented Oct 25, 2022

Why is #[repr(C)] not used here? From the nomicon:

repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums. The chosen size is the default enum size for the target platform's C application binary interface (ABI). Note that enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.

As noted, a C enum is always the size of int, so [repr(C)] seems like a better choice than [repr(i32)]. In fact, that would make it safe to make the default enum style be rust or rust_non_exhaustive, instead of const, which I would strongly vouch for (much cleaner and a better representation of the original code).

@pvdrz
Copy link
Contributor

pvdrz commented Feb 10, 2023

Also #1966 is somewhat related to this.

@pvdrz
Copy link
Contributor

pvdrz commented Feb 10, 2023

Why is #[repr(C)] not used here? From the nomicon:

repr(C) is equivalent to one of repr(u*) (see the next section) for fieldless enums. The chosen size is the default enum size for the target platform's C application binary interface (ABI). Note that enum representation in C is implementation defined, so this is really a "best guess". In particular, this may be incorrect when the C code of interest is compiled with certain flags.

As noted, a C enum is always the size of int, so [repr(C)] seems like a better choice than [repr(i32)]. In fact, that would make it safe to make the default enum style be rust or rust_non_exhaustive, instead of const, which I would strongly vouch for (much cleaner and a better representation of the original code).

we could add a flag to switch between #[repr(C)] and #[repr(whatever_int_is_for_this_target)]. Would that be a sensible option? @tgross35 @emilio ?

@tgross35
Copy link
Contributor

That would be great for my case, just another thing that works a little bit better cross platform without necessarily needing to regenerate bindings.

Is there any reason not to make this option the default for rust style enums?

@pvdrz
Copy link
Contributor

pvdrz commented Feb 11, 2023

I don't have strong arguments against enabling #[repr(C)] by default. The only thing I can see on the horizon would be that someone is relying on it being #[repr(i32)] for some odd reason.

@pvdrz
Copy link
Contributor

pvdrz commented Feb 16, 2023

One "open question" that I see here is if we can use #[repr(C)] for every C enum or not.

Apparently, C23 allows to specify the underlying type of an enum. So, I guess, in this particular case, we should use #[repr(the_int_that_c_said)] instead of #[repr(C)].

I'm not sure if there's another way this feature could break something or not.

Edit: An alternative would be to leave this reasoning to the user :trollface:. What I mean is we could use regular expressions so people can use .* if they want to use #[repr(C)] for every enum or filter accordingly.

@tgross35
Copy link
Contributor

Interesting caveat with the C23 syntax! I guess in those cases it's already determined.

The only thing I can see on the horizon would be that someone is relying on it being #[repr(i32)] for some odd reason.

I think this is pretty unlikely, and there are probably more cases the other way around where i32 is not the correct enum representation. Since the enum size is ABI defined (per the nomicon) and the size of c_int isn't fixed anyway (16 bits on avr and msp430, looking at the source), I don't think defaulting to #[repr(C)] would break anything that doesn't already have to use some sort of workarounds because i32 isn't right anyway

@pvdrz
Copy link
Contributor

pvdrz commented Feb 16, 2023

I'll start working on this but I'll wait for @emilio's opinion on which should be the default.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants