Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

size_t_is_usize should default to true #1901

Closed
dkg opened this issue Oct 16, 2020 · 7 comments
Closed

size_t_is_usize should default to true #1901

dkg opened this issue Oct 16, 2020 · 7 comments

Comments

@dkg
Copy link

dkg commented Oct 16, 2020

In #1671, there is a reasonable argument made that a principled mapping between size_t and usize is improper.

However, all platforms that i'm aware of do equate the two.

bindgen's defaults since 0.53 mean that any binding that exposes a size_t results in an incompatible API across platforms. That is, if some package foo exposes a bindgen-derived mapping of a C function size_t bar(), then bar will have a different signature on x86 (where it returns a u32) than on x86_64 (where it returns a u64). All downstream code that relies on that function will find itself dealing with the same divergent API across platforms. That's pretty weird and ugly for a typesafe langauge like rust. The behavior before 0.53 meant that the exposed API was congruent across platforms.

I think bindgen's default should be what the pre-0.53 behavior was, even though it is not in principle "correct". If we do that (so that size_t_is_usize defaults to true), then if bindgen is handling any size_t on a platform where usizesize_t it could fail and abort. A package that wants to can still set size_t_is_usize(false) if they want to be able to build on such a platform (and impose the consequent API type churn on their downstream dependencies).

@dkg dkg mentioned this issue Oct 16, 2020
@dkg
Copy link
Author

dkg commented Oct 16, 2020

If bindgen does make this change, then it'll cause another API change in downstream projects that were already using bindgen between 0.53 and 0.55 that have not explicitly set size_t_is_usize(true) and exposing a size_t somewhere. but i think that risk of API change is worthwhile for the convergent API across platforms (and for keeping the generated API stable from versions of bindgen before 0.53 as well).

dkg added a commit to dkg/rust-bindgen that referenced this issue Oct 16, 2020
This addresses the first part of rust-lang#1901.

If size_t_is_usize is manually set to false, and bindgen encounters
any size_t, then we should also probably test to confirm that size_t
is congruent to uintptr_t on the current platform.  I'm not sure of
the best way to do this check.
dkg added a commit to dkg/rust-bindgen that referenced this issue Oct 16, 2020
If size_t_is_usize is manually set to false, and bindgen encounters
any size_t, then we should also probably test to confirm that size_t
is congruent to uintptr_t on the current platform.  I'm not sure of
the best way to do this check.

Fixes: rust-lang#1901
dkg added a commit to dkg/rust-bindgen that referenced this issue Oct 16, 2020
@dkg
Copy link
Author

dkg commented Oct 16, 2020

I've separated out the idea of doing a check (failing with an error) as a different ticket (#1903), so that this ticket can focus specifically on changing the default.

@michaelkirk
Copy link

michaelkirk commented Dec 10, 2020

👍

To make sure I'm understanding, and to clarify why I think this is important:

The previous behavior allowed me as a crate author to pre-build bindings and include them with my crate. This let's my users use my crate without having to compile the bindgen dependency tree, which necessarily is very large and slow to build. Skipping this is a real improvement for my users, who, like all rust users would love to speed up their builds.

Maybe packaging prebuilt bindings is controversial - but it's far from niche:

Typically (as with these crates) a bindgen feature is exposed to allow the user to opt-in to building bindgen and generating their own bindings at compile time.

It does seem like though usize is technically not the same thing as size_t and that they probably have different values on some unpopular platforms, that on all common platforms it "just works", and this change has caused the popular prebuilt bindings workflow to fail across popular platforms.

One other thing I thought was interesting - seemingly the libc crate makes the same "mistake": https://docs.rs/libc/0.2.30/src/libc/lib.rs.html#134

pub type size_t = usize;

@michaelkirk
Copy link

I wonder if the ultimate solution is to try to get c_size_t added to std::os::raw 😬

@teythoon
Copy link

The previous behavior allowed me as a crate author to pre-build bindings and include them with my crate.

Our crate, nettle-sys, does not pre-build bindings, and we are also negatively affected by this change. From my point of view, these are the negative effects of this change:

  • Updating bindgen from what we use now (0.51.1) to 0.53 breaks the API exposed by nettle-sys.
  • Our high-level crate, nettle-rs, has a usize and wants to hand it to the low-level crate. The change in bingen 0.53 means that nettle-sys' API now depends on the platform. How should the high-level crate handle this?

geofft added a commit to geofft/rust-bindgen that referenced this issue Jun 3, 2021
…ng#1901, rust-lang#1903)

This addresses the underlying issue identified in rust-lang#1671, that size_t
(integer that can hold any object size) isn't guaranteed to match usize,
which is defined more like uintptr_t (integer that can hold any
pointer). However, on almost all platforms, this is true, and in fact
Rust already uses usize extensively in contexts where size_t would be
more appropriate, such as slice indexing. So, it's better for ergonomics
when interfacing with C code to map the C size_t type to usize. (See
also discussion in rust-lang/rust#65473 about how usize really should be
defined as size_t, not uintptr_t.)

The previous fix for rust-lang#1671 removed the special case for size_t and
defaulted to binding it as a normal typedef.  This change effectively
reverts that and goes back to mapping size_t to usize (and ssize_t to
isize), but also ensures that if size_t is emitted, the typedef'd type
of size_t in fact is compatible with usize (defined by checking that the
size and alignment match the target pointer width). For (hypothetical)
platforms where this is not true, or for compatibility with the default
behavior of bindgen between 0.53 and this commit, onwards, you can
disable this mapping with --no-size_t-is-usize.
@solb
Copy link

solb commented Jun 4, 2021

Indeed, the change associated with #1671 breaks builds of my thesis work under newer versions of bindgen. I can work around the issue with --size_t-is-usize, but this would be annoying to incorporate into my build system because old versions of bindgen don't expect that flag and I'd have to do version detection to continue to support them. So for now, I'm stuck putting a wrapper script in my path that always passes the flag to the real bindgen. Ugh.

@reivilibre
Copy link

This behaviour (the type being u32 on some platforms and u64 on others) is what appears to be leading to this issue: rvolosatovs/fvad#1, whereby a crate can't compile on ARMv7 because the wrapper crate for the bindings has been developed on a system with u64s and accidentally built that into its assumptions.

I acknowledge the conflicting requirements but this does feel a lot worse than making the cheeky assumption or assertion that usize is size_t.

I don't think the library author stood much of a chance of noticing this and it's not clear to me that the users of a crate should be having to suspect that the type annotations of their dependencies will change per-platform.

I wonder if the ultimate solution is to try to get c_size_t added to std::os::raw grimacing

This sounds sensible to me, though in the meantime, wonder if this could be done with an external crate which defines c_size_t alternatively. I guess it's not ideal, but it would mean there's a definite type which clearly says 'I will change meanings on different platforms' and it would keep it confined to one place.

ac-frg pushed a commit to ac-frg/minijail that referenced this issue Jul 17, 2022
This will give us identical bindings for all platforms regardless of
what underlying C type they use to define size_t.

For some reason, this isn't the default:
rust-lang/rust-bindgen#1901

Previously, we generated different things on different platforms:

x86-64:     pub type size_t = ::std::os::raw::c_ulong;
32-bit arm: pub type size_t = ::std::os::raw::c_uint;

It also lets us eliminate a couple of size_t casts in the Rust wrapper
code.

Bug: None
Test: (cd rust/minijail; cargo test)
Change-Id: Ic37fc7fb2cf2ad173b4bc184a499011d299c8917
pvdrz pushed a commit to ferrous-systems/rust-bindgen that referenced this issue Sep 22, 2022
…ng#1901, rust-lang#1903)

This addresses the underlying issue identified in rust-lang#1671, that size_t
(integer that can hold any object size) isn't guaranteed to match usize,
which is defined more like uintptr_t (integer that can hold any
pointer). However, on almost all platforms, this is true, and in fact
Rust already uses usize extensively in contexts where size_t would be
more appropriate, such as slice indexing. So, it's better for ergonomics
when interfacing with C code to map the C size_t type to usize. (See
also discussion in rust-lang/rust#65473 about how usize really should be
defined as size_t, not uintptr_t.)

The previous fix for rust-lang#1671 removed the special case for size_t and
defaulted to binding it as a normal typedef.  This change effectively
reverts that and goes back to mapping size_t to usize (and ssize_t to
isize), but also ensures that if size_t is emitted, the typedef'd type
of size_t in fact is compatible with usize (defined by checking that the
size and alignment match the target pointer width). For (hypothetical)
platforms where this is not true, or for compatibility with the default
behavior of bindgen between 0.53 and this commit, onwards, you can
disable this mapping with --no-size_t-is-usize.
@pvdrz pvdrz closed this as completed in cc78b6f Sep 24, 2022
qsdrqs pushed a commit to qsdrqs/rust-bindgen that referenced this issue Oct 26, 2022
…ng#1901, rust-lang#1903)

This addresses the underlying issue identified in rust-lang#1671, that size_t
(integer that can hold any object size) isn't guaranteed to match usize,
which is defined more like uintptr_t (integer that can hold any
pointer). However, on almost all platforms, this is true, and in fact
Rust already uses usize extensively in contexts where size_t would be
more appropriate, such as slice indexing. So, it's better for ergonomics
when interfacing with C code to map the C size_t type to usize. (See
also discussion in rust-lang/rust#65473 about how usize really should be
defined as size_t, not uintptr_t.)

The previous fix for rust-lang#1671 removed the special case for size_t and
defaulted to binding it as a normal typedef.  This change effectively
reverts that and goes back to mapping size_t to usize (and ssize_t to
isize), but also ensures that if size_t is emitted, the typedef'd type
of size_t in fact is compatible with usize (defined by checking that the
size and alignment match the target pointer width). For (hypothetical)
platforms where this is not true, or for compatibility with the default
behavior of bindgen between 0.53 and this commit, onwards, you can
disable this mapping with --no-size_t-is-usize.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants