Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for #![feature(c_size_t)] (std::os::raw::c_size_t/std::os::raw::c_ssize_t) #88345

Open
1 of 3 tasks
Tracked by #1568
thomcc opened this issue Aug 25, 2021 · 26 comments
Open
1 of 3 tasks
Tracked by #1568
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@thomcc
Copy link
Member

thomcc commented Aug 25, 2021

Feature gate: #![feature(c_size_t)].

This is a tracking issue for std::os::raw::{c_size_t, c_ssize_t}, which are guaranteed to be the same size as the underlying C size_t and ssize_t types from stddef.h.

Currently, on all targets, this is equivalent to usize and isize, however Rust has historically gone somewhat out of its way to avoid promising this. There are some targets with vaguely-planned support where this is not true (W65, used for SNES homebrew, for example)

Further reading here is available:

And probably more.

Public API

// std::os::raw

pub type c_size_t = usize;
pub type c_ssize_t = isize;

Steps / History

Unresolved Questions

  • Do we want to instead guarantee usize and size_t are the same? See https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369
  • Should this instead live in libcore somehow, given that probably libstd may never support platforms where this is not true. (This may not be true, since I imagine CHERI will support libstd)
  • Do we need both the signed and unsigned version, given that size_t is more common in function signatures.
  • ...
@thomcc thomcc added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 25, 2021
@joshtriplett
Copy link
Member

Should this instead live in libcore somehow, given that probably libstd may never support platforms where this is not true.

At some point, I'd like to copy all the types from std::os::raw into core::ffi. core is compiled for the target system.

This particular proposal doesn't need to block on that, though; these should go into std::os::raw for now, and we can separately work on populating core::ffi.

@chorman0773
Copy link
Contributor

Note that some types may not be the same on all targets for the same Architecture. For example (though I don't believe c_long exists), on x86_64-*-linux-gnu (or any Sys-V target for x86_64) c_long would be i64, but on x86_64-*-windows-msvc, it would be i32. I believe that, currently, everything in core, with the exception of atomic types, is gated on the architecture only. That may be something to consider.

@bjorn3
Copy link
Member

bjorn3 commented Aug 26, 2021

Even atomic types are effectively gated on target cpu only and not the OS AFAICT. It just so happens that different target triples on the same architecture sometimes use a different target cpu with different levels of atomic support.

@chorman0773
Copy link
Contributor

chorman0773 commented Aug 26, 2021 via email

@thomcc
Copy link
Member Author

thomcc commented Aug 26, 2021

Modulo bugs, I believe platforms where OS support is required for atomic types are considered not to have atomic types by Rust.

(This seems to be getting off topic, perhaps)

@magicant
Copy link

ssize_t is actually not from stddef.h but from sys/types.h which is specific to Unix. On Windows, SSIZE_T is defined in BaseTsd.h.

@thomcc
Copy link
Member Author

thomcc commented Oct 30, 2021

Nice catch. Do you want to submit a PR fixing the docs?

@magicant
Copy link

Yes, I'd love to. But I'm not sure if just fixing the doc for ssize_t is okay. Since ssize_t is platform-specific as I said, I'd rather suggest removing it from std::os::raw and possibly placing it under std::os::*::raw. If so:

  • For Unix, std::os::unix::raw has been deprecated in favor of the libc crate. Maybe we should not add std::os::unix::raw::ssize_t now.
  • For Windows, adding std::os::windows::raw::SSIZE_T would be okay, but adding only SSIZE_T among many types defined in BaseTsd.h might be kind of weird.
  • For other platforms, I'm not sure if ssize_t is available in C at all.

What do you think?

@ChrisDenton
Copy link
Member

Officially, isn't ptrdiff_t the signed counterpart to size_t?

@chorman0773
Copy link
Contributor

chorman0773 commented Oct 30, 2021 via email

@thomcc
Copy link
Member Author

thomcc commented Oct 30, 2021

This probably should be changed to either expose ptrdiff_t instead of ssize_t, or to only expose size_t.

Either way, the future of this API depends a lot on https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369.

@thomcc
Copy link
Member Author

thomcc commented Oct 30, 2021

This is hardly scientific, but to quantify my hunch that size_t is more common than ssize_t and ptrdiff_t in my installed headers, I did a check1 to see if this is true. The numbers I got were:

Linux:

  • 23793 hits for size_t
  • 928 hits for ptrdiff_t
  • 869 hits for ssize_t

Mac:

  • 5761 hits for size_t
  • 282 hits for ptrdiff_t
  • 136 hits for ssize_t

Which probably means that size_t is the important thing here, and the signed version doesn't matter so much.

Footnotes

  1. In the spirit of reproducibility, I ran the following command in fish-shell2 to get these counts:

    rg -ce "$pattern" $include_dirs | string replace -r '^.*:' '' | paste -s -d+ - | bc
    

    Where:

    • $pattern is: \bsize_t\b, \bssize_t\b, or \bptrdiff_t\b
    • $include_dirs on linux was /usr/local/include and /usr/include.
    • $include_dirs on mac was /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include and /usr/local/include. These both might be system dependent in the following ways:
      • The first you can find it with something like xcrun --show-sdk-path --sdk macosx
      • And the second is where brew installs things (which seems to be different on new installs? Or maybe just on M1? IDK).
  2. fish-shell is required because string replace is easier for me than remembering how to use sed, but mostly becasue I had something very similar in my history.

    Sorry. If you tell me the equivalent sed incantation to delete everything prior to the last : in a line, I'll update this comment

bors added a commit to rust-lang-ci/rust that referenced this issue Nov 3, 2021
…now-ptrdiff_t-is-my-best-friend, r=joshtriplett

Replace `std::os::raw::c_ssize_t` with `std::os::raw::c_ptrdiff_t`

The discussions in rust-lang#88345 brought up that `ssize_t` is not actually the signed index type defined in stddef.h, but instead it's `ptrdiff_t`. It seems pretty clear that the use of `ssize_t` here was a mistake on my part, and that if we're going to bother having a isize-alike for FFI in `std::os::raw`, it should be `ptrdiff_t` and not `ssize_t`.

Anyway, both this and `c_size_t` are dubious in the face of the discussion in https://internals.rust-lang.org/t/pre-rfc-usize-is-not-size-t/15369, and any RFC/project-group/etc that handles those issues there should contend with these types in some manner, but that doesn't mean we shouldn't fix something wrong like this, even if it is unstable.

All that said, `size_t` is *vastly* more common in function signatures than either `ssize_t` or `ptrdiff_t`, so I'm going to update the tracking issue's list of unresolved questions to note that perhaps we only want `c_size_t` — I mostly added the signed version for symmetry, rather than to meet a need. (Given this, I'm also fine with modifying this patch to instead remove `c_ssize_t` without a replacement)

CC `@magicant` (who brought the issue up)
CC `@chorman0773` (who has a significantly firmer grasp on the minutae of the C standard than I do)

r? `@joshtriplett` (original reviewer, active in the discussions around this)
@ojeda
Copy link
Contributor

ojeda commented Jun 29, 2022

These are now in core::ffi too since #94503 for #![feature(core_ffi_c)].

@CAD97
Copy link
Contributor

CAD97 commented Jul 17, 2022

Just noting that these types are currently missing a reëxport into std. Even if presence of std means c_size_t === usize, the type should be available to show intent and keep std's public API a strict superset of core's (ignoring internal-only perma-unstable items that are only public as an impl detail ofc).

copybara-service bot pushed a commit to google/crubit that referenced this issue May 18, 2023
This CL provides mappings for most types under `core::ffi`, except for:
* Types like `core::ffi::ptrdiff_t`, because they depend on an
  experimental feature: rust-lang/rust#88345
* `core::ffi::c_char` because of b/276931370 and because continuing
  translating `char` to `u8` reduces the amount of cascading changes in
  this CL (e.g. changes to `crubit/support/cc_std/string_view.rs`).

Note that some other `clang/AST/BuiltinTypes.def` types like `char16_t`
don't have an equivelent under `core::ffi`. This CL doesn't change how
such types are handled (some of them are already covered by
`rs_bindings_from_cc/type_map.cc`).

PiperOrigin-RevId: 533279988
@tgross35
Copy link
Contributor

tgross35 commented Nov 24, 2023

Would we be able to stabilize these or a subset? I know we have the size_t ?= usize discussion somewhat ongoing, but I don't think this blocks us from providing the C representation, right?

Assuming they are always the same for our current supported targets, we just need to resolve that before adding targets where uintptr_t and size_t do differ... I think

@briansmith
Copy link
Contributor

I don't believe the current implementation of pub type c_size_t = usize; is correct if the long-term plans to allow usize != size_t because the current implementation guarantees implicit conversion between usize and c_size_t. I believe that we would instead need to use #[repr(transparent)] struct c_size_t(usize); and then implement explicitly the operations that we are guaranteeing, and then also implement TryFrom conversions for the rest. This would require us to decide what guarantees we are making; e.g. usize is always the same size as size_t or larger than size_t, but never smaller?

Or otherwise, we need to document that on targets where usize and size_t are the same, implicit conversions are supported, but on other targets they are not. Or something else to that effect. However, in the past, we have not allowed even EXPLICIT conversions to be supported or not on a per-target basis. For example, there is no impl From<u32> for usize on targets where usize is the same size (or larger than) u32.

let x: usize = 1;
let y: core::ffi::size_t = x;  // Works when a type alias, doesn't work otherwise.

@chorman0773
Copy link
Contributor

chorman0773 commented Nov 27, 2023

This is true for the other c_* types. c_int is a type alias of i32 on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize; is a "True on this platform only" thing.

@briansmith
Copy link
Contributor

This is true for the other c_* types. c_int is a type alias of i32 on most platforms, but it's documented to be platform specific. I think it's fine if we make it clear that pub type c_size_t = usize; is a "True on this platform only" thing.

Was this a historical accident or is this intentional though? IIRC it was intentional to not support the explicit conversions with From because they wouldn't work across all targets. It would be good to have an explicit decision about these things, e.g. so we could eventually resolve what to do about From<u32> and whatnot consistently as well.

@jmillikin
Copy link
Contributor

I think the following table roughly captures current state:

C type Standard Rust type Definition
size_t C89 core::ffi::c_size_t C89: "the unsigned integral type of the result of the sizeof operator"
ptrdiff_t C89 core::ffi::c_ptrdiff_t C89: "the signed integral type of the result of subtracting two pointers"
intptr_t C99 isize C99: "a signed integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer"
uintptr_t C99 usize C99: "an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer"
ssize_t POSIX core::ffi::c_ssize_t / libc:ssize_t POSIX.1-2001: "Used for a count of bytes or an error indication"

I agree that changing ffi::c_size_t and ffi::c_ptrdiff_t to #[repr(transparent)] structs would be correct, because aside from tradition there's not really anything requiring them to be large enough to store a pointer.

  • Presumably on a CHERI architecture with ptr = (u64 caps, u64 addr) these types would both be u64 rather than u128.
  • There should be impl From<u16> for c_size_t and impl From<i16> for c_ptrdiff_t, but conversion from integer sizes > 16 bits (including usize and isize) should be TryFrom.
  • It's safe to impl From<c_size_t> for usize, based on the assumption that Rust doesn't target architectures where value sizes can exceed the address range of pointers.
  • It's safe to impl From<c_ptrdiff_t> for isize, because the difference between two pointers can't exceed the size of the pointers themselves.

It does seem that Rust is missing type core:ffi::c_intptr_t = isize; and type core::ffi::c_uintptr_t = usize;, which aren't strictly necessary but would be great for documentation (e.g. c_size_t could link them to explain why it's not the same as usize).

IMO core::ffi::c_ssize_t should simply be removed. It's specific to the Unix idiom of inline error signaling by returning -1, and is inherently non-portable to non-hosted environments that core must support.

@chorman0773
Copy link
Contributor

Changing them to structs would mean you can't initialize them from integer literals.

@jedbrown
Copy link
Contributor

I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC usize semantics, which would imply size_t = usize and ptrdiff_t = isize. This would be consistent with strict_provenance. These want to move away from conflating intptr_t/uintptr_t with isize/usize, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).

@jmillikin
Copy link
Contributor

Changing them to structs would mean you can't initialize them from integer literals.

That doesn't seem like a big deal -- the size_t values are going to be either things passed back from the FFI, or TryFrom-converted from a native Rust type.

For (hopefully rare?) cases when the C API expects things like buffer sizes to be provided directly, the From<u16> lets some_c_fn(123u16.into()) work (despite being slightly ugly).

I just want to make sure folks in this thread are aware of the Rust for Morello/CHERI Pre-RFC usize semantics, which would imply size_t = usize and ptrdiff_t = isize. This would be consistent with strict_provenance. These want to move away from conflating intptr_t/uintptr_t with isize/usize, and it seems the main dilemma is between compatibility (keep working without warnings on non-CHERI) versus strictness (warnings or errors for bad conversions even on non-CHERI architectures where they are harmless).

I've seen a lot of "Pre-RFC" threads in the forum that don't go anywhere. Is that a Pre-RFC in the sense that the Rust core language team is onboard with potentially changing the definition of isize / usize and just wants to solicit early feedback? Or in the sense of "[Pre-RFC] wishlist for things that won't happen"?

If the definition of isize / usize changes to match the C semantics of ptrdiff_t / size_t then that would mean that table should be flipped around (c_size_t and c_ptrdiff_t as aliases, c_intptr_t and c_uintptr_t as transparent structs).

@Sympatron
Copy link

IMO core::ffi::c_ssize_t should simply be removed. It's specific to the Unix idiom of inline error signaling by returning -1, and is inherently non-portable to non-hosted environments that core must support.

I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core? ssize_t is just the signed version of size_t, so it will be available on any platform just with a different corresponding Rust type.

@briansmith
Copy link
Contributor

I don't really understand this point. In my experience it is a common idiom in C in general, not just Unix. An how is that related to core? ssize_t is just the signed version of size_t, so it will be available on any platform just with a different corresponding Rust type.

ssize_t is an operating-system-defined thing. I think it may make sense in std::os::raw but IDK if it makes sense in core::ffi which (IME) is oriented to completely-OS-agnostic code. Note that we already have core::ffi::c_ptrdiff_t.

@CAD97
Copy link
Contributor

CAD97 commented Nov 29, 2023

Extremely pedantically speaking, it's somewhat possible to argue that c_size_t should be a type alias for u64 (or whichever size is appropriate), not usize. Arithmetic types in C are fairly fungible due to the usual arithmetic conversions1, but on an LP64 system, it's typically the case that size_tC, uint64_tC, and unsigned long intC are all the same type. And even in the face of arithmetic conversions, this is observable by anything that asks about compatible types, e.g. _Generic.

On the other hand, C differentiates between short, int, long, and long long, even though at least two of those will probably2 have the same width, but Rust says (LP64) that c_long and c_longlong are both aliases of i64, so we already don't really care about which types C differentiates, just about integer width, which is presumably all that the ABI cares about.


Dialing back the pedantry.

I don't think it's possible to stabilize c_size_t without answering whether usize is size_tC, uintptr_tC, both, or neither. The chart of implication, as I see it, is roughly

c_size_t c_uintptr_t usize is...
unavailable unavailable indeterminate
usize unavailable size_t
u64 unavailable uintptr_t
usize usize both
u64 u64 neither

Without looking specifically at CHERI or W65, I think the most viable options are:

  • c_size_t and c_uintptr_t are both aliases for usize. Make it clear that these are still target dependent aliases and don't necessarily match, but they're both usize on most targets.
  • c_size_t and c_uintptr_t are both aliases for uNN, as target appropriate. Deal with the required conversions, like is already done when using e.g. c_int.
  • c_uintptr_t is an alias for a new type which acts like an integer for API/ABI but maintains provenance (similar to sptr::uptr). Probably then define usize as always being size_tC.

On CHERI, a "uptr" type must be used to match the behavior of intptr_tC carrying provenance and having the alignment/size of a pointer (128+1 bits) but only the integer range of vaddr_tC (64 bits).

Footnotes

  1. At one point I thought _BitInt(N) was supposed to be exempt from integer conversions, but no, it's only exempt from integer promotion. Promotion is what eagerly turns everything smaller into int; conversion handles arithmetic between types unequal after promotion and (as if) by assignment to a smaller type.

  2. I believe CHERI/Morello still uses 64 bit long long for ease of porting and due to ABI constraints, even though theoretically 128 bit long long would be legal.

@chorman0773
Copy link
Contributor

Having it be uN unconditionally will make my life easier, since I was planning to just have it expand to a macro called cfg_int!(target_size_width) (which is a little bit of unstable, compiler-specific shenanigans).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests