Support index size != pointer width #65473
Comments
Thank you for this information, it is useful to be sure that the "simplest" solution isn't good enough.
Agreed, otherwise, according to the information provided, we cannot technically support such targets properly.
Integers are cast back to pointers and subsequently dereferenced all the time. I think that what that issue shows is that doing so is OK (and therefore, to answer your question, provenance isn't a problem).
The problem is that we do guarantee that this safe Rust code is portable to all platforms that Rust supports and is correct (playground):
let ptr: *const i32;
let x: usize = ptr as usize;
let y = x as *const i32;
assert_eq!(ptr, y);
A lot of correct tricky unsafe code relies on this to work on all targets, and we guarantee that such code is portable. It is unclear to me whether we guarantee this for all platforms that Rust will ever support or for all platforms that Rust currently supports. Either way, I don't think the distinction is very important if we can find a good solution. Note that we already allow So we could add the following new language feature:
With that, code that fails to compile on CHERI can be upgraded from If we ever wanted to be consistent about using This might not be a backward compatible change, depending on whether this breaks the Alternatively, we could just define an |
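For readers who want to try the round-trip guarantee discussed in this comment, here is a minimal sketch. The function names `round_trip` and `round_trip_preserves_address` are made up for illustration and the pointer is initialised from a live reference; this shows the behaviour on today's targets, where `usize` is pointer-sized, and says nothing about how a CHERI-like target would or should behave.

```rust
/// Casts a raw pointer to `usize` and back again.
/// (Hypothetical helper name, used only for this illustration.)
fn round_trip(ptr: *const i32) -> *const i32 {
    let addr: usize = ptr as usize; // pointer-to-integer cast
    addr as *const i32 // integer-to-pointer cast
}

/// Returns true if the round-tripped pointer compares equal to the original.
fn round_trip_preserves_address(value: &i32) -> bool {
    let ptr: *const i32 = value;
    round_trip(ptr) == ptr
}
```

Calling `round_trip_preserves_address(&7)` is expected to return `true` on every target where `usize` can hold a full pointer, which is exactly the assumption under discussion.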
Drive-by note (I currently have no budget to dive deeply into this topic):
|
Damn, thanks. So the reference doesn’t say what that does AFAICT, only that
it is a pointer to address cast, but the address doesn’t fit, so I suppose
it gets truncated?
|
I have some bandwidth to address this topic in rustc (i.e. implementation work), but I guess there's lots of people who need to know about the semantics.
I think this is a separate issue. Given that it's Safe Rust, I'd be cautious about proposing we restrict the guarantee to currently-supported targets. In any case, no problem with CHERI; casting integers to pointers is OK unless you dereference the result (much like Rust, except it also traps if you try to forge a pointer to a valid object).
One of the big questions in this issue is what the It's not clear to me: I (wishfully) want to interpret it as meaning
This feels like doing something you don't mean. From a compiler perspective, there seems to be no more reason to use u128 on CHERI than on x86. It just happens that pointers take up extra space. You can't recover more data than the 64-bit address from the architecture*, and I suspect this approach would mean the same workaround for future architectures. (*) when pretending a pointer is an int |
This feels like RFC material. |
The documentation of
That's what we currently guarantee, and all existing safe and unsafe Rust code can and does rely on this being true.
Thanks, this is useful. Maybe we could "tune" the definition of let x: *mut T;
let x: usize = transmute(x); would fail to compile because This would certainly be "weird", and this does not turn
I think that any change to the guarantees of |
Ah, I suppose the question should be how liberally this can be interpreted. I claim that yes, while sometimes code relies on
This model works nicely for CHERI. I don't know about future architectures, but I guess this definition hints at some sort of bijection between
Ah, but as the transmute documentation says, nobody should be doing that... transforming pointers into pointers is fine (i.e. fiddling the types), and if you're converting to I guess I'm just highlighting that
Digression: I could compile nostd Rust (c. 1.35 nightly) programs with 128-bit CHERI capabilities earlier this year, after patching libcore. Not many changes required! Everything I have written above is really "lessons learnt" and design thoughts, but full support will be a different story.
Absolutely; this issue was to flesh out ideas before presenting an RFC. If/when the consensus is that clear options exist, I'm more than happy to bring this to RFC. Thanks for all the brainwork so far! |
Notice that independently of what we recommend people to do or not, we still do guarantee that this works correctly (transmute is unnecessary for that case, but it is not wrong). Notice that we do also guarantee this for |
Point taken.
Yes, and I suppose code that relies on specific pointer representations will never work on architectures with different pointer representations, capabilities or not. If we accept such architectures as legitimate targets, the best we can do (from a compiler/language perspective) is probably to reject compilation with a helpful message. |
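One way to "reject compilation with a helpful message" is sketched below, using `cfg` and `compile_error!`. This is only an illustration: the helper names `ptr_to_u64` and `u64_to_ptr` are hypothetical, and the sketch assumes that an unusual target would advertise itself through a `target_pointer_width` value other than 32 or 64, which is an assumption and not something this thread has settled.

```rust
// Refuse to build on targets whose pointer representation this code was
// never written for, instead of silently miscompiling.
#[cfg(not(any(target_pointer_width = "32", target_pointer_width = "64")))]
compile_error!("this code packs pointers into a u64 and supports only 32- and 64-bit pointer widths");

/// Hypothetical helper that relies on a pointer fitting in 64 bits.
fn ptr_to_u64(ptr: *const u8) -> u64 {
    ptr as usize as u64
}

/// Inverse of `ptr_to_u64`; only meaningful under the same assumption.
fn u64_to_ptr(bits: u64) -> *const u8 {
    bits as usize as *const u8
}
```

The guard costs nothing at run time; on an unsupported target the build stops at the `compile_error!` line with the given message.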
Sounds good to me. I think that for code using |
FYI, for those unaware, Arm are investing heavily in a prototype adaptation of CHERI for Armv8 called Morello (https://developer.arm.com/architectures/cpu-architecture/a-profile/morello), with development boards for a real quad-core processor (based on the Neoverse N1 used in things like Amazon's Graviton2 instances) becoming available early next year (there is already a simulator you can download from Arm's developer portal). Arm are not committing to making an official CHERI extension to Armv8 (and Morello is definitely not how it will look; it includes multiple ways of accomplishing the same thing in order to determine which ways perform best, are most useful for software, etc, and a production architecture would trim it down), but if the program is successful it is likely that it will eventually happen, so CHERI is quickly becoming a very real architecture. |
I am responsible for the design of the built-in Our CHERI is a single-provenance model. Every pointer must be derived from exactly one existing pointer. Within the C abstract machine, addresses of globals or stack allocations and returns from Note that the separation between a |
…ng#1901, rust-lang#1903) This addresses the underlying issue identified in rust-lang#1671, that size_t (integer that can hold any object size) isn't guaranteed to match usize, which is defined more like uintptr_t (integer that can hold any pointer). However, on almost all platforms, this is true, and in fact Rust already uses usize extensively in contexts where size_t would be more appropriate, such as slice indexing. So, it's better for ergonomics when interfacing with C code to map the C size_t type to usize. (See also discussion in rust-lang/rust#65473 about how usize really should be defined as size_t, not uintptr_t.) The previous fix for rust-lang#1671 removed the special case for size_t and defaulted to binding it as a normal typedef. This change effectively reverts that and goes back to mapping size_t to usize (and ssize_t to isize), but also ensures that if size_t is emitted, the typedef'd type of size_t in fact is compatible with usize (defined by checking that the size and alignment match the target pointer width). For (hypothetical) platforms where this is not true, or for compatibility with the default behavior of bindgen between 0.53 and this commit, onwards, you can disable this mapping with --no-size_t-is-usize.
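The compatibility check described in this commit message (size and alignment must match the target pointer width) can be sketched in plain Rust. This is not bindgen's code: `same_layout` and `usize_matches_pointer_layout` are hypothetical names, and a raw pointer stands in for the C `size_t` typedef that bindgen would actually inspect.

```rust
use std::mem::{align_of, size_of};

/// Returns true if `A` and `B` have the same size and alignment on this target.
fn same_layout<A, B>() -> bool {
    size_of::<A>() == size_of::<B>() && align_of::<A>() == align_of::<B>()
}

/// Stand-in for the check in the commit message: is `usize` layout-compatible
/// with a pointer-width value on the current target?
fn usize_matches_pointer_layout() -> bool {
    same_layout::<usize, *const u8>()
}
```

On current targets `usize_matches_pointer_layout()` returns `true`; a target where it did not would be one of the "(hypothetical) platforms" the message mentions.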
https://www.reddit.com/r/rust/comments/x28fea/zen4_avx512_support_reminds_me_32bit_indices_in/ Thanks for this issue. I've always been after this: basically an alternate target with 64-bit addressing but 32-bit data structures, a world where no individual collection can exceed 4 GB. This is the GPU use case, and it also suits the more advanced auto-vectorizable CPU SIMD options (AVX512 is making a resurgence with Zen4, and there are RISC-V designs). It would also be my default. Still in 2022, I am targeting 8-16 GB of RAM, just as I was in 2014 when I first discovered Rust, and 64-bit indices are just overkill. My idea is to make a type "std::index", and probably "std::uindex" as well. These would be defined as "usize", "isize" by default. Keep usize strictly the pointer size; no change there. Change all the collections in the stdlib and the index operator to use the "std::index" type alias. Then introduce either a #[cfg()] or a new target triple for a world where the pointer size is 64 bits and the index size is 32 bits (and also, for retro users, 32-bit pointers with a 16-bit index to handle x86 real mode; perhaps there are microcontrollers out there that work like this). As this is all opt-in, there would be no breaking changes in any codebase. Just refactor any code that may want to use the new modes to use std::index instead of usize where appropriate. |
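A rough sketch of what the proposed index alias could look like in user code today. Everything here is hypothetical: `std::index` does not exist, so the sketch uses a local alias `Index` fixed to `u32`, plus a small wrapper `IndexedVec` that converts at the boundary with the standard library's `usize`-indexed `Vec`.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the proposed `std::index`; fixed to 32 bits here.
type Index = u32;

/// A vector addressed by `Index` rather than `usize`.
struct IndexedVec<T> {
    items: Vec<T>,
}

impl<T> IndexedVec<T> {
    fn new() -> Self {
        IndexedVec { items: Vec::new() }
    }

    /// Appends a value and returns its index, or `None` (without pushing)
    /// once the next position would no longer fit in `Index`.
    fn push(&mut self, value: T) -> Option<Index> {
        let idx = Index::try_from(self.items.len()).ok()?;
        self.items.push(value);
        Some(idx)
    }

    /// Looks up an element by its narrow index.
    fn get(&self, idx: Index) -> Option<&T> {
        let i = usize::try_from(idx).ok()?;
        self.items.get(i)
    }
}
```

The checked conversions are the price of doing this outside the language; the proposal above would move that boundary into the standard collections instead.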
Currently we enforce that our code only runs on machines with a certain pointer width (32 or 64). One of the underlying reasons is a requirement in consensus code in Bitcoin Core, which needs containers with more than 2^16 (65536) items [0]. We can better express our requirements by asserting on Rust's index size (the `usize` type). As a side benefit, there is active work [1] to make Rust support architectures where pointer width != index size. With this patch applied `rust-bitcoin` will function correctly even if that work progresses. - [0] rust-bitcoin#2929 (comment) - [1] rust-lang/rust#65473
Currently we enforce that our code only runs on machines with a certain pointer width (32 or 64, by failing to compile if the pointer width is 16). One of the underlying reasons is a requirement in consensus code in Bitcoin Core, which needs containers with more than 2^16 (65536) items [0]. We can better express our requirements by asserting on Rust's index size (the `usize` type). As a side benefit, there is active work [1] to make Rust support architectures where pointer width != index size. With this patch applied `rust-bitcoin` will function correctly even if that work progresses. - [0] rust-bitcoin#2929 (comment) - [1] rust-lang/rust#65473
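Asserting on the index size rather than the pointer width, as the commit message above describes, might look roughly like the following. This is a sketch and not the actual `rust-bitcoin` patch; the threshold (more than 16 bits) is taken from the 2^16 figure quoted above, and `index_width_bits` is a hypothetical helper.

```rust
// Fail the build, rather than misbehave at run time, if `usize` cannot
// index containers with more than 2^16 items.
const _: () = assert!(
    usize::BITS > 16,
    "this code requires an index type (usize) wider than 16 bits"
);

/// Returns the width of the index type, in bits, on the current target.
fn index_width_bits() -> u32 {
    usize::BITS
}
```

Because the check names `usize` and not `target_pointer_width`, it would keep expressing the real requirement on a target where the two differ.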
Preliminaries
`usize` is the pointer-sized unsigned integer type [1].
It is also Rust's index type for slices and loops; this definition works well when pointer size corresponds to the space of indexable objects (most targets today). Informally, `uintptr_t == size_t`.
Note that the target pointer width is indisputably set by the LLVM data layout string.
It would be correct to say that it is currently impossible to have `usize` different to `target_pointer_width` without breaking numerous assumptions in rustc [2, 3].
Unfortunately, `uintptr_t == size_t` doesn't hold for all architectures. For context, I've worked toward (not active) compiling Rust for MIPS/CHERI (CHERI128) [4]. This target has 128-bit capability pointers (as in layout string), and a 64-bit processor and address space.
I also assume that we don't want programmers messing with pointers in Safe Rust, and that they shouldn't have to care how a pointer (or reference) is represented/manipulated by an architecture.
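The status quo described in these preliminaries can be observed directly. The sketch below uses hypothetical helper names and only reports what the current target does: `usize` has the size of a thin pointer, and `target_pointer_width` agrees with it.

```rust
use std::mem::size_of;

/// True if `usize` occupies exactly as many bytes as a thin raw pointer.
fn usize_is_pointer_sized() -> bool {
    size_of::<usize>() == size_of::<*const u8>()
}

/// The pointer width rustc was configured with, in bits (0 if it is a
/// value this sketch does not know about).
fn configured_pointer_width() -> usize {
    if cfg!(target_pointer_width = "16") {
        16
    } else if cfg!(target_pointer_width = "32") {
        32
    } else if cfg!(target_pointer_width = "64") {
        64
    } else {
        0
    }
}
```

On today's targets both views agree, which is the informal `uintptr_t == size_t` assumption; the CHERI128 target described above is where they would come apart.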
Problem
I think that more than one type is necessary here, to distinguish between the "index" or "size" component of a pointer (a la `size_t`), and the space required to contain a pointer (`uintptr_t`).
To me, the ideal solution is to change `usize` to be in line with `size_t` and not `uintptr_t`. As @briansmith notes, this would be a breaking semantic change. I claim that this is only problematic on architectures where `uintptr_t != size_t`. As such, code breakage from changing this assumption is constrained to targets where the code was already broken.
Why not have a 128-bit `usize`? This is technically feasible, and it's the basis of my compilation of Rust for CHERI. But:
- `memcpy` with 128-bit integers. This isn't defined in the backend, and arguably shouldn't be defined. I will not be the last person to wonder why `memcpy` doesn't generate any instructions.
- `ptr as int` gives an LLVM `i64`, which can't be cast/isn't comparable to an `i128`; again there is no good reason to manipulate 128-bit integers here. Likewise when calling `inttoptr`, which is a valid instruction even if the result can't be dereferenced [5].
It may not be necessary to define and expose a `uintptr_t` type. It's optionally defined in C; I'm not sure programmers want to use such a type, and it could be relegated to the compiler. I haven't thought about this seriously, though.
The key issue is the conflict between index size and pointer width. How can we resolve this conflict, and support architectures with index size != pointer width? (or: why isn't this a problem at all?)
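To make the index-size versus pointer-width distinction concrete, here is a toy model in ordinary Rust. It is purely illustrative: `Index` and `Capability` are hypothetical names, the two-field layout is an assumption chosen for simplicity, and real CHERI capability encodings are different.

```rust
use std::mem::size_of;

/// Hypothetical index type, analogous to C's `size_t`: wide enough for any
/// object size or offset in a 64-bit address space.
type Index = u64;

/// Toy model of a 128-bit capability pointer: a 64-bit address plus 64 bits
/// of metadata (bounds, permissions). Not a real CHERI encoding.
#[repr(C)]
#[derive(Clone, Copy)]
struct Capability {
    address: u64,
    metadata: u64,
}

impl Capability {
    /// The only integer recoverable from the pointer is its 64-bit address.
    fn address(self) -> Index {
        self.address
    }

    /// Offsets are index-sized: they move the address, not the metadata.
    fn wrapping_add(self, offset: Index) -> Capability {
        Capability {
            address: self.address.wrapping_add(offset),
            ..self
        }
    }
}
```

In this model the storage needed for a pointer (`size_of::<Capability>()`, 16 bytes) is twice the width of anything you can index or offset by (`size_of::<Index>()`, 8 bytes), which is the mismatch a single `usize` cannot express.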
Other questions
Is this a better kind of broken? I don't know, that's what this issue is for. What is certain is that lots of libc-using code probably depends on `usize == uintptr_t == size_t` and that these will break in either case.
Is provenance a problem? From my experience with the Rust compiler, no [6]. Integers (`usize`) are never cast back to pointers and dereferenced. We already know this at some level (rust-lang/unsafe-code-guidelines#52). This suggests no fundamental link between indexing (i.e. `usize`) and pointer width.
Will we really see 128-bit pointers in our lifetime? I don't speak with authority on CHERI, but 64 bits definitely isn't enough for the "usual" 48-bit address space there [7].
But CHERI breaks the C specification; how can we discuss this issue in terms of C types? This issue really isn't about CHERI [8], or C. I won't speculate on the C specification or whether it's helpful for Rust. I use C types as the people likely to engage with this issue are familiar with them.
What about LLVM address spaces? This is a whole new can of worms. I believe rustc will only use one LLVM address space, and in particular won't support two address spaces with different pointer widths. This is an issue for CHERI in hybrid capability mode, but also of supporting any architecture with multiple address spaces. AVR-Rust probably cares about address spaces and may have some expertise here.
Related
- `usize == uintptr_t` (Deprecate pointer-width integer aliases libc#1400)
- `usize` == `size_t` will break C FFI code (Are raw pointers to sized types usable in C FFI ? unsafe-code-guidelines#99). This isn't a problem per se, but we almost encourage wrong assumptions in unsafe code.
- `usize` being linked to the bitness of the architecture (What about: volatile, concurrency, and interaction with untrusted threads unsafe-code-guidelines#152)
- `get_pointer_width` #56567); also related: Policy for assumptions about the size of `usize` rfcs#1748
Notes
[1] From https://doc.rust-lang.org/std/primitive.usize.html
[2] As remarked by @gnzlbg in rust-lang/libc#1400 (comment); this related problem is a bit subtle and quite complex.
[3] It isn't clear (to me!) whether this is primarily a compiler implementation problem or a semantic problem, but that is not the subject of this issue.
[4] This issue does not motivate support of a particular architecture, though there has been community interest in CHERI.
[5] This is relevant when finding out the size of an object, for example. While generating instructions to extend or truncate the integers is possible, this seems a silly use of cycles at compile time (and possibly runtime).
[6] My experience is limited to rustc (c. 1.35 nightly), libcompiler_builtins, libcore, and liballoc. Some modification was needed to make this work, but no egregious violations.
[7] See CHERI Concentrate for an overview of the considerations.
[8] In particular I'm not asking for help in porting Rust to CHERI, or any other platform. However, I would like support for other architectures to be technically possible.
(edits because I accidentally posted early)