RFC: int/uint portability to 16-bit CPUs #161

Closed

File: `active/0000-intindex.md` (167 additions, 0 deletions)

- Start Date: 2014-07-11
- RFC PR #: (leave this empty)
- Rust Issue #: (leave this empty)

# Summary

Either rename the types `int` and `uint` to `index` and `uindex` to avoid
misconceptions and misuses, or specify that they're always at least 32 bits wide
to avoid the worst portability problems. Also document when to use and not use
these types and which integer type to pick "by default." (See below for the
meaning of "by default.")


# Motivation

So Rust libraries won't have new overflow bugs when run on embedded devices with
16-bit addressing, and likewise for code written for 64-bit addressing then run in
32-bit environments. Rust is a very compelling replacement for C/C++ in embedded
devices, "Internet of Things" devices, and safety-critical robotics actuators.

> Review comment: I'd just replace the entire paragraph with: "Avoid bugs where the programmer presumed a default integer size for indexing arrays and elsewhere."
>
> Review comment: You could expand a little to say that this mostly concerns non-32-bit targets: 8-bit and 16-bit MCUs and, to some extent, 64-bit CPUs too.

> Review comment: What does "Internet of Things" have to do with this anyway?
>
> Author reply: A better phrasing: IoT is a class of embedded devices that's forecast to surpass PC + tablet + phone units combined within 3 years and to sell 9G units 4 years from now, and the biggest downside is threats to data and physical security. See http://www.businessinsider.com/the-internet-of-everything-2014-slide-deck-sai-2014-2?op=1. I don't want them programmed in C/C++.
>
> Review comment: Sure, but for this purpose it's just a spammy statement. We all know about those "things", and the term IoT doesn't need to be thrown into every piece of text on the internet. I definitely wouldn't use Rust if it claimed to be "The language of the IoT".


So programmers will know when to use which integer types.


# Background

Rust defines types `int` and `uint` as integers that are wide enough to hold a
pointer. The language uses them for array indexes since `uint` is large enough
to index into any memory array and `int` is useful for a difference between two
pointers or array indexes.

> Review comment (member): Is this even true on 16-bit devices, or do modern ones still use a segmentation system? Are there any relevant 16-bit chips anymore?
>
> Review comment: XMEGA chips are 8/16-bit?
>
> Author reply: Some Atmel AVR controllers (http://en.wikipedia.org/wiki/Atmel_AVR) and some PIC controllers (http://en.wikipedia.org/wiki/PIC_microcontroller) have 16-bit address spaces. These tend to have Harvard architectures, that is, separate instruction and data memory/addresses.
>
> Review comment: And the MSP430...

(A rationale given for using unsigned array indexes is to allow array bounds
checking in one comparison rather than two. However, a compiler can generate one
unsigned comparison to bounds-check a signed integer index as long as the lower
bound is 0.)
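A minimal sketch of that one-comparison check, written in present-day Rust where the pointer-sized types are spelled `isize`/`usize`; the function names are hypothetical. Casting a negative `isize` to `usize` wraps it to a value above `isize::MAX`, so a single unsigned `<` rejects both negative and too-large indexes. This assumes `len` is at most `isize::MAX`, which Rust guarantees for slices.

```rust
/// One unsigned comparison: a negative `i` wraps to a value above
/// `isize::MAX` when cast to `usize`, so it fails `< len` for any real length.
/// Assumes `len <= isize::MAX as usize`, which holds for Rust slices.
fn in_bounds(i: isize, len: usize) -> bool {
    (i as usize) < len
}

/// The naive two-comparison form, for contrast.
fn in_bounds_two_checks(i: isize, len: usize) -> bool {
    i >= 0 && (i as usize) < len
}

fn main() {
    let len = 10usize;
    for &i in [isize::MIN, -5, -1, 0, 9, 10, isize::MAX].iter() {
        // Both forms agree on every test index.
        assert_eq!(in_bounds(i, len), in_bounds_two_checks(i, len));
    }
    // prints "false true false"
    println!("{} {} {}", in_bounds(-1, len), in_bounds(3, len), in_bounds(10, len));
}
```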

`int`/`uint` can also be good choices for indexing and sizing in-memory containers.

The point of an integer type that depends on the target address space is to give
the same code compact array indexes in small-address targets while supporting
huge arrays in large-address targets. But using these types for computations
that are not limited by addressing leads to code that's not portable to
smaller-address targets than it was developed and tested on.
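The hazard can be simulated on a desktop host, as a hedged sketch: `u16` stands in for the pointer-sized type on a 16-bit-address target and `u64` for it on a 64-bit one. `area_16`/`area_64` are hypothetical names for a computation that has nothing to do with addressing, so its range shouldn't depend on the target.

```rust
// Simulation only: u16 plays the role of a 16-bit target's pointer-sized
// integer, u64 that of a 64-bit target's.
fn area_16(w: u16, h: u16) -> Option<u16> {
    w.checked_mul(h)
}

fn area_64(w: u64, h: u64) -> Option<u64> {
    w.checked_mul(h)
}

fn main() {
    // 300 x 300 = 90_000 fits comfortably on the 64-bit development machine...
    assert_eq!(area_64(300, 300), Some(90_000));
    // ...but exceeds u16::MAX (65_535) on the small target.
    assert_eq!(area_16(300, 300), None);
    // Without a check, the small target silently gets 90_000 mod 65_536.
    println!("{}", 300u16.wrapping_mul(300)); // prints 24464
}
```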

From decades of C/C++ experience, programmers have learned to pick `int`/`uint`
as the "default" integer types where not particularly constrained by
requirements, e.g.:

* where any modest-size integer will do (e.g. a loop index)
* to avoid cluttering APIs with ungermane integer sizes
* to hold tags and other values supplied by callers

(Java programmers are also accustomed to `int` as the default integer type, but
a Java `int` is always 32 bits.)

Programmers should figure out a value's needed integer range then maybe widen to
a "default" type for easy interconnections. For a value in the range 1 .. 100,
you can pick from 10 types. Choosing an 8-bit or 16-bit integer is an
optimization. Premature? Which integer type should you pick when you're writing
exploratory code and haven't yet done the range analysis? What if you're passing
values through many layers of code and computations?
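One way to follow that advice, sketched with a hypothetical `parse_percent` helper: do the range analysis once at the boundary, store the value in the narrowest type that fits, and widen losslessly with `From` when handing it to code that uses a default type (here `i32`, the default this RFC recommends).

```rust
use std::convert::TryFrom;

/// Hypothetical helper: accept a value only if it lies in 1..=100,
/// then store it as a u8, the narrowest type that fits that range.
fn parse_percent(raw: i64) -> Option<u8> {
    if (1..=100).contains(&raw) {
        u8::try_from(raw).ok()
    } else {
        None
    }
}

/// Widening to the "default" type is lossless, so `From` suffices.
fn to_default(p: u8) -> i32 {
    i32::from(p)
}

fn main() {
    let p = parse_percent(42).expect("42 is in range");
    println!("{}", to_default(p)); // prints 42
    assert_eq!(parse_percent(0), None);
    assert_eq!(parse_percent(101), None);
}
```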

A default is handy but a target-dependent size does not make a good default. And
yet `int` and `uint` _look_ like default integer types.

To clear up some misconceptions from C/C++:

* _They're not the fastest integers._ Example: x86_64 and ARM64 have 64-bit address spaces and 64-bit integer registers, but 32-bit integers are faster since those arithmetic instructions are faster, more data fits in cache, and the vector instructions can operate on twice as many values at a time.
* _They're not "native" size or register size._ Example: The [x32 ABI](https://en.wikipedia.org/wiki/X32_ABI) has 64-bit general purpose registers but 32-bit pointers so its `int`/`uint` are 32-bit.
* _They're not necessarily the same size as C `int`._ C defines `int` independently of pointer size; on common 64-bit platforms, C `int` is 32 bits while pointers are 64 bits.
* _They're not wide enough to casually pick,_ given 16-bit address spaces.
* _They're not "portable."_ They overflow differently and take different numbers of binary I/O bytes on different platforms.
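The binary I/O point can be sketched as follows (hypothetical function names, present-day Rust): writing a `usize` directly emits as many bytes as the target's pointer width, while converting to a fixed-width `u32` first gives the same 4-byte encoding everywhere, with an explicit failure instead of silent truncation when the value doesn't fit.

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Target-dependent: 2, 4 or 8 bytes, depending on the pointer width.
fn encode_len_native(n: usize) -> Vec<u8> {
    n.to_le_bytes().to_vec()
}

/// Portable: always 4 little-endian bytes; None if `n` exceeds u32::MAX.
fn encode_len_portable(n: usize) -> Option<[u8; 4]> {
    u32::try_from(n).ok().map(|v| v.to_le_bytes())
}

fn main() {
    assert_eq!(encode_len_native(1000).len(), size_of::<usize>());
    assert_eq!(encode_len_portable(1000), Some([0xE8, 0x03, 0x00, 0x00]));
    println!("native: {} bytes, portable: 4 bytes", encode_len_native(1000).len());
}
```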

These misconceptions lead to misuses and thus to code with overflow bugs
(checked or unchecked) when running in a smaller address space than originally
considered and tested.

The worst failure mode is in libraries written with desktop CPUs in mind and
then used in small embedded devices.


# Detailed design

Change these two type names so they're self-documenting and less misused. The
names `index` and `uindex` are meant to convey their use in array indexing. Use
them more narrowly.

Alternate name choices:

- `isize` and `usize`, [borrowing from C's](http://en.cppreference.com/w/cpp/types/integer) `ssize_t` and `size_t` but adopting Rust's integer prefixes.
- `intptr` and `uintptr`, [borrowing from C's](http://en.cppreference.com/w/cpp/types/integer) `intptr_t` and `uintptr_t`. These names are awkward by design.
- `PointerSizedInt` and `PointerSizedUInt`.
- `intps` and `uintps`.

To ease the transition, first deprecate the old types.
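A hedged sketch of what a deprecation step might look like at the library level, using type aliases over present-day Rust's `usize`. The lowercase names mirror the proposal and are hypothetical here; the actual rename would happen in the compiler and standard library, not in user code.

```rust
// The proposed new name, as a plain alias (hypothetical).
#[allow(non_camel_case_types)]
pub type uindex = usize;

// The old name stays usable but warns at every use site.
#[allow(non_camel_case_types)]
#[deprecated(note = "use `uindex` for indexes, or a fixed-width type such as `u32` for values")]
pub type uint = usize;

/// Index of the last element, if any, spelled with the new name.
fn last_index(len: uindex) -> Option<uindex> {
    len.checked_sub(1)
}

fn main() {
    assert_eq!(last_index(0), None);
    assert_eq!(last_index(5), Some(4));
}
```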

**Alternative:** specify that these two integer types are _at least 32 bits
wide_ on every target architecture. That avoids the worst failure mode although
it doesn't help when code tested in a 64-bit address space later runs in a
32-bit address space.
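Without a language-level guarantee, a library can still state the at-least-32-bits assumption itself. A minimal sketch, using present-day Rust's `target_pointer_width` cfg and a hypothetical `to_index` helper: the build fails outright on 16-bit-address targets instead of overflowing at run time.

```rust
// Refuse to build where `usize` is only 16 bits wide.
#[cfg(target_pointer_width = "16")]
compile_error!("this crate assumes usize is at least 32 bits wide");

/// Convert a u32 to an index. The `as` cast can't truncate here because the
/// compile_error! above rules out 16-bit targets.
fn to_index(v: u32) -> usize {
    v as usize
}

fn main() {
    assert!(std::mem::size_of::<usize>() >= 4);
    assert_eq!(to_index(70_000), 70_000);
}
```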

**Either way:** The style guide should document when to use and not use these
types and elect a particular integer type for programmers to pick "by default".
This RFC recommends `i32`.
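Under that convention, a routine might look like this sketch (the function is hypothetical): quantities use the elected default, `i32`, while `usize` (present-day Rust's name for `uint`) appears only for positions and lengths.

```rust
/// Returns the position and value of the first maximum reading.
/// Values are i32 (the elected default); the position is usize (an index).
fn max_reading(readings: &[i32]) -> Option<(usize, i32)> {
    let mut best: Option<(usize, i32)> = None;
    for (i, &r) in readings.iter().enumerate() {
        match best {
            // Keep the earlier position on ties.
            Some((_, b)) if b >= r => {}
            _ => best = Some((i, r)),
        }
    }
    best
}

fn main() {
    assert_eq!(max_reading(&[12, -3, 25, 25, 7]), Some((2, 25)));
    assert_eq!(max_reading(&[]), None);
}
```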

The style guide should also recommend using signed integers except when unsigned values are required such as for modulo 2^N arithmetic. The
[Google Style Guide](http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types) explains:

> In particular, do not use unsigned types to say a number will never be negative. Instead, use assertions for this. ...
>
> Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce.
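The hazard the guide warns about shows up in a small hypothetical example. In present-day Rust, plain `a - b` on unsigned integers panics in debug builds and wraps in release builds when `b > a`; the sketch uses `wrapping_sub` and `checked_sub` to show both outcomes deterministically, next to a signed version that represents the negative result directly.

```rust
/// Signed net count: going below zero is representable and easy to assert on.
fn net_units(sold: u32, returned: u32) -> i64 {
    i64::from(sold) - i64::from(returned)
}

fn main() {
    let (sold, returned) = (3u32, 5u32);
    // Unsigned subtraction "below zero" wraps to 2^32 - 2...
    assert_eq!(sold.wrapping_sub(returned), 4_294_967_294);
    // ...or, when checked, reports the overflow instead of a value.
    assert_eq!(sold.checked_sub(returned), None);
    // The signed version gives the meaningful answer.
    println!("{}", net_units(sold, returned)); // prints -2
}
```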

> Review comment (contributor): This suggestion makes a lot of sense in a context where overflow/underflow silently wraps around. However, if something like RFC PR #146 were implemented, it would once again make sense to use types that more accurately express the range of legal values (i.e., that are self-documenting), because compiler-added checks can be enabled to catch errors where the value would go out of range. Accurate types with compiler-added assertions beat inaccurate types with programmer-added assertions.
>
> Review comment: @glaebhoerl So would you recommend we wait for PR #146 to be accepted or rejected before evaluating this RFC further?
>
> Review comment (contributor): Nah. This was just an ancillary remark on an ancillary part of the proposal. The main part of the proposal (which is about changes to the language to better accommodate [portability to] 16-bit architectures) is unaffected.
>
> (And anyway, the suggestion makes sense in the context of the current language, and the style guide could just be updated again if the language changes.)
>
> Author reply: Aha! Nice insight, @glaebhoerl. I'll make the style guide recommendation conditional on overflow checking.
>
> Q. Does/will overflow checking happen during conversion between integer types?
>
> Review comment (contributor): A. It doesn't currently, but in the context of #146, if `#[overflow_checks(on)]`, I think it should.
>
> Rationale: As far as I can tell, `as` is meant to preserve meaning rather than representation, e.g. `5000i32 as f32` is equivalent to `5000f32` and not to `transmute::<i32, f32>(5000i32)`. Therefore, if transporting the meaning of the original value to the target type causes it to overflow, that should be caught.
>
> Author reply: Yes. Otherwise, computing a value in one integer type and then converting it to another would accidentally bypass the overflow checks.


Furthermore:

> You should assume that an `int` is at least 32 bits, but don't assume that it has more than 32 bits.

This assumption does not hold for PalmOS even on 32-bit ARM, where `int` is
16-bits for backward compatibility with PalmOS running on Motorola 68000.
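The conversion question from the review thread on the signed/unsigned recommendation can be illustrated in present-day Rust, where an `as` cast between integer types never checks for overflow (even with overflow checks enabled), while `TryFrom` does. The helper names are hypothetical.

```rust
use std::convert::TryFrom;

/// Unchecked narrowing: `as` keeps only the low 16 bits.
fn narrow_unchecked(v: i32) -> u16 {
    v as u16
}

/// Checked narrowing: None if `v` is outside 0..=65_535.
fn narrow_checked(v: i32) -> Option<u16> {
    u16::try_from(v).ok()
}

fn main() {
    // 70_000 - 65_536 = 4_464: the overflow in the conversion goes unnoticed.
    assert_eq!(narrow_unchecked(70_000), 4_464);
    assert_eq!(narrow_checked(70_000), None);
    assert_eq!(narrow_checked(1_000), Some(1_000));
}
```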


# Drawbacks

- Renaming `int`/`uint` requires figuring out which of the current uses to replace with `index`/`uindex` vs. `i32`/`u32`/`BigInt`.
- The new names are more verbose.

> Review comment (on the first drawback): And some people will just end up redefining `int` and `uint` to be 32-bit in their projects...


# Alternatives

1. Set a coding style guide and code review expectation to use `int`/`uint` only for array indexing and related operations despite C programmers' expectations. Elect an integer type such as `i32` to use "by default." Update the existing libraries.
2. Fix the portability bugs later.


# Notes

See the discussions from many contributors to [Issue #14758](https://github.com/rust-lang/rust/issues/14758) and [Issue #9940](https://github.com/rust-lang/rust/issues/9940).

Also see [Issue #11831](https://github.com/rust-lang/rust/issues/11831) about
keeping pointer-sized integers as the default. If people are happy with that
choice, then this RFC is about making `int`/`uint` at least 32 bits wide and
setting style guidelines for integer types.

> Review comment (member): Note that that issue is closed, and that `int` and `uint` are no longer the compiler-driven default, as the fallback has been removed: rust-lang/rust#6023 (i.e., it's already been decided that people aren't happy with this choice).
>
> Author reply: I'm unclear on Issue #11831, "Decide whether to keep pointer sized integers as the default": "make a firm decision one way or the other on whether we want to commit to int and uint being pointer sized".
>
> Q. "Default" means for the compiler, not for programmers, right?
>
> Q. When the issue was closed, was that a commitment to keeping `int` and `uint` as pointer-sized?
>
> If so, it's not my intention to reopen a design decision, just to propose in that case making `int`/`uint` at least 32 bits.

[Daniel Micay notes](https://github.com/rust-lang/rust/issues/9940#issuecomment-32104831):

> If you're using `int` as a "default", then you're not using it correctly. It
> will be 16-bit on an architecture with a 16-bit address space, 32-bit or
> 64-bit. If you're doing your testing on a 64-bit architecture, you're going to
> miss plenty of bugs.

[Carter Tazio notes](https://github.com/rust-lang/rust/issues/9940#issuecomment-32088729)
that system-dependent integers in GHC Haskell cause recurring problems, and
there's some buy-in for fixing it.

[Niko Matsakis requested](https://github.com/rust-lang/rust/issues/9940#issuecomment-32119318)
a survey of uses of `int` and `uint` showing how many of them are
appropriate / inappropriate / borderline.

More recently, [type inference no longer falls back to `int`/`uint`](https://github.com/rust-lang/rust/issues/6023) and there's an RFC for
[Scoped attributes for checked arithmetic](https://github.com/rust-lang/rfcs/pull/146).


# Not in scope of this RFC

Changes in overflow handling.


# Unresolved questions

Who'll implement the changes?