allow integer types to be any range #3806
Comments
I'm worried this might result in a code explosion as different types are passed to a generic. Idea: input to a generic could be "marked" for which properties are used.
@andrewrk i cannot understand how you got the required size. Since a struct is a product type, MyFancyType has a total of 2 * 6 * 9 * 4 = 432 possible states, which needs 9 bits (2 bytes).
@Rocknest my math is wrong, thanks for the correction 👍
wouldn't it be

```zig
pub const Int = struct {
    is_signed: bool,
    bits: comptime_int,
    from: comptime_int,
    to: comptime_int,
};
```

because of the
Yes except only
should this be marked a breaking proposal as well?
You still need the
You're thinking of a different, already planned issue, which is zig keeping track of additional information beyond the type, for expressions:

```zig
test "runtime value hints" {
    var a: u64 = 99999;
    // `a` is runtime known because it is `var`.
    const b: u64 = a % 256;
    const c: u8 = b; // allowed without @intCast because `b` has a runtime value hint
}
```
i don't think so. take for example the WebSocket protocol: the opcodes are always 4 bits, but 0xB to 0xF are reserved for future use, so effectively they are not allowed.
That sounds like a use case for a non-exhaustive enum:

```zig
enum(u4) {
    Continuation = 0,
    Text = 1,
    Binary = 2,
    Close = 8,
    Ping = 9,
    Pong = 10,
    _,
}
```
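To make that concrete, here is how the reserved opcodes interact with a non-exhaustive enum in current Zig; the `0xB` probe and the test itself are mine, not part of the comment, and it assumes a 0.11+ compiler for `@enumFromInt`:

```zig
const Opcode = enum(u4) {
    Continuation = 0,
    Text = 1,
    Binary = 2,
    Close = 8,
    Ping = 9,
    Pong = 10,
    _,
};

test "reserved opcodes are representable but unnamed" {
    const raw: u4 = 0xB; // reserved for future use by the WebSocket spec
    const op: Opcode = @enumFromInt(raw);
    switch (op) {
        .Continuation, .Text, .Binary, .Close, .Ping, .Pong => unreachable,
        _ => {}, // every reserved value lands in the '_' prong
    }
}
```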
This is very nice and fancy, but I would like to note that it might have a negative effect on performance. This is definitely more efficient memory-wise, so for storing it (on disk) this is perfect. However, for passing it around in code in this format, it would mean that every time there is a calculation, bit fiddling comes into play to extract the value and put it back. Just something to think about.
This is not correct, except with

Also in many cases if you were to use
Yes, sorry I was thinking about align(0) cases. For the other cases this proposal only adds range checks, right?
I'm a bit sceptical about this as I've had experience of the opposite. It probably depends a lot on the circumstances. Which is why I don't think a compiler should be sneaky and perform bit fiddling behind your back. However if it just applies to align(0), it is something which you ask for so it's fine. Somehow I feel though that this is more something to do with optionals and other constructs which have more information than arbitrary numbers and enums, right? It's more like in this example:
Where each case has a special number or range of numbers, such as:
And I don't think it goes for your other example where you have four, what to me seem independent, enums:
Aren't there 4 * 2 * 6 * 9 = 432 possible combinations of those four enums? Which means you still need 9 bits (or two bytes)? In general I do think it would be great if the optionals can work more generically. Not having to write all the -1 checks would be cool. Especially since sometimes people work with indices instead of pointers and the special value trick is quite common.
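For what it's worth, the 9-bit figure can be checked mechanically; a throwaway test, where 2, 6, 9, and 4 are the variant counts of the four enums being discussed:

```zig
const std = @import("std");

test "432 combined states fit in 9 bits, not 8" {
    const states: u16 = 2 * 6 * 9 * 4; // product of the four enums' variant counts
    try std.testing.expect(std.math.log2_int_ceil(u16, states) == 9);
}
```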
Note that integer ranges have a long history in some other languages like Pascal and Ada. There they are heavily used and can be used as ranges for arrays. This leads to some nice code where you can use ranged for loops without any possibility of going out of bounds. It has been decades since I used Pascal or Ada in anger, but even back then I seem to remember that most run-time checks could be dropped, as the compiler could prove that the use was safe. A very naive compiler would do the checks all the time, but existing compilers show that the possible overhead is quite low. As long as the programmer is aware that this could have a cost (such as choosing different arithmetic with overflow wrapping), I think this is fine.

The driving case here, a more compact representation of enums and optionals, is a very strong one IMHO. Having just optionals use the same space as non-optional types would be a huge argument to get programmers to use them. The cache argument is a strong one too. A cache miss can cost 25-50 clock cycles (if you hit L2, worse if you fall all the way to RAM) and you can run a lot of bit twiddling code in that time.
I like the core of this proposal, but there are a few things I'm worried about.

First, I feel like the optimizations are too implicit. You have to hope that you left the correct number of values open. If you do it wrong, there is no indication that you have messed up, you just have bigger types than you expected. As code changes over time, it's easy for additional values to be added to enums that are sensitive to this, which may break these optimizations without warning. I'm worried that, because it's so implicit, general zig advice may become "put explicit bounds on every integer type you use, so that the compiler can optimize it". This could create a lot of additional cognitive load when reading and writing zig code.

Does

It also seems to me like working with these types would in many cases require

Finally, while applying range information to type resolution would solve #747, it also introduces a problem when assigning directly.

There is a precedent for an opaque optional optimization in Zig:

Here are some off-the-cuff suggestions. There's definitely room for improvement for these, but I think they do a good job of showing the direction I'm looking.

Range-checked Explicit Integers

These are the same as the original proposal, but the integer bit-width is specified by the programmer. For example,

Integer Unions

An integer union (or alternatively, "partition" or "ranged enum") would be a union of non-overlapping ranged ints. It is guaranteed to be the size of its backing integer type.
This is very similar to the union example in the original proposal, but with this, if the range is specified incorrectly or a new value is added, there would be a compile error instead of a drastic change in representation. If the integer type is not specified, it would be inferred from the ranges as the smallest type that can represent the range. But the ability to specify the bit width and receive an error if it can't be represented is valuable. Ranged unions could also support the '_' option from extensible unions.

Explicit Sentinel Optionals

If an optional type could be specially constructed by specifying a sentinel, this would solve sentinel checks in a way that guarantees the optimization. Something like

None of these counter-proposals are perfect, but I like that they take more direct control over the optimization. E.g. instead of thinking "I'd like to optimize this optional to use a sentinel value, I should put ranges on this int type everywhere it's used", I can instead think "I'd like to optimize this optional to use a sentinel value, I should tell zig to optimize this optional to use a sentinel value."
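To make the "tell zig to use a sentinel" idea concrete, here is a rough userland approximation in today's Zig; `SentinelOptional`, its method names, and the `maxInt(usize)` choice are all mine for illustration, not anything proposed above:

```zig
const std = @import("std");

/// Userland sketch of an "explicit sentinel optional": one value of T is
/// reserved to mean "null", so the wrapper is exactly @sizeOf(T) instead
/// of @sizeOf(?T).
fn SentinelOptional(comptime T: type, comptime sentinel: T) type {
    return struct {
        raw: T = sentinel,

        const Self = @This();

        pub fn init(value: T) Self {
            std.debug.assert(value != sentinel); // the sentinel itself is not a legal payload
            return .{ .raw = value };
        }

        pub fn unwrap(self: Self) ?T {
            return if (self.raw == sentinel) null else self.raw;
        }
    };
}

test "sentinel optional costs no extra space" {
    const OptIndex = SentinelOptional(usize, std.math.maxInt(usize));
    try std.testing.expect(@sizeOf(OptIndex) == @sizeOf(usize));
    try std.testing.expect((OptIndex{}).unwrap() == null);
    try std.testing.expect(OptIndex.init(42).unwrap().? == 42);
}
```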
Sorry, small nitpick. It's not actually bits, it's unused values. For example in the pointer case

I agree with you that it would be cleaner if it's more explicit.
Sorry, yeah, my wording was unclear. I'm trying to convey that the simpler guarantee of a bit for optionals of non-power-of-two integers is easier to guarantee and think about than an inferred sentinel, and that something like
If we make it so an optional ranged type interprets any out-of-range value as null, then:

```zig
// Zero-cost wrapper for an ABI that uses negative values as error codes
const Result = fn (Int: Type) Type {
    return union {
        // Nullable limits to avoid generating unnecessary bounds checks
        success: @Ranged(Int, 0, null),
        failure: @Ranged(Int, null, -1),
    };
};
```

Define canonical

Possible optimisation: allow nesting of

```zig
// Can be optimised to u32 with tag
// -- don't have to set global rules
const OptionalU32 = ?@Ranged(u32, null, maxInt(u32));

// Don't need another range -- canonical null of
// OptionalU32 is inner null, anything else is outer null
const OptionalOptionalU32 = ?OptionalU32;

// ;)
const Bool = ?@Ranged(u1, 1, null);
const True: Bool = 1;
const False: Bool = null;

const And = fn (T: Type, a: ?T, b: ?T) ?T {
    return if (a) { b } else { null };
};
const Or = fn (T: Type, a: ?T, b: ?T) ?T {
    return a orelse b;
};
const Not = fn (a: Bool) Bool {
    return if (a) { null } else { 1 };
};
```

Pro
You could use
We already have a ranged integer type. It is called
Thanks for looking into it. I agree with everything you said, and to add to that, one thing I'm concerned about is how easy or difficult it will be to choose integer types for function parameters. I think we just have to fuck around and find out.
I have implemented such a family of types in C++: https://github.com/davidstone/bounded-integer/blob/main/bounded-readme.md. I have not experienced any code size or run-time performance issues (and sometimes I get improvements), and I have a moderate-sized code base that uses it.

From the documentation:

```cpp
bounded::integer<1, 100> const x = f();
bounded::integer<-3, 7> const y = g();
auto z = x + y;
static_assert(std::same_as<decltype(z), bounded::integer<-2, 107>>, "Type of z incorrect.");
```

After using this, I feel crippled when I'm forced to use the types of integers that come with most programming languages. The main drawback is increased compile times, but a big portion of that is due to the cost of C++ template instantiation. I'm not sure how that differs in Zig and whether you'd be implementing it in the compiler.
@davidstone Thank you for this, it's a very useful data point. I have no reason to believe an implementation of this would lead to any significant changes in compile times - the only thing I can think of is that when dealing with particularly large types, you may be performing a fair bit more bigint arithmetic, but that would only matter for types with bounds larger than probably
shoutouts to ziglang/zig#3806
When the slice-by-length start position is runtime-known, it is likely protected by a runtime-known condition and therefore a compile error is less appropriate than a runtime panic check. This is demonstrated in the json code that was updated and then reverted in this commit. When #3806 is implemented, this decision can be reassessed. Revert "std: work around compiler unable to evaluate condition at compile time" Revert "frontend: comptime array slice-by-length OOB detection" This reverts commit 7741aca. This reverts commit 2583b38.
This reverts commit cb5a6be. I deeply apologize for the churn. This change is problematic given that we do not have ranged integers (yet? see #3806). In the meantime, this type needs to be `usize`, matching the length and index types for all std lib data structures. Users who want to save memory should not use heap-allocated BoundedArray values, since it is inherently memory-inefficient. Use a different memory layout instead. If #3806 is accepted and implemented, the length value can become an integer with the appropriate range, without the footgun. If that proposal is not accepted, len type will remain a usize.
It seems worth observing that this offers a possible solution to the felt need in #17039/#14704. Andrew made a comment to that effect in the first of those issues; I wanted to add some notes about it here.

If and when integers are defined as a range, rather than a width, it could be worthwhile to make this legal:

```zig
for (u8) |byte| {
    // ...
}
```

If I saw this I would have no question about what values `byte` would take. If that's too weird, it could take a builtin:

```zig
for (@range(u8)) |byte| {
    // ...
}
```

But I think that in a world where integers are ranges, it would be eloquent for them to coerce to an iteration of that range in a `for` loop.

This brings me to a point which is tangential to the main thrust of this comment, yet necessary for what follows: ranged integers should be defined inclusively, as closed intervals. I feel that this accords with how we reckon with integers already: I think of a `u8` as holding 0 through 255, inclusive.

I do understand why using the exclusive range felt more natural, of course, but we really want it to be inclusive: most of this comment serves as the argument for that, although this point is not why I came to this issue today. That would make `u8` the same as `@Int(0, 255)`.

This is how Ada does it, and with reason. Ada also has a 'modular' type, which also specifies wraparound arithmetic for the type. Since this uses modulus, it mentions the first unsigned integer which isn't in the range. Zig uses explicit modular operators, so this is probably not a useful concept for us. Another data point is the C standard, Section 7.20.1. It doesn't use interval notation at all, but it defines macros for limits in terms of minimum and maximum (inclusive) values.

Inclusive is the better convention in its own right, and also best compatible with the

I broadly agree with not having two range syntaxes for a

There's another place this coercion could apply: switch statements.

```zig
fn byteSwitch(b: u8) void {
    const a_range: type = @Int(0, 0x7f);
    const b_range: type = @Int(0xc2, 0xdf);
    // ...
    switch (b) {
        a_range => ...,
        b_range => ...,
    }
}
```

These custom types coerce to ranges in the `switch`. For the same reason that it would be confusing to have an exclusive range in a switch statement, it would be confusing to have an exclusive range in a ranged integer definition.

Part of why I wrote this out was to be able to link to it later, but also on the premise that it's good to have discussions of the positive (or negative) consequences of a feature in the issue tracking it. I don't see a lot of point in writing the possible coercion features up as a separate issue until or unless this one is tagged 'accepted', although I'm willing to do so.
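For contrast with the `for (u8)` idea above, this is roughly what exhaustively visiting every `u8` value takes in current Zig (the kind of workaround #17039/#14704 describe); the `u9` counter is one common way to avoid the overflow at 255, not the only one:

```zig
const std = @import("std");

test "visiting every u8 value today" {
    var count: usize = 0;
    var i: u9 = 0; // a u8 counter would overflow before the loop could terminate
    while (i <= std.math.maxInt(u8)) : (i += 1) {
        const byte: u8 = @intCast(i);
        _ = byte; // stand-in for real per-value work
        count += 1;
    }
    try std.testing.expect(count == 256);
}
```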
I also wanted to respond separately to @SpexGuy's comment, since it has coherent implementation concerns which haven't yet been completely addressed. That was four years ago, and Spex might have a completely different take now; I want to acknowledge that before continuing. Zig has also changed quite a bit in that period of time.

For me, the benefit of this proposal is lifting ranges up into the type system, and the niche optimizations are a nice bonus. I think we can come up with rules which are general, easy to understand, and cover all cases of optional integer types, whether by themselves or in a packed struct. At the limit, someone concerned if they left a niche in their enum type can use a test.

Another solution is this:

```zig
const ThreeEnum = enum(@Int(1, 3)) {
    fee,
    fie,
    foe,
};
```

This seems like something which would have to be legal if integer ranges are just integers. This guarantees (according to what follows) an optional sentinel at zero.

It seems to me that the solution to the limited-range problem of assigning comptime ints is something we'd have to do already: you need to provide a type for a

Niche Optimization of Ranged Integer Types

I propose a rule like the following: a ranged integer type has an automatic bitwidth, which is the minimum necessary to hold every value in the range. If there are any negative values in the range, this is a signed width, otherwise, unsigned. If there are representable values in that width which are not part of the range, then the first value higher than the maximum, if it exists, is the optional sentinel, otherwise, the first value lower than the minimum. If there are no representable values left in the bitwidth (these are the integers we have already) then the original bitwidth is sign-extended by one bit, and the first value of the new width which is higher than the maximum of the old width is the optional sentinel.

I think this covers all the bases. Part of what Spex was addressing is combinatoric difficulties with the

This rules out some of the more exotic 'optimizations', which would be space-optimizing only, and at the expense of everything else. For example, [-3, 240] could be represented in eight bits, but this rule would require it to have a bitwidth of `i9`.

Even in the event of following in the footsteps of Pascal and Ada by having arrays with widths and indices defined by a ranged integer, the offsetting of an index into that array should be at the point of use, not in the storage value of a ranged integer used for indexing that array. This is the tip of a huge iceberg, which I will now veer away from.

Alternative version of the rule, which is actually better: for an unsigned range, if there are available values higher than the maximum, the highest value which fits in the bitwidth is chosen, otherwise a bit is added and the sentinel is the maximum value of the new width. For a signed range, if there are available values lower than the minimum, the lowest possible value of the width is chosen, otherwise the highest possible value, otherwise it is sign-extended by one and the lowest value of the new bitwidth is the optional sentinel.

My little bait-and-switch here is because that formula is significantly harder to understand, and looks worse at a first glance. But it's better, because it always chooses all 1 bits when it can, and when it can't, it chooses 0 followed by all 1s. This is the opposite of

There might be a case for adding "if zero is not in the range, then zero is the sentinel" but I wouldn't make it. Someone else might, but I'm sensitive to the fact that the second version of the rule is more complex, and therefore harder to understand. I'll acknowledge that zero is a good value to be working with when it comes time to emit machine code, however. I could also be wrong about the more complex rule being better for the compiler, but it would be easier to debug, and that's not nothing.

This neglects the actual size which the integer will take, and that's intentional. A

As a note, Zig's documentation doesn't specify how non-natural-width integers are represented on the machine, and perhaps that's intentional, but maybe it should.

Back to the comment I'm replying to: I don't think we want either of "bring your own bitwidth" or "bring your own sentinel". That's creating opportunities for incompatibility where there otherwise would not be.
It seems to me that even in the packed case, a width wider than the minimum dictated by the rules here can be emulated by adding a pad field after the ranged integer. It also brings the weirdness that a

The major reason to even define what the sentinel will be is debugging; it isn't a value that user code has native access to, although presumably this would reveal it:

```zig
const opt_ranged: ?@Int(1, 14) = null;
const as_cast: u8 = @bitCast(opt_ranged);
```

And I think the fact that the above would be possible, and debugging, are good enough reasons to have the sentinel value be a specified part of the language.

This may be irrelevant, given that

I'd argue that even for packed structs, trying to minimize space in a range like [256, 511] should be resisted. If it's imperative that values of that nature fit in eight bits and not nine, it's reasonable to do that with arithmetic. Things would just get weird otherwise.

The partitioned integer is a cool idea though. I haven't a clue how feasible it is, but it's probably its own issue rather than this one.
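To make the first version of the rule above easier to check, here is a comptime sketch restricted to non-negative ranges; `sentinelFor` is a hypothetical helper written only for illustration, not proposed syntax:

```zig
const std = @import("std");

// Minimum unsigned width for the range, then: first free value above the
// range, else first free value below it, else widen by one bit.
fn sentinelFor(comptime min: comptime_int, comptime max: comptime_int) comptime_int {
    comptime var bits = 1;
    while ((1 << bits) - 1 < max) : (bits += 1) {}
    const top = (1 << bits) - 1; // largest value representable in the minimum width
    if (max < top) return max + 1; // room above the range
    if (min > 0) return min - 1; // otherwise room below it
    return top + 1; // the range fills the width: widen by one bit
}

test "sentinel choices follow the rule" {
    try std.testing.expect(sentinelFor(1, 13) == 14); // room above the range in a u4
    try std.testing.expect(sentinelFor(1, 3) == 0); // u2 full above, 0 free below
    try std.testing.expect(sentinelFor(0, 255) == 256); // whole u8 used: widen to 9 bits
}
```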
You'll have to forgive that all of my example code here uses C++ syntax; I'm not familiar enough with Zig to be confident giving the Zig equivalent.

My C++ library defines the range bounds inclusively.

I agree that we should make the representation of the integer type match its value. In other words,

My integer type also works with a compressed optional implementation -- it's split into the general optional type and an associated traits class to manage the optimization for integer types specifically. The general idea is similar to what was outlined here: use the unused values as sentinel values. My implementation uses the least representable value as the first tombstone value and counts up from there, skipping over the valid value range. So if

I much prefer the design of my library, where the user says "I want an integer in this range, you figure out how to store it".
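As an illustration of the tombstone numbering described above, expressed in Zig rather than C++; the `[5, 10]` range stored in a `u4` is my own example, not taken from the library's documentation:

```zig
const std = @import("std");

// Tombstone 0 is the least representable value; numbering counts up and
// skips over the valid range [5, 10].
fn tombstoneValue(index: u4) u4 {
    std.debug.assert(index < 10); // 16 storage values - 6 valid values = 10 tombstones
    const min_valid: u4 = 5;
    const max_valid: u4 = 10;
    return if (index < min_valid) index else index + (max_valid - min_valid) + 1;
}

test "tombstones skip the valid range" {
    try std.testing.expect(tombstoneValue(4) == 4); // below the range
    try std.testing.expect(tombstoneValue(5) == 11); // jumps past [5, 10]
    try std.testing.expect(tombstoneValue(9) == 15); // last available storage value
}
```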
It's probably not clear by casual inspection and I don't know how optional support works in Zig, but in my C++ library, the type
Just chiming in that there's also a nice interaction between this and the "named integer types" shown in https://ziglang.org/devlog/2024/#2024-11-04 for defining named types with massive but specific ranges of valid values.
Zig already has ranged integer types; however, the range is required to be a signed or unsigned power of 2. This proposal is for generalizing them further, to allow any arbitrary range.

Let's consider some reasons to do this:

One common practice for C developers is to use `-1` or `MAX_UINT32` (and related) constants as an in-bound indicator of metadata. For example, the stage1 compiler uses a `size_t` field to indicate the ABI size of a type, but the value `SIZE_MAX` is used to indicate that the size is not yet computed.

In Zig we want people to use Optionals for this, but there's a catch: the in-bound special value uses less memory for the type. In Zig on 64-bit targets, `@sizeOf(usize) == 8` and `@sizeOf(?usize) == 16`. That's a huge cost to pay, for something that could take up 0 bits of information if you are willing to give up a single value inside the range of a `usize`.

With ranged integers, this could be made type-safe:

Now, not only do we have the Optionals feature of zig protecting against accidentally using a very large integer when it is supposed to indicate `null`, but we also have the compile error helping out with range checks. One can choose to deal with the larger ranged value by handling the possibility, and returning an error, or with `@intCast`, which inserts a handy safety check.

How about if there are 2 special values rather than 1? Here, size of N would be 4 bytes.

Let's consider another example, with enums. Enums allow defining a set of possible values for a type:

There are 3 possible values of this type, so Zig chooses to use `u2` for the tag type. It will require 1 byte to represent it, wasting 6 bits. If you wrap it in an optional, that will be 16 bits to represent something that, according to information theory, requires only 2 bits. And Zig's hands are tied; because currently each field requires ABI alignment, each byte is necessary.

If #3802 is accepted and implemented, and the `is_null` bit of optionals becomes `align(0)`, then `?E` can remain 1 byte, and `?E` in a struct with `align(0)` will take up 3 bits.

However, consider if the enum was allowed to choose a ranged integer type. It would choose `@Int(0, 3)`. Wrapped in an optional, it actually could choose to use the integer value 3 as the `is_null` bit. Then `?E` in a struct will take up 2 bits.

Again, assuming #3802 is implemented, Zig would even be able to "flatten" several enums into the same integer:

If you add up all the bits of the tag types, it comes out to 10, meaning that the size of MyFancyType would have to be 2 bytes. However, with ranged integers as tag types, zig would be able to flatten out all the enum tag values into one byte. In fact there are only 21 total tags here, leaving room for 235 more total tags before MyFancyType would have to gain another byte of size.

This proposal would solve #747. Peer type resolution of comptime ints would produce a ranged integer:

With optional pointers, Zig has an optimization to use the zero address as the null value. The `allowzero` property can be used to indicate that the address 0 is valid. This is effectively treating the address as a ranged integer type! This optimization for optional pointers could now be described in userland types.

One possible extension to this proposal would be to allow pointer types to override the address integer type. Rather than `allowzero`, which is single purpose, they could do something like this:

This would also introduce type-safety to using more than just 0x0 as a special pointer address, which is perfectly acceptable on most hosted operating systems, and also typically set up in freestanding environments as well. Typically, the entire first page of memory is unmapped, and often the virtual address space is limited to 48 bits, making `@Int(os.page_size, 1 << 48)` a good default address type for pointers on many targets! Combining this with the fact that pointers also have alignment bits to play with, this would give Zig's type system the ability to pack a lot of data into pointers which are annotated with `align(0)`.

What about two's complement wrapping math operations? Two's complement only works on powers-of-two integer types. Wrapping math operations would not be allowed on non-power-of-two integer types. Compile error.
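For reference, the size cost described at the start of the proposal is easy to observe with the current compiler; a minimal check, assuming a 64-bit target:

```zig
const std = @import("std");

test "an optional usize currently doubles the storage size" {
    // On 64-bit targets the payload is 8 bytes and the is_null flag pads
    // out to another 8 bytes because of ABI alignment.
    if (@sizeOf(usize) != 8) return error.SkipZigTest;
    try std.testing.expect(@sizeOf(?usize) == 16);
}
```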