Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: remove Peer Type Resolution from the language #22182

Open
mlugg opened this issue Dec 7, 2024 · 13 comments
Open

Proposal: remove Peer Type Resolution from the language #22182

mlugg opened this issue Dec 7, 2024 · 13 comments
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mlugg
Copy link
Member

mlugg commented Dec 7, 2024

This may be the spiciest proposal I ever write.

Background

Peer Type Resolution (PTR) is a mechanism to combine a set of arbitrarily many types into a final type which all of the inputs can coerce to. For instance, u8 and u16 peer resolve to u16, while *volatile [4:0]u8 and *const [7:0]u8 peer resolve to [:0]const volatile u8.

The main purpose of PTR is to combine the results of different control flow branches. Opposing branches of a control flow construct are known as "peers", and PTR is applied to the results of those branches to figure out the result of the entire construct. For example:

var t = true; // runtime-known
test {
    const a: *const volatile [4:0]u8 = "abcd";
    const b: *const [7:0]u8 = "abcdefg";
    const x = if (t) a else b;
    @compileLog(@TypeOf(x)); // [:0]const volatile u8
}

The output here shows that the if statement applied Peer Type Resolution to the two "peer expressions" a and b.

This issue will propose removing Peer Type Resolution from Zig. However, we must first note that Peer Type Resolution is also used in three other places in Zig.

  • Binary operators like + and &, including some builtins like @addWithOverflow, apply PTR to their operands. This proposal does not propose changing these operators. It is likely that their typing rules will change in the future anyway (see allow integer types to be any range #3806, introduce a compile error for when a result location provides integer widening ambiguity #16310), but even if they don't, these operators require only a very small subset of the behavior of modern PTR.
  • Passing multiple arguments to the @TypeOf builtin. In this case, the builtin will apply PTR to the types of the operands. This proposal does suggest removing this functionality.
  • When switching on a tagged union, a prong containing multiple items with distinct payload types apply PTR to determine the type of the capture; e.g. switch (u) { .my_u32, .my_u16 => |u32_val| ... }. This proposal does suggest removing this functionality.

When we get into the meat of the proposal, I'll discuss these cases in a little more detail.

Problems with PTR

Peer Type Resolution, while commonly accepted as a part of Zig, actually has some problems.

Firstly, it leads to a common way for semantics to differ between runtime and comptime execution. For instance, consider this code:

fn f(b: bool) u32 {
    const opt = if (b) null else @as(u32, 123);
    if (opt) |x|return x;
    return 0;
}

This function is fairly straightforward, if quite esoteric. But the key point here is: what is the type of opt? When f is called at runtime, the type is determined using PTR; the peers have types @Type(.null) and u32, which peer resolve to ?u32, so opt has type ?u32. This makes the following if statement work as expected. However, if f is called at comptime, PTR is not used, because the not-taken branch of the first conditional is not evaluated. This is a very useful feature of Zig, and not one that should change; but it means that the type opt is either @Type(.null) or u32, depending on the comptime-known value of b. In the u32 case (i.e. comptime f(false)), this causes the second if statement to emit a compile error, because u32 is not an optional type, so this construct is invalid! So, this code emits a compile error when called at comptime.

The second issue is that it can unintuitive impacts on comptime-only types. The statement const x = if (b) 1 else 2 is invalid at runtime in Zig, because the resoved type of x is comptime_int, and its value depends on runtime control flow (the if expression), so we have a value of a comptime-only type depending on runtime control flow, which is disallowed. However, this error goes away if one of the peers has a concrete integer type -- for instance, const x = if (b) 1 else @as(u32, 2) works fine, because the first peer is coerced to u32 which can exist at runtime. This kind of "spooky action at a distance" can be confusing for new Zig users.

Lastly, it can hinder readability. Consider these definitions:

const x = if (b) "hi" else "world";
const y = if (b) "hello" else "world";

What are the types of x and y? If you said [:0]const u8, you're actually wrong; that's the type of x, but the peers of y are strings of the same length, so the types peer resolve to *const [5:0]u8. In this case, that's probably not a huge deal, but you can imagine it being more confusing when, say, integer widening is involved, or more significant pointer qualifiers like volatile. In these cases, it would be more clear to annotate the type of the variables. This can also help to clarify what properties of the type your code depends on; for instance, a user might annotate the type of x as []const u8, because the null terminator doesn't matter for their use case.

Proposal

Remove Peer Type Resolution from the language. The features of Zig which utilize it are changed as follows:

This change resolves all three of the issues described above:

  • Since all peers must have the same type, it's fine that comptime only evaluates one; the expression will have the same type. This eliminates the need for Proposal: Improve type inference of if/switch with comptime-known operand #13025, which would be a complex language change to fix this issue. This would also solve Catch null in if-condition cails at comptime. #5462, which is the same issue.
  • It would no longer be possible for e.g. comptime_int to implicitly become a runtime type due to the type of a peer; if you intend for the result to be e.g. a u32, you would have to coerce all peers. More realistically, you would annotate the type outside of the expression; more on this in a second.
  • The type of an expression would have to be the same across all peers, making it hopefully obvious. In cases where the types are not identical, you would use an explicit type annotation; again, more on this below.

This proposal also simplifies the language in general, which is a nice plus.

The effect of this proposal on user code would be to encourage more type annotations in places where types are non-obvious. This style of including type annotations where possible is something Zig has been moving towards in recent years:

  • We have an "unofficial" preference for const x: T = .{ ... } over const x = T{ ... }.
  • Decl literals encourage writing const x: T = .foo rather than const x = T.foo.
  • The "new" (not that new anymore) casting builtins encourage explicit type annotations by sometimes requiring them around type casts.

The advantages of explicit type annotations are as follows:

  • It increases readability for humans, since it becomes easier to know what types different expressions have; in particular, giving local variables type annotations can make the variables' uses easier to understand.
  • It is useful to tooling acting on Zig source code; for instance, language servers or documentation tooling which is performing a "best effort" interpretation of code without full semantic analysis capabilities can know more types with certainty, just like how humans can.
  • For container-level declarations, it increases the ability of the compiler to be parallelized, since the type can be determined while queuing value resolution for later.
  • For container-level declarations, it will allow self-reference and mutual references; see ability for global variables to be initialized with address of each other #131.

With all of these in mind, it's pretty clear that type annotations are a Good Thing, and I tend to support features which encourage more of them (within reason). I think this proposal probably falls within that category.

Impact on Real Code

This proposal will almost certainly cause a lot of breakage in the wild, including in the standard library. As I see it, the main question will be whether the diffs required to fix these breakages make code more or less readable. I strongly suspect the answer is that code will become more readable. However, I think we will have to implement this in the compiler (which would be relatively straightforward) and take a look at some of what breaks in a large codebase, probably the standard library and the compiler itself.

EDITS BELOW

Clarification: Result Types

This proposal never affects semantics when an expression has a result type. For instance, this code still works:

const x: u32 = if (b) 123 else my_u16;

Here, even though the peers have types comptime_int and u16, the result type of u32 is propagated to these expressions and is applied before the values "exit" the conditional branch. This code working is actually a key motivation for this proposal: it encourages adding type annotations like this.

Discussion: catch and orelse

Under this proposal as written, the following code would fail to compile:

const E = enum { a, b, c };
fn getE() ?E { ... }
test {
    const result = getE() orelse .c;
    _ = result;
}

That's because the orelse statement currently applies Peer Type Resolution to the types E and @Type(.enum_literal). Without PTR, these types would not match. The same applies to catch.

However, if this proposal is accepted, this code actually can work; not through PTR, but by providing a result type to the RHS. If we call ?T the type of the LHS after being evaluated, then the RHS can be evaluated with result type T; this is acceptable because under this proposal, it would need to have type T anyway for the peers to successfully combine. Again, the same thing applies to catch.

To be honest, I could see an argument that this isn't desirable, and that the above snippet should indeed require a type annotation on result. But it's a possibility nonetheless.

Discussion: Ranged Integers

One potential downside to this proposal is that it could make #3806 significantly more difficult to work with. For instance, consider this code:

const x: u8 = something;
const y = if (b) x else x + 1;

Under #3806 with PTR, y has type @Int(0, 257), since PTR is applied to the peer types @Int(0. 256) and @Int(1, 257). However, this proposal would cause this code to emit a compile error, because the peer types differ. That could be a big problem, since it could cancel out some of the benefits of implicit range expansion by requiring explicit type annotations.

Assuming this is indeed awkward in practice, I'm not sure if there's a good way to reconcile these two proposals. This gives way to a counter-proposal...

Counter-proposal: Restrict PTR to Numeric Types

Instead of eliminating PTR altogether, we could potentially just nerf it a lot. Here's what I would suggest:

This refocuses PTR to be about combining numeric types. This restriction still solves the problems discussed in the original issue, whilst avoiding conflicting with #3806:

  • It wouldn't really matter that comptime evaluation only evaluates one peer: the only thing that could differ between runtime and comptime is an exact integer or float type. The former could not have any effect on semantics, aside from explicitly depending on @TypeOf(expr). The latter could affect floating-point precision/rounding, but it seems reasonable that if you need precise details of one floating-point type, you should be annotating it anywhere where it's unclear.
  • Under ranged integers, comptime_int ceases to exist anyway, so this case where adding a runtime peer makes runtime evaluation work doesn't exist. It might still exist for floats if we allow comptime_float to peer resolve, which we probably should. This is a minor downside to this counter-proposal.
  • One property of ranged integers is that exact types don't actually matter that much, so it wouldn't necessarily be an issue that this limited PTR can make those types non-obvious. Likewise, for floating-point types, it rarely matters too much which exact type is being used; where it does, it again seems reasonable to expect annotations anyway.
@mlugg mlugg added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. labels Dec 7, 2024
@mlugg mlugg added this to the 0.16.0 milestone Dec 7, 2024
@andrewrk andrewrk modified the milestones: 0.16.0, 0.15.0 Dec 7, 2024
@andrewrk
Copy link
Member

andrewrk commented Dec 7, 2024

Proposals - especially breaking ones - need to be evaluated asap so the language can be finished. However, 0.14.0 is too early since the release is imminent.

@mlugg
Copy link
Member Author

mlugg commented Dec 7, 2024

Okay, noted -- apologies!

@andrewrk
Copy link
Member

andrewrk commented Dec 7, 2024

Apology not accepted since it is unnecessary! I'm just trying to get better at communicating project management stuff.

@scottredig
Copy link
Contributor

I think in support of this change is that this behavior already exists in a very similar situation: returning from a function. The return type of a function is always required, so distinct return statements don't need to perform peer type resolution. (except that's not entirely true, more below). The general lack of problems with this indicates that this proposal is likely reasonable. There will be some comptime heavy use cases where determining the type does require more work, but if return types are a good indicator, those are a significant minority that probably benefit from being more explicit anyways.

As an exercise, I went through ~2k lines of my own code trying to find places where this would break things, and found 3 items of note, at an average of one breaking change per 1k lines:

  1. A switch statement that would still work, but the code improved when I moved the type up to the variable declaration.
  2. An switch evaluating to string literals, an easy and reasonable fix.
  3. Multiple breaks out of a block to be handled by a single error handler:

https://github.com/scottredig/zig-live-webserver/blob/f4082ee0ae6134a97525085cc178c16857b10de7/src/main.zig#L132-L156

This does pose a problem, because this is a case where the error types being implied is really handy. I see a few options here:

  1. Just use @as(anyerror here. In this specific case it'd be fine, I guess, but it'd be pretty easy for this code to change in a way where this wasn't fine anymore. It's not exactly a common idiom as far as I know, but I already dislike how errdefer treats return and break differently, and this would make using error handling mechanisms within a function more clunky.
  2. Keep peer type resolution, but only for the error set of error unions. This proposal doesn't suggest removing it for implied return types anyways.
  3. Do 2, and additionally support implied error syntax for the type field in var and const. I think this would be my preference. It wouldn't be commonly used, but it'd avoid hitting clunky language barriers when trying to do proper error handling.

@mlugg
Copy link
Member Author

mlugg commented Dec 8, 2024

@scottredig Thanks for the examples. For your error union example, I would generally suggest splitting that functionality off into a separate function with an IES, because it makes your intent clearer:

callHandler(req) catch |err|
    log.warn("Error {s} responding to request from {any} for {s}", .{ @errorName(err), req.conn.address, path });
};

// ...

fn callHandler(req: *Request) !void {
    const embed_files = [_][]const u8{
        "style.css",
        "script.js",
        "icon.svg",
    };
    
    const path = req.http.head.target;
    
    if (std.mem.startsWith(u8, path, "/" ++ options.frame_path ++ "/")) {
        return req.handleEmbed("index.html", "text/html");
    }
    
    inline for (embed_files) |embed_file| {
        if (std.mem.eql(u8, path, "/__live_webserver/" ++ embed_file)) {
            const mime_type = mime.extension_map.get(fs.path.extension(path)) orelse
                .@"application/octet-stream"; // NOTE: I have something to say about this line, will explain below!
            return req.handleEmbed(embed_file, @tagName(mime_type));
        }
    }
    
    if (std.mem.eql(u8, path, "/__live_webserver/reload.js")) {
        return req.handleReloadJs();
    }
    
    if (std.mem.eql(u8, path, "/__live_webserver/ws")) {
        return req.handleWebsocket();
    }
    
    return req.handleFile();
}

I've flagged up one interesting line there. Under a strict interpretation of this proposal, this line would no longer work, because the extension_map value type is not the same as @Type(.enum_literal) (the RHS of orelse). However, this proposal would actually allow orelse to send a result type to its RHS, since the type must match the child type of the LHS, so this could actually work fine. This is a benefit of this proposal I forgot to discuss; it actually allows providing more result types than we currently can. This would apply to catch and orelse; it could also apply to conditionals, but I'd consider that a little weird since conditions are more "symmetric" than catch and orelse.

@scottredig
Copy link
Contributor

I would generally suggest splitting that functionality off into a separate function with an IES, because it makes your intent clearer

Breaking a block of code out to a function is one of the fundamental actions performed in coding. I think the opposite is equally important and should be ergonomically supported by the language.

Aside from this proposal, I think keeping that block of code in the handle function is better than splitting it out. Ergo making the code worse solely to get access to IES is sub-optimal. Hence my suggestion to allow IES (a feature that's staying in the language anyways, afaik) here.

(I'll avoid derailing the conversation on justifying why I don't like this splitting out of the function. Happy to have the conversation elsewhere but it's an overly minor point of contention in the context of this proposal.)

I've flagged up one interesting line there.

Yeah I missed that line while doing a cursory read through. Nice catch. I honestly wouldn't mind putting a type here, and a slight change to mime to take advantage of decl literals would remove redundancy if desired.

@mnemnion
Copy link

My first reaction is that ZLS often yields the type hint 'either type' here, which interferes with language services on the variable, so I've been inclined to annotate those cases anyway.

This example is from the Peer Type Resolution section of the docs, am I right in thinking this would still work because the return type coerces the strings?

fn boolToStr(b: bool) []const u8 {
    return if (b) "true" else "false";
}

It would feel fairly heavyweight to use two @as calls in this kind of case.

@pfgithub
Copy link
Contributor

@mnemnion You are correct, that would still work. Instead of the if returning @TypeOf("true", "false") (peer type resolution), each of the branches of the if would have a result type of []const u8 and *const [4:0]u8 would implicitly cast

@mlugg
Copy link
Member Author

mlugg commented Dec 15, 2024

Anyone following this issue: I just added a large section to the bottom of the original post, with some further thoughts I've had and a potential counter-proposal. Please give that a read if you're interested!

@Trevor-Strong
Copy link

How would this effect top level decls who's type differs based on the target, like what std.c does to indicate a certain value or function isn't available for the current target like here where the type of NAME_MAX is either comptime_int or void?

@mnemnion
Copy link

However, if this proposal is accepted, this code actually can work; not through PTR, but by providing a result type to the RHS.

I'd call this an argument for the proposal, since I had figured that orelse turned a ?T result type into a T, and catch a !T into a T, it was not obvious this was happens-to-be-the-case through peer type resolution.

Under #3806 with PTR, y has type @Int(0, 257), since PTR is applied to the peer types @Int(0, 256) and @Int(1, 257).

This is a #3806 kibbitz, but then again, it's a #3806 heavy addendum: this illustrates why integer ranges should be closed intervals. This statement is much less confusing with (in order) @Int(0,256), @Int(0, 255), @Int(1, 256). The latter doesn't mention three numbers which aren't allowed to appear. I think we got good feedback in this comment from someone who has explored this question fairly thoroughly.

Under ranged integers, comptime_int ceases to exist anyway

Is this entirely true? I figured it would exist as a type annotation for comptime parameters, at least. I suppose those could just become comptime param: usize or whatever else works for a range. Less useful when 42 is becoming @Int(42,42) anyway (please don't make this @Int(42,43), it's begging for trouble).

As a meta, I feel like two things are true: my comments here are reasonable responses to the added section of your proposal, and they also kinda don't belong here. Would it be reasonable to narrow this thread to removing Peer Type Resolution, and have another issue for changes to integer casting and coercion implied by #3806, and downstream of this one? There's clearly overlap, but I think it's fair to parse your counter-proposal as "remove Peer Type Resolution, but keep a comparable mechanism for numerics in order to support integer ranges", and I would think the result of the latter discussion could get folded into the PTR mechanism if the decision on #22182 is to keep it. I'm referring to the second two of your four added sections here, to me, they imply an additional issue which could host a focused discussion of the questions they raise.

I think it's a good proposal, for the record, and that the main effect on user code would be to require some annotations which should be there anyway.

@vadim-za
Copy link

vadim-za commented Dec 19, 2024

By the wording of the proposal, IIUC, the following pattern (and similar ones) will no longer work:

const Tracker = if(std.debug.runtime_safety) ResourceTracker else void;
tracker: Tracker = if(std.debug.runtime_safety) .init(0); // ".init" is "ResourceTracker.init", the implicit "else" branch returns a void value

Is this meant/intended? Are comptime conditions exempt from PTR?
(Seems to be a duplicate or a particular case of #22182 (comment))

@rohlem
Copy link
Contributor

rohlem commented Dec 19, 2024

Are comptime conditions exempt from PTR?

Comptime-known branching is already exempt from PTR in status-quo - branches not taken are not semantically analyzed.
They don't have to compile if they can't be taken, so the compiler doesn't figure out their type information either.

void also doesn't combine with arbitrary other types, so if (x) T else void is already invalid for runtime x.
It is however allowed for noreturn, which is explicitly mentioned in the proposal to stay the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

8 participants