Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: 'typedef' for zig #5132

Closed
ghost opened this issue Apr 22, 2020 · 16 comments
Closed

Proposal: 'typedef' for zig #5132

ghost opened this issue Apr 22, 2020 · 16 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ghost
Copy link

ghost commented Apr 22, 2020

There is a related proposal for distinct types, but it doesn't explore all the opportunities that a more generic form of distinct types could provide. This proposal is an attempt to do just that.

It is also an attempt at finding a new core language abstraction in zig which hopefully is both simple (conceptually) and expressive enough to be worthwhile.

'typedef' in zig could become a common platform for any additional or improved feature that is "comptime only" in nature. This in contrast to implementing each such comptime feature in separation and from scratch.

Being a "comptime only" feature, typedef wouldn't add much (if any) complexity to the codegen stage, and rather require changes to the earlier compilation stages instead.

That there is a similar (though more limited) typedef feature in C will hopefully help make typedef in zig discoverable for newcomers.

Overall, I'd say this proposal is a relatively logical and natural addition to zig, as there are no "new" concepts introduced. It's rather a merger of existing ones:

  • Types are values
  • Types can be namespaces
  • Methods are just functions (unified call syntax)
  • Coercion of types
  • Casting between types
  • Comptime and comptime reflection
  • The existence of builtin types (primitives) that have more language support than user defined types can have, e.g some syntax will only work with builtin types.
  • Syntax relying on built-in types/enums for configuration

Use cases:

Can be implemented on top of typedef:

To avoid making this text even longer than it is, I will mostly refer back from comments in other issues to cover the use cases that way, instead of writing in detail about all the use cases here. Other times, a feature might map so trivially to typedef that it is natural to include it in the summary section below.

The focus of this post is instead on the principles and framework that the implementations for those use cases would be based on.


Usage perspective:

Introducing the typedef syntax:

const MyInt = typedef(u32, .Distinct);        // restricted coercion
const MyInt2 = typedef(u32, .Alias) {         // unrestricted coercion
  fn increment(self: MyInt2) MyInt2 {return self+1;}    // const x: MyInt2 = 10; const y = x.increment(); assert(y==11);
}
const TaggedFloat = typedef(f64, .Tag{.txt="percentage"});   // reflection with std.....hasTag()

Then a more elaborate (but very generic) example that is explained in detail below:

// .Tdconf = "Typedef config" (instance of zig built-in tagged union)
const Td = typedef(T, .Tdconf) {
  const FOO = 100;
  fn method(self: T) bool{
     // ...
  }
};

test "typedef" {
  const t : T = createInstanceOf(T);
  const td : Td = @as(Td, t)
}
// T, t, Td, td, and .Tdconf will frequently be referenced below:

Read the above as:

  • Td is a (runtime) type alias of the type T. They have the exact same runtime representation.
    • Casting with @as(,) from Td to T is always possible, while casting from T to Td may be restricted if desirable. The default is that casting is always possible in both directions.
    • T is the base (type) of Td
  • Td is comptime distinct from T. They can be separated from each other at compile time.
    • Can also separate between two different typedefs, not just between typedefs and the base type.
  • T can be any type. Primitive, struct, enum, array... doesn't matter.
  • Td has an (optionally) attached scope for top level declarations (no fields!)
  • If the attached scope in the declaration of Td was omitted:
    • The namespace (members) of Td and T are the same
    • td.method() points to function in T
  • Otherwise, (namespace/scope was attached):
    • Td.member does not point to T.member. The namespaces of Td and T are separate
    • td.method() points to a function in Td
    • Methods declared in T still work, e.g T.origMethod(td,param)
  • Fields and other non-method members like td.field, td.len, td[2] are still accessible, (as if t was substituted for td), no matter whether Td had an attached namespace or not.
  • When td is used in an expression, the compiler will try to auto coerce td to the type the expression expects.
    • Example: If T happens to be f64, then td + td might coerce to f64 + f64
      and the expression returns f64.
  • Td is configured with an instance of a zig built-in tagged union, here .Tdconf.
    • The different union members carry parameter "paylods" depending on their type. In many cases void is sufficient, and you can write .Tdconf (instead of .Tdconf2{.param = value})

The different typedef configs allow for user specified comptime behavior regarding coercion, unified call syntax, member access, available syntax, restricted syntax, custom validation, etc.

Creating a standardized way to get more control over these compile time aspects is what allows other features to be built on top of typedef.

One typedef config will usually only affect a small subset of comptime behavior at a time. E.g typedef(u32, .Distinct) would be orthogonal to typedef(MyStruct, .Tag{.txt="mytag"}) with regards to which compilation behavior they would parametrize or modify, even though they are both
ultimately based on typedef.

Continuing the list:

  • .Tdconf might affect Td and/or td in the following ways:
    • impose restrictions on T in the typedef declaration. E.g some configs might demand T to be a numeric type.
    • how type coercion works when attempting to coerce from ..
      • td to other typedefs also wrapping T
      • td to T directly
      • some instance of T (wrapped in a typedef or otherwise) to Td
    • which member access syntax on td that is enabled
    • behavior of td in expressions
    • restrictions and validation on the namespaces of T and/or Td ( at the declaration time of Td)

And finally:

  • Tdconf supports comptime reflection, so user code can inspect the data contained in the config instance for a particular typedef type or instance. Td-configs are just union members after all.
    • @typeInfo exposes the typedef data, or a new builtin could be introduced: @typedefInfo

Implementation perspective:

// current syntax
const T1   = u32;
const T2   = SomeStruct;
const T3   = []const u8;
const A_T1 = T1;

// typedef syntax
const TD_T1 = typedef(T1, .Distinct) {};    // keyword takes type, config value, and optional scope
const TD_T2 = typedef(T2, .Distinct);       // typedef can wrap/copy all zig types.
const TD_T3 = typedef(T3, .Distinct) {};    // .. primitives or not
const TD_TD_T1 = typedef(TD_D1, .Distinct); // typedef may be nested
const TD_R = typedef(u32, .ResourceHandle); // config can be something else than .Distinct

Representation of typedef:

Let a typedef "type" be represented by the shorthand TD(T,N,B,S,C).

  • T : type = base zig type
  • N : u32 = typedef id number
  • B : u32 = base typedef id number.
  • S : u32/pointer = attached scope, pointer/id
  • C : u32/pointer = builtin tagged union config pointer/id

Where TD(T,0,0,0,0) is "reserved" as being equivalent to the plain zig type T

T is always a plain zig type, whether the typedef declaration "wrapped" a normal zig type or another typedef.

N is a unique (nonzero) ID that is assigned per declaration of a typedef. N may be zero only when the shorthand represents a plain zig type.

B is included to enable nesting of typedefs, with B equal to the id (N) of the base typedef,
or with B = 0 if the base is a plain zig type.

S is an ID or pointer corresponding to the attached namespace of a typedef declaration. S == 0 (null) if there was no namespace attached.

C stores an ID or pointer corresponding to a typedef config (created per declaration, but there exists only one instance per unique config) where C == 0 (null) only when the shorthand represents a plain zig type, not a typedef.

Example:

// NOTE: refer to the context above
// ID field N set to zero for plain zig types/type aliases
TdTypeOf(T1)   -> TD(u32,0,0,0,0)
TdTypeOf(T2)   -> TD(SomeStruct,0,0,0,0)
TdTypeOf(A_T1) -> TD(SomeStruct,0,0,0,0)

// c1, c2 != 0, and c1 != c2
TdTypeOf(TD_T2)   -> TD(SomeStruct,n,0,s,c1) // n != 0, and s != 0 (has attached scope)
TdTypeOf(TD_D_T1) -> TD(u32,m,n,0,c1)        // m != 0 and m != n (typedef nesting)
TdTypeOf(TD_R)    -> TD(u32,l,0,0,c2)        // l != 0

Zig compiler and typedef:

I don't know as much about the zig compiler as others here, so I have to go by assumptions and hope they are valid or at least not completely off.

My assumptions are that:

  • The shorthand TD(T,N,B,S,C) is a sufficient representation of typedef declarations for a
    powerful typedef feature in zig:
    • Note: N and B are used for separation of typedefs wrapping the same base type, and for controlling coercion mechanics.
    • Note: S adds flexibility and control with regard to namespaces and member access.
    • Note: C adds comptime logic parametrization that is standardized in the form of a union, allowing ..
      • a typedef to invoke non-standard behavior if that is necessary to make a language feature work.
      • a typedef (wrapping a primitive type) to be as predictable to the compiler as if the typedef was a primitive itself.
  • For each typedef declaration encountered, the compiler can add entries into a table (or table
    like structure), with columns corresponding to the typedef-shorthand.
  • The scope (top level declarations) and config instance attached to a given typedef can be stored in separate list structures, and be referenced from the table.
  • All desired comptime logic or processing involving typedef is possible (specifically, there is sufficient information available) if the following are true:
    • There exists a table with rows corresponding to all typedef declarations, and columns equivalent to the shorthand TD(T,N,B,S,C)
    • ZigValue has a field tdId for the typedef id N.
      • Note: As per the section above, when the ZigValue
        is a plain zig type (not a typedef), tdId is equal to zero.
      • Note: Coercion and casting between typedefs wrapping the same base type
        boils down to modifying the tdId field of the ZigValue.
      • Note/Example: Coercion or casting from a typedef to its base type would
        simply set tdId to zero in the result value.
  • instructions for doing arbitrary mathematical operations on numeric arrays can be generated prior to the codegen stage
    • Note/Example: Multiplication of complex numbers.

Another look at typedef declarations
const Td = typedef(T, .Tdconf) {
  const FOO = 100;
};

const TdTd = typedef(Td, .Tdconf);

test ".." {
    const t : T = T{};

    const myTd1 : Td = t; // will this work?
    const myTd2 = @as(Td,t);

    const myTdTd = @as(TdTd, myTd1);

    const expr = 100 + Td.FOO;
}

The following are the results from the typedef declarations and casts above, assuming T is a base type:

  • T is a ZigValue of type "type" or "meta", with its typedef ID equal to zero
  • Td is a ZigValue exactly the same as T, except that the typedef ID n is non-zero and uniquely generated for this particular declaration
  • TdTd is a ZigValue exactly the same as T, but has a non-zero typedef id m, with m != n.
  • Typedef config collection, changes:
    • Add .Tdconf once to the collection, and mark it with id or pointer tc. Added once only due to memoization of equivalent payloads for Td and TdTd.
  • Typedef scope collection, changes:
    • Add the scope that contains the declaration const FOO = 100 to the collection, and mark it with id or pointer s
  • Typedef table, changes:
    • New row for Td, TD(T,N,B,S,C) => TD(T,n,0,s, tc)
    • New row for TdTd, TD(T,N,B,S,C) => TD(T,m,n,0, tc)

Casting and coercing between typedefs essentially involve changing the typedef id associated with a ZigValue according to rules that are parametrized by the typedef configs.

Handling of assignments, typedef perspective only. t is a ZigValue of type T, with typedef id equal to 0:

  • myTd1:
    • Compiler will check typeid of Td to see if it's non-zero, which is the case here.
    • Compiler must either accept or reject the coercion expressed as 0 -> n in terms of the required change in typedef id, depending on the coercion rules defined by the typedef config.
    • Compiler looks up tc in the table using typedef id n, and can make a decision.
    • If the coercion is accepted, then myTd1 will be a ZigValue of type T, with a typedef id equal to n.
  • myTd2
    • Casting to a typedef might be restricted, so as above, the compiler will have to look up the typedef config of Td to determine the correct action.
  • myTdTd
    • Compiler must either accept or reject the coercion expressed as n -> m. In this case,
      it can be taken into account that the base-typedef id of TdTd is non-zero and equal to n, so the hierarchical relation between TdTd and Td can be determined by comparing these numbers. The decision might involve querying typedef configs of both Td and TdTd.
  • expr
    • T is not necessarily a struct, and Td was declared with an attached scope, overriding any scope the base type might define.
    • Td is just a ZigValue of type T with non-zero typedef ID. The declaration Td.FOO is not directly associated with Td. The compiler must replace Td.FOO with something akin to getTypedefStructDecl(s,"FOO"), by looking up the attached scope s in the "typedef table".

Mapping typedef to use cases:

The capabilities of typedefs come in form of

  • (1) Attached namespace and "syntax views" of a type
  • (2) Coercion control
  • (3) Providing payloads for comptime reflection
  • (4) Defining additional language primitive types/typedefs
  • (5) Comptime validation of the specified base

As specified before, all typedef configs are members of a built-in union. Below is a subset of possible typedef configs that would solve some existing github issues, while spanning enough of the "typedef framework" to get decent initial test coverage of it.

Only if these four typedef configs are possible to include in an architecturally sound manner
should additional features (more typedef configs) be considered.

Core typedef configs:

  • .Alias : void (1)
    • Used to mimic const T2 = T1, which creates a "dumb alias" where T1 and T2 automatically coerce to each other without needing any explicit casting. Still, a useful config to have when using typedef solely for the purpose of defining methods.
  • .Distinct : void (2)
    • For creating a "smart alias" with more restricted coercion mechanics.
  • .Tag: []const u8 (3)
    • Attaches a user specified "text payload" to a type, where that text is available at comptime through reflection.
  • .Complex : void (4)
    • For creating a "built-in" type supporting operators like + and *, that under the hood is a typedef wrapping a numeric array of length 2. Configs like these only work together with a specific subset of types in typedef declarations.

Additional typedef configs: Extensive list of possible or theoretically viable typedef configs, where not all of them might be worth it or possible to include.

  • .Matrix : [2]usize (Dimensions) (4)
    • Allowed for typedef wrapping a numeric array with length equal to the product of the entries in the dimension array payload. Could support arithmetic operators similar to how operators would work for .Complex.
  • .ResourceHandle : void (2)
    • For integer types only. For when you want an integer that does not need to coerce in expressions.
  • .Measure : void (2, 4)
    • For numeric types only. Used for "units of measure", where each individual unit is expressed as a typedef wrapping the same measure group typedef.
  • .MeasureGroup : void (2, 4)
    • For numeric types only. Declare one typedef (wrapping a numeric type) with this config for each group of units (like 'Currency' or 'SI')
  • .ConfigurableDistinct : (DistinctConfig : enum) (2)
    • If there is need for alternative distinct type coercion behavior different from what the default .Distinct provides
  • .Encapsulated : (EncapsulationConfig : enum) (1)
    • To wrap an existing type in a typedef that has fewer methods, fields, and other syntax exposed than the original type.
    • Different encapsulation levels could be: FieldsHide, FieldsReadOnly, and MethodsOnly.
  • .FixedPointInt : usize (Amount of fractional bits) (4)
    • For integer types only. Multiplication or division of FixedPointInt instances may return a result with a fractional bit count different from that of (either) operand.
  • .NamedArray : []const u8 (1, 4)
    • For arrays only, where each comma separated "string" in the payload is mapped to a position in the array so that the entries can be accessed with field syntax. (A form of syntax sugar) Amount of entries must correspond to array length. Related to Builtin vector properties #4961. Examples:
      • const Color = typedef([4]u8, .NamedVector{.fields="r,g,b,a"})
      • const Pos3d = typedef([3]f64, .NamedVector{.fields="x,y,z"})
  • .Embedding : type (4, 5)

1: Attached namespace and "syntax views" of a type

Currently, primitive types cannot have methods, even though the only difference between methods and functions are the syntax used for calling them. This becomes an artificial restriction on primitive types.

Also, you cannot add methods to 3rd party user defined types without modifying the 3rd party source code.

With typedef, given a type T you want to define a method on, where T happens to be either a primitive or a 3rd party struct type: Simply cast or coerce T to a typedef Td wrapping T, where the attached scope of Td contains the method.

Typedef provides a form of "method extensions" (#1170), but without introducing any confusion to where a method is defined. It's defined either in the base type, or in the typewrap. Case closed. If methods from the base types are to be included in the typedef, they have to be included by explicit delegation.

It is logical for the "syntax view" of a type to mirror the type (in terms of structure) as closely as possible, and especially for a systems language like zig, but there are times when a different syntax view could be significantly more ergonomic or safe.

Typedef could be a way to offer alternative "syntax views" in zig, either by disabling syntax or enabling syntax sugar depending on the application.


2: Coercion control

Looking at typedef coercion through the lens of the shorthand TD(T,N,B,S,C), with n,m,l != 0, none of n,m,l being equal , and x reads as "not considered".

Lef => Right            // read, "Left" coerces to "Right"

TD(T,l,m,x,x) => TD(T,m,n,x,x)  // coercion down, one level
TD(T,m,n,x,x) => TD(T,n,0,x,x)  // coercion down, one level
TD(T,n,0,x,x) => TD(T,0,0,0,0)  // coercion one level down, to the base type (final level)
TD(T,l,m,x,x) => TD(T,0,0,0,0)  // coercion down, multiple levels

TD(T,n,0,x,x) => TD(T,m,n,x,x)  // coercion up, one level
TD(T,0,0,0,0) => TD(T,m,n,x,x)  // coercion up, multiple levels

As can be seen above, the hierarchical relationship between typedefs can be determined by looking at the entries N and B of the shorthand, and then typedef coercions can be categorized in various ways:

  • Amount of "levels" jumped for one coercion
  • Coercion being "upwards" or "downwards"
  • Compatible vs incompatible base types (redundant with existing type system)

From that, you can have various distinct type behavior based on typedef. For instances tdd of type typedef(T,.Distinct): Whenever a coercion to or from tdd is requested, accept or reject (compile error) depending on the "category" of the given coercion.

A coercion rule set that maps almost exactly to this comment in the distinct types issue #1595, is:

  • Only allow coercions that are one level down to, or from, a distinct instance (typedef with the distinct config)

Different rule sets can of course be applied to different use cases.


2: Reflection on typedef configs

Just like normal types support comptime reflection through @TypeInfo, so could typedef support comptime reflection on the config payload.

This would allow comptime tags to be implemented:

const TaggedType = typedef(T, .Tag{.txt="mytag"})
test "" {
  const tt = @as(TaggedType,getT());
  comptime assert(std.meta.hasTag(tt,"mytag")) // ok
  comptime assert(std.meta.hasTag(tt,"notthetag")) // fails
}

To add tags to a typedef as opposed to normal type, one would have to resort to typedef nesting.


4: Additional zig built-in types leveraging typedef

There are many builtin primitive types in zig that get special treatment with respect to syntax:

  • Indexing syntax on arrays and slices
  • Arithmetic operators on integers and floats,
  • Bitwise operators on integers
  • If-statements on optionals or booleans
  • Automatic coercion from non-optional to optional type
  • Properties like .len for arrays and slices

Some primitive types (primitive in the syntax support sense) are arguably even generic, like [N]SomeType, or ?SomeType.

The idea to be presented here is that a typedef that wraps a primitive base type will be very predictable for the compiler to deal with. The base type is a primitive, and the config payload is standardized because it is a builtin union member.

Definition of a typedef primitive Tp:

  • The plain base type T wrapped by Tp must be a primitive zig type, like bool or a numeric array.
  • The typedef config used in the declaration of Tp adds enough context to T for the compiler to treat Tp as a "new" type.
  • Tp is treated as a zig primitive in the sense that syntax can be unavailable for normal user defined types, but still be available for instances of Tp.
  • Tp is "memoized". Two typedef declarations with the same base type and config payload will not be distinguishable at comptime.
    • Note: Because of this memoization, it might be best to introduce a second keyword typedefprimitive, which starts of as an alias for typedef, but differs in the following ways:
      • Some typedef configs only work with typedef, and others only with typedefprimitive.
        • Possibly, there could be two different unions, but in the "typedef table" described above, all typedef configs would belong in the same column.
      • A primitive typedef declaration receiving the same arguments (base type and config union member) as a previous primitive typedef declaration, will not assign a new unique typedef ID (that is, add a row to the typedef table), but reuse the previous typedef ID that was associated with the given arguments.

Mathematical types can often be modeled as numeric arrays, thus defining additional mathematical "primitive" types in zig could in many cases be done by wrapping a numeric array in a typedef and attach metadata (the typedef config).

Example, complex numbers:

const Complex = std.builtin.Complex(f64); // typedefprimitive([2]f64, .Complex) under the hood, with attached namespace

test "Complex number" {
  const c1 = Complex.init(1.0, 3.2);
  const c2 = Complex.init(-43, 12.5);
  const c3 = c1.pow(c2) + c1 * c2.conj() - (c1*c1+c1)/Complex.init(12.0, 15);
  const r3 = c3.re(); // or c3.re for that matter (syntax view)
  const i3 = c3.im();
}

Having arithmetic operators working for a complex number "typedef" is IMO perfectly feasible,
as per the reasoning given in this issue comment by @skyfex. The functions or procedures necessary for complex number arithmetic does not need any control flow or to return any errors, could be defined as built-in functions, and cannot really be compared to operator overloading where everyone is able to customize how arithmetic operations behave in a multitude of ways.

The compiler would have enough information available to handle the cases where one or both of the operands in an arithmetic expression is a complex (typedef) primitive, and insert instructions corresponding to the appropriate builtin function.

This principle extends to fixed point types and matrices also, even despite the fact that arithmetic operation on these (mathematical) types may return a result with a different type than the type of either or both of the operands. Matrix4x4 * Matrix4x3 yields Matrix4x2. Still, this is unproblematic. The matrix dimensions are attached to the operands in form of the typedef config payload. Because of that, the compiler can ensure that the operands are compatible and determine the correct output "typedefprimitive".

In the codegen stage, these mathematical types would just be arrays.


5: Comptime validation of the specified base

Typedef is a comptime feature, so whenever a typedef is declared some comptime code can be triggered to run. This could include various validation functions that could be defined as builtins, basing it all on the existing reflection capabilities in zig.

Examples could be checking that a struct is a superset of another (embeds), or that the fields in a struct follow the convention set by a typedef config. Example:

const WidgetAnchorFlags = typedef(u4, .BitFlag){
  const SPARE : u4 = 0;
  const LEFT : u4 = 1;
  const RIGHT : u4 = 2;
  const TOP : u4 = 4;
  const BOTTOM : u4 = 8;
};

When the typedef config is .Bitflag, the compiler may enforce the existence of top level declarations with type equal to the base type, that each must have a unique value, that together must fully "cover" all the bits of the base type (which obviously has to be a uint), and where only one of the declarations can be something else than a power of two (the spare).

Additionally, as a bitflag could fit the definition of a typedef primitive established earlier, syntax sugar could be enabled, so that for example widgetAnchors.LEFT would map to widgetAnchors & WidgetAnchorFlags != 0. With primitive types (whether typedef based or not), syntax can be adjusted to improve usability.


Discussion and questions for the zig community:

  • Which subset of this proposal (even if it's the "empty subset") might be the best option for zig?
  • Are there any (whether possibly or certainly) problematic edge cases that does not work with the framework presented here? Examples would be great. Some I can think of myself are:
    • Combining multiple typedef configs is only possible through successive wrapping
    • There can be conflicts between typedef configs
    • If creating a "generic typedef" with a function returning a typedef, zig generic memoization could work against the principle of typedefs being distinguishable from each other at compile time.
      • Technically, a typedef is not directly a type, so memoization could perhaps be disabled for generic typedefs without it impacting the memoization behavior of regular generic types.
  • Would it be possible to make @Vector (and maybe the proposed @Matrix) into typedefs wrapping numeric arrays while still having them map down to LLVM in codegen?
  • Does stage1 have to support all language semantics supported by stage2?
  • Are there any possible use cases not mentioned here that could utilize the typedef framework?
  • Would it be feasible to incrementally implement typedef in zig by extending the amount of typedef config union members bit by bit until the desired simplicity vs features tradeoff is met?
@jakwings
Copy link

Question: How will peer type resolution work on these types?

@ityonemo
Copy link
Contributor

There have been several languages where I have wanted this feature (nonexplixit conversion-blocked typealiases), including: C, Julia, elixir/erlang.

@ghost
Copy link
Author

ghost commented Apr 22, 2020

Question: How will peer type resolution work on these types?

I wish my understanding of the zig compiler was better, but is this example relevant to your question: const x = if(cond) @as(Typedef1,value) else @as(Typedef2,value); ?

If both Typedef1 and Typedef2 allow coercion down to the base type (depending on the coercion rule set associated with the typedef config used in their declaration), the result type is the typedef base shared by Typedef1 and Typedef2.
const x : Base = value;

If Typedef1 and Typedef2 both wrap a Typedef3, they might coerce down to Typedef3.
const x = @as(Typedef3,value);

Or you'd get a compile error. Typedef, just like a normal type, must always be comptime known.


Note: Feel free to add zig typedef pseudocode in the issue comments. That'll make it easier for me to answer.


There have been several languages where I have wanted this feature (nonexplixit conversion-blocked typealiases), including: C, Julia, elixir/erlang.

@ityonemo Are you able to present some short examples/use cases?

@ghost
Copy link
Author

ghost commented Apr 22, 2020

Just want to specify something:

const Td = typedef(u32, .Alias); // does NOT create a new ZigType for IR.

test ".." {
  const x : u32 = 5;
  const y = @as(Td,x); // y has ZigType u32, but the ZigValue has a typedef id != 0
}

Typedef from the perspective of Zig IR is just an id field attached to ZigValue that is used to separate between between otherwise equal ZigTypes during compile time, and where the id points to metadata in a table (one row for each typedef declaration) that can influence compiler decisions.

My assumption (which I hope is sufficiently accurate) is that:

  • IR instructions essentially just create and modify ZigValues
  • the compiler recursively replaces generic instructions with more specific instructions until they are all specific enough to be mapped to machine instructions in codegen.
  • the type assigned each ZigValue influence the compiler "decision process"

Typedef is just a way to add structured metadata to ZigValues in addition to the existing types that can parametrize the compiler "decision process" in useful ways.

@jakwings
Copy link

If Typedef1 and Typedef2 both wrap a Typedef3, they might coerce down to Typedef3.

If the base type is not a primitive number type then it seems no problem. Doesn't it conflict with the current behavior when the base type is a number type? say u8 will coerce to u32, now what happen for alias(u32) + u32, alias(u32) + u8, alias(u8) + u32, alias(u8) + alias(u32), alias(u8) + alias(alias(u8)), alias(u8) + alias(alias(u32)), ...?

Other questions (just curious about the details and complexity):

  • Does peer type resolution only happen between aliases? There are so many settings so it is confusing.

  • Can peer type resolution happen between distinct type with the same base?

  • Coercion between distinct types looks like an unsafe variant of C++'s dynamic_cast (no matter how many levels of coercion is allowed). It seems only useful when you know how the internal work. So I guess this is not designed for public API and users must not do such coercion by themselves?

  • It seems most of the behaviors are determined by how the reflection on typedef configs works? What is the best default?

@ghost
Copy link
Author

ghost commented Apr 23, 2020

If Typedef1 and Typedef2 both wrap a Typedef3, they might coerce down to Typedef3.

If the base type is not a primitive number type then it seems no problem. Doesn't it conflict with the current behavior when the base type is a number type? say u8 will coerce to u32, now what happen for alias(u32) + u32, alias(u32) + u8, alias(u8) + u32, alias(u8) + alias(u32), alias(u8) + alias(alias(u8)), alias(u8) + alias(alias(u32)), ...?

The intended semantics are that the "coercion" of typedefs and coercion of regular types are parallel processes as far as possible.

For an operation (assignment, binary op, cast, ..) with input ZigValue(s) that may be or may not be typedef:

  • IF: the operation is supported with respect to the base types THEN:
    • Perform the operation, but the typedef ID of the output ZigValue has yet to be determined.
  • ELSE: Look up the typedef configs for the operands, to see if the configs mark the operands as "primitive typedefs"
    • IF: operands are primitive typedefs (e.g complex number) THEN:
      • perform the operation by delegating to a function that are tailored to handling operations on primitive typedefs, but the typedef ID of the output ZigValue has yet to be determined.
    • ELSE: raise compilation error, "incompatible (primitive) types".
  • Finally, use the coercion rules associated with the typedef configs to either determine the correct output typedef ID, or raise a compilation error if the typedefs were not compatible in the given context.

Behavior for const x = t1 + t2; where @TypeOf(t1)==T1 and @TypeOf(t2)==T2

  • CASE: T1 == typedef(u32,.Distinct) , T2 == typedef(u8,.Distinct):

    • First, "perform" the addition with respect to the base types only. The result xbecomes u32 as per existing peer type resolution, but the typedef ID of the output ZigValue has yet to be determined.
    • Then, from a "typedef perspective" check if T1 and T2 are base types or typedef primitives. Distinct typedefs do not mark their base type as a typedef primitive, like typedef([2]f64, .Complex) would, so this means the add operation is unsupported for .Distinct typedefs directly, but .Distinct allow downward casts in a single step, and to avoid having to raise an error, auto typedef coercion is attempted first.
    • After downward cast by one step, each operand becomes the base of T1 and T2, which are (pure) primitive types and implicitly supported since the instruction was already accepted in the step prior. (Note, it doesn't matter that base of T1 and T2 are different to from the typedef perspective)
    • The conclusion is that for this variant of distinct types, x has the plain output type u32, similar to how 42.4 + 2 coerces "down" to a float.
  • CASE: T1 == typedef(u32,.Distinct) , T2 == typedef(typedef(u8,.Distinct), .Distinct):

    • This would give a typedef related error. Even after coercing down one step from T1 and T2, the operands would still not have types supported for the given operation. This is intentional, but it could also be possible to make a variant of typedef distincts that will attempt to coerce downwards multiple times, or a distinct "primitive" typedef (that has to wrap float or int) that are supported by operators. This will hopefully be explored more in the "Units of Measure" issue, with the dedicated Measure and MeasureGroup configs.
  • CASE: T1 == typedef(u32,.Distinct) , T2 == u8:

    • Same result as for the case with T2 = typedef(u8, .Distinct). T1 will have to coerce down to the base type before the operation is valid, and this determines the output as well.

Behavior for const x :T3 = t1 + t2:

  • This is the equivalent of the two statements: const x_tmp = t1 + t2; const x : T3 = x; For (pure) base types the latter statement works if T3 == @TypeOf(x_tmp). For typedefs, the latter works if the base of T3 is equal to @TypeOf(x_tmp), and if the coercion rules associated with T3 allow
    coercion from other typedefs/types.

* Coercion between distinct types looks like an unsafe variant of C++'s `dynamic_cast` (no matter how many levels of coercion is allowed). It seems only useful when you know how the internal work. So I guess this is not designed for public API and users must not do such coercion by themselves?

The idea is that if you have a base type T that is used in a lot of functions, then it's convenient that typedef(T, .Distinct) can also be used in those functions.

Still, if you you make a function fn(x: MyTypedefDistinct) ... then the x parameter to this function cannot be the base of MyTypedefDistinct, so it's "convenient" in one direction, and "safe" in the other. direction.

Different coercion behavior could be defined as well. I've been thinking there could be resource handle typedefs (config = .Resourcehandle) that wrap integers, but do not coerce to the base type, so you couldn't add two const GlInt = typedef(u32, .ResourceHandle) for example.

* It seems most of the behaviors are determined by how the reflection on typedef configs works? What is the best default?

Indeed. The idea is that typedef is a broad (but fixed) framework on the compiler side (typedef configs being a closed set), while for the user typedef gives lots of configuration options for different use cases.

@ghost ghost mentioned this issue Apr 24, 2020
@ityonemo
Copy link
Contributor

ityonemo commented Apr 24, 2020

Yes, let me give a examples of what I'm talking about. Note that both Julia and Elixir have different relationships with their type systems than most other languages (including Zig)

In Julia, one thing that I implemented was a Galois Field type. Note that Julia's relationship with types is that they are used to square aggressive compile-time optimization with user abstraction and common mathematical libraries (I don't necessarily think this is something that Zig should shoot for). As a specific example, to do reed-solomon decoding in Julia, I could use my minimally implemented GF type and leverage the Julia standard library's matrix solver without having to rewrite LRU decomposition for GF arithmetic. At the same time, someone else easily subbed in patches to julia's matrix solver so that it would run normal FP incredibly efficiently on the awkward Intel Xeon Phi architecture, and, in theory you could have run that code side by side with my code and Julia would select the GF solver or the Xeon Phi solver based on the type of the data provided to the function call.

This Galois Field type was a direct GF(2^n) implementation backed by an unsigned integer of the appropriate size (usually 8). It would have been nice to be able to specify that "this was indeed an integer" by subtyping, allowing it to transparently inherit certain operations (like << or >>, bitwise &, |, etc) while overloading specific operations (+, -, *). I could get around this by explicitly doing conversions back and forth but I was never sure what was compiled down to a no-op or not (without checking the assembly code, which luckily, is very easy to do in julia).

BTW, I suspect zig as being potentially extremely useful for crypto purposes because by my naive understanding it's easy to accidentally write code in C that has non-constant execution, whereas zig doesn't hide function calls and perform as many sneaky tricks with code rewriting, so GF arithmetic might be a useful library down the line. Here's the sort of things that I would want to see for such a feature (let's call our hypothetical type g8):

<some code of typedeffing g8 to u8>
const my_g8a:g8 = @as(g8, 0xFF);   #okay to not have direct coercion
const my_g8b:g8 = 0xFF;                   #comptime coercion could be a nice-to-have
var my_u8v:u8 = 0xFF;
var my_g8v:g8 = my_u8v; # <--  forbidden!  should be an error.
var my_g8x:g8 = my_g8a ^ my_g8b;  # allowed, by inheritance from u8
var my_g8p:g8 = my_g8a + my_g8b; # allowed, even if stupid.
# or it could specifically opt in to certain operations via some sort of mechanism, as discussed above
var nope:g8 = my_g8a ^ my_u8v  # always disallowed no matter what.
var yes:g8 = my_g8a ^ @as(g8, my_u8v)

In elixir, the language is strongly but dynamically typed. There is, however, static typechecking. One can make a typealias, but effectively these are solely used for documentation.

so for example, in the stdlib, the type String.t (https://github.com/elixir-lang/elixir/blob/v1.10.2/lib/elixir/lib/string.ex#L277) is strictly equivalent to the primitive type binary. You're supposed to use String.t to indicate that everything in the string is 1) printable and 2) unicode-encoded, so it's a strict subset of what constitutes binary (contiguous range of memory of size 8 bits * n). Unlike Zig, I don't expect much out of my erlang typesystem in terms of compile-time guarantees, but it would at least be nice for the static typechecking to warn me if I try to pass something annotated as binary into something annotated as String.t. But the erlang typesystem doesn't have this typedef notion, which would at least give me some sort of static check on the programming intent.

I don't know how that usecase translates into Zig, but it does seem vaguely similar to the proposal of being able to do units of measure.

@ghost
Copy link
Author

ghost commented Apr 25, 2020

@ityonemo

Note that Julia's relationship with types is that they are used to square aggressive compile-time optimization with user abstraction and common mathematical libraries (I don't necessarily think this is something that Zig should shoot for).

Zig typedef wouldn't measure up to Julia in this regard as zig won't have user configurable operator overloading, but it seems like zig typedef could get you parts of the way there.

With typedef, if Library-A and Library-B both use [2]f64 to represent a complex number, then it's a bit easier to use both libraries interchangeably. It's slightly easier to cast between typedefs that are runtime equivalent than converting between user types, although all it takes is writing a function.

It would have been nice to be able to specify that "this was indeed an integer" by subtyping, allowing it to transparently inherit certain operations (like << or >>, bitwise &, |, etc) while overloading specific operations (+, -, *). I could get around this by explicitly doing conversions back and forth but I was never sure what was compiled down to a no-op or not (without checking the assembly code, which luckily, is very easy to do in julia).

To stick with the premise of not having user configurable operators in zig, I went with these principles for the typedef proposal:

  • Operators work on primitive/built-in types only
    • Workaround: typedef types can also be considered primitive in the certain contexts.
  • To make typedefs more ergonomic, enable coercion behavior in addition to explicit casting only.

For your Galois Field type, below is an example using the .Distinct typedef config. It provides some of the behavior you want, but you would need to wrap operator expressions in casts. Functions and methods could have the correct output type directly, and casts to/from a typedef would by definition be a no-op at runtime.

const g8_distinct = typedef(u8,.Distinct);
const my_g8a : g8_distinct = @as(g8_distinct, 0xFF); // OK
const my_g8b : g8_distinct = 0xFF; // Fails. cannot coerce upwards to distinct
var my_u8v : u8 = 0xFF;
var my_g8v : g8_distinct = my_u8v; // Fails. cannot coerce "upwards" to distinct
var my_g8x : g8_distinct = my_g8a ^ my_g8b; // fails. expression coerces down to u8. wrap in cast
var my_g8p : g8_distinct = my_g8a + my_g8b; // fails. expression coerces down to u8. wrap in cast
var my_g8o : g8_distinct = @as(g8_distinct, my_g8a + my_g8b); // Ok

var nope : g8_distinct = my_g8a ^ my_u8v // fails. expression coerces down to u8
var nope2 : u8 = my_g8a ^ my_u8v; // ok. expression coerces down to u8
var yes : g8_distinct = my_g8a ^ @as(g8, my_u8v) // fails. expression coerces down to u8

.MeasureGroup and .Measure might work, but with downsides that you'd need to specify the type of all the identifiers you introduce, and that another "measure" could be added to the same group at will. Could imagine something like this though:

const g8_new = typedef(typedef(u8, .MeasureGroup), .ClosedMeasure); // this
const g8_new = typedef(u8, .DistinctConfigurable{.mode=.Strict);   // or this

"Closed measures" or "strict distincts" could have the following semantics:

  • For an operation, if one operand is g8_new, then the other operand must also be g8_new (in terms of typedef ID)
  • In an operation where all operations are g8_new, the result will also be g8_new.

so for example, in the stdlib, the type String.t (https://github.com/elixir-lang/elixir/blob/v1.10.2/lib/elixir/lib/string.ex#L277) is strictly equivalent to the primitive type binary. You're supposed to use String.t to indicate that everything in the string is 1) printable and 2) unicode-encoded, so it's a strict subset of what constitutes binary (contiguous range of memory of size 8 bits * n). Unlike Zig, I don't expect much out of my erlang typesystem in terms of compile-time guarantees, but it would at least be nice for the static typechecking to warn me if I try to pass something annotated as binary into something annotated as String.t. But the erlang typesystem doesn't have this typedef notion, which would at least give me some sort of static check on the programming intent.

I actually think that typedef would work nicely here as well.

const Utf8Str = typedef( []const u8, .DistinctConfigurable{.mode=SelfInit}){ // attached namespace
    fn init(str: []const u8) Utf8Str {
        // do validation with std.debug.assert that all codepoints are valid
        return @as(Utf8Str,str); // cast is allowed in "self" scope
    }
}

test "typedef safe init" {
    const str = "hello";
    const str2: Utf8Str = str; // (typedef) compile error. coercion not accepted
    const str3: Utf8Str = @as(Utf8Str,str); // typedef compile error. cast to distinct type outside self scope is not supported
    const str4 : Utf8Str = Utf8Str.init(str); // accepted. function in typedef scope does validation. (assertion, but could also possibly return an error

    const rawstr : []const u8 = str4; // coerce one step downwards is ok, as for other distinct typedefs
    const rawstr2 = @as([]const u8, str4); // also ok. ALWAYS allow casts from typedef to its basetype.
}
``

This was referenced Apr 28, 2020
@ghost ghost mentioned this issue May 12, 2020
@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020
@hamad-almamari
Copy link

hamad-almamari commented Mar 6, 2021

So much complicated with my respect.
We dont want zig to turn into c++.

@andrewrk andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021
@andrewrk andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021
@matu3ba
Copy link
Contributor

matu3ba commented Mar 20, 2022

  1. Do you have a realistic examples, which are simplifies by this proposal?
  2. Why does this need another keyword instead of a comptime-implementation in libstd?
  3. The functions of libraries A and B still must agree on the same conforming interface and data layouts to be interchangeable and adding comptime-boilerplate on top doesnt change that.
  4. Recursive replacement of method implementation (ie deriving method interfaces as specialization) is essentially what traits in Rust do. This requires to track all possible derivations in memory, even though they are unused, and makes things slow. How does this approach try to compensate that?
  5. You claim "Overall, I'd say this proposal is a relatively logical and natural addition to zig, as there are no "new" concepts introduced. It's rather a merger of existing ones:". Please provide a library implementation first to justify adding compiler complexity. Alternatively, provide an argumentation why this can not be in libstd or which parts could be part of libstd.

@andrewrk andrewrk removed this from the 0.10.0 milestone Apr 16, 2022
@KilianHanich
Copy link

  1. Do you have a realistic examples, which are simplifies by this proposal?

While I can't comment on a lot of other stuff, here an example of what would be simplified by this proposal for users, although it doesn't need a lot of it, just the "distinct types" part:

fn createUser(uid: std.posix.uid_t, gid: std.posix.gid_t) !void;

std.posix.uid_t and std.posix.gid_t are under the hood just alternative names for u16. So as far as a computer (and the compiler) is concerned, swapping the parameters around while calling the function is not a problem, but logically that doesn't make any sense.

If you could make these types distinct at comptime, this could be a compile error, preventing quite a few bugs from happening.

@mlugg
Copy link
Member

mlugg commented Aug 23, 2024

The convention is to use non-exhaustive enums for such use cases, e.g. const uid_t = enum(u16) { _ } -- see also #1595 (comment). I think this proposal should probably be closed, being essentially a more complex version of #1595.

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Aug 23, 2024
@tmccombs
Copy link

The convention is to use non-exhaustive enums for such use cases

What if the underlying type is itself a complex struct that can't be used as an enum type?

@KilianHanich
Copy link

The convention is to use non-exhaustive enums for such use cases

What if the underlying type is itself a complex struct that can't be used as an enum type?

How would you even go and define such a distinct type? Do you have an example where a different language implemented this for essentially arbitrary types because I fail to imagine how one could implement this (no matter if it's wanted or not).

@mlugg
Copy link
Member

mlugg commented Aug 25, 2024

What if the underlying type is itself a complex struct that can't be used as an enum type?

It's very rare that you need to do this -- distinct types are pretty much always for integer handles. If you do need it somewhere, this use case is served by a wrapper struct (perhaps extern struct or packed struct depending on the context).

@tmccombs
Copy link

In my experience, it is fairly common for string and array types, most notably []u8 in zig.

How would you even go and define such a distinct type?

The same way you would for primitive types.

Do you have an example where a different language implemented this for essentially arbitrary type

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

9 participants