Unions and Intersections #2

Closed · keean opened this issue Sep 18, 2016 · 81 comments

@keean
Owner

keean commented Sep 18, 2016

@shelby3 I want to discuss whether to include union types here. I would like to design a language that meets the requirements you have for extensibility, but I don't necessarily want to achieve it with unions. I am not fundamentally opposed to unions, and if they are the best way to achieve what we want then we have them, but in the interests of keeping the language as small as possible, I want to make sure we are adding something of value, that cannot be achieved in other ways. Can you provide some example use cases for union types so we can explore the possibilities?

@shelby3

shelby3 commented Sep 18, 2016

More comments to follow...

@shelby3 wrote:

And this is why we need anonymous structural union types, e.g.:

if (x) true
else 3

The inferred type is Boolean | Number.
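
For comparison, this is roughly what TypeScript (one of the existing languages with anonymous structural unions, mentioned later in this thread) infers for the same expression; a minimal sketch with illustrative names:

// TypeScript: the branches of the conditional have different types,
// so the inferred result is their union (literal types widened).
function pick(x: boolean) {
    return x ? true : 3   // inferred return type: boolean | number
}

let y = pick(false)        // y : boolean | number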

@keean
Owner Author

keean commented Sep 18, 2016

I realise you can write "if x true else 3", but why would you do that? To me this is not a motivating example; it needs to be something a bit bigger, an algorithm or design pattern that benefits in readability or simplicity from having the feature. What do we lose by insisting both sides of the 'if' are the same type?

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

I realise you can write "if x true else 3", but why would you do that?

Resolving to an instance of a heterogeneous type.

mycollection.append(if (something) new Type1() else new Type2())

@keean
Owner Author

keean commented Sep 19, 2016

What about inferring the type "x where x is in all common traits"? If you allow traits as "return" types then this is not a problem.

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

What about inferring the type "x where x is in all common traits"?

I presume you mean subsuming to the greatest lower bound (GLB) when the heterogeneous types involved are traits (which btw may not exist, since we can't subsume to any top type without instigating unsafe code, so it would often be a compiler error). That won't apply when the heterogeneous types are data types, since we can compute the anonymous structural type of the disjoint union; otherwise data types could not be subsumed without subclassing. So without subclassing, we must have tagged disjoint unions, else we would lose the solution to Wadler's Expression Problem.

If you allow traits as "return" types then this is not a problem.

I am thinking it would be a non-unified language with many corner cases if we attempt to only allow trait types (bounds) on function arguments. We must support trait objects everywhere a type can be used, except that they (trait bounds and trait objects) make no sense in unions, because traits by definition aren't tagged (reified); unions are okay for data types, which will (must) be tagged at run-time.

@shelby3

shelby3 commented Sep 19, 2016

Thinking about this more, traits are not types of instances. They are either monomorphic bounds on a type parameter of a function/method, or they are the polymorphic interface of trait objects with the dictionary joined to the instance just like subclasses (but with a hasA instead of isA subclassing semantics). Trait objects are polymorphic, and thus stall the CPU pipeline.

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

I remember we discussed all of this before in May, but I may not have been as clarifying and succinct as I am attempting to communicate now.

Edit: Trait objects enable polymorphism within a homogeneous type parametrized collection, so that an invariant collection does not need to be used and the type parameter can remain invariant.

So it appears we need both. The two ways to achieve polymorphism offer different trade-offs.
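
A rough TypeScript sketch of the two forms (illustrative names, not the proposed syntax): the tagged union keeps the data types visible so new operations can be added later, while the erased interface (the analogue of a trait object) admits new data types but fixes the set of operations at the point of erasure.

// Tagged union of data types: a later module can add a new operation
// as an ordinary function over the union.
interface Circle { tag: "circle"; r: number }
interface Square { tag: "square"; s: number }
type Shape = Circle | Square

function area(x: Shape): number {
    return x.tag === "circle" ? Math.PI * x.r * x.r : x.s * x.s
}

// Erased interface ("trait object" analogue): new data types can be added
// by providing the methods, but a value of type HasArea only ever supports
// the operations that were baked in when the concrete type was erased.
interface HasArea { area(): number }
const c: HasArea = { area: () => Math.PI * 2 * 2 }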

@keean
Owner Author

keean commented Sep 19, 2016

@shelby3 wrote:

I presume you mean subsuming to the greatest lower bound (GLB) when the heterogeneous types involved are traits (which btw may not exist since we can't subsume to any top type without instigating unsafe code, so it would often be compiler error). That won't apply when the heterogeneous types are data types since we can compute the anonymous structural type of the disjoint union, because otherwise data types could not be subsumed without subclassing. So without subclassing, we must have tagged disjoint unions, else we would lose the solution to Wadler's Expression Problem.

No, you simply keep the intersection of the traits implemented by both sides, the type remains polymorphic (unknown).

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

No, you simply keep the intersection of the traits implemented by both sides, the type remains polymorphic (unknown).

Which is the same as what I wrote. So why did you write "no"? You described the GLB.

Yes we can do that when the GLB is not the top type. But it is not entirely sufficient to only offer that capability and not also unions of data types.

You seem to be not quite grasping (or articulating that you've grasped) two of my points:

  1. The GLB might be the top type. In a safe language, the top type must be the empty set.
  2. If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.

@keean
Owner Author

keean commented Sep 19, 2016

@shelby3 wrote:

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

In a way they both suffer from a form of the expression problem. With tagged unions, we have type-case statements scattered throughout the code, and if we add a new type to the tagged union, all the type-cases need an extra option added.

With the "trait-object", if we want to add a new method to the collection you have to add a new trait to the collection type.

@keean
Owner Author

keean commented Sep 19, 2016

You seem to be not quite grasping two of my points:

The GLB might be the top type. In a safe language, the top type must be the empty set.
If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.

If there are no traits in common between two types, then there are no functions you can call on the object. Consider two objects A and B that have no methods in common. If we have a function that returns "A|B" what methods can we call on the value with type "A|B"... none at all.

If you want to do a type-case (runtime matching) you should be using a tagged union. The point of type-classes is to derive the statically safe set of functions/methods that can be called on some object.

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

In a way they both suffer from a form of the expression problem. With tagged-unions, we have type-case statements scattered throughout the code, and If we add a new type to the tagged-union all the type cases need to have an extra option added.

Disagree. This is the crux of my proposed complete solution to the Expression Problem which I explained to you before in May. Let me try again to explain it.

When calling a function with a trait bound and passing a union of data types as the input, the compiler can automatically create the type-case boilerplate, wrap it as if it were a single implementation of that trait bound, and pass that to the function, simulating monomorphism. Within said implementation it will be doing the type-case logic, so it does stall the CPU more than monomorphism, but it is seamless at the source code and emitted function implementation level; and it is more optimizable than trait objects, which erase their knowledge of the data types and thus must use agnostic (blinded) jump tables.

It is the outcome of using only a trait bound to operate on the tagged union, which makes it fully automated boilerplate. That was my insight.

Of course that doesn't prevent the programmer from doing type-class guards to get at the precise data types for some other reason, but that wouldn't be an extensible design pattern.
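
A sketch of what that generated wrapper could look like, written in TypeScript with explicit dictionary objects standing in for type-class instances (the names Show, showTwice, Num, Str are illustrative, not part of any agreed design):

// Per-data-type implementations of a Show "trait".
interface Num { tag: "num"; n: number }
interface Str { tag: "str"; s: string }

interface Show<T> { show(x: T): string }
const showNum: Show<Num> = { show: x => String(x.n) }
const showStr: Show<Str> = { show: x => x.s }

// A generic function with a Show bound, written once.
function showTwice<T>(dict: Show<T>, x: T): string {
    return dict.show(x) + dict.show(x)
}

// What the compiler could synthesize for the union Num | Str: a single Show
// dictionary whose method performs the type-case dispatch on the tag.
const showNumOrStr: Show<Num | Str> = {
    show: x => x.tag === "num" ? showNum.show(x) : showStr.show(x)
}

// The call site then looks monomorphic; the dispatch boilerplate is hidden.
const v: Num | Str = { tag: "num", n: 42 }
showTwice(showNumOrStr, v)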

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

You seem to be not quite grasping (or articulating that you've grasped) two of my points:

1. The GLB might be the top type. In a safe language, the top type must be the empty set.

If there are no traits in common between two types, then there are no functions you can call on the object. Consider two objects A and B that have no methods in common. If we have a function that returns A|B what methods can we call on the value with type A|B... none at all.

So we are in agreement.

Except that typically inferred subsumption to the top type is a compiler error (because it probably represents something the programmer didn't intend, since it is more or less useless). The programmer must declare the top type, if that is what they want.

Btw, if you use backticks ` in markdown instead of quotes then your programming terms will be rendered correctly in a monospace font as shown above.

2. If there is a GLB, it remains polymorphic, but we've lost (erased) one of the axes of extension of Wadler's Expression Problem, as compared to retaining a union of data types.

If you want to do a type-case (runtime matching) you should be using a tagged union. The point of type-classes is to derive the statically safe set of functions/methods that can be called on some object.

Seems you didn't quite understand what I was trying to explain to you this past May (granted my elucidation was probably not coherent enough then). See my prior comment for another attempt to explain how to avoid type-case boilerplate.

@shelby3

shelby3 commented Sep 19, 2016

Tangentially, I am doubting (not understanding) the claim that was made for Rust that monomorphic trait bounds do not stall the CPU pipeline. The function can't know which implementation it will be using until run-time, unless we generate a new version of the function for each implementation of a data type. Thus the function will be calling the methods using an array of pointers to the methods. So this is a jump table. We can't inline the methods into the function unless we generate a new version of the function for each implementation of a data type. So for a tagged union type, the same array of method pointers can be used, but of course within those wrapper methods there will be run-time type-case branching, which further stalls the CPU pipeline; but branches are more optimizable by the CPU than the arbitrary jumps that would be the case for trait objects.

@keean
Owner Author

keean commented Sep 19, 2016

@shelby3 I think you are not fully understanding the difference between dynamic (runtime) and static (compile time) polymorphism, and this is leading to confusion between us when we discuss this. Let's start with static polymorphism:

id<A>(x: A) : A {return x;}
id(4)

In this program id has a polymorphic type, that means its implementation is the same no matter what type it is passed. In this case only a single implementation of 'id' is required for all types.

class Show B {
    show(x : B)
}

instance Show for Int {
    show(x) = print_int(x)
}

instance Show for String {
    show(x) = print_string(x)
}

show(4)
show("ABC")

Here we see specialisation, there are different overloaded implementations for 'show' that are chosen by type. Each implementation deals with exactly one type.

show_twice<A : Show>(x : A):
    show(x)
    show(x)

Using type-classes we can write generic functions like 'show_twice' that work for any type by looking up the specialised versions of the function calls it uses in the dictionaries for the types passed, but there is a single generic implementation (the type-class dictionary for 'Show A' is an implicit parameter).

However, it is important to note that the type parameter is invariant; that means we must know the concrete type at compile time (this is the difference between parametric types and universally-quantified types: with universally-quantified types there are some situations, like polymorphic recursion, where we might not know the type until runtime, which is why Haskell does this by dictionary passing). So with parametric types 'A' must be a ground type in order to compile the function, which means we can monomorphise and output a specific version of 'show_twice' for the type of the argument passed. So this is what happens:

show_twice("ABC")

compiles to:

show_twice___string(x : String):
    print_string(x)
    print_string(x)

show_twice___string("ABC")

combined with inlining this will produce:

print_string("ABC")
print_string("ABC")

So you can see there is no use for 'unions' as we must provide a concrete/ground type as the type parameter for a parametric function, and we can do this if the data is static.

When calling a function with a trait bound and passing a union of data types as the input, the compiler can automatically create the type-case boilerplate and wrap it as if it is a single implementation of that trait bound and pass that to function simulating monomorphism.

This is where I think you have a misunderstanding: if we can statically determine the type, then we can statically determine the exact type, no unions necessary. For example:

x : bool = true
y = if x then "hello" else 32
show_twice(y)

Here we know the type of 'y' is a string because 'x' is true; as such, what is passed to show_twice is always 'string'. It is never a 'string or int', because such a thing cannot exist: y is always precisely a string or an int, never both at the same time. So for static polymorphism we never need union types.

Note: this kind of thing is not normally allowed because the type of 'y' depends on the value of 'x' being true or false. Such things are called dependent types, and languages like Rust don't allow this; instead they only allow types to depend on other types.

@keean
Owner Author

keean commented Sep 19, 2016

A follow-on from above: although we cannot support monomorphisation easily in the final case above, we can use path-dependent types to do something similar. If we require the return type from 'if' to be the same on both sides, or, perhaps more easily, have 'if' as a statement rather than an expression so there is no 'return' type, we can simply do:

if x then:
    show_twice("hello")
else:
    show_twice(32)

Here the type of 'show_twice' depends on the path taken at runtime, but we know the exact type passed to each instantiation of the polymorphic functions 'show_twice'. This is a compelling reason to not allow 'if' in expressions.

I would argue the above code is simpler and easier to read than the 'if' expression version, with the small cost of having to repeat 'show_twice'...

@keean
Owner Author

keean commented Sep 19, 2016

So are there any circumstances where we cannot statically determine the type of something? The answer is: when the type of something depends on the result of program IO.

For example if we have a polymorphic collection that we insert values of different types into depending on what key a user presses, we can no longer statically determine what the type of any given element selected from the collection will be.

This changes things significantly, so let's discuss the static case first and then move on to the dynamically polymorphic case.

@shelby3

shelby3 commented Sep 19, 2016

@keean your one false assumption leads you astray:

This is where I think you have a misunderstanding, if we can statically determine the type, then we can statically determine the exact type, no unions necessary. For example:

x : bool = true
y = if x then "hello" else 32
show_twice(y)

Here we know the type of y is a string because x is true; as such, what is passed to show_twice is always string. It is never a string | int, because such a thing cannot exist: y is always precisely a string or an int, never both at the same time. So for static polymorphism we never need union types.

A function exported from a module can't know, at its compile-time, which trait implementation nor which instance of a tagged union type it will be called with. That would defeat separate compilation. We need run-time polymorphism precisely because we need extension: not because we linked modules together untyped as any dynamically at run-time, but because we use modular separate compilation at compile-time. Moreover, we have no way to trace all types through a program, as that would be dependent typing, which is not Turing-complete. So we can't generally track which instances are in a tagged union type, which is the entire raison d'être of tagged union types. The only way you could track that would be dependent typing, which we aren't proposing to do because it isn't even Turing-complete.

What you wrote above does not apply to what I am proposing to use unions of data types for.

Btw, modular separate compilation is absolutely critical to JIT compilation, because the compiler doesn't have to compile the entire program in order to run a compiled portion of it.

Note: this kind of thing is not normally allowed because the type of y depends on the value of x being true or false. Such things are called dependent types, and languages like Rust don't allow this; instead they only allow types to depend on other types.

I already knew that years ago.


I think you are not fully understanding the difference between dynamic (runtime) and static (compile time) polymorphism, and this is leading to confusion between us when we discuss this

I am fully understanding. You seem to be a bit confused, perhaps because Haskell typically doesn't support separate compilation. Haskell has global type inference, all libraries are typically presented in source-code form, and the entire program is compiled monolithically. But this absolutely isn't the way programming will be done with JavaScript.

Let's start with static polymorphism:

id<A>(x: A) : A {return x;}
id(4)

In this program id has a polymorphic type, that means its implementation is the same no matter what type it is passed. In this case only a single implementation of id is required for all types.

Because id calls no methods on A, thus you don't need to specialize the implementation of the id function.

Meta: One minor problem is developing between us, but it could be an indicator of a larger working-relationship problem that could develop. I hope not, so I will speak up now rather than let it fester. I am a stickler for perfection. I have asked you two times to please use backticks ` for quoting your code symbols and keywords (as I have shown above by editing your quote), because that is the idiomatic way to do it in markdown, so these terms appear in a monospace font with a grey background and are easy to spot in the English text. And you did not reply indicating why you prefer the non-idiomatic style. So it seems you are ignoring this request for us to have a consistent style. I know old habits are hard to break, but really we should use idiomatic markdown so our verbiage is easier for readers, who will be reading this for a long time if we succeed in making a revolutionary programming language.

Edit: I realize you may prefer quote marks instead of backticks since the latter are erased when rendered, and quote marks copy+paste well (such as for quoting the other person when replying), but I prioritize beauty and readability over convenience. Hopefully one day GitHub will add a [Quote] button.

Using type-classes we can write generic functions like show_twice that work for any type by looking up the specialised versions of the function calls it uses in the dictionaries for the types passed, but there is a single generic implementation (the type-class dictionary for Show A is an implicit parameter).

Correct. That is what I wrote immediately above your comment. Why are you pretending to disagree with me when you are not?

See the bolded text below.

@shelby3 wrote:

Thus the function will be calling the methods using an array of pointers to the methods.


However it is important to note that the type parameter is invariant, that means we must know the concrete type at compile time (this is the difference between parametric types and universally-quantified types, with universally-quantified types there are some situations like polymorphic-recursion where we might not know the type until runtime, which is why Haskell does this by dictionary passing). So with parametric types 'A' must be a ground type in order to compile the function, which means we can monomorphise and output a specific version of 'show_twice' for the type of the argument passed. So this is what happens:

show_twice("ABC")
compiles to:

show_twice___string(x : String):
    print_string(x)
    print_string(x)

show_twice___string("ABC")

Correct. That is what I wrote immediately above your comment. Why are you pretending to disagree with me when you are not?

See the bolded text below. My point was that creating a specialized function for each implementation of the trait may be too costly in terms of emitted source code in some cases.

@shelby3 wrote:

The function can't know which implementation it will be using until run-time, unless we generate a new version of the function for each implementation of a data type. ... We can't inline the methods into the function unless we generate a new version of the function for each implementation of a data type.

@shelby3

shelby3 commented Sep 19, 2016

Again I will repeat what I wrote:

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement it on other traits because we erased the underlying data type which is inside the trait object.

We need both forms of polymorphism. A tagged union type is not static polymorphism. You were arguing this same point in May, and I can see I still haven't been able to get you to understand that, without dependent typing, the state of the instance of a union type can't be statically tracked through all code paths.

The union type is first-class (except on function application, which as we discussed in May would be undecidable under type inference). So the state of an instance of a union can be passed around through unfathomable code paths. There is no way to track that state dependently.

As you know, dependent typing is totally inapplicable for our programming language. It eliminates Turing-completeness because it can prove that programs halt. And it is very painstaking. Dependent typing is for formal provers and academic research languages.

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

So are there any circumstances where we cannot statically determine the type of something, the answer is if the type of something depends on the result of program IO.

For example if we have a polymorphic collection that we insert values of different types into depending on what key a user presses, we can no longer statically determine what the type of any given element selected from the collection will be.

This changes things significantly, so lets discuss the static case first and then move onto the dynamically polymorphic case.

The issue is not even that narrow. As I explained, separate compilation causes us to lose tracking of the state of the tagged union type at compile-time. And not just separate compilation, but even within a module, because we don't have dependent typing.

You don't need to think of special cases where your thinking was invalid. Your thinking was invalid generally.

I don't know what caused you to think that way, because I remember you had this same thinking in May. And you kept saying we didn't need tagged structural unions, because we could statically determine the state of the instance in your pet examples where I might want to infer a union type. But your pet examples are misleading you, because in general we can't do dependent typing.

All of us have moments where our minds get stuck in the wrong conceptualization. No worries, it has happened to me at other times. Nothing personal. That is why we are a team, to check each other's logic. I just want us to resolve this.

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

This is a compelling reason to not allow if in expressions.

Disagree strongly.

A follow on from above, although we cannot support monomorphisation easily in the final case above, we can use path dependent types to do something similar. If we require the return type from if to be the same on both sides, or perhaps more easily have if as a statement not an expression so there is no return type we can simply do:

if x then:
    show_twice("hello")
else:
    show_twice(32)

Here the type of show_twice depends on the path taken at runtime, but we know the exact type passed to each instantiation of the polymorphic functions show_twice.

There is nothing significantly disadvantageous about allowing the inference of a tagged union, since in most cases the choice of if-else will not be known until run-time (which is the entire raison d'être of if-else; otherwise the compiler would routinely erase the if-else). And most uses of if-else as expressions, in the setting of using that value within the current function, will not resolve to disjoint data types. Either they will subsume to a common trait or be the same data type in most cases.

And in the rare instance that the programmer doesn't want a tagged union (mostly for performance reasons), he can optimize manually as you have shown. Also, the compiler can use a much simpler form of path analysis (independent of run-time state) to do that performance optimization without needing dependent typing (although without purity it becomes more difficult for the compiler to know what is safe to reorder). Most of the time that sort of construct is going to be used to return a tagged union, which can't be optimized in any case. So in most cases, the programmer doesn't even need to think about it.

Forcing the programmer to break DNRY (DRY) when there is no advantage that the compiler can't usually optimize, makes for an unnecessarily noisy language. Which makes the code more difficult to follow and reason about. And makes const assignments impossible in some cases.

@keean
Owner Author

keean commented Sep 19, 2016

The issue is not even that narrow. As I explained, separate compilation causes us to lose tracking of the state of the tagged union type at compile-time. And not just separate compilation, but even within a module, because we don't have dependent typing.

You don't need to think of special cases where your thinking was invalid. Your thinking was invalid generally.

We can know the types statically; that is the fundamental nature of static type systems. The answer is that we do not compile a module to final code but to intermediate code, and when a module function is called from a different module the types become concrete, and this is when the monomorphisation happens. In some systems this instantiation happens at runtime (look up "Ada elaboration"), and in others at compile time. The key is to consider what gets imported when we import code. In languages with generics it is the intermediate representation and the type signatures that get imported, and the code is generated in place.

As I pointed out above anything that does not depend on runtime IO can be evaluated as constants at compile time.

I am not opposed to tagged unions, and we can easily have:

data Either a b = Left a | Right b
let y = if x then (Left "XYZ") else (Right 32)

in which case y would have the type Either String Int

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

So you can see there is no use for 'unions' as we must provide a concrete/ground type as the type parameter for a parametric function, and we can do this if the data is static.

We can create a variant of the function that accepts a dictionary in addition to (or instead of) the specialized function variants for inlining every trait implementation. That was the point of my comment.

I also wrote:

See the bolded text below. My point was that creating a specialized function for each implementation of the trait may be too costly in terms of emitted source code in some cases.

The trait objects alternative also can't be inlined monomorphically. Just accept that polymorphism requires passing in a dictionary.

@shelby3

shelby3 commented Sep 19, 2016

@keean wrote:

We can know the types statically, that is the fundamental nature of static type systems

Please stop conflating the compile-time type with the run-time state of an instance of a type.

We can't generally know the run-time state of an instance of a union type whose contained data items are tagged, because that requires dependent typing.

The answer is we do not compile a module to final code but to intermediate code, and when a module function is called from a different module the types become concrete, and this is when the monomorphisation happens. In some systems this instantiation happens at runtime (Look up "Ada Elaboration"), and others at compile time. The key is to consider what gets imported when we import code. In languages with generics it is the intermediate representation and type signatures that get imported, and the code is generated in place.

You are proposing to have a linker at either static or dynamic linking time. This is unthinkable for JavaScript modules at this juncture of the ecosystem (maybe we could change that in future). In any case, whether monomorphism is practical inter-module is orthogonal to our discussion about whether the run-time state of an instance of a union type can be tracked through code. I say that tracking cannot be accomplished without dependent typing.

As I pointed out above anything that does not depend on runtime I/O can be evaluated as constants at compile time.

But there is no way to generally isolate those dependencies without dependent typing or purity. With purity and a monadic effect system, you can isolate certain dependencies.

And when you say a constant (referring to the run-time state of an instance of a union type), it becomes impossible to track what that constant is through the code paths (regardless of whether it depends on I/O or not) without dependent typing. Just because something never changes, and even if we can identify that it never changes, doesn't mean we can compute its value. You are speaking about UFOs and hyperbolic theory, not reality. I want to make a real language, not talk about fantasies.

I hope you come back grounded in reality soon, because this is turning me off. We are creating too much verbiage again, as we did in May. Can't we get directly to the point?

I am not opposed to tagged unions, and we can easily have:

data Either a b = Left a | Right b
let y = if x then (Left "XYZ") else (Right 32)

in which case y would have the type Either String Int

That is too much boilerplate. TypeScript, N4JS, Ceylon, and others already have anonymous structural unions. Why are you forcing us back to Haskell's flaws? I want to move forward in a modern language design. The compiler can tag the instances of data type in the emitted code, so no need to tag the union. The anonymous structural union types are erased at compile-time, as they reference an instance of a data type. The instance of a data type tracks what type it is.

If I didn't need my complete solution to Wadler's Expression Problem and anonymous structural unions, and I wanted to use a monadic effect system, I would just go use PureScript. No need to create a language in that case.
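
To make the boilerplate contrast above concrete, here is a TypeScript sketch (illustrative only): the Either encoding forces a wrapper at every construction site, whereas the structural union needs none, because at run-time each member of the union is already distinguishable.

// Either-style encoding: every construction site must wrap the value.
type Either<A, B> = { tag: "left"; value: A } | { tag: "right"; value: B }

declare const x: boolean
const y1: Either<string, number> =
    x ? { tag: "left", value: "XYZ" } : { tag: "right", value: 32 }

// Anonymous structural union: no wrapper; a string is already distinguishable
// from a number at run-time, so the "tag" comes for free.
const y2: string | number = x ? "XYZ" : 32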

@shelby3

shelby3 commented Sep 19, 2016

Dependent typing would be on the order of, but worse than, Rust's lifetime checking in tsuris and inflexibility.

Declared purity could help us isolate invariants that don't change. But still we would need polymorphism for those things that do change. There is no argument you can make to say we don't need polymorphism.

@shelby3

shelby3 commented Sep 19, 2016

@keean if you want to create a boilerplate-prone, noisy language that violates DNRY, then I will not participate. I want a very elegant language which is concise. I wouldn't even fathom an Either as an alternative to anonymous structural union. I wouldn't fathom not having everything as an expression available to me. I am not against using them as statements, since many people are accustomed to that statements style. I am trying to allow imperative, impure, mutable data structures programming while also offering functional, pure, and immutable coding options. If you want to go exclusively imperative, impure, mutable data structures programming, then we are in disagreement.

Please let me know what you have in mind.

I was hoping we could finish up the rough design choices within a day or two, so we could move on to coding.

@keean
Owner Author

keean commented Sep 19, 2016

@shelby3 wrote:

Please stop conflating type with state of instance of a type.

In static polymorphism all the ground types of every type variable are known at compile time. With runtime polymorphism we cannot know the types at compile time (because they are unknown until runtime). The two have different type signatures. Personally I think most languages do not make a clear distinction in the type system between static and dynamic polymorphism, so it is hard to understand. Rust gets this right, as function generics are all static polymorphism, and trait-objects are needed for runtime polymorphism. You can see the same distinction in C++, where all non-virtual function dispatch can be decided statically at compile time, and you need to mark a method 'virtual' to enable runtime polymorphism. I blame Java, where every method is automatically virtual, for making this hard to understand.

@shelby3

shelby3 commented Sep 19, 2016

What is hard to understand? You haven't written anything in that prior comment that refutes anything I have written before it. I agree with your definitions. So what is your point?

@shelby3

shelby3 commented Sep 19, 2016

We must choose whether we will allow run-time polymorphism. If yes, then we have trait objects and anonymous tagged unions to accomplish it with. They both enable run-time polymorphism. Choose one or both. I choose both. You choose only trait objects. That doesn't make me incorrect. It is just a different choice. Without anonymous tagged unions, we can't solve both axes of Wadler's Expression Problem. Why do I have to repeat this?


And the other salient point is you can't easily separate run-time variables from constants without dependent typing. And you can't even track the values of constant values at compile-time (in order to compile static monomorphism) without dependent typing. Thus static polymorphism doesn't work well when you need to track values. You keep conflating typing with values. You can track the union type easily, but you can't track the value in that type easily even if it doesn't change at run-time, because you need dependent typing to do that.

Why are you making this obtuse? Please stop thinking it is myself who doesn't understand.

@keean
Owner Author

keean commented Sep 20, 2016

@shelby3 this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf

Edit: Reading the paper, it looks like they run into the usual problem of needing to have whole-program compilation.

@keean
Owner Author

keean commented Sep 20, 2016

@shelby3 Okay, let's say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point). How do they solve the expression problem? I think I need to see what the 'expression evaluation' example I provided above looks like in your system. To recap, I want to have some objects like Mul, Add, and Num that I can use to define expressions, and a function eval that calculates the result of the expression.

I want to be able to add a new operator (object) like Sub without touching the definitions of any of the existing objects or functions.

I want to be able to add a new function, for example print that will print any expression, without touching the definitions of any of the existing objects or functions.

Finally when writing the code, you do not know what expression I will want to evaluate or print, as I am the user of the library.

What would this look like with your solution?

@shelby3

shelby3 commented Sep 20, 2016

@keean wrote:

this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf
Okay, lets say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point)

Yeah for example I quote from the paper, “it is still not possible to form unions of values of generic types, but just finite enumerations of tagged values”.

Interestingly, I wrote a similar point on the Rust forum in May; as quoted from the paper, "subtyping corresponds to set containment". Note it eliminates the undesirable aspects of subclassing and subsuming (often associated with subtyping) for polymorphism in other subclassing languages and Rust. And combined with typeclasses, trait objects entirely eliminate covariance issues (it becomes invariant always) for parametrized typeclasses, but my solution requires invariant type-parametrized containers (e.g. List) because the union does express a subtyping relationship.

And my discovery and publication in the Rust forum in May apparently predates (is prior art for) theirs. I have been pushing for union types to improve the polymorphism/subtyping/heterogeneous-parametrization issue since at least 3 years ago. As they said, it is not that novel an idea. The novel aspect is how I combined it with typeclasses to form a complete solution to the Expression Problem, which is out of the scope of (not considered by) the paper you cited.

@keean
Owner Author

keean commented Sep 20, 2016

I'll have a go...

struct Num {
    n : Int
}

struct Add<A, B> {
    e1: A
    e2: B
}

trait Eval {
    eval() : Int
}

impl Eval for Num {
    eval() {
        return this.n
    }
}

impl<A : Eval, B : Eval> Eval for Add<A, B> {
    eval() {
        return this.e1.eval() + this.e2.eval()
    }
}

This solution in Rust seems quite good, you can add new operators without touching the above code:

struct Mul<A, B> {
    e1: A
    e2: B
}

impl<A : Eval, B : Eval> Eval for Mul<A, B> {
    eval() {
        return this.e1.eval() * this.e2.eval()
    }
}

And I can add new functions without touching the above code:

trait Print {
    print() : String
}

impl Print for Num {
    print() {
        return int_to_string(this.n)
    }
}

impl<A : Print, B : Print> Print for Add<A, B> {
    print() {
        return this.e1.print() + "+" + this.e2.print()
    }
}

impl<A : Print, B : Print> Print for Mul<A, B> {
    print() {
        return this.e1.print() + "*" + this.e2.print()
    }
}

So we don't need any unions so far... but I think I can see where they come in.

I can do the following without unions:

(new Mul(new Add(new Num(1), new Num(2)), new Num(3))).print()

But that is all statically determined, next post I am going to see if I have got the union bit right.

@shelby3

shelby3 commented Sep 20, 2016

@keean wrote:

Okay, lets say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point). How do they solve the expression problem. I think I need to see what the 'expression evaluation' example I provided above looks like in your system. To recap, I want to have some objects like Sub, Add, and Num that I can use to define expressions, and a function eval that calculates the result of the expression.

I want to be able to add a new operator (object) like Mul without touching the definitions of any of the existing objects or functions. [Edit: swapped Mul and Sub to be consistent with your upthread example]

I want to be able to add a new function, for example print that will print any expression, without touching the definitions of any of the existing objects or functions.

What would this look like with your solution?

Rust's trait objects allow you to do that, because they allow polymorphism for adding new data type variants such as Mul, and the typeclass mechanism allows you to add new operations such as Print orthogonally to the declaration (definition) of the data types (so there is no need to recompile the data types, which is the requirement of the Expression Problem), so you might think you don't need my solution.

But creating a trait object with trait bound Print prematurely binds your Print trait bound (containing the print() method operation) to the variant data types you assign to the instances of said trait object. And this prevents you from implementing a new operation Show and applying it to said preexisting trait object with bound Print, without recompiling said trait object. Thus violating the requirement of the Expression Problem.

My solution instead retains the typing information about the data types in the structural union (instead of assigning to a trait object), so that the instances of the said structural union can be passed as input to any function with any trait bound (and of course we provide implementations for that said trait bound for all data types listed in the said structural union, but these don't require any recompilation of preexisting code).

Think about it this way. Typeclasses (e.g. Rust traits) differ from subclassing because they delay binding of the implementation of an interface until the function call site. But that only applies for the monomorphism case. Trait objects bind the implementation to the interface at the time of assignment to the trait object, which is premature. My solution extends delayed binding of the implementation to the function call site for the polymorphism case.

Hopefully it clicks now for you? Did you have that "A ha" moment now?

Again note that trait objects don't require invariant collections, so they have a use-case where they are an advantage. My solution requires invariant containers (e.g. List) because it is not invariant for the type parametrized container, given that unions express a subtyping relationship.
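
A hedged TypeScript sketch of the premature-binding point, reusing the Eval/Print idea from the example above (dictionaries stand in for type-class instances; all names here are illustrative, not the proposed syntax):

interface Num { tag: "num"; n: number }
interface Neg { tag: "neg"; e: Num }
type Val = Num | Neg

// Trait-object style: erase the concrete type behind an Eval-only interface.
interface Evalable { eval(): number }
const asEvalable = (e: Val): Evalable =>
    ({ eval: () => e.tag === "num" ? e.n : -e.e.n })
// A value of type Evalable can no longer be handed to a later-written Print
// operation keyed on the concrete data types; that information was erased.

// Union style: keep the type Num | Neg. A Print dictionary written later, in
// another module, still applies without recompiling Num, Neg, or eval.
interface Print<T> { print(x: T): string }
const printVal: Print<Val> = {
    print: e => e.tag === "num" ? String(e.n) : "-" + String(e.e.n)
}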

@keean
Owner Author

keean commented Sep 20, 2016

So I think where I need the unions is in something like a parser:

y = if x:
        new Add(new Num(1), new Num(2))
    else:
        new Mul(new Num(1), new Num(2))
y.print()

So y would have the type Add<Num, Num> | Mul<Num, Num> and y.print() would have to be passed both dictionaries.

But with a parser it's all going to go horribly wrong, because we will need recursive types, and at every parse decision we will double the number of terms in the type.

Consider the type signatures of the parser functions

parse_num(in : Stream) : Num
parse_add(in : Stream) : Add<Num, Num>
parse_mul(in : Stream) : Mul<Num, Num>
parse_expr(in : Stream) : Num | Add<Num, Num> | Add<Add<Num, Num>, Num> | Add<Add<Add<Num, Num>, Num>, Num> ... etc forever.

So we will also need recursive types, and be able to do something like:

type Expr = Num | Add<Expr, Expr> | Mul<Expr, Expr>
parse_expr(in : Stream) : Expr

Is that going to work? Can we infer the recursive types? What does it mean for dictionary passing?
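
For what it is worth, a recursive structural union of exactly this shape already type-checks in TypeScript (a sketch with explicit tags standing in for nominal data types; this only shows that the recursive union itself is unproblematic, not how dictionary passing would work):

interface Num { tag: "num"; n: number }
interface Add { tag: "add"; e1: Expr; e2: Expr }
interface Mul { tag: "mul"; e1: Expr; e2: Expr }
type Expr = Num | Add | Mul      // recursive union

function evalExpr(e: Expr): number {
    switch (e.tag) {
        case "num": return e.n
        case "add": return evalExpr(e.e1) + evalExpr(e.e2)
        case "mul": return evalExpr(e.e1) * evalExpr(e.e2)
    }
}

// Mul(Add(Num(1), Num(2)), Num(3))
const e: Expr = {
    tag: "mul",
    e1: { tag: "add", e1: { tag: "num", n: 1 }, e2: { tag: "num", n: 2 } },
    e2: { tag: "num", n: 3 }
}
evalExpr(e)   // 9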

@shelby3

shelby3 commented Sep 20, 2016

@keean wrote:

I can do the following without unions:

(new Mul(new Add(new Num(1), new Num(2)), new Num(3))).print()

But that is all statically determined...

Exactly. Once you introduce polymorphism, then without unions you will need to statically discard (in the source code) the struct types and assign to a Rust trait object. If you choose Eval for the bound of said trait object, then you won't be able to Print on it. Even if you have the foresight to put an Eval+Print bound on said trait object, it still won't be open to extension to other trait bounds that have not been created yet.

My solution isn't only applicable to type-parametrized containers (such as List), but also to polymorphism in general, as you have explained above. Thus the disadvantage of my solution compared to trait objects, namely that for interoperation with type-parametrized containers we are forced to use an invariant container such as List (which maintains invariance on the type parameter despite the subtyping relationship expressed by union types), is inapplicable (irrelevant) when my solution is not used with a type-parametrized container.

@keean
Owner Author

keean commented Sep 20, 2016

@shelby3 What do you think about the recursive typing issue in 'parse_expr'? I think we may be able to infer it, and then it is a case of working out what dictionaries will be required in the trait method. I think I get what is going on, and I think a parser and expressions are a good example to work with, as it is the core part of writing a compiler, which is something we are doing to implement the language.

I think it's time to switch to the inferencing discussion: #5

@shelby3

shelby3 commented Sep 20, 2016

@keean wrote:

I think it's time to switch to the inferencing discussion: #5

Agreed.

@jdegoes

jdegoes commented Sep 20, 2016

I don't know anything about zenscript, but union types of sufficient power can enable overlapping ADTs and functions that are polymorphic across the exact shape of those ADTs.

You probably won't be able to support infinite types (surely not type inference for them), but if you used fixed point data, you don't need to (though that requires higher-kinded types to be really useful).

As an example in present day Scala:

case class Mul[A](left: A, right: A)
case class Sub[A](left: A, right: A)

type Coproduct[F[_], G[_], A] = Either[F[A], G[A]]

type Expr[A] = Coproduct[Mul, Sub, A]

case class Fix[F[_]](unwrap: F[Fix[F]])

You can regain a recursive definition for Expr[A], for example, by using Fix[Expr], and with a little effort you can see how this representation is basically a "poor person's union type", albeit a higher-order one. You can also see how ADTs can overlap and thus reuse parts of other ADTs, and how operations can be defined such that they require only the minimum components necessary.

As for how all this would translate into zenscript (or not), I'm not sure since I only learned of this project and this issue today thanks to an IM from @shelby3. Good luck Shelby! 😄

@keean
Owner Author

keean commented Sep 20, 2016

@jdegoes it's early days, so there's plenty of scope to influence the direction of the project :-) I have started building the compiler in JavaScript, with JavaScript as the first target. Big items to sort out are the syntax and the type system. I have been playing with type systems for a while, especially type inference, so I have a pretty good idea of how to implement it.

I think the core feature set at the moment is type-classes and union types. I think we will go with that and try to rapidly get to a first prototype, so that we can start writing programs; probably self-hosting the compiler would be the obvious thing to do.

Type inference for recursive types is not a problem in my experience: the type system internally works in the universe of regular trees, and then we convert to 'Mu' types for display. For example:

((\x . ((x x) x)) (\x . ((x x) x))) : {} |- ((a -> b as a) -> b as b)

@shelby3

shelby3 commented Sep 21, 2016

@jdegoes wrote:

I don't know anything about zenscript, but union types of sufficient power can enable overlapping ADTs and functions that are polymorphic across the exact shape of those ADTs.

If I understand your point correctly, then we also realized that, and it is one of the reasons we need union types. That, when combined with a novel way of resolving typeclasses for unions, also enables us to truly solve the Expression Problem, where apparently others haven't even though they have erroneously claimed to.

You can regain a recursive definition for Expr[A], for example, by using Fix[Expr], and with a little effort, you can see how this representation is basically a "poor person's union type", albeit a higher order one.

We want to support a first-class unnamed structural union, to eliminate boilerplate and also so the inference engine can infer structural union types.

We are trying to provide what we see as a missing variant in programming language offerings, especially one designed to be compiled to JavaScript (although other targets might be a possibility as well). That is: strict evaluation, no subclassing(!), typeclasses, and first-class structural unions, for greater extension capabilities in a soundly typed language that is also uncluttered with paradigms and thus reasonably simple to grasp. Also a syntax reasonably close to JavaScript. We're distilling from the best features of each language.

We are building this because @keean and I decided we wanted to code (or at least experiment) in a language like this. And we see a gap in the market that is not filled. Kudos thus far to @keean on doing the most on implementation thus far. So far, we also have some interest expressed from @SimonMeskens and @svieira (both of whom I recently met via discussions at the TypeScript Github).

Thanks John for visiting our nascent project.

@keean
Owner Author

keean commented Sep 21, 2016

@shelby3 I am closing this issue because I think we are decided that anonymous structural unions should be in the language in some form (at least rank1).

@keean keean closed this as completed Sep 21, 2016
@shelby3

shelby3 commented Sep 21, 2016

@keean can you please edit the close to add a Label that indicates a favorable resolution?

@shelby3

shelby3 commented Oct 7, 2016

I think I want flow typing and the cleanest solution to null, employing anonymous unions.

@keean
Owner Author

keean commented Oct 7, 2016

I think "flow typing" is what I call path-dependent typing, and if so, then yes it is something I want too.

Regarding null I think that makes sense. Null is effectively the unit type (void), and can be inferred into any union type.

There is a small problem with our type class approach, because normally we would want all types in the union to implement the type classes required by a function, whereas null will not implement all (if any) of them.

The solution is to force the programmer to explicitly provide a type-case for Null if null does not implement the type-class required for a function.

This is really a very simple axiom, and in the future we might allow more complex axioms, like inequality constraints on argument types.

This will feel almost exactly like dependent typing, except we never depend on types at runtime; instead we force the user to statically insert a check on the value. This is really the combination of "path dependent types" and "refinement types".
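
For reference, TypeScript's strict null checking behaves very much like this today; a small sketch of the forced type-case (the narrowing inside the branch is the "path dependent" part, and the Show dictionary is only an illustration, not any agreed design):

interface Show<T> { show(x: T): string }
const showNumber: Show<number> = { show: n => n.toString() }

function describe(x: number | null): string {
    // x does not satisfy the Show bound until the Null case is handled explicitly.
    if (x === null) return "nothing"
    return showNumber.show(x)   // here x has been narrowed to number
}

describe(null)   // "nothing"
describe(42)     // "42"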

@keean
Owner Author

keean commented Oct 7, 2016

I notice Ceylon always requires the type parameters, as in:

Box<Out> fmap<Box,In,Out>(Out(In) fun, Box<In> box) 
        given Box<Element> { ... }

I am sure we are okay to infer them, but it might cause problems with other Ceylon-like types, we just have to be careful as we expand the type system, and ideally keep it as simple as possible.

@SimonMeskens

As long as not all types can be Null, I'm fine. As far as I'm concerned, no language should have Null at all; a union type or an empty monad is all you need. Implicit nullable types are evil.
