Unions and Intersections #2
I realise you can write "if x true else 3", but why would you do that? To me this is not a motivating example; it needs to be something a bit bigger, an algorithm or design pattern that benefits in readability or simplicity from having the feature. What do we lose by insisting both sides of the 'if' are the same type? |
@keean wrote:
Resolving to an instance of a heterogeneous type.
What about inferring the type "x where x is in all common traits"? If you allow traits as "return" types then this is not a problem. |
@keean wrote:
I presume you mean subsuming to the greatest lower bound (GLB) when the heterogeneous types involved are traits (which btw may not exist since we can't subsume to
I am thinking it is a non-unified language with many corner cases if we attempt to only allow trait types (bounds) on function arguments. We must support trait objects everywhere a type can be used, except they (trait bounds and trait objects) make no sense in unions, because traits by definition aren't tagged (reified), whereas unions are okay for data types, which will (must) be tagged at run-time. |
Thinking about this more, traits are not types of instances. They are either monomorphic bounds on a type parameter of a function/method, or they are the polymorphic interface of trait objects with the dictionary joined to the instance just like subclasses.

There are two ways for us to support polymorphism. One is to allow tagged unions of data types, which retains extension in both axes of Wadler's Expression Problem, because we can add new variants (data types) and new operations (trait implementations). The other is to allow trait objects, which only allow adding new data types, but not new operations, i.e. if we have an instance of a trait object, we can't implement other traits for it because we erased the underlying data type which is inside the trait object.

I remember we discussed all of this before in May, but I may not have been as clear and succinct as I am attempting to be now.

Edit: Trait objects enable polymorphism within a homogeneous type-parametrized collection, so that the collection's type parameter can remain invariant. So it appears we need both. The two ways to achieve polymorphism offer different trade-offs. |
@shelby3 wrote:
No, you simply keep the intersection of the traits implemented by both sides, the type remains polymorphic (unknown). |
@keean wrote:
Which is the same as what I wrote. So why did you write "no"? You described the GLB. Yes we can do that when the GLB is not the top type. But it is not entirely sufficient to only offer that capability and not also unions of data types. You seem to be not quite grasping (or articulating that you've grasped) two of my points:
@shelby3 wrote:
In a way they both suffer from a form of the expression problem. With tagged-unions, we have type-case statements scattered throughout the code, and if we add a new type to the tagged-union all the type-cases need to have an extra option added. With the "trait-object", if we want to add a new method to the collection you have to add a new trait to the collection type. |
If there are no traits in common between two types, then there are no functions you can call on the object. Consider two objects A and B that have no methods in common. If we have a function that returns "A|B" what methods can we call on the value with type "A|B"... none at all. If you want to do a type-case (runtime matching) you should be using a tagged union. The point of type-classes is to derive the statically safe set of functions/methods that can be called on some object. |
@keean wrote:
Disagree. This is the crux of my proposed complete solution to the Expression Problem which I explained to you before in May. Let me try again to explain it. When calling a function with a trait bound and passing a union of data types as the input, the compiler can automatically create the type-case boilerplate, wrap it as if it is a single implementation of that trait bound, and pass that to the function, simulating monomorphism. Within said implementation, it will be doing the type-case logic, so it does stall the CPU more than monomorphism, but it is seamless at the source code and emitted function implementation level; and it is more optimizable than trait objects, which erase their knowledge of the data types and thus must use agnostic (blinded) jump tables. It is the outcome of using only a trait bound to operate on the tagged union, which makes it fully automated boilerplate. That was my insight. Of course that doesn't prevent the programmer from doing type-case guards to get at the precise data types for some other reason, but that wouldn't be an extensible design pattern. |
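To make the proposal concrete, here is a Rust sketch of the boilerplate the compiler is being asked to generate automatically. All names (`Show`, `Celsius`, `Name`, `describe`) are assumptions for illustration, not from the thread; the point is that the hand-written `impl Show for CelsiusOrName` below is exactly what the proposal says the compiler would emit for a union of data types passed to a trait-bounded function:

```rust
trait Show {
    fn show(&self) -> String;
}

struct Celsius(f64);
struct Name(String);

impl Show for Celsius {
    fn show(&self) -> String { format!("{} C", self.0) }
}
impl Show for Name {
    fn show(&self) -> String { self.0.clone() }
}

// The (here hand-written) boilerplate the compiler would generate for
// the union `Celsius | Name`: a tagged union plus a `Show` implementation
// that type-cases on the tag and delegates to the member implementations.
enum CelsiusOrName {
    C(Celsius),
    N(Name),
}

impl Show for CelsiusOrName {
    fn show(&self) -> String {
        match self {
            CelsiusOrName::C(c) => c.show(),
            CelsiusOrName::N(n) => n.show(),
        }
    }
}

// An ordinary trait-bounded function now accepts the union unchanged,
// as if it were a single monomorphic implementation.
fn describe<T: Show>(x: &T) -> String {
    x.show()
}

fn main() {
    let u = CelsiusOrName::C(Celsius(21.5));
    assert_eq!(describe(&u), "21.5 C");
}
```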
@keean wrote:
So we are in agreement. Except that typically inferred subsumption to the top type is a compiler error (because it probably represents something the programmer didn't intend, since it is more or less useless). The programmer must declare the top type, if that is what they want. Btw, if you use backticks ` in markdown instead of quotes then your programming terms will be rendered correctly in a monospace font as shown above.
Seems you didn't quite understand what I was trying to explain to you this past May (granted my elucidation was probably not coherent enough then). See my prior comment for another attempt to explain how to avoid type-case boilerplate. |
Tangentially, I am doubting (not understanding) the claim that was made for Rust that monomorphic trait bounds do not stall the CPU pipeline. The function can't know which implementation it will be using until run-time, unless we generate a new version of the function for each implementation of a data type. Thus the function will be calling the methods using an array of pointers to the methods. So this is a jump table. We can't inline the methods into the function unless we generate a new version of the function for each implementation of a data type. So for a tagged union type, the same array of method pointers can be used, but of course within those wrapper methods will be run-time type-case branching which further stalls the CPU pipeline, though branches are more optimizable by the CPU than the arbitrary jumps that would be the case for trait objects. |
@shelby3 I think you are not fully understanding the difference between dynamic (runtime) and static (compile time) polymorphism, and this is leading to confusion between us when we discuss this. Let's start with static polymorphism:
In this program 'id' has a polymorphic type, which means its implementation is the same no matter what type it is passed. In this case only a single implementation of 'id' is required for all types.
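A minimal Rust sketch of the kind of parametrically polymorphic function being described (the name `id` is assumed):

```rust
// A polymorphic (parametric) identity function: one implementation serves
// every type, because the body cannot depend on what `T` actually is.
fn id<T>(x: T) -> T {
    x
}

fn main() {
    assert_eq!(id(42), 42);
    assert_eq!(id("hello"), "hello");
}
```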
Here we see specialisation, there are different overloaded implementations for 'show' that are chosen by type. Each implementation deals with exactly one type.
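A Rust sketch of such specialisation (the trait name `Show` and the chosen instance types are assumptions): each type gets its own overloaded implementation, selected statically by the argument's type.

```rust
// One `Show` implementation per type; the compiler picks the right one
// at each call site based on the argument's static type.
trait Show {
    fn show(&self) -> String;
}

impl Show for i64 {
    fn show(&self) -> String { self.to_string() }
}

impl Show for String {
    fn show(&self) -> String { format!("\"{}\"", self) }
}

fn main() {
    assert_eq!(42i64.show(), "42");
    assert_eq!("hi".to_string().show(), "\"hi\"");
}
```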
Using type-classes we can write generic functions like 'show_twice' that work for any type, by looking up the specialised versions of the function calls they use in the dictionaries for the types passed, while remaining a single generic implementation (the type-class dictionary for 'Show A' is an implicit parameter). However it is important to note that the type parameter is invariant, which means we must know the concrete type at compile time (this is the difference between parametric types and universally-quantified types: with universally-quantified types there are some situations, like polymorphic recursion, where we might not know the type until runtime, which is why Haskell handles this by dictionary passing). So with parametric types 'A' must be a ground type in order to compile the function, which means we can monomorphise and output a specific version of 'show_twice' for the type of the argument passed, and combined with inlining this produces direct calls with no dispatch.
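A Rust sketch of the whole chain being described (names `Show` and `show_twice` are assumptions): the generic function, and the specialised copy that monomorphisation plus inlining effectively produces for one ground type.

```rust
trait Show {
    fn show(&self) -> String;
}

impl Show for i64 {
    fn show(&self) -> String { self.to_string() }
}

// Generic function: `A` must be a ground type at each call site, so the
// compiler can emit a specialised copy of the body (monomorphisation).
fn show_twice<A: Show>(x: A) -> String {
    format!("{}{}", x.show(), x.show())
}

// What monomorphisation followed by inlining effectively produces for i64:
// a direct implementation with no dictionary and no dispatch.
fn show_twice_i64(x: i64) -> String {
    format!("{}{}", x.to_string(), x.to_string())
}

fn main() {
    assert_eq!(show_twice(7i64), "77");
    assert_eq!(show_twice(7i64), show_twice_i64(7));
}
```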
So you can see there is no use for 'unions' as we must provide a concrete/ground type as the type parameter for a parametric function, and we can do this if the data is static.
This is where I think you have a misunderstanding: if we can statically determine the type, then we can statically determine the exact type, no unions necessary. For example:
Here we know the type of 'y' is a string because 'x' is true, as such what is passed to show_twice is always 'string'; it is never a 'string or int', because such a thing cannot exist: y is always precisely a string or an int, never both at the same time. So for static polymorphism we never need union types. Note: this kind of thing is not normally allowed because the type of 'y' depends on the value of 'x' being true or false. Such things are called dependent types, and languages like Rust don't allow this; instead they only allow types to depend on other types. |
A follow on from above, although we cannot support monomorphisation easily in the final case above, we can use path dependent types to do something similar. If we require the return type from 'if' to be the same on both sides, or perhaps more easily have 'if' as a statement not an expression so there is no 'return' type we can simply do:
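A Rust sketch of the statement form being proposed (names are assumptions): with 'if' as a statement, each branch calls 'show_twice' with a precisely known ground type, so no union-typed value ever exists.

```rust
trait Show {
    fn show(&self) -> String;
}
impl Show for i64 {
    fn show(&self) -> String { self.to_string() }
}
impl Show for &str {
    fn show(&self) -> String { self.to_string() }
}

fn show_twice<A: Show>(x: A) -> String {
    format!("{}{}", x.show(), x.show())
}

fn main() {
    let x = true;
    let out;
    // `if` used as a statement: each path instantiates show_twice at a
    // different, statically known ground type.
    if x {
        out = show_twice("hello"); // A = &str on this path
    } else {
        out = show_twice(3i64);    // A = i64 on this path
    }
    assert_eq!(out, "hellohello");
}
```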
Here the type of 'show_twice' depends on the path taken at runtime, but we know the exact type passed to each instantiation of the polymorphic function 'show_twice'. This is a compelling reason to not allow 'if' in expressions. I would argue the above code is simpler and easier to read than the 'if' expression version, with the small cost of having to repeat 'show_twice'... |
So are there any circumstances where we cannot statically determine the type of something? The answer is yes: when the type of something depends on the result of program IO. For example if we have a polymorphic collection that we insert values of different types into depending on what key a user presses, we can no longer statically determine what the type of any given element selected from the collection will be. This changes things significantly, so let's discuss the static case first and then move on to the dynamically polymorphic case. |
@keean your one false assumption leads you astray:
A function exported from a module can't know at its compile-time which trait implementation nor which instance of a tagged union type it will be called with. That would defeat separate compilation. We need run-time polymorphism precisely because we need extension, not because we linked together modules untyped.
What you wrote above does not apply to what I am proposing to use unions of data types for. Btw, modular separate compilation is absolutely critical to JIT compilation, because the compiler doesn't have to compile the entire program in order to run a compiled portion of it.
I already knew that years ago.
I am fully understanding. You seem to be a bit confused, perhaps because Haskell typically doesn't support separate compilation. Haskell uses global type inference, and all libraries are typically presented in source code form and the entire program is compiled monolithically. But this absolutely isn't the way programming will be done with JavaScript.
Meta: One minor problem is developing between us, but it could be an indicator of a larger problem of working relationship that could develop. I hope not, so I will speak up now rather than let it fester. I am a stickler for perfection. I have asked you two times to please use backticks ` for quoting your code symbols and keywords (as I have shown above by editing your quote), because that is the idiomatic way to do it in markdown, so these terms appear in a monospace font with a grey background and are easy to spot in the English text. And you did not reply indicating why you prefer the non-idiomatic style. So it seems you are ignoring this request for us to have a consistent style. I know old habits are hard to break, but really we should use idiomatic markdown so our verbiage is easier for readers who will be reading this for a long time if we succeed to make a revolutionary programming language. Edit: I realize you may be preferring quote marks instead of backticks since the latter are erased as rendering, because they copy+paste well (such as for quoting the other person when replying), but I prioritize beauty and readability over convenience. Hopefully one day Github will add a
Correct. That is what I wrote immediately above your comment. Why are you pretending to disagree with me when you are not? See the bolded text below. My point was that creating a specialized function for each implementation of the trait may be too costly in terms of emitted source code in some cases. @shelby3 wrote:
Again I will repeat what I wrote:
We need both forms of polymorphism. A tagged union type is not static polymorphism. You were arguing this same point in May, and I can see I still haven't been able to get you to understand that without dependent typing, the state of the instance of a union type can't be statically tracked through all code paths. The union type is first-class (except on function application, which as we discussed in May would be undecidable under type inference). So the state of an instance of a union can be passed around through unfathomable code paths. There is no way to track the state of that dependently. As you know, dependent typing is totally inapplicable for our programming language. It eliminates Turing-completeness because it can prove that programs halt. And it is very painstaking. Dependent typing is for formal provers and academic research languages. |
@keean wrote:
The issue is not even that narrow. As I explained, separate compilation causes us to lose tracking of the state of the tagged union type at compile-time. And not just separate compilation, but even within a module, because we don't have dependent typing. You don't need to think of special cases where your thinking was invalid. Your thinking was invalid generally. I don't know what caused you to think that way, because I remember you had this same thinking in May. And you kept saying we didn't need tagged structural unions, because we could statically determine the state of the instance in your pet examples where I might want to infer a union type. But your pet examples are misleading you, because in general we can't do dependent typing. All of us have moments where our minds get stuck in the wrong conceptualization. No worries, it has happened to me at other times. Nothing personal. That is why we are a team, to check each other's logic. I just want us to resolve this. |
@keean wrote:
Disagree strongly.
There is nothing significantly disadvantageous with allowing the inference to a tagged union, since in most cases the tagged union is what the programmer intends. And in the rare instance that the programmer doesn't want a tagged union (mostly for performance reasons), he can optimize manually as you have shown. Also, the compiler can use a much simpler form of path analysis (independent of run-time state) to do that performance optimization without needing dependent typing (although without purity it becomes more difficult for the compiler to know what is safe to reorder). Most of the time that sort of construct is going to be used to return a tagged union, which can't be optimized in any case. So in most cases, the programmer doesn't even need to think about it. Forcing the programmer to break DNRY (DRY) when there is no advantage that the compiler can't usually optimize makes for an unnecessarily noisy language, which makes the code more difficult to follow and reason about. And makes |
We can know the types statically, that is the fundamental nature of static type systems. The answer is we do not compile a module to final code but to intermediate code, and when a module function is called from a different module the types become concrete, and this is when the monomorphisation happens. In some systems this instantiation happens at runtime (look up "Ada Elaboration"), and in others at compile time. The key is to consider what gets imported when we import code. In languages with generics it is the intermediate representation and type signatures that get imported, and the code is generated in place. As I pointed out above, anything that does not depend on runtime IO can be evaluated as constants at compile time. I am not opposed to tagged unions, and we can easily have:
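A Rust sketch of the kind of explicitly declared (named) tagged union meant here, as opposed to an anonymous structural one (the names are assumptions):

```rust
// An explicitly declared tagged union: the programmer names the union
// and its constructors, rather than the compiler inferring `Int | String`.
enum IntOrString {
    I(i64),
    S(String),
}

fn describe(x: &IntOrString) -> &'static str {
    match x {
        IntOrString::I(_) => "int",
        IntOrString::S(_) => "string",
    }
}

fn main() {
    assert_eq!(describe(&IntOrString::I(1)), "int");
    assert_eq!(describe(&IntOrString::S("a".into())), "string");
}
```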
in which case |
@keean wrote:
We can create a variant of the function that accepts a dictionary in addition to (or instead of) the specialized function variants for inlining every trait implementation. That was the point of my comment. I also wrote:
The trait objects alternative also can't be inlined monomorphically. Just accept that polymorphism requires passing in a dictionary. |
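The contrast being drawn can be sketched in Rust (names assumed): a monomorphic variant that the compiler can specialise and inline per type, versus a dictionary-passing variant where `dyn Show` carries the method table at run-time and nothing can be inlined.

```rust
trait Show {
    fn show(&self) -> String;
}
impl Show for i64 {
    fn show(&self) -> String { self.to_string() }
}

// Monomorphic variant: a specialised, inlinable copy per implementing type.
fn show_twice_mono<A: Show>(x: &A) -> String {
    format!("{}{}", x.show(), x.show())
}

// Dictionary-passing variant: one compiled body; the `dyn Show` reference
// carries the method table (the "dictionary"), dispatched at run-time.
fn show_twice_dyn(x: &dyn Show) -> String {
    format!("{}{}", x.show(), x.show())
}

fn main() {
    assert_eq!(show_twice_mono(&7i64), "77");
    assert_eq!(show_twice_mono(&7i64), show_twice_dyn(&7i64));
}
```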
@keean wrote:
Please stop conflating the compile-time type with the run-time state of an instance of a type. We can't generally know the run-time state of an instance of a union type whose contained data items are tagged, because that requires dependent typing.
You are proposing to have a linker either at static or dynamic linking time. This is unthinkable for JavaScript modules at this juncture of the ecosystem (maybe we could change that in future). In any case, whether monomorphism is practical inter-module is orthogonal to our discussion about whether the run-time state of the instance of a type of union can be tracked through code. I say that tracking can not be accomplished without dependent typing.
But there is no way to generally isolate those dependencies without dependent typing or purity. With purity and a monadic effect system, you can isolate certain dependencies. And when you say a constant (referring to the run-time state of the instance of a type of union), it becomes impossible to track (regardless of whether dependent on I/O or not) what that constant is through the code paths without dependent typing. Just because something never changes and even if we can identify that it doesn't ever change, doesn't mean we can compute its value. You are speaking about UFOs and hyperbolic theory, not reality. I want to make a real language, not talk about fantasies. I hope you come back grounded in reality soon, because this is turning me off. We are creating too much verbiage again, as we did in May. Can't we get directly to the point?
That is too much boilerplate. TypeScript, N4JS, Ceylon, and others already have anonymous structural unions. Why are you forcing us back to Haskell's flaws? I want to move forward in a modern language design. The compiler can tag the instances of data types in the emitted code, so there is no need to tag the union. The anonymous structural union types are erased at compile-time, as they reference an instance of a data type. The instance of a data type tracks what type it is. If I didn't need my complete solution to Wadler's Expression Problem and anonymous structural unions, and I wanted to use a monadic effect system, I would just go use PureScript. No need to create a language in that case. |
Dependent typing would be on the order of, but worse than, Rust's lifetime checking in tsuris and inflexibility. Declared purity could help us isolate invariants that don't change. But still we would need polymorphism for those things that do change. There is no argument you can make to say we don't need polymorphism. |
@keean if you want to create a boilerplate-prone, noisy language that violates DNRY, then I will not participate. I want a very elegant language which is concise. I wouldn't even fathom an
Please let me know what you have in mind. I was hoping we could finish up the rough design choices within a day or two, so we could move on to coding. |
@shelby3 wrote:
In static polymorphism all the ground states of every type variable are known at compile time. With runtime polymorphism we cannot know the types at compile time (because they are unknown until runtime). The two have different type signatures. Personally I think most languages do not make a clear distinction in the type system between static and dynamic polymorphism, so it is hard to understand. Rust gets this right, as function generics are all static polymorphism, and trait-objects are needed for runtime polymorphism. You can see the same distinction in C++ where all non-virtual function dispatch can be decided statically at compile time, and you need to mark a method 'virtual' to enable runtime polymorphism. I blame Java, where every method is automatically virtual, for making this hard to understand. |
What is hard to understand? You haven't written anything in that prior comment that refutes anything I have written before it. I agree with your definitions. So what is your point? |
We must choose if we will allow run-time polymorphism. If yes, then we have trait objects and anonymous tagged unions to accomplish it with. They both enable run-time polymorphism. Choose one or both. I choose both. You choose only trait objects. That doesn't make me incorrect. It is just a different choice. Without anonymous tagged unions, we can't solve both axes of Wadler's Expression Problem. Why do I have to repeat this? And the other salient point is you can't easily separate run-time variables from constants without dependent typing. And you can't even track the values of constants at compile-time (in order to compile static monomorphism) without dependent typing. Thus static polymorphism doesn't work well when you need to track values. You keep conflating typing with values. You can track the union type easily, but you can't track the value in that type easily even if it doesn't change at run-time, because you need dependent typing to do that. Why are you making this obtuse? Please stop thinking it is myself who doesn't understand. |
@shelby3 this paper seems relevant: http://arxiv.org/pdf/1606.01106v1.pdf Edit: Reading the paper, it looks like they run into the usual problem of needing to have whole-program compilation. |
@shelby3 Okay, let's say I accept union types are a good idea because they enable concise code (and the paper I linked to above agrees with you on that point). How do they solve the expression problem? I think I need to see what the 'expression evaluation' example I provided above looks like in your system. To recap:

- I want to have some objects like
- I want to be able to add a new operator (object) like
- I want to be able to add a new function, for example

Finally, when writing the code, you do not know what expression I will want to evaluate or print, as I am the user of the library. What would this look like with your solution? |
@keean wrote:
Yeah for example I quote from the paper, “it is still not possible to form unions of values of generic types, but just finite enumerations of tagged values”. Interestingly I wrote the similar point on the Rust forum in May, as quoted from the paper, “subtyping
And my discovery and publication in the Rust forum in May apparently predates (is prior art for) theirs. I was pushing for union types to improve the polymorphism/subtyping/heterogeneous-parametrization issue since at least 3 years ago. As they said, it is not that novel of an idea. The novel aspect is how I combined it with typeclasses to form a complete solution to the Expression Problem, which is out of the scope of (not considered by) that paper you cited. |
I'll have a go...
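A minimal sketch of the kind of type-class-based expression evaluator being attempted here (all names such as `Eval`, `Lit`, `Add` are assumptions): a base library with two expression objects and one operation.

```rust
// Base library: two expression objects and one operation (evaluation).
struct Lit(i64);
struct Add<L, R>(L, R);

trait Eval {
    fn eval(&self) -> i64;
}

impl Eval for Lit {
    fn eval(&self) -> i64 { self.0 }
}

impl<L: Eval, R: Eval> Eval for Add<L, R> {
    fn eval(&self) -> i64 { self.0.eval() + self.1.eval() }
}

fn main() {
    assert_eq!(Add(Lit(1), Lit(2)).eval(), 3);
}
```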
This solution in Rust seems quite good, you can add new operators without touching the above code:
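A sketch of what adding a new operator without touching the base code might look like (names assumed; the base `Lit`/`Eval` items are restated so the example is self-contained):

```rust
// Base library items (unchanged, restated for self-containment).
struct Lit(i64);
trait Eval {
    fn eval(&self) -> i64;
}
impl Eval for Lit {
    fn eval(&self) -> i64 { self.0 }
}

// New operator added entirely in client code: a new data type plus an
// implementation of the existing `Eval` trait for it.
struct Mul<L, R>(L, R);

impl<L: Eval, R: Eval> Eval for Mul<L, R> {
    fn eval(&self) -> i64 { self.0.eval() * self.1.eval() }
}

fn main() {
    assert_eq!(Mul(Lit(3), Lit(4)).eval(), 12);
}
```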
And I can add new functions without touching the above code:
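Likewise, a sketch of adding a new function (operation) without touching the base code: a new trait, implemented for the existing data types (names like `Pretty` are assumptions):

```rust
// Base library items (unchanged, restated for self-containment).
struct Lit(i64);
struct Add<L, R>(L, R);
trait Eval {
    fn eval(&self) -> i64;
}
impl Eval for Lit {
    fn eval(&self) -> i64 { self.0 }
}
impl<L: Eval, R: Eval> Eval for Add<L, R> {
    fn eval(&self) -> i64 { self.0.eval() + self.1.eval() }
}

// New operation added as a new trait, implemented for the existing
// data types, again without modifying the base library.
trait Pretty {
    fn pretty(&self) -> String;
}
impl Pretty for Lit {
    fn pretty(&self) -> String { self.0.to_string() }
}
impl<L: Pretty, R: Pretty> Pretty for Add<L, R> {
    fn pretty(&self) -> String {
        format!("({} + {})", self.0.pretty(), self.1.pretty())
    }
}

fn main() {
    let e = Add(Lit(1), Lit(2));
    assert_eq!(e.eval(), 3);
    assert_eq!(e.pretty(), "(1 + 2)");
}
```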
So we don't need any unions so far... but I think I can see where they come in. I can do the following without unions:
But that is all statically determined, next post I am going to see if I have got the union bit right. |
@keean wrote:
Rust's trait objects allow you to do that because they allow polymorphism for adding new data type variants such as But creating a trait object with trait bound My solution instead retains the typing information about the data types in the structural union (instead of assigning to a trait object), so that the instances of the said structural union can be passed as input to any function with any trait bound (and of course we provide implementations for that said trait bound for all data types listed in the said structural union, but these don't require any recompilation of preexisting code). Think about this way. Typeclasses (e.g. Rust traits) differ from subclassing because they delay binding of implementation of interface until the function call site. But that only applies for the monomorphism case. Trait objects bind the implementation to the interface at the time of assignment to the trait object, which is premature. My solution extends delayed binding of implementation to the function call site for the polymorphism case. Hopefully it clicks now for you? Did you have that "A ha" moment now? Again note that trait objects don't require invariant collections, so they have a use-case where they are an advantage. My solution requires invariant containers (e.g. |
So I think where I need the unions is in something like a parser:
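A sketch of why a parser forces the issue (names assumed): the type of the result depends on run-time input, so it must be a (tagged) union of the possible alternatives.

```rust
// A token whose variant depends on run-time input: precisely the case
// where the result type must be a union of the alternatives.
enum Token {
    Num(i64),
    Word(String),
}

fn lex_one(s: &str) -> Token {
    match s.parse::<i64>() {
        Ok(n) => Token::Num(n),
        Err(_) => Token::Word(s.to_string()),
    }
}

fn main() {
    assert!(matches!(lex_one("42"), Token::Num(42)));
    assert!(matches!(lex_one("if"), Token::Word(_)));
}
```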
So
But with a parser it's all going to go horribly wrong, because we will need recursive types, and at every parse decision we will double the number of terms in the type. Consider the type signatures of the parser functions
So we will also need recursive types, and be able to do something like:
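A sketch of the recursive union type in question (names assumed): the expression type's variants refer back to the type itself, boxed to give it a finite size.

```rust
// A recursive union type: variants refer back to `Expr` itself.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(l, r) => eval(l) + eval(r),
    }
}

fn main() {
    let e = Expr::Add(Box::new(Expr::Lit(1)), Box::new(Expr::Lit(2)));
    assert_eq!(eval(&e), 3);
}
```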
Is that going to work? Can we infer the recursive types? What does it mean for dictionary passing? |
@keean wrote:
Exactly. Once you introduce polymorphism then without unions you will need to discard statically (in the source code) the My solution isn't only applicable to type parametrized containers (such as |
@shelby3 What do you think about the recursive typing issue in 'parse_expr', I think we may be able to infer it, and then it is a case of working out what dictionaries will be required in the trait-method. I think I get what is going on, and I think a parser and expressions are a good example to work with, as it is the core part of writing a compiler, which is something we are doing to implement the language. I think its time to switch to the inferencing discussion: #5 |
I don't know anything about zenscript, but union types of sufficient power can enable overlapping ADTs and functions that are polymorphic across the exact shape of those ADTs. You probably won't be able to support infinite types (surely not type inference for them), but if you use fixed-point data, you don't need to (though that requires higher-kinded types to be really useful). As an example in present day Scala:

```scala
case class Mul[A](left: A, right: A)
case class Sub[A](left: A, right: A)
type Coproduct[F[_], G[_], A] = Either[F[A], G[A]]
type Expr[A] = Coproduct[Mul, Sub, A]
case class Fix[F[_]](unwrap: F[Fix[F]])
```

You can regain a recursive definition for
As for how all this would translate into zenscript (or not), I'm not sure since I only learned of this project and this issue today thanks to an IM from @shelby3. Good luck Shelby! 😄 |
@jdegoes it's early days, so there's plenty of scope to influence the direction of the project :-) I have started building the compiler in JavaScript, with JavaScript as the first target. Big items to sort out are the syntax and the type system. I have been playing with type systems for a while, especially type inference, so I have a pretty good idea of how to implement it. I think the core feature set at the moment is type-classes and union types. I think we will go with that and try to rapidly get to a first prototype, so that we can start writing programs; probably self-hosting the compiler would be the obvious thing to do. Type inference for recursive types is not a problem in my experience; the type system internally works in the universe of regular trees, and then we convert to 'Mu' types for display. For example:
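As an illustration of the kind of display conversion meant here (an assumed example, not necessarily the one originally given), a recursive list type in the universe of regular trees might be rendered as a 'Mu' type like this:

```
List = Nil | Cons Int List     -- a regular tree, written as an equation
μα. Nil | Cons Int α           -- the same type rendered as a 'Mu' type
```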
@jdegoes wrote:
If I understand your point correctly, then we also realized that, and it is one of the reasons we need union types. That, when combined with a novel way of resolving typeclasses for unions, also enables us to truly solve the Expression Problem, where apparently others haven't, even though they apparently erroneously claimed to.
We want to support a first-class unnamed structural union, to eliminate boilerplate and also so the inference engine can infer structural union types. We are trying to provide what we see is a missing variant in programming language offerings, especially one designed to be compiled to JavaScript (although other targets might be a possibility as well). That is: strict evaluation, no subclassing(!), typeclasses, and first-class structural unions, for greater extension capabilities in a soundly typed language that is also uncluttered with paradigms and thus reasonably simple to grasp. Also a syntax reasonably close to JavaScript. We're distilling from the best features of each language. We are building this because @keean and I decided we wanted to code (or at least experiment) in a language like this. And we see a gap in the market that is not filled. Kudos to @keean on doing the most on implementation thus far. So far, we also have some interest expressed from @SimonMeskens and @svieira (both of whom I recently met via discussions at the TypeScript Github). Thanks John for visiting our nascent project. |
@shelby3 I am closing this issue because I think we are decided that anonymous structural unions should be in the language in some form (at least rank1). |
@keean can you please edit the close to add a Label that indicates a favorable resolution? |
I think I want flow typing and the cleanest solution to |
I think "flow typing" is what I call path-dependent typing, and if so, then yes it is something I want too. Regarding null I think that makes sense. Null is effectively the unit type (void), and can be inferred into any union type. There is a small problem with our type class approach, because normally we would want all types in the union to implement the type classes required by a function, when null will not implement all (if any) of them. The solution is to force the programmer to explicitly provide a type-case for Null if null does not implement the type-class required for a function. This is really a very simple axiom, and in the future we might allow more complex axioms, like inequality constraints on argument types. This will feel almost exactly like dependent typing except we never depend on types at runtime, instead we force the user to statically insert a check value. This is really the combination of "path dependent types" and "refinement types". |
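A Rust sketch of the rule being proposed (names assumed): model `Int | Null` as a tagged union where `Null` implements no type-classes, so the programmer is forced to write the null type-case explicitly before the trait-bounded operation can apply.

```rust
trait Show {
    fn show(&self) -> String;
}
impl Show for i64 {
    fn show(&self) -> String { self.to_string() }
}

// `Int | Null` modelled as a tagged union. `Null` implements no traits,
// so it cannot be handled by the Show-bounded path.
enum IntOrNull {
    I(i64),
    Null,
}

fn show_or_default(x: &IntOrNull) -> String {
    match x {
        IntOrNull::I(n) => n.show(),
        // The explicit type-case for Null that the rule above forces the
        // programmer to provide, since Null does not implement `Show`.
        IntOrNull::Null => "<null>".to_string(),
    }
}

fn main() {
    assert_eq!(show_or_default(&IntOrNull::I(5)), "5");
    assert_eq!(show_or_default(&IntOrNull::Null), "<null>");
}
```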
I notice Ceylon always requires the type parameters, as in:
I am sure we are okay to infer them, but it might cause problems with other Ceylon-like types; we just have to be careful as we expand the type system, and ideally keep it as simple as possible.
As long as not all types can be Null, I'm fine. As far as I'm concerned, no language should have Null at all; a union type or an empty monad is all you need. Implicit nullable types are evil. |
@shelby3 I want to discuss whether to include union types here. I would like to design a language that meets the requirements you have for extensibility, but I don't necessarily want to achieve it with unions. I am not fundamentally opposed to unions, and if they are the best way to achieve what we want then we have them, but in the interests of keeping the language as small as possible, I want to make sure we are adding something of value, that cannot be achieved in other ways. Can you provide some example use cases for union types so we can explore the possibilities?