Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Remove recursive aliases #5155

Open
asterite opened this issue Oct 20, 2017 · 64 comments
Open

[RFC] Remove recursive aliases #5155

asterite opened this issue Oct 20, 2017 · 64 comments

Comments

@asterite
Copy link
Member

Recursive aliases were introduced when we were implementing JSON parsing. We needed a type that could be Nil, Bool, Int64, Float64, String or an array of those same things, recursively, or a hash with values of those same things, recursively. So we decided that having a type that could reference itself in its definition was the way to go.

That works, but it has two disadvantages:

  1. We can't define methods on this new type, because we can't define methods on aliases
  2. The language already has a way to express recursive types!

For example, the current definition of JSON::Type is:

module JSON
  alias Type = Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)
end

We can express that same thing using a struct:

module JSON
  struct Type
    @value : Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)

    def initialize(@value)
    end
  end
end

Or simpler using the record macro:

module JSON
  record Type, value : Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)
end

In terms of memory representation and performance it's exactly the same: a recursive alias is a union, and as such it's represented as a struct.

Now, when working with a recursive alias such as JSON::Type we can simply cast a value to an integer using json.as(Int64). With the struct approach we have to do json.value.as(Int64). But since we don't like those casts, we have JSON::Any in the standard library which wraps the JSON::Type alias. Sounds familiar? We could have simply never used a recursive alias in the first place! Just make JSON.parse return a JSON::Type that is already a struct whose value is the union of possible types (recursively). Want to get the raw value? Call value. Similar to JSON::Any.

That said, I believe we can safely remove recursive aliases from the language. They are not an essential feature, and in fact using a struct is better because we can add methods to them. And removing them will also simplify the compiler's code.

@asterite
Copy link
Member Author

And in fact, this also solves another problem that's a source of confusion and complaint with JSON::Any. Right now we have: JSON::Any#to_a which assumes the raw value is an Array and returns it as such. But this Array is of JSON::Type, not of JSON::Any, so we can't invoke to_a on them.

Using the struct approach, the elements are themselves JSON::Type which will provide these methods. So much simpler and powerful!

@paulcsmith
Copy link
Contributor

Nice thinking. This looks great

@lbguilherme
Copy link
Contributor

At first I immediately didn't like the proposal because I use recursive type aliases a lot in a personal toy project (https://github.com/lbguilherme/rethinkdb-lite) and it will break badly. But looking carefully, this looks great. I wasn't aware that an struct could have a self-reference like this, nice!

One issue I see a lot is taking an Array(String) and turning this into, say, JSON::Type. This is what you need to do now:

x = ["a", "b"]
x.map(&.as(JSON::Type)).as(JSON::Type)

And for the future:

x = ["a", "b"]
JSON::Type.new(x.map {|e| JSON::Type.new(e) })

But JSON::Type could have a constructor taking any Array and doing the map in the constructor, so this would be reduced to just JSON::Type.new(x), which I like a lot.

+1 for this change. The refactoring will result in better code after it.

@asterite
Copy link
Member Author

@Papierkorb @oprypin Care to share why you 👎 ? I'm honestly interested in what you have to say about this (I only used recursive aliases in JSON and YAML and in these cases I think they are not very useful, but maybe there are other use cases I'm missing).

In the meantime, I will try to refactor JSON (and probably YAML) to use this approach and see how it goes, if the API ends up being nicer.

@refi64
Copy link
Contributor

refi64 commented Oct 21, 2017

I wasn't aware that an struct could have a self-reference like this, nice!

This is kind of guesswork, but I think the key here is in the union; IIRC without it, recursive structs wouldn't be supported, since they would take infinite memory... but it works here because the union adds a pointer indirection.

@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 21, 2017

kirbyfan64: It works here because the struct isn't directly self-recursive, it contains itself only by way of Array or Hash intermediaries, both which of do add indirection.

@Papierkorb
Copy link
Contributor

Papierkorb commented Oct 21, 2017

Comments on the proposal text

in fact using a struct is better because we can add methods to them

I don't see this as advantage, actually it's a disadvantage:

I can add methods to them. I don't have to. The issue is that I as reader of source can't know this, and have to subsequently look it up. Reading an alias is fast, it's only a single line with a restrictive syntax. Reading a struct is by very definition more complex. Thus, not being able to add a method if not needed is an advantage.

And removing them will also simplify the compiler's code.

The language serves the usual programmer, not the compiler developer. A compiler is not the common program. I think the presented reasoning is harmful in that it might set a dangerous precedence of throwing stuff out that doesn't make sense for only the compiler, while everyone benefits from it.

I'm also missing discussion of the bigger picture in this RFC.

The purpose of recursive union types

I think we should not discuss throwing out features at random without actually discussing what we actually were trying to achieve in the first place. It is correct that I wasn't there right from the start, so I have to do educated guesses.

What we want is a way to store data of an union type inside a container for that very union type itself. It's basically what JSON::Type is.

Issues with the proposed syntax

The biggest issue is that your proposal changes the whole usage semantics of recursive aliases. The issue isn't even the struct (I regard that as implementation detail). But that I now have to use #value to get the data out of it. This in turn clutters the users code with .value for no gain on their end.

Possible solutions

The current solution is using an recursive alias. You propose making the user manually write a struct instead.

Without knowing the compiler source, I see a few other solutions countering the two main issues: Compiler complexity first, then the verbose struct-as-alisa syntax second:

  1. Allow for recursive structures using the current syntax. The compiler can detect if it's recursive, and if so, rewrite it to a struct internally (Also with automatic #value unpacking). Then the compiler could rely on the existing, reliable struct code emission. Hopefully, this would reduce overall complexity, focusing this language feature in a single place instead of all-over-the-place.
  2. If my assumption in 1. is wrong, at the very least offer a recursive-alias-struct macro, and market it as such.
  3. Or of course, also an option, fix the existing code.

A short use-case study

So what's the use of recursive aliases? At least for me, the following is a common pattern:

def visit_recursively(value : JSON::Type)
  case value
  when String then visit_string(value)
  when Array(JSON::Type) then visit_array(value)
  # And so on ...
end

Observations:

  1. I assume that for many users, value is a popular (as fitting) name for the variable. Having to write case value.value is more confusing than simple case value.
  2. With the struct, you'd actually have to write: case v = value.value. How ugly is this?
  3. A struct isn't a union type of course. So sometimes, case f would be fine, other times case g = f.value is required. Confusing?

Another possible solution

I see one possibility of using the struct solution, while getting rid of the #value spam. However, this would require changes to a different part of the language:

  1. Make Object#as(T) overridable.
  2. Make Object#===(T : Class) overridable.
  3. No change to Object#is_a?.
  4. Alternatively, add Object#like_a?(T), checking if #as(T) exists and if #===(T) would return true. Does appropriate type restrictions in the following body, just like #is_a?.

The algorithm when encountering an (implicit) value type check would now be as follows:

  1. Encounter a case condition checking for a type
  2. Check if Object#=== returns true, and that Object#as(T) exists
  3. Implicitly call #as(T) on the value and retype its type restriction in the inner block as is done today
  4. Else, do all other already existing checks

If #like_a? is accepted, the algorithm would be:

  1. Encounter a case condition checking for a type
  2. Check if #like_a?(T) is true
  3. Do the re-typing

This would allow the user (the stdlib) to write recursive alias-structs without spamming other code with it. This also elevates user types to first class citizens for the compilers type deduction.

Conclusion

This is a hard topic. "Just throw it out" is in my opinion too short-sighted and harmful. In the end, you have to pick your poison: To diverge from what made Crystal great by keeping boilerplate to the absolute minimum, or to make the language more powerful.

@asterite
Copy link
Member Author

@Papierkorb I understand your reasoning.

When I say "it will simplify the compiler" I also meant "it will simplify the language". Removing a feature that is essentially a duplicate of another feature (the way I see it) is better: less things to learn, less ways to do a same thing.

As far as I know, no such feature exists in other programming languages. I'm pretty sure that's because it's not needed.

I personally don't see writing json.value as something that's so terrible. Plus you can have methods to extract the value from it (in fact JSON::Any has this right now) so you can do:

if string = json.as_s?
  # string
elsif array = json.as_a?
  # array
end

and of course you can also do this, as you say:

case value = json.value
when String
  # ...
when Array
 # ...
end

Comparing having to write value = json.value vs. adding a feature that's complex to implement and understand (maybe not for you or for me, but it is tricky), that's bug prone (for example you can define a recursive without a base type and that crashes the compiler, and I don't know yet how to detect that, plus having to repeat the logic of detecting an infinitely expanded recursive alias as a struct) and that simplifies the language (less concepts to learn), I prefer having to write that. I mean, if we remove the feature you will have to write maybe 10 more chars in an entire program. I don't think that's a reason to keep a feature.

In any case, as I said, I'll try to refactor the std to not use recursive aliases at all and see how it looks and feels like. Maybe then I'll have a better opinion on this.

@oprypin
Copy link
Member

oprypin commented Oct 21, 2017

As far as I know, no such feature exists in other programming languages. I'm pretty sure that's because it's not needed.

No, it's because it's not possible. If you removed union types, on the other hand, then it would not be needed. And oh boy how much simpler the compiler would become!

@Papierkorb
Copy link
Contributor

As far as I know, no such feature exists in other programming languages. I'm pretty sure that's because it's not needed.

I don't know if any other static-typed language has this feature, or a feature like this. I just want to point out that this may be a fallacy. I think the type system of Crystal is kinda unique, and with that, unique solutions to problems can (and should!) evolve.

@asterite
Copy link
Member Author

@oprypin I actually believe that removing union types from the languages would be something good. I know it's one of the things that make Crystal unique, but I now think algebraic data types are better, and probably just nullable types (not union of X and Nil). But it's probably too late to change all of that now.

@Papierkorb
Copy link
Contributor

Without union types, I wouldn't be using Crystal at this point. It also is the most popular (most liked) feature when the question "What do you like the most in Crystal?" comes up every other week in the chat channel.

Functional programming is nice. But not that nice. Ruby showed that copying the useful parts from FP while maintaining a OOP language is not only feasible, but actually good. Crystal simply improves upon this, and for good reason.

@yxhuvud
Copy link
Contributor

yxhuvud commented Oct 21, 2017

As far as I know, no such feature exists in other programming languages. I'm pretty sure that's because it's not needed.

Well, one alternative explanation is that it is hitting the boundaries on where the research frontier is.

http://www.cl.cam.ac.uk/%7Esd601/papers/mlsub-preprint.pdf made a fairly large splash when it came earlier this year and is an example of a paper that is investigating the general area of recursive typing and hindley-milner. It has an example implementation at https://github.com/sweirich/hs-inferno - see the tests for examples.

@faustinoaq
Copy link
Contributor

I think without recursive alias y-combinator looks quite difficult to achieve:

alias T = Int32
alias Func = T -> T
alias FuncFunc = Func -> Func
alias RecursiveFunction = RecursiveFunction -> Func

fact_improver = ->(partial : Func) {
  ->(n : T) { n.zero? ? 1 : n * partial.call(n - 1) }
}

y = ->(f : FuncFunc) {
  g = ->(r : RecursiveFunction) { f.call(->(x : T) { r.call(r).call(x) }) }
  g.call(g)
}

fact = y.call(fact_improver)
fact = fact_improver.call(fact)
pp fact.call(5) # => 120

https://carc.in/#/r/2xpy

https://stackoverflow.com/questions/45237446/recursive-proc-in-crystal

I almost got it but failed 😅

record T, value : Int32 do
  def *(other)
    self.value * other.value
  end
end
record Func, value : T -> T
record FuncFunc, value : Func -> Func
record RecursiveFunction, value : RecursiveFunction -> Func

fact_improver = ->(partial : Func) {
  ->(n : T) { n.value.zero? ? 1 : T.new(n.value) * partial.value.call(T.new(n.value - 1)) }
}

y = ->(f : FuncFunc) {
  g = ->(r : RecursiveFunction) { f.value.call(->(x : T) { r.value.call(r.value).value.call(x.value) }) }
  g.value.call(g)
}

fact = y.value.call(FuncFunc.new(fact_improver))
fact = fact_improver.value.call(Func.new(fact))
pp fact.value.call(T.new(5)) # => 120

https://carc.in/#/r/2xq3

WDYT @veelenga ?

@ghost
Copy link

ghost commented Oct 22, 2017

the Y combinator can be used to formally define recursive functions in a programming language that doesn't support recursion.

So why would you write such code in crystal?

@faustinoaq
Copy link
Contributor

So why would you write such code in crystal?

@monouser7dig Not use case at all, just wanted to make a y-combinator 😅 and @veelenga gave me a brilliant solution using recursive alias 😉

@ghost
Copy link

ghost commented Oct 22, 2017

@faustinoaq ok I thought this was supposed to be an argument for keeping rekursive aliases and that surprised me

I have no strong opinion on the recursiveness, I'd just like to have usable typeoptions instead of alias

@shelvacu
Copy link

Thinking about this, as long as performance is the same (it is in this case), I mostly don't care what happens behind the scenes. All I care about is the syntax.

I'm okay with this change:

#before
alias Type = Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)

#after
record Type, value : Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)

But I do not like this change:

#before
def foo(thing : Type)
  case thing
  when Float
    #do stuff
  when Bool
    #do stuff
  #etc..
  end
end
foo(123)

#after
def foo(thing : Type)
  case (t = thing.value)
  when Float
    #do stuff
  when Bool
    #do stuff
  #etc..
  end
end
foo(Type.new(123))

Is there no way to keep the same syntax while changing things behind the scenes?

@asterite asterite reopened this Jun 30, 2018
@asterite
Copy link
Member Author

Should we go forward with this? The standard library doesn't use recursive aliases anymore, and it seems others are finding wrappers like JSON::Any more useful than recursive aliases (cc @straight-shoota).

I can send a PR to remove this feature from the language.

After that, I can probably implement generic aliases (it's much easier once recursive aliases aren't possible).

@RX14
Copy link
Contributor

RX14 commented Jun 30, 2018

If removing recursive aliases gets us generic aliases, i'm all for it.

@sdogruyol
Copy link
Member

Anything for better generics 👍 Let's go for it @asterite 💯

@bcardiff
Copy link
Member

bcardiff commented Jul 2, 2018

I hope recursive alias would come back eventually. I think they are easier to think about them than the indirection of a wrapper struct.

But I understand there is no clean solution right now with recursive aliases and nested values (the now old JSON::Type vs JSON::Any in constructor issue). Since I don't see an easy way to solve that for the time being, I accept dropping recursive aliases.

@bararchy
Copy link
Contributor

I'm with @straight-shoota on this, in our code base the change from JSON::Type to JSON::Any improved the code, the readability and the way we loop around it and parse.

Let's not take a step back, and instead improve upon JSON::Any

@j8r
Copy link
Contributor

j8r commented Jul 17, 2018

What do you think of this gist?
This merges duplicate code of JSON::Any and YAML::Any by using a macro, and add the ability to search keys:

js = JSON.parse %({"a": {"b": "c"}, "e": [{"o": "p"}, 12, 2]})

p js[["a", "r"]]?    #=> nil
puts js[["e", 0, "o"]] = JSON::Any.new "test"
puts js        #=> {"a" => {"b" => "c"}, "e" => [{"o" => "test"}, 12_i64, 2_i64]}
js.delete ["e",0, "o"]
puts js          #=> {"a" => {"b" => "c"}, "e" => [{}, 12_i64, 2_i64]}

If the community like this, I will continue for []= and delete, and then creating a PR.
I really need it. If it can't be in the stdlib it will be in a shard.
Work also for yaml, and #[Enumerable], #[Enumerable]= and #delete(Enumerable) are implemented.

Of course this lead to less efficient code, but we have no ther choice when dealing with dynamic documents - jq is a perfect example.

@straight-shoota
Copy link
Member

straight-shoota commented Mar 20, 2019

Recursive aliases continue to introduce bugs into Crystal code (latest example: #7567). They're buggy and should better not be used at all. We've gotten rid of using them in the stdlib already.

I'd suggest to remove them from the language. It's either that or fixing them.
If we want to try to implement JSON & co deserialization based on recursive aliases (and I don't see we're getting there), it would require fixing recursive aliases first anyway. So we don't really lose anything if we remove them now and maybe decide adding a complete implementation at some point in the future. But the immediate benefits would be a simplification of the language and compiler and keeping people from running into bugs caused by recursive aliases.

@asterite
Copy link
Member Author

I'm all for this and I could do it in under one day, after all it's just removing code. But I'd like @bcardiff and @RX14 to agree about this too.

@jgaskins
Copy link
Contributor

I agree strongly with @Papierkorb's comment here and @shelvacu's comment here in favor of keeping recursive aliases and fixing their implementation. If that's not feasible, I'm with @bcardiff here in that, if they are removed, I hope we can bring them back at some point because of how much easier they make thinking about my program's types.

The struct indirection certainly has a lot of value, which has been shown in this thread, but I don't feel like the recursive alias is just a different way to do the same thing, but more that they are both great for different use cases. The struct is awkward to use in some places where a recursive type does well — this has also been shown in this thread. One thing I've noticed in building apps with Crystal is that if something feels wonky, there's probably a better abstraction you could be using.

@straight-shoota
Copy link
Member

@jgaskins Yes, the only reason to remove recursive aliases is because they're broken and there is no perspective to fix them anytime soon. This is not meant as a decision against having recursive aliases in the language.

@BrucePerens
Copy link

The @asterite method using Struct doesn't work in generics. I can only get it to work in macros. This is as good as I could do this evening.

# Implementation of a pseudo-generic that contains itself without use of
# recursively-defined aliases, which are problematical and/or broken
# in the compiler.
macro recursive_hash(name, keytype, valuetype)
  struct Type_%a
    property value : {{valuetype.id}}
    def initialize(@value)
    end
  end
  
  class {{name.id}}
    @h = Hash({{keytype.id}}, Type_%a).new

    def []=(key, value)
      @h[key] = Type_%a.new(value)
    end
  end
end

recursive_hash(MyHash, Symbol, String|MyHash|Array(MyHash))

h = MyHash.new
h[:itself] = h
h[:array] = [MyHash.new]
p h.inspect

@asterite
Copy link
Member Author

@BrucePerens What's your use case?

@BrucePerens
Copy link

The initial implementation of a shard for generics that can contain themselves is at https://github.com/BrucePerens/recursive_generic .
It's not ready for your use yet.

@rubyFeedback
Copy link

I'd like simple aliases like "def foo; end; alias bar foo"!

@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Apr 16, 2021

@rubyFeedback FWIW this issue is about recursive type types, not methods. E.g. (non recursive example tho):

alias BigInt = Int64

def double(value : BigInt)
  value * 2
end

See https://crystal-lang.org/reference/syntax_and_semantics/alias.html and also https://github.com/crystal-lang/crystal/wiki/FAQ#why-are-aliases-discouraged.

@HertzDevil
Copy link
Contributor

Having worked with the compiler's code I can tell that recursive aliases are a pain to deal with because they are, well, recursive aliases to non-recursive types, instead of non-recursive aliases to recursive types. Proper inference for recursive types is probably needed to avoid false positives like this:

x = 1
y = pointerof(x)
x = Pointer(Int32).null
# Error: recursive pointerof expansion: Pointer(Int32), Pointer(Int32 | Pointer(Int32)), ...

I have had this rough idea where all types in Crystal are recursive during type inference, and their actual types are defined to be the least fixed point of an equivalent alias expression, formed by repeated substitution. This gives:

alias T1 = T1                # => NoReturn
alias T2 = Int32 | T2        # => Int32 | Int32 | ... = Int32
alias T3 = Int32 | Array(T3) # => Int32 | Array(Int32 | Array(...)),
                             #    *not* Int32 | Array(Int32) | Array(Int32 | Array(Int32)) | ...

Every node initially has the type alias T = T where T is unique to that node, and ASTNode#bind_to essentially adds a new type to the right hand side of the above alias as a union variant. This allows us to type the following snippet:

# x : alias T1 = Int32 | T1
x = 1

while rand < 0.5
  # y : alias T2 = Array(T1) | T2
  y = [x]

  # x : alias T3 = T2 | T3
  x = y

  # in a while loop the `x` before the loop must be merged with the last `x` inside
  # x : alias T1 = Int32 | T3 | T1
end

# we have:
#
# T2 = Array(T1) | T2
#    = Array(T1)
#
# T3 = T2 | T3
#    = T2
#    = Array(T1)
#
# T1 = Int32 | T3 | T1
#    = Int32 | T3
#    = Int32 | Array(T1)
#    = Int32 | Array(Int32 | Array(Int32 | Array(...)))

Going back to the pointer example, the type of pointerof(var) is the union of Pointer(T) for every type T ever observed by the variable var. We have:

x = 1                   # x : alias T1 = Int32 | T1
y = pointerof(x)        # y : alias T2 = Pointer(T1) | Pointer(T3) | T2
x = Pointer(Int32).null # x : alias T3 = Pointer(Int32) | T3

# T1 = Int32
# T3 = Pointer(Int32)
# T2 = Pointer(T1) | Pointer(T3) | T2
#    = Pointer(Int32) | Pointer(Pointer(Int32))

The corresponding recursive case is:

x = 1            # x : alias T1 = Int32 | T1
y = pointerof(x) # y : alias T2 = Pointer(T1) | Pointer(T3) | T2
x = y            # x : alias T3 = T2 | T3

# T1 = Int32
# T3 = T2
# T2 = Pointer(T1) | Pointer(T3) | T2
#    = Pointer(T1) | Pointer(T2)
#    = Pointer(T1) | Pointer(Pointer(T1) | Pointer(Pointer(T1) | Pointer(...)))

This requires a complete rewrite of the bind_to logic, but IMO it actually makes type inference easier to reason about, and gives us working recursive aliases at the same time.

@beta-ziliani
Copy link
Member

IIUC, @HertzDevil you're saying that every type is recursive until proven otherwise? Isn't that dangerous from the point of view of performance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests