Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate union types #10693

Closed
agluszak opened this issue Dec 7, 2020 · 19 comments · Fixed by #20027 · May be fixed by #16312
Closed

Deduplicate union types #10693

agluszak opened this issue Dec 7, 2020 · 19 comments · Fixed by #20027 · May be fixed by #16312

Comments

@agluszak
Copy link

agluszak commented Dec 7, 2020

Minimized code

def test[A, B](a: A, b: B): A | B = a                                    
val v0 = test("string", 1)
val v1 = test(1, "string")
val v2 = test(v0, v1)
val v3 = test(v1, v0)
val v4 = test(v2, v3)
val v5 = test(v3, v2)
val v6 = test(v4, v5)

Output

val v0: String | Int = string
val v1: Int | String = 1
val v2: String | Int | (Int | String) = string
val v3: Int | String | (String | Int) = 1
val v4: String | Int | (Int | String) | (Int | String | (String | Int)) = string
val v5: Int | String | (String | Int) | (String | Int | (Int | String)) = 1
val v6: String | Int | (Int | String) | (Int | String | (String | Int)) | (Int | String | (String | Int) | (String | Int | (Int | String))) = string

Expectation

I'd expect all vN values to have type String | Int. Initially I thought that only printing in the REPL was affected, but compilation time also grows exponentially.

@abgruszecki abgruszecki changed the title Union type exponential explosion Deduplicate union types Dec 8, 2020
@abgruszecki
Copy link
Contributor

I'd expect our compilation time to be worst-case exponential anyway, this is just another way to trigger worst-case behaviour. I believe one can do similar things with tuples.

Arguably the example is almost abusive, though we clearly could do slightly better when instantiating unions of type parameters.

@odersky
Copy link
Contributor

odersky commented Dec 26, 2020

Any concrete ideas how to do better that do not increase compilation times for normal cases?

@abgruszecki
Copy link
Contributor

We discussed this during the lab meeting. One thing which seems feasible to do is to "deduplicate" union types when instantiating polymorphic definitions. We could flatten union types, and then compare all union members and remove those that have the same identity. We don't want to do subtype comparisons b/c of perf concerns. The implementation probably should modify the .simplify method for simplifying unions.

@tpolecat
Copy link

tpolecat commented Feb 5, 2021

Another example

scala> ("a","b").toList
val res0: List[String | (String | Nothing)] = List(a, b)

@griggt
Copy link
Contributor

griggt commented Feb 5, 2021

scala> (1,2,3).toList
val res0: 
  List[Int | (Int | (Int | 
    scala.Tuple.Fold[scala.Tuple$package.EmptyTuple.type, Nothing, 
      [x, y] =>> x | y
    ]
  ))] = List(1, 2, 3)

@tpolecat
Copy link

tpolecat commented Feb 5, 2021

ok that is sweet

@soronpo
Copy link
Contributor

soronpo commented Feb 5, 2021

Maybe the examples should also cover commutative and distributive with &.
X & Y | Y & X
X & (Y | Z) | (X & Y | X & Z)

@johnynek
Copy link

johnynek commented Feb 6, 2021

In this example: #10693 (comment) we are also missing, what I expect is a law, that A | Nothing == A. Similarly, A | Any = Any.

@Swoorup
Copy link

Swoorup commented Feb 6, 2021

inline def tupleValues[A <: Tuple]: A = 
    inline erasedValue[A] match
        case _ : (t *: ts) => 
            summonInline[t *: ts <:< A](valueOf[t] *: tupleValues[ts])
        case _ : EmptyTuple => 
            summonInline[EmptyTuple <:< A](EmptyTuple)

case class TradeEvent()
case class OrderEvent()

type Trades = "trades"
type Orders = "orders"
type Channel = Trades | Orders

type EventSelector[C] = C match 
  case Trades => TradeEvent 
  case Orders => OrderEvent 

type SelChannel[C] = C match 
  case x *: xs => EventSelector[x] | SelChannel[xs]
  case EmptyTuple => Nothing

inline def subscribe[Channels <: Tuple](using Tuple.Union[Channels] <:< Channel): List[SelChannel[Channels]] = 
  // val channels: List[Channel] = tupleValues[Channels].toList.asInstanceOf[List[Channel]]
  val channels: List[Channel] = tupleValues[Channels].toList
  val s = List(TradeEvent(), OrderEvent())
  s.asInstanceOf[List[SelChannel[Channels]]]

The line

val channels: List[Channel] = tupleValues[Channels].toList

gives the error

Found:    List[
  Channels match {
    case EmptyTuple => Nothing
    case 
      [h, t <: Tuple] =>> 
        scala.runtime.MatchCase[h *: t, h | 
          scala.Tuple.Fold[t, Nothing, [x, y] =>> x | y]
        ]
  }
]
Required: List[Channel]

where:    Channels is a type in method subscribe with bounds <: Tuple

Wonder that should just work too?

@dwijnand
Copy link
Member

dwijnand commented Feb 6, 2021

scala> Tuple1(1).toList
val res0: List[Int | Nothing] = List(1)

@bjornregnell
Copy link
Contributor

bjornregnell commented Apr 12, 2021

scala> (1,2,"A").toArray
val res3: Array[Object] = Array(1, 2, A)

scala> (1,2,"A").toArray : Array[Int | String]
1 |(1,2,"A").toArray : Array[Int | String]
  |^^^^^^^^^^^^^^^^^
  |Found:    Array[Object]
  |Required: Array[Int | String]

scala> Array[Int | String](1,2,"A")
val res4: Array[Int | String] = Array(1, 2, A)

But:

scala> (1,2,"A").toList : List[Int | String]
val res5: List[Int | String] = List(1, 2, A)

@bjornregnell
Copy link
Contributor

Any concrete ideas how to do better that do not increase compilation times for normal cases?

@odersky Can there be some special "hack" for unions with only two or three or some n < N number of types in the union so that it at least works for the most common cases that humans come up with even if it does not behave for zillions of types? (I'd assume, admittedly without hard empirical evidence, that we mostly create unions of a small number of different types, and a special hack for those could perhaps make it tractable without in general increasing compilation speed?)

@odersky
Copy link
Contributor

odersky commented Sep 4, 2021

@bjornregnell Unfortunately, the "union types are usually small" hypothesis does not hold up. Programs often traverse largish ASTs representing business data; these could have dozens or hundreds of cases.

@bjornregnell
Copy link
Contributor

bjornregnell commented Sep 4, 2021

Aha. Thanks. And I guess my "hack for only small cases"-proposal will not work even for simple cases that humans produce? Like in the examples above...

@odersky
Copy link
Contributor

odersky commented Sep 4, 2021

It's just deeply inelegant to have the rules change at an arbitrary small constant. Also, what should the constant be?

@bjornregnell
Copy link
Contributor

bjornregnell commented Sep 4, 2021

There already seems to be a special case for the constant N = 2, given by my old example here https://contributors.scala-lang.org/t/making-union-types-even-more-useful/4927/17 included here again for convenience (sorry for long code but have not found an elegant minimization):

$ scala -version
Scala compiler version 3.0.1 -- Copyright 2002-2021, LAMP/EPFL
$ scala
scala> sealed abstract class Empty                                                                                                          
      object Empty extends Empty:
        override def toString = "Empty" 

scala> extension [T](x: T | Empty)     
        def get: T = x match
          case Empty => (throw new NoSuchElementException("Empty.get")).asInstanceOf[T] 
          case _ => x.asInstanceOf[T]

scala> val ie: Int | Empty = 42                                                                                                             
val ie: Int | Empty = 42

scala> val ise: Int | String | Empty = 42                                                                                                   
val ise: Int | String | Empty = 42

scala> ie.get                                                                                                                               
val res0: Int = 42

scala> ise.get                                                                                                                              
val res1: Matchable = 42

For n == 2 it gives me a precise type, but for n > 2 it does not. But maybe that's different? (Sorry for naïvely starting to discuss this again even if I have little or bad understanding of the underlying theory; I just find union types to come with the hope of so nice "low boilerplate", but often get me into corners that bring boilerplate back at use site... but maybe that's just the laws of mathematical "nature" that I need to learn to live with :) )

@bjornregnell
Copy link
Contributor

bjornregnell commented Sep 4, 2021

It's just deeply inelegant to have the rules change at an arbitrary small constant. Also, what should the constant be?

I'd vote for a pragmatical N = 22. Deeply inelegant, indeed - but not unheard of. 😄

@odersky
Copy link
Contributor

odersky commented Sep 4, 2021

And I was so relieved that we are finally rid of it! 😜

mucciolo referenced this issue in mucciolo/dotty Nov 8, 2022
OrType simplification enhancements:
- member deduplication without subtype comparisons by the means of object equality/hash code
- extended to member of type AppliedType which could be left unsimplified when in a union (Tuple.Union)
mucciolo added a commit to mucciolo/dotty that referenced this issue May 10, 2023
@LucySMartin
Copy link

Made a basic attempt of this - any feedback would be great:
#20014

odersky added a commit to dotty-staging/dotty that referenced this issue Mar 26, 2024
Simplify had some elaborate condition that prevented hard union types to be
recomputed with a lub. I am not sure why that was. In the concrete scenario
of i10693.scala, we had an explicitly union result type `B | A` where `A` and `B` are
type parameters. So that is a hard union type. Then `A` was instantiated by `Int | String`
and `B` was instantiated by `String | Int`. Re-forming the lub of that union would
have eliminated one pair, but since the union type was hard tyat was not done. On the
other hand I see no reason why hard unions should not be re-lubbed. Hard unions are
about preventing the widening of or types with a join. I don't see a connection with
avoiding re-lubbing.

Fixes scala#10693
smarter added a commit that referenced this issue Apr 4, 2024
Simplify had some elaborate condition that prevented hard union types to
be recomputed with a lub. I am not sure why that was. In the concrete
scenario of i10693.scala, we had an explicitly written union result type
`B | A` where `A` and `B` are type parameters. So that is a hard union
type. Then `A` was instantiated to `Int | String` and `B` was
instantiated to `String | Int`. Re-forming the lub of that union would
have eliminated one pair, but since the union type was hard that was not
done. On the other hand I see no reason why hard unions should not be
re-lubbed. Hard unions are about preventing the widening of or types
with a join. I don't see a connection with avoiding re-lubbing.

Fixes #10693
@Kordyjan Kordyjan added this to the 3.5.0 milestone May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment