Dig deeper on avoidance #8900

Closed
odersky opened this issue May 7, 2020 · 17 comments · Fixed by #14026 or #15746
Labels: area:typer, itype:soundness (soundness bug: it lets us compile code that crashes at runtime with a ClassCastException)

@odersky (Contributor) commented May 7, 2020

#8867 fixed a tricky avoidance problem #8861 by introducing new techniques for preventing closure types from capturing variables they are not supposed to see.

There are several areas where our understanding of this issue is still incomplete. It would be good to follow this up with test cases, code fixes and possibly formalizations, as the subject matter is extremely slippery and the requirements for type inference are extremely stringent.

In the code base touched by #8867, I left several comments starting with

// AVOIDANCE TODO

These should be followed up.
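For readers new to the topic, here is a generic sketch of the kind of problem avoidance solves (this is only an illustration, not the actual case from #8861): a type that mentions a symbol about to go out of scope must be approximated by a type that does not mention it.

class Box[T](val value: T)

val escaped =
  class Local
  new Box(new Local)
  // Naive inference would give `escaped` the type Box[Local], which mentions the
  // out-of-scope class Local; avoidance has to approximate that type instead.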

@smarter (Member) commented Jun 29, 2020

Here's one problematic case:

class Inv[T](val elem: T)
object Test {
  def unwrap[Outer](inv: Inv[Outer]): Outer = inv.elem
  def wrap[Inner](i: Inner): Inv[Inner] = new Inv(i)

  val a = unwrap({
    class Local
    val local = new Local
    wrap(local)
  })
}

The inferred type for a is Local, even though it is out of scope at that point. There are at least two reasons avoidance doesn't work here currently:
https://github.com/lampepfl/dotty/blob/50df4d236bde1f883f07e6bf279320aba7711135/compiler/src/dotty/tools/dotc/core/Types.scala#L4174-L4177

  • We check for TermRef(NoPrefix, _), but Local is a TypeRef(NoPrefix, Local); this is easy to fix.
  • The owner of Local is val a, so the nesting level check doesn't prevent the type variable of unwrap from being instantiated to Local. This one looks trickier to fix without rethinking how we compute nesting levels, suggestions welcome.

(I'm also growing skeptical about the use of nesting levels in general: because nesting forms a tree, a symbol with a smaller nesting level is not necessarily in scope at the current point; it might just be in a different branch of the nesting tree. I haven't tried to write a proof of concept for this yet; see the sketch below for the shape of the concern.)
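To illustrate the concern (this snippet compiles and is not itself a counter-example; it just shows why comparing levels alone is not a scope check): Shallow below has a smaller nesting level than the body of deepPoint, yet it is not in scope there because it lives in a sibling branch of the nesting tree.

object LevelsTree:
  def branchA =
    class Shallow       // small nesting level, but only visible inside branchA
    new Shallow
  def branchB =
    def deepPoint =
      // The nesting level here is larger than Shallow's, but Shallow is still
      // not in scope: it sits in a sibling branch of the nesting tree.
      ()
    deepPoint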

@smarter (Member) commented Apr 28, 2021

Welp, this is going to be trickier than I thought. My plan was to run avoidance on all our open constraints whenever some symbols went out of scope, but I now realize that this isn't enough: these local symbols can have bad bounds, meaning we can derive absurd constraints from them in a way that we cannot recover from using avoidance alone. For example, the following crashes at runtime:

trait Base {
  type M
}
trait A {
  type M >: Int | String
}
trait B {
  type M <: Int & String
}
// In a value of type A & B, the member M has the absurd bounds
// Int | String <: M <: Int & String.
object Test {
  def foo[T](z: T, x: A & B => T): T = z

  def main(args: Array[String]): Unit = {
    val a = foo(1, x => (??? : x.M))
    val b: String = a // ClassCastException
  }
}

Because our constraints successively look like:

?T >: 1
?T >: x.M // because 1 <: x.M
?T >: Int & String // because avoidance picks the upper-bound of x.M, which happens to also be a subtype of its lower-bound

This particular example could be fixed by checking whether the constraints we end up with after typing the lambda entail the constraints we had before, but I'm not convinced this is enough. More radically, we could simply throw away all constraints computed locally, then constrain the result type of the lambda after avoidance to be a subtype of its expected type, to recover constraints that are not tainted by reasoning-via-bad-bounds. This would also avoid the need to run avoidance on all constraints as I originally planned to do.
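To make the shape of that proposal concrete, here is a toy, self-contained model (none of these names are dotty internals; it only mimics snapshotting and restoring a constraint set):

import scala.collection.mutable

final class ToyConstraints:
  private val cs = mutable.Set.empty[String]
  def add(c: String): Unit = cs += c
  def snapshot(): Set[String] = cs.toSet
  def restoreTo(old: Set[String]): Unit = { cs.clear(); cs ++= old }
  override def toString = cs.mkString("{", ", ", "}")

@main def throwAwayLocalConstraints(): Unit =
  val constraints = ToyConstraints()
  constraints.add("?T >: 1")            // known before typing the lambda
  val before = constraints.snapshot()
  constraints.add("?T >: x.M")          // derived while typing the body...
  constraints.add("?T >: Int & String") // ...via the bad bounds of x.M
  constraints.restoreTo(before)         // discard everything derived locally
  // The avoided result type of the lambda would then be constrained against its
  // expected type, re-deriving only constraints that don't depend on x.
  println(constraints)                  // prints {?T >: 1}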

@LPTK (Contributor) commented Apr 29, 2021

val b: String = a // ClassCastException

Why does this even type check? We have ?T >: 1 and ?T >: Int & String (which is equivalent to having ?T >: Int & String | 1). There is nothing ?T could be consistently instantiated to that would allow for ?T <: String, which is required on this line. Indeed, that would imply Int & String <: String (ok) and 1 <: String (not ok).

@smarter (Member) commented Apr 29, 2021

I tried to show that in my description of the constraints involved: we start with ?T >: 1, but then we narrow this to ?T >: x.M (and not ?T >: 1 | x.M as one might expect, because 1 <: x.M). After avoidance this becomes ?T >: Int & String, and instantiation then picks ?T := Int & String. The flaw here is that x.M has bad bounds and was allowed to taint constraints defined outside of the scope where x is visible.

@LPTK (Contributor) commented Apr 29, 2021

Ok, I see.

Conceptually, if you were to go with a level-based approach (which is what I've been experimenting with privately), you should not allow registering a bound x.M mentioning a local symbol x inside an outer-level type ?T. Instead, in such a case of extrusion, you'd introduce an existential and register ?T :> exists x: X; x.M, i.e., ?T :> X#M, from which a lower bound can't be leveraged, according to the theory of sound type projections 😄

Anyway, I know existentials and type projections are being phased out (as opposed to fixed). These are just some intuition nuggets in case they give you new ideas on how to solve this! I think proper type-theoretic existentials (not Scala 2 ones!) are a good way of thinking about type avoidance.

@smarter (Member) commented May 5, 2021

Conceptually, if you were to go with a level-based approach (which is what I've been experimenting with privately), you should not allow registering a bound x.M mentioning a local symbol x inside an outer-level type ?T.

I don't think this is enough in general: it seems to me that the mere existence of a value with bad bounds in scope means that you can taint your global constraints, even if that particular value never appears in those constraints.
For example, imagine that I locally know that x.M >: Int <: String and that I have a global constraint ?T <: String; then in principle I should be able to refine that constraint to ?T <: Int. In practice this doesn't work because we don't have transitivity: if I try to register that constraint, the compiler will check Int <: String and fail to find any evidence for that, but that means we're on really shaky foundations.

Thinking about it more: instead of introducing a new variable with bad bounds, we can also use the GADT logic to refine the bounds of an existing type. My first attempt was:

trait Base {
  type M
}
trait A extends Base {
  type M >: Int | String
}
trait B extends Base {
  type M <: Int & String
}

enum Foo[T] {
  case One extends Foo[String]
  case Two(val elem: A & B) extends Foo[elem.M]
}
import Foo._

object Test {
  def id[I](x: I): I = x
  def foo[T](x: Foo[T], y: T) =
    // I'd like to infer ?I = T but I get ?I = Any
    id(x match
      case One =>
        1 // infer ?I >: 1
      case Two(_) =>
        y: T // infer ?I >: T because we're reasoning with bad bounds
    )

  def main(args: Array[String]): Unit =
    val x: String = foo(One, "")
    println(x)
}

This almost worked: we did end up with the incorrect ?I >: T after typechecking each case, but the last thing we do when typing a match is to assign it a type based on the lub of the types of its cases, so we ended up with the safe ?I >: T | Int.

Finally, I remembered that if the expected type of a match expression is a match type, we type it with that match type instead of the lub of the cases. With that, I managed to produce this monstrosity, which compiles (with one warning about an unreachable case) and crashes with a ClassCastException:

trait Base:
  type M

trait A extends Base:
  type M >: Int | String

trait B extends Base:
  type M <: Int & String

sealed trait Foo[T]
class One[T]() extends Foo[T]
class Two(val elem: A & B) extends One[elem.M]

object Test:
  type Const[X, Y] <: Y = X match
    case AnyRef => Y
    case Two => Y

  def id[A, I](a: A, x: Const[A, I]): I = x

  def foo[T](x: Foo[T], y: T) = // The inferred result type is `T` !
    id(x, x match
      case _: AnyRef =>
        1 // infer ?I >: 1
      case _: Two =>
        y: T // infer ?I >: T because we're reasoning with bad bounds
    )

  def main(args: Array[String]): Unit =
    val x: String = foo(new One[String], "") // ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    println(x)

@smarter (Member) commented May 5, 2021

@Blaisorblade, am I correct to assume that the literature on DOT has not considered the impact of bad bounds on type inference so far?

@LPTK (Contributor) commented May 6, 2021

if I try to register that constraint, the compiler will check Int <: String and fail to find any evidence for that, but that means we're on really shaky foundations

So in that case, the lack of transitivity would be a feature, not a bug :^)

But I see your point about GADT reasoning messing things up in that case. If I understand your example correctly, in the Two case we go from ?I >: 1 to ?I >: T because, within that case, the new bound 1 | T is (wrongly) simplified to T, using the assumption Int <: T that does not hold outside the case. What I would say is that there's an unhealthy mixup of constraints at different levels of reasoning here. But it seems that fixing this would require a pretty extensive change in how GADT reasoning and type simplification work, unfortunately.

If I go back to my levels idea, I'd say that T within the Two case, since it is locally merged with pattern.M, should be considered at the same (nested) level as pattern, and thus it would be guarded from mixing with 1 in the inferred bound, which would be ?I >: 1 | (exists p: Foo[T] & Two; T) or ?I >: 1 | (exists p: Foo[T] & Two; p.M).

@Blaisorblade (Contributor)

@Blaisorblade, am I correct to assume that the literature on DOT has not considered the impact of bad bounds on type inference so far?

I fear you're right; all I can think of is:

  • https://doi.org/10.1145/3371077, which doesn't get that far, but @HuStmpHrrr might know better 😄
  • languages with equality reflection (NuPRL, *PRL with Jon Sterling's languages, Andromeda) would have similar problems if they did type inference, but I'm not sure they try.

@smarter (Member) commented May 6, 2021

Thanks for the references @Blaisorblade!

If I understand your example correctly, [...]

Yes, that's right!

If I go back to my levels idea, I'd say that T within the Two case, since it is locally merged with pattern.M, should be considered at the same (nested) level as pattern

I'm not sure how that would work in practice: if T already appears in constraints outside of the case, then locally changing its nesting level means your constraints aren't valid anymore. On the other hand, you could create a local T1 and use that, but then for GADT reasoning to work you still want any local subtype check involving T to take into account that T =:= T1, otherwise you lose expressivity; and that should lead to the same issues mentioned above with having a general transitivity property. So while I like the level idea, I'm getting more convinced that the "throw local constraints away" proposal I made above is really the only thing we can do here (by the way @LPTK, thanks for taking the time to respond here, it's nice to get a different perspective on this stuff!).

@HuStmpHrrr

I don't really know better. My work on algorithmic (sub)typing is all based on removing subtyping reflection, but indeed, how to handle subtyping reflection is an interesting problem which I would like to come back to someday. It is not clear to me whether, within D<:, subtyping reflection is already undecidable, but with higher-kinded types as in Scala, subtyping reflection itself induces combinatory logic, which is readily undecidable. It would be particularly interesting if there were a syntax-directed algorithm that can solve part of subtyping reflection.

@LPTK (Contributor) commented May 9, 2021

On the other hand, you could create a local T1 and use that

Yeah, this is the kind of thing I end up doing: create a T1 >: T <: T whose bounds are valid and can be refined further, but which is equivalent to T.

you still want any local subtype check involving T to take into account that T =:= T1, otherwise you lose expressivity

Yes, this can be done by replacing T by T1 every time T occurs directly in a constraint while we're at the nested level. We know that T1 is a local alias of T, so as long as we're within a scope where we know T1 exists, this is fine.
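For concreteness, here is the alias trick spelled out in plain Scala (with a made-up trait standing in for the case's scope); the point is just that T1 is forced to be equivalent to T while having its own, refinable bounds:

trait CaseScope[T]:
  // A local alias of T: bounded by T on both sides, so T1 =:= T, yet T1 has its
  // own (healthy) bounds that could be refined further during GADT reasoning.
  type T1 >: T <: T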

the "throw local constraints away" proposal I made above is really the only thing we can do here

But is this sound? I have trouble believing simply throwing away constraints is not going to create problems. For instance, what about something like f => x => { f(x); x }. Are you going to throw away the constraint that says x should be a valid argument for f?

@smarter (Member) commented May 9, 2021

Yes, this can be done by replacing T by T1 every time T occurs directly in a constraint while we're at the nested level.

Can you really? Suppose you have an existing constraint ?R <: T and locally you have T1 := Int, so you can refine that constraint to ?R := Int but that refinement is invalid outside of the current case.

For instance, what about something like f => x => { f(x); x }. Are you going to throw away the constraint that says x should be a valid argument for f?

In present-day Scala this isn't an issue because the parameter types of lambdas are always fully instantiated before typing their body, so yes :). Now if we had #9076 (which was already on shaky grounds) the situation would indeed be more complicated, so this might be a death knell for #9076 (or maybe we can still salvage some of it, I haven't spent much time thinking about it).

@LPTK (Contributor) commented May 10, 2021

Suppose you have an existing constraint ?R <: T and locally you have T1 := Int, so you can refine that constraint to ?R := Int but that refinement is invalid outside of the current case.

So, let's say we have ?R <: T and locally we have T =:= T1 =:= Int. We try to constrain Int <: ?R to refine its lower bound, for instance, which should not work. Upon inserting a bound on the lower-level variable ?R, we first need to make sure the type is at the same level (here it already is, because Int is at level 0), and then I'd say we continue and propagate the bound at the level of ?R, where T =:= T1 no longer holds.

This level update upon going through lower-level type variables makes sense if you think about the fact that { val tmp = f(x); y => ... tmp ... } should type check similarly to { y => ... f(x) ... }, meaning the constraints introduced by f(x), which are independent from y, should not be "captured" as constraints local to the lambda. That is, unless y somehow has an influence on the expression – if we were to conceptually elaborate { y => ... f(x) ... } as something like { y => ... f(x: y.T) ... }, things would be quite different.

In present day Scala this isn't an issue because arguments to lambdas are always fully instantiated before typing their body so yes :)

This is not very reassuring, though. I would not be surprised at all if there were many corner cases where things can still go sideways. It does not seem like a good idea to rest the soundness of the type system on an "accidental" type inference limitation :^)

Now if we had #9076 (which was already on shaky grounds) the situation would be more complicated indeed

Well, I think the current GADT reasoning implementation, its interactions with type inference, and the proposed approach of simply discarding constraints wholesale seem to be on vastly shakier ground than the simple idea of inferring lambda parameter types!

In any case, this discussion raises many interesting questions!

@smarter (Member) commented May 12, 2021

and then I'd say we continue and propagate the bound at the level of ?R, where T =:= T1 no longer holds.

Ah, I see: so constraint propagation for a given variable is basically done in a context where more deeply nested variables aren't visible.

Meanwhile, I've also realized that GADT constraints can propagate outwards incorrectly even without bad bounds being present /cc @abgruszecki :

sealed trait Foo[T]
class One[T]() extends Foo[T]
class Two() extends One[Int]

class Contra[-X]

object Test:
  type Const[X, Y] <: Y = X match
    case AnyRef => Y
    case Two => Y

  def id[A, I](a: A, x: Const[A, Contra[I]]): Contra[I] = x

  def foo[T](x: Foo[T], y: T) = // The inferred result type is `Contra[1]` !
    id(x, x match
      case _: AnyRef =>
        new Contra[T] // infer ?I <: T
      case _: Two =>
        // T := Int
        new Contra[1] // Infer ?I <: 1
    )

Besides the approaches we've discussed above, it's also worth mentioning that GHC solves this problem by having implication constraints (so instead of registering ?I <: 1 you'd register ?T := Int implies ?I <: Int, see p. 35 of https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/jfp-outsidein.pdf).
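As a rough, purely illustrative encoding (nothing here is dotty or GHC internals), an implication constraint keeps the locally derived constraint guarded by the GADT assumption instead of merging it into the global constraint set:

enum Ty:
  case Ref(name: String)                  // e.g. Ty.Ref("T"), Ty.Ref("Int"), Ty.Ref("?I")

enum Constraint:
  case Sub(lo: Ty, hi: Ty)                // lo <: hi
  case Eq(a: Ty, b: Ty)                   // a := b, e.g. a GADT assumption
  case Implies(assumption: Constraint, body: Constraint)

import Ty.Ref, Constraint.*

// For the Two case above, instead of the unguarded Sub(Ref("?I"), Ref("1")) we
// would register the constraint guarded by the assumption T := Int:
val guarded = Implies(Eq(Ref("T"), Ref("Int")), Sub(Ref("?I"), Ref("Int")))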

@dwijnand (Member)

Congrats, @smarter !

@smarter (Member) commented Jun 1, 2022

Reopening as we had to temporarily turn off level-checking to avoid regressions due to interactions with other compiler parts that will take a while to sort out: #15343

@smarter reopened this Jun 1, 2022
@smarter added the itype:soundness label Jun 1, 2022
@Kordyjan added this to the 3.3.0-RC1 milestone Jun 4, 2022
@Kordyjan removed this from the 3.3.0-RC1 milestone Aug 1, 2023
@Kordyjan added this to the 3.2.1 milestone Aug 1, 2023