Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix spurious subtype check pruning when both sides have unions #18213

Merged
merged 5 commits into from
Jul 20, 2023

Conversation

Linyxus
Copy link
Contributor

@Linyxus Linyxus commented Jul 14, 2023

Fixes #17465.

In TypeComparer, fourthTry calls isNewSubType and isCovered to detect the subtype queries that have been covered by previous attempts and prune them. However, the pruning is spurious when both sides contain union types, as exemplified by the following subtype trace before the PR:

==> isSubType (test1 : (Int | String){def foo(x: Int): Int}) <:< Int | String?
  ==> isSubType (Int | String){def foo(x: Int): Int} <:< Int | String?
    ==> isSubType (Int | String){def foo(x: Int): Int} <:< Int?
    <== isSubType (Int | String){def foo(x: Int): Int} <:< Int = false
    ==> isSubType (Int | String){def foo(x: Int): Int} <:< String?
      ==> isSubType (Int | String){def foo(x: Int): Int} <:< String?
      <== isSubType (Int | String){def foo(x: Int): Int} <:< String = false
    <== isSubType (Int | String){def foo(x: Int): Int} <:< String = false
    // (1): follow-up subtype checks are pruned here by isNewSubType
  <== isSubType (Int | String){def foo(x: Int): Int} <:< Int | String = false
<== isSubType (test1 : (Int | String){def foo(x: Int): Int}) <:< Int | String = false

At (1), the pruning condition is met, and follow-up recursions are skipped. However, in this case, only after (1) are the refinement on LHS dropped and the subtype between two identical OrTypes are accepted. The pruning is spurious.

This PR tempers the pruning conditions specified in isCovered and isNewSubType to fix these false negatives.

@Linyxus Linyxus requested a review from odersky July 14, 2023 23:32
@Linyxus Linyxus mentioned this pull request Jul 14, 2023
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this needs a better justification why expensive recursion operations are needed and some reworking to avoid allocations.

if (isCovered(tp1) && isCovered(tp2))

def isCovered(tp: Type): (Boolean, Boolean) =
var containsOr: Boolean = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entails an allocation since variables accessed from local functions are heap allocated

val (covered1, hasOr1) = isCovered(tp1)
val (covered2, hasOr2) = isCovered(tp2)

if covered1 && covered2 && !(hasOr1 && hasOr2) then
Copy link
Contributor

@odersky odersky Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So total, 4 allocations just here, not counting all the lists built in the recursive calls. We should avoid them in subtype checks, where possible.

Techniques do so:

  • Instead of a variable, pass a second parameter to recur.
  • Use an enum instead of a pair of booleans. We need three states: Uncovered, CoveredWithOr, Covered


def isCovered(tp: Type): (Boolean, Boolean) =
var containsOr: Boolean = false
@annotation.tailrec def recur(todos: List[Type]): Boolean = todos match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why recur over a list if types?

case tp: TypeRef =>
if tp.symbol.isClass && tp.symbol != NothingClass && tp.symbol != NullClass then recur(todos)
else false
case tp: AppliedType => recur(tp.tycon :: todos)
Copy link
Contributor

@odersky odersky Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These nested recurrences might turn out to be really expensive. Why do we need them?

Copy link
Contributor Author

@Linyxus Linyxus Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recursion is equivalent to the one before. I refactored it to take a list of types, which is essentially a working list of recursive invocations, to make the function tail-recursive. (e.g. recur(tp.tp1) && recur(tp.tp2) becomes recur(tp.tp1 :: tp.tp2 :: todos)) I thought that this saves the stack and could lead to better performance, but I could revert this refactorization to avoid passing around a list during recursion and creating a new list object at each new invocation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we did not recurse in arguments of AppliedTypes, just in the type constructor. That's the one I would expect to matter most.

Generally, I think using stack is cheaper than allocating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are not recursing on the arguments of the AppliedType: note that the todos is the working list, not the arguments of the AppliedType.

Thanks for pointing this out! I will revert this rewriting, and in the future I'll prefer reducing heap allocations over achieving tail recursion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, I had misread that!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, maybe it's simplest to just use Ints as the result of recur. Then && could be replaced by min.

Copy link
Contributor Author

@Linyxus Linyxus Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just pushed the commit that reverts the tailrec rewriting and uses an integer-based representation for the result of isCovered, interpreted bit-wisely. min is used to aggregate results as you suggested. (I was at first using a bitwise & for aggregation but later found that min is more logical). We use bitwise operations to analyze the results as well.

@odersky odersky assigned Linyxus and unassigned odersky Jul 17, 2023
@Linyxus Linyxus force-pushed the fix-i17465 branch 2 times, most recently from 72405a5 to 2d3b0d9 Compare July 17, 2023 17:31
- Revert tailrec rewrite to reduce allocation
- Use a integer-based representation for the result of `isCovered`
  The results are now aggregated and inspected with bitwise operations.
@Linyxus Linyxus assigned odersky and unassigned Linyxus Jul 19, 2023
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style comments only. Otherwise all LGTM

if (isCovered(tp1) && isCovered(tp2))
def isCovered(tp: Type): CoveredStatus =
tp.dealiasKeepRefiningAnnots.stripTypeVar match
case tp: TypeRef if tp.symbol.isClass && tp.symbol != NothingClass && tp.symbol != NullClass => CoveredStatus.Covered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case tp: TypeRef if tp.symbol.isClass && tp.symbol != NothingClass && tp.symbol != NullClass => CoveredStatus.Covered
case tp: TypeRef =>
if tp.symbol.isClass && tp.symbol != NothingClass && tp.symbol != NullClass
then CoveredStatus.Covered
else CoveredStatus.Uncovered

It's a bit faster since it does not try the other patterns in case of a non-class TypeRef.

if s == Uncovered then Uncovered
else s min that

inline def bothHaveOr(s1: Repr, s2: Repr): Boolean = ~((s1 | s2) & NotHasOr) != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline def bothHaveOr(s1: Repr, s2: Repr): Boolean = ~((s1 | s2) & NotHasOr) != 0
inline def bothHaveOr(s1: Repr, s2: Repr): Boolean = s1 == CoveredWithOr && s2 == CoveredWithOr

Or just inline the rhs where it is used.

else s min that

inline def bothHaveOr(s1: Repr, s2: Repr): Boolean = ~((s1 | s2) & NotHasOr) != 0
inline def bothCovered(s1: Repr, s2: Repr): Boolean = (s1 & s2 & IsCovered) != 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline def bothCovered(s1: Repr, s2: Repr): Boolean = (s1 & s2 & IsCovered) != 0
inline def bothCovered(s1: Repr, s2: Repr): Boolean = s1 >= CoveredWithOr && s2 >= CoveredWithOr

Or just inline rhs at the point of use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you don't need the isCovered and NotHasOr fields. Better not to be too clever with bitsets unless it really gains performance - it's easy to get these wrong, and is generally harder to read than straight comparisons.


val covered1 = isCovered(tp1)
val covered2 = isCovered(tp2)
if CoveredStatus.bothCovered(covered1, covered2) && !CoveredStatus.bothHaveOr(covered1, covered2) then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if CoveredStatus.bothCovered(covered1, covered2) && !CoveredStatus.bothHaveOr(covered1, covered2) then
if (convered1 min covered2) >= ConveredStatus.CoveredWithOr && (covered1 max covered2) = ConveredStatus.Covered then

Then no helper functions are needed, and it's still quite clear, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much simpler and clearer. Thanks!

@@ -2991,6 +2996,33 @@ object TypeComparer {
end ApproxState
type ApproxState = ApproxState.Repr

/** Result of `isCovered` check. */
object CoveredStatus:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be private

private inline val NotHasOr = 1

/** The type is not covered. */
val Uncovered: Repr = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: In this case I'd write the three vals on subsequent lines with // comments to the right. It's more compact taht way and just as legible.

@odersky odersky assigned Linyxus and unassigned odersky Jul 19, 2023
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

private object CoveredStatus:
type Repr = Int

private inline val IsCovered = 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two are no longer needed.

- Speedup pattern matching in `isCovered`
- Make `CoveredStatus` private
- Use direct integer comparison instead of bit operations for better readability
  and maintainability
@Linyxus Linyxus enabled auto-merge July 20, 2023 08:45
@Linyxus Linyxus merged commit 78e7163 into scala:main Jul 20, 2023
@Linyxus Linyxus deleted the fix-i17465 branch July 20, 2023 11:47
@Kordyjan Kordyjan added this to the 3.4.0 milestone Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in -Ycheck:typer when refining String on StringOps defs
4 participants