Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix occurs check bug #20

Merged
merged 14 commits into from
Jul 25, 2024
Merged

Fix occurs check bug #20

merged 14 commits into from
Jul 25, 2024

Conversation

changlinli
Copy link
Collaborator

@changlinli changlinli commented Jul 16, 2024

Quick Summary:

Fixes elm/compiler#2241. See there for SSCCE

  • Zokka: 0.191.0-beta.0
  • Browser: N/A
  • Operating System: NixOS 24.05.git.7dca15289a1cM (Uakari)

Additional Details

Currently WIP. Very very hacky.

@changlinli
Copy link
Collaborator Author

changlinli commented Jul 17, 2024

Copied from some Discord threads.

Here's how I understand what's going on, with the caveats that I still haven't had the time to really go understand the Type.Type.Constraint type, there's still some other holes in my knowledge here, and I'm trying to reverse engineer Evan's intentions from the code.

The Elm compiler tries to minimize the amount of occurs checks that it does, by cleverly taking advantage of Haskell's laziness. Indeed, it's a bit of a misnomer to say that the Elm compiler does an occurs check (even though there is a function called occurs).

The Elm compiler basically goes and performs unification without any occurs check. Without an occurs check we can end up with infinite types. However, the compiler assumes that this is safe because while we might want to unify an infinite type with a finite type, we should never need to unify an infinite type with an infinite type. This is because e.g. it's impossible to write a type annotation with an infinite type, so when comparing an expected type with the actual type, one or the other should be finite.

When performing unification on two types, the compiler recurses on both types. Because one of them must be finite, the recursion must end. Because Haskell is lazy, as long as we don't try to fully recurse all the way down an infinite value, we can have the infinite value lying around without causing any issues.

This means we can fully complete unification without any occurs checks! We still want to check whether we have infinite types at some point, because if we don't even though unification has succeeded, we might have an unsound type system due to the infinite types. But we can choose where to do it.

This means that we've traded needing to perform an occurs check at potentially every single step of unification, for just an infinite type check (in reality the only infinite types we can create are cyclic ones so this is really a cycle detection algorithm) of some types at the very end of the unification process, which can substantially cut down on the amount of checks we do.

And indeed that's why Elm's occurs function only takes in one argument. It's not really an occurs checker, it just checks whether its argument is infinite or not.

We can further optimize this by realizing that there's only a subset of variables/types to check that could plausibly be infinite.

And so as a separate pass after unification, we check for infinite types and if there are any, we "artificially" insert some errors that bubble up ultimately as type errors to the user.
But this relies crucially on the assumption "while we might want to unify an infinite type with a finite type, we should never need to unify an infinite type with an infinite type," which is wrong, as the example in compiler-output-tests/src/BadOccursCheck1.elm (and other issues in the Elm compiler's issue tracker) demonstrate.

The easiest way for it to be wrong (maybe the only way? But I haven't thought about it enough) is for there to be two independent sources of infinite types in an expression.

E.g. in compiler-output-tests/src/BadOccursCheck1.elm's case there were two |> applyIf condition withQuery Msg1 query1 lines (well the second was query2 but that's a minor detail).

If you delete one of those lines the Elm compiler is perfectly happy to spit back a reasonable error, but if both are there then all of a sudden the Elm compiler is trying to unify two infinite types (because again the Elm compiler never actually performs an occurs check despite the function name! It only checks for whether variables are infinite), and then it gets stuck recursing down (and forcing into memory) two infinite data structures.

Now the easiest way to fix this is to just check for infinite types at every step of unification (which is what my rough patch does). However, I believe that this is substantially more expensive than having an actual occurs check, even if the latter also occurs on most unification steps, because the latter is a simple set lookup, whereas the former involves traversing a tree of values every time. And the reason why Elm uses a tree of values rather than some flat data structure to represent all the constraints it needs to solve is I think precisely to support this "occurs-check-less" unification flow.

So to solve this in a principled way, it might be the case that the core data structures of the current unification code need to be rewritten and we need to substitute the infinite type check for an actual occurs check.

I've yet to have the time to run benchmarks though to see how much performance degrades and there's also another way to try to reduce the performance impact of the infinite type check.

Keep track of how far we've recursed, and only do an infinite type check after some number of recursions (say 100) that usually won't show up for correct code.

It's hacky and makes the code harder to follow, but hopefully minimizes perf impact while reducing the amount of code change that needs to be done.

This is a rough diagnosis! I think it's enough for me to merge in this patch if the benchmarks show no big drop in compiler performance, but I have uncertainty and would appreciate anyone else with knowledge of how the Elm compiler does type unification to chime in.

@changlinli changlinli force-pushed the fix-occurs-check-bug branch from 5e4007e to 354aa99 Compare July 17, 2024 21:41
Useful for debugging, but kind of scary because they all hinge on an
`unsafePerformIO`
@changlinli changlinli force-pushed the fix-occurs-check-bug branch from 354aa99 to 9caf2a0 Compare July 17, 2024 21:44
@changlinli changlinli marked this pull request as ready for review July 17, 2024 21:51
changlinli added a commit to changlinli/elm-compiler that referenced this pull request Jul 17, 2024
This fixes elm#2241

Preliminary testing seems to indicate this isn't as bad of a slowdown as
it may seem. For more details see Zokka-Dev/zokka-compiler#20

This is meant as a reference PR for other people working on the Elm
compiler.
@changlinli changlinli merged commit 90425fb into master Jul 25, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

elm make hang then runaway memory usage from orphan process
1 participant