-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix occurs check bug #20
Conversation
Just forcibly insert occurs checks. This is bad and should be fixed.
This should result in a compilation failure, which is not something that test can test for
Copied from some Discord threads. Here's how I understand what's going on, with the caveats that I still haven't had the time to really go understand the The Elm compiler tries to minimize the amount of occurs checks that it does, by cleverly taking advantage of Haskell's laziness. Indeed, it's a bit of a misnomer to say that the Elm compiler does an occurs check (even though there is a function called occurs). The Elm compiler basically goes and performs unification without any occurs check. Without an occurs check we can end up with infinite types. However, the compiler assumes that this is safe because while we might want to unify an infinite type with a finite type, we should never need to unify an infinite type with an infinite type. This is because e.g. it's impossible to write a type annotation with an infinite type, so when comparing an expected type with the actual type, one or the other should be finite. When performing unification on two types, the compiler recurses on both types. Because one of them must be finite, the recursion must end. Because Haskell is lazy, as long as we don't try to fully recurse all the way down an infinite value, we can have the infinite value lying around without causing any issues. This means we can fully complete unification without any occurs checks! We still want to check whether we have infinite types at some point, because if we don't even though unification has succeeded, we might have an unsound type system due to the infinite types. But we can choose where to do it. This means that we've traded needing to perform an occurs check at potentially every single step of unification, for just an infinite type check (in reality the only infinite types we can create are cyclic ones so this is really a cycle detection algorithm) of some types at the very end of the unification process, which can substantially cut down on the amount of checks we do. And indeed that's why Elm's occurs function only takes in one argument. It's not really an occurs checker, it just checks whether its argument is infinite or not. We can further optimize this by realizing that there's only a subset of variables/types to check that could plausibly be infinite. And so as a separate pass after unification, we check for infinite types and if there are any, we "artificially" insert some errors that bubble up ultimately as type errors to the user. The easiest way for it to be wrong (maybe the only way? But I haven't thought about it enough) is for there to be two independent sources of infinite types in an expression. E.g. in If you delete one of those lines the Elm compiler is perfectly happy to spit back a reasonable error, but if both are there then all of a sudden the Elm compiler is trying to unify two infinite types (because again the Elm compiler never actually performs an occurs check despite the function name! It only checks for whether variables are infinite), and then it gets stuck recursing down (and forcing into memory) two infinite data structures. Now the easiest way to fix this is to just check for infinite types at every step of unification (which is what my rough patch does). However, I believe that this is substantially more expensive than having an actual occurs check, even if the latter also occurs on most unification steps, because the latter is a simple set lookup, whereas the former involves traversing a tree of values every time. And the reason why Elm uses a tree of values rather than some flat data structure to represent all the constraints it needs to solve is I think precisely to support this "occurs-check-less" unification flow. So to solve this in a principled way, it might be the case that the core data structures of the current unification code need to be rewritten and we need to substitute the infinite type check for an actual occurs check. I've yet to have the time to run benchmarks though to see how much performance degrades and there's also another way to try to reduce the performance impact of the infinite type check. Keep track of how far we've recursed, and only do an infinite type check after some number of recursions (say 100) that usually won't show up for correct code. It's hacky and makes the code harder to follow, but hopefully minimizes perf impact while reducing the amount of code change that needs to be done. This is a rough diagnosis! I think it's enough for me to merge in this patch if the benchmarks show no big drop in compiler performance, but I have uncertainty and would appreciate anyone else with knowledge of how the Elm compiler does type unification to chime in. |
5e4007e
to
354aa99
Compare
Useful for debugging, but kind of scary because they all hinge on an `unsafePerformIO`
354aa99
to
9caf2a0
Compare
This fixes elm#2241 Preliminary testing seems to indicate this isn't as bad of a slowdown as it may seem. For more details see Zokka-Dev/zokka-compiler#20 This is meant as a reference PR for other people working on the Elm compiler.
Quick Summary:
Fixes elm/compiler#2241. See there for SSCCE
Additional Details
Currently WIP. Very very hacky.