Why the compiler is slow and consumes a lot of memory #4864
Restrictions on argument and return types look to me like a good deal for compile times and for the predictability/readability of code. It doesn't touch the union types feature.
I really like idea (2). An argument type being not the actual type but just a restriction was always counterintuitive to me (e.g. my first issue #2977 appeared partially due to this misunderstanding: I thought that if I pass …)
Mandatory return types are not bad at all. Type unions really deliver the "dynamicness" feel of Crystal. I guess people are interested in Crystal not because it's a Ruby that runs fast, but because it's a static language with the sexy syntax of Ruby. I personally type everything in my Crystal scripts, including arguments, and I value the safety and readability it gives more than the performance.
Mandatory argument and return types for methods seem like a good choice, and not only for performance reasons.
Modern C++ has type inference for return types. It's needed because some types are hard to write by hand (or can only be determined at compile time).
@asterite First of all, thank you for performing some research and confirming what we all suspected: the largest problem with scaling Crystal is the combinatorial expansion of method instantiations.

Second of all, I want to point out early in this discussion that the Crystal compiler and language work fantastically well for a lot of people as-is. I mainly code small, self-contained, useful services in Crystal which solve small problems. These are an absolute dream to code in Crystal, because you can write your entire application with very few type restrictions and take full advantage of the power of the Crystal compiler. And they all compile in around (or less than) a second. I don't want this to change, not only because I want to continue coding my applications without thinking too much about types, but because beginners do this too. I think we should be careful that any changes we make don't drive away newcomers by making the language feel too rigid and "Java-like". In addition, the fact that Crystal works fantastically well for most people as-is gives us a lot of time to get the changes that keep large apps from going exponential exactly right. We shouldn't rush into any large-scale changes to the language without being sure it's the right decision.

I believe that solution 2 (and perhaps some small non-breaking optimizations around generics) is very powerful and will solve this problem with almost no major changes to the "feel" of the language. My hypothesis is that modules (parts of the stdlib, shards, etc.) will naturally form strongly typed interfaces between themselves simply by documenting their API. Providing type annotations on a public API is something that should be done already: it makes the documentation much easier to read, and it creates better errors. By annotating a method's type signature, you not only give useful documentation to yourself and anyone else using the library, you tell the compiler enough information, most of the time, to instantiate your method only once.

The problem right now is that the compiler can't take advantage of that information, because it can only do it "most of the time". Arguments are not cast to their type restriction when they arrive in a method. This allows you to write code where a method will only correctly compile for some subset of the types which are actually covered by the type restriction. This is genuinely confusing behaviour which we should consider changing, whether or not it gives us a massive performance improvement (and I think it gives us a massive one). By changing the semantics so that arguments are essentially cast to their restriction on entry, an annotated method only ever needs to be instantiated once.

However, what we shouldn't do is enforce that such type restrictions exist. That makes the language much less flexible, and in my view is entirely unnecessary. If the public API of a logical "module" of code is annotated with type restrictions (and it doesn't even need to be 100% coverage, only 90%), we still vastly reduce the number of possible method instantiations on the external API. This in turn vastly reduces the number of ways internal methods without any type restrictions are instantiated. Instead of a single hunk of code whose instantiations can explode combinatorially, you get many smaller hunks that are typed independently behind annotated boundaries.

I believe this approach preserves all that I love about Crystal, the dynamic and exciting coding style, while solving the large-program problem. All while strongly encouraging good documentation by tying it to compile-time performance.
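A minimal sketch of the confusing behaviour described above, with hypothetical `Animal`/`Dog` types (my own illustration, not from the original comment): the restriction only filters which calls match, while the body is typed against the concrete argument type.

```crystal
class Animal
  def name
    "animal"
  end
end

class Dog < Animal
  def bark
    "woof!"
  end
end

# The restriction says "any Animal", but the body only compiles for the
# subset of Animal's subtypes that respond to #bark:
def describe(x : Animal)
  x.bark
end

describe(Dog.new)      # compiles: inside this instantiation, x is typed as Dog
# describe(Animal.new) # compile error: undefined method 'bark' for Animal
```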
That's what I was thinking as well: especially for libraries it is good practice to specify argument and return types, because they usually expect something specific. All-over duck typing can only go so far when no instance variables are involved. Even in the stdlib it's sometimes difficult to figure out what return type you can expect from a method, or what type an argument can be, without looking at the source code. In some cases it is actually documented in the doc text, but that information should really be in the code. It helps so much to understand exactly what you can do with a method.
That's because the stdlib is badly documented :)
Well:
I wonder if, instead of memoizing a duplicated method body over and over again, the body could be typed only to determine its argument/return types, then let go and collected, memoizing only the argument/return type pairs (which could be simplified), and eventually regenerating the method body during LLVM IR code generation (or not, if the types haven't changed since the last compilation).
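A rough sketch of that idea (all names here are hypothetical, not actual compiler internals): type a method body once per distinct argument-type tuple, then keep only the signature pair in memory.

```crystal
# Hypothetical cache: (method name, argument type names) -> return type name.
class SignatureCache
  def initialize
    @cache = {} of {String, Array(String)} => String
  end

  # The block simulates the expensive part (typing the cloned method body);
  # it runs only on a cache miss, and afterwards only the signature is kept.
  def fetch(name : String, arg_types : Array(String), &block : -> String) : String
    @cache[{name, arg_types}] ||= block.call
  end
end

cache = SignatureCache.new
cache.fetch("double", ["Int32"]) { "Int32" }  # body typed once
cache.fetch("double", ["Int32"]) { "unused" } # served from the cache
```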
Option 1 doesn't seem bad to me; subclasses need to return compatible types anyway. Here's my question, in this case:

```crystal
class Super; end
class Sub < Super; end

class Foo
  def meth
    Super.new
  end
end

class Bar
  def meth
    Sub.new
  end
end

Bar.new.meth # Super or Sub?
```

If it's …

Option 2 is great. IMO it should've been this way from the beginning.

Option 3... I feel like this kills one of Crystal's main points: the ability to prototype easily and throw on type annotations when you need them. I think maybe it'd be worth first checking how the language composes anyway.
But how does this "compiling inefficiency" translate into numbers? Is the memory consumption something that grows exponentially with the project, to the point where it would consume infeasible amounts of RAM? I've written half a dozen "micro" projects in Crystal (the largest has only about 2k lines of code), and I've never experienced that much RAM usage while compiling. My question is straight to the point: if a project reaches 10, 20, 50, 100k LOC, will the memory consumption follow it linearly (or even stabilize after a certain amount), or could there be a point where Crystal would simply require too much memory (hardware-deterrent) to be acceptable? If the memory consumption is not something that will eventually reach blocking hardware usage as the project grows, I honestly don't see an issue at all. If you want to run your project binaries on a low-end machine (less than ~1GB RAM), you can always replicate the production environment on a build machine and then port the binary to production. In fact, you could even use Vultr/DigitalOcean VMs to do the build and simply kill the VM afterwards; it would cost just a penny per production build.
@kazzkiq Right now many PRs to crystal fail on CI because the CI machines apparently don't have enough memory to compile all the specs. There is a partial solution to this (splitting the specs into two parts), but it shows that memory consumption isn't going to stabilize.
So, my two cents (and use case): as opposed to the majority of Crystal users, who usually make micro-services or webapps, I use Crystal for machine learning and data analysis. My projects get big, and fast; the usual size is 40K lines and more. Compiling takes time, and memory, especially when using …
The compiler could follow two separate modes (like JavaScript does with the "use strict" pragma). The first and default one allows global type inference; the second doesn't. It's common practice to distinguish the requirements of small apps, libraries and big projects using pragmas, etc.
@akzhan That could work, and could be applied to specific methods instead of all methods? But then again, it's basically …

I think option 2 is best; it's how it works in essentially any sane statically compiled language, and it's how it should've been from the start. Option 3 is fine for me too, but it would most definitely break a lot of stuff and be too "rigid" compared to now, I guess; I don't have too much of a problem with that though, coming from C, C++ and Java.
@totakaro yes, it is, but enabled by a pragma in the source code instead of a compiler option. There are lots of reasons to set up compiler behavior from the source code, and it's already widely used by programming languages. See Ruby's pragmas for UTF-8 source encoding and frozen string literals, Perl's utf8 and strict, JavaScript's 'use strict', etc.
@akzhan most of those are hacks to be able to remove unwanted and unneeded behaviour while keeping backwards compatibility. If those languages had been well designed in the first place, such hacks wouldn't be required or wanted. We don't have the shackles of backwards compatibility, so we shouldn't have multiple versions of the crystal language.
@RX14 sure, we don't want to repeat the Python 2 vs 3 story.
@RX14 Anyway, we can't provide good shards/modules and large apps without strictures on type inference. No way. And I should note that it's not a backward-compatibility question. It's required because we have two different target audiences.
@akzhan There always exists a way to solve things 😄
@faustinoaq yes, it is. Something may be solved by precompiling ASTs (the parser phase). But type inference should be simplified, simply because right now it's hard on the compiler and not uniquely determined.
I don't follow at all. I believe we should enable the compiler to take advantage of annotated types to compile much faster, but the trade-off between compile time and annotating types should be entirely up to the user. I do not believe we should force anything on anyone. We don't have to: the practicalities around compile times will force them to if their shard gets popular enough.
I tried to work on this but failed miserably. I won't have time to play with this, but if someone has, I suggest cleaning up and refactoring the compiler code because it's a bit messy. It also changes the semantics of the language, so it's a breaking change.
Thanks for the report @asterite! I've been thinking about this one in the back of my head for some time, though I haven't touched the code in months. My baseline of thought has been: a technical solution, not a spec-changing solution. The idea has mainly been to separate the "pure" AST nodes from the AST-node annotations / semantics (typing). The thoughts are also founded in the wish to come up with a way of keeping the program structure cached in a compilation server (as has also been discussed before), which obviously requires keeping its space requirements down, and to mutate only the parts of the trees affected by source changes. If that was achieved, a higher cycle cost would be offset by limiting the extent of the work. Those are all big ifs. And much work. And much added complexity to the compiler.

I routinely type-restrict or use explicit generics for params, and restrict the return type always, since catching errors at function boundaries leads to much faster development and far fewer bugs. But the reason I love the "implicit genericness" in Crystal vs. having to template it explicitly in C++ is that you can start off with PoC code and algorithms and even script-like code first: quickly throwing it away, quickly rewriting, and then tightening it up when happy with the concept. Writing entire programs without typing func params/returns is very bad practice imo (with no benefits whatsoever from what I've seen in studies and my own work). But it still should be fast XD
@asterite does that mean you think solution 2 is the way to go? Do you have any more comments/ideas about the ideas in my comment? Regarding the compiler change, it would be great if you could give a high-level overview of the kind of refactoring or other work required. To my uninitiated mind (I haven't worked on the typing code in the compiler) it's as simple as: cast all the restricted params of a function, and attempt to cache instantiations. What am I missing?
@RX14 This comment explains one of the difficulties of implementing this: method restrictions aren't typed immediately; they are typed over and over when trying to match a method call. So it's impossible to "cast the arguments to the restrictions", because when you match an argument to a method, the type that you get is the restricted one: the original one, dictated by the parameter, is lost in the matching. This gets more complicated with union types, and also with things like:

```crystal
def foo(x : Enumerable)
end
```

which right now works because the argument keeps its concrete type (there is no concrete `Enumerable` type to cast it to). Maybe these small details need to be discussed, decided, and tackled one by one, slowly improving the language specification and its implementation, tackling (and thinking about) cases like the one above.
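To illustrate with my own example call (not from the original comment):

```crystal
# With the definition above, this works today: inside foo, x is typed as the
# concrete Array(Int32). There is no bare `Enumerable` type to cast x to,
# which is what makes "cast the arguments to the restrictions" tricky here.
foo([1, 2, 3])
```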
We should not give up, and we should not allow the problems to defeat us. We all sometimes feel frustrated. But WE CAN ONLY LOSE IF WE ACCEPT DEFEAT.
I know nothing about the internals, but I am wondering if there is a way to incrementally cache types per file, and invalidate the cache if a file's hash changes.

- file A → cache: none
- file X → cache: `double(x : Int32) : Int32`
- file Y → cache: `double(x : Int32) : Int32`, `double(x : Float64) : Float64`
- file X, Y change → cache: no change

Any opinion, @asterite?
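A tiny sketch of that invalidation idea (the cache shape and its contents are assumed for illustration):

```crystal
require "digest/sha256"

# Hypothetical per-file cache: content hash -> cached method signatures.
cache = {} of String => Array(String)

source = <<-CR
  def double(x : Int32) : Int32
    x * 2
  end
  CR

key = Digest::SHA256.hexdigest(source)

# Recompute only on a hash miss; an unchanged file keeps its cached entry.
signatures = cache[key] ||= ["double(x : Int32) : Int32"]
puts signatures
```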
@S-YOU I guess crystal doesn't have an incremental cache yet 😅
Sure, just send a PR 👍
@S-YOU Currently crystal has some LLVM-level cache for the generated object files: (1) is the amount of files processed, and you can see these files in the Crystal cache directory. The kind of cache you're asking about is, I think, not possible in crystal yet; as @RX14 said on gitter:

https://gitter.im/crystal-lang/crystal?at=5ab01c665f188ccc15d3df76

So, I think we need a bit more research on crystal's type system, or to experiment with some alternatives... 😉
I am wondering whether a type cache could improve the "Semantic (main)" part. If there were a type cache for most methods, I'd expect the time taken (and memory usage?) to be reduced quite a bit. I am also thinking that reconstructing the AST for the whole program is OK, as long as there is a type cache for each file that didn't change. Let me know if I am wrong.
@S-YOU Oh, I understand you now 😅. As far as I know, crystal currently doesn't have a cache for the semantic phase; that's why "sometimes" the compiler is slow and consumes a lot of memory, even if you have compiled previously 😅

As I said before, we need a bit more research on this, because crystal's type system has a "special" type inference that applies some rules to get types, so if you change one file, the whole AST for all files gets modified, and we can't apply "safe" incremental compilation or caching for the semantic phase yet. Well, I think at least we could have a semantic cache, just like …
I am wondering whether a modification can always be treated as the addition of a new type? And if there are only additions, is an incremental type-annotation cache possible? Please let me know if I am missing something.
Caching is a good idea long-term, but all of @asterite's proposals here seem to be aimed at reducing the number of instantiations instead of caching them. That's probably because, in the short term, that's easier to do than caching.
There's no problem with the idea of caching; many languages do that. The problem is who is going to implement it, and how. But the answer is simple: nobody and never. Nobody sends PRs to fix easier stuff in the compiler, so I doubt somebody will implement such a hard task as a cache. I still don't understand the rush to get to 1.0. I'd rather Manas used the money they get to rewrite the semantic and codegen phases from scratch with readability, performance and flexibility in mind. Another way to say it: I don't think anything significant will happen before 1.0 (or, well, 1.0 will never be reached). In my mind, 1.0 means a compiler rewrite from scratch.
@asterite please, such statements are IMO highly discouraging. Perhaps you're right, but why close the door someone might want to go through?
Yeah, sorry. I'm sure someone will come up with an excellent understanding of the compiler's code and soon send a PR to fix this and other issues.
I disagree that it's "nobody and never", but I certainly would expect it to happen after 1.0. Compiler performance is something that doesn't need to be improved before 1.0.

People want to be confident in using crystal in production, and 1.0 means they can sell that to management, to their coworkers, and to themselves.

And nobody really cares about the compiler enough that they'd rather see their funds and time go towards rewriting it instead of getting important features like Windows support and parallelism working.
@RX14 and @asterite Both views have some truth to them 😉 |
Sigh. To improve my confidence in pushing Crystal into production (for anything serious) I need one thing: a team of people actively working on the compiler, not 2 individuals fixing a few bolts now and then. The compiler is "good enough" and "does the job", sure, but whatever the quality of the stdlib, Windows support or even parallelism, can anyone say that the compiler is stable and solid enough to be considered 1.0-grade? Please, be serious. We need developers to dig into the compiler, understand its internals, and actively work on it, which is a tedious task.
Yeah, nothing about those notions is actionable. Saying "the compiler sucks but no one will fix it" adds nothing and harms the community and morale.
@ysbaddaden we will have @bcardiff from Manas working full-time on crystal in 2 weeks' time, after he gets back from holiday. I think he'll have time to do compiler work, though I'm not too sure if he's planning to (I think parallelism is more important because that's a big breaking change).
Very much understandable, but I believe many people on earth think that compilers are not easy stuff (without looking at the crystal source code, of course).
Yeah, parallelism and thread safety are important in my opinion too. I don't have any problem with compiler performance; my packet-processing programs compile in under 5 seconds (in release mode). I love the language, and while digging more I found this thread and commented some ideas I could come up with, that's all. Cheers.
@asterite if you want more people to contribute to the compiler, more information should be shared/documented about its internal workings. I bet you there are many who would like to jump in to improve the compiler; it just feels like a very closed box.

The community is eager for v1.0 and we are conscious of the effort that is needed, but there is no insight into direction/sequence from the core members (think roadmap, designs, plans, decisions made). We know that you would like a rewrite, but there is no plan/strategy to get there. There are 10k people following this project, and 1% of them will most likely contribute to get this done.

I propose the following: document your intentions for the compiler (what you would like to see and how you think it should work) and write it down; break it down into small workable tasks; treat it as a project (milestone) and share it with the community, and you will see PRs flowing in. You cannot be the only one working on this; it's going to take a toll on you and your health. You have the support of an entire community that wants to see Crystal at v1.0.

Let me know if I can help with any of the above.
@eliasjpr Nice idea 👍. I remember @makenowjust wrote a small guide about how to implement a new feature in the compiler: #4757 (comment). See: https://github.com/crystal-jp/techbookfest2/tree/master/makenowjust (Japanese). It isn't in English, but we could try to translate it as a starting point 😉
Interesting post here https://github.com/nim-lang/Nim/issues/7874 😉
One solution would be to store the id of a top-level definition (block) together with an AST checksum, a binary checksum, and all the depending ids for this particular id. If a block is in AST form, calculate its checksum and compare it with the AST checksum for the given id in the cache; if they match, take the binary from the cache; if not, recompile. I'm really skeptical how much intra-modular caching is worth as opposed to inter-modular caching, though.
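A rough sketch of such a cache entry and the lookup (all names and shapes are assumed for illustration):

```crystal
# One cached entry per top-level definition id.
record CacheEntry,
  ast_checksum : String,
  binary_checksum : String,
  dep_ids : Array(Int32)

# Reuse the cached binary only while the AST checksum still matches;
# otherwise this block (and anything depending on it) must be recompiled.
def reusable?(cache : Hash(Int32, CacheEntry), id : Int32, ast_checksum : String) : Bool
  if entry = cache[id]?
    entry.ast_checksum == ast_checksum
  else
    false
  end
end

cache = {} of Int32 => CacheEntry
cache[1] = CacheEntry.new("abc123", "bin456", [] of Int32)
puts reusable?(cache, 1, "abc123") # => true
```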
Why not add a compiler option to enforce typed methods? Say we name this "IDE mode", where productivity and robustness are valued, and otherwise we prefer the default mode for prototyping in the REPL or for one-person projects.
@bararchy any idea or guess how Crystal LOC relates to memory and/or release compilation time? How many GB to compile 20k LOC or 80k LOC? Is there some kind of predictable-ish graph? Some bigger software projects like databases are approaching 1M LOC. And some even bigger projects like WebKit have millions of LOC. Would a bigger project like that even be viable with Crystal, given its current memory usage (and compilation time)?

Also, I saw the comment above from May 3, 2018 about how it's unlikely that the Crystal compiler will change for the better in terms of resource usage and speed, and how maybe even a rewrite is a better option. It's now over 2 years later. Has anything changed? Are faster compile times coming, or even on the horizon?
@simonhf we have some improvements in mind that might come after 1.0 is released, but we're aiming for correctness first. Once 1.0 is released, be sure that we'll focus at least some part of our time on increasing the performance of the compiler. The time it currently takes to compile a project in release mode is mostly due to running LLVM optimizations. If you execute the compiler with `--stats` you can see how the time is split across the compiler phases.
Regarding the question about a relation between LOC and memory and speed: I doubt there can be any meaningful generalization or even a rule of thumb. Compiler performance does not exactly relate to the amount of code being processed. You can build three different projects with 10k LOC and they'll likely show very different results, depending on specifics of the respective program.
So, I tried to reduce the memory consumed by the compiler. The first thing that came to my mind was to clear up the bindings between nodes after the semantic phase and then call `GC.collect`. A binding between AST nodes is this: if a node `x` is bound to a node `y`, whenever `y`'s type changes, `x`'s type is updated. This is used in the type inference algorithm.

It turns out that by doing this I was able to free about 50MB of the total of 1GB consumed by compiling the compiler. This is insignificant and will not help reduce memory usage. My hope was to reduce the memory consumed before the codegen phase, so no more memory would need to be allocated.
So, the next thing is seeing what the compiler is doing with our code. Why isn't more memory freed by calling `GC.collect`? And why is so much memory consumed? The compiler isn't that big, after all.

To investigate this, I added some quick and dirty code to show what's going on. To understand the output we need to understand how the compiler works. It's not hard.

When we have a method like this...
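Presumably an untyped method along these lines (the exact body is assumed):

```crystal
def double(x)
  x * 2 # no restriction: x can be anything that responds to `*`
end
```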
...Crystal doesn't compile anything yet. Only when you invoke the method will it be instantiated with the argument types. So if you invoke `double(1)`, a method `double<Int32>` will be instantiated and typed (for method calls inside that method we do the same). If you then invoke `double(1.5)`, a method `double<Float64>` will be instantiated, and so on. We call each of these an "instantiation" of a method, similar to how you instantiate a generic type with concrete types, because every method in Crystal is like a "generic" method. An instantiation means cloning the AST nodes for the method, type-checking them, and keeping this type-check result around until codegen (this consumes memory that can't be freed).

This gist shows the list of all types and methods instantiated by compiling the compiler. Now we can understand why so much memory is consumed:
- `Array(T)` methods are instantiated over and over again for each different `T`. I think this is called "monomorphization". It's good because the compiler will generate specific, optimal code for each `T`. It's bad because if `T` is a reference type (represented as a pointer), then the generated code, at least for an `Array` or a `Hash`, is the same, so there's no point in doing this. Also, note that methods for `Pointer(T)`, a very low-level type, are instantiated over and over again.
- `raise` in `Program` (the top-level). There's one instantiation for each `Exception` type that we pass to it. We can also see this happening in `IO#<<`, `Array#push`, etc. Methods in Crystal usually don't have a restriction, so the compiler will instantiate them once for each argument type. And even if you write, for example, `def raise(e : Exception)`, here `Exception` acts just as a restriction: one method will still be instantiated per given type.
- Methods are instantiated once per receiver subclass, for example all over the `ASTNode` hierarchy. What this means is that if we have `Bar < Foo` and a method only defined on `Foo`, say `#foo`, but we do `Bar.new.foo`, `foo` will be instantiated for `Bar`. We'll have two different instantiations: `Foo#foo` and `Bar#foo`. Now, this doesn't happen if we do `(Foo.new || Bar.new).foo`: in this case the receiver is `Foo+`, and just in this case only `Foo+#foo` will be instantiated (which is different from `Foo#foo`!). This optimization can be done because the instance variables of `Foo` come before the ones of `Bar`: they share the same "initial" structure.
- When a variable is typed as a module (like `IO`), the module is expanded to its including types, and when you call a method on it, the method is instantiated for each including type. That's why you can see `to_s` being instantiated for each and every `IO`. And here we couldn't even try the optimization we do for virtual types to generate a single method (the `Foo+#foo` case), because these including types might not share the same instance variable order/structure. This is one reason we'd like to prohibit variables of `module` type and introduce the concept of an interface. For this reason it's probably better to make `IO` a `class` instead of a `module`.

There's a lot of redundant information and processing done here. This results not only in bigger compile times and memory consumption, but also in code bloat in the generated executable.
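To make the first point concrete (my own illustration, not from the issue):

```crystal
# Each distinct T instantiates Array(T)'s methods all over again, even though
# for reference types like String the generated code is essentially the same:
ints    = [1, 2, 3]   # instantiates Array(Int32)#push, #each, ...
strings = ["a", "b"]  # instantiates Array(String)#push, #each, ... again
floats  = [1.5]       # instantiates Array(Float64)#...

ints << 4
strings << "c"
puts floats.sum
```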
This is not a rant: I'm explaining this so the team at Manas, core contributors, and anyone who reads this can brainstorm ideas to improve this situation.
Some ideas I have in mind are:
1: When we have `Bar < Foo` with `Foo#foo` and we invoke `Bar#foo`, turn the receiver into `Foo+` and use that. Similarly, if we have `Foo#foo`: turn it into `Foo+#foo`.

Now, this is a breaking change: basically, invoking `foo` will always result in a dispatch and in the broader type. Maybe it's not a big issue.

2: Cast arguments to their type restrictions. This means that if I write `def foo(x : Foo)` and I pass `Bar < Foo`, `Bar` would be cast up to `Foo+`. Or, well, following change 1, maybe there won't be a difference anymore between `Foo` and `Foo+`. That way the method `foo` will only be instantiated once. Like change 1, this is a breaking change, for more or less the same reasons.
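A small sketch of what ideas 1 and 2 amount to (hypothetical `Foo`/`Bar`, my own illustration):

```crystal
class Foo
  def foo
    "from Foo"
  end
end

class Bar < Foo
end

# Today Foo.new.foo and Bar.new.foo produce two instantiations (Foo#foo and
# Bar#foo). Under idea 1 both receivers would be widened to the virtual type
# Foo+, so only a single Foo+#foo is instantiated, at the cost of a dispatch.
x = rand < 0.5 ? Foo.new : Bar.new # x : Foo+, just like (Foo.new || Bar.new)
puts x.foo
```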
3: Make type restrictions mandatory. So, for example, we can't write an unrestricted method anymore: we must write it as an explicit generic, or choose one type (or a union type, etc.), as sketched below.
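A sketch of the three variants (the method bodies are assumed):

```crystal
# Not allowed anymore: no restriction at all.
def double(x)
  x * 2
end

# Must be written as an explicit generic instead:
def double(x : T) forall T
  x * 2
end

# Or with one concrete type (or a union type, etc.):
def double(x : Int32)
  x * 2
end
```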
Now, there doesn't seem to be much of a point to this, because we can just use `forall` all over the place. The idea would be to document that generic methods like that will be instantiated once per argument type, that they will slow down compilation and bloat the generated executable, and to always prefer writing an explicit type if known. We'd remove the possibility of writing `def double(x)` because it's too easy to write, so everybody would use that instead of thinking about the correct type.

Because the norm would be to have method arguments typed, we can type-check them without waiting for someone to invoke them! This means we can give errors before a method is invoked. It also means providing IDE support for autocompletion, or showing the type of a local variable, is a piece of cake (or, well, much easier, and much faster).
We can still use some level of duck-typing with `forall`. For example, I'd like `IO#<<(obj)` to be generic so I can append any object to it, at the cost of the method being instantiated once per object type, because the method definition is very short (just two lines).

Another idea would be to keep things just like they are now, and provide a tool (like the diff above) that would report methods that are instantiated too many times and could benefit from adding a type restriction. But ideally, just like with instance variables, it would be nice if the norm were to write the types down.
I don't have a solution for reducing the amount of generic type instantiations without complicating the type system a lot (like becoming Java, C# or Rust) and without requiring mandatory return types, though mandatory return types would also be nice for improving compile times, memory usage and understanding of the code. And of course all of this could help implement incremental/modular compilation (though I'm not sure).
Another overall "solution" is to just say "Well, the language is awesome (it is) but the cost is slow compile times, a lot of memory consumption compiling code, and code bloat". That's fine too, if we accept that limitation.