
Why the compiler is slow and consumes a lot of memory #4864

Closed

asterite opened this issue Aug 21, 2017 · 84 comments

@asterite
Member

So, I tried to reduce the memory consumed by the compiler. The first thing that came to my mind was to clear the bindings between nodes after the semantic phase and then call GC.collect. A binding between AST nodes works like this: if a node x is bound to a node y, then whenever y's type changes, x's type is updated. This is used in the type inference algorithm.

It turns out that by doing this I was able to free about 50MB of the 1GB total consumed by compiling the compiler. That is insignificant and will not meaningfully reduce memory usage. My hope was to reduce the memory consumed before the codegen phase, so that less additional memory would need to be allocated.

So, the next step is to see what the compiler is doing with our code. Why isn't more memory freed by calling GC.collect? And why is so much memory consumed in the first place? The compiler isn't that big, after all.

To investigate this, I added some quick-and-dirty code to show what's going on. To understand the output, we need to understand how the compiler works. It's not hard. When we have a method like this...

def double(x)
  x * x
end

... Crystal doesn't compile anything yet. Only when you invoke the method does the compiler instantiate it with the argument types. So if you invoke double(1), a method double<Int32> will be instantiated and typed (we do the same for method calls inside that method). If you then invoke double(1.5), a method double<Float64> will be instantiated, and so on. We call each of these an "instantiation" of a method, similar to how you instantiate a generic type with concrete types, because every method in Crystal is effectively a "generic" method. An instantiation means cloning the AST nodes of the method, type-checking them, and keeping the type-check result around until codegen (this consumes memory that can't be freed).
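
To make this concrete, here is a tiny illustration using the double method defined above (the comments describe what the compiler does):

double(1)   # clones and type-checks double<Int32>; the result is kept until codegen
double(1.5) # clones and type-checks double<Float64>, a second, separate copy
double(2)   # the argument type matches double<Int32>: no new instantiation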

This gist shows the list of all types and methods instantiated by compiling the compiler. Now we can understand why so much memory is consumed:

  • First, notice that for generic types like Array(T), methods are instantiated over and over again for each different T. I think this is called "monomorphization". It's good because the compiler will generate specific, optimal code for each T. It's bad because if T is a reference type (represented as a pointer), then the generated code, at least for an Array or a Hash, is the same, so there's no point in doing this. Also note that methods for Pointer(T), a very low-level type, are instantiated over and over again.
  • Second, note for example the method raise in Program (the top level). There's one instantiation for each Exception type that we pass to it. We can also see this happening in IO#<<, Array#push, etc. Methods in Crystal usually don't have a restriction, so the compiler will instantiate them once for each argument type. And even if you write, for example, def raise(e : Exception), here : Exception acts only as a restriction: one method will still be instantiated per given type.
  • Third, note that for types in the same hierarchy (for example ASTNode), methods are instantiated for subclasses even if the method is defined in ASTNode. What this means is that if we have Bar < Foo and a method defined only on Foo, say #foo, and we do Bar.new.foo, foo will be instantiated for Bar. We'll have two different instantiations: Foo#foo and Bar#foo (see the sketch after this list). Now, this doesn't happen if we do (Foo.new || Bar.new).foo: in this case the receiver is Foo+, and only in this case will a single Foo+#foo be instantiated (which is different from Foo#foo!). This optimization can be done because the instance variables of Foo come before the ones of Bar: they share the same "initial" structure.
  • Fourth, when you have a variable whose type is a module (like IO), the module is expanded to its including types, and when you call a method on it, the method is instantiated for each including type. That's why you can see to_s being instantiated for each and every IO. And here we couldn't even try the optimization we do for virtual types to generate a single method (the Foo+#foo case), because these including types might not share the same instance variable order/structure. This is one reason we'd like to prohibit variables of module type and introduce the concept of an interface. For this reason it's probably better to make IO a class instead of a module.
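
Here is a minimal sketch of the third point, with hypothetical Foo/Bar classes:

class Foo
  def foo
    1
  end
end

class Bar < Foo
end

Foo.new.foo              # instantiates Foo#foo
Bar.new.foo              # instantiates a separate Bar#foo, even though
                         # the method is only defined on Foo
(Foo.new || Bar.new).foo # the receiver is the virtual type Foo+:
                         # a single Foo+#foo is instantiated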

There's a lot of redundant information and processing here. This results not only in longer compile times and higher memory consumption, but also in code bloat in the generated executable.

This is not a rant: I'm explaining this so the team at Manas, core contributors, and anyone who reads this can brainstorm ideas to improve this situation.

Some ideas I have in mind are:

1. When we have Bar < Foo with Foo#foo and we invoke Bar#foo, turn the receiver into Foo+ and use that. Similarly if we invoke Foo#foo: turn it into Foo+#foo.

Now, this is a breaking change, because:

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def foo
    "hello"
  end
end

Foo.new.foo # => 1 (Int32)
Bar.new.foo # => "hello" (String)

# But with this change:
Foo.new.foo # => 1 (Int32 | String)
Bar.new.foo # => "hello" (Int32 | String)

Basically, invoking foo will always result in a dispatch and in the broader type. Maybe it's not a big issue.

  2. When a method argument has a type restriction, make that restriction be the argument type.

This means that if I write def foo(x : Foo) and I pass Bar < Foo, Bar would be upcast to Foo+. Or, well, following change 1, maybe there won't be a difference anymore between Foo and Foo+.

That way the method foo will only be instantiated once.

Like change 1, this is a breaking change for more or less the same reasons.
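
A small sketch of what change 2 would mean in practice (takes_foo is a hypothetical method; Foo and Bar are the classes from the example above):

def takes_foo(x : Foo)
  # Under this change, x would be typed Foo+ here no matter which
  # concrete type the call site passes, so this body is type-checked
  # and code-generated exactly once.
  x
end

takes_foo(Foo.new) # x : Foo+
takes_foo(Bar.new) # same single instantiation; Bar is upcast to Foo+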

  3. Require mandatory types in all method arguments. Generic methods need to use free variables.

So for example we can't write this anymore:

def double(x)
  x * x
end

We must write it like this:

def double(x : T) forall T
  x * x
end

Or choose one type (or a union type, etc.):

def double(x : Int32)
  x * x
end

There doesn't seem to be much point to this, because we could just use forall all over the place. The idea would be to document that generic methods like that are instantiated once per argument type, that they slow down compilation and bloat the generated executable, and to always prefer writing an explicit type when it's known. We'd remove the possibility of writing def double(x) because it's too easy to write, so everybody would use it instead of thinking about the correct type.

Because the norm would be to have method arguments typed, we could type-check methods without waiting for someone to invoke them! This means we could give errors before a method is ever invoked. It also means providing IDE support for autocompletion or showing the type of a local variable becomes a piece of cake (or, well, much easier, and much faster).
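
For example, with typed arguments a definition like this (hypothetical) could be rejected as soon as it is parsed, with no call site needed:

def shout(x : String)
  x.upcase + "!" # OK: String#upcase and String#+(String) exist
  x + 1          # error at definition time: no String#+(Int32) overload
end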

We could still use some level of duck typing with forall. For example, I'd like IO#<<(obj) to stay generic so I can append any object to an IO, at the cost of the method being instantiated once per object type, because the method definition is very short (just two lines).

Another idea would be to keep things just as they are now and provide a tool (like the gist above) that reports methods that are instantiated too many times and could benefit from a type restriction. But ideally, just as with instance variables, it would be nice if the norm were to write the types down.


I don't have a solution for reducing the number of generic type instantiations without complicating the type system a lot (like, becoming Java, C# or Rust) and without requiring mandatory return types, though mandatory return types would also help compile times, memory usage and understanding of the code. And of course all of this could help implement incremental/modular compilation (though I'm not sure).


Another overall "solution" is to just say "Well, the language is awesome (it is), but the cost is slow compile times, a lot of memory consumed while compiling, and code bloat". That's fine too, if we accept that limitation.

@akzhan
Contributor

akzhan commented Aug 21, 2017

Restrictions on argument and return types look to me like a good deal for compile times and for the predictability/readability of code.

They don't touch the union types feature.

@konovod
Contributor

konovod commented Aug 21, 2017

I really like idea (2). The argument type being not the actual type but just a restriction was always counterintuitive to me (e.g. my first issue #2977 appeared partially due to this misunderstanding: I thought that if I pass an Enumerable and it compiles, then it should work for any Enumerable), and I've seen some people on Reddit complaining about it. I think this idea is the least "breaking" one and strictly improves the language: users still have a choice, either write def double(x) and get nice type inference (perfect for small scripts), or specify the type and get better compilation speed/memory plus a compiler check that the interface matches the implementation (ideal for big programs).
Will it be enough (if all possible methods in the compiler/stdlib get restricted types)? Because the other two ideas could be more controversial (idea 1 means that Bar.clone would return Object+? And idea 3 means all code written so far would break, and copying from Ruby wouldn't be so easy either).

@classyPimp

classyPimp commented Aug 21, 2017

Mandatory return types are not bad at all. Type unions really deliver the "dynamic" feel of Crystal. I guess people are interested in Crystal not because it's Ruby that runs fast, but because it's a static language with Ruby's sexy syntax. I personally type everything in my Crystal scripts, including arguments, and I value the safety and readability this gives more than the performance.
P.S. Welcome back to Crystal @asterite, without you Crystal felt stalled.

@vegai
Contributor

vegai commented Aug 21, 2017

Mandatory argument and return types for methods seem like a good choice, and not only for performance reasons.

@akzhan
Contributor

akzhan commented Aug 21, 2017

Modern C++ has type inference for return types.

It's needed because some types are hard to write by hand (or can only be inferred at compile time).

@RX14
Contributor

RX14 commented Aug 21, 2017

@asterite First of all, thank you for performing some research and confirming what we all thought: the largest problem with scaling Crystal is the combinatorial expansion of types * method arguments.

Second of all, I want to point out early in this discussion that the Crystal compiler and language work fantastically well for a lot of people as-is. I mainly write small, self-contained, useful services in Crystal which solve small problems. These are an absolute dream to write because you can build your entire application with very few type restrictions and take full advantage of the power of the Crystal compiler. And they all compile in around (or less than) a second. I don't want this to change, not only because I want to continue coding my applications without thinking too much about types, but because beginners do this too. I think we should be careful that any changes we make don't drive away newcomers by making the language feel too rigid and "Java-like".

In addition, the fact that Crystal works fantastically well for most people as-is gives us a lot of time to get the changes that keep Crystal from going exponential in large apps exactly right. We shouldn't rush into any large-scale changes to the language without being sure they're the right decision.


I believe that solution 2 (and perhaps some small non-breaking optimizations around generics) is very powerful and will solve this problem with almost no major changes to the "feel" of the language. My hypothesis is that modules (parts of the stdlib, shards, etc.) will naturally form strongly typed interfaces between themselves simply by documenting their API. Providing type annotations on a public API is something that should be done already: it makes the documentation much easier to read, and it creates better errors. By annotating a method's type signature you not only give useful documentation to yourself and anyone else using the library, you tell the compiler enough information - most of the time - to instantiate your method only once.

The problem right now is that the compiler can't take advantage of that information, because it only holds "most of the time". Arguments are not cast to their type restriction when they enter a method. This allows you to write code where a method will only compile correctly for some subset of the types actually covered by the type restriction. This is genuinely confusing behaviour which we should consider changing whether or not it gives us a massive performance improvement (and I think it does). By changing the semantics such that we essentially call as(TypeRestriction) every time we pass an argument which has a type restriction, we can take full advantage of this information and reduce compile times.
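
A small example of that behaviour, echoing the Enumerable case mentioned elsewhere in this thread (first_element is hypothetical):

def first_element(x : Enumerable)
  x[0]
end

first_element([1, 2, 3]) # compiles today only because x keeps its call-site
                         # type, Array(Int32), which defines #[] even though
                         # Enumerable itself does not; casting arguments to
                         # their restriction would turn this into an error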

However, what we shouldn't do is require that such type restrictions exist. That makes the language much less flexible, and in my view it's entirely unnecessary. If the public API of a logical "module" of code is annotated with type restrictions (and it doesn't even need 100% coverage, only 90%), we still vastly reduce the number of possible method instantiations of the external API. This in turn vastly reduces the number of ways internal methods without any type restrictions are instantiated. Instead of a single hunk of code which exhibits O(types * method arguments) instantiations, we end up with many smaller modules, each of which has O(types * method arguments) complexity inside it, but also has vastly fewer types and method arguments. Between these modules sits a strongly typed API which is good both for the compiler and for the human reading the documentation.

I believe this approach preserves everything I love about Crystal, the dynamic and exciting coding style, while solving the large-program problem, all while strongly encouraging good documentation by tying it to compile-time performance.

@straight-shoota
Member

My hypothesis is that modules (parts of the stdlib, shards, etc.) will naturally form strongly typed interfaces between themselves simply by documenting their API. Providing type annotations on a public API is something that should be done already: it makes the documentation much easier to read, and it creates better errors.

That's what I was thinking as well: especially for libraries, it is good practice to specify argument and return types, because they usually expect something specific. Duck typing everywhere can only go so far, as long as no instance variables are involved.

Even in the stdlib it's sometimes difficult to figure out what return type you can expect from a method, or what type an argument can be, without looking at the source code. In some cases it is actually documented in the doc text, but that information should really be in the code. It helps so much to understand exactly what you can do with a method.

@RX14
Contributor

RX14 commented Aug 21, 2017

Even in the stdlib it's sometimes difficult to figure out what return type you can expect from a method, or what type an argument can be, without looking at the source code.

That's because the stdlib is badly documented :)

@ysbaddaden
Contributor

ysbaddaden commented Aug 21, 2017

Well:

  1. having a descendant class method return a different type than the parent's method feels like wrong design; maybe an acceptable tradeoff?

  2. strict type constraints... well, that sounds just like how it should be; it will break a few cases, for example taking collection : Enumerable and then assuming collection[] is defined (reminiscent of Proposal for type restrictions to limit the methods that can be called on the argument or ivar #4010), but I believe these cases are abusing the... bug?

  3. requiring typed method arguments and return values... that definitely kills the "never have to type anything" goal; it trades developer happiness for the sake of the compiler, the opposite of Ruby's hackability & sexiness.

  4. Crystal is awesome!

I wonder if, instead of memoizing a duplicated method body over and over again, the duplication could be done only to determine the argument/return types, then let the body be collected, memoizing only the argument/return type pairs (which could be simplified), and eventually regenerate the method body during LLVM IR code generation (or not, if the types haven't changed since the last compilation).

@refi64
Contributor

refi64 commented Aug 21, 2017

Option 1 doesn't seem bad to me; subclasses need to return compatible types anyway. Here's my question: in this case:

class Super; end
class Sub < Super; end

class Foo
  def meth
    Super.new
  end
end

class Bar < Foo
  def meth
    Sub.new
  end
end

Bar.new.meth  # Super or Sub?

If it's Super, wouldn't that be kind of annoying when dealing with tree-like types?

Option 2 is great. IMO it should've been this way from the beginning.

Option 3... I feel like this kills one of Crystal's main selling points: the ability to prototype easily and throw on type annotations when you need to. I think it'd be worth first checking how the language composes anyway.

@kazzkiq

kazzkiq commented Aug 21, 2017

But how does this "compiling inefficiency" translate into numbers?

Is the memory consumption something that grows exponentially with the project, to the point where it will consume infeasible amounts of RAM?

I've written half a dozen "micro" projects in Crystal (the largest has only about 2k lines of code), and I've never experienced that much RAM usage while compiling.

My question is straight to the point: if a project reaches 10, 20, 50, 100k LOC, will the memory consumption follow it linearly (or even stabilize after a certain amount), or could there be a point where Crystal would simply require too much memory (hardware-deterrent) to be acceptable?

If the memory consumption is not something that will eventually reach blocking hardware usage as the project grows, I honestly don't see an issue at all. If you want to run your project binaries on a low-end machine (less than ~1GB RAM), you can always replicate the production environment on a build machine and then port the binary to production. In fact, you could even use Vultr/DigitalOcean VMs to do the build and simply kill the VM afterwards; it would cost just a penny per production build.

@sdogruyol
Member

sdogruyol commented Aug 21, 2017

I think @bararchy can give us great feedback about a really big Crystal project 👍 @kazzkiq

@konovod
Contributor

konovod commented Aug 21, 2017

@kazzkiq Right now many PRs to Crystal fail on CI because the CI machines apparently don't have enough memory to compile all the specs. There is a partial solution (splitting the specs into two parts), but it shows that memory consumption isn't going to stabilize.
For smaller projects, of course, compilation time matters more, and it's not very good either (at least compared to languages like Rust or D).

@bararchy
Contributor

So, my two cents (and use case):

As opposed to the majority of Crystal users, who usually build micro-services or webapps, I use Crystal for machine learning and data analysis. My projects get big, and fast; the usual size is 40K lines or more.

Compiling takes time and memory, especially when using --release mode: a build takes 1.2GB of memory, which is quite absurd. Think about trying a few different builds at the same time while also trying to use the same server for tests and runs.

@akzhan
Contributor

akzhan commented Aug 21, 2017

The compiler could support two separate modes (like JavaScript does with its "use strict" pragma).

The first, default one allows global type inference. The second doesn't.

It's common practice to distinguish the requirements of small apps, libraries and big projects using pragmas etc.

@unleashy

unleashy commented Aug 21, 2017

@akzhan That could work, and could it be applied to specific methods instead of all methods? But then again, that's basically forall T in reverse.

I think option 2 is best: it's how it works in essentially any sane statically compiled language, and it's how it should have been from the start. Option 3 is fine for me too, but it would most definitely break a lot of stuff and be too "rigid" compared to now, I guess; I don't have too much of a problem with that, though, coming from C, C++ and Java.

@ghost

ghost commented Aug 22, 2017

@akzhan maybe something like #4603?

@akzhan
Contributor

akzhan commented Aug 22, 2017

@totakaro yes, exactly, but enabled by a pragma in the source code instead of a compiler option.

There are lots of reasons to configure compiler behavior from the source code, and it's already widely used by programming languages.

See Ruby's pragmas for UTF-8 strings and frozen string literals, Perl's utf8 and strict, JavaScript's 'use strict', etc.

@RX14
Contributor

RX14 commented Aug 22, 2017

@akzhan most of those are hacks to remove unwanted and unneeded behaviour while keeping backwards compatibility. If those languages had been well designed in the first place, such hacks wouldn't be required or wanted. We don't have the shackles of backwards compatibility, so we shouldn't have multiple versions of the Crystal language.

@faustinoaq
Contributor

@RX14 Sure, we don't want to repeat the Python 2 vs 3 story.

@akzhan
Contributor

akzhan commented Aug 22, 2017

@RX14 anyway, we can't provide good shards/modules and large apps without strictures on type inference.

No way.

And I should note that it's not a backwards-compatibility question.

It's needed because we have two different target audiences.

@faustinoaq
Contributor

faustinoaq commented Aug 22, 2017

@akzhan There's always a way to solve things 😄

@akzhan
Contributor

akzhan commented Aug 22, 2017

@faustinoaq yes, indeed. Some of it may be solved by precompiling ASTs (the parser phase).

But type inference should be simplified, simply because right now it's hard on the compiler and not yet unambiguous.

@RX14
Contributor

RX14 commented Aug 23, 2017

we can't provide good shards/modules and large apps without strictures on type inference.

I don't follow at all. I believe we should enable the compiler to take advantage of annotated types to compile much faster, but the trade-off between compile time and annotating types should be entirely up to the user. I do not believe we should force anything on anyone. We don't have to: the practicalities of compile times will force them to if their shard gets popular enough.

@asterite
Member Author

I tried to work on this but failed miserably. I won't have time to play with it further, but if someone does, I suggest cleaning up and refactoring the compiler code first, because it's a bit messy.

It also changes the semantics of the language, so it's a breaking change.

@ozra
Contributor

ozra commented Aug 27, 2017

Thanks for the report @asterite!

I've been thinking about this one in the back of my head for some time, though I haven't touched the code in months.

My baseline of thought has been: a technical solution, not a spec-changing solution.

The idea has mainly been to separate the "pure" AST nodes from the AST node annotations / semantics (typing).
One idea is that each instantiation of a func (for example) is a linear list of annotation nodes. The original syntax nodes which vary get an index upon first traversal. The instance-specific type info is referenced via the syntax node's index into the specific instantiation (this is obviously done through a get-annotation-obj method). I have experimented a little with this, but the ideas should be seen as vapor; I haven't yet looked into how well it could be black-boxed from the majority of the inference and generation code. This is indexes-as-sort-of-relative-pointers: it means more dereferencing, thus more cycles used. And frankly, I have no idea what savings it would yield in practice; perhaps even a similarly measly 50MB :-/
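
A vapor-stage sketch of that layout (all names hypothetical):

record TypeAnnotation, type_name : String

class SyntaxNode
  # Assigned on first traversal; the node itself stays type-free.
  property annotation_index : Int32 = -1
end

class MethodInstance
  # One flat list per instantiation; a node's type info for this
  # instantiation lives at annotations[node.annotation_index].
  getter annotations = [] of TypeAnnotation
end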

These thoughts are also founded in the wish to come up with a way of keeping the program structure cached in a compilation server (as has also been discussed before), which obviously requires keeping its space requirements down, mutating only the parts of the trees affected by source changes. If that were achieved, the higher cycle cost would be offset by limiting the extent of the work.

Those are all big ifs. And much work. And much added complexity to the compiler.

I routinely type-restrict or use explicit generics for params and return types, since catching errors at function boundaries leads to much faster development and far fewer bugs. But the reason I love the "implicit genericness" in Crystal, vs. having to template it explicitly in C++, is that you can start off with PoC code, algorithms and even script-like code first: quickly throwing it away, quickly rewriting, and then tightening it up when happy with the concept.

Writing entire programs without typing func params/returns is very bad practice imo (with no benefits whatsoever, from what I've seen in studies and in my own work). But it should still be fast XD

@RX14
Contributor

RX14 commented Aug 28, 2017

@asterite does that mean you think solution 2 is the way to go? Do you have any more comments/ideas about the ideas in my comment?

Regarding the compiler change, it would be great if you could give a high-level overview of the kind of refactoring or other work required. To my uninitiated mind (I haven't worked on the typing code in the compiler), it's as simple as casting all the restricted params to a function and attempting to cache instantiations. What am I missing?

@asterite
Member Author

@RX14 This comment explains one of the difficulties of implementing this: method restrictions aren't typed immediately; they are typed over and over when trying to match a method call. So it's impossible to "cast the arguments to the restrictions", because when you match an argument against a method, the type that you get is the restricted one: the original one, dictated by the parameter, is lost in the matching. This gets more complicated with union types, and also with things like:

def foo(x : Enumerable)
end

which right now works because x gets the type of the argument at the call site, but with this change it would be a combinatorial explosion of all possible Enumerable types (in fact, that can't even be represented by the compiler yet, so it won't work).

Maybe these small details need to be discussed, decided and tackled one by one, slowly improving the language specification and its implementation, thinking about cases like the one above.

@ansarizafar

So, yeah, Crystal is not a joke, it's something alive, useful and fun :-)

We should not give up and we should not allow the problems to defeat us. We all feel frustrated sometimes. But WE CAN ONLY LOSE IF WE ACCEPT DEFEAT.

@S-YOU

S-YOU commented May 2, 2018

I know nothing about the internals, but I am wondering if there is a way to incrementally cache types per file, and invalidate the cache if the file hash changes.

file A

def double(x)
  x * x
end

cache: none


file X

double(1)

cache: double(x : Int32) : Int32


file Y

double(1.1)

cache: double(x : Int32) : Int32, double(x : Float64) : Float64


If file X or Y changes, the cache is unchanged.
If file A changes, the cache is rebuilt.
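
A minimal sketch of such a per-file signature cache (all names here are hypothetical, just to illustrate the idea):

require "digest/sha256"

record CachedSignature, name : String, arg_types : Array(String), return_type : String

class SignatureCache
  # file path => {content hash, signatures inferred from calls in that file}
  @entries = {} of String => {String, Array(CachedSignature)}

  def fresh?(path : String) : Bool
    entry = @entries[path]?
    return false unless entry # nothing cached for this file yet
    entry[0] == Digest::SHA256.hexdigest(File.read(path))
  end

  def store(path : String, signatures : Array(CachedSignature))
    @entries[path] = {Digest::SHA256.hexdigest(File.read(path)), signatures}
  end
end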

--

any opinion, @asterite?

@faustinoaq
Contributor

faustinoaq commented May 2, 2018

@S-YOU I guess Crystal doesn't have an incremental cache yet 😅

@asterite
Member Author

asterite commented May 2, 2018

Sure, just send a PR 👍

@faustinoaq
Contributor

@S-YOU Currently Crystal has an LLVM cache for the .bc files generated during the codegen phase; for example, check it yourself using the --stats and --progress flags.

>>> crystal build -s --debug -p my_code.cr
Parse:                             00:00:00.0008560 (   0.34MB)
Semantic (top level):              00:00:00.2588280 (  27.91MB)
Semantic (new):                    00:00:00.0018450 (  35.91MB)
Semantic (type declarations):      00:00:00.0263890 (  35.91MB)
Semantic (abstract def check):     00:00:00.0015270 (  35.91MB)
Semantic (ivars initializers):     00:00:00.0018980 (  35.91MB)
Semantic (cvars initializers):     00:00:00.0158470 (  35.91MB)
Semantic (main):                   00:00:00.4168150 (  60.10MB)
Semantic (cleanup):                00:00:00.0010650 (  60.10MB)
Semantic (recursive struct check): 00:00:00.0008110 (  60.10MB)
Codegen (crystal):                 00:00:00.3381910 (  68.10MB)
[12/13] [67/215] Codegen (bc+obj)
        ~~~~~~~~
          (1)

(1) Number of files processed; you can see these files in the Crystal cache directory.

The kind of cache you're asking about, I think, is not possible in Crystal yet; as @RX14 said on Gitter:

Every time you make an update you'd reconstruct the AST for the whole program again

https://gitter.im/crystal-lang/crystal?at=5ab01c665f188ccc15d3df76

So, I think we need a bit more research on Crystal's type system, or to experiment with some alternatives... 😉

@S-YOU

S-YOU commented May 3, 2018

I am wondering whether a type cache could improve the "Semantic (main)" part.
This is from compiling Crystal itself; Semantic (main) took 12s.
(The codegen part can already be improved with the .bc cache, in my understanding.)

Using /usr/bin/llvm-config-5.0 [version=5.0.1]
rm -rf .build
rm -rf ./doc
CRYSTAL_CONFIG_PATH=`pwd`/src ./bin/crystal build -Dgc_none --stats --progress --no-debug  -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
Parse:                             00:00:00.001987824 (   0.19MB)
Semantic (top level):              00:00:01.981872103 (  82.55MB)
Semantic (new):                    00:00:00.002911306 (  82.55MB)
Semantic (type declarations):      00:00:00.051871399 (  90.55MB)
Semantic (abstract def check):     00:00:00.002816633 (  90.55MB)
Semantic (ivars initializers):     00:00:00.047523762 (  98.55MB)
Semantic (cvars initializers):     00:00:00.007751930 (  98.55MB)
Semantic (main):                   00:00:12.534727764 ( 762.61MB)    <----- here
Semantic (cleanup):                00:00:00.001487390 ( 762.61MB)
Semantic (recursive struct check): 00:00:00.002925151 ( 762.61MB)
Codegen (crystal):                 00:00:10.846773391 ( 898.61MB)
Codegen (bc+obj):                  00:00:07.057399284 ( 898.61MB)
Codegen (linking):                 00:00:06.871162435 ( 898.61MB)

If the types of most methods were cached, I'd expect that the amount of time taken (and memory usage?) could be reduced further.

I am also thinking that reconstructing the AST for the whole program is OK as long as there is a type cache for each file that didn't change? Let me know if I am wrong.

@faustinoaq
Contributor

If the types of most methods were cached, I'd expect that the amount of time taken (and memory usage?) could be reduced further.

@S-YOU Oh, I understand you now 😅. As far as I know, Crystal currently doesn't have a cache for the semantic phase; that's why "sometimes" the compiler is slow and consumes a lot of memory, even if you have compiled the program previously 😅

I am also thinking that reconstructing the AST for the whole program is OK as long as there is a type cache for each file that didn't change?

As I said before, we need a bit more research on this, because Crystal's type system has a "special" type inference that applies rules to get the types, so if you change one file, the whole AST for all files gets modified; we can't apply "safe" incremental compilation or caching for the semantic phase yet.

Well, I think at least we could have a semantic cache, just like the .bc files Crystal is already using, and regenerate it "entirely" if some file changes, although I don't have a good use case for that yet. Why would you want a semantic cache if most of the time it will be regenerated entirely anyway? 😅

@S-YOU

S-YOU commented May 3, 2018

"special" type inference that apply some rules to get types, so if you change one file, the whole AST for all files get modified.

I am wondering whether that modification can always be the addition of a new type?
(Same as I mentioned in my other post above: basically, when a method is called with an Int, add an "Int -> Int" annotation to the type cache; when it is called with a Float, add "Float -> Float" to the cache.)

And if there are only additions, is an incremental type-annotation cache possible?

Please let me know if I am missing something.

@RX14
Contributor

RX14 commented May 3, 2018

Caching is a good idea long-term, but all of @asterite's proposals here seem to be aimed at reducing the number of instantiations instead of caching them. That's probably because it's easier to do that than caching in the short term.

@asterite
Member Author

asterite commented May 3, 2018

There's no problem with the idea of caching; many languages do it. The problem is who is going to implement it, and how. But the answer is simple: nobody, and never. Nobody sends PRs to fix the easier stuff in the compiler, so I doubt somebody will implement something as hard as a cache.

I still don't understand the rush to get to 1.0. I'd rather Manas used the money they get to rewrite the semantic and codegen phases from scratch with readability, performance and flexibility in mind.

Another way to say it: I don't think anything significant will happen before 1.0 (or, well, 1.0 will never be reached). In my mind, 1.0 means a compiler rewrite from scratch.

@Sija
Contributor

Sija commented May 3, 2018

The problem is who is going to implement it, and how. But the answer is simple: nobody, and never.

@asterite please, such statements are IMO highly discouraging. Perhaps you're right, but why close the door someone might want to go through?

@asterite
Member Author

asterite commented May 3, 2018

Yeah, sorry. I'm sure someone will come up with an excellent understanding of the compiler's code and soon send a PR to fix this and other issues.

@RX14
Contributor

RX14 commented May 3, 2018

The problem is who is going to implement it, and how. But the answer is simple: nobody, and never.

I disagree that it's nobody and never, but I certainly would expect it to happen after 1.0. Compiler performance is something that doesn't need to be improved before 1.0.

I still don't understand what's the rush to get to 1.0

because people want to be confident using Crystal in production, and 1.0 means they can sell that to management, to their coworkers, and to themselves.

In my mind, 1.0 means a compiler rewrite from scratch.

nobody really cares about the compiler enough that they'd rather see their funds and time go towards rewriting it instead of getting important features like Windows support and parallelism working.

@straight-shoota
Member

@RX14 and @asterite Both views have some truth to them 😉
Reaching 1.0 is not really that important from a content perspective, but it helps sell a status of stability. We certainly don't need a compiler rewrite for 1.0 per se, but a lot of bugfixing and adding/changing language features (such as the review sections on the Roadmap) is difficult with cluttered compiler code.

@ysbaddaden
Contributor

Sigh. To improve my confidence in pushing Crystal into production (for anything serious) I need one thing: a team of people actively working on the compiler, not 2 individuals fixing a few bolts now and then.

The compiler is "good enough" and "does the job", sure, but whatever the quality of the stdlib, Windows support or even parallelism, can anyone say that the compiler is stable and solid enough to be considered 1.0-grade? Please, be serious.

We need developers to dig into the compiler, understand its internals, and actively work on it, which is a tedious task.

@bararchy
Contributor

bararchy commented May 3, 2018

Yeah, nothing about those notions is actionable. Saying "the compiler sucks but no one will fix it" adds nothing and harms the community and morale.
Let's focus on what we can do and not throw a useless "it's broken forever" into the air.

@RX14
Contributor

RX14 commented May 3, 2018

@ysbaddaden we will have @bcardiff from Manas working full-time on Crystal in 2 weeks' time, after he gets back from holiday. I think he'll have time to do compiler work, though I'm not sure he's planning to (I think parallelism is more important because that's a big breaking change).

@S-YOU

S-YOU commented May 3, 2018

Nobody sends PRs to fix easier stuff in the compiler

Very understandable, but I believe many people think that a compiler is not easy stuff (without looking at the Crystal source code, of course).
I am still struggling to find out where to start looking, and found that the slide deck at https://www.slideshare.net/crystallanguage/crystal-internals-part-1-70673255 is a very good one.

@S-YOU

S-YOU commented May 3, 2018

Yeah, parallelism and thread safety are important in my opinion too.

I don't have any problem with compiler performance; my packet-processing programs compile in under 5 seconds (in release mode).

I love the language, and while digging deeper I found this thread and commented with some ideas I could come up with, that's all. Cheers.

@eliasjpr

eliasjpr commented May 3, 2018

@asterite if you want more people to contribute to the compiler, more information about its internal workings should be shared/documented. I bet there are many who would like to jump in and improve the compiler; it just feels like a very closed box.

The community is eager for v1.0 and we are conscious of the effort that is needed, but there is no insight into direction/sequence from the core members (think roadmap, designs, plans, decisions made). We know that you would like a rewrite, but there is no plan/strategy to get there. There are 10k people following this project, and 1% of them would most likely contribute to get this done.

I propose the following:

Document your intentions for the compiler: what you would like to see and how you think it should work. Write it down, break it down into small workable tasks, treat it as a project (milestone), and share it with the community, and you will see PRs flowing in. You cannot be the only one working on this; it's going to take a toll on you and your health. You have the support of an entire community that wants to see Crystal at v1.0.

Let me know if I can help with any of the above.

@faustinoaq
Contributor

@eliasjpr Nice idea 👍. I remember @makenowjust wrote a small guide about how to implement a new feature in the compiler: #4757 (comment)

See https://github.com/crystal-jp/techbookfest2/tree/master/makenowjust (Japanese). It isn't in English, but we can try to translate it as a starting point 😉

@faustinoaq
Contributor

Interesting post here https://github.com/nim-lang/Nim/issues/7874 😉

@sighoya

sighoya commented Dec 12, 2019

One solution would be to store an id for each top-level definition (block), together with an AST checksum, a binary checksum, and all the ids depending on this particular id.

If a block is in AST form, calculate its checksum and compare it with the AST checksum for the given id in the cache. If they match, take the binary from the cache. If not, generate code for the block and calculate its binary checksum; if that matches, take the binary from the cache, which has the advantage that its dependents are untouched.
If it also doesn't match, generate code for the id and all the ids depending on it.
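
A rough sketch of that decision procedure (all names hypothetical):

record CacheEntry,
  ast_checksum : String,
  bin_checksum : String,
  dependents : Array(Int32) # ids that depend on this definition

# What to do for definition `id`, given its freshly computed AST checksum.
def plan(cache : Hash(Int32, CacheEntry), id : Int32, ast_checksum : String) : Symbol
  entry = cache[id]?
  return :generate_all unless entry # nothing cached yet
  return :reuse_binary if entry.ast_checksum == ast_checksum
  # AST changed: regenerate this block, then compare binary checksums;
  # if the binary is identical, dependents are untouched and reusable.
  :generate_and_compare
end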

I'm really skeptical about how much intra-modular caching is worth compared to inter-modular caching.

@sighoya

sighoya commented Dec 12, 2019

Why not add a compiler option to enforce typed methods? Say we name this "IDE mode", where productivity and robustness are valued; otherwise we prefer the default mode for prototyping in the REPL or for one-person projects.

@simonhf

simonhf commented Jun 3, 2020

My projects get big, and fast; the usual size is 40K lines or more.
Compiling takes time and memory, especially when using --release mode: a build takes 1.2GB of memory, which is quite absurd. Think about trying a few different builds at the same time while also trying to use the same server for tests and runs.

@bararchy any idea or guess how Crystal LOC relates to memory and/or release compilation time? How many GB to compile 20k LOC or 80k LOC? Is there some kind of predictable-ish graph?

Some bigger software projects, like databases, approach 1M LOC. And some even bigger projects, like WebKit, have millions of LOC. Would a project that big even be viable in Crystal with its current memory usage (and compilation time)?

Also, I saw the comment above from May 3, 2018 about how it's unlikely that the Crystal compiler will change for the better in terms of resource usage and speed, and that maybe even a rewrite is the better option. It's now over 2 years later. Has anything changed? Are faster compile times coming, or even on the horizon?

@waj
Member

waj commented Jun 3, 2020

@simonhf we have some improvements in mind that might come after 1.0 is released, but we're aiming for correctness first. Once 1.0 is released, be sure that we'll focus at least some part of our time on increasing the performance of the compiler.

The time it currently takes to compile a project in release mode is mostly due to running LLVM optimizations. If you run the compiler with the -s flag you'll see how much time is spent on each compilation stage. Crystal creates a single object module when compiling in release mode, to take advantage of inlining across different classes and modules. That might change in the future, or other options could be added for cases where, because of the size of the project, building everything in a single module is unfeasible.

@straight-shoota
Member

Regarding the question about the relation between LOC and memory/speed: I doubt there can be any meaningful generalization, or even a rule of thumb. Compiler performance does not relate directly to the amount of code being processed. You can build three different projects with 10k LOC each and they'll likely show very different results, depending on the specifics of the respective program.
