Optimize dwarf line numbers decoding #7413

asterite · 2019-02-11T17:32:46Z

The first time an Exception's full backtrace (filenames, line numbers, etc.) must be built, the runtime will first decode all DWARF information from the dwarf file (or somewhere else? not sure).

I noticed this process takes about 6 seconds for the compiler's std specs (because it's a huge program, the bigger the program the bigger the dwarf file and the longer it will take to decode it), specifically you can see this doing:

$ bin/crystal build spec/std_spec.cr
$ ./std_spec -e "inspects with cause"
.

Finished in 5.88 seconds

I thought 6 seconds to decode the backtrace was too much so I started to dig in the code to see if I can improve it. I managed to improve part of the decoding process by about 6 times.

The improvement is explained in the PR code comments. There's a decode_sequences method in DWARF::LineNumbers. Previously it took 4.84 seconds to do it for the std_spec. Now it takes 0.82 seconds.

Now running that spec above looks like this:

$ bin/crystal build spec/std_spec.cr
14:05 $ ./std_spec -e "inspects with cause"
.

Finished in 1.34 seconds

Note that this improvement is only see the first time ex.inspect_with_backtrace is called for any exception. Later on the dwarf info should have been decoded. Still, it's good to win a lot of time in this first decoding.

I believe there might be more things to improve, but so far this is the easiest change that gives the most boost.

yxhuvud · 2019-02-11T21:51:18Z

Replacing the implementation of #find at line 187 with a binary search looks like another low hanging fruit, assuming it is a nontrivial amount of rows.

ysbaddaden · 2019-02-11T22:35:55Z

Impressive boost, especially since the files/directories arrays are usually small if I remember correctly.

asterite · 2019-02-11T23:01:32Z

@yxhuvud Yes, find could use a binary search, I thought of it. That said, I don't understand why @matrix is defined like that, maybe pushing all Rows in a single array would be enough? Not sure.

But that's not the bottleneck, at least when I profiled it. The bottleneck are:

Debug::DWARF::Strings#decode: most of the time spent it gets and tell/pos (this can be optimized just a bit by using gets('\0', chomp: true) but I didn't want to make this PR too big
Debug::DWARF.read_unsigned_leb128: most of the time is spent in read_byte

I'm not sure there's something we can do about those two points. In both cases (I think?) the IO is repositioned and read, and because it's buffered an entire block is consumed. Maybe changing the buffer size could improve this because decode usually needs just a few bytes (it's just a name) and I guess the same applies to read_unsigned_leb128. But for that we should first make the buffer size of IO::Buffered configurable.

asterite · 2019-02-11T23:06:18Z

@ysbaddaden Yes, the arrays are small (about 600 entries for std_spec). The problem is that the register_filename and register_directory end up being called, at least in std_spec, 1 million times, and most of those calls are always with the same filename/directory that was passed the last time. In fact, another way I tried to optimize this is by only recomputing the file/path indexes when register.file was set, or when Register.new was done, or at the beginning of read_statement_program but the code ended up being a bit more complex and taking more time if I also didn't do the "is it the same as the last time" check.

src/debug/dwarf/line_numbers.cr

ysbaddaden · 2019-02-12T08:01:09Z

The matrix is defined as an Array(Array(Row)) to match the dwarf definition, but more importantly to skip a bunch of addresses to reach the actual block more quickly.

asterite · 2019-02-12T14:12:55Z

I added another commit that introduces two temporary hashes that map dir/file-names to indexes, to avoid doing linear scans over @files and @directories. That reduces the time of Debug::DWARF::LineNumbers#decode_sequences from 0.82s to 0.78s.

yxhuvud · 2019-02-12T17:15:11Z

src/debug/dwarf/line_numbers.cr

+      # another file comes next, etc.). So we remember the last
+      # mapped index to avoid unnecessary lookups.
+      @last_filename : String?
+      @last_filename_index = 0


BTW, if the files are read in some order (and not jumping from file to file, do @last_filename ever differ from @files[-1], (and similar for the index)?

Yes, they differ. They are not read in order. I don't know why, though, I'm not familiar with the format. Maybe understanding the format first, then optimizing it might be better... 😊

If I recall correctly, the format follows functions (symbols). An instruction group of the function refers to a file:line, then the following group likely refers to the same file at a further line... unless the compiler inlined code, which will cause jumps to another file, etc.

If I recall correctly, that is.

asterite · 2019-02-12T18:38:48Z

I just pushed another commit with more improvements. I tried to do small commits for this but I was experimenting and in the end it was a bit hard for me to separate all the changes.

Essentially, I changed some UInt32 | UInt64 unions to just UInt64 (the cost of dealing with unions is bigger than just tracking the exact type). Another change is avoiding calling io.tell multiple times (this is expensive; I also used io.pos, I think we should remove the tell alias).

But the biggest change is in DWARF::Strings: we now use a small (16Kb) buffer to read strings from instead of seeking and reading from the IO (which is done a lot of times).

So now:

$ bin/crystal build spec/std_spec.cr
$ ./std_spec -e "inspects with cause"
.

Finished in 1.31 seconds

(down from 1.83s)

I'm almost sure more things can be optimized, but with these changes I'm pretty satisfied now.

asterite · 2019-02-12T19:32:39Z

Meh, I'll redo the last commits in smaller commits because CI failed and I can't figure out why...

jkthorne · 2019-02-12T21:21:05Z

@asterite why use 2 hashes? Directories and filenames would not collide. It seems like you could reduce this to one register method and one hash.

asterite · 2019-02-12T21:24:56Z

@wontruefree files and directories are public and are used by other parts of the code that need the "index -> name" mapping.

jkthorne · 2019-02-12T21:35:40Z

@asterite I am not sure I understand.

But if you have considered it and know why it won't work that is cool. I cannot make a PR into your fork but here is what I was thinking.

master...wontruefree:opt/dwarf-decoding

asterite · 2019-02-12T21:43:34Z

@wontruefree the @files and @directories variables are exposed through the getters of LineNumbers and used here. In your code you just stop putting data there. Try it with this code:

begin
  raise "oh no"
rescue ex
  puts ex.inspect_with_backtrace
end

begin
  [1][2]
rescue ex
  puts ex.inspect_with_backtrace
end

Do you get something correct in the output?

Sija · 2019-02-12T23:25:14Z

@asterite CI just failed again with GC Warning: Failed to expand heap by 16777216 bytes, seems like crusade is not over yet...

jkthorne · 2019-02-12T23:30:54Z

@asterite I see what you are saying.
I updated the branch and reduced the external interface to use a method called paths.

After thinking about this why not just make the index the instance variable and have the callstack use a hash lookup. Seems like that would be faster.

asterite · 2019-02-12T23:47:01Z

I think some tests are allocating too much memory and it dies on Linux 32 bits. But I didn't start investigating that yet.

asterite · 2019-02-13T15:53:32Z

I just pushed yet another commit. Thanks to what @wontruefree said I noticed that it doesn't make sense to build those files/directories arrays. Instead, we can store them directly inside each Row.

@ysbaddaden Do you see a problem with this?

src/callstack.cr

ysbaddaden · 2019-02-13T21:02:10Z

@asterite It's probably fine. I don't recall exactly why I used arrays and indices. Maybe it helped limit the overall memory usage by avoiding to keep duplicated string allocations?

asterite · 2019-02-13T21:07:32Z

@ysbaddaden I don't think so because all of the strings are stored inside arrays inside Sequences so a same string shares the same memory for others. But maybe it wasn't like that at some point, I don't know. This way it seems to work :-)

Maybe if I have time I'll learn that format and write specs for it, so we can continue simplifying it (with more confidence). However, right now I think the times are more or less fine so I might focus on something else after this.

ysbaddaden · 2019-02-13T21:18:22Z

Once the sequences are decoded we don't retain any references to them, so their @directories and @files arrays will be collected by the GC. I think most folder names and enough file names are shared across sequences that de-duplicating can be significant on memory usage, especially with large programs, but that should be verified.

Sija · 2019-02-13T23:29:21Z

@asterite Would you mind updating PR description with the newest run-time numbers?

asterite · 2019-02-13T23:33:28Z

@Sija done!

asterite · 2019-02-14T18:25:44Z

Good to go?

asterite · 2019-02-14T23:00:02Z

I'll merge it. Should I squash or not? I never know. I think each commit is meaningful by itself.

ysbaddaden · 2019-02-15T09:43:12Z

Let's keep each commit. We could squash "simplify while loop" but that's nitpicking.

asterite added topic:stdlib performance labels Feb 11, 2019

yxhuvud mentioned this pull request Feb 11, 2019

Callstack: Fix confused block variable naming. #7416

Merged

yxhuvud reviewed Feb 12, 2019

View reviewed changes

src/debug/dwarf/line_numbers.cr Outdated Show resolved Hide resolved

src/debug/dwarf/line_numbers.cr Outdated Show resolved Hide resolved

Optimize dwarf line numbers decoding

5f99bb3

asterite force-pushed the opt/dwarf-decoding branch from aa9081a to 5f99bb3 Compare February 12, 2019 10:39

DWARF: use Hash for mapping names to indexes for an extra speedup

d19e070

yxhuvud reviewed Feb 12, 2019

View reviewed changes

Debug: optimized usage of IO#gets using chomp argument

c295c8a

asterite force-pushed the opt/dwarf-decoding branch from 8c5f4ca to c295c8a Compare February 12, 2019 19:19

Debug::Dwarf::Strings: optimize by reading a small buffer

ae9e861

Debug: avoid duplicate calls to io.tell

501ee45

Dwarf: store directory/file as String instead of Int32 in Row

708025f

asterite force-pushed the opt/dwarf-decoding branch from fd21c53 to 708025f Compare February 13, 2019 16:04

ysbaddaden reviewed Feb 13, 2019

View reviewed changes

src/callstack.cr Outdated Show resolved Hide resolved

Simplify while condition

5e1d740

straight-shoota approved these changes Feb 14, 2019

View reviewed changes

asterite added this to the 0.28.0 milestone Feb 14, 2019

ysbaddaden approved these changes Feb 15, 2019

View reviewed changes

asterite merged commit d83fe1c into crystal-lang:master Feb 15, 2019

asterite deleted the opt/dwarf-decoding branch March 30, 2019 16:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize dwarf line numbers decoding #7413

Optimize dwarf line numbers decoding #7413

asterite commented Feb 11, 2019 •

edited

Loading

yxhuvud commented Feb 11, 2019

ysbaddaden commented Feb 11, 2019

asterite commented Feb 11, 2019

asterite commented Feb 11, 2019 •

edited

Loading

ysbaddaden commented Feb 12, 2019

asterite commented Feb 12, 2019

yxhuvud Feb 12, 2019

asterite Feb 12, 2019

ysbaddaden Feb 12, 2019

asterite commented Feb 12, 2019 •

edited

Loading

asterite commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

Sija commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

asterite commented Feb 13, 2019

ysbaddaden commented Feb 13, 2019

asterite commented Feb 13, 2019

ysbaddaden commented Feb 13, 2019

Sija commented Feb 13, 2019

asterite commented Feb 13, 2019

asterite commented Feb 14, 2019

asterite commented Feb 14, 2019

ysbaddaden commented Feb 15, 2019

Optimize dwarf line numbers decoding #7413

Optimize dwarf line numbers decoding #7413

Conversation

asterite commented Feb 11, 2019 • edited Loading

yxhuvud commented Feb 11, 2019

ysbaddaden commented Feb 11, 2019

asterite commented Feb 11, 2019

asterite commented Feb 11, 2019 • edited Loading

ysbaddaden commented Feb 12, 2019

asterite commented Feb 12, 2019

yxhuvud Feb 12, 2019

Choose a reason for hiding this comment

asterite Feb 12, 2019

Choose a reason for hiding this comment

ysbaddaden Feb 12, 2019

Choose a reason for hiding this comment

asterite commented Feb 12, 2019 • edited Loading

asterite commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

Sija commented Feb 12, 2019

jkthorne commented Feb 12, 2019

asterite commented Feb 12, 2019

asterite commented Feb 13, 2019

ysbaddaden commented Feb 13, 2019

asterite commented Feb 13, 2019

ysbaddaden commented Feb 13, 2019

Sija commented Feb 13, 2019

asterite commented Feb 13, 2019

asterite commented Feb 14, 2019

asterite commented Feb 14, 2019

ysbaddaden commented Feb 15, 2019

asterite commented Feb 11, 2019 •

edited

Loading

asterite commented Feb 11, 2019 •

edited

Loading

asterite commented Feb 12, 2019 •

edited

Loading