Optimize dwarf line numbers decoding #7413
Replacing the implementation of #find at line 187 with a binary search looks like another low-hanging fruit, assuming there is a nontrivial number of rows.
Impressive boost, especially since the files/directories arrays are usually small if I remember correctly.
@yxhuvud Yes, but that's not the bottleneck, at least when I profiled it. The bottlenecks are:
I'm not sure there's something we can do about those two points. In both cases (I think?) the IO is repositioned and read, and because it's buffered, an entire block is consumed. Maybe changing the buffer size could improve this because
@ysbaddaden Yes, the arrays are small (about 600 entries for
The matrix is defined as an Array(Array(Row)) to match the dwarf definition, but more importantly to skip a bunch of addresses to reach the actual block more quickly.
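To make the two ideas above concrete (skipping whole blocks of the matrix, then a binary search instead of a linear #find), here is a minimal sketch. It is not the code in src/debug/dwarf/line_numbers.cr: the Row record and find_row method are hypothetical stand-ins, and the sketch assumes every sequence is non-empty and sorted by address, that sequences are sorted by their first address, and it ignores end-of-sequence markers.

```crystal
# Hypothetical stand-in for a decoded line-table row (not the stdlib's Row).
record Row, address : UInt64, line : Int32

# Returns the last row whose address is <= `address`, or nil if `address`
# lies before the first sequence.
# Assumption: sequences are non-empty, each sorted by address, and the
# outer array is sorted by each sequence's first address.
def find_row(matrix : Array(Array(Row)), address : UInt64) : Row?
  # Index of the first sequence starting after `address`;
  # the candidate sequence is the one just before it.
  seq_index = matrix.bsearch_index { |rows| rows.first.address > address } || matrix.size
  return nil if seq_index == 0
  rows = matrix[seq_index - 1]

  # Same idea inside the sequence. row_index is at least 1 here because
  # rows.first.address <= address.
  row_index = rows.bsearch_index { |row| row.address > address } || rows.size
  rows[row_index - 1]
end

matrix = [
  [Row.new(0x10_u64, 1), Row.new(0x20_u64, 2)],
  [Row.new(0x40_u64, 7), Row.new(0x48_u64, 9)],
]
p find_row(matrix, 0x24_u64)
```

Both lookups are O(log n), so this would only pay off if the number of sequences or rows per sequence is large, which is the caveat raised in the first comment.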
aa9081a to 5f99bb3
I added another commit that introduces two temporary hashes that map dir/file-names to indexes, to avoid doing linear scans over
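A rough sketch of what "a hash that maps names to indexes" can look like, for readers who have not opened the diff. NameTable and register are hypothetical names chosen for illustration, not identifiers from the PR; the only assumption is that names are stored in an array and rows refer to them by index.

```crystal
# Hypothetical helper (not the PR's code): names live in an array so rows can
# store a small index, and a hash gives O(1) name -> index lookups instead of
# scanning the array each time.
class NameTable
  getter names = [] of String
  @indexes = {} of String => Int32

  # Returns the index of `name`, appending it only the first time it is seen.
  def register(name : String) : Int32
    @indexes[name] ||= begin
      @names << name
      @names.size - 1
    end
  end
end

table = NameTable.new
table.register("array.cr")
table.register("hash.cr")
table.register("array.cr") # already known, nothing is appended
p table.names
```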
src/debug/dwarf/line_numbers.cr
# another file comes next, etc.). So we remember the last
# mapped index to avoid unnecessary lookups.
@last_filename : String?
@last_filename_index = 0
BTW, if the files are read in some order (and not jumping from file to file), does @last_filename ever differ from @files[-1] (and similarly for the index)?
Yes, they differ. They are not read in order. I don't know why, though, I'm not familiar with the format. Maybe understanding the format first, then optimizing it might be better... 😊
If I recall correctly, the format follows functions (symbols). An instruction group of the function refers to a file:line, then the following group likely refers to the same file at a further line... unless the compiler inlined code, which will cause jumps to another file, etc.
If I recall correctly, that is.
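The locality described above (consecutive rows usually point at the same file, with occasional jumps caused by inlining) is what makes a "remember the last mapped index" cache worthwhile. The sketch below only illustrates that idea: FileIndexer, index_of and the lookups counter are hypothetical, and the slow path is a plain linear scan chosen for brevity.

```crystal
# Hypothetical illustration of caching the last mapped filename.
class FileIndexer
  getter files = [] of String
  getter lookups = 0 # counts how often the slow path runs
  @last_filename : String?
  @last_filename_index = 0

  def index_of(filename : String) : Int32
    # Fast path: same file as the previous row, no lookup needed.
    return @last_filename_index if filename == @last_filename

    # Slow path: find the file, or append it if it is new.
    @lookups += 1
    index = @files.index(filename) || begin
      @files << filename
      @files.size - 1
    end
    @last_filename = filename
    @last_filename_index = index
  end
end

indexer = FileIndexer.new
%w(a.cr a.cr a.cr b.cr a.cr).each { |name| indexer.index_of(name) }
p indexer.lookups
```

In this run the cached filename ends up as a.cr while the last entry of files is b.cr, which is the situation asked about in the review comment.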
I just pushed another commit with more improvements. I tried to do small commits for this but I was experimenting and in the end it was a bit hard for me to separate all the changes. Essentially, I changed some But the biggest change is in So now:
(down from 1.83s) I'm almost sure more things can be optimized, but with these changes I'm pretty satisfied now.
8c5f4ca to c295c8a
Meh, I'll redo the last commits in smaller commits because CI failed and I can't figure out why...
@asterite why use 2 hashes? Directories and filenames would not collide. It seems like you could reduce this to one register method and one hash.
@wontruefree
@asterite I am not sure I understand. But if you have considered it and know why it won't work, that is cool. I cannot make a PR into your fork, but here is what I was thinking.
@wontruefree the
begin
  raise "oh no"
rescue ex
  puts ex.inspect_with_backtrace
end

begin
  [1][2]
rescue ex
  puts ex.inspect_with_backtrace
end
Do you get something correct in the output?
@asterite CI just failed again with
@asterite I see what you are saying. After thinking about this why not just make the
I think some tests are allocating too much memory and it dies on Linux 32 bits. But I didn't start investigating that yet.
I just pushed yet another commit. Thanks to what @wontruefree said I noticed that it doesn't make sense to build those @ysbaddaden Do you see a problem with this?
fd21c53 to 708025f
@asterite It's probably fine. I don't recall exactly why I used arrays and indices. Maybe it helped limit the overall memory usage by avoiding duplicated string allocations?
@ysbaddaden I don't think so because all of the strings are stored inside arrays inside Maybe if I have time I'll learn that format and write specs for it, so we can continue simplifying it (with more confidence). However, right now I think the times are more or less fine so I might focus on something else after this.
Once the sequences are decoded we don't retain any references to them, so their
@asterite Would you mind updating the PR description with the newest run-time numbers?
@Sija done!
Good to go?
I'll merge it. Should I squash or not? I never know. I think each commit is meaningful by itself.
Let's keep each commit. We could squash "simplify while loop" but that's nitpicking.
The first time an Exception's full backtrace (filenames, line numbers, etc.) must be built, the runtime will first decode all DWARF information from the dwarf file (or somewhere else? not sure).
I noticed this process takes about 6 seconds for the compiler's std specs (it's a huge program: the bigger the program, the bigger the dwarf file and the longer it takes to decode it). Specifically, you can see this by doing:
I thought 6 seconds to decode the backtrace was too much, so I started to dig into the code to see if I could improve it. I managed to improve part of the decoding process by about 6 times.
The improvement is explained in the PR code comments. There's a decode_sequences method in DWARF::LineNumbers. Previously it took 4.84 seconds to do it for the std_spec. Now it takes 0.82 seconds.
Now running that spec above looks like this:
Note that this improvement is only seen the first time ex.inspect_with_backtrace is called for any exception. Later on the dwarf info should have been decoded. Still, it's good to win a lot of time in this first decoding.
I believe there might be more things to improve, but so far this is the easiest change that gives the most boost.
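One way to observe the "first call pays for the decoding" behaviour in a program of your own is to time two backtraces in a row. This is only a sketch: backtrace_time is a hypothetical helper, the numbers depend entirely on the size of the binary and the machine, and a small program will show a much smaller gap than the std specs.

```crystal
# Hypothetical helper: raises, rescues, and measures how long it takes to
# build the full backtrace text for that exception.
def backtrace_time(message : String) : Time::Span
  raise message
rescue ex
  Time.measure { ex.inspect_with_backtrace }
end

first = backtrace_time("first")   # expected to include the one-off DWARF decoding
second = backtrace_time("second") # expected to reuse the already decoded info

puts "first:  #{first.total_milliseconds.round(2)} ms"
puts "second: #{second.total_milliseconds.round(2)} ms"
```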