Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster next by reducing number of frame_depth calls #743

Merged
merged 5 commits into from
Oct 26, 2022

Conversation

WillHalto
Copy link
Contributor

@WillHalto WillHalto commented Aug 31, 2022

Description

Background

  • next can be quite slow when stepping over code. It appears this is a known issue due to expensive calls to DEBUGGER__.frame_depth that are made for every :line event when stepping.
    • This check is made to determine if we have returned to the same stack depth as when next was called, since that's where we want to break.
  • This performance penalty is significant when stepping over code in a large codebase with a large call stack.
  • This PR improves the performance of stepping with next by restricting when frame_depth is called for :line events.

What this PR does

  • By not checking frame_depth for every single :line event when stepping, we can save significant time.
  • We can do this based on the following assumptions:
    1. If we execute a line of code (aka, encounter a :line event) without breaking or reaching the same/lesser stack depth than where 'next' was called, it means we are at a deeper stack depth.
    2. Removing frames from the stack corresponds to a :return or :b_return event.
    3. So, if we pass a :line event without reaching the starting stack depth we can ignore all following :line events until we see a return or b_return event, since we will not be back to the same stack depth until some return event(s) occur.

Examples/validation

This file uses deep recursion with a method call to create a very large call stack:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

We can benchmark a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Results

Baseline - not stepping at all:

 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

Stepping before this change:

 $ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.188s
user    0m2.083s
sys     0m0.054s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

So this change gives a roughly ~50% improvement in the performance of stepping for the example.

Co-authored-by: @jhawthorn

@WillHalto WillHalto changed the title Skip most line events when stepping with next Skip some line events for faster stepping with next Aug 31, 2022
@WillHalto WillHalto changed the title Skip some line events for faster stepping with next Faster next by reducing number of frame_depth calls Aug 31, 2022
@WillHalto WillHalto marked this pull request as ready for review August 31, 2022 16:16
13| "foo"
13| puts "foo"
14| puts "bar"
15| "baz"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this to the test to ensure that next <number> functionality is preserved when stepping over multiple lines at the same stack depth.

@st0012
Copy link
Member

st0012 commented Aug 31, 2022

This is an amazing optimization 👍

I feel we can also improve this part's readability a bit by moving relevant logic closer:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

And then we can further refactor the later part too:

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

So putting them all together:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

@WillHalto
Copy link
Contributor Author

@st0012 thanks for the feedback! That's a great refactor, I did that in 7a41491 👍

@jabamaus
Copy link

jabamaus commented Sep 5, 2022

Great work in these optimisations Will 😃 Are you thinking about optimising startup/connection times too? I'm thinking about that. Haven't looked too deeply but I'm interested in a minimal config that cuts out as many 'requires' as possible. It takes the best part of 10 seconds to get debugging on my underpowered pc with a pretty simple ruby program. Feels like it should be a couple of seconds really and practically instant on a good pc.

@ko1 ko1 added this to the v1.7.0 milestone Sep 16, 2022
@ko1
Copy link
Collaborator

ko1 commented Sep 16, 2022

Could you measure the time with #746 ?

@WillHalto
Copy link
Contributor Author

Could you measure the time with #746 ?

@ko1 yes, here is that result:

Using the same example as above

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Baseline - not stepping at all:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.252s
user    0m0.207s
sys     0m0.046s

Stepping before changes:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.370s
user    0m2.292s
sys     0m0.077s

Stepping with changes from #743 and #746

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

With both of these changes in place, there is a ~99% improvement in the next performance compared to the baseline.

@ko1
Copy link
Collaborator

ko1 commented Oct 4, 2022

Ah, I'm sorry I need to ask you that "with #746 but without #743".
I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

@WillHalto
Copy link
Contributor Author

Ah, I'm sorry I need to ask you that "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

@ko1 oh, I understand. Test result for #746 without #743 is in the PR description here: #746 (comment) Under the Examples/validation header.

Let me know if you'd like me to run any other tests on either of these PRs! 👍

@ko1
Copy link
Collaborator

ko1 commented Oct 26, 2022

Ah, thank you!

This is summary:

# baseline
 $ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

# only with #743 
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

# only with #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s

# with #743 and #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

It seems #746 is dominant, so I'll merge only #746.

#743 (this PR) I can't fully guarantee it is correct.


after reading the code, your code is more readable and ... maybe no problem, so I'll merge it.

@ko1 ko1 merged commit fa05ffa into ruby:master Oct 26, 2022
@ko1
Copy link
Collaborator

ko1 commented Oct 26, 2022

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

@WillHalto
Copy link
Contributor Author

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

Ah, no problem, I agree with

It seems #746 is dominant

anyway. Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?

@ko1
Copy link
Collaborator

ko1 commented Oct 28, 2022

Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?

I think multi-threading support is not valuable than the advantages.

@ko1
Copy link
Collaborator

ko1 commented Nov 2, 2022

My comment "it doesn't support multi-threads" was wrong (because TracePoint only enables on the created thread) so I'll try to merge it again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants