Faster `next` by reducing number of frame_depth calls #743

WillHalto · 2022-08-31T15:17:13Z

Description

Background

`next` can be quite slow when stepping over code. This appears to be a known issue caused by the expensive call to DEBUGGER__.frame_depth that is made for every :line event while stepping.
- This check determines whether we have returned to the stack depth at which `next` was called, since that is where we want to break.
The penalty is significant when stepping over code in a large codebase with a deep call stack.
This PR speeds up stepping with `next` by restricting when frame_depth is called for :line events.

What this PR does

We can save significant time by not checking frame_depth for every single :line event while stepping.
This relies on the following assumptions:
1. If we execute a line of code (i.e., encounter a :line event) without breaking, meaning we have not reached a stack depth equal to or shallower than the one where `next` was called, then we are at a deeper stack depth.
2. Removing frames from the stack corresponds to a :return or :b_return event.
3. So, once we pass a :line event without reaching the starting stack depth, we can ignore all following :line events until we see a :return or :b_return event, since we cannot be back at the starting depth until some return event(s) occur.

Examples/validation

This file uses deep recursion with a method call to create a very large call stack:

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

We can benchmark a step over the call on line 13 with this command:

time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb

Results

Baseline - not stepping at all:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

Stepping before this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.188s
user    0m2.083s
sys     0m0.054s

Stepping after this change:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

So this change cuts the stepping time roughly in half for this example (2.188s down to 1.147s real).

Co-authored-by: @jhawthorn

WillHalto · 2022-08-31T16:18:04Z

test/console/control_flow_commands_test.rb

-        13| "foo"
+        13| puts "foo"
+        14| puts "bar"
+        15| "baz"


Added this to the test to ensure that next <number> functionality is preserved when stepping over multiple lines at the same stack depth.

st0012 · 2022-08-31T20:49:36Z

This is an amazing optimization 👍

I feel we can also improve this part's readability a bit by moving relevant logic closer:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

And then we can further refactor the later part too:

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)

So putting them all together:

-              loc = caller_locations(2, 1).first
-              loc_path = loc.absolute_path || "!eval:#{loc.path}"
-
               stack_depth = DEBUGGER__.frame_depth - 3

               # If we're at a deeper stack depth, we can skip line events until there's a return event.
               skip_line = event == :line && stack_depth > depth

               # same stack depth
-              (stack_depth <= depth) ||
+              next true if stack_depth <= depth
+
+              loc = caller_locations(2, 1).first
+              loc_path = loc.absolute_path || "!eval:#{loc.path}"

               # different frame
-              (next_line && loc_path == path &&
-               (loc_lineno = loc.lineno) > line &&
-               loc_lineno <= next_line)
+              next_line && loc_path == path && loc.lineno.between?(line + 1, next_line)
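
A quick way to convince yourself that the `between?` form matches the original pair of comparisons for integer line numbers. The method names below are made up for the check and do not exist in the codebase:

```ruby
# Original condition: strictly after `line`, up to and including `next_line`.
def in_range_original?(lineno, line, next_line)
  lineno > line && lineno <= next_line
end

# Refactored condition: for integers, `lineno > line` is `lineno >= line + 1`.
def in_range_refactored?(lineno, line, next_line)
  lineno.between?(line + 1, next_line)
end

line = 10
next_line = 12
(8..14).each do |lineno|
  a = in_range_original?(lineno, line, next_line)
  b = in_range_refactored?(lineno, line, next_line)
  puts "lineno=#{lineno} original=#{a} refactored=#{b}"
end
```

The equivalence relies on line numbers being integers; the `next_line &&` guard in front is still needed because `next_line` can be nil.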

WillHalto · 2022-09-01T20:06:07Z

@st0012 thanks for the feedback! That's a great refactor; I applied it in 7a41491 👍

jabamaus · 2022-09-05T13:06:10Z

Great work on these optimisations, Will 😃 Are you thinking about optimising startup/connection times too? I'm thinking about that. I haven't looked too deeply, but I'm interested in a minimal config that cuts out as many 'requires' as possible. It takes the best part of 10 seconds to get debugging on my underpowered PC with a pretty simple Ruby program. It feels like it should take a couple of seconds, and be practically instant on a good PC.

ko1 · 2022-09-16T09:53:46Z

Could you measure the time with #746?

WillHalto · 2022-09-17T19:54:41Z

> Could you measure the time with #746?

@ko1 yes, here is that result:

Using the same example as above

# target.rb
def foo
  "hello"
end

def recursive(n,stop)
  foo
  return if n >= stop

  recursive(n + 1, stop)
end
  
recursive(0,1000) # line 13

Baseline - not stepping at all:

$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.252s
user    0m0.207s
sys     0m0.046s

Stepping before changes:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m2.370s
user    0m2.292s
sys     0m0.077s

Stepping with changes from #743 and #746:

$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

With both of these changes in place, the extra time `next` adds on top of the non-stepping baseline drops by ~99% (from about 2.12s to about 0.02s), so stepping is now nearly as fast as the baseline.
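
The percentage depends on what is being compared, so here is the arithmetic spelled out using the `real` times reported in this comment. The variable names are only for illustration:

```ruby
baseline = 0.252 # not stepping at all
before   = 2.370 # stepping before changes
after    = 0.276 # stepping with #743 and #746

# Time attributable to stepping itself, i.e. time above the baseline.
overhead_before = before - baseline
overhead_after  = after - baseline

overhead_reduction = (1 - overhead_after / overhead_before) * 100
wall_clock_reduction = (1 - after / before) * 100

puts format("stepping overhead: %.3fs -> %.3fs", overhead_before, overhead_after)
puts format("overhead reduction:   %.1f%%", overhead_reduction)
puts format("wall-clock reduction: %.1f%%", wall_clock_reduction)
```

Measured against total wall-clock time the reduction is smaller, because the fixed startup cost of rdbg is included in every run.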

ko1 · 2022-10-04T07:40:10Z

Ah, I'm sorry, I meant to ask for "with #746 but without #743".
I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

WillHalto · 2022-10-11T19:51:20Z

> Ah, I'm sorry, I meant to ask for "with #746 but without #743". I will merge #746 but I need to read details of #743 for review. But if #746 is enough, I can leave it.

@ko1 oh, I understand. The test result for #746 without #743 is in the PR description here: #746 (comment), under the Examples/validation header.

Let me know if you'd like me to run any other tests on either of these PRs! 👍

ko1 · 2022-10-26T08:36:39Z

Ah, thank you!

Here is the summary:

# baseline
$ time exe/rdbg -e 'b 13;; c ;; c ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.241s
user    0m0.188s
sys     0m0.054s

# only with #743 
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m1.147s
user    0m1.107s
sys     0m0.038s

# only with #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.285s
user    0m0.218s
sys     0m0.069s

# with #743 and #746
$ time exe/rdbg -e 'b 13;; c ;; n ;; q!' -e c target.rb
#< ... rdbg output removed for readability >
real    0m0.276s
user    0m0.228s
sys     0m0.048s

It seems #746 is dominant, so I'll merge only #746.

As for #743 (this PR), I can't fully guarantee it is correct.

After reading the code, your code is more readable and ... there is probably no problem, so I'll merge it.

ko1 · 2022-10-26T08:44:34Z

Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

WillHalto · 2022-10-26T14:12:29Z

> Ah, I found an issue, it doesn't support multi-threads, so I'll revert it. Sorry.

Ah, no problem, I agree with "It seems #746 is dominant" anyway. Is it worth trying to fix multi-thread support here, or should we just go ahead with #746 only?

ko1 · 2022-10-28T02:25:59Z

> Is it worth trying to fix multi thread support here or just go ahead with #746 only instead?

I don't think fixing multi-thread support here is worth it for the advantage this PR gives.

ko1 · 2022-11-02T17:40:09Z

My comment "it doesn't support multi-threads" was wrong (because the TracePoint is only enabled on the thread that created it), so I'll try to merge it again.
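
For readers unfamiliar with per-thread tracing, here is a small sketch of what a thread-scoped TracePoint looks like. It assumes the step TracePoint is enabled with the `target_thread:` keyword of `TracePoint#enable`; this thread does not show how the debugger actually enables it, and `work` is a made-up method:

```ruby
def work
  :done
end

main = Thread.current
seen_on = []

tp = TracePoint.new(:call) do |t|
  # Record which thread each call to `work` was observed on.
  seen_on << Thread.current if t.method_id == :work
end

# Only events raised on `main` reach the hook.
tp.enable(target_thread: main)
Thread.new { work }.join # runs on another thread: not traced
work                     # runs on the main thread: traced
tp.disable

puts "calls to work seen by the hook: #{seen_on.size}"
```

With a hook scoped like this, per-step state such as a skip flag is only ever touched by events from the thread being stepped.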

WillHalto added 4 commits August 27, 2022 01:57

skip most line events when stepping over code

5a70f6d

clarify logic for skipping line, support n <num> use

3066f14

add test for next <n> stepping over lines with no returns

fcf648f

rename variable

9b4808c

WillHalto changed the title ~~Skip most line events when stepping with next~~ Skip some line events for faster stepping with next Aug 31, 2022

WillHalto changed the title ~~Skip some line events for faster stepping with next~~ Faster next by reducing number of frame_depth calls Aug 31, 2022

WillHalto marked this pull request as ready for review August 31, 2022 16:16

refactor for readability

7a41491

ko1 added this to the v1.7.0 milestone Sep 16, 2022

WillHalto mentioned this pull request Sep 17, 2022

Faster next / finish by using rb_profile_frames #746

Merged

st0012 mentioned this pull request Sep 22, 2022

Performance issue when stepping through a deep callstack #760

Open

marianosimone mentioned this pull request Sep 25, 2022

next stops in the wrong place when blocks are called #763

Closed

ko1 merged commit fa05ffa into ruby:master Oct 26, 2022

ko1 mentioned this pull request Oct 26, 2022

Revert "Faster next by reducing number of frame_depth calls" #779

Merged
