Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc: fix TestCondBreakpointWithFrame flakes on 1.22rc1 #3624

Merged
merged 1 commit into from
Jan 10, 2024

Conversation

aarzilli
Copy link
Member

@aarzilli aarzilli commented Jan 4, 2024

The flake manifests as an error where the variable i can not be found in
frame 1 and happens in go1.22rc1 between 0.1% and 0.5% of the time (it is highly dependent on CPU contention)
This problem is caused by the new code in evalop.PushLocal referencing the
stale value of SelectedGoroutine. This happens because:

  • evalop.PushLocal calls ConvertEvalScope
  • ConvertEvalScope calls FindGoroutine
  • FindGoroutine checks the value of selectedGoroutine

When breakpoint conditions are evaluated both the value of selectedGoroutine
and currentThread are stale because we can only set their definitive value
after all breakpoint conditions have been evaluated.

The fact that it only happens in 1.22rc1 is coincidental, it's probably
caused by the fact that 1.22rc1 migrates goroutines between threads more in
this particular circumstance.

This commit fixes the problem in two ways:

  1. selectedGoroutine and currentThread are given temprorary non-stale values
    before breakpoint conditions are evaluated
  2. evalop.PushLocal is changed so it takes a stack trace of the current
    thread rather than resolving it through the selected goroutine.

Either one would suffice however I think we should do both, (2) ensures that
the runtime.frame(n).var will work even if the current thread is not running
any goroutine and (1) ensures that we don't accidentally reference a stale
selectedGoroutine in the future.

The flake manifests as an error where the variable i can not be found in
frame 1 and happens in go1.22rc1 between 0.1% and 0.5% of the time (it is highly dependent on CPU contention)
This problem is caused by the new code in evalop.PushLocal referencing the
stale value of SelectedGoroutine. This happens because:

-  evalop.PushLocal calls ConvertEvalScope
- ConvertEvalScope calls FindGoroutine
- FindGoroutine checks the value of selectedGoroutine

When breakpoint conditions are evaluated both the value of selectedGoroutine
and currentThread are stale because we can only set their definitive value
*after* all breakpoint conditions have been evaluated.

The fact that it only happens in 1.22rc1 is coincidental, it's probably
caused by the fact that 1.22rc1 migrates goroutines between threads more in
this particular circumstance.

This commit fixes the problem in two ways:

1. selectedGoroutine and currentThread are given temprorary non-stale values
   before breakpoint conditions are evaluated
2. evalop.PushLocal is changed so it takes a stack trace of the current
   thread rather than resolving it through the selected goroutine.

Either one would suffice however I think we should do both, (2) ensures that
the runtime.frame(n).var will work even if the current thread is not running
any goroutine and (1) ensures that we don't accidentally reference a stale
selectedGoroutine in the future.
Copy link
Member

@derekparker derekparker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@derekparker derekparker merged commit bf627d0 into go-delve:master Jan 10, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants