Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using wall-clock profiling on macOS makes Ruby thread scheduling starve background threads #158

Open
ivoanjo opened this issue May 28, 2021 · 4 comments

Comments

@ivoanjo
Copy link
Contributor

ivoanjo commented May 28, 2021

Hello again! πŸ‘‹ πŸ˜„

First the disclaimer (I guess it may be relevant): I work at @DataDog on profiling for the ddtrace gem, although we don't use rb_profile_frames (oh my god it's full of C 🀣 ).

While doing a few experiments with stackprof, I came upon this really really really weird behavior. Consider this simple test application:

require 'bundler/inline'

puts RUBY_DESCRIPTION

gemfile do
  source 'http://rubygems.org'

  gem 'stackprof'
end

StackProf.start(mode: :wall) if ENV['USE_STACKPROF'] == '1'
puts "Stackprof: #{StackProf.running?}"

def fib(n)
  return n if n <= 1
  fib(n-1) + fib(n-2)
end

def loop_fib(marker)
  count = 0
  loop do
    count += 1
    print marker if count % 15 == 0
    fib(30)
  end
end

Thread.new { loop_fib("⭐️") }
loop_fib("😈")

Basically two threads are running as fast as they can this really simple but on-cpu workload. Ruby seems to distribute cpu time reasonably fairly:

$ ruby stackprof-bias-issue.rb
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-darwin19]
Stackprof: false
😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈😈⭐️😈⭐️
😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈😈⭐️😈⭐️
😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈
⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈
⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈
(etc...)

But observe what happens if I turn on stackprof:

$ USE_STACKPROF=1 ruby stackprof-bias-issue.rb
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-darwin19]
Stackprof: true
😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈
😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈
😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈😈
😈😈😈😈⭐️😈😈😈😈😈😈😈😈😈
(etc...)

Cpu profiling does not seem to be affected.

It's really weird, but it looks like the main thread gets to run most of the time, and any other background thread gets massively starved for time.

I was able to observe this behavior on Ruby 3.0.1, 2.7.3 and 2.6.7 on macOS (10.15.7/i7-1068NG7), but not on Linux. I did spot after the fact that the README.md states that Linux is required, but I'm guessing that's outdated and that macOS is expected to work (...maybe?).

@tenderlove
Copy link
Collaborator

Thanks for reporting this. It must be a problem with setitimer / sigaction on macOS, but I'm not sure where to start debugging this. FWIW I also couldn't reproduce this on Linux, but could reproduce on macOS.

My best guess is that this is due to the timer on macOS using a thread and pipe. We could probably improve the situation by changing the timer in Ruby to use Grand Central Dispatch (libdispatch) for its timer, but I haven't tested. Linux seems to be using timer_create from librt, and I suspect it plays better with signal handling.

@fekadeabdejene
Copy link

fekadeabdejene commented Mar 14, 2022

Machine: Linux spin 5.4.120+ #1 SMP Tue Jun 22 14:53:20 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux]

Modified the script to show differences between CPU and WALL on the above machine:

require 'bundler/inline'

puts RUBY_DESCRIPTION

gemfile do
  source 'http://rubygems.org'

  
  gem 'stackprof'
end

def loop_fib(marker)
  count = 0
  (0..10).each do |n|
    count += 1
    print marker
    sleep(0.25)
  end
end

mode = if ENV['USE_STACKPROF_WALL']
  :wall
else
  :cpu
end

StackProf.start(mode: mode, raw: true, interval: 150)
puts "Stackprof: #{StackProf.running?}"
Thread.new { loop_fib("⭐️") }
loop_fib("😈")
StackProf.stop
StackProf.results('2stackprof.dump')

:cpu showing both threads

:wall only showing main thread

@tenderlove
Copy link
Collaborator

tenderlove commented Mar 25, 2022

@fekadeabdejene I'm not seeing thread starvation in this case. When I run this on linux, both emojis are printed out mostly evenly:

Without WALL:

aaron@tenderhome ~/g/stackprof (master)> ruby -I ~/git/gel/lib ~/git/gel/bin/gel exec ruby -I lib test.rb
Stackprof: true
😈⭐️⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⏎

With WALL:

Stackprof: true
😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️😈⭐️⏎      

Does it do the same thing for you? I want to figure out if this is a problem with thread starvation or if it's a problem with the reporter.

aaron@tenderhome ~/g/stackprof (master)> uname -a
Linux tenderhome 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

@fekadeabdejene
Copy link

Hey, That is a good point! A problem with the reporting is possible. I also am able to see the emoji's. Whenever I get dumps using :wall it only shows the main thread samples. So I thought it maybe related to this issue. As you are saying, it might not be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants
@tenderlove @ivoanjo @fekadeabdejene and others