bug: state of _stackprof.last_sample_at can leak between invocations #220

dalehamel · 2023-11-10T15:05:22Z

We noticed a very peculiar profile:

The top frame is StackProf.start. It is supposedly occupying 20% of the overall aggregated profile time (~10 out of ~50 minutes). How can this be possible?

Checking the code, it looks like it is technically possible that we can have a signal be received within stackprof_start, as the signal handler is installed but there are a few lines left to execute before we return:

https://github.com/tmm1/stackprof/blob/master/ext/stackprof/stackprof.c#L204-L234

Many of the other variables on _stackprof struct are cleared between runs, but notably, last_sample_at is never reset within stackprof_results

Since we never explicitly acquire and relinquish the GVL within stackprof_start, and never reset the value, this is my theory as to what happened:

stackprof run A occurs, and on its final sample collection, sets last_sample_at, t_0
Some time passes
stackprof run B occurs, and during the initialization in stackprof_start, between the installation of the signal handler and before we reset last_sample_at to the current time, the ruby VM swaps to another thread
Enough wall clock time is spent on the other thread that our timer fires, and we collect a sample
When calculating the weight of the sample, we use t_0 as the base, instead of t_1, because it has not been reset yet
Execution returns and we finish executing stackprof_start, and last_sample_at is set to t_1, the current time.

I'm not sure if the above is plausible in ruby's threading model, but it's the best I could come up with^.

I was very surprised to see StackProf.start at all in profiles, let alone so prominently. But from what I can tell, the above scenario is a plausible explanation for how this could occur.

Looking at the same interval of time, we do not see it appear in :cpu mode profiles collected. This makes sense though, as those don't use timestamps for their weighting.

The solution to this I think would be some combination of:

Acquire the GVL during stackprof_start and release it just before we return
Reset the last_sample_at during the results function, as we do with other variables on the _stackprof struct.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: state of _stackprof.last_sample_at can leak between invocations #220

bug: state of _stackprof.last_sample_at can leak between invocations #220

dalehamel commented Nov 10, 2023 •

edited

Loading

bug: state of _stackprof.last_sample_at can leak between invocations #220

bug: state of _stackprof.last_sample_at can leak between invocations #220

Comments

dalehamel commented Nov 10, 2023 • edited Loading

dalehamel commented Nov 10, 2023 •

edited

Loading