Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack traces are disconnected when using resumable tasks #384

Closed
gartung opened this issue Apr 23, 2021 · 11 comments
Closed

Stack traces are disconnected when using resumable tasks #384

gartung opened this issue Apr 23, 2021 · 11 comments

Comments

@gartung
Copy link

gartung commented Apr 23, 2021

Since CMSSW moved to using resumable tasks the results of cpu usage with our profiler Igprof have been incorrect. The reported cpu usage is on the order of 10% of the actual cpu usage. Perf was also tried but the cumulative cpu usage for functions did not add up. Profiling was also done with vtune. While vtune correctly accounts for the total cpu usage, the top down view show a majority of the cpu usage as Skipped stack frame.

@gartung
Copy link
Author

gartung commented Apr 23, 2021

Igprof captures on these system calls to sample call stack and libunwind to find the corresponding function calls.

@gartung
Copy link
Author

gartung commented Apr 23, 2021

Before the switch to Igprof reported the cumulative cpu usage correctly in CMSSW_11_3_0_pre4 when counting from _libc_start_main.

After the switch the resumable tasks Igprof reported the cumulative cpu usage incorrectly in CMSSW_11_3_0_pre5

Notice the cumulative cpu usage for CMSSW_11_3_0_pre4 was ~8000s while the cumulative cpu usage for CMSSW_11_3_0_pre5 was ~130s.

@gartung
Copy link
Author

gartung commented Apr 23, 2021

Example of vtune report
image

@gartung
Copy link
Author

gartung commented Apr 26, 2021

From the comment in the igprof code it seems that igprof does not count the time in clone().

@gartung
Copy link
Author

gartung commented Apr 26, 2021

Vtune at least record the time under clone().
image

@alexey-katranov
Copy link
Contributor

alexey-katranov commented Apr 26, 2021

I suppose that resumable tasks involving stack manipulation with ucontext might cause unexpected issues with profiling tools. The issues can be caused when the same thread is switching between different tasks that access the same tls inside the tool, i.e. the tool can observe unexpected order of tls changes.

@gartung
Copy link
Author

gartung commented Apr 26, 2021

It appears that when the coroutine is created the parent context is not saved
https://github.com/oneapi-src/oneTBB/blob/9e15720bc7744f85dff611d34d65e9099e077da4/src/tbb/co_context.h#L176
I think this would explain the disconnected stack traces.

@gartung
Copy link
Author

gartung commented Apr 26, 2021

@alexey-katranov
Copy link
Contributor

In the example, uc_link is used to specify where the coroutine should return when its function is completed. In oneTBB runtime, we do not know where to return in advance because coroutines are executed independently from each other and successor depends on execution flow.

@gartung
Copy link
Author

gartung commented Apr 28, 2021

Igprof does not handle the context swapping with the coroutines correctly. Google perftools profiler library produces output that correctly accounts for the context swapping.

@gartung
Copy link
Author

gartung commented Apr 30, 2021

When the coroutine is create the parent link is set to 0
When the coroutine is swapped back the thread exits after running the task. (edited)
The IgProf threadWrapper sees the task return and disables the profiling timer.
https://github.com/igprof/igprof/blob/03d16c15bd970256affcfde9e510a8404856b233/src/profile.cc#L835
Gperftools does not ever disables the profiling timer on the main thread. Per thread timers can be enabled and are deleted when the thread exits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants