LSE-334: Fallback for getting locks from stack frames #99
Conversation
Force-pushed from 08dcde6 to 3bf92ae
The OL8 UEK6 x86_64 builds used a broken CTF toolchain for a long time, resulting in broken CTF. However, the toolchain used to build aarch64 was newer and not affected by the bug. Mark it as compatible so we can use drgn with CTF on aarch64 for this release.

Signed-off-by: Stephen Brennan <[email protected]>
For module debuginfo, we use a heuristic to detect whether debuginfo is loaded. This can fail, and it's also extra overhead. When we load CTF, we know that we've got debuginfo for all modules. Cache this fact and use it to special-case the module_has_debuginfo() check.

This fixes test failures which happen due to a failed check. The test failures result in rather cryptic errors due to missing type fields, e.g.:

    AttributeError: 'struct files_struct' has no member 'fdt'

These are caused because, when CTF is loaded, the failed check results in the debuginfo code loading a DWARF module debuginfo file, which can cause conflicts in the type lookups.

Signed-off-by: Stephen Brennan <[email protected]>
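A minimal sketch of the special case described above, using drgn's per-program `prog.cache` dictionary; the `"using_ctf"` key and the heuristic helper are illustrative assumptions, not the actual drgn-tools code:

```python
from drgn import Program


def _module_debuginfo_heuristic(prog: Program, module_name: str) -> bool:
    # Stand-in for the existing (fallible, slower) per-module heuristic.
    return False


def module_has_debuginfo(prog: Program, module_name: str) -> bool:
    # When CTF is loaded, types are available for every module, so the
    # heuristic can be skipped entirely. "using_ctf" is an assumed key.
    if prog.cache.get("using_ctf", False):
        return True
    return _module_debuginfo_heuristic(prog, module_name)
```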
We currently use "frame.name", which may not be present when using CTF. Instead, we should fall back to the symbol name. The frame_name() function is not appropriate, since it may include additional information, such as the module containing the symbol. So introduce a new helper, func_name(), for this case.

Signed-off-by: Stephen Brennan <[email protected]>
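A minimal sketch of what such a helper could look like, assuming this signature (the actual drgn-tools func_name() may differ):

```python
from typing import Optional

from drgn import Program, StackFrame


def func_name(prog: Program, frame: StackFrame) -> Optional[str]:
    # DWARF fills in frame.name; with CTF it may be None, so fall back
    # to the name of the symbol containing the frame's PC.
    if frame.name is not None:
        return frame.name
    try:
        return prog.symbol(frame.pc).name
    except LookupError:
        return None
```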
aarch64 doesn't guarantee to provide the value of SP when doing a frame-pointer-based unwind. Don't fail on this: instead fall back to FP. Fixes internal issue tracked as LSE-374.

Signed-off-by: Stephen Brennan <[email protected]>
There's no reason to be confident that a variable's value will be valid when we're formatting it in the stack trace. It is possible that it will contain stale values that may result in FaultErrors. Be sure to catch this and simply print a message saying the FaultError happened. This ensures bt does not crash in these exceptional cases.

Signed-off-by: Stephen Brennan <[email protected]>
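A sketch of that defensive formatting, using drgn's `FaultError` and `Object.format_()`; the surrounding function is illustrative:

```python
from drgn import FaultError, Object


def format_frame_var(name: str, value: Object) -> str:
    # A stale or clobbered stack slot can fault when we read it during
    # formatting; report that instead of letting bt crash.
    try:
        return f"{name} = {value.format_()}"
    except FaultError:
        return f"{name} = (FaultError reading value)"
```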
There's no reason to exempt UEK4 from this test: it's not that drgn can't do stack traces on UEK4, it's really that crashed_thread() doesn't work on UEK4 vmcores. Let's work around that so we can still smoke test the actual bt operation.

Signed-off-by: Stephen Brennan <[email protected]>
For mutexes, locks, and semaphores, we are able to reliably test whether any given pointer actually corresponds to the correct lock. We simply pretend that it is valid, and check to see whether the current task is present on the list of waiters. Imran initially implemented the is_task_blocked_on_lock() function with this great idea.

We don't know where the lock pointer will actually be on the stack: that's what DWARF would tell us. We could hard-code the stack offsets we've observed in the past, but this is not very maintainable. It takes a lot of code to implement, and a lot of time & resources to check every kernel. Plus, we would need to check each new kernel as it is released.

The alternative, as suggested by Junxiao, is much simpler: just check every stack offset from the top of the stack to the mutex/sem lock function. We can eliminate many addresses by the fact that they may not be kernel memory addresses. For the remaining addresses, so long as we are careful (validating that we are really following a linked list, and not going into a loop), there should be no problem testing them to see if they are really locks. This commit implements the approach.

Signed-off-by: Stephen Brennan <[email protected]>
Co-authored-by: Imran Khan <[email protected]>
Suggested-by: Junxiao Bi <[email protected]>
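A minimal sketch of this approach for the mutex case, using drgn's `validate_list_for_each_entry()` helper (which raises `ValidationError` on broken or cyclic lists); the function names, the 8-byte word size, and the x86_64 address cutoff are illustrative assumptions, not the PR's actual code:

```python
from drgn import FaultError, Object, Program
from drgn.helpers import ValidationError
from drgn.helpers.linux.list import validate_list_for_each_entry


def is_task_blocked_on_mutex(mutex: Object, task: Object) -> bool:
    # Pretend the candidate is a real mutex and look for the task on
    # its waiter list; validation catches corrupt links and cycles.
    for waiter in validate_list_for_each_entry(
        "struct mutex_waiter", mutex.wait_list.address_of_(), "list"
    ):
        if waiter.task == task:
            return True
    return False


def find_mutex_on_stack(prog: Program, task: Object, start: int, end: int) -> Object:
    # Test every word-aligned stack slot between start and end.
    for addr in range(start, end, 8):
        try:
            candidate = prog.read_word(addr)
        except FaultError:
            continue
        # Skip values that can't be kernel pointers (x86_64 assumption).
        if candidate < 0xFFFF800000000000:
            continue
        mutex = Object(prog, "struct mutex *", value=candidate)
        try:
            if is_task_blocked_on_mutex(mutex, task):
                return mutex
        except (FaultError, ValidationError):
            continue  # junk pointer, broken links, or a cycle
    raise LookupError("no mutex found on stack")
```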
We've generated vmcores for every combination of OL, UEK, and architecture. Additionally, for combinations where there are multiple offsets, we've generated a vmcore on the oldest and newest version. Each vmcore is generated after first loading the "lockmod" kernel module, so that we can test loading the lock value out of the stack frame. The test is written to exercise the fallback both on DWARF (in case the stack frame lookup fails) and on CTF.

Signed-off-by: Stephen Brennan <[email protected]>
Force-pushed from 3bf92ae to 6d07880
```python
# First, we need to get the start and end address to search. This is
# arch-specific.
if prog.platform.arch == Architecture.X86_64:
    candidates = range(task.thread.sp, frame.sp, 8)
```
`frame.sp` is the bottom of the stack?
Yeah, it's the bottom of this stack frame. This reliably works well, though! I think it's because x86 always spills the register beneath the stack frame anyway. With aarch64, it seems that the register gets stored within the stack frame.

I could try to get the next stack frame up, but this function doesn't actually accept a stack trace... just the single stack frame. I could try to use the workaround with the frame pointer that aarch64 does. In practice, we always find the lock before we get to the upper end of the range anyway, so it wouldn't be much of an issue.
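A hypothetical sketch of that frame-pointer workaround, assuming `frame.sp` raises `LookupError` when the unwinder didn't record SP (x29 is the aarch64 frame pointer); this is illustrative, not the PR's code:

```python
from drgn import StackFrame


def scan_upper_bound(frame: StackFrame) -> int:
    # Frame-pointer unwinds on aarch64 may not record SP; fall back to
    # the frame pointer register, x29, which still bounds the frame.
    try:
        return frame.sp
    except LookupError:
        return frame.register("x29")
```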
So `task.thread.sp` is the bottom of the stack, and the scan happens over the parent stack frames of the current one. Could there be a case where the lock address is stored in a child stack frame of the current one?
I guess I'm not sure what you mean. The `task.thread.sp` is the "bottom" of the stack, that is, it is where the stack pointer was stored just prior to context switching away. Then, `frame.sp` is the stack pointer for the current function (e.g. `mutex_lock()`). So, the stack looks something like this:
```
HIGHER ADDRESSES / OLDER FRAMES HERE
[parent caller frame]
  ...
[mutex_lock() frame]
  0xDEADBEEF
  0xF000B000
  ... vars ...               <--- frame.sp
[schedule_preempt_disabled() frame]
  ...
[schedule() frame]
  ... vars ...
... bottom of the stack ...  <--- task.thread.sp
LOWER ADDRESSES / NEWER FRAMES HERE
```
So we start at `task.thread.sp` and increment the address until we reach `frame.sp`. Since `frame.sp` is technically pointing at the bottom of the stack frame for `mutex_lock()`, it's possible that the lock could be somewhere in that frame and we'd miss it. However, in practice that just doesn't happen: the lock has always been beneath the `frame.sp` value.
My fault, I thought `task.thread.sp` was the other end of the stack.

I'm OK with the current approach; we can always fix it if the lock is stored in the current stack frame in future kernels.
LGTM. Super cool stuff @brenns10 : )
This pull request augments the existing lock module. Previously, it could only work with DWARF debuginfo, using the CFI to identify the mutex, sem, or rwsem that a task was blocked waiting on. Now, it can also function as a fallback in cases where the DWARF debuginfo fails. What's more, it also works with CTF debuginfo.
The approach here is based on @imran-kn's idea that you can guess that a pointer corresponds to a lock, and then verify it by checking whether the current task is on the list of waiters for that lock. The catch is that you need to be really careful about it: it's not enough to just do a "try/except" for FaultErrors. You also need to ensure that the linked list nodes are valid (so it's less likely that you're dereferencing random pointers). And even if the linked list nodes are valid, you could end up with an infinite loop, so we also need to track the set of already-visited nodes in the list. If we do all this, we can be confident that we will either (a) dereference some junk and generate a FaultError, (b) encounter a pointer to something that's not a valid linked list node, and get a ValidationError, or (c) make a cycle in the list and raise a ValidationError, because that's not valid data either.
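A minimal sketch of that careful walk, tracking a visited set for cycle detection; `ValidationError` is drgn's, but this walker is illustrative rather than the PR's implementation:

```python
from drgn import Object
from drgn.helpers import ValidationError


def walk_list_carefully(head: Object):
    # head is a "struct list_head"; yield each node pointer, raising
    # ValidationError for broken linkage or a cycle.
    seen = set()
    node = head.next.read_()
    while node != head.address_of_():
        addr = node.value_()
        if addr in seen:
            raise ValidationError("cycle in list")
        seen.add(addr)
        # In a sane list, our neighbors point back at us.
        if node.next.prev != node or node.prev.next != node:
            raise ValidationError("broken list linkage")
        yield node
        node = node.next.read_()
```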
With the careful checking described above, there's no need for complex tables of offsets! We just need to find the start and stop locations on the stack, and search every word on the stack for a valid lock. We can speed things along by checking what kind of address we're looking at, and only testing the pointer if it could be a valid lock pointer (skipping userspace addresses, text addresses, etc.). This was @biger410's idea; if he hadn't mentioned it, I would still be generating tables of offsets...
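An illustrative version of that filtering, assuming the x86_64 kernel address split and the standard `_stext`/`_etext` symbols; the cutoff value and the helper name are assumptions:

```python
from drgn import Program


def could_be_lock_pointer(prog: Program, value: int) -> bool:
    # Userspace (and NULL-ish) values can never be kernel locks.
    if value < 0xFFFF800000000000:
        return False
    # Neither can addresses inside kernel text: locks live in data.
    stext = prog.symbol("_stext").address
    etext = prog.symbol("_etext").address
    return not (stext <= value < etext)
```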
To test this, I've created a large crop of vmcores with kernel modules loaded. These modules spawn several kthreads that wait for a lock, and then the test case can find these threads and assert that it is able to find the lock from the stack frame, both with and without the fallback.
Tests pass on all vmcores internally (including the newly uploaded ones). Tests pass on GitHub CI as well, though there's not much testing of this specific functionality there.
I think this will probably replace #81.