SDTs optimised into tail-calls panic #476
`uintptr_t` may be easy to stick in a probe, but it is also a footgun. This fixes the display for e.g. `port-process-return`, but does not affect the actual panics mentioned in #476.
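To illustrate the footgun, here is a minimal Rust sketch, assuming a simplified, hypothetical `InnerFlowId` (not opte's real definition). `untyped_arg` and `FlowIdArg` are illustrative names, not opte APIs: any raw pointer converts to `usize` through `untyped_arg`, whereas a `FlowIdArg` can only be built from a `&InnerFlowId`.

```rust
/// Simplified, hypothetical flow key (opte's real InnerFlowId differs).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InnerFlowId {
    pub proto: u8,
    pub src_port: u16,
    pub dst_port: u16,
}

/// Untyped probe argument: any raw pointer converts to `usize`,
/// so the compiler cannot tell the right object from the wrong one.
pub fn untyped_arg<T>(p: *const T) -> usize {
    p as usize
}

/// Typed probe argument (hypothetical): the only constructor takes a
/// `&InnerFlowId`, so passing some other object fails to compile.
#[derive(Clone, Copy, Debug)]
pub struct FlowIdArg(usize);

impl FlowIdArg {
    pub fn new(flow: &InnerFlowId) -> Self {
        FlowIdArg(flow as *const InnerFlowId as usize)
    }

    /// The address that would be handed to the probe as a `uintptr_t`.
    pub fn raw(self) -> usize {
        self.0
    }
}
```

With the typed wrapper, `FlowIdArg::new(&pkt_len)` for some unrelated `u32` is rejected at compile time, while `untyped_arg` accepts the same mistake silently.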
The actual panics all occur in sites where rustc has chosen to tail-call optimise the calls to the DTrace probes with a
Another failure had an interesting stack trace that led me to this:
The call made before
This can be seen in the periodic flow expiry case above: the `ExtractIf` closure calls
I can confirm that
Both examined functions end in a tail-call to the relevant DTrace SDT.
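On x86-64, a direct `call rel32` (opcode `E8`) and a tail-call `jmp rel32` (opcode `E9`) are both five bytes long. The sketch below, which assumes the probe site is one of these two encodings, tells them apart; `ProbeSite` and `classify` are hypothetical names, not illumos or opte code. Replacing a `call` with NOPs is harmless because execution carries on to the next instruction, but a tail-call `jmp` is the function's last instruction, so NOPing it lets execution run off the end of the function into whatever bytes follow.

```rust
/// What a five-byte x86-64 probe site turns out to be (hypothetical helper).
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeSite {
    /// `call rel32` (E8): execution resumes after the site.
    Call { target: u64 },
    /// `jmp rel32` (E9): a tail call; the site is the function's exit.
    TailCall { target: u64 },
    /// Anything else (already patched, or not a probe site).
    Other,
}

/// Decode the five bytes found at `site_addr`.
pub fn classify(site_addr: u64, bytes: [u8; 5]) -> ProbeSite {
    // The displacement is a little-endian signed 32-bit value...
    let rel = i32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // ...relative to the address of the next instruction.
    let next = site_addr.wrapping_add(5);
    let target = next.wrapping_add(rel as i64 as u64);
    match bytes[0] {
        0xE8 => ProbeSite::Call { target },
        0xE9 => ProbeSite::TailCall { target },
        _ => ProbeSite::Other,
    }
}
```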
@citrus-it took a look at a recent coredump, focussing on
This does not include a
`&InnerFlowId`
behave unexpectedly and/or panic
Rather than working around this in opte, this should probably be fixed in illumos. I have opened an upstream bug and a potential fix.
The binary on-disk indeed shows (via
At runtime on stock illumos bits we see the code in-memory with NOPs replacing the last five bytes (
With the above patch, we now have a
For the non-tail-call case, we continue to patch in only NOPs:
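One way to avoid falling through is to patch a disabled tail-call site with a `ret` followed by NOPs, while ordinary call sites keep five NOPs. Because a tail call's `jmp` is emitted after the function's epilogue has restored the stack, a `ret` in its place returns straight to the original caller. This is a hedged sketch of that idea, assuming the same two five-byte encodings (`E8`/`E9`); `disabled_patch` is a hypothetical helper, and it is not claimed to mirror the exact bytes written by the fix in illumos#16480.

```rust
const NOP: u8 = 0x90;
const RET: u8 = 0xC3;

/// Bytes to write over a disabled five-byte probe site (hypothetical helper).
/// Returns `None` if the site is not a recognised `call`/`jmp rel32`.
pub fn disabled_patch(original: [u8; 5]) -> Option<[u8; 5]> {
    match original[0] {
        // Ordinary call: falling through to the next instruction is correct.
        0xE8 => Some([NOP; 5]),
        // Tail call: the jmp was the function's exit, so return instead of
        // running off the end of the function.
        0xE9 => Some([RET, NOP, NOP, NOP, NOP]),
        _ => None,
    }
}
```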
Closed by illumos#16480.
Noted while working on #462/#475; this is a tracking issue to understand this as its own problem. Today we are converting `InnerFlowId`s to `flow_id_sdt_arg` structs, which is moderately costly as it occurs many times per packet. This creates one or two (current, or before+after) stack-local variables which are referenced without issue.

Removing this and passing in either a `*const InnerFlowId` or converting to a `uintptr_t` (as we do with our other args) leads to known panics in two locations so far. From some dumps I've captured:

Periodic flow expiry

Update TCP state
Both occur some distance from the actual SDT: a format statement on the supposedly-valid `&InnerFlowId` and a match on a `&InnerFlowId`, respectively, before the probe occurs. Removing the probe causes these callsites to behave/compile correctly.

Another SDT, `layer-process-return`, shows a different variation:

`flow_before` is obviously valid, while `flow_after` appears to point elsewhere. The only obvious difference I'm aware of is that `flow_after` is obtained directly via `Packet::flow`, while `flow_before` is obtained and explicitly copied out before `pkt` is modified.

EDIT: The last case is caused by our untyped `uintptr_t` args making it easy to pass the wrong thing. The actual kernel panics still stand even with accurate types, however.
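The conversion and the before/after difference can be sketched in Rust as follows. `InnerFlowId`, `flow_id_sdt_arg`, and `Packet::flow` are names from this issue, but every field, the `Packet` type itself, and `rewrite_dst_port` are simplified, hypothetical stand-ins. The `#[repr(C)]` struct gives a D script a fixed layout to read, and copying the flow out before mutating `pkt` is what keeps `flow_before` distinct from `flow_after`.

```rust
/// Simplified, hypothetical flow key.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct InnerFlowId {
    pub proto: u8,
    pub src_ip: [u8; 4],
    pub dst_ip: [u8; 4],
    pub src_port: u16,
    pub dst_port: u16,
}

/// Hypothetical C-layout copy of the flow for a D script to read.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Debug, PartialEq, Eq)]
pub struct flow_id_sdt_arg {
    pub src_ip: [u8; 4],
    pub dst_ip: [u8; 4],
    pub src_port: u16,
    pub dst_port: u16,
    pub proto: u8,
}

impl From<&InnerFlowId> for flow_id_sdt_arg {
    fn from(f: &InnerFlowId) -> Self {
        flow_id_sdt_arg {
            src_ip: f.src_ip,
            dst_ip: f.dst_ip,
            src_port: f.src_port,
            dst_port: f.dst_port,
            proto: f.proto,
        }
    }
}

/// Hypothetical packet that owns its current flow key.
pub struct Packet {
    flow: InnerFlowId,
}

impl Packet {
    pub fn new(flow: InnerFlowId) -> Self {
        Packet { flow }
    }

    /// Borrow the packet's current flow key.
    pub fn flow(&self) -> &InnerFlowId {
        &self.flow
    }

    /// Stand-in for header processing that changes the flow.
    pub fn rewrite_dst_port(&mut self, port: u16) {
        self.flow.dst_port = port;
    }
}
```

In safe Rust, a `&InnerFlowId` borrowed from `pkt` cannot outlive a later `&mut` use of `pkt`, so `flow_before` has to be a copy; smuggling the address out as a `uintptr_t` sidesteps that check entirely.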