-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: FuncForPC/FileLine/Caller/Callers api interferes with implementation of better inlining #11432
Comments
On Fri, Jun 26, 2015 at 12:55 PM, dr2chase [email protected] wrote:
We did promise it would be a valid argument to FuncForPC/FileLine. There's no need to steal bits - we can assign arbitrary PCs to encode
|
I struggle with this as well in gccgo. gccgo does of course implement arbitrary inlining (but currently does not do tail calls). Gccgo does full correct unwinding of inlined functions, using the debug info to track inline sites. However, this only works with runtime.Caller. If you use runtime.Callers, and then pass the resulting PC values to FuncForPC, you don't get a proper stack. At an inlining point, you get the same file/line multiple times. I agree with Keith that the best approach is to encode inline positions in the PC value, somehow. |
I'm happy with encoding into "PC" values. The 3 options that quickly come to mind are:
I think I have an anti-preference against 2. That seems most-confusing and also most likely to lead to non-intuitive hair in things like dead code elimination and code motion. Where do we stand on tail-call elimination? Are we more nearly "this is an important optimization, users must deal with it" or "stack traces are really important, think about the time added to debugging/diagnosis when information is lost"? There have been proposals/hacks over the years for figuring out ways to deal with TCE, but none of them come close to the (cache, stack size, simplicity) efficiency of the real thing. Maybe we can think of new hacks??? |
With regard to tail call optimization, one idea that has been discussed in the past (e.g., https://groups.google.com/d/msg/golang-nuts/JZZrFjFKkuM/MV9sSJsFlFEJ) is to add an explicit statement to the language to implement tail calls ("become f()"). Only for Go 2, though. I think that in the absence of some sort of explicit indication that a tail call is acceptable, the compiler should not implement them. That is, I'm on the side of "stack traces are really important." That said, let me also mention http://www.dwarfstd.org/ShowIssue.php?issue=100909.2 which is an approach used by GCC to permit tail calls to be reconstructed from the debug info. So one option is to only make a tail call when the call sequence can be reconstructed. (As far as fake PC values goes I think I would tentatively vote for your option 3, in the sense of use real PC values for simple addresses and completely different values for inlined addresses, but I have no strong feelings about it.) |
A few thoughts:
|
Suppose for the return PCs we try the following: if A inlines B inlines C, the single PC in the "real" stack trace will appear within A. Annotate that PC appropriately to indicate the inlining patterns, and for the two synthesized PCs, choose the appropriate in site in non-inlined B where C is called, and the appropriate site in non-inlined C where the backtrace event occurred. Also write the api and implementation that we wish we had started with, and in the old code recommend the new api and note that sometimes the stack of "PCs" contains white lies. An improvement? |
Your PC choices sound good but they sound hard to implement. If A, B, and C are all in different packages, it's not going to be easy for the compilation of A to know the PC values to use to represent the inlining of B and C. If we can implement them with no extra cost, then, yes, sounds great. |
The difficulty in my example is "B" -- if A inlines B and C, then it's likely that B inlines C, and thus there is not necessarily (unless we extend the fake) any good place to say that "B calls C". It would also lead to a proliferation of symbols (most likely). Russ seemed to think (I hope I am not putting words in his mouth) that it would be perhaps okay to just drop PCs from the old interface, as long as we provide a new way to obtain all the funcs/files/lines that apply at a given call site and use that for usual-case user-visible stack traces. I.e., it's not entirely clear that the customers for slices of caller-PC have much use for fake PCs anyhow. |
As the author/maintainer of logging packages (github.com/go-kit/kit/log and gopkg.in/inconshreveable/log15.v2) and a standalone package that consumes For my use cases I don't care about the actual PC values. I care that I can use them with Dropping inlined call-sites would pose problems to the libraries I build, putting me on the side of "stack traces are really important". Furthermore, the opposite problem—extra entries for |
Hiding generated functions from stack traces is not too far off of #4899 . |
As far as runtime.Callers goes, we should probably think in terms of a new interface. Though the transition will be slow. Ideally runtime.Caller will consider inlining. Of course the most important thing is that stack dumps consider inlining. |
We're not going to eliminate tail calls. Let's just get that out of the way. |
Let's also take "fake PCs" off the table. That will introduce significant complexity and confusion. New API is needed, but probably not much. It's true that a single PC may in general correspond to a tiny call stack of func/file/line entries. Assume this:
Then:
People who report full stacks using repeated calls to Callers are already writing quadratic code, so it seems okay that it's imperfect in a different way now too. People who want to report the full call stack more carefully will use Callers to get some PCs and then call FuncForPC+Frames for each of those. |
@rsc Have you perhaps written "Callers" in a few places where you meant "Caller"? Specifically in (5) and the last paragraph? |
I'm looking at this in the context of also supporting stack tracebacks for C code called by cgo. I would like to support those tracebacks in Callers and get good information for them in runtime/pprof and friends. Currently it's hard to correctly turn the Callers information into a good stack traceback, as can be seen by the fact that the implementation in runtime/pprof is subtly incorrect (it checks for runtime.gopanic where it should check for runtime.sigpanic). Russ's I propose that rather than referring to sigpanic in Callers, we instead say
and then we implement
|
@ianlancetaylor I like your proposed new API for
|
|
Should the per-"call" information include some information about the function, such as "was generated", "was inlined"? There is certainly a point in the compilation where we know about that, so we could just choose not to forget it. |
Combining these ideas, the signature of func (ci* CallersIterator) Next() (f Frame, more bool)
type Frame struct {
Func string
File string
PC uintptr
Entry uintptr
Line int
Generated bool // maybe
Inlined bool // maybe
} Returning a struct type allows future additions and seems less cumbersome to use for API clients. |
CL https://golang.org/cl/19869 mentions this issue. |
This indirectly implements a small fix for runtime/pprof: it used to look for runtime.gopanic when it should have been looking for runtime.sigpanic. Update #11432. Change-Id: I5e3f5203b2ac5463efd85adf6636e64174aacb1d Reviewed-on: https://go-review.googlesource.com/19869 Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: David Chase <[email protected]>
I think this is fixed. Better inlining needs to keep runtime.Frames working, and code needs to use it, but it's all there. |
I'm fishing for comments. The current API gives away too much from the POV of a compiler-writer.
And I need to be clear on one point -- if we don't resolve this, there's well-understood optimizations that Go compilers won't do, and these are sufficiently important optimizations that they affect the way people write programs.
Current (1.<=5) Go only inlines when it can inline entire call trees all the way to the leaves; it cannot, for example, inline a simple function Small that wraps a much larger function Huge, because Huge is not eligible for inlining and thus all callers of Huge are not eligible for inlining. The current total-inlining policies mean that Go programs never observe the call sites at which the inlining occurs, and thus are never presented with the dilemma that there can be a stack of "callers" corresponding to a single return PC.
If we allow more flexible inlining, and if A calls B, B calls runtime.Callers(1, pcslice) and B is inlined into A, the "return pc" for Callers (pcslice[0]) will be an address within A (according to nm) and there will be no return PC identifying B or the file of B or the line number within B.
And unfortunately, this return PC = single caller identity is exposed in an interface -- a "return pc" is a uintptr. We could in theory hack on that integer to embed inline depth information in some unused bits, but in practice that is likely to break some code.
So, if we intend to improve inlining in the future, we need to do something about this. The options I see (and I may be myopic, other options are welcome) include:
(a) simply ignore intermediate callers. In the example, the caller of Callers will be A, and any mention of B is lost. It's a systems programming language, optimizers do this stuff all the time in other languages and you love it when they do, put on your grown-up pants and deal with it.
pros: easy.
And it will prepare people for tail-call elimination.
And it will discourage them from thinking that they can use this for implementing Java-style caller-sensitive security, which is slow, yet difficult to reason about and hard to maintain. Seriously, this has been a source of numerous security holes in Java, in JDK 8 they simplified it as much as possible to reduce their risk (see e.g. http://openjdk.java.net/jeps/176 and https://bugs.openjdk.java.net/browse/JDK-8046166 ).
cons: people might already critically rely on exact stack traces.
(b) hack on the encoding of the return PC to embed an index. I'd propose to grab the upper 2-3 bits on an Intel box or the lower 2 bits on a RISC architecture (this imposes an obvious restriction on inline depth, and I assume that I can grab these bits even on a 32-bit machine), and have the counter indicate the inline depth, thus the 00 case will match what is seen in the nm output (in the above example, "A") and the 01 case corresponds to B. The symbol table information for "A" will need to provide the inlining depth at this site. In the example above, if it is IA32 and the raw RPC is 0x01234567, then the caller of Callers (B) is return PC 0x41234567 and B's caller ("A") is return PC 0x01234567. On a RISC, raw PC 0x03217654 is the same as the RPC for "A" and 0x03217655 is the RPC for "B" (called by A).
pros: no change to interface, we never promised that "returnpc" was good for anything, did we?
cons: people thought we promised "returnpc" was a real live returnpc, and acted accordingly.
Fails to adequately discourage Java-style caller-sensitive security.
Also fails to prepare people for tail-call elimination.
(c) = (a) + extend the interface with a more generalized notion of program counter. Most old code will continue to sort-of work without change and no risk of gigantic surprise (e.g., illegal address dereference from believing that an option-b PC was a real address), new code will use the improved interface.
Note that there's nothing novel about abstracting on the "return PC" to get something that allows you to talk precisely about backtraces in the presence of inlining -- Java implementations have been doing this for more than 15 years.
The text was updated successfully, but these errors were encountered: