Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a builtin for comptime source information #2029

Closed
Sahnvour opened this issue Mar 3, 2019 · 24 comments · Fixed by #5628
Closed

add a builtin for comptime source information #2029

Sahnvour opened this issue Mar 3, 2019 · 24 comments · Fixed by #5628
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@Sahnvour
Copy link
Contributor

Sahnvour commented Mar 3, 2019

I want to propose a way to query the compiler in order to get information on the context of the compilation, at call site. In particular, this is mostly used to get source filename, line, function name.

Classical uses of this feature are logging, asserts, scoped variables (instrumentation, debugging...) and probably others I can't think of.

In C and C++ this is done through the __FILE__ and __LINE__ macros which are expanded by the preprocessor.
In Rust, there are file! and line! macros.
D also makes use of __FILE__ and such, though I don't know if it's actually a preprocessor.
And so on.

At the moment, something similar could be achieved by capturing a stack trace and looking at debug info, but this would not be very efficient. It would also be easily disturbed by the optimizations performed.

Zig could provide a @sourceInfo builtin function, returning an instance along the lines of

pub const SourceInfo = struct {
    file: []const u8,
    line: comptime_int,
    column: comptime_int,
    module: type,
    function: type,
    function_name: []const u8,
}

Example usage:

pub fn main() void {
    log(@sourceInfo(), "something something");
}

Example output:

[21:18:04] in 'main' (main.zig:2) | something something

Now, this leaves a usability concern: the other languages cited as examples all provide a way to the library writer to make get this information from client code. This means they can provide a log(msg: string) function or macro that transparently uses the source information.
In current Zig, the only way is to explicitly call @sourceInfo() and pass it to the function expecting it.

One solution is to provide a @callerSourceInfo to get the source information from the call site into the callee function. But it doesn't play nice with function pointers or extern functions.

@andrewrk
Copy link
Member

andrewrk commented Mar 3, 2019

This was previously rejected in #577, but you have made these new arguments:

capturing a stack trace and looking at debug info would not be very efficient.

It would also be easily disturbed by the optimizations performed.

I want to point out that a full stack trace does not need to be captured - in fact only a single usize returned by @returnAddress would need to be captured. A logging library could emit the addresses, which could be processed later with debug info to replace the addresses with source info, or it could resolve the source information immediately. This solves the problem where the caller would have to call @sourceInfo(). (The log function would want to use noinline #640 )

The problem of debug info not being efficient is a problem with the status quo implementation in userland, but that's not a fundamental problem. Applications which want to resolve debug information during runtime can pay an upfront cost to prepare the debug info structures, and then during the main program execution, lookups by address can be fast. However I still think that a more ideal solution would be simply printing addresses, and having the log processing (which can happen on a different machine, with debug info present that is not necessarily present on the main machine) perform the address to source translations.

As for optimizations, it's true that @sourceInfo() would be completely immune to optimizations, but I also think that debug info is sufficient even in optimized builds to map a machine code instruction back to the source information of where it came from, down to the line and column.

I'd like to push back on this. I think that a strategy with @returnAddress() and debug info is ultimately more optimal, for performance, code size, and convention (by convention I mean that a usize which corresponds to a machine code instruction address can be considered by applications and libraries to fully describe a particular source code expression, and becomes the standard way to store "here's how xyz corresponds to the source code") I'd rather improve the facilities for dealing with instruction addresses, rather than provide an additional solution to solve the same problem.

@andrewrk andrewrk added this to the 0.5.0 milestone Mar 3, 2019
@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Mar 3, 2019
@Sahnvour
Copy link
Contributor Author

Sahnvour commented Mar 3, 2019

Ha, I failed to find this previous issue.

I forgot about noinline but that only addresses half of the point: what if the function calling log is small (and log being noinline it does not contribute much to its size via a single call instr.) and gets inlined into its caller ? Can we guarantee that its instructions will carry the "right" debug info, referring to their original source ?
Maybe Zig does this better, but in my experience debug info are not reliable in optimized builds. As in: pairing inlined function code only with its calling source line, sometimes apparently multiple unrelated lines resolving to the same instructions, ...

It could be that one's writing a program where they would rather pay for a constant string of every file in their executable, rather than forcing noinline calls and post-processing of the addresses. This has been proved a faster approach for logging, but there are other usecases (such as real-time profiling/instrumenting that usually use __FILE__ etc.) that maybe cannot afford to have a lengthy post-processing. I've worked on (windows) projects where we record a big number of stack traces, and the impact of reading debug info was huge, with DIA.

If we can guarantee this though, I agree with you on the rest. I'm just not 100% positive that will be sufficiently accurate and fast.

@andrewrk
Copy link
Member

andrewrk commented Mar 3, 2019

what if the function calling log is small and gets inlined into its caller ? Can we guarantee that its instructions will carry the "right" debug info, referring to their original source ?

This would be guaranteed to be correct. The call to log would always be a call instruction, and the return address from within log implementation would be the next machine code instruction after call. So no matter how much the call to log gets inlined, still the return address would point to the correct thing.

sometimes apparently multiple unrelated lines resolving to the same instructions, ...

Indeed it is possible for the optimizer to merge multiple source lines into the same machine code, so that it becomes ambiguous which "parent" source line would correspond to a given instruction. I had forgotten, but here's an example of when that happened to me at OkCupid.

This is not just a problem with the technique described in this issue; it is a problem with the DWARF info emitted by gcc/clang. This problem exists in general for debugging using these debug symbols. Arguably the debug info should be able to represent when a single machine code address corresponds to multiple source locations. But even if that improvement were made, we're left facing the reality that @sourceInfo() would correctly disambiguate, while using debug info would not. However, in this case, the disambiguation done by @sourceInfo() would come at a cost. The very reason that there is an ambiguity in the other case is that it was more friendly to optimization, and the optimizer was able to produce efficient machine code that mapped to multiple source locations. So I could argue that the appropriate way to disambiguate would be to use @optimizeFor(.Debug) (#978) in a given scope. Otherwise, presumably, what was being optimized for was Release, and the optimization that causes multiple source lines to map to a single instruction is desired.

Here is how I would like to move forward with this issue:

  1. Try to make a proof of concept that the logging use case can be solved in userland, even with optimizations on, and that the performance is on-par with @sourceInfo(). This proof of concept will have to work with PDB as well as DWARF.
  2. Additionally make a proof of concept for the real-time profiling/instrumenting use case.
  3. If both of these proofs of concepts can be accomplished, then close this issue. Otherwise accept this issue.

@andrewrk
Copy link
Member

andrewrk commented Mar 4, 2019

@Sahnvour here's your gist:

const warn = @import("std").debug.warn;

fn log(msg: []const u8) void {
    warn("{} {}\n", @returnAddress(), msg);
    unreachable;
}

fn foo(n: u32) u32 {
    @noInlineCall(log, "log from foo");
    return n * 2;
}

fn bar(n: u32) u32 {
    return @inlineCall(foo, n);
}

pub fn main() void {
    warn("{}\n", bar(13));
}

output:

$ zig run returnaddress.zig
u8@7ff6c887f03f log from foo
reached unreachable code
D:\DEV\Zig\test\returnaddress.zig:5:5: 0x7ff6c887f250 in log (run.obj)
    unreachable;
    ^
D:\DEV\Zig\test\returnaddress.zig:15:12: 0x7ff6c887f03f in bar (run.obj)
    return @inlineCall(foo, n);
           ^
D:\DEV\Zig\test\returnaddress.zig:19:21: 0x7ff6c887f004 in main (run.obj)
    warn("{}\n", bar(13));
                    ^
D:\DEV\Install\zig\lib\zig\std\special\bootstrap.zig:51:40: 0x7ff6c887efe3 in ??? (run.obj)
    std.os.windows.ExitProcess(callMain());
                                       ^
???:?:?: 0x7ffb24fa3dc4 in ??? (???)


???:?:?: 0x7ffb27223691 in ??? (???)

You're pointing out that according to the stack trace, foo got inlined. But this does not show the source info for the @returnAddress() in log. Here's printing just the source info for the @returnAddress():

const std = @import("std");
const warn = std.debug.warn;

fn log(msg: []const u8) void {
    const outstream = &(std.io.getStdErr() catch unreachable).outStream().stream;
    std.debug.printSourceAtAddress(std.debug.getSelfDebugInfo() catch unreachable, outstream, @returnAddress(), true) catch unreachable;
}

fn foo(n: u32) u32 {
    @noInlineCall(log, "log from foo");
    return n * 2;
}

fn bar(n: u32) u32 {
    return @inlineCall(foo, n);
}

pub fn main() void {
    warn("{}\n", bar(13));
}
/home/andy/downloads/zig/build/test.zig:10:5: 0x22371d in ??? (test)
    @noInlineCall(log, "log from foo");
    ^
26

It says in ??? because I didn't implement the DWARF code for finding the symbol yet (it will show something on macos and windows), but it would probably be bar as you're pointing out, because it gets inlined. But my point is that it still points to the correct source expression. Even though the @noInlineCall(log, "log from foo"); got inlined, the return address still gets resolved to test.zig, line 10, column 5.

@Sahnvour
Copy link
Contributor Author

Sahnvour commented Mar 4, 2019

Well on my machine, here's the output with your modification:

$ zig run returnaddress.zig
D:\DEV\Zig\test\returnaddress.zig:15:12: 0x7ff6c691f03f in bar (run.obj)
    return @inlineCall(foo, n);

If I force inline the call to bar, the program outputs the line from main.
So the real source line is lost. I assume the compiler does not care about the platform it is emitting debug info for, so that must be the fault of LLVM or the PDB format, I guess ?

@andrewrk
Copy link
Member

andrewrk commented Mar 4, 2019

I see, that's good to know. Regardless of the outcome of this issue, I think it will be worth having a chat with some LLVM folks and see if this is a fundamental limitation of the PDB format, or if LLVM's debug info code needs to be improved.

If it is a fundamental limitation, or if improving LLVM's debug info is unlikely to happen, then that's pretty good reasoning for accepting this issue.

@Sahnvour
Copy link
Contributor Author

Sahnvour commented Mar 4, 2019

This is on par with my experience debugging on windows, WinDBG/VS' debugger are great but sometimes they just can't tell you where the inlined instructions are coming from.
I just stepped through the executable of our test case with WinDBG, and indeed all the instructions resulting from inlining are mapped to the calling source line. Tested with .inline 1 (see https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugging-optimized-code-and-inline-functions-external) for what it's worth.

@andrewrk andrewrk added the accepted This proposal is planned. label Mar 4, 2019
@ghost
Copy link

ghost commented Mar 18, 2019

A potential use for FILE/__dirname that I came across: If you have a comptime utility function that calls @embedFile, and you want to call that util function from another file, with a relative path.

@trishume
Copy link

Here's a potentially significant downside to offering this: Any use of it makes function-level incremental compilation impossible.

Consider this scenario: You have a 1000 line file with a bunch of log or panic calls that include line number information. You add a print call for debugging to a small function near the top of the file. But uh oh, now every single function in the rest of the file needs to be re-compiled instead of just the function you modified, because the source information they used changed.

If Zig wants to eventually have function-level incremental recompilation, which IMO is the best promise for <100ms common case debugging recompiles on arbitrarily sized projects, it should try to avoid this problem.

There's a separate but related problem where you do this within your compiler by entangling span information into your data structures. But that can be fixed with heavy refactoring, the actual behavioural changes can't be fixed as easily.

See Rust encountering this problem: rust-lang/rust#47389

@andrewrk
Copy link
Member

Thanks for bringing this up. That's a really good point and one that we hadn't considered before in this proposal. Ability to do function-level incremental compilation is definitely something that is planned and so this is a serious consideration.

@andrewrk andrewrk removed the accepted This proposal is planned. label Apr 18, 2019
@trishume
Copy link

I should note that the proposed use case for accessing the current file for path for relative compile time file lookup should be fine from this perspective. Another thing that should be fine is accessing the current function name. The file+function+message should be sufficient to locate the source of for example log messages fairly easily.

I like your idea of using the return address and debug info instead is also a great idea. I think a log library that output like [myfile.zig:0xRETADDR myFunction] log message if you passed sourceInfo would be quite serviceable. Then if you wanted lines you could pipe it through a formatter that replaced all return addresses with lines and columns inline and then it would follow the standard file:line:col format that terminals like iTerm2 allow you to command click to jump to.

@Sahnvour
Copy link
Contributor Author

That's serious concern, because I think source information is an important feature, but fast incremental compilation is even more so. Note that the proposal currently is, in my opinion, not final, regarding the usability issue described in my first comment. I thought before of doing something similar to what's described in the Rust issue, and maybe this can help achieve a better proposal.

Semi-serious workaround: only count line numbers inside functions, 0 being associated to the line of the first token of the declaration.

@rohlem
Copy link
Contributor

rohlem commented Sep 22, 2019

This might be obvious, but if we want to enable independent (-> incremental) compilation on a per-function basis, then we just need an intermediary "lookup table" step by function (/ other entry id).

So @sourceInfo().line_in_function can remain a field, but file-scope access like @sourceInfo().line_in_file becomes a method call @sourceInfo().getLineInFile() which does a [hashmap/ pair-list] lookup of "what offset does this function start at". Then on a change within a single function, we only need to update (the offsets in) that lookup table for all functions defined after it, and the previously compiled functions' code works without modification on the (perhaps even during-runtime-) updated data.

Note that this effectively demotes @sourceInfo().getLineInFile() to a runtime (comptime incompatible) value in incremental build modes. I don't think this considerably limits its use in any way.

@devzero
Copy link

devzero commented Jun 13, 2020

Would it be reasonable to just have a built-in that would provide comptime builtin.TypeInfo on the symbol referenced by @returnaddress?

@andrewrk
Copy link
Member

Thank you all for the considerations here. I refreshed my memory today by reading everyone's inputs, and put some thought into this, and I believe we can do it.

Name

Since it will be required to call this function explicitly (unlike using the C preprocessor for example), it is important that it has a short name that can be repeated at many callsites without issue. So for that I propose @src.

Return Type

Let's start with just this:

const SourceLocation = struct {
    file: []const u8,
    fn_name: []const u8,
    line: u32,
    column: u32,
}; 

Incremental Compilation

The key strategy here to make this work with incremental compilation is to make the return value of @src runtime-known rather than comptime known. This means that code will not be able to do conditional compilation based on a source line number or function name, and it means that when @src() invocations are moved around in a file, only a tiny amount of data needs to be updated - the line fields of each struct returned by @src() - but since the value is runtime known, it would not propagate into forcing the functions which have @src() callsites to be regenerated.

Calls Outside Function Scope

Calling outside function scope will be compile errors.

comptime {
    var x = @src(); // error: source information only available within function scope
}

Other Proposals

With this implemented, #5547 follows, with @src().fn_name, so that proposal can be closed in favor of this accepted one.

@andrewrk andrewrk added the accepted This proposal is planned. label Jun 18, 2020
@daurnimator
Copy link
Contributor

@andrewrk what if it took an address? e.g. @src(@returnAddress())?

@andrewrk
Copy link
Member

There are a lot of details and questions with that tiny code example. You're going to need to do a bunch more leg work of proposing & explaining

@hiroakitakada
Copy link

@andrewrk I support your decision because I believe @src is useful for small embedded devices, but I'd like to add one more proposal. In order to avoid repeating @src in every callsite, @callerSrc (or @callerSourceInfo is also OK) should be also introduced. The limitation that @callerSrc can be used only within "inline" function is acceptable.
I'm sorry, but I cannot figure out how @callerSrc impacts the compiler implementation.

@Sahnvour
Copy link
Contributor Author

@callerSrc() could be implemented in two ways:

  • it uses @returnAddress() and queries debug symbols to get the closest matching line information, or
  • the compiler notices its use, and passes a hidden argument pointing to the SourceLocation of the callsite to the function making use of @callerSrc()

The former needs limited compiler support but requires debugging symbols and may not be always precise, and the latter has hidden behaviour.

@SpexGuy
Copy link
Contributor

SpexGuy commented Jun 18, 2020

Maybe it should be a separate proposal, but it would be useful to be able to have a function name available at comptime, similar to how struct names are available at comptime.
I can do this:

if (comptime !structHasTraits(T))
    @compileError("Missing traits for type "++@typeName(T));

But I can't do this:

if (comptime !fnHasNeededParams(func))
    @compileError("Cannot generate for fn "++@fnName(func)++" ("++@typeName(@TypeOf(func))++")");

@trishume
Copy link

This seems reasonable, some things to consider when thinking about proposals:

  • Will this add a bunch of complexity in the data structures you use for incremental edges? Do you know how you're going to encode the fact that the SourceLocation structs need to be updated when the file they come from is changed (even if the resulting AST is the same)?
  • Does it remain possible to make an incremental compiler than can know how to call a function without parsing the body of that function? A cool parallel/incremental compiler design is to only parse all the signatures up front and then parse the bodies in parallel and only when they need to be updated. Things like @callerSrc can remove this property but you can maybe salvage it by adding some syntax to the signature, like making it a special automatic parameter type.

@Vexu Vexu mentioned this issue Jun 18, 2020
@alexnask
Copy link
Contributor

alexnask commented Jun 18, 2020

We should probably define what column actually means in this struct.
In @Vexu's PR it is implemented as a number of utf8-encoded bytes (if I am not mistaken), which is probably fine but we should clarify this since it could also be a number of unicode codepoints or some other measure of column.

@andrewrk
Copy link
Member

I'll be sure to play around with @src() in self-hosted, and do a proof-of-concept that it plays nicely with incremental compilation.

andrewrk added a commit that referenced this issue Jun 18, 2020
One of the main motivating use cases for this language feature is
tracing/profiling tools, which expect null-terminated strings for these
values. Since the data is statically allocated, making them
additionally null-terminated comes at no cost.

This prevents the requirement of compile-time code to convert to
null-termination, which could increase the compilation time of
code with tracing enabled.

See #2029
@ghost
Copy link

ghost commented Jun 23, 2020

I think #5675 could be relevant to this proposal. Source location struct could look like this:

const SourceLocation = struct {
    comptime file: []const u8,
    comptime fn_name: []const u8,
    line: u32,
    column: u32,
};

mlugg added a commit to mlugg/zig that referenced this issue Aug 19, 2024
* Indices of referenced captures
* Line and column of `@src()`

The second point aligns with a reversal of the "incremental compilation"
section of ziglang#2029 (comment).
This reversal was already done as ziglang#17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.

I think this decision is reasonable for a few reasons:

* The Zig compiler is quite fast. Occasionally re-analyzing a few
  functions containing `@src()` calls is perfectly acceptable and won't
  noticably impact update times.
* The system described by Andrew in ziglang#2029 is currently vaporware.
* The system described by Andrew in ziglang#2029 is non-trivial to implement.
  In particular, it requires some way to have backends update a single
  global in certain cases, without re-doing semantic analysis. There is
  no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, ziglang#17688
  was justified by broken Tracy integration because the source line
  couldn't be comptime-known.
mlugg added a commit to mlugg/zig that referenced this issue Aug 20, 2024
* Indices of referenced captures
* Line and column of `@src()`

The second point aligns with a reversal of the "incremental compilation"
section of ziglang#2029 (comment).
This reversal was already done as ziglang#17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.

I think this decision is reasonable for a few reasons:

* The Zig compiler is quite fast. Occasionally re-analyzing a few
  functions containing `@src()` calls is perfectly acceptable and won't
  noticably impact update times.
* The system described by Andrew in ziglang#2029 is currently vaporware.
* The system described by Andrew in ziglang#2029 is non-trivial to implement.
  In particular, it requires some way to have backends update a single
  global in certain cases, without re-doing semantic analysis. There is
  no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, ziglang#17688
  was justified by broken Tracy integration because the source line
  couldn't be comptime-known.
mlugg added a commit to mlugg/zig that referenced this issue Aug 21, 2024
* Indices of referenced captures
* Line and column of `@src()`

The second point aligns with a reversal of the "incremental compilation"
section of ziglang#2029 (comment).
This reversal was already done as ziglang#17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.

I think this decision is reasonable for a few reasons:

* The Zig compiler is quite fast. Occasionally re-analyzing a few
  functions containing `@src()` calls is perfectly acceptable and won't
  noticably impact update times.
* The system described by Andrew in ziglang#2029 is currently vaporware.
* The system described by Andrew in ziglang#2029 is non-trivial to implement.
  In particular, it requires some way to have backends update a single
  global in certain cases, without re-doing semantic analysis. There is
  no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, ziglang#17688
  was justified by broken Tracy integration because the source line
  couldn't be comptime-known.
richerfu pushed a commit to richerfu/zig that referenced this issue Oct 28, 2024
* Indices of referenced captures
* Line and column of `@src()`

The second point aligns with a reversal of the "incremental compilation"
section of ziglang#2029 (comment).
This reversal was already done as ziglang#17688 (46a6d50), with the idea to
push incremental compilation down the line. My proposal is to keep it as
comptime-known, and simply re-analyze uses of `@src()` whenever their
line/column change.

I think this decision is reasonable for a few reasons:

* The Zig compiler is quite fast. Occasionally re-analyzing a few
  functions containing `@src()` calls is perfectly acceptable and won't
  noticably impact update times.
* The system described by Andrew in ziglang#2029 is currently vaporware.
* The system described by Andrew in ziglang#2029 is non-trivial to implement.
  In particular, it requires some way to have backends update a single
  global in certain cases, without re-doing semantic analysis. There is
  no other part of incremental compilation which requires this.
* Having `@src().line` be comptime-known is useful. For instance, ziglang#17688
  was justified by broken Tracy integration because the source line
  couldn't be comptime-known.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants