-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix how the RUSTC_CTFE_BACKTRACE
env var is gotten.
#68792
Conversation
This environment variable is currently obtained very frequently in CTFE-heavy code; using `lazy_static` avoids repeating the work. For the `ctfe-stress-4` benchmark this eliminates 67% of allocations done, and for `coercions` it eliminates 17% of allocations done.
@bors try @rust-timer queue |
⌛ Trying commit 15dcf1f with merge c5dab8b1f4945c397a8affdd72965ce2b921ec83... |
☀️ Try build successful - checks-azure |
Hm.. the issue is, Miri relies on changing this env var between MIR generation and running the interpreter. When working on Miri, I'd like backtraces only for Miri-related errors, not for CTFE-related errors. Cc @oli-obk |
What the heck is going on there that so many interpreter errors are being raised? My first guess would be const-prop (Cc @wesleywiser). I am beginning to wonder if having many thousand interpreter errors raised during successful compilation is good design... |
The alternative design is to basically duplicate all interpreter logic. I'm not sure how design this differently other than making the error type a generic part of the machine. |
Can the error constructor somehow figure out if it is run in a CTFE query or in "main" Miri? To my understanding, queries already use thread-local state for some amount of tracking, so this could maybe tie into that. |
well... that basically boils down to the above, but making the thread local global publically configurable |
Could we add a function to the |
The thing is, the conversion function doesn't have access to the @oli-obk I am not sure what you are saying my proposal boils down to. For now, I still think something involving TLS is the least ugly of the solutions proposed so far. |
I'm thinking that we could add a function taking a closure that sets a thread local turning off backtraces and once the closure finishes, clears that thread local. Then const prop could invoke miri somewhat like miri_backtrace_helper(|| ecx.some_code()) and all the code in |
@rust-timer build c5dab8b1f4945c397a8affdd72965ce2b921ec83 |
Queued c5dab8b1f4945c397a8affdd72965ce2b921ec83 with parent 01db581, future comparison URL. |
For I'm happy to close this PR because I can see that it's not the right approach. But this is a big perf hit for CTFE and should be fixed. @RalfJung, are you happy to take over? I can file an issue if that would be helpful. |
If there's a chosen approach that's better than this, I would be happy to take this over and implement it. |
I'm afraid I don't have much time I can invest into this. But I agree we should find some fix, and it looks like @wesleywiser is willing to do the implementation work once we found something resembling a solution.
Right, so we actually have three possible contexts for Miri: const eval, const prop, stand-alone. And ideally we'd want to treat them all separately. This does feel like a Is there a way to identify the "active query", so that we could test whether that is const-eval, const-prop, or something else, without having to introduce our own (and, strictly speaking, redundant) tracking? |
It's very much possible
Note that it is a very very bad idea to use this information to influence compilation. Using it for backtrace info is ok, since using backtrace info to influence compilation is already bad (and we don't do that). |
So, would that be a reasonable approach? Or would it be too fragile? |
No, it seems like a reasonable solution, but should have some big comments on it stating that this information may only be used for backtracing or diagnostics. |
I will work on that this week. |
I will close this PR because it's not the right approach. I look forward to seeing @wesleywiser's alternative :) |
Interestingly, the vast majority of errors do not seem to be coming from const-prop. So checking what the active query is doesn't make a noticeable difference in terms of performance. @oli-obk @RalfJung what do you think of this approach instead? This seems to recover all of the performance of this PR and miri can still setup its loggers here by setting |
@wesleywiser where else do they come from? The problem with |
I instrumented the function to dump the query stack at the time the function is called. Here's the results:
Backtrace for the top group:
|
Repeating from above, in case it's useful:
|
@wesleywiser ah... and then after the error is raised, here we catch it again. Yeah, we should really be more careful about catching errors... given the backtrace generation, there are only few places where we want to do that. The fix for this particular one, IMO, is not to catch an error. So, |
The more general question of course is, where else are we catching errors, and how can we avoid doing so accidentally? |
Before this change, `get_size_and_align()` calls `get_fn_alloc()` *a lot* in CTFE heavy code. This previously returned an `Error` which would check if `RUSTC_CTFE_BACKTRACE` was set on construction. Doing this turned out to be a performance hotspot as @nnethercote discovered in rust-lang#68792. This is an alternate take on that PR which resolves the performance issue by generating *many* fewer errors. Previously, `ctfe-stress-4` would generate over 5,000,000 errors each of which would check for the presence of the environment variable. With these changes, that number is reduced to 30.
Check `RUSTC_CTFE_BACKTRACE` much less by generating fewer errors Before this change, `get_size_and_align()` calls `get_fn_alloc()` *a lot* in CTFE heavy code. This previously returned an `Error` which would check if `RUSTC_CTFE_BACKTRACE` was set on construction. Doing this turned out to be a performance hotspot as @nnethercote discovered in #68792. This is an alternate take on that PR which resolves the performance issue by generating *many* fewer errors. Previously, `ctfe-stress-4` would generate over 5,000,000 errors each of which would check for the presence of the environment variable. With these changes, that number is reduced to 30. r? @RalfJung
This environment variable is currently obtained very frequently in
CTFE-heavy code; using
lazy_static
avoids repeating the work.For the
ctfe-stress-4
benchmark this eliminates 67% of allocationsdone, and for
coercions
it eliminates 17% of allocations done.r? @RalfJung