-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a LazyString type, for places (like errors) which we would prefer to defer the actual work #33711
Conversation
This is not my favorite way to deal with this, since the large number of types generated for the closures is also pretty sub-optimal for the compiler. A possible alternative is to have a macro that makes string interpolation lazy, i.e. returning an object with the arguments. That might be useful for other purposes as well. |
Alright, how do you like this version? Seems to still give the same benefits in inference (modulo a few extra LazyString constructors). |
@JeffBezanson do you like this approach better? If so, I'll fix up this PR and get it ready. |
I kind of like this as an implementation, but it suggests to me that having a macro for throwing errors would be beneficial, e.g. |
base/strings/lazy.jl
Outdated
:(LazyString($(parts...))) | ||
end | ||
|
||
function show(io::IO, s::LazyString) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function show(io::IO, s::LazyString) | |
function print(io::IO, s::LazyString) |
Yes I like this. Unfortunately implementing the normal string interface for this type is not very natural, but oh well. |
13571c2
to
0d26536
Compare
@StefanKarpinski related to a possible (See also this crazy idea #7026 (comment) on how we might unify return codes with exceptions. It would make even more sense if combined with pattern matching of key value pairs.) |
d3b19f5
to
7f56d0e
Compare
I've been using the logging infrastructure for PkgServer and it's quite nice to just list key-value pairs for values that may be of interest when looking at logs. Doing something similar for errors would make sense to me as well. |
This has been open for a while, is it worth closing it now? |
@User-764Q please stop commenting on issues/PRs when the only information you bring up is that they are "old". It creates notifications for a lot of people. |
Just saw this issue it seems this is also useful for solving KristofferC/Crayons.jl#47 since when interpolating crayons we are lack of the IO context. Doing the interpolation lazily would defer this information when we actually print it which solves the problem. I'm wondering if people still want it in Base for other reasons or maybe we can have this in a package first. |
7f56d0e
to
1283a39
Compare
190c7d5
to
1da5d4e
Compare
I think this is good to go. Will merge in a bit so if there are any objections, say so now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good useful utility, and I like the @lazy_str
export. For minimalism I could live without the LazyString
export, but leaving it in also seems harmless enough.
@@ -914,19 +914,21 @@ end | |||
# copy from an some iterable object into an AbstractArray | |||
function copyto!(dest::AbstractArray, dstart::Integer, src, sstart::Integer) | |||
if (sstart < 1) | |||
throw(ArgumentError(string("source start offset (",sstart,") is < 1"))) | |||
throw(ArgumentError(LazyString("source start offset (",sstart,") is < 1"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these cases be using the string macro for consistency? It seems you've included strings/lazy.jl
before abstractarray.jl during bootstrap so I guess it should work?
[edit] ah, I see that some uses of string()
are just for keeping indentation under control. It sure would be nice if it was easier to reuse the parsing logic from #40753 so we had consistency of escaping rules for these kind of "almost but not quite a normal string" type of custom string literals. Rather than needing the raw string escaping rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did the straightforward replacement. We can change these for consistency in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Certainly fine by me.
This seems like it might potentially move in the opposite direction of the recent escape analysis-driven design changes, where we wanted to capture objects less frequently. |
as much work as possible to either the macro or later printing operations. | ||
""" | ||
struct LazyString <: AbstractString | ||
parts::Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to allocate an Array here, so we do not need to create a whole new Tuple type and can examine and iterate it more easily later?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be be an immutable array once that's merged, but currently the analysis in #43852 is not strong enough to model the mutation of the mutable array at the moment. I think for the current use cases, a tuple is mostly fine, since the types will be known at compile time and are stable. Plus it's the error path anyway, which we heavily pessimize and in particular, don't infer, so it'll allocate tons of tuple types anyway.
base/strings/lazy.jl
Outdated
This type is designed to be cheap to construct at runtime, trying to offload | ||
as much work as possible to either the macro or later printing operations. | ||
""" | ||
struct LazyString <: AbstractString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit cheeky to declare that this is a kind of String, then implement none of the interface for one.
We could potentially make this object mutable, then swap the field for the string'd value when you first access it, perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's reasonable. Let me do that.
Well, as I said in those discussions, I don't like the idea to string summarize on error construction and if there are escaping issues, we should insert maybecopies. |
849858c
to
286f803
Compare
286f803
to
6522370
Compare
end | ||
|
||
function String(l::LazyString) | ||
if !isdefined(l, :str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any worry about race conditions for code like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We can use an atomic acquire and cmpswap release here to fix that. (Or perhaps create an "assign_once" type, that manages this pattern for us?)
I'm still not convinced that maybecopy is the right solution. One problem of the maybecopy idea is that it will require a global transformation, or many similar builtin function definitions. julia> const Lx = Ref{Any}();
julia> @noinline storelen!(xs) = Lx[] = length(xs);
julia> code_escapes() do
a = [1,2,3]
storelen!(a)
a[2]
end
#9() in Main at REPL[9]:2
2 ◌ 1 ─ %1 = Core.tuple(1, 2, 3)::Core.Const((1, 2, 3)) │╻ vect
↑ │ %2 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 3, 3))::Vector{Int64}
◌ └── goto #7 if not true ││
✓ 2 ┄ %4 = φ (#1 => 1, #6 => %16)::Int64 ││
✓ │ %5 = φ (#1 => 1, #6 => %17)::Int64 ││
✓ │ %6 = φ (#1 => 1, #6 => %9)::Int64 ││
◌ │ %7 = Base.getfield(%1, %4, true)::Int64 ││╻ getindex
◌ │ Base.arrayset(false, %2, %7, %6)::Vector{Int64} ││╻ setindex!
◌ │ %9 = Base.add_int(%6, 1)::Int64 ││
◌ │ %10 = (%5 === 3)::Bool │││╻ ==
◌ └── goto #4 if not %10 │││
◌ 3 ─ Base.nothing::Nothing │││
◌ └── goto #5 │││
◌ 4 ─ %14 = Base.add_int(%5, 1)::Int64 │││╻ +
◌ └── goto #5 ││╻ iterate
✓ 5 ┄ %16 = φ (#4 => %14)::Int64 ││
✓ │ %17 = φ (#4 => %14)::Int64 ││
◌ │ %18 = φ (#3 => true, #4 => false)::Bool ││
◌ │ %19 = Base.not_int(%18)::Bool ││
◌ └── goto #7 if not %19 ││
◌ 6 ─ goto #2 ││
◌ 7 ┄ goto #8 ││
3 ◌ 8 ─ invoke Main.storelen!(%2::Vector{Int64})::Any │
4 ◌ │ %24 = Base.arrayref(true, %2, 2)::Int64 │╻ getindex
◌ └── return %24 │ This is because the constructor of To me that sounds more complicated and I'd say it's much simpler if we change the semantics of certain error constructors so that they don't capture and let analyzers and optimizers assume that invariant. |
Yes, the initial implementation of |
One common theme when looking at inference logs is that formatting error messages
tends to pull in the whole I/O and printing system which a) can take some time to
infer and b) adds backedges if that stuff gets invalidated. Both seem wasteful.
This is an attempt to try to improve the situation. For various error structs
that used to contain a string, they now take either a String or a closure that
will format the required string on demand when showing the error.
I have some tooling to count the total number of methods that inference looks
at and went through and switched over all the error messages that are used
by *(::Matrix{Float64}, ::Matrix{Float64} to the new setup. Doing this
reduces the total number of methods inference looks at by about 20%,
which seems like a worthwhile improvement.
Obviously there are a lot of error sites that could potentially throw, but this approach is backwards compatible, so if we like the direction, we can merge it and then we can gradually go through and clean up the callsite (or somebody could write a femtocleaner pass to do it).