Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a LazyString type, for places (like errors) which we would prefer to defer the actual work #33711

Merged
merged 1 commit into from
Jan 20, 2022

Conversation

Keno
Copy link
Member

@Keno Keno commented Oct 29, 2019

One common theme when looking at inference logs is that formatting error messages
tends to pull in the whole I/O and printing system which a) can take some time to
infer and b) adds backedges if that stuff gets invalidated. Both seem wasteful.
This is an attempt to try to improve the situation. For various error structs
that used to contain a string, they now take either a String or a closure that
will format the required string on demand when showing the error.

I have some tooling to count the total number of methods that inference looks
at and went through and switched over all the error messages that are used
by *(::Matrix{Float64}, ::Matrix{Float64} to the new setup. Doing this
reduces the total number of methods inference looks at by about 20%,
which seems like a worthwhile improvement.

Obviously there are a lot of error sites that could potentially throw, but this approach is backwards compatible, so if we like the direction, we can merge it and then we can gradually go through and clean up the callsite (or somebody could write a femtocleaner pass to do it).

@JeffBezanson
Copy link
Member

This is not my favorite way to deal with this, since the large number of types generated for the closures is also pretty sub-optimal for the compiler. A possible alternative is to have a macro that makes string interpolation lazy, i.e. returning an object with the arguments. That might be useful for other purposes as well.

@Keno
Copy link
Member Author

Keno commented Oct 29, 2019

Alright, how do you like this version? Seems to still give the same benefits in inference (modulo a few extra LazyString constructors).

@Keno
Copy link
Member Author

Keno commented Oct 29, 2019

@JeffBezanson do you like this approach better? If so, I'll fix up this PR and get it ready.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 30, 2019

I kind of like this as an implementation, but it suggests to me that having a macro for throwing errors would be beneficial, e.g. @throw DomainError "blah $x blah".

:(LazyString($(parts...)))
end

function show(io::IO, s::LazyString)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function show(io::IO, s::LazyString)
function print(io::IO, s::LazyString)

@JeffBezanson
Copy link
Member

Yes I like this. Unfortunately implementing the normal string interface for this type is not very natural, but oh well.

base/strings/lazy.jl Outdated Show resolved Hide resolved
@fredrikekre fredrikekre added needs compat annotation Add !!! compat "Julia x.y" to the docstring needs news A NEWS entry is required for this change labels Oct 31, 2019
@c42f
Copy link
Member

c42f commented Oct 31, 2019

@StefanKarpinski related to a possible @throw, I've sometimes thought it would be useful to be able to attach arbitrary key-value pairs to exceptions in much the same way we attach them to log records. I think people sometimes have a temptation to log-then-throw for adding context which is really not ideal.

(See also this crazy idea #7026 (comment) on how we might unify return codes with exceptions. It would make even more sense if combined with pattern matching of key value pairs.)

@Keno Keno force-pushed the kf/errorclosures branch 2 times, most recently from d3b19f5 to 7f56d0e Compare October 31, 2019 11:09
@StefanKarpinski
Copy link
Member

I've been using the logging infrastructure for PkgServer and it's quite nice to just list key-value pairs for values that may be of interest when looking at logs. Doing something similar for errors would make sense to me as well.

@User-764Q
Copy link

This has been open for a while, is it worth closing it now?

@KristofferC
Copy link
Member

@User-764Q please stop commenting on issues/PRs when the only information you bring up is that they are "old". It creates notifications for a lot of people.

@Roger-luo
Copy link
Contributor

Just saw this issue it seems this is also useful for solving KristofferC/Crayons.jl#47 since when interpolating crayons we are lack of the IO context. Doing the interpolation lazily would defer this information when we actually print it which solves the problem.

I'm wondering if people still want it in Base for other reasons or maybe we can have this in a package first.

base/strings/lazy.jl Outdated Show resolved Hide resolved
@Keno Keno changed the title RFC: Allow a closure in various kinds of errors Allow a closure in various kinds of errors Jan 19, 2022
@Keno
Copy link
Member Author

Keno commented Jan 19, 2022

I think this is good to go. Will merge in a bit so if there are any objections, say so now.

Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good useful utility, and I like the @lazy_str export. For minimalism I could live without the LazyString export, but leaving it in also seems harmless enough.

@@ -914,19 +914,21 @@ end
# copy from an some iterable object into an AbstractArray
function copyto!(dest::AbstractArray, dstart::Integer, src, sstart::Integer)
if (sstart < 1)
throw(ArgumentError(string("source start offset (",sstart,") is < 1")))
throw(ArgumentError(LazyString("source start offset (",sstart,") is < 1")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these cases be using the string macro for consistency? It seems you've included strings/lazy.jl before abstractarray.jl during bootstrap so I guess it should work?

[edit] ah, I see that some uses of string() are just for keeping indentation under control. It sure would be nice if it was easier to reuse the parsing logic from #40753 so we had consistency of escaping rules for these kind of "almost but not quite a normal string" type of custom string literals. Rather than needing the raw string escaping rules.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did the straightforward replacement. We can change these for consistency in a followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly fine by me.

@vtjnash
Copy link
Member

vtjnash commented Jan 20, 2022

This seems like it might potentially move in the opposite direction of the recent escape analysis-driven design changes, where we wanted to capture objects less frequently.

as much work as possible to either the macro or later printing operations.
"""
struct LazyString <: AbstractString
parts::Tuple
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to allocate an Array here, so we do not need to create a whole new Tuple type and can examine and iterate it more easily later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be be an immutable array once that's merged, but currently the analysis in #43852 is not strong enough to model the mutation of the mutable array at the moment. I think for the current use cases, a tuple is mostly fine, since the types will be known at compile time and are stable. Plus it's the error path anyway, which we heavily pessimize and in particular, don't infer, so it'll allocate tons of tuple types anyway.

This type is designed to be cheap to construct at runtime, trying to offload
as much work as possible to either the macro or later printing operations.
"""
struct LazyString <: AbstractString
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit cheeky to declare that this is a kind of String, then implement none of the interface for one.

We could potentially make this object mutable, then swap the field for the string'd value when you first access it, perhaps?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's reasonable. Let me do that.

@Keno
Copy link
Member Author

Keno commented Jan 20, 2022

This seems like it might potentially move in the opposite direction of the recent escape analysis-driven design changes, where we wanted to capture objects less frequently.

Well, as I said in those discussions, I don't like the idea to string summarize on error construction and if there are escaping issues, we should insert maybecopies.

@Keno Keno removed needs tests Unit tests are required for this change needs news A NEWS entry is required for this change needs compat annotation Add !!! compat "Julia x.y" to the docstring labels Jan 20, 2022
@vtjnash vtjnash changed the title Allow a closure in various kinds of errors Create a LazyString type, for places (like errors) which we would prefer to defer the actual work Jan 20, 2022
@Keno Keno merged commit f290339 into master Jan 20, 2022
@Keno Keno deleted the kf/errorclosures branch January 20, 2022 22:35
end

function String(l::LazyString)
if !isdefined(l, :str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any worry about race conditions for code like this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We can use an atomic acquire and cmpswap release here to fix that. (Or perhaps create an "assign_once" type, that manages this pattern for us?)

@aviatesk
Copy link
Member

aviatesk commented Jan 21, 2022

This seems like it might potentially move in the opposite direction of the recent escape analysis-driven design changes, where we wanted to capture objects less frequently.

Well, as I said in those discussions, I don't like the idea to string summarize on error construction and if there are escaping issues, we should insert maybecopies.

I'm still not convinced that maybecopy is the right solution. One problem of the maybecopy idea is that it will require a global transformation, or many similar builtin function definitions.
E.g. if we want to allocate a on the stack in the following example, where a can only escape via possible bounds error in arrayref call, we need to change %24 = Base.arrayref(true, %2, 2)::Int64 to something like %24 = Base.arrayref_copy_on_throw(true, %2, 2)::Int, where arrayref_copy_on_throw uses another version of BoundsError constructor that makes a copy of a:

julia> const Lx = Ref{Any}();

julia> @noinline storelen!(xs) = Lx[] = length(xs);

julia> code_escapes() do
          a = [1,2,3]
          storelen!(a)
          a[2]
      end
#9() in Main at REPL[9]:2
21%1  = Core.tuple(1, 2, 3)::Core.Const((1, 2, 3))                                │╻      vect
  %2  = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Int64}, svec(Any, Int64), 0, :(:ccall), Vector{Int64}, 3, 3))::Vector{Int64}
  ◌  └──       goto #7 if not true                                                       ││     2%4  = φ (#1 => 1, #6 => %16)::Int64                                             ││     
  ✓  │   %5  = φ (#1 => 1, #6 => %17)::Int64                                             ││     
  ✓  │   %6  = φ (#1 => 1, #6 => %9)::Int64                                              ││     
  ◌  │   %7  = Base.getfield(%1, %4, true)::Int64                                        ││╻      getindex
  ◌  │         Base.arrayset(false, %2, %7, %6)::Vector{Int64}                           ││╻      setindex!
  ◌  │   %9  = Base.add_int(%6, 1)::Int64                                                ││     
  ◌  │   %10 = (%5 === 3)::Bool                                                          │││╻      ==
  ◌  └──       goto #4 if not %10                                                        │││    3 ─       Base.nothing::Nothing                                                     │││    
  ◌  └──       goto #5                                                                   │││    4%14 = Base.add_int(%5, 1)::Int64                                                │││╻      +
  ◌  └──       goto #5                                                                   ││╻      iterate5%16 = φ (#4 => %14)::Int64                                                      ││     
  ✓  │   %17 = φ (#4 => %14)::Int64                                                      ││     
  ◌  │   %18 = φ (#3 => true, #4 => false)::Bool                                         ││     
  ◌  │   %19 = Base.not_int(%18)::Bool                                                   ││     
  ◌  └──       goto #7 if not %19                                                        ││     6 ─       goto #2                                                                   ││     7 ┄       goto #8                                                                   ││     
38 ─       invoke Main.storelen!(%2::Vector{Int64})::Any4 ◌  │   %24 = Base.arrayref(true, %2, 2)::Int64                                         │╻      getindex
  ◌  └──       return %24

This is because the constructor of BoundsError is opaque from this frame, and it's tricky to change the semantics of it based on the analysis on this frame. So probably the most sensible way to implement the maybecopy idea is to define "no-capturing on exception" alternatives of certain builtin functions whose no-throw checks inherently need to be conservative.

To me that sounds more complicated and I'd say it's much simpler if we change the semantics of certain error constructors so that they don't capture and let analyzers and optimizers assume that invariant.

@Keno
Copy link
Member Author

Keno commented Jan 22, 2022

Yes, the initial implementation of maybecopy is just to always copy the array, so you can assume no-escape. The issue people had with that is that the array may be big, so we want to have the flexibility to not do that in contexts where memory allocations are supposed to be tightly controlled. That's why the maybecopy is useful, because the optimizer can choose. I agree with you of course that it requires a global transformation, but I also think the extra memory issue is somewhat theoretical, and by the time it becomes a problem we should be able to do the compiler transform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.