Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read/write specialisation for IOContext{AnnIO} #53715

Merged
merged 1 commit into from
May 2, 2024

Conversation

tecosaur
Copy link
Contributor

@tecosaur tecosaur commented Mar 12, 2024

Now that we have AnnotatedIOBuffer and StyledStrings, I decided to revisit my "tell me about this struct" utility, and make it less of a kludge.

Along the way I ran into what I think is essentially a rather rough edge that can be smoothed over with a few extra method specialisations.

See the commit message for details on this what's been done, it's changed a few times now.

@tecosaur tecosaur added io Involving the I/O subsystem: libuv, read, write, etc. strings "Strings!" labels Mar 12, 2024
@vtjnash
Copy link
Member

vtjnash commented Mar 12, 2024

Two of those seem like what I would have expected the AbstractPipe would already do for forwarding these to the underlying implementation

@tecosaur
Copy link
Contributor Author

Running into this behaviour took me by surprise, but I think it's because AnnotatedIOBuffer implements pipe_reader and pipe_writer to point to the IOBuffer inside it. The generic write method for IOContext then writes to the pipe_writer, hence the need for this specialisation.

@vtjnash
Copy link
Member

vtjnash commented Mar 13, 2024

Okay, I see this calls instead StyledStrings._ansi_writer because of overloading StyledStrings, rather than first unwrapping the IOContext. I think that suggests that it should unwrap any IOContext before dispatching to StyledStrings._ansi_writer for these types? Otherwise this looks like just too much of a narrow special case, that can be tricked simply by cases such as IOContext{Any}

@tecosaur
Copy link
Contributor Author

Thanks for taking a further look into this. Do I read you correctly in that you're suggesting we resolve this issue by tackling the general case where reads/writes with an IOContext behave differently from reads/writes to the wrapped IO?

@vtjnash
Copy link
Member

vtjnash commented Mar 13, 2024

yes

@tecosaur
Copy link
Contributor Author

Cool, I'll have a look at doing that and see about updating this PR to take that approach instead.

@tecosaur
Copy link
Contributor Author

Hmmm, my first attempt at this hasn't gone well

# We want reads/writes involving an `IOContext` to dispatch to to underlying `IO`,
# rather than on `IOContext`.
write(io::IOContext, x::Any) = write(io.io, x)
read(io::IOContext, x::Any)  = read(io.io, x)
# To resolve method ambiguities
write(io::IO, x::AbstactChar) = write(io.io, x)
write(io::IO, x::AbstactString) = write(io.io, x)
write(io::IO, x::Union{String,SubString{String}}) = write(io.io, x)
write(io::IO, x::UInt8) = write(io.io, x)

(when I take julia +nightly and eval the above in a begin ... end block) produces


SYSTEM (REPL): showing an error caused an error

SYSTEM (REPL): caught exception of type Unreachable reached at 0x7f0b8cb176c6

[28873] signal 4 (2): Illegal instruction
in expression starting at none:0
print at ./strings/io.jl:250
unknown function (ip: 0x7f0b8cb176f6)
print at ./strings/io.jl:46
...

zsh: illegal hardware instruction (core dumped)  julia +nightly
18:09:05 ❯ 

I'm also not sure if there's a good general method do deal with all the potential method ambiguities introduced.

I'm confident about the methods for the IOContext{AnnotatedIOBuffer} case since we can know ahead of time which methods need to be implemented and that no ambiguities are introduced, but I'm struggling to see how to make a general solution work here.

If anybody has thoughts on this, I'd love to hear them.

@Seelengrab
Copy link
Contributor

Seelengrab commented Mar 15, 2024

I'm also not sure if there's a good general method do deal with all the potential method ambiguities introduced.

Possibly related: #47385, #47325

@tecosaur
Copy link
Contributor Author

After a rather productive chat with Nathan Zimmerberg on slack, I'm thinking that:

  • It probably wouldn't be a good idea to always pass through to the underlying IO of an IOContext
  • We can fix this headache just as effectively by making a "pass through to the pipe_writer/pipe_reader" method in annotated.jl

Let me know what you think!

@tecosaur tecosaur force-pushed the annotated-context-buffer branch from 0daade5 to 3f41272 Compare March 21, 2024 17:18
@tecosaur
Copy link
Contributor Author

Update: this breaks the behaviour of IOContext{IOBuffer} with :color => true. Maybe we should just go with the IOContext{AnnotatedIOBuffer} specialisation after all?

@tecosaur
Copy link
Contributor Author

tecosaur commented Mar 22, 2024

I suppose the alterative would be that we try to shoehorn the type of pipe_writer(io) into the dispatch with a method like so:

# So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
# work as expected.
function write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}})
    if pipe_writer(io) isa AnnotatedIOBuffer
        write(pipe_writer(io), s)
    else
        invoke(write, Tuple{IO, typeof(s)}, io, s)
    end
end
# Can't be part of the `Union` above because it introduces method ambiguities
function write(io::AbstractPipe, c::AnnotatedChar)
    if pipe_writer(io) isa AnnotatedIOBuffer
        write(pipe_writer(io), c)
    else
        invoke(write, Tuple{IO, typeof(c)}, io, c)
    end
end

I don't think this can work for read though, but that is just a convenience method (in other words: it's nice for completeness, but not alleviating a headache the way write is).

@tecosaur
Copy link
Contributor Author

If anyone has thoughts on this, I'd love to hear them. Currently, I find the alternative mentioned in my last comment a bit ugly, but if there's no other input I'll just go with that.

@tecosaur tecosaur force-pushed the annotated-context-buffer branch from 3f41272 to 10ea217 Compare March 31, 2024 09:17
@tecosaur
Copy link
Contributor Author

I've updated this PR. @vtjnash please let me know what you think of the current state.

@tecosaur
Copy link
Contributor Author

tecosaur commented Apr 2, 2024

Let me know if I'm being too ambitious, but I'm going to speculatively tag this with backport-1.11 since it (IMO) improves the state of this feature in the 1.11 release.

@tecosaur tecosaur force-pushed the annotated-context-buffer branch 2 times, most recently from fc2f42b to 50833bd Compare May 1, 2024 13:24
@tecosaur
Copy link
Contributor Author

tecosaur commented May 1, 2024

I've just resolved a recently-introduced merge conflict.

More broadly, I still think this code looks a bit ugly, but it seems to work, and it's functionality that I think is rather worth having.

Should we go ahead and merge this, and let any improvements/nicer implementations come along in the future?

Comment on lines +461 to +519
function write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}})
if pipe_writer(io) isa AnnotatedIOBuffer
write(pipe_writer(io), s)
else
invoke(write, Tuple{IO, typeof(s)}, io, s)
end::Int
end
# Can't be part of the `Union` above because it introduces method ambiguities
function write(io::AbstractPipe, c::AnnotatedChar)
if pipe_writer(io) isa AnnotatedIOBuffer
write(pipe_writer(io), c)
else
invoke(write, Tuple{IO, typeof(c)}, io, c)
end::Int
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following is a 100% optional suggestion that is ultimately a matter of taste

As you are most likely aware, you could avoid the code duplication by using @eval, i.e.

Suggested change
function write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}})
if pipe_writer(io) isa AnnotatedIOBuffer
write(pipe_writer(io), s)
else
invoke(write, Tuple{IO, typeof(s)}, io, s)
end::Int
end
# Can't be part of the `Union` above because it introduces method ambiguities
function write(io::AbstractPipe, c::AnnotatedChar)
if pipe_writer(io) isa AnnotatedIOBuffer
write(pipe_writer(io), c)
else
invoke(write, Tuple{IO, typeof(c)}, io, c)
end::Int
end
# (can't do this via a single method with a 'Union' argument due to method ambiguities)
for T in (AnnotatedChar, AnnotatedString, SubString{<:AnnotatedString}) do
@eval function write(io::AbstractPipe, arg::$T)
if pipe_writer(io) isa AnnotatedIOBuffer
write(pipe_writer(io), arg)
else
invoke(write, Tuple{IO, typeof(arg)}, io, arg)
end::Int
end
end

@@ -456,6 +456,24 @@ function write(dest::AnnotatedIOBuffer, src::AnnotatedIOBuffer)
nb
end

# So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
# work as expected.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still trying to wrap my brain around this:

  • by default, unsafe_write(io::AbstractPipe, ...) for simply redispatches to unsafe_write(pipe_writer(io)::IO, ...)
  • by default, all write(io, ...) methods in the end go through unsafe_write(io, ...)
  • so naively write(pipe_writer(io), s) seems pointless
  • but then there are custom write methods ... such as your write(io::AnnotatedIOBuffer, astr::Union{AnnotatedString, SubString{<:AnnotatedString}}) method...

And now we have a problem if AbstractPipe type, such as IOContext, wraps an AnnotatedIOBuffer: instead of dispatching from write(io, ...) to unsafe_write(io, ...) you need it to go through pipe_writer(io) so that the annotations can be handled by that.

What if there is a sandwich of three AbstractPipe types, say an IOContext outermost, then some other AbstractPipe type, then the AnnotatedIOBuffer -- it wouldn't work out then either, right?

This then makes me wonder: why not "simply" something like

write(io::AbstractPipe, s::AnnotatedString) = write(pipe_writer(io), s)

I bet you considered this and ruled it out for some reason, but I didn't see it in the PR discussion.

None of this is meant as fundamental objection to this PR, but I feel at least this comment should be a bit more elaborate (which, if you happen to agree with this POV, could certainly wait for a future PR -- I don't mind to hold this one up). I'd love to be helpful and suggest something, but for that I'd first have to really understand it, hence my questions :-).

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a chat with @tecosaur about this over on Slack, and he pointed out the commit message here which is excellent and provides further details.

All in all, I share the mild unease about the code in here, and I think there may be need to change some things in the IO system, but this PR is not the place for it. And if this code causes any problems down the road, well, the only way to find out is to try it in the wild?

With that I'd be happy to have this merged at any time.

Ensure that when an AnnotatedIOBuffer is wrapped in an IOContext (or
similar AnnotatedPipe-based construct), that writes of annotated
strings/chars and reading out an AnnotatedString is unimpeded by the
IOContext wrapping.

Without these specialisations, the generic pipe_reader/pipe_writer
fallbacks will directly access the underlying IOBuffer and annotations
will be lost.

There are a number of scenarios in which one might want to combine an
AnnotatedIOBuffer and IOContext (for example setting the compact
property). Losing annotations in such scenarios is highly undesirable.
It is of particular note that this can arise in situations where you
can't unwrap the IOContext as needed, for example when passing IO to a
function you do not control (which is currently extremely hard to work
around).

Getting this right is a little difficult, and a few approaches have been
tried. Initially, I added IOContext{AnnotatedIOBuffer} specialisations
to show.jl, but arguably it's a bit of a code smell to specialise in
this way (and Jameson wasn't happy with it, with concerns that it could
be tricked by IOContext{Any}).

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::IOContext{AnnotatedIOBuffer}, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(io.io, s)
    write(io::AnnotatedIOBuffer, c::AnnotatedChar) = write(io.io, c)

Then I tried making it so that IOContext writes dispatched on the
wrapped IO type, but of course that broke cases like IOContext{IOBuffer}
with :color=>true.

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(pipe_writer(io), s)
    write(io::AbstractPipe, c::AnnotatedChar) = write(pipe_writer(io), c)

Finally, we have the current AbstractPipe + Annotated type
specialisation, which IOContext is just an instance of. To avoid
behaving too broadly, we need to confirm that the underlying IO is
actually an AnnotatedIOBuffer. I'm still not happy with this, only idea
I've had other than implementing IOContext{AnnotatedIOBuffer} methods
that actually seems viable, and I've had trouble soliciting help from
other people brainstorming here.

If somebody can implement something cleaner here in the future, I'd be
thrilled.
@tecosaur tecosaur force-pushed the annotated-context-buffer branch from 50833bd to 9fdb4db Compare May 2, 2024 09:14
@tecosaur
Copy link
Contributor Author

tecosaur commented May 2, 2024

(just pushed a typo fix, no changes to the code)

@fingolfin fingolfin merged commit 5dcd509 into JuliaLang:master May 2, 2024
3 of 7 checks passed
@tecosaur tecosaur deleted the annotated-context-buffer branch May 2, 2024 09:41
tecosaur added a commit that referenced this pull request May 6, 2024
Ensure that when an AnnotatedIOBuffer is wrapped in an IOContext (or
similar AnnotatedPipe-based construct), that writes of annotated
strings/chars and reading out an AnnotatedString is unimpeded by the
IOContext wrapping.

Without these specialisations, the generic pipe_reader/pipe_writer
fallbacks will directly access the underlying IOBuffer and annotations
will be lost.

There are a number of scenarios in which one might want to combine an
AnnotatedIOBuffer and IOContext (for example setting the compact
property). Losing annotations in such scenarios is highly undesirable.
It is of particular note that this can arise in situations where you
can't unwrap the IOContext as needed, for example when passing IO to a
function you do not control (which is currently extremely hard to work
around).

Getting this right is a little difficult, and a few approaches have been
tried. Initially, I added IOContext{AnnotatedIOBuffer} specialisations
to show.jl, but arguably it's a bit of a code smell to specialise in
this way (and Jameson wasn't happy with it, with concerns that it could
be tricked by IOContext{Any}).

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::IOContext{AnnotatedIOBuffer}, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(io.io, s)
    write(io::AnnotatedIOBuffer, c::AnnotatedChar) = write(io.io, c)

Then I tried making it so that IOContext writes dispatched on the
wrapped IO type, but of course that broke cases like IOContext{IOBuffer}
with :color=>true.

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(pipe_writer(io), s)
    write(io::AbstractPipe, c::AnnotatedChar) = write(pipe_writer(io), c)

Finally, we have the current AbstractPipe + Annotated type
specialisation, which IOContext is just an instance of. To avoid
behaving too broadly, we need to confirm that the underlying IO is
actually an AnnotatedIOBuffer. I'm still not happy with this, only idea
I've had other than implementing IOContext{AnnotatedIOBuffer} methods
that actually seems viable, and I've had trouble soliciting help from
other people brainstorming here.

If somebody can implement something cleaner here in the future, I'd be
thrilled.
KristofferC added a commit that referenced this pull request May 28, 2024
Backported PRs:
- [x] #53665 <!-- use afoldl instead of tail recursion for tuples -->
- [x] #53976 <!-- LinearAlgebra: LazyString in interpolated error
messages -->
- [x] #54005 <!-- make `view(::Memory, ::Colon)` produce a Vector -->
- [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` -->
- [x] #54069 <!-- Allow PrecompileTools to see MI's inferred by foreign
abstract interpreters -->
- [x] #53750 <!-- inference correctness: fields and globals can revert
to undef -->
- [x] #53984 <!-- Profile: fix heap snapshot is valid char check -->
- [x] #54102 <!-- Explicitly compute stride in unaliascopy for SubArray
-->
- [x] #54070 <!-- Fix integer overflow in `skip(s::IOBuffer,
typemax(Int64))` -->
- [x] #54013 <!-- Support case-changes to Annotated{String,Char}s -->
- [x] #53941 <!-- Fix writing of AnnotatedChars to AnnotatedIOBuffer -->
- [x] #54137 <!-- Fix typo in docs for `partialsortperm` -->
- [x] #54129 <!-- use correct size when creating output data from an
IOBuffer -->
- [x] #54153 <!-- Fixup IdSet docstring -->
- [x] #54143 <!-- Fix `make install` from tarballs -->
- [x] #54151 <!-- LinearAlgebra: Correct zero element in
`_generic_matvecmul!` for block adj/trans -->
- [x] #54213 <!-- Add `public` statement to `Base.GC` -->
- [x] #54222 <!-- Utilize correct tbaa when emitting stores of unions.
-->
- [x] #54233 <!-- set MAX_OS_WRITE on unix -->
- [x] #54255 <!-- fix `_checked_mul_dims` in the presence of 0s and
overflow. -->
- [x] #54259 <!-- Fix typo in `readuntil` -->
- [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large
array -->
- [x] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing
imaginary part on diagonal -->
- [x] #54248 <!-- ensure package callbacks are invoked when no valid
precompile file exists for an "auto loaded" stdlib -->
- [x] #54308 <!-- Implement eval-able AnnotatedString 2-arg show -->
- [x] #54302 <!-- Specialised substring equality for annotated strs -->
- [x] #54243 <!-- prevent `package_callbacks` to run multiple time for a
single package -->
- [x] #54350 <!-- add a precompile signature to Artifacts code that is
used by JLLs -->
- [x] #54331 <!-- correctly track freed bytes in
jl_genericmemory_to_string -->
- [x] #53509 <!-- revert moving "creating packages" from Pkg.jl -->
- [x] #54335 <!-- When accessing the data pointer for an array, first
decay it to a Derived Pointer -->
- [x] #54239 <!-- Make sure `fieldcount` constant-folds for `Tuple{...}`
-->
- [x] #54288
- [x] #54067
- [x] #53715 <!-- Add read/write specialisation for IOContext{AnnIO} -->
- [x] #54289 <!-- Rework annotation ordering/optimisations -->
- [x] #53815 <!-- create phantom task for GC threads -->
- [x] #54130 <!-- inference: handle `LimitedAccuracy` in
`handle_global_assignment!` -->
- [x] #54428 <!-- Move ConsoleLogging.jl into Base -->
- [x] #54332 <!-- Revert "add unsetindex support to more copyto methods
(#51760)" -->
- [x] #53826 <!-- Make all command-line options documented in all
related files -->
- [x] #54465 <!-- typeintersect: conservative typevar subtitution during
`finish_unionall` -->
- [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path
of type instantiation -->
- [x] #54499 <!-- make `@doc x` work without REPL loaded -->
- [x] #54210 <!-- attach finalizer in `mmap` to the correct object -->
- [x] #54359 <!-- Pkg REPL: cache `pkg_mode` lookup -->

Non-merged PRs with backport label:
- [ ] #54471 <!-- Actually setup jit targets when compiling
packageimages instead of targeting only one -->
- [ ] #54457 <!-- Make `String(::Memory)` copy -->
- [ ] #54323 <!-- inference: fix too conservative effects for recursive
cycles -->
- [ ] #54322 <!-- effects: add new `@consistent_overlay` macro -->
- [ ] #54191 <!-- make `AbstractPipe` public -->
- [ ] #53957 <!-- tweak how filtering is done for what packages should
be precompiled -->
- [ ] #53882 <!-- Warn about cycles in extension precompilation -->
- [ ] #53707 <!-- Make ScopedValue public -->
- [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} -->
- [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` -->
- [ ] #53286 <!-- Raise an error when using `include_dependency` with
non-existent file or directory -->
- [ ] #52694 <!-- Reinstate similar for AbstractQ for backward
compatibility -->
- [ ] #51479 <!-- prevent code loading from lookin in the versioned
environment when building Julia -->
@KristofferC KristofferC removed the backport 1.11 Change should be backported to release-1.11 label May 28, 2024
lazarusA pushed a commit to lazarusA/julia that referenced this pull request Jul 12, 2024
Ensure that when an AnnotatedIOBuffer is wrapped in an IOContext (or
similar AnnotatedPipe-based construct), that writes of annotated
strings/chars and reading out an AnnotatedString is unimpeded by the
IOContext wrapping.

Without these specialisations, the generic pipe_reader/pipe_writer
fallbacks will directly access the underlying IOBuffer and annotations
will be lost.

There are a number of scenarios in which one might want to combine an
AnnotatedIOBuffer and IOContext (for example setting the compact
property). Losing annotations in such scenarios is highly undesirable.
It is of particular note that this can arise in situations where you
can't unwrap the IOContext as needed, for example when passing IO to a
function you do not control (which is currently extremely hard to work
around).

Getting this right is a little difficult, and a few approaches have been
tried. Initially, I added IOContext{AnnotatedIOBuffer} specialisations
to show.jl, but arguably it's a bit of a code smell to specialise in
this way (and Jameson wasn't happy with it, with concerns that it could
be tricked by IOContext{Any}).

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::IOContext{AnnotatedIOBuffer}, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(io.io, s)
    write(io::AnnotatedIOBuffer, c::AnnotatedChar) = write(io.io, c)

Then I tried making it so that IOContext writes dispatched on the
wrapped IO type, but of course that broke cases like IOContext{IOBuffer}
with :color=>true.

    # So that read/writes with `IOContext` (and any similar `AbstractPipe` wrappers)
    # work as expected.
    write(io::AbstractPipe, s::Union{AnnotatedString, SubString{<:AnnotatedString}}) =
        write(pipe_writer(io), s)
    write(io::AbstractPipe, c::AnnotatedChar) = write(pipe_writer(io), c)

Finally, we have the current AbstractPipe + Annotated type
specialisation, which IOContext is just an instance of. To avoid
behaving too broadly, we need to confirm that the underlying IO is
actually an AnnotatedIOBuffer. I'm still not happy with this, only idea
I've had other than implementing IOContext{AnnotatedIOBuffer} methods
that actually seems viable, and I've had trouble soliciting help from
other people brainstorming here.

If somebody can implement something cleaner here in the future, I'd be
thrilled.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc. strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants