-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework annotation ordering/optimisations #54289
Rework annotation ordering/optimisations #54289
Conversation
Ah, tests are failing because I missed a use of |
a1539b8
to
b4a4dec
Compare
1d26197
to
b4a4dec
Compare
What part of annotations are semantically meaningful/visible? IIUC the annotations
This should be documented. Specifically, a simple specification that describes what is and is not semantically visible from which one can derive the above (or a different result). Ideally everything returned by (I can't review the correctness of an optimization without knowing what "correct" means) |
Ah yep, we did also discuss a need for more info in the docstrings. I'll see about adding some of the elaboration needed. |
b4a4dec
to
f869c2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (region, annot) in str.annotations
if region == fullregion
push!(annotations, (firstindex(unannot):lastindex(unannot), annot))
end
end
for offset in 0:len:(r-1)*len
for (region, annot) in str.annotations
if region != fullregion
push!(annotations, (region .+ offset, annot))
end
end
end
This is buggy in the new semantic because it re-orders annotations. Not necessarily blocking this PR, but an issue.
end | ||
# Insert any extra entries in the appropriate position | ||
for (offset, (i, entry)) in enumerate(extras) | ||
insert!(annotations, i + offset, entry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is O(n^2), but we can revise this function for perf later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, and I don't think this should be in the "hot path" of any common use-cases 🤞.
first(first(annot)) == 1 || continue | ||
if last(annot) == last(last(io.annotations)) | ||
valid_run = true | ||
for runlen in 1:i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also O(n^2) when it could be O(n), but perf can wait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it will be O(n) in non-pathological cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, this is written with the non-pathological case in mind, and if you consider the case where few-annotation insertions are repeatedly made, individual insertions should be pretty much O(1)
with a constant factor based on the average "checked annotation depth" / average number of annotations each insertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be merged and followup prs should address the performance of this PR's implementations and address the now-invalid-ness of the repeat function.
I've just adjusted fullregion = firstindex(str):lastindex(str)
if allequal(first, str.annotations) && first(first(str.annotations)) == fullregion
newfullregion = firstindex(unannot):lastindex(unannot)
for (_, annot) in str.annotations
push!(annotations, (newfullregion, annot))
end
else
for offset in 0:len:(r-1)*len
for (region, annot) in str.annotations
push!(annotations, (region .+ offset, annot))
end
end
end this change has been folded into |
f869c2f
to
1bed84f
Compare
Now that #54308 has been merged, we need to resolve the merge conflict and adjust the test introduced there. |
This is in preparation for changes to the way annotation ordering is handled in Base.
After a long chat with Lilith Halfner, we've come to the conclusion that the range-based ordering applied to annotations, while nice for making some otherwise O(n) code O(log n) and O(n^2) code O(n), is actually assuming too much about how annotations are used and interact with each other. Removing all assumptions about ordering, and giving annotation order primacy seems like the most sensible thing to do, even if it makes a few bits of the code less algorithmically "nice". As a consequence, we also get rid of annotatedstring_optimize!. Specific producers/consumers of annotated text will know what assumptions can be made to compress/optimise the annotations used, and are thus best suited to do so themselves. The one exception to this is probably when writing to an AnnotatedIOBuffer, here adding a specific optimisation to _insert_annotations! probably makes sense, and will be explored soon.
This is a rather important optimisation, since it prevents the annotation blow-up that can result from say writing to an AnnotatedIOBuffer char-by-char. Originally I was just going to pass the AnnotatedString produced when reading the AnnotatedIOBuffer through annotatedstring_optimize!, but now that's been removed, this seems like the best past forwards (it's also actually a better approach than applying annotatedstring_optimize!, just hard to justify when that code already existed).
It's important to specify the way that annotations relate to the characters of the underlying string and each other. Along the way, it's also worth explaining the behaviour of the internal functions _clear_annotations_in_region! and _insert_annotations!.
1bed84f
to
4c18472
Compare
After CI, I think this should be good to merge if that also sounds good by @LilithHafner 🙂. |
1bed84f LGTM. I have not reviewed https://github.com/JuliaLang/julia/compare/1bed84f4ae6fe6fc9bb39612e274fd1630a51349..4c18472b3e56998f2784ecd5d9deea4cf5371850, but am open to suggestions to how I could do that without re-reviewing this PR from scratch. |
The only change made here in the diff you link is @test chopprefix(sprint(show, str), "Base.") ==
- "AnnotatedString{String}(\"some string\", [(1:4, :thing => 0x01), (1:11, :all => 0x03), (6:11, :other => 0x02)])"
+ "AnnotatedString{String}(\"some string\", [(1:4, :thing => 0x01), (6:11, :other => 0x02), (1:11, :all => 0x03)])" i.e. correctly resolving the change introduced by #54308. HTH. |
Okay, the diff in your comment also LGTM. |
@tecosaur I know you have a preference to not squash but for things that will get backported, it introduces unnecessary stumbling blocks in that automated process that would be better to avoid. |
Thanks for mentioning that Ian, I do like to preserve more granular information when it seems sensible, but I wasn't aware that complicated the backporting process. I'll keep that in mind with the few remaining PRs I'm hoping to get backported 🙂. (As an aside, is the automated backporting doing something other than cherry picking? Or is the hassle having to supply |
Yeah the automatic script doesn't handle adding |
Improvements to the consistency and semantics of AnnotatedStrings, mainly around the ordering of annotations.
Backported PRs: - [x] #53665 <!-- use afoldl instead of tail recursion for tuples --> - [x] #53976 <!-- LinearAlgebra: LazyString in interpolated error messages --> - [x] #54005 <!-- make `view(::Memory, ::Colon)` produce a Vector --> - [x] #54010 <!-- Overload `Base.literal_pow` for `AbstractQ` --> - [x] #54069 <!-- Allow PrecompileTools to see MI's inferred by foreign abstract interpreters --> - [x] #53750 <!-- inference correctness: fields and globals can revert to undef --> - [x] #53984 <!-- Profile: fix heap snapshot is valid char check --> - [x] #54102 <!-- Explicitly compute stride in unaliascopy for SubArray --> - [x] #54070 <!-- Fix integer overflow in `skip(s::IOBuffer, typemax(Int64))` --> - [x] #54013 <!-- Support case-changes to Annotated{String,Char}s --> - [x] #53941 <!-- Fix writing of AnnotatedChars to AnnotatedIOBuffer --> - [x] #54137 <!-- Fix typo in docs for `partialsortperm` --> - [x] #54129 <!-- use correct size when creating output data from an IOBuffer --> - [x] #54153 <!-- Fixup IdSet docstring --> - [x] #54143 <!-- Fix `make install` from tarballs --> - [x] #54151 <!-- LinearAlgebra: Correct zero element in `_generic_matvecmul!` for block adj/trans --> - [x] #54213 <!-- Add `public` statement to `Base.GC` --> - [x] #54222 <!-- Utilize correct tbaa when emitting stores of unions. --> - [x] #54233 <!-- set MAX_OS_WRITE on unix --> - [x] #54255 <!-- fix `_checked_mul_dims` in the presence of 0s and overflow. --> - [x] #54259 <!-- Fix typo in `readuntil` --> - [x] #54251 <!-- fix typo in gc_mark_memory8 when chunking a large array --> - [x] #54276 <!-- Fix solve for complex `Hermitian` with non-vanishing imaginary part on diagonal --> - [x] #54248 <!-- ensure package callbacks are invoked when no valid precompile file exists for an "auto loaded" stdlib --> - [x] #54308 <!-- Implement eval-able AnnotatedString 2-arg show --> - [x] #54302 <!-- Specialised substring equality for annotated strs --> - [x] #54243 <!-- prevent `package_callbacks` to run multiple time for a single package --> - [x] #54350 <!-- add a precompile signature to Artifacts code that is used by JLLs --> - [x] #54331 <!-- correctly track freed bytes in jl_genericmemory_to_string --> - [x] #53509 <!-- revert moving "creating packages" from Pkg.jl --> - [x] #54335 <!-- When accessing the data pointer for an array, first decay it to a Derived Pointer --> - [x] #54239 <!-- Make sure `fieldcount` constant-folds for `Tuple{...}` --> - [x] #54288 - [x] #54067 - [x] #53715 <!-- Add read/write specialisation for IOContext{AnnIO} --> - [x] #54289 <!-- Rework annotation ordering/optimisations --> - [x] #53815 <!-- create phantom task for GC threads --> - [x] #54130 <!-- inference: handle `LimitedAccuracy` in `handle_global_assignment!` --> - [x] #54428 <!-- Move ConsoleLogging.jl into Base --> - [x] #54332 <!-- Revert "add unsetindex support to more copyto methods (#51760)" --> - [x] #53826 <!-- Make all command-line options documented in all related files --> - [x] #54465 <!-- typeintersect: conservative typevar subtitution during `finish_unionall` --> - [x] #54514 <!-- typeintersect: followup cleanup for the nothrow path of type instantiation --> - [x] #54499 <!-- make `@doc x` work without REPL loaded --> - [x] #54210 <!-- attach finalizer in `mmap` to the correct object --> - [x] #54359 <!-- Pkg REPL: cache `pkg_mode` lookup --> Non-merged PRs with backport label: - [ ] #54471 <!-- Actually setup jit targets when compiling packageimages instead of targeting only one --> - [ ] #54457 <!-- Make `String(::Memory)` copy --> - [ ] #54323 <!-- inference: fix too conservative effects for recursive cycles --> - [ ] #54322 <!-- effects: add new `@consistent_overlay` macro --> - [ ] #54191 <!-- make `AbstractPipe` public --> - [ ] #53957 <!-- tweak how filtering is done for what packages should be precompiled --> - [ ] #53882 <!-- Warn about cycles in extension precompilation --> - [ ] #53707 <!-- Make ScopedValue public --> - [ ] #53452 <!-- RFC: allow Tuple{Union{}}, returning Union{} --> - [ ] #53402 <!-- Add `jl_getaffinity` and `jl_setaffinity` --> - [ ] #53286 <!-- Raise an error when using `include_dependency` with non-existent file or directory --> - [ ] #52694 <!-- Reinstate similar for AbstractQ for backward compatibility --> - [ ] #51479 <!-- prevent code loading from lookin in the versioned environment when building Julia -->
There are some annoyances/discrepancies with annotation handling that I originally tried to fix in #53794, #53800 and #53801.
@LilithHafner was kind enough to have a look at these, and after a long chat we came to the conclusion that this isn't the right approach to take, and that the order with which annotations are applied should be given primacy over annotation ranges.
This PR supersedes the aforementioned PRs, and should make the way annotations are ordered more sensible.
See the commit messages for more details.