Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix YAML auto indent #6768

Merged
merged 2 commits into from
Aug 10, 2023
Merged

Conversation

dead10ck
Copy link
Member

This makes use of the new extend capture to correct YAML indentation.

Fixes #6661

@dead10ck dead10ck added E-easy Call for participation: Experience needed to fix: Easy / not much S-waiting-on-review Status: Awaiting review from a maintainer. labels Apr 15, 2023
pascalkuthe
pascalkuthe previously approved these changes Apr 15, 2023
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert on ident queries but this seems good to me Really awesome that you added a bunch of tests. This doesn't just test the YAML queries but also adds some more testing for the indent system in general which could use more 👍

@dead10ck
Copy link
Member Author

Yeah this actually took me a few days of just digging around in the indent code to figure out how it even works and where the captures are supposed to go, and what they are supposed to cover. I was convinced there was some kind of bug and added a whole bunch of debug prints, and then figured out the hard way how it works.

It seems needlessly complex, but I don't say that with a high level of confidence. Maybe I should go over the docs now that I'm not completely clueless about how it works.

@dead10ck
Copy link
Member Author

@pascalkuthe sorry, I added some more tests for inserting above, and also for flow style strings.

@dead10ck dead10ck requested a review from pascalkuthe April 15, 2023 23:35
pascalkuthe
pascalkuthe previously approved these changes Apr 15, 2023
@dead10ck
Copy link
Member Author

Actually I just discovered another broken case: lists of maps

@dead10ck
Copy link
Member Author

dead10ck commented Apr 17, 2023

Ok, I'm thinking actually there will need to be a code change to fix this. Consider this really simple case, where the | represents a cursor, and we want to insert a line below with o:

- top:|
    baz: foo
    bax: foox

I think I have an ok enough understanding now of the indent system to say with reasonable confidence that this set of indent queries should cover what we want.

(block_scalar) @indent @extend

(block_mapping_pair) @indent

(block_sequence_item) @indent

The last two cover the two indent cases we want: one level for the list, and one for the mapping pair. The problem is the current code will intentionally combine all @indent captures that happen on the same line into one.

if node_line != parent_line {

So we end up with

- top:
  |
    baz: foo
    bax: foox

I assume this was meant to cover giving the ability to have multiple overlapping patterns without them stepping on each other's toes.

But in this case, the sequence and the mapping pair simultaneously both happen on the same line, and both introduce a new level of indentation.

I think maybe what we need is a capture group that will always add a new indent level, even if multiple of them happen on the same line. Maybe it could be called @indent.always or something.

@pascalkuthe
Copy link
Member

pascalkuthe commented Apr 17, 2023

The indent code is not really my forte but sounds reasonable to me cc @Triton171 since he did a lot of work on indent queries

@Triton171
Copy link
Contributor

I agree that this needs a new capture type since currently indent can only ever increase by 1 from one line to the next. And you're right, this behavior is needed since in most languages e.g. multiple nested blocks starting on the same line only increase indent by 1. @indent.always sounds good and it should be relatively simple to implement.

The indent code can probably also be simplified a bit: For example, we could remove the get_first_in_line function and replace it by just checking if a node is preceeded only by whitespace. The behavior is different in theory but I've yet to see a situation where it makes a difference in practice. Also, I'm thinking about making the documentation a bit more extensive so hopefully people writing indent queries don't have to look at the code. Is there anything that is particularly unclear in the current docs?

@dead10ck
Copy link
Member Author

Thanks @Triton171 , glad to see I'm on the right track. I have some feedback for the docs, I just have to write it up and/or might just update it myself, since I will need to add the new capture type to the docs anyway.

@dead10ck dead10ck force-pushed the fix-yaml-indents branch from 15714ac to 8b9ccaf Compare May 1, 2023 03:12
@dead10ck dead10ck added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. labels May 22, 2023
@dead10ck dead10ck force-pushed the fix-yaml-indents branch 2 times, most recently from cd54baa to 827c581 Compare June 15, 2023 02:25
@dead10ck
Copy link
Member Author

Ok, I think I got the code changes done. I added the new indent.always capture, and I have tests passing. I did end up adding a couple of other additional changes along the way:

  • Added a new #one-line? predicate which tests whether the capture only spans a single line. This was necessary for YAML sequences, where we don't want to add an indent for an individual list item unless it spans more than one line.

  • Slightly modified the position that the indent query starts from. Currently, it starts the query from the very end of the line, including the line feed. This was problematic in YAML because e.g. if we have

    top:
      baz: foo
      bazi:
        more: yes
        why: b#[e|]#cause
      quux:
        - 1
        - 2
    bax: foox

    and hit o, it will currently start the query from the line feed past the e, and the smallest descendant for this is

    top:
      #[baz: foo
      bazi:
        more: yes
        why: because
      quux:
        - 1
        - 2|]#
    bax: foox

    so it ends up skipping a whole lot of nodes and captures. To fix this, I changed it to start the query from one character behind the line feed.

At this point, I think I just need to update the docs.

@archseer
Copy link
Member

archseer commented Jun 15, 2023

Let's get @Triton171's review here too

(Edit: Ah sorry, just read the rest of the thread!)

Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I jkticed that one-line? checks if a capture is contained on a single line after the new line break is inerted. I think that makes sense but just to make sure: Is that intentuonal? If yes then that should probably be documented

helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-term/src/commands.rs Outdated Show resolved Hide resolved
helix-term/src/commands.rs Outdated Show resolved Hide resolved
helix-term/src/commands.rs Outdated Show resolved Hide resolved
helix-core/src/test.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
@dead10ck
Copy link
Member Author

Great comments Pascal, thanks for the review!

@dead10ck
Copy link
Member Author

dead10ck commented Jun 16, 2023

Also, yes, it's intentional that one-line? is only true if it is unaffected by a new line insertion. Otherwise, if you have e.g.

foo: #[|bar]#

and you hit return, it would only query the current state of the text, determine that it's one line, and not indent as it should. Actually, this would be a good regression test.

I will document this for sure.

@Triton171
Copy link
Contributor

Most of this looks good to me, it's also very nice to get some integration testing for the indent code. There's 2 things which I'm unsure about:

(1): Changing the indent position to the character before the line feed is probably not always what we want. For example it breaks indentation in this C code:

int main() {
    if (cond)
        do_smth(); // Pressing `o` here would add a line with 2 indent levels (1 too many)
}

This is a fundamental difference between languages where trailing whitespace should be considered to be part of a node and languages where this is not the case. Instead of changing the position of indent computation, we can instead use @extend captures for YAML. The only difference to your approach would be that this also extends nodes to trailing blank lines (if they have sufficient indentation). If that causes issues, we could add something like an @extend.line capture which only extends a node up to (and including) the next newline.

(2): I would expect the @indent.always and @outdent.always captures to simply add/subtract 1 to the total indentation (no matter where they appear & what other captures are encountered on the same line). The implementation seems to do something slightly different: @indent.always is equivalent to @indent but we pretend that there is an additional newline at its position. This probably behaves similarly to what I described above, but it's not the same.

For example, if we encounter (from innermost to outermost node) @outdent, @indent @indent.always captures all on the same line and no other captures:

  • With the behavior that I described, @indent and @outdent cancel out, so @indent.always gives a total indentation of 1
  • In the implementation, @indent.always would be equivalent to @indent here (because there are no other captures after it), so everything cancels out and we get a total indentation of 0.

Is there some reason why we need this more complex behavior? It seems like it makes the indentation logic significantly harder to reason about, because now the order in which captures are encountered matters, even if they all apply to the same line.

@dead10ck
Copy link
Member Author

@Triton171 Thanks for the review!

(1): Changing the indent position to the character before the line feed is probably not always what we want. For example it breaks indentation in this C code:

int main() {
    if (cond)
        do_smth(); // Pressing `o` here would add a line with 2 indent levels (1 too many)
}

Shoot you're right. It's actually worse than this: it doesn't indent at all. Nor does it indent when inserting a new line from the function declaration. I'll look into the @extend capture. Thanks for finding this.

(2): I would expect the @indent.always and @outdent.always captures to simply add/subtract 1 to the total indentation (no matter where they appear & what other captures are encountered on the same line). The implementation seems to do something slightly different: @indent.always is equivalent to @indent but we pretend that there is an additional newline at its position. This probably behaves similarly to what I described above, but it's not the same.

For example, if we encounter (from innermost to outermost node) @outdent, @indent @indent.always captures all on the same line and no other captures:

  • With the behavior that I described, @indent and @outdent cancel out, so @indent.always gives a total indentation of 1
  • In the implementation, @indent.always would be equivalent to @indent here (because there are no other captures after it), so everything cancels out and we get a total indentation of 0.

Is there some reason why we need this more complex behavior? It seems like it makes the indentation logic significantly harder to reason about, because now the order in which captures are encountered matters, even if they all apply to the same line.

Hm, I actually tried this at first, just changing the add* functions to sum the indent levels to greater than one for indent.always capture types, but it did not work because it only considers captures whose nodes have parents on the preceding line, so they get skipped altogether.

Do you have any suggestions? And also, out of genuine ignorance, is having indent and outdent captures on the same line a thing that happens?

@dead10ck
Copy link
Member Author

@Triton171 you were right, simply adding @extend to the captures solved that problem.

@Triton171
Copy link
Contributor

I can't really think of another way to do this besides another two fields for the "always" variants. I thought of maybe just making a single field of Vec<IndentCaptureType> and making logic to combine them, but this doesn't seem any better.

I think a single additional (signed) field should also be enough, since @indent.always and @outdent.always cancel each other, so we only need to store their difference. I don't think it matters much either way though; both versions would be fine imo.

@dead10ck dead10ck force-pushed the fix-yaml-indents branch 2 times, most recently from 628b689 to f28077d Compare June 19, 2023 01:07
@dead10ck
Copy link
Member Author

Ok this has to be the weirdest thing I've ever seen in Rust. I changed Indentation::add_capture to return its own &mut self argument, so that you could chain calls to it.

f28077d#diff-44db3a32fdd620ed51f6953534c6ed5df3d2966010cc86aaf13448e9bbb65f07R264

But this inexplicably causes indentation to break in some cases.

The only places this function is used is within calculating the indent level here:

indent_for_line.add_capture(definition.capture_type);

We can see the reference returned is not used. But that ... should be fine, shouldn't it? How on Earth is this changing the result?

Anyway, I added the two new fields for tracking always captures, and added unit tests to verify the associativity. The tests are uglier than I would like because of the above, but they work now.

I opted for two separate fields because in my mind it's a bit more confusing to have one field be a count and one be a signed delta.

@dead10ck
Copy link
Member Author

@Triton171 if you get some time, I would appreciate a review. ❤️ I think these implementation details will affect the docs, so I want to wait on code review before I spend time updating docs.

helix-core/src/indent.rs Show resolved Hide resolved
helix-core/src/indent.rs Outdated Show resolved Hide resolved
@Triton171
Copy link
Contributor

Triton171 commented Jun 25, 2023

Sorry for taking some time with the review, I had a busy few days. Apart from the 2 small things I pointed out above, all of this looks very good. Once those are resolved and the docs are updated, this can be merged in my opinion.

I opted for two separate fields because in my mind it's a bit more confusing to have one field be a count and one be a signed delta.

I agree, and it also has the advantage that we can always use usize for counting indents instead of using signed / unsigned types for different fields.

But this inexplicably causes indentation to break in some cases.

I looked at the diff in which you reverted that change and I don't understand either how this can change the behavior of indent queries. Since it only makes the testing code slightly cleaner, I guess it's not really worth the time to debug why this happens.

@dead10ck
Copy link
Member Author

Thanks so much for the review @Triton171 ! I'll go ahead and work on the docs then.

@pascalkuthe pascalkuthe removed the E-easy Call for participation: Experience needed to fix: Easy / not much label Jul 11, 2023
@dead10ck
Copy link
Member Author

Just checking in, I am still going to finish this, it's just been a busy couple weeks

@dead10ck
Copy link
Member Author

Ok, I've finished going over the docs. I added a bunch of examples that should hopefully help newcomers understand how everything works. These are details I found myself sorely missing when I was trying to put together how to make indent queries.

I think this pretty much wraps up the PR. It is fully ready for review.

@the-mikedavis the-mikedavis added S-waiting-on-review Status: Awaiting review from a maintainer. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 21, 2023
@archseer archseer added this to the next milestone Aug 1, 2023
@pascalkuthe pascalkuthe added the A-indent Area: Indentation label Aug 4, 2023
pascalkuthe
pascalkuthe previously approved these changes Aug 4, 2023
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reall great. The docs are really nice!

helix-core/src/indent.rs Outdated Show resolved Hide resolved
YAML indents queries are tweaked to fix auto indent behavior.

A new capture type `indent.always` is introduced to address use cases
where combining indent captures on a single line is desired.

Fixes helix-editor#6661
@dead10ck
Copy link
Member Author

dead10ck commented Aug 4, 2023

Ah, thanks, I fixed it.

@archseer archseer merged commit 929eb0c into helix-editor:master Aug 10, 2023
@archseer archseer deleted the fix-yaml-indents branch August 10, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-indent Area: Indentation S-waiting-on-review Status: Awaiting review from a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

YAML indentation
5 participants