-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Indentation rework #1562
Indentation rework #1562
Conversation
I would recommend splitting all the TODOs to separate small PRs since it will be easier to review and keep track of. |
…oncept for assignment statements in rust)
…sue helix-editor#1523) Remove commented-out code
Add more options to ComplexNode and use them to improve C/C++ indentation.
7260514
to
eac73eb
Compare
Thanks @sudormrfbin, splitting it up is a good idea. |
helix-core/src/indent.rs
Outdated
struct IndentResult { | ||
indent: i32, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a comment here saying that indentation is in terms of levels and not spaces? Also, I wonder how well this will work for python indention for this:
def hello(a,
b,
c):
given that the indentation depends on the previous line, if there are no items there. IIRC even vim have issues on this but there is a plugin that solves this I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding python, I have this on my radar, I'm planning to add something like an align
specification. I still have to think about how to make it as intuitive and general as possible. It's kinda difficult to define what to make the "anchor" for the alignment. For simple cases (like aligning the parameters of a function) it should not be to complicated though, since treesitter does all the heavy lifting.
I added some more detailed documentation which should hopefully explain the general idea. @archseer let me know if anything is unclear. |
06664bf
to
c4a256e
Compare
Also, please fix the lints. |
My review ends here. I think it's better for @archseer to check since I don't haven't done indent much. |
runtime/queries/rust/indents.toml
Outdated
all = [ | ||
{ parent_kind_in = ["assignment_expression", "compound_assignment_expr"], field_name_in = ["right"]}, | ||
{ parent_kind_in = ["let_declaration"], field_name_in = ["value"] } | ||
] | ||
tail = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through both the examples the docs and the original issue, but I don't understand the need for all
vs tail
.
The point is to make:
let (
a_long_variable_name_in_this_tuple,
b_long_variable_name_in_this_tuple,
c_long_variable_name_in_this_tuple,
d_long_variable_name_in_this_tuple,
e_long_variable_name_in_this_tuple,
): (usize, usize, usize, usize, usize) = //<- cursor here
<Enter>
produce indentation level == 1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, basically. indent.tail
produces the behavior of the old indent
option (indenting everything except for the first line of a node). indent.all
indents the whole node including the first line, similar to how the old outdent
behaves. Here we need it because the whole RHS of a let declaration should be indented, including the first line. Additionally, indent
and outdent
are symmetric in the new version, so we don't need any special cases for them and for everything else that the indent code might handle in the future (comments, alignment, etc.).
Your example doesn't work because of a limitation of tree-sitter: In that situation there isn't a value for the let_declaration
yet, so we don't know that we should indent it. The best solution here is probably an optional auto-indent feature that updates the indent as soon as tree-sitter parses the code correctly.
Pressing <Enter>
in this situation does produce the correct indent (1):
let (
a_long_variable_name_in_this_tuple,
b_long_variable_name_in_this_tuple,
c_long_variable_name_in_this_tuple,
d_long_variable_name_in_this_tuple,
e_long_variable_name_in_this_tuple,
): (usize, usize, usize, usize, usize) = |(0,0,0,0,0);
// Cursor at | in the line above
I just had a closer look at the Treesitter query documentation and code and realized I misunderstood the API. Setting the byte range for a query doesn't just find all matches inside that range, it finds all matches that intersect the range. This means that we can query a very small range (usually a couple of bytes should be enough) and still get all the relevant matches for indentation. Unless there's some disadvantage that I'm not seeing, I'll change this PR to use treesitter queries instead of the custom TOML format, as was suggested in #114. I still think we shouldn't just try to copy the indent queries from nvim-treesitter because they don't support stuff like custom prefixes (e.g. for comments or markdown lists). Maybe I'm wrong though, so suggestions/discussions are welcome. |
I explained this in #114 (comment), the problem is that we aren't querying the innermost node, but instead traversing the tree. So passing the range of a single line will match the |
I've improved the rust query a bit and implemented your suggestions. I also tried running the indentation test on EDIT: During testing, I also found & fixed 2 places where |
Looks good, I'm taking a deeper look at indent.rs and I'm test driving this branch for a few days. |
Regarding cargo fmt: it was probably for code inside a macro? It seems to avoid any formatting inside them |
if added.indent > 0 && added.outdent == 0 { | ||
self.indent += 1; | ||
} else if added.outdent > 0 && added.indent == 0 { | ||
self.outdent += 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd to me, can't indent
and outdent
both be > 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're also not using the number from added
and always use 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because of how the indentation that's added from a single line works: We can always have at most one indent or at most one outdent (because the indentation never increases by multiple levels on a single line). Here, added.indent
and added.outdent
will always be 1 or 0 because added
is constructed using the function add_capture
. I just check for >0
instead for ==1
to prevent weird errors in case this invariant no longer holds in the future.
If there are both indent
and outdent
nodes on the same line, we ignore all of them, that's why there is no third case for when added.indent > 0 && added.outdent > 0
. This is required to fix #1523.
}) { | ||
continue; | ||
} | ||
for capture in m.captures { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reasonable guess for capacity on indent_captures
we could use? Either ::with_capacity()
or by calling reserve()
inside this loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It corresponds to the number of times this node is captured with an IndentCapture (currently @indent
and @outdent
. This should be 1 in most cases (because we don't create the Vec
if it's 0), I added a with_capacity
to make that a bit more explicit.
Reuse tree-sitter cursors for indentation queries.
738ea7a
to
91d0a75
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very exciting! 😄
"}" | ||
"]" | ||
")" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"}" | |
"]" | |
")" | |
"}" | |
"]" | |
")" |
Sorry this is a bit of a nit. I see the single-space indent in some other indent files as well:
- dart
- go
- llvm
- nix
- scala
- zig
At some point I'd really like to write a formatter for tree-sitter queries to make this all moot 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A formatter would be pretty nice. I must've missed this while converting all the indent queries, thanks for pointing out :)
let mut indent_for_line = AddedIndent::new(); | ||
let mut indent_for_line_below = AddedIndent::new(); | ||
loop { | ||
let is_first = *first_in_line.last().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment for that?
helix-core/src/syntax.rs
Outdated
@@ -85,17 +85,17 @@ pub struct LanguageConfiguration { | |||
// first_line_regex | |||
// | |||
#[serde(skip)] | |||
pub(crate) highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>, | |||
pub highlight_config: OnceCell<Option<Arc<HighlightConfiguration>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were these made pub
because of usage in tests? Why not use the associated methods highlight_config()
and indent_query()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it's because LanguageConfiguration
is constructed directly in the test. Let's instead use the loader with a custom languages.toml
? Can reuse this one: https://github.com/helix-editor/helix/blob/ce8311727f32975dc1c1030c676ffc141f63b851/.github/workflows/languages.toml from #1659
node = parent | ||
/// Computes for node and all ancestors whether they are the first node on their line. | ||
/// The first entry in the return value represents the root node, the last one the node itself | ||
fn get_first_in_line(mut node: Node, byte_pos: usize, new_line: bool) -> Vec<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But wouldn't get_first_in_line
also match this first character?
Tested over the week and it works great! Once the last set of comments is resolved this is ready for merge 👍🏻 |
Nice to hear that! I probably won't have time to work on it in the next week or so but I definitely want to finish it (and work on same of the follow-up features) when I can. |
I wrote a basic test suite (currently specific to Nix) to test this PR. The following test cases currently fail: TESTING: let-in-attr
*** work/let-in-attr.nix 2022-03-05 21:29:07.460718670 +0100
--- output/let-in-attr.nix 2022-03-05 21:28:24.453336376 +0100
***************
*** 1,7 ****
let
a = 3;
in {
b = [
! /*DONE*/
];
}
--- 1,7 ----
let
a = 3;
in {
b = [
! /*DONE*/
];
}
TESTING: let
*** work/let.nix 2022-03-05 21:29:08.416727170 +0100
--- output/let.nix 2022-03-05 19:31:23.883676561 +0100
***************
*** 1,2 ****
let
! /*DONE*/
--- 1,2 ----
let
! /*DONE*/ The In let
value = 1;
in {
inherit value;
} vs the variant Helix would want: let
value = 1;
in {
inherit value;
} Nix formatters such as alejandra would transform the latter into the former. This also happens with other languages, such as JSON. Here, the indentation before "a" was incorrectly removed. Helix places the cursor and closing square bracket on the correct level, while carrying over the incorrect indentation would look more consistent. {
"a": [
| <- cursor ends up here
]
} {
"a": [
| <- cursor should end up here
]
} The case of As for why a different behavior would be desirable: Writing |
Thanks for bringing up these issues, even though some of them will probably be a bit complicated to fix. The incorrect indentation for The fact that indentation is calculated globally can in my opinion be seen both as an advantage and a disadvantage: It manages to correctly compute indentation even when the current indentation is wrong in some places. It just becomes annoying when this "wrong" indentation is actually intentional. We could have an option where we don't apply indent from a tree-sitter node when the indent doesn't exist in the file. For me this isn't super high-priority though. For the indentation of incomplete code, i can think of 2 possible solutions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me now, stellar work @Triton171! 🎉
I reworked the indentation system to remove some of it's limitations and make it easier to extend. This should close #114, #1523 and #1440.
Things I want to implement in followup PRs: