Rewrite indentation engine #80

Ambrevar · 2014-11-07T16:27:08Z

The main reason for this change is to change indentation from

foobar('arg1', function ()
         print('foobar')
               end)

to

foobar('arg1', function ()
    print('foobar')
end)

This problem is due to alignment. I decided to remove this feature because of the following rationale:

There is nothing close to a convention on alignment. PIL is not using it.
Rules on alignment can quickly become ambiguous.
It can result in ugly indentation. (See above example).
It is more complex and slower.
When indenting with tabs, one needs to use Emacs' smart-tabs plugin (to follow the "indent with tabs, align with spaces" rule). Not many editors can do that.

This rewrite partly addresses issue #31 and some issues in indentation-test.el.

The main reason for this change is to change indentation from foobar('arg1', function () print('foobar') end) to foobar('arg1', function () print('foobar') end) The indentation engine was very convoluted, hence the major changes. Algorithm: indentation is computed from current line and previous line. Current line: -1 on every unmatched closer if first token is a closer -1 if first token is 'middle' Previous line: +1 on every unmatched opener +1 if first token is a 'middle' -1 on every unmatched closer if first token is not a closer (if first token is a 'middle', then first unmatched closer is actually closing the 'middle'). To this we add + previous line indentation +1 if previous line is not a continuation and current-line is -1 if previous line is a continuation and current-line is not The new indentation algorithm is as follows: Current line indentation is: + indentation changes induced by unmatched closers on current line + indentation changes induced by previous line + previous line indentation - indent-level if previous line was a continuation - indent-level Implementation changes: * Keyword are only distinguished by one of the classes 'open', 'close', and 'middle'. * Openers yield a simple indent, no alignment. * Closers at the beginning of the line do not yield any indent for following line. * Continuing lines are not indented together, only previous line is used for indentation. * There is no more exceptions for 'else', 'do', and so on. * The engine is split into 3 functions only. * The indentation stack is stored into a single integer, there is no more list construct. The engine should never scans back more than the previous line. This should result in a sensible performance boost. For now this is not the case since the function 'lua-find-matching-token-in-line' goes beyond the boundaries of the line (quick and dirty implementation). Fixing this should make the engine much faster. Tests: The file test/indentation-test.el has been updated accordingly. I have added a test for the new capabilities of the engine (among them the above example). Other changes include: * Remove unused left-shifter detection code * lua-block-token-alist: Sanitize regexp * lua-block-token-alist: Clarify doc * Completed some missing indentation tests.

immerrr · 2014-11-07T17:39:33Z

Thank you for the PR, it is truly a worthy effort. But I must say, at the risk of sounding arrogant and ungrateful, I would love this to be discussed beforehand because there's little chance this will get accepted as is.

There is a convention on alignment, at least in C/C++, Java, Python and Ruby. Basically, you need to align 2nd and subsequent lines of a syntactic block to the column of first syntactically significant character of the block's body if it is on the same line as the body's opener, e.g

    // yes
    int i[3] = {bar,
                baz,
                qux};

    // no
    int i[3] = {bar,
       baz,
       qux
    };

PIL does use alignment, although does that quite rarely, see for example section 3.6:

    days = {"Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"}
....
    opnames = {["+"] = "add", ["-"] = "sub",
               ["*"] = "mul", ["/"] = "div"}

Alignment rule is by no means ambiguous, but I do agree that it may produce bad/ugly results. If you don't like what you see, most of the style guides on languages I've mentioned allow properly indented variants where body starts on the next line, too:

    // yes
    int i[3] = {
       bar,
       baz,
       qux
    };

It is true that existing code is unnecessarily complex and reducing that complexity is good, but not at expense of dropping features.

That being said, current state of indentation affairs is rather poor indeed. Also, indentation code is the last significant chunk that has no FSF copyright assignment and blocks inclusion of lua-mode into Emacs trunk. If I didn't turn you off with the above and you're OK with doing some paperwork, you're welcome to participate in (or take over) a rewrite of indentation engine using Emacs smie library.

Ambrevar · 2014-11-10T08:12:47Z

No worries, I am very open to critics :) (Especially when well structured.)

Sorry for not asking before, but this is the usual story of the personal hacked that turned out to be good enough to commit.
What I meant by "convention" is something more or less precisely specified under a name, as the K&R style for C. I do not know such a thing for C. I cannot speak for the other languages though. Anyway, it does not matter so much.
I forgot one important rule for indentation: when a line gets broken in 2 (or vice-versa), only the indentation of the 2 lines should change, not the surrounding code. This is relevant for versioning and collaboration.
The various implementations of alignment are not ambiguous as per se, but the choice made for alignment rules can vary. For instance, shall we align corresponding tokens only (current implementation):

t = { a,
      b,
    }
foobar('arg1', function ()
         print('foobar')
               end)

or innermost tokens only (your example):

t = { a,
      b,
}
foobar('arg1', function ()
   print('foobar')
end)

or shall we increase the indentation of the nested content as well?

t = { a,
      b,
    }
foobar('arg1', function ()
                 print('foobar')
               end)

The first case (current implementation) looks ugly, but the last does not respect the aforementioned rule on line splitting. The innermost alignment sounds like a good compromise, unfortunately it is a tad more difficult to implement.

My point is that alignment has too many drawbacks not worth the fanciness it provides. It does not necessarily improve clarity. It is always possible to have clearly indented code without it. I know this is a personal opinion and that Emacs strives at being a universal editor. But when universality impedes development, both for developers and users, I believe it is worth a discussion.
I agree that SMIE might be a much better base for rewriting the indentation system. I had a quick look at it and it sounds reasonable. It would increase performance, maintainability (this is the only standard indentation engine / lexer in Emacs -- it is used by a few modes like Octave and shell) and features (beginning-of-defun, etc.).
What do you mean with paperwork? I'll be glad to see this mode integrated into the main trunk, that is for sure!

If you are OK with it, I'll start working on SMIE :)

immerrr · 2014-11-10T09:25:40Z

My experience shows that people who do not like of alignment, like you, are in minority, so dropping alignment is not an option right now, but fixing its corner-cases that don't work is welcome. And, of course, feel free to point me to a discussion on lua mailing list showing strong support of such decision and we'll return to this question.

My point is that alignment has too many drawbacks not worth the fanciness it provides.

I think that alignment does optimize code for reading at the expense of increasing modification costs. But I also agree with what I read in software engineering books about reading the code happening much more often than actually modifying it, so it kind of makes sense to me.

What I meant by "convention" is something more or less precisely specified under a name, as the K&R style for C. I do not know such a thing for C

If you value K&R opinion, there are code snippets in their TCPL with function arguments aligned (e.g. in section 6.2):

   middle = makepoint((screen.pt1.x + screen.pt2.x)/2,
                      (screen.pt1.y + screen.pt2.y)/2);

As for your example, table ctor closer IMHO may indeed be placed in both ways. I personally prefer the second one, because less rules is better than more rules and the second way can be justified with the generic "put the closer to the same column as the beginning of the line containing corresponding opener". But yes, this can be made configurable.

Anonymous function example though doesn't really fit the alignment rule I've posted earlier: print('foobar') is the first statement in anonymous function and its opener is on the different line, so there should be no alignment at all. If the argument list continued after the anonymous function, yes there could be some ugliness:

foobar('arg1', function ()
  print('foobar')
end,
       'arg3'
)

But you're always free to rewrite it to avoid alignment (which is what I do myself and saw other people do in other languages):

foobar(
  'arg1',
  function ()
    print('foobar')
  end,
  'arg3'
)

All in all, it seems that my preference/experience matches what you describe as "innermost-only" alignment and it's good that we have an agreement that it looks better.

If you are OK with it, I'll start working on SMIE :)

Cool, I've been doing some research about it and Stefan (the original author) was a tremendous help already, so I'll push what he sent me and what I had myself in nearest future to a publicly available branch.

What do you mean with paperwork?

In short, all code maintained by Free Software Foundation must come from authors who agree to assign the copyright to FSF, basically, so that it can enforce GPL requirements on behalf of the author in court (should it be necessary). Previously, the assignment process involved some dead-tree mail exchange with FSF, but I've seen somewhere that now the process is fully electronic. See here for some information about it.

Ambrevar · 2014-11-10T19:05:06Z

But you're always free to rewrite it to avoid alignment (which is what I do myself and saw other people do in other languages):

In my opinion, it is the role of the indentation engine to adapt to the structure of your code, not the other way around. In the latter case, your editor is constraining you. Then indentation, alignment and reading clarity become vain since you cannot dictate how it should display to make it appear as clear as possible.
Beside this is surely not a good thing for users collaborating with different editors.

My rationale: alignment must be configurable.

I have been inspecting various possibilities with SMIE. However, I have stumbled on one major setback: it seems like SMIE is aligning code in a few different places without any customization possibilities.
So if I am not mistaken, the only way to get non-aligned function arguments is to hack something like this:

(defun smie-indent-exps ()
  ;;; ...
       ((cdr positions)
        ;; We skipped some args plus the function and bumped into something.
        (if (not smie-align)
            ;; Indent once.
            (+ (smie-indent--offset 'args) (current-column))
          ;; Align with the first arg.
          (goto-char (cadr positions))
          (current-column)))
                    ;;; ...

I do not know if anybody can confirm or correct this.
If I happen to be right, I would like to push this customization possibility onto the main trunk before proceeding.

immerrr · 2014-11-11T13:43:54Z

That doesn't seem right.

I always thought that indentation is specified via a rule function with very specific signature that doesn't mandate for or against alignment and one can do whatever they want in that function. But I'm not sure as I have never really touched it. I think the best source of information on smie is the emacs-devel list.

immerrr · 2014-11-11T13:54:42Z

In my opinion, it is the role of the indentation engine to adapt to the structure of your code, not the other way around. In the latter case, your editor is constraining you.

Sometimes a gentle nudge is good :) I appreciate the consistency of Python world "enforced" by PEP-8 style guide. It reduces to minimum the overhead you face when trying to contribute to someone else's code, because basically anything you open looks like your own and you don't need time to adjust to the new style.

Ambrevar · 2014-11-12T08:07:28Z

As far as I got it, with SMIE you define the grammar and the rules. For sh-mode,

cmd -opt1 \
        -opt2

cmd will be detected by the grammar and send the pair (:list-intro )
token to the rule function. (See sh-smie-sh-rules.) The problem is, a
:list-token type can only return a boolean, which is in turn analyzed by
smie-indent-exps. This function is part of SMIE.

A work around would be the change the grammar, but then using SMIE is pointless.

I'll ask on the mailing list.

Ambrevar · 2014-11-12T08:13:13Z

Yes, I think enforcing a style helps a lot reducing all the hassle.
I think Python did not go far enough by simply publishing a rationale.
The thing is that you still have to make sure your editor is set up properly in accordance to the rationale. This requires some extra efforts some developers are not willing to do.

Go went even further with their gofmt.
It is a ready to go. Plus the indentation/style choices are hardly
debatable, which makes it even more appealing. The left the debatable part to
the user.

immerrr · 2014-11-12T13:23:54Z

indentation/style choices are hardly debatable

That seems opinionated :) Tabs/spaces holywar alone is as old as the hills. But any (sane) consistency is good than the lack thereof.

The problem is, a :list-token type can only return a boolean, which is in turn analyzed by smie-indent-exps. This function is part of SMIE.

I see, getting this fixed upstream may take a while. I wonder if one could just return nil for :list-token queries and do the indentation math otherwise.

immerrr · 2014-11-12T14:16:44Z

Here's the smie branch: https://github.com/immerrr/lua-mode/tree/smie
It misses a better tokenizer I remember writing at some point, hopefully I can find it later tonight.

Ambrevar · 2014-11-12T15:05:14Z

indentation/style choices are hardly debatable

That seems opinionated :) Tabs/spaces holywar alone is as old as the hills. But any (sane) consistency is good than the lack thereof.

My bad, wrong phrasing, I meant "reasonable". Then it makes more sense. Don't know why I was so intense here! :)

I will have a look at the SMIE branch later.

ibawt · 2014-12-24T19:42:05Z

@Ambrevar this almost works.

foo( 'some thing', function()
  bar('foodso', function() 
      print('blah')
  end)
end)

is what I would expect.

immerrr · 2020-10-03T11:18:58Z

This is a long-standing PR, and given that this is already supported with lua-indent-nested-block-content-align and lua-indent-close-paren-align variables, I am inclined to respectfully close it.

Please, let me know if you have a reason to believe I need to rethink this decision.

Ambrevar · 2020-10-03T11:23:47Z

No problem, I haven't use Lua in years... :p

Ambrevar added 3 commits November 5, 2014 17:28

lua-goto-matching-block-token: Fix typo in doc

80b56fd

lua-calculate-indentation: Typo in comment

abd6347

Ambrevar mentioned this pull request Nov 7, 2014

Request for support on indentation #31

Open

immerrr force-pushed the master branch 3 times, most recently from 95103e4 to a57e25e Compare March 2, 2015 21:25

immerrr force-pushed the master branch from 0c6daa2 to fbad628 Compare October 22, 2015 06:13

immerrr closed this Oct 3, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite indentation engine #80

Rewrite indentation engine #80

Ambrevar commented Nov 7, 2014

immerrr commented Nov 7, 2014

Ambrevar commented Nov 10, 2014

immerrr commented Nov 10, 2014

Ambrevar commented Nov 10, 2014

immerrr commented Nov 11, 2014

immerrr commented Nov 11, 2014

Ambrevar commented Nov 12, 2014

Ambrevar commented Nov 12, 2014

immerrr commented Nov 12, 2014

immerrr commented Nov 12, 2014

Ambrevar commented Nov 12, 2014

ibawt commented Dec 24, 2014

immerrr commented Oct 3, 2020

Ambrevar commented Oct 3, 2020 via email

Rewrite indentation engine #80

Rewrite indentation engine #80

Conversation

Ambrevar commented Nov 7, 2014

immerrr commented Nov 7, 2014

Ambrevar commented Nov 10, 2014

immerrr commented Nov 10, 2014

Ambrevar commented Nov 10, 2014

immerrr commented Nov 11, 2014

immerrr commented Nov 11, 2014

Ambrevar commented Nov 12, 2014

Ambrevar commented Nov 12, 2014

immerrr commented Nov 12, 2014

immerrr commented Nov 12, 2014

Ambrevar commented Nov 12, 2014

ibawt commented Dec 24, 2014

immerrr commented Oct 3, 2020

Ambrevar commented Oct 3, 2020 via email