GitHub - mbucc/erlfmt: A command-line utility for formatting Erlang code.

mbucc / erlfmt Public

Notifications You must be signed in to change notification settings
Fork 0
Star 22

A command-line utility for formatting Erlang code.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 29 Commits
.gitignore		.gitignore
ChangeLog		ChangeLog
Makefile		Makefile
README		README
erlfmt		erlfmt
erlfmt.escript		erlfmt.escript

Repository files navigation

Read Erlang source on stdin, reformat with erl_tidy, and write to stdout.
No options, you get what erl_prettypr:format/2 gives you.


                                  Back story


    The problem seemed simple: read Erlang source code on stdin and send a
    nicely formatted result to stdout.  In vim, I got used to this when I was
    learning Go:

            :%!gofmt

    Just type the code without any concern for formatting, and then format the
    entire file in one command.  I now have xmlfmt, jsfmt, javafmt and I
    wanted a similar utility for Erlang.

    I started with erl_tidy, which

            Tidies and pretty-prints Erlang source code, removing
            unused functions, updating obsolete constructs and
            function calls, etc.

    Seems perfect ... except that erl_tidy does not read from stdin.  I posted
    a question to the Erlang questions mailing list asking if I missed
    something and the answer was a clear no.  People pointed me to alternative
    approaches: an Emacs elisp module, a rebar3 module that wraps erl_tidy,
    and a state-machine/parser written in Erlang that the vim Erlang module
    uses.  The first two didn't solve my problem, and I didn't like the
    complexity of the third approach.

First attempt: use basic Erlang modules
    
    While Googling around, I found a short post showing how you can format
    Erlang code using erl_scan (string -> tokens), erl_parse (tokens -> form)
    and erl_pp (form -> string).  So I started coding that up.  It worked
    great. Initially.

    The first problem I hit was that erl_parse:parse_form/1 does not handle
    white space or comment tokens.  OK, no problem.  To keep moving forward I
    implemented the quick hack of only dropping comments that came inside a
    form, and keeping the ones that came before.  Not great, but I wanted to
    get something working.

    The next problem I hit was pre-processor constructs.  It turns out that
    erl_parse does not understand pre-processor bits either (macros, imports,
    etc).  So, another hack: when we hit a dot-terminated token sequence that
    includes a pre-processor construct, just print out the raw text that
    generated those tokens and leave it at that.

    When that was done, I decided dropping comments was not acceptable.  While
    searching around for how to re-insert comments, I came across a reference
    to the epp_dodger module and a function that re-inserts comments.  Both of
    which is used by erl_tidy.  It seemed stupid to re-write erl_tidy so I
    went back to square one.

Second attempt: teach erl_tidy about stdin

    Erlang can read stdin just fine.  In fact, most of the input functions in
    the io module read from stdin by default.  Erlang provides the standard_io
    atom that you can use for an IODevice argument.  I started hacking on
    erl_tidy, intending to use the "special" file name of a single dash (-) to
    tell erl_tidy to read from stdin.

    It was simple to add a new read_module("-", Opts) and pass standard_io to
    epp_dodger:parse/3.  But, the next chunk of erl_tidy logic reads comments
    from the same file again, which is not possible with stdin, since the
    stream was already consumed by parsing.

    I briefly looking to see if I could somehow turn a string into an IODevice
    (like you can in Java), but nothing turned up.  So, it seems like I have
    to write stdin to a file in order to use erl_tidy.

Third attempt: call erl_tidy:file/1 from a shell script.

    It was trivial to write a shell script that pipes stdin to a file.
    Calling erl_tidy from the shell script was not.  My first attempt looked
    like this:

            $ erl -run erl_tidy file $TMPFN

    which produced:

            =ERROR REPORT==== 9-May-2016::20:13:30 ===
            erl_comment_scan: bad filename: `['hmmmm_sup.erl']'

    and hung there, waiting in the Erlang interpreter.

    It turns out that when use the -run flag and pass it arguments, Erlang
    assumes the receiving function has one argument---a list.  That's why the
    filename in the error message has brackets around it ... it's an list not
    a string..

Final attempt: call escript from a shell script

    The final result is what you see in the repository now.   It's so easy to
    pipe stdin to a file in a shell script that I saw no reason to re-write
    that in Erlang.  So I added a short escript that unpacks the file name
    from the list and passes that to erl_tidy.

    The result: I was able to erlfmt all 1,795 source files under
    /usr/local/Cellar/erlang/18.2.1, with only one error:

            /usr/local/Cellar/erlang/18.2.1/lib/erlang/lib/wx-1.6/src/gen/gl.erl
            ./erlfmt: line 6: 25399 User defined signal 2: 31 ./erlfmt.escript "$TMPF"

    That file is a generated one, and is 971KB in size.  I got the above error
    when I had erl_tidy write the reformatted file to the temporary file.
    When instead I told erl_tidy to write the result to stdout, I got this
    error:

            escript: exception exit: badarg
              in function  erl_tidy:file/2 (erl_tidy.erl, line 295)
              in call from erl_eval:local_func/6 (erl_eval.erl, line 557)
              in call from escript:interpret/4 (escript.erl, line 787)
              in call from escript:start/1 (escript.erl, line 277)
              in call from init:start_it/1 
              in call from init:start_em/1 
            
    It successfully parsed the other 1,794 files, which gives success rate of
    0.99945.  Not too hot in the Erlang world, but I don't expect to be
    formatting files that big.  Good enough.

The back story back story.

    Why such a long README?  Erlang questions mailing list recently had an
    interesting thread on code documentation ("rhetorical structure of code").
    It was a long thread and a lot of things were said, but one in particular
    stuck with me: code documentation rarely shows the starts and stops, the
    litany of failed experiments that you encounter on the way to the final
    product.  And that those failed experiements are sometimes useful in the
    future.