Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: A different approach to avoiding trailing whitespace #139

Merged
merged 6 commits into from
Jun 30, 2020
Merged

Conversation

sjakobi
Copy link
Collaborator

@sjakobi sjakobi commented Jan 27, 2020

removeTrailingWhitespace was introduced in response to the issue of trailing whitespace on otherwise empty lines in the output of dhall (#46).

removeTrailingWhitespace does quite a bit more than just remove whitespace on otherwise empty lines though:

  • It also strips SText and SChar of any trailing whitespace.
  • It keeps track of any surrounding annotations and avoids stripping whitespace inside annotations.

These extra features are also useful, but they aren't needed to address the original issue.

This patch addresses the original issue more directly: When we encounter a Line during layout, we check how the document continues before adding any indentation.

This is sufficient to address the original issue in dhall.

What I'm currently unsure about is the importance of annotations: Would this break any documents that try to highlight whitespace?!


TODO:

  • Benchmarking
  • Tests
  • Documentation for rTW
  • Documentation for indent

Travis status, since Github doesn't show it: https://travis-ci.org/github/quchen/prettyprinter/pull_requests

@sjakobi
Copy link
Collaborator Author

sjakobi commented Feb 9, 2020

Idea: In principle this behaviour could be controlled via the LayoutOptions.

@quchen
Copy link
Owner

quchen commented Mar 2, 2020

I totally agree that stripping should be done by the layouter, and not as a pass afterwards. The reason I chose the second pass is to not complicate the layouter even more (it’s quite big already), and to be able to configure options better, such as whether to strip inside annotations. Being afraid of unknown edge cases, basically. Now, years later, we have a much better idea of what those edge cases are, and I think putting it in the layouter is a good idea; if it can also be good code, I’m all in.

@sjakobi
Copy link
Collaborator Author

sjakobi commented May 22, 2020

Actually, I suspect that dhall might not profit from this too much, since we might want to keep using
removeTrailingWhitespace anyway to clean up comments etc.. It might be useful for other applications though.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 14, 2020

I benchmarked this using dhall with dhall --plain decode --file cpkg.dhallb > /dev/null.

It seems to incur a penalty of about 3% on the timings, which seems acceptable.

@sjakobi sjakobi added the PVP: major Requires a major version bump label Jun 14, 2020
@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 15, 2020

What I'm currently unsure about is the importance of annotations: Would this break any documents that try to highlight whitespace?!

Currently, this patch would indeed remove trailing whitespace within annotations that removeTrailingWhitespace would preserve.

We could add annotation accounting to layoutWadlerLeijen, but I'm not too keen on the overhead. Probably worth testing (benchmarking) anyway.

Admittedly I'm not very convinced of the practical usefulness of the annotation accounting in rTW. Is there any evidence that users actually want to highlight trailing whitespace?


EDIT: Also, the whitespace we're talking about here is kind of special – it's incurred by indenting things, not by trailing spaces in literals or something.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 15, 2020

A few preliminary tests in the REPL (using variants of indent 2 (align (vsep ["x", ""]))) indicate that the annotation accounting in best seems to work. It's actually simpler than I expected.

One limitation is with the first line of an indented expression:

> indent 2 (align (vsep ["", "y"]))
  
  y

This still produces trailing whitespace on the otherwise empty first line.

The problem is this whitespace is not a Line constructor, but a Text:

indent
:: Int -- ^ Number of spaces to increase indentation by
-> Doc ann
-> Doc ann
indent i d = hang i (spaces i <> d)

I'm not convinced that we should start checking Text literals in layoutWadlerLeijen though. That seems rather expensive to me. We could rather consider adding a dedicated Spaces constructor.

In any case, I don't think that this is a problem that we'll encounter a lot in practice. I'd rather wait for a bug report!

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 16, 2020

In the same dhall benchmark, the annotation accounting seems to slow things by another 1–1.5%.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 16, 2020

I've benchmarked the same changes using the large-output benchmark, with roughly the same results:

  • Making the indentation conditional on the following document adds an overhead of ~3%.
  • Additionally keeping track of annotations seems to cost another 1–1.5%.

If this patch allows some users to get rid of removeTrailingWhitespace, I think that's worth it.

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 16, 2020

@quchen This still needs some tests and polish, but the core approach seems to good to me. Could you review?

@quchen
Copy link
Owner

quchen commented Jun 17, 2020

This is nicer than I would have expected! The only question that remains is whether some users want to keep ordinary trailing whitespace, but I think that’s a corner-use-case, and the vast majority of users might actually want to have trailing whitespace stripped by default.

If someone really wants to keep trailing whitespace without annotations, they can hack around this by adding a global dummy annotation around the whole document.

So: LGTM, up to the lack of tests :-)

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 17, 2020

Thanks for the review, @quchen! :)

I've changed my mind about the annotation handling here though. The reason is that the whitespace that we avoid adding here is always the result of the user nesting a document in order to position it on the page. The user never intended to produce whitespace – the whitespace is just a means of positioning the document.

So even when this indentation happens within an annotation, it seems implausible that the user wants to protect or even highlight this kind of whitespace.

@sjakobi sjakobi mentioned this pull request Jun 17, 2020
@quchen
Copy link
Owner

quchen commented Jun 18, 2020

Also true. In general I think actually wanting trailing space of any kind is a rare use case. If this is really desirable, this may not really be the right library for it. And even when a user really really wants it, they might introduce a custom annotation with an int that stands for the amount of trailing whitespace, and then write their own renderer that respects just that.

tl;dr trim all the trailing space :-)

@quchen
Copy link
Owner

quchen commented Jun 18, 2020

This PR should then also deprecate the current removeTrailingWhitespace function, replacing its implementation with id, no?

@sjakobi
Copy link
Collaborator Author

sjakobi commented Jun 18, 2020

Thanks again for the feedback!

This PR should then also deprecate the current removeTrailingWhitespace function, replacing its implementation with id, no?

rTW actually still fills a role for dhall where it strips trailing whitespace from user comments while preserving whitespace in (annotated) string literals.

Clearly, it's more expensive than just properly stripping the comments though, so we should eventually get rid of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PVP: major Requires a major version bump
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants