Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whitespace-aware pretty-printer #39

Merged
merged 3 commits into from
Sep 17, 2018
Merged

Whitespace-aware pretty-printer #39

merged 3 commits into from
Sep 17, 2018

Conversation

aantron
Copy link
Owner

@aantron aantron commented Sep 17, 2018

This changes Markup.trim and Markup.pretty_print to respect whitespace in HTML phrasing ("inline") content. So, applying Markup.trim to

<div>
 <p>
  <em>foo</em> bar
 </p>
</div>

produces

<div><p><em>foo</em> bar</p></div>

and, notionally the inverse, applying Markup.pretty_print to

<div><p><em>foo</em>bar</p></div>

produces

<div>
 <p>
  <em>foo</em>bar
 </p>
</div>

The important thing is that Markup.trim did not remove whitespace around </em>, and Markup.pretty_print did not introduce whitespace around </em>. This preserves the word breaks in the HTML.

I went for the simplest possible pretty-printer that is correct with respect to word breaks. It doesn't try to limit the line length or do anything else that could be considered fancy. It just indents the flow ("block") content, and passes phrasing ("inline", word) content straight through. This can result in long lines, but we can adjust that later with more elaborate implementations. The point is that from now, no matter how simple or fancy, or how nice or ugy the output looks, Markup.pretty_print should be correct and safe to use on HTML, modulo bugs :)

As an aside, pretty-printing HTML is not correct in the general case, because the significance of whitespace in each element is up to CSS, and that is outside HTML. Each pretty-printer basically has to make a guess that the input is valid, and that the CSS uses the default interpretations of whitespace for each kind of element.

cc @Ostera, @Drup

Resolves #35.

@Drup
Copy link

Drup commented Sep 17, 2018

@aantron In practice, to ensure that tyxml's users can use this, the best is to provide a function of type 'a Html.t -> content_signal gen ? Could we use seq nowadays ? :)

@aantron
Copy link
Owner Author

aantron commented Sep 17, 2018

Ok, I applied it to some of the expect output of odoc. Here are outtakes:

  1. Empty tags like <meta> stay empty, i.e. don't get a separate, indented closing tag, which they shouldn't:

    <!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">
     <head>
      <title>
       Markup (test_package.Markup)
      </title>
      <link rel="stylesheet" href="../../odoc.css">
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width,initial-scale=1.0">
      <script src="../../highlight.pack.js"></script>
      <script>
       hljs.initHighlightingOnLoad();
      </script>
     </head>
  2. Flow elements are indented, but their phrasing content is passed through:

        <h1>
         Module <code>Markup</code>
        </h1>
        <p>
         Code can appear <b>inside <code>other</code> markup</b>. Its display shouldn't be affected.
        </p>
  3. Content of <pre> tags is left completely alone:

         <pre><code class="ml">let foo = ()
    (** There are some nested comments in here, but an unpaired comment
        terminator would terminate the whole doc surrounding comment. It's
        best to keep code blocks no wider than 72 characters. *)
    
    let bar =
      ignore foo</code></pre>

The output also looks visually correct.

@aantron
Copy link
Owner Author

aantron commented Sep 17, 2018

@Drup that sounds like a whole separate big issue. I was thinking that TyXML would wrap this completely, and users wouldn't know that Markup.ml is being called at all.

@Drup
Copy link

Drup commented Sep 17, 2018

@aantron That's something that I precisely would like to avoid, as it adds a significant dependency to tyxml. But I agree it's a different issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants