Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Formatter: Performance [Stretch] #2571

Closed
Tracked by #2565
MichaReiser opened this issue May 10, 2022 · 1 comment · Fixed by #3160
Closed
Tracked by #2565

Formatter: Performance [Stretch] #2571

MichaReiser opened this issue May 10, 2022 · 1 comment · Fixed by #3160
Assignees

Comments

@MichaReiser
Copy link
Contributor

MichaReiser commented May 10, 2022

This issue is mainly a place where I collect ideas on how we can improve the Formatter's performance (probably in significant ways).

Flat FormatElement Structure

I expect this to improve the performance because it can greatly reduce the number of allocations the formatter does.

What's this about. The idea is to change the nested FormatElement structure to a flat structure:

  • Empty -> []
  • List(Vec(Elements)) -> [...Elements]
  • Group(content) -> [StartGroup, ...Content, EndGroup]
  • Fill(elements) -> [StartFill, ...elements, EndFill]
  • Indent(content) -> [StartIndent, ...content, EndFill]

Why should this improve performance:

  • It removes the need to perform heap allocations for Groups, and Indent.
  • The Printer get's a single continuous array with elements. Removes the deref Content / Vec pointers. May even be possible to write the Printer in a way that it no longer needs it's inner queue and instead can just use a pointer to the position in the input document.

Challenges

The main challenges will be

  • Rewriting the Printer to work with a flat list
  • Rewriting split_trivia to work efficiently. (Maybe no longer needed after re-writting the comment formatting?)

Shared FormatElement Buffer #2634

The gist of the idea is to change Format to no longer return the formatted document but instead make the format method write to a mutable `formatter.

// Before 
fn format(&self, formatter: &Formatter) -> FormatResult<Document> {
  return formatted![formatter, self.if_token().format(), space_token(), self.l_paren().format()]
}

// After
 fn format(&self, formatter: &mut Formatter) -> FormatResult<Document> {
  // Writes the format elements to `formatter` that maintains a shared buffer 
  write!(formatter, self.if_token().format(), space_token())?;
  write!(formatter, self.l_paren().format())?;
}

This may look familiar to you. And there's good reason why, this design is heavily inspired by Rust's format, format_args, and write.

How does this improve performance

  • Reduces the heap allocation from one allocation per format call to 1 for formatting the whole document (excluding the heap allocations happening inside of a specific Format implementation)
  • ... I think that's more than reason enough

Why I'm excited about this

I haven't ran any benchmarks myself but there exists a benchmark for rust template libraries and the interesting thing is that you can group the libs in two groups:

  • Libraries that do heap allocations
  • Libraries that don't do heap allocations

And the libraries not doing heap allocations play in a complete different league: link. Just take a look how fast write is compared to something like Tera or Handlebars.

Challenges

  • Again, SplitTrivia, is_empty etc.
  • This changes the formatter ideom from a functional style to an imperative API where order matters. That means it will be necessary to re-write/review every formatter.

Migration

I think it should be doable to implement a cheap v1/v2 adapter if Flat FormatElement is in place. This would allow us to rewrite the formatting on a per-node basis

Inspiration

Take a look at this branch where I played around with this a little while ago.

Depends on Flat FormatElement Structure

MichaReiser added a commit that referenced this issue Jun 6, 2022
…acros (#2634)

> tldr: Rust's allocation free `format`, `write` and `format_args` for Rome formatter!

First of all, I'm sorry for this massive PR. 

The motivation behind this PR is to change our formatting to work from left to right. Something that may become useful when formatting comments. The current formatter formats the most-grouped elements first rather than left to right, mainly because all IR elements like `group_elements`, `fill` etc. accept a `FormatElement` as a parameter and not a `Format` implementation. The other motivation behind this PR is to make all formatting macros allocation free compared to the current `formatted!` and `format_elements` macro that requires at least one allocation (except the compiler optimises it away). 

This PR enforces left to right formatting by changing Format's signature from:

```rust
fn format(&self, f: &Formatter<JsFormatOptions>) -> FormatResult<FormatElement>
```

to 

```rust
fn format(&self, f: &mut Formatter<JsFormatOptions>) -> FormatResult()
```

The main change is that `format` no longer returns the formatted result but instead writes it into the `Formatter`, similar to Rust's `Debug` and `Display` trait with the `write` and `format_args` macros. The fact that `format` now writes to a shared `FormatElement` buffer enforces format rules to write the element in order or they will appear out of order in the output. 

Second, the PR changes all builders (`group_elements`, `fill`, `space`) to return objects that implement `Format` rather than `FormatElement`s directly. This decouples the formatting DSL from the IR used by the printer. The other added benefit is that builders that accept some inner content no longer accept `FormatElement` but instead accept any object implementing `Format`. This should remove the need for many `formatted!` calls that were necessary just because some helper needs a `FormatElement` because it's the least common denominator. 

OK, but how do I write my formatting logic now: 


```rust
impl FormatNodeFields<JsFunctionBody> for FormatNodeRule<JsFunctionBody> {
    fn format_fields(
        node: &JsFunctionBody,
        f: &mut Formatter<JsFormatOptions>,
    ) -> FormatResult<()> {
        let JsFunctionBodyFields {
            l_curly_token,
            directives,
            statements,
            r_curly_token,
        } = node.as_fields();

        let format_statements = format_with(|f| {
            let mut join = f.join_nodes_with_hardline();

            for stmt in &statements {
                join.entry(stmt.syntax(), &stmt.format_or_verbatim());
            }

            join.finish()
        });

        write!(
            f,
            [f.delimited(
                &l_curly_token?,
                &format_args![directives.format(), format_statements],
                &r_curly_token?,
            )
            .block_indent()]
        )
    }
}
```

The main differences are 

* You call `write!(f, [])` instead of `formatted` if you want to write something to the document. 
* You use `format_args` if you want to pass multiple `Format` objects to a helper like `group_elements`. 
* The formatter now exposes helpers like `f.join()`, `f.join_with()`, `f.fill()` and `f.join_nodes_with_softline_break` etc to format a sequence of objects

Part of #2571
@MichaReiser MichaReiser self-assigned this Sep 5, 2022
@MichaReiser MichaReiser linked a pull request Sep 21, 2022 that will close this issue
@github-actions
Copy link

This issue is stale because it has been open 14 days with no activity.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant