Formatter: Performance [Stretch] #2571

MichaReiser · 2022-05-10T07:22:01Z

This issue is mainly a place where I collect ideas on how we can improve the Formatter's performance (probably in significant ways).

Flat `FormatElement` Structure

I expect this to improve the performance because it can greatly reduce the number of allocations the formatter does.

What's this about. The idea is to change the nested FormatElement structure to a flat structure:

Empty -> []
List(Vec(Elements)) -> [...Elements]
Group(content) -> [StartGroup, ...Content, EndGroup]
Fill(elements) -> [StartFill, ...elements, EndFill]
Indent(content) -> [StartIndent, ...content, EndFill]

Why should this improve performance:

It removes the need to perform heap allocations for Groups, and Indent.
The Printer get's a single continuous array with elements. Removes the deref Content / Vec pointers. May even be possible to write the Printer in a way that it no longer needs it's inner queue and instead can just use a pointer to the position in the input document.

Challenges

The main challenges will be

Rewriting the Printer to work with a flat list
Rewriting split_trivia to work efficiently. (Maybe no longer needed after re-writting the comment formatting?)

Shared `FormatElement` Buffer #2634

The gist of the idea is to change Format to no longer return the formatted document but instead make the format method write to a mutable `formatter.

// Before 
fn format(&self, formatter: &Formatter) -> FormatResult<Document> {
  return formatted![formatter, self.if_token().format(), space_token(), self.l_paren().format()]
}

// After
 fn format(&self, formatter: &mut Formatter) -> FormatResult<Document> {
  // Writes the format elements to `formatter` that maintains a shared buffer 
  write!(formatter, self.if_token().format(), space_token())?;
  write!(formatter, self.l_paren().format())?;
}

This may look familiar to you. And there's good reason why, this design is heavily inspired by Rust's format, format_args, and write.

How does this improve performance

Reduces the heap allocation from one allocation per format call to 1 for formatting the whole document (excluding the heap allocations happening inside of a specific Format implementation)
... I think that's more than reason enough

Why I'm excited about this

I haven't ran any benchmarks myself but there exists a benchmark for rust template libraries and the interesting thing is that you can group the libs in two groups:

Libraries that do heap allocations
Libraries that don't do heap allocations

And the libraries not doing heap allocations play in a complete different league: link. Just take a look how fast write is compared to something like Tera or Handlebars.

Challenges

Again, SplitTrivia, is_empty etc.
This changes the formatter ideom from a functional style to an imperative API where order matters. That means it will be necessary to re-write/review every formatter.

Migration

I think it should be doable to implement a cheap v1/v2 adapter if Flat FormatElement is in place. This would allow us to rewrite the formatting on a per-node basis

Inspiration

Take a look at this branch where I played around with this a little while ago.

Depends on Flat FormatElement Structure

The text was updated successfully, but these errors were encountered:

…acros (#2634) > tldr: Rust's allocation free `format`, `write` and `format_args` for Rome formatter! First of all, I'm sorry for this massive PR. The motivation behind this PR is to change our formatting to work from left to right. Something that may become useful when formatting comments. The current formatter formats the most-grouped elements first rather than left to right, mainly because all IR elements like `group_elements`, `fill` etc. accept a `FormatElement` as a parameter and not a `Format` implementation. The other motivation behind this PR is to make all formatting macros allocation free compared to the current `formatted!` and `format_elements` macro that requires at least one allocation (except the compiler optimises it away). This PR enforces left to right formatting by changing Format's signature from: ```rust fn format(&self, f: &Formatter<JsFormatOptions>) -> FormatResult<FormatElement> ``` to ```rust fn format(&self, f: &mut Formatter<JsFormatOptions>) -> FormatResult() ``` The main change is that `format` no longer returns the formatted result but instead writes it into the `Formatter`, similar to Rust's `Debug` and `Display` trait with the `write` and `format_args` macros. The fact that `format` now writes to a shared `FormatElement` buffer enforces format rules to write the element in order or they will appear out of order in the output. Second, the PR changes all builders (`group_elements`, `fill`, `space`) to return objects that implement `Format` rather than `FormatElement`s directly. This decouples the formatting DSL from the IR used by the printer. The other added benefit is that builders that accept some inner content no longer accept `FormatElement` but instead accept any object implementing `Format`. This should remove the need for many `formatted!` calls that were necessary just because some helper needs a `FormatElement` because it's the least common denominator. OK, but how do I write my formatting logic now: ```rust impl FormatNodeFields<JsFunctionBody> for FormatNodeRule<JsFunctionBody> { fn format_fields( node: &JsFunctionBody, f: &mut Formatter<JsFormatOptions>, ) -> FormatResult<()> { let JsFunctionBodyFields { l_curly_token, directives, statements, r_curly_token, } = node.as_fields(); let format_statements = format_with(|f| { let mut join = f.join_nodes_with_hardline(); for stmt in &statements { join.entry(stmt.syntax(), &stmt.format_or_verbatim()); } join.finish() }); write!( f, [f.delimited( &l_curly_token?, &format_args![directives.format(), format_statements], &r_curly_token?, ) .block_indent()] ) } } ``` The main differences are * You call `write!(f, [])` instead of `formatted` if you want to write something to the document. * You use `format_args` if you want to pass multiple `Format` objects to a helper like `group_elements`. * The formatter now exposes helpers like `f.join()`, `f.join_with()`, `f.fill()` and `f.join_nodes_with_softline_break` etc to format a sequence of objects Part of #2571

github-actions · 2022-09-21T12:04:42Z

This issue is stale because it has been open 14 days with no activity.

MichaReiser mentioned this issue May 10, 2022

☂️ Project: Multi-language Formatter #2565

Closed

8 tasks

MichaReiser added this to Rome 2022 May 10, 2022

MichaReiser mentioned this issue Jun 1, 2022

refactor(formatter): Introduce write, format, and format_args macros #2634

Merged

5 tasks

MichaReiser self-assigned this Sep 5, 2022

This was referenced Sep 5, 2022

perf(rome_formatter): Add Buffer.elements #3156

Merged

refactor(rome_formatter): Flat IR #3160

Merged

MichaReiser linked a pull request Sep 21, 2022 that will close this issue

refactor(rome_formatter): Flat IR #3160

Merged

github-actions bot added the S-Stale label Sep 21, 2022

MichaReiser removed the S-Stale label Sep 21, 2022

MichaReiser closed this as completed in #3160 Sep 23, 2022

MichaReiser moved this to Done in Rome 2022 Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Formatter: Performance [Stretch] #2571

Formatter: Performance [Stretch] #2571

MichaReiser commented May 10, 2022 •

edited

Loading

github-actions bot commented Sep 21, 2022

Formatter: Performance [Stretch] #2571

Formatter: Performance [Stretch] #2571

Comments

MichaReiser commented May 10, 2022 • edited Loading

Flat FormatElement Structure

Shared FormatElement Buffer #2634

github-actions bot commented Sep 21, 2022

MichaReiser commented May 10, 2022 •

edited

Loading

Flat `FormatElement` Structure

Shared `FormatElement` Buffer #2634