Understanding MathJax performance

Peter Krautzberger edited this page Jul 24, 2013 · 13 revisions

This is a working draft.

This posting gives an overview of the different aspects that affect MathJax performance.

"Real" size

A full download of the MathJax code is ~22MB, but most of it is due to the legacy picture fonts (~9.5MB), the unpacked folder (containing the code before it was compressed, ~4.1MB), and the configuration folder (~2.8MB; most pages need only one configuration file, but more on that later).

In other words, what's "really" MathJax is MathJax.js together with the extensions, localization and jax folders and the webfonts, which sum to ~5MB.

However, MathJax will never actually need all of these 5MB. For example, we offer webfonts in 4 formats, most of which exist only for specific (older) browsers that can't use the current webfont standard (WOFF).

So as a first approximation, "all of MathJax", i.e., all input and output options and their extensions that a user would ever have to download, is ~3.5MB.

But in real life only 1 input and 1 output are used, which is ~1.5MB (and sending compressed files should bring it down to ~650KB).

As a comparison: the average web page was ~1.5MB in June 2013 according to the HTTP Archive.
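The size estimates above reduce to simple arithmetic. The following TypeScript sketch tallies them; the variable names are illustrative, and the inputs are only the approximate figures quoted in this section.

```typescript
// Approximate sizes in MB, taken from the figures quoted above.
const fullDownloadMB = 22;
const legacyPictureFontsMB = 9.5;
const unpackedFolderMB = 4.1;
const configFolderMB = 2.8;

// Everything that is not "really" MathJax.
const nonEssentialMB = legacyPictureFontsMB + unpackedFolderMB + configFolderMB;

// Remainder: MathJax.js, extensions, localization, jax and webfonts.
const coreMB = fullDownloadMB - nonEssentialMB;

// A typical page uses one input and one output (~1.5MB uncompressed).
const typicalMB = 1.5;
const typicalShareOfFullDownload = typicalMB / fullDownloadMB;

console.log("non-essential: " + nonEssentialMB.toFixed(1) + "MB");
console.log("core: " + coreMB.toFixed(1) + "MB");
console.log("typical share: " + (typicalShareOfFullDownload * 100).toFixed(0) + "%");
```

With these inputs the remainder comes to about 5.6MB, which agrees with the ~5MB quoted above to within the precision of the estimates, and a typical 1 input + 1 output setup is well under a tenth of the full download.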

Effective size and MathJax's modularity

The effective load a visitor experiences is lower still since most pages don't use all MathJax features at once.

MathJax is highly modular even within a single input or output option. MathJax will only load those components which are actually needed for the mathematical content found on a page.

For example, if MathJax is configured to render TeX input to HTML output, it won't load the components needed for certain LaTeX packages unless there's content in the page using them. Similarly, it will only load those webfont files containing the characters actually needed.

The same principle applies to multiple input options: if e.g. the configuration allows both MathML and TeX input, but the page only contains MathML, then no TeX components will be loaded.
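The selection principle can be sketched as follows. This is a toy model, not MathJax's actual loader: the function name and the format labels are hypothetical, and the real loader works at a much finer granularity (individual extensions and font files).

```typescript
// Toy model of on-demand loading: only input formats that are both
// allowed by the configuration and present on the page get loaded.
// The names below are illustrative, not MathJax API.
type InputFormat = "TeX" | "MathML";

function inputsToLoad(configured: InputFormat[], foundOnPage: InputFormat[]): InputFormat[] {
  return configured.filter(format => foundOnPage.indexOf(format) !== -1);
}

// The configuration allows both inputs, but the page only contains MathML.
const neededInputs = inputsToLoad(["TeX", "MathML"], ["MathML"]);
console.log(neededInputs.join(", "));
```

In this model the TeX input is never requested, which mirrors the MathML-only page described above.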

We do not have specific data, but we estimate that the effective size is between 500KB and 1MB (uncompressed).

We need to balance the benefit of modularizing against the number of network connections it causes.

We offer some ways for authors to optimize the effective size via the combined MathJax configuration files (see below).

But this balance must be revisited regularly and more options could help.

Caching

Beyond the size of the MathJax components, caching matters: it improves performance after the first load.

Once any MathJax components have been downloaded, they will remain in the browser cache for a specific time (usually 1 week), so a visitor will usually download them only on the first visit to a page using MathJax and skip this particular performance drain on later visits.

While browser caching is separated per domain (for security reasons), the MathJax CDN lets page authors benefit from each other: if a user visits one site using the CDN, then any other site using the CDN will benefit from the MathJax components already cached during the visit to the first site.

While not adding to performance on an initial visit, caching improves speed on any future visit.

Alternative and additional caching methods could expand and optimize this performance benefit.

Optimizing download of components via configuration files

The download of MathJax components can be optimized by the page author.

At one end of the spectrum, we provide combined configuration files which compile specific input and output components into a single file. These are useful for page authors who know exactly which MathJax components their content will require.

As the name suggests, combined configuration files combine various components into one large file. This allows page authors to specify the components they want to load up front as one big file rather than many parallel files later, speeding up processing. For example, the TeX-AMS_HTML configuration file loads the TeX input with its AMS-math extensions and configures the HTML output.

At the other end of the spectrum, a page author who wants everything to load asynchronously can use extremely light configurations which leave it to MathJax to queue the download of its components. This is often good for community sites that have pages with math.
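As an illustration of the two ends of this spectrum, the sketch below builds the script URL a page would load. It relies on the `?config=` query parameter on `MathJax.js`, which names a combined configuration file; the helper name and the base URL are placeholders, not part of MathJax.

```typescript
// Builds the URL of the MathJax loader script.
// `baseUrl` is a placeholder for wherever MathJax is hosted.
function mathJaxScriptUrl(baseUrl: string, combinedConfig?: string): string {
  const loader = baseUrl.replace(/\/+$/, "") + "/MathJax.js";
  if (combinedConfig === undefined) {
    // Light setup: no combined file; configuration is supplied separately
    // and components are fetched as they are needed.
    return loader;
  }
  // Combined setup: one larger file bundles the chosen input and output components.
  return loader + "?config=" + encodeURIComponent(combinedConfig);
}

const combinedUrl = mathJaxScriptUrl("https://example.org/mathjax/", "TeX-AMS_HTML");
const lightUrl = mathJaxScriptUrl("https://example.org/mathjax");
console.log(combinedUrl);
console.log(lightUrl);
```

The combined URL trades one larger up-front request for fewer follow-up requests; the light URL does the opposite.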

Many sites do not configure MathJax efficiently. We could provide tools to analyze configurations and create more options.

MathJax Processing

MathJax processing of a page has three stages: one pre-processing stage and two processing stages, as described at http://docs.mathjax.org/en/latest/model.html.

Pre-processing

Pre-processing identifies mathematical content on a page (MathML, TeX, different TeX delimiters, etc.) and converts it into a standard input format (script tags). While this pre-processing can be done server-side, it's not a bottleneck and very little performance is gained by optimizing here.

Input-processing

An Input-Jax will process the input into MathJax's internal format (which is essentially MathML).

This process is already very fast. While it could theoretically benefit from parallelization (e.g. via web workers), the benefits will only be noticeable in pages with a very large amount of mathematical content or extremely large equations (e.g. we saw an 80,000-line MathML equation a while ago). Other bottlenecks are much more critical.

Since the input processors are modular, network latency can create delays because components are loaded only when they are needed. This is the core problem of balancing modularity vs network activity and needs to be revisited as network speed and processing power develop.

Output-processing

The third part of MathJax processing is the generation of its output, which currently comes in two forms: HTML-CSS or SVG.

The output generation is the second performance bottleneck of MathJax.

The key problem with the MathJax output is that math layout is a bottom-up process while HTML-CSS layout is a top-down process. The CSS layout algorithm determines the width of a parent element and then descends to its children to determine their widths, and only later determines the heights. This limits the quality of output one can gain with current HTML methods.

MathJax essentially implements TeX's math layout algorithm, which goes bottom-up, determining the widths and heights of the children before determining the width and height of a parent.

This is the core problem: top-down vs bottom-up.
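The bottom-up direction can be made concrete with a small sketch. It is a toy box model with made-up sizes and hypothetical names, far simpler than MathJax's real output code, but it shows the ordering constraint: a parent cannot be sized until every child has been measured.

```typescript
// Toy box model: sizes in arbitrary units, names are illustrative.
type LayoutBox =
  | { kind: "glyph"; width: number; height: number }
  | { kind: "row"; children: LayoutBox[] } // items side by side
  | { kind: "stack"; children: LayoutBox[] }; // items on top of each other

interface BoxSize {
  width: number;
  height: number;
}

function addUp(values: number[]): number {
  return values.reduce((acc, v) => acc + v, 0);
}

function largest(values: number[]): number {
  return values.reduce((acc, v) => Math.max(acc, v), 0);
}

// Bottom-up: every child is measured before its parent can be sized.
function measureBox(box: LayoutBox): BoxSize {
  if (box.kind === "glyph") {
    return { width: box.width, height: box.height };
  }
  const childSizes = box.children.map(child => measureBox(child));
  const widths = childSizes.map(s => s.width);
  const heights = childSizes.map(s => s.height);
  if (box.kind === "row") {
    return { width: addUp(widths), height: largest(heights) };
  }
  return { width: largest(widths), height: addUp(heights) };
}

// A fraction-like stack: a three-glyph numerator over a one-glyph denominator.
const glyph = (width: number): LayoutBox => ({ kind: "glyph", width, height: 7 });
const fraction: LayoutBox = {
  kind: "stack",
  children: [{ kind: "row", children: [glyph(5), glyph(6), glyph(5)] }, glyph(5)],
};
const fractionSize = measureBox(fraction);
console.log(fractionSize.width + " x " + fractionSize.height);
```

A top-down CSS layout pass, by contrast, would want the width of the outer stack before visiting the numerator, which is exactly the information this recursion produces last.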

However, SVG output is often ~25% faster than HTML output, which is due to an additional problem with HTML layout. While the SVG output can reliably calculate relative sizes within an equation internally, the HTML-CSS output runs into browser deficiencies that force it to lay out the content repeatedly, a performance drain since browsers are not designed for repeated layout.

First, browsers do not reliably allow the calculation of width: simply put, the sum of the widths of the characters is not the width of the string as it's laid out by the browser. To get around this, MathJax has to measure the substrings/subequations by laying them out and asking the browser to measure them. This problem naturally occurs recursively and is especially noticeable in complex equations.
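A toy model shows why adding up character widths is not enough. Kerning is assumed here as one possible cause of the mismatch, and all numbers and names are made up for illustration; in a browser, the second function would be replaced by actually laying out an element and reading back its size, which is the costly step.

```typescript
// Toy font: per-character advance widths plus pair kerning.
// All numbers are made up for illustration.
const advanceWidths: { [ch: string]: number } = { A: 10, V: 10, x: 8 };
const kerningPairs: { [pair: string]: number } = { AV: -2, VA: -2 };

// Naive estimate: add up the widths of the individual characters.
function summedCharWidths(text: string): number {
  let total = 0;
  for (let i = 0; i < text.length; i++) {
    total += advanceWidths[text.charAt(i)] || 0;
  }
  return total;
}

// Stand-in for asking the browser: the laid-out width includes kerning.
function laidOutWidth(text: string): number {
  let total = summedCharWidths(text);
  for (let i = 0; i + 1 < text.length; i++) {
    total += kerningPairs[text.charAt(i) + text.charAt(i + 1)] || 0;
  }
  return total;
}

console.log(summedCharWidths("AVA") + " vs " + laidOutWidth("AVA"));
```

In this model the two values agree for strings without kerning pairs and diverge otherwise, so the only reliable answer comes from the layout step itself.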

Next, browsers do not provide JavaScript access to all font metrics (let alone modern features like OpenType MATH tables). That's why MathJax needs to provide the metrics separately, which is the reason why MathJax only supports a handful of fonts.

While width can be measured correctly as mentioned above, height cannot be measured correctly since browsers provide only the font height/depth (the maximal height/depth of any character in the font). MathJax has to compensate for these incorrect measurements.

Preliminary tests have shown that deactivating these measurements will speed up the HTML output to the level of the SVG output. However, this will currently come at a loss of rendering quality (although the preliminary tests have shown that modern browsers do a much better job). We can work with browser vendors to improve things on their end, e.g. the Chrome team seems interested in this; the necessary browser improvements could increase typesetting quality in browsers in general.

Ways forward

While MathJax is a large javascript library, the effective size is much smaller in practice thanks to its modular structure. But this modular structure adds overhead. The rendering process itself is complex and can be slow on older and mobile devices.

We face a difficult situation: on slow machines (like mobile), the download of MathJax is overshadowed by slow rendering whereas on fast machines rendering is overshadowed by network calls for missing components.

Options for moving forward

The following options are not mutually exclusive.

Optimizing MathJax loading

On current desktop/laptop CPUs, the dominant performance issues are the delays due to asynchronous download of components. This also affects mobile even if actual rendering performance is a problem there.

We should investigate how to optimize this. Some ideas are:

  • improved combined files
    • Creating better combined configuration files as well as tools for page authors to build optimized packages would reduce latency issues.
  • lazy pre-loading
    • Creating an option for MathJax components to download in the background after a page has finished. This would improve performance on subsequent pages or dynamically injected content.

Optimizing the current output algorithm

One way forward is to seek new ways to optimize the SVG and HTML output.

As mentioned, the SVG output is often 25% faster than the HTML output. Improvements to the HTML output depend on browsers themselves becoming more reliable.

We can investigate current javascript optimization techniques.

We can also develop speed profiling tools for content providers to narrow down performance problems related to MathJax.

Optimizing perceived performance

Both the latency and the rendering performance issues are largely a problem of perception. Even though the page is readable quickly, users perceive the processing as slow.

By tweaking the way content appears on the page, we could reduce this impression.

  • multi-pass layout
    • We can add a first "quick and dirty" rendering and then re-render until full TeX quality is achieved.
    • We can
  • rendering small equations before large ones
    • Due to the recursive nature of our output, complex equations take much longer. In combination with equation chunking (the number of equations MathJax will reveal on a page at once), this can hurt perceived performance. For example, a page rarely starts with a highly complex equation but usually has a number of small inline equations before a complicated one shows up. However, the chunking prevents those small ones from showing up until the large ones are typeset. A size-oriented chunking could reduce this problem.
  • local storage
    • Local storage could save rendered output and MathJax wouldn't have to re-typeset while a user browses back and forth.
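The size-oriented chunking idea from the list above could look roughly like this. It is a sketch of the idea, not an existing MathJax feature: all names are hypothetical, and source length stands in as an assumed proxy for how long an equation takes to typeset.

```typescript
interface PendingEquation {
  id: string;
  source: string; // TeX or MathML source
}

// Hypothetical cost estimate: longer source is assumed to take longer to typeset.
function estimatedCost(eq: PendingEquation): number {
  return eq.source.length;
}

// Group equations into chunks that each stay under a cost budget while
// keeping document order, so cheap equations are revealed without
// waiting for an expensive one that follows them.
function chunkByCost(equations: PendingEquation[], budget: number): PendingEquation[][] {
  const chunks: PendingEquation[][] = [];
  let current: PendingEquation[] = [];
  let currentCost = 0;
  equations.forEach(eq => {
    const cost = estimatedCost(eq);
    if (current.length > 0 && currentCost + cost > budget) {
      chunks.push(current);
      current = [];
      currentCost = 0;
    }
    current.push(eq);
    currentCost += cost;
  });
  if (current.length > 0) {
    chunks.push(current);
  }
  return chunks;
}

const equationChunks = chunkByCost(
  [
    { id: "a", source: "x" },
    { id: "b", source: "y" },
    { id: "big", source: "\\begin{pmatrix} a & b \\\\ c & d \\end{pmatrix}" },
    { id: "c", source: "z" },
  ],
  10
);
console.log(equationChunks.map(chunk => chunk.map(eq => eq.id).join(",")).join(" | "));
```

Here the two small inline equations form their own chunk and can be shown before the matrix is typeset. An equation that exceeds the budget on its own still gets a chunk to itself.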

Improving browser infrastructure

We can try to work with browser vendors to improve browser behavior.

  • enabling better webfont APIs (to reduce our hacks to detect the arrival of webfonts)
  • removing the width-measuring problems
  • allowing JavaScript to access font metrics and OpenType MATH tables, so that MathJax can become font-agnostic
  • developing a new layout algorithm that is HTML-focused

The advantage would be that MathJax could help move browser vendors to enable better typesetting tools in general, which would be a big step forward.

Creating a new HTML output algorithm

A very basic problem is that the bottom-up TeX layout algorithm we use has to work against the top-down HTML/CSS layout algorithms. This problem cannot be resolved and affects performance.

We could investigate a fundamentally new approach, letting the browser do the layout for us. The latest CSS modules such as flexbox could enable native rendering speed while offering much improved rendering.

Dropping support for legacy browsers

A big question is how much support for legacy browsers, in particular IE<9, is holding speed back. Browser JavaScript engines have changed the way they optimize JavaScript execution.

Caveat emptor: this would probably lead to a re-write of much of MathJax.

Remarks on speed

A simple test indicates that the output rendering speed varies greatly across platforms.

For example, on a 2011 MacBook Pro, rendering https://en.wikipedia.org/wiki/Matrix_multiplication (no downloads, everything cached) took:

* Chrome: html: ~2500ms, svg: ~1850ms
* Safari: html: 1450ms, svg: ~1000ms, mathml: ~300ms
* Firefox: html: ~3300ms, svg: ~2400ms, mathml: ~880ms
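To relate these timings to the "~25% faster" figure for SVG output, the share of HTML rendering time that SVG saves can be computed directly. The sketch below uses only the numbers listed above; the helper name is illustrative.

```typescript
// Timings in ms, copied from the list above.
const timings = [
  { browser: "Chrome", html: 2500, svg: 1850 },
  { browser: "Safari", html: 1450, svg: 1000 },
  { browser: "Firefox", html: 3300, svg: 2400 },
];

// Share of the HTML rendering time that the SVG output saves, in percent.
function svgTimeSavedPercent(html: number, svg: number): number {
  return ((html - svg) / html) * 100;
}

const savings = timings.map(t => ({
  browser: t.browser,
  savedPercent: Math.round(svgTimeSavedPercent(t.html, t.svg)),
}));

savings.forEach(s => console.log(s.browser + ": " + s.savedPercent + "%"));
```

For this single test page the savings come out at roughly 26% (Chrome), 31% (Safari) and 27% (Firefox), in line with the rule of thumb.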

Notes

Safari's SVG output is already close to the performance of Firefox's native MathML. But it's hard to judge the 300ms for Safari's MathML since that implementation is incomplete: it's easy to be fast if you're not doing the job. However, in Safari's defense, the page is within the range of its abilities.

Another comparison: a copy of the page, with SVG output and equation chunking set to 100 (so that it is rendered in one go).

* Nexus 7, Galaxy Nexus / Chrome: ~18sec
* Nexus 7 / Dolphin Browser: ~8.5sec
* iPad (2013) / Safari: ~3sec
* iPhone 4 / Safari: ~6sec

On Ubuntu 13.04, i7:

* Firefox 22: ~2sec
* Chromium 28: ~1.6sec
* Chrome 29: ~1.7sec
* Windows 8 / IE10: ~2.5sec (VirtualBox + Microsoft's free testing VM)
* Windows 7 / IE9: ~3sec (VirtualBox + Microsoft's free testing VM)
* Windows 7 / Firefox 22: ~3.8sec (VirtualBox + Microsoft's free testing VM)
* Windows 8.1 / IE11: ~3sec (VirtualBox + release candidate)

A copy of the page with TeX converted to MathML was ~0.5-1 sec slower.
