Support for displaying math(s) in messages

Some users need to communicate using mathematical notation. Matrix should provide a common format for sending mathematical notation so that users using different clients can communicate with each other.

This proposal defines a format for sending messages with mathematical notation. Note that it does not define how to input mathematical notation; clients are free to use different input methods, as long as they can generate the required message format.

Proposal

The HTML subset supported by Matrix in the formatted_body property of messages with "format": "org.matrix.custom.html" will be extended to support Presentation MathML. Presentation MathML is used rather than Content MathML because Presentation MathML seems to be better supported. Other markup formats can be transmitted along with the MathML using the Annotation framework.

In other words, let $H_M$ be the HTML subset currently supported by Matrix in the formatted_body property of messages with "format": "org.matrix.custom.html", and let $M_P$ be Presentation MathML. We propose to extend the HTML subset supported by Matrix by allowing clients to support $H_M' = H_M \cup M_P$. (Note that $A \subset M_P$, where $A$ is the Annotation framework.)

Clients should replace the mathematical notation with something more human-readable in the body property of the message. However, this proposal does not specify what form this should take.

Example (with line breaks and indentation added to formatted_body for clarity):

{
  "content": {
    "body": "This is an equation: sin(x)=a/b",
    "format": "org.matrix.custom.html",
    "formatted_body": "This is an equation:
      <math>
        <semantics>
          <mrow><mi>sin</mi><mo>&#x2061;</mo><mfenced><mi>x</mi></mfenced><mo>=</mo><mfrac><mi>a</mi><mi>b</mi></mfrac></mrow>
          <annotation encoding=\"application/x-latex\">\\sin(x)=\\frac{a}{b}</annotation>
          <annotation encoding=\"text/html\">
            sin(<i>x</i>)=<sup><i>a</i></sup><sub><i>b</i></sub>
          </annotation>
        </semantics>
      </math>",
    "msgtype": "m.text"
  },
  "event_id": "$eventid:example.com",
  "origin_server_ts": 1234567890,
  "sender": "@alice:example.com",
  "type": "m.room.message",
  "room_id": "!soomeroom:example.com"
}
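As a rough illustration of how a sending client might assemble such a message, the following Python sketch builds the content object from its parts. The function name `build_math_content` and its parameters are hypothetical and not part of this proposal; it assumes the caller has already produced the Presentation MathML, the LaTeX source, an HTML fallback, and a plain-text rendering for body. The presentation markup is wrapped in an `<mrow>` so that `<semantics>` has a single annotated child.

```python
import json
from html import escape


def build_math_content(prefix, mathml, latex, html_fallback, plain):
    """Assemble the `content` of an m.room.message event carrying MathML.

    prefix        -- plain text preceding the equation
    mathml        -- Presentation MathML markup (trusted, already well-formed)
    latex         -- LaTeX source, sent as an application/x-latex annotation
    html_fallback -- HTML rendering, sent as a text/html annotation
    plain         -- human-readable form used in the `body` property
    """
    formatted = (
        f"{escape(prefix)} "
        "<math><semantics>"
        f"<mrow>{mathml}</mrow>"
        f'<annotation encoding="application/x-latex">{escape(latex)}</annotation>'
        f'<annotation encoding="text/html">{html_fallback}</annotation>'
        "</semantics></math>"
    )
    return {
        "body": f"{prefix} {plain}",
        "format": "org.matrix.custom.html",
        "formatted_body": formatted,
        "msgtype": "m.text",
    }


content = build_math_content(
    "This is an equation:",
    "<mi>sin</mi><mo>&#x2061;</mo><mfenced><mi>x</mi></mfenced>"
    "<mo>=</mo><mfrac><mi>a</mi><mi>b</mi></mfrac>",
    r"\sin(x)=\frac{a}{b}",
    "sin(<i>x</i>)=<sup><i>a</i></sup><sub><i>b</i></sub>",
    "sin(x)=a/b",
)
# json.dumps takes care of escaping quotes and backslashes for the wire format.
print(json.dumps(content, indent=2))
```

Serialising with a JSON library rather than by hand avoids the quoting mistakes that are easy to make when the LaTeX annotation contains backslashes.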

Other solutions

  • LaTeX: LaTeX is a popular method for writing mathematical texts, and is fairly readable. However, "LaTeX" is not a single format; there are several popular extensions, such as AMS-LaTeX, that different implementations may or may not support. There are also certain (La)TeX commands that should probably not be supported, such as \newcommand, which could be used to create an infinite loop that may crash an implementation that is not sufficiently careful. (La)TeX is Turing complete, which, from a security standpoint, is not a good property for a format used to transmit documents. Therefore, using LaTeX as the format for sending mathematical notation in Matrix events would require specifying which subset or superset of LaTeX should be supported.

    An alternative to specifying the set of supported commands may be to allow clients to send arbitrary LaTeX, and if it contains a command that the receiving client does not support, then the receiving client should fall back to displaying the raw LaTeX, relying on the readability of LaTeX and/or the fact that people who are communicating about more complicated mathematics are likely to be able to understand the requisite LaTeX. This may give an inconsistent user experience, but would also provide clients that are unable to support proper display of mathematics with an easy fallback. This also does not address security concerns, and it would be up to client authors to ensure that their code for displaying mathematics, or the library that they use, is not vulnerable to any potential attacks.

    If LaTeX is used, then it must be delimited in some way, most likely by wrapping it in some element. One option would be to use a custom Matrix-specific element such as <mx-math> (this is similar to how replies use the <mx-reply> element). Other options include using a <span> with a custom class (such as <span class="math">), or a <script> element (e.g. <script type="math/tex">, as MathJax uses). The containing element may also provide a facility for providing fallbacks for clients that do not support mathematical notation. There is much bikeshedding opportunity here.

    For comparison, the same example above, sent using a LaTeX method, might look like (again, with line breaks and indentation added to the formatted_body for clarity):

    {
      "content": {
        "body": "This is an equation: sin(x)=a/b",
        "format": "org.matrix.custom.html",
        "formatted_body": "This is an equation:
          <mx-math latex=\"\\sin(x)=\\frac{a}{b}\">
            sin(<i>x</i>)=<sup><i>a</i></sup><sub><i>b</i></sub>
          </mx-math>",
        "msgtype": "m.text"
      },
      "event_id": "$eventid:example.com",
      "origin_server_ts": 1234567890,
      "sender": "@alice:example.com",
      "type": "m.room.message",
      "room_id": "!soomeroom:example.com"
    }

    In this example, the <mx-math> element uses a latex attribute to convey the LaTeX markup, and the contents of the element (in this case, a rendering of the equation in HTML) can be used as a fallback.

  • Images: Mathematics can be sent as an image, rendered by the sender. This was a common method for displaying mathematical notation in web pages prior to the development of more modern methods. This has the advantages of ensuring that the recipient sees the math exactly as intended, and of not requiring the recipient to have any special support for mathematical notation. However, it has several disadvantages: accessibility is poor, the mathematical notation may not be properly aligned with the surrounding text, and retrieving the images requires extra HTTP requests.

  • Unicode: Some simple mathematics can be written purely with Unicode characters and formatting, such as ∑<sub>n∊ℕ</sub>x<sup>-2</sup>=2. This method has the advantage of not requiring any changes to the protocol. However, this only works for certain notation when using only the subset of HTML allowed by Matrix, and requires that users have a font installed that supports the necessary characters. Most importantly, one cannot write matrices using this method, and failing to support matrices in a protocol called "Matrix" would be a disaster.
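To make the LaTeX alternative above more concrete, here is a minimal Python sketch of a receiving client that reads the hypothetical `<mx-math>` element and falls back to raw LaTeX when it meets a command it does not support. The allowlist `SUPPORTED_COMMANDS`, the class `MxMathExtractor`, and the function `display_plan` are all assumptions made for illustration; a real client would derive the allowlist from whatever renderer it uses.

```python
import re
from html.parser import HTMLParser

# Hypothetical allowlist of LaTeX control words this client can render.
SUPPORTED_COMMANDS = {"sin", "cos", "frac", "sqrt", "sum", "alpha", "beta"}

# A control word is a backslash followed by letters; a control symbol is a
# backslash followed by one other character (e.g. "\\," or "\\\\").
_COMMAND_RE = re.compile(r"\\([A-Za-z]+|.)", re.DOTALL)


def unsupported_commands(latex):
    """Return the sorted commands in `latex` that are not on the allowlist."""
    used = {m.group(1) for m in _COMMAND_RE.finditer(latex)}
    return sorted(used - SUPPORTED_COMMANDS)


class MxMathExtractor(HTMLParser):
    """Collects (latex attribute, fallback text) for each <mx-math> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = []
        self._depth = 0
        self._latex = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "mx-math":
            if self._depth == 0:
                self._latex = dict(attrs).get("latex")
                self._text = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "mx-math" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.found.append((self._latex, "".join(self._text).strip()))

    def handle_data(self, data):
        if self._depth > 0:
            self._text.append(data)


def display_plan(formatted_body):
    """Decide, per <mx-math> element, whether to render or show raw LaTeX."""
    parser = MxMathExtractor()
    parser.feed(formatted_body)
    parser.close()
    plan = []
    for latex, fallback in parser.found:
        if latex is None:
            plan.append(("fallback", fallback))   # no LaTeX: use element text
        elif unsupported_commands(latex):
            plan.append(("raw", latex))           # unknown command: show source
        else:
            plan.append(("render", latex))        # safe to hand to a renderer
    return plan
```

Note that the allowlist check only decides what to display; it does not by itself make a LaTeX renderer safe, which remains the client author's responsibility.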

Potential issues

Lack of libraries for displaying mathematics

In general, there are not many libraries for displaying mathematics:

  • On web-based platforms, the most commonly used libraries are MathJax (which supports LaTeX, AsciiMath, and MathML input) and KaTeX (which supports LaTeX input).
  • Firefox and WebKit support MathML natively (though not perfectly, especially with Content MathML), but Chrome and IE/Edge do not.
  • There does not seem to be a good mobile library for displaying mathematical notation that does not involve a web view; the most common suggestion for displaying mathematics on Android is to use MathJax in a web view, and on iOS most suggestions are to use MathJax or MathML in a web view.
  • Two other libraries that could be used for MathML are pMML2SVG and lasem. However, both of these seem to be largely unmaintained.

Fallbacks

MathML does not, by itself, lend itself well to providing an easy fallback. The usual approach in HTML of ignoring unknown elements may cause the contents to be interpreted incorrectly. For example, a client that does not support the <msup> element would render <msup><mi>x</mi><mn>2</mn></msup> as "x2" rather than as "x²", which will be read as "x times 2" rather than "x squared". This is one major disadvantage that MathML has compared with LaTeX, as falling back to displaying the raw LaTeX when faced with input that cannot be handled usually leads to a rendering that can still be understood correctly. (This is not always true, however. For example, in LaTeX x^22 means "x²2", rather than "x²²" as a reader might normally expect.)
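The failure mode described above is easy to reproduce. The following Python sketch mimics a client that ignores every element it does not know and keeps only the text; the class `TextOnly` and the function `strip_tags` are illustrative names, not part of any Matrix client.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Mimics a client that ignores all elements and keeps only their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


# The superscript and the fraction structure are both silently lost.
print(strip_tags("<msup><mi>x</mi><mn>2</mn></msup>"))    # prints "x2"
print(strip_tags("<mfrac><mi>a</mi><mi>b</mi></mfrac>"))  # prints "ab"
```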

One solution would be to use the annotation framework to provide fallbacks. For example, a client could:

  • display the MathML if it understands all elements and attributes; otherwise
  • display the application/x-latex annotation as LaTeX if it exists and the client understands all the LaTeX commands; otherwise
  • display the text/html annotation as HTML if it exists and the client understands all elements and attributes; otherwise
  • display an image/* annotation if it exists and refers to an mxc: URL, and the client understands the format; otherwise
  • display the application/x-latex annotation as plain text if it exists; otherwise
  • display an error.
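The fallback order listed above can be sketched as a single function. This is only an illustration under several assumptions: the `<math>` fragment is parsed as XML with Python's `xml.etree.ElementTree`, the capability sets (`KNOWN_MATHML`, `KNOWN_HTML`, `KNOWN_IMAGE_TYPES`) are hypothetical, an image annotation is assumed to carry its mxc: URL in a `src` attribute, and only element names (not attributes) are checked for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical capabilities of an example client.
KNOWN_MATHML = {"math", "semantics", "mrow", "mi", "mo", "mn", "mfrac"}
KNOWN_HTML = {"i", "b", "sub", "sup"}
KNOWN_IMAGE_TYPES = {"image/png"}


def _mathml_tags(el):
    """Yield tag names of `el` and its descendants, skipping annotations."""
    if el.tag in ("annotation", "annotation-xml"):
        return
    yield el.tag
    for child in el:
        yield from _mathml_tags(child)


def choose_display(math_markup, can_render_latex=lambda src: False):
    """Return (kind, payload) following the fallback order in the list above."""
    root = ET.fromstring(math_markup)
    annotations = {a.get("encoding"): a for a in root.iter("annotation")}

    # 1. MathML, if every element is understood.
    if set(_mathml_tags(root)) <= KNOWN_MATHML:
        return ("mathml", math_markup)

    # 2. LaTeX annotation, if the client can render all of it.
    latex = annotations.get("application/x-latex")
    if latex is not None and can_render_latex(latex.text or ""):
        return ("latex", latex.text)

    # 3. HTML annotation, if every element inside it is understood.
    html = annotations.get("text/html")
    if html is not None:
        inner_tags = {e.tag for e in html.iter() if e is not html}
        if inner_tags <= KNOWN_HTML:
            inner = (html.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in html
            )
            return ("html", inner.strip())

    # 4. Image annotation pointing at an mxc: URL (src attribute is assumed).
    for encoding, el in annotations.items():
        src = el.get("src", "")
        if encoding in KNOWN_IMAGE_TYPES and src.startswith("mxc://"):
            return ("image", src)

    # 5. Raw LaTeX as plain text.
    if latex is not None:
        return ("text", latex.text or "")

    # 6. Nothing usable.
    return ("error", None)
```

The `can_render_latex` callback is left to the caller so that the same selection logic can be reused with different LaTeX renderers, or with none at all.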

This method of providing fallbacks may increase the chance that the receiving client will be able to display something that looks nice to the user, but does so by bloating the message.

Security considerations

Displaying mathematical notation is hard; client authors will need to ensure that the mathematical display code does not introduce vulnerabilities when presented with malicious input.
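One defensive measure a client might take, shown here purely as an illustration, is to reject markup that is unusually large or deeply nested before handing it to the rendering code. The limits `MAX_LENGTH` and `MAX_DEPTH` and the function `is_safe_to_render` are hypothetical and are not specified by this proposal; such a check complements, but does not replace, a hardened renderer.

```python
from html.parser import HTMLParser

# Hypothetical limits chosen for illustration only.
MAX_LENGTH = 10_000
MAX_DEPTH = 32


class _DepthCounter(HTMLParser):
    """Tracks the deepest element nesting seen while parsing."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        # Tolerate stray closing tags instead of going negative.
        self.depth = max(0, self.depth - 1)


def is_safe_to_render(markup):
    """Cheap pre-check run before passing markup to the math renderer."""
    if len(markup) > MAX_LENGTH:
        return False
    counter = _DepthCounter()
    counter.feed(markup)
    counter.close()
    return counter.max_depth <= MAX_DEPTH
```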

Conclusion

Matrix should support sending messages with mathematical notation. We propose to do this by extending the existing message format using Presentation MathML.