Rendering accessible LaTeX math on the web

Lion Ralfs — Posted on

I needed to embed math in my blog posts, and since I'm already familiar with LaTeX,[1] finding a way to render LaTeX as HTML seemed like a reasonable choice.

A popular approach seems to be client-side JavaScript that detects LaTeX code on the page, transforms it, and replaces it with HTML representing the output of whichever LaTeX renderer is used. Unless you have no way to touch the server-side code or manipulate its output, this seems awfully inefficient.

I'm aware that a personal website is a greenfield project where you can use practically any technology you like and spend as much time on it as you have. I just wanted to write down how I solved this, in case others stumble upon it, are stuck in a similar predicament, and can draw inspiration from it.

With that being said, I wasn't okay with sending a large chunk of JavaScript to visitors just to render some mathematical symbols on every page load, especially when I can do the rendering on the server instead (and only once, since I render everything statically as HTML).

The library I landed on is MathJax, a "JavaScript display engine for mathematics that works in all browsers".[2] I'm using Node.js to build my pages, so the mathjax-node[3] wrapper made sense in my case.

Note: I am using version 2.1.1 of mathjax-node here, which is built on MathJax 2, although MathJax 3 is already available. There are a bunch of breaking API changes, and I haven't gotten around to migrating yet.

It supports server-side rendering out of the box, so I just needed a way to feed LaTeX into the library and stitch its output back into the rest of my page. My pages are a bunch of Handlebars templates and partials, so I set up a new Handlebars helper like this:

handlebars.registerHelper('latex', async function (options) {
  // the raw LaTeX string between {{#latex}} and {{/latex}}
  const latexStr = options.fn(this);

  // ... use MathJax here to render
  let mathjaxHTMLOutput = '...';

  // SafeString stops Handlebars from escaping the generated HTML
  return new handlebars.SafeString(mathjaxHTMLOutput);
});

This lets me handle marked segments of my raw input in a custom way. Let's go through how I use MathJax to turn my LaTeX into HTML:

const mathJax = require('mathjax-node');

let { html } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  html: true,
  speakText: true,
});

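To explain the options: math is the raw LaTeX string; format: 'inline-TeX' tells MathJax to typeset it as inline math (as opposed to 'TeX' for display math); html: true returns the rendered output as an HTML string; and speakText: true adds a textual alternative to that output. With plain mathjax-node, that alternative is simply the TeX input itself.

In my templates, I then write LaTeX math like this: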

{{#latex}}
\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
{{/latex}}

The latex helper turns it into the following HTML:

<span class="mjx-chtml">
  <span
    class="mjx-math"
    aria-label="\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
"
  >
    <span class="mjx-mrow" aria-hidden="true">
      <span class="mjx-mrow">
        <span class="mjx-mo"
          ><span
            class="mjx-char MJXc-TeX-size3-R"
            style="padding-top: 1.256em; padding-bottom: 1.256em;"
            >[</span
          ></span
        >
        <span
          class="mjx-mtable"
          style="vertical-align: -0.95em; padding: 0px 0.167em;"
        >
          <span class="mjx-table">
            <span class="mjx-mtr" style="height: 1.2em;">
              <span
                class="mjx-mtd"
                style="padding: 0px 0.5em 0px 0px; width: 0.529em;"
              >
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mn"
                    ><span
                      class="mjx-char MJXc-TeX-main-R"
                      style="padding-top: 0.372em; padding-bottom: 0.372em;"
                      >1</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
              <span
                class="mjx-mtd"
                style="padding: 0px 0.5em 0px 0.5em; width: 0.5em;"
              >
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mn"
                    ><span
                      class="mjx-char MJXc-TeX-main-R"
                      style="padding-top: 0.372em; padding-bottom: 0.372em;"
                      >2</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
              <span
                class="mjx-mtd"
                style="padding: 0px 0px 0px 0.5em; width: 0.5em;"
              >
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mn"
                    ><span
                      class="mjx-char MJXc-TeX-main-R"
                      style="padding-top: 0.372em; padding-bottom: 0.372em;"
                      >3</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
            </span>
            <span class="mjx-mtr" style="height: 1.2em;">
              <span class="mjx-mtd" style="padding: 0.2em 0.5em 0px 0px;">
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mi"
                    ><span
                      class="mjx-char MJXc-TeX-math-I"
                      style="padding-top: 0.225em; padding-bottom: 0.298em;"
                      >a</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
              <span class="mjx-mtd" style="padding: 0.2em 0.5em 0px 0.5em;">
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mi"
                    ><span
                      class="mjx-char MJXc-TeX-math-I"
                      style="padding-top: 0.446em; padding-bottom: 0.298em;"
                      >b</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
              <span class="mjx-mtd" style="padding: 0.2em 0px 0px 0.5em;">
                <span class="mjx-mrow" style="margin-top: -0.2em;">
                  <span class="mjx-mi"
                    ><span
                      class="mjx-char MJXc-TeX-math-I"
                      style="padding-top: 0.225em; padding-bottom: 0.298em;"
                      >c</span
                    ></span
                  ><span class="mjx-strut"></span>
                </span>
              </span>
            </span>
          </span>
        </span>
        <span class="mjx-mo"
          ><span
            class="mjx-char MJXc-TeX-size3-R"
            style="padding-top: 1.256em; padding-bottom: 1.256em;"
            >]</span
          ></span
        >
      </span>
    </span>
  </span>
</span>

Admittedly, that's a lot of HTML, but that's just what MathJax outputs, and I'm fine with that for now. Notice the aria-label attribute: it contains the raw LaTeX code for the matrix. Let's try to fix that.

Accessibility

At the moment, a screen reader that honors the aria-label would read out the raw LaTeX code and skip the inner markup, which is marked aria-hidden="true". That is obviously not great, so I went looking for a solution.

First of all, a span has no semantic meaning for accessibility purposes, so assistive technologies might ignore its aria-label.[4][5] The role attribute can help in situations like this. Take MDN's description of role="img", for instance, which appears to be just what we need:

"Any set of content that should be consumed as a single image (which could include images, video, audio, code snippets, emojis, or other content) can be identified using role="img"."[6]

Modifying MathJax's HTML output is possible, though it feels somewhat "hacky":

let { htmlNode } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  speakText: true,
  // don't need the html string
  html: false,
  // need the html node though, since I want to modify it
  htmlNode: true,
});

// add the role="img" attribute
htmlNode.firstChild.setAttribute('role', 'img');

// back to string
let html = htmlNode.outerHTML;

The trick is to tell MathJax to return an htmlNode instead of an HTML string, so we can modify the output with DOM operations. The same would have been possible by running the string output through an HTML parser, but that would take considerably more effort. Luckily, mathjax-node uses jsdom under the hood, so we don't have to worry about any of that.

The last step is to turn the aria-label into something a screen reader can actually pronounce, because read aloud, this makes absolutely no sense:

<span class="mjx-chtml">
  <span
    class="mjx-math"
    role="img"
    aria-label="\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
"
  >
    ...
  </span>
</span>

Fortunately, there is a drop-in replacement for mathjax-node called mathjax-node-sre,[7] which uses the Speech Rule Engine[8] to generate speech strings. I'm using it with the mathspeak ruleset, like this:

// import the drop-in replacement
const mathJax = require('mathjax-node-sre');

let { htmlNode } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  // don't need the html string
  html: false,
  // need the html node though, since I want to modify it
  htmlNode: true,
  // this adds an aria-label describing the math
  speakText: true,
  speakRuleset: 'mathspeak',
});

This is what the aria-label for our matrix looks like now. It is certainly better than raw LaTeX code, or than using an image or SVG instead:

<span class="mjx-chtml">
  <span
    class="mjx-math"
    role="img"
    aria-label="Start 2 By 3 Matrix 
    1st Row 1st Column 1 2nd Column 2 3rd Column 3 
    2nd Row 1st Column a 2nd Column b 3rd Column c 
    EndMatrix"
  >
    ...
  </span>
</span>

The rendered version looks like this:

\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
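To get a feel for the structure of the mathspeak string in the aria-label above, here is a toy function that produces the same wording for a plain matrix whose cells are already short spoken strings. It only imitates that output format; it is not how the Speech Rule Engine works internally, and describeMatrix and ordinal are hypothetical names made up for this illustration:

```javascript
// Toy illustration only: mimics the shape of the mathspeak matrix string
// shown above. It is NOT speech-rule-engine's algorithm and handles only
// a rectangular matrix of already-spoken cell strings.
function ordinal(n) {
  const mod100 = n % 100;
  if (mod100 >= 11 && mod100 <= 13) return `${n}th`;
  switch (n % 10) {
    case 1: return `${n}st`;
    case 2: return `${n}nd`;
    case 3: return `${n}rd`;
    default: return `${n}th`;
  }
}

function describeMatrix(rows) {
  const cols = rows[0].length;
  const parts = [`Start ${rows.length} By ${cols} Matrix`];
  rows.forEach((row, r) => {
    // each cell is announced with its column position
    const cells = row.map((cell, c) => `${ordinal(c + 1)} Column ${cell}`);
    parts.push(`${ordinal(r + 1)} Row ${cells.join(' ')}`);
  });
  parts.push('EndMatrix');
  return parts.join(' ');
}

console.log(describeMatrix([['1', '2', '3'], ['a', 'b', 'c']]));
// prints: Start 2 By 3 Matrix 1st Row 1st Column 1 2nd Column 2 3rd Column 3 2nd Row 1st Column a 2nd Column b 3rd Column c EndMatrix
```

The point is that the speech string spells out the dimensions and every cell's position, which is exactly what the raw LaTeX in the original aria-label failed to communicate.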

I must admit it took some digging through libraries and their documentation to get here, but I'm happy with where I've ended up. Here is the full Handlebars helper I'm using:

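const handlebars = require('handlebars');
// the drop-in replacement that adds Speech Rule Engine support
const mathJax = require('mathjax-node-sre');

// Note: stock Handlebars does not await helpers, so an async helper like this
// one assumes the templates are rendered through an async-aware setup.
handlebars.registerHelper('latex', async function (options) {
  const latexStr = options.fn(this);

  const { htmlNode } = await mathJax.typeset({
    math: latexStr,
    format: 'inline-TeX',
    // don't need the html string
    html: false,
    // need the html node though, since I want to modify it
    htmlNode: true,
    // this adds an aria-label describing the math
    speakText: true,
    speakRuleset: 'mathspeak',
  });

  // for a11y purposes, also add the role="img" attribute
  htmlNode.firstChild.setAttribute('role', 'img');
  return new handlebars.SafeString(htmlNode.outerHTML);
});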
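The helper above is declared async, but stock Handlebars calls helpers synchronously and does not await them, so a returned Promise would end up in the page as "[object Promise]" unless the templates go through an async-aware wrapper. If that applies to your setup, one workaround is to have the helper emit a unique placeholder token, collect the pending promises, and swap the tokens for the finished HTML once the page string has been rendered. The following is a minimal, dependency-free sketch of that idea under those assumptions; createDeferredRenderer, placeholder, and resolve are hypothetical names, and the fake renderer in the demo stands in for the mathJax.typeset call:

```javascript
// Minimal sketch of a "placeholder, then resolve" workaround for template
// engines that call helpers synchronously. All names here are hypothetical.
function createDeferredRenderer(renderAsync) {
  const pending = new Map(); // token -> Promise<string>
  let counter = 0;

  // Synchronous: call this from inside the template helper and return its
  // result. The token contains no characters Handlebars would escape.
  function placeholder(input) {
    const token = `@@deferred-${counter++}@@`;
    pending.set(token, renderAsync(input));
    return token;
  }

  // Asynchronous: call this on the fully rendered page to swap every token
  // for its finished HTML.
  async function resolve(html) {
    const entries = [...pending];
    const results = await Promise.all(entries.map(([, promise]) => promise));
    let out = html;
    entries.forEach(([token], i) => {
      // split/join avoids String.prototype.replace's special "$" patterns
      out = out.split(token).join(results[i]);
      pending.delete(token);
    });
    return out;
  }

  return { placeholder, resolve };
}

// Demo with a fake async renderer standing in for mathJax.typeset:
const demo = createDeferredRenderer(async (tex) => `<span class="math">${tex}</span>`);
const page = `<p>${demo.placeholder('x^2')} and ${demo.placeholder('y')}</p>`;
demo.resolve(page).then((html) => console.log(html));
// prints: <p><span class="math">x^2</span> and <span class="math">y</span></p>
```

In a Handlebars build, the latex helper would then return a plain placeholder string (no await needed), renderAsync would wrap the typeset call and return htmlNode.outerHTML, and the build script would await resolve() on each rendered page before writing it to disk.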

Style and Fonts

One thing left to do is to embed the CSS and fonts that MathJax's HTML output requires. You can either load them from a CDN or self-host them. For performance reasons, I decided to self-host the fonts and inline the CSS on every page that contains LaTeX.
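Deciding which pages get the inline CSS can be automated at build time. Below is a minimal sketch, assuming the CSS has already been read into a string (for example from a self-hosted stylesheet, or from the css string mathjax-node returns when typeset is called with css: true) and that each page's final HTML is available as a string; injectMathCss is a hypothetical name. It looks for the mjx-chtml class that wraps MathJax's HTML output, as seen in the examples above, and only then inserts a style element:

```javascript
// Hypothetical build step: inline MathJax's CSS only on pages that contain
// its HTML output. Matches class="mjx-chtml" as well as
// class="mjx-chtml ..." (the same wrapper with extra classes).
const MATH_MARKER = /class="mjx-chtml[\s"]/;

function injectMathCss(pageHtml, css) {
  if (!MATH_MARKER.test(pageHtml)) {
    return pageHtml; // no rendered math on this page, leave it untouched
  }
  const styleTag = `<style>${css}</style>`;
  // Prefer the end of <head>; a replacer function keeps any "$" in the CSS literal.
  if (pageHtml.includes('</head>')) {
    return pageHtml.replace('</head>', () => `${styleTag}</head>`);
  }
  return styleTag + pageHtml;
}

console.log(injectMathCss('<head></head><p>No math here</p>', '.mjx-chtml{}'));
// prints: <head></head><p>No math here</p>
```

Matching on the class attribute rather than the bare string mjx-chtml keeps a page that merely mentions the class in its text (like this post) from being treated as a math page.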

Update Mar. 18, 2022: Since publishing this post, I've switched my blog to the KaTeX package. Upgrading to MathJax 3 was too much work for me, and it turns out KaTeX already has everything I need built in (HTML output, and accessibility via MathML).

References