Rendering accessible LaTeX math on the web
Lion Ralfs
I needed to embed math in my blog posts, and since I'm already familiar with LaTeX[1], it seemed reasonable to find a way to render LaTeX as HTML.
A very popular approach seems to be using client-side JavaScript to detect LaTeX code on the page, transform it, and replace it with HTML representing the output of whatever LaTeX renderer was used. Unless there is no way to interact with the server-side code or manipulate its output, this seems awfully inefficient.
I'm aware that working on your own website as a greenfield project allows you to practically use any technology you want and spend as much time on it as you have available to you. I just wanted to write down how I solved it, in case others stumble upon this, are stuck in a similar predicament and can draw inspiration from it.
With that being said, I wasn't okay with sending a large chunk of JavaScript to website visitors just to render some mathematical symbols on every page load, especially when I can do it on the server (once, since in my case I render everything statically as HTML).
The library I landed on is MathJax, a "JavaScript display engine for mathematics that works in all browsers".[2] I'm using Node.js to build my pages, so the mathjax-node[3] wrapper made sense in my case.
Note: I am using version 2.1.1 of mathjax-node here, although version 3 is already available. There are a bunch of breaking API changes and I haven't gotten around to upgrading.
It has built-in server side rendering, meaning I just needed to find a way to feed LaTeX into the library and stitch its output back into the rest of my page. My pages are a bunch of handlebars templates and partials, so I set up a new handlebars helper as such:
handlebars.registerHelper('latex', async function (value) {
  // the raw latex string
  const latexStr = value.fn();
  // ... use MathJax here to render
  let mathjaxHTMLOutput = '...';
  return new handlebars.SafeString(mathjaxHTMLOutput);
});
It allows me to have custom handling of certain segments in my raw input. Let's go through how I use MathJax to turn my LaTeX into HTML:
const mathJax = require('mathjax-node');
let { html } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  html: true,
  speakText: true,
});
To explain the options:
- math: Here, I just pass my raw LaTeX string.
- format: What to treat my LaTeX string as. Here I could also use the non-inline version TeX, or MathML. Since I want to write LaTeX, MathML is not an option for me.
- html: Instruct the library to output HTML.
- speakText: Add an aria-label to the wrapper element, while hiding the rest of the equation HTML with aria-hidden="true". We'll get back to this.
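To make the choice between the inline and non-inline formats concrete, here is a minimal sketch of a function that assembles the options object. Only the option names and the two format values come from the options described above; the function name buildTypesetOptions and the display flag are made up for illustration and are not part of mathjax-node.

```javascript
// Hypothetical helper: builds the object that would be passed to mathJax.typeset().
// The `display` flag is an assumption for illustration; it switches between the
// inline format ('inline-TeX') and the non-inline format ('TeX').
function buildTypesetOptions(latexStr, { display = false } = {}) {
  return {
    math: latexStr,
    format: display ? 'TeX' : 'inline-TeX',
    html: true,
    speakText: true,
  };
}

const inlineOptions = buildTypesetOptions('a^2 + b^2 = c^2');
const displayOptions = buildTypesetOptions('a^2 + b^2 = c^2', { display: true });
console.log(inlineOptions.format); // inline-TeX
console.log(displayOptions.format); // TeX
```

The resulting object could then be handed to the typeset call in place of the literal shown earlier.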
I would end up writing LaTeX math like this:
{{#latex}}
\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
{{/latex}}
The latex helper transforms it into HTML that looks like this:
<span class="mjx-chtml">
<span
class="mjx-math"
aria-label="\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
"
>
<span class="mjx-mrow" aria-hidden="true">
<span class="mjx-mrow">
<span class="mjx-mo"
><span
class="mjx-char MJXc-TeX-size3-R"
style="padding-top: 1.256em; padding-bottom: 1.256em;"
>[</span
></span
>
<span
class="mjx-mtable"
style="vertical-align: -0.95em; padding: 0px 0.167em;"
>
<span class="mjx-table">
<span class="mjx-mtr" style="height: 1.2em;">
<span
class="mjx-mtd"
style="padding: 0px 0.5em 0px 0px; width: 0.529em;"
>
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mn"
><span
class="mjx-char MJXc-TeX-main-R"
style="padding-top: 0.372em; padding-bottom: 0.372em;"
>1</span
></span
><span class="mjx-strut"></span>
</span>
</span>
<span
class="mjx-mtd"
style="padding: 0px 0.5em 0px 0.5em; width: 0.5em;"
>
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mn"
><span
class="mjx-char MJXc-TeX-main-R"
style="padding-top: 0.372em; padding-bottom: 0.372em;"
>2</span
></span
><span class="mjx-strut"></span>
</span>
</span>
<span
class="mjx-mtd"
style="padding: 0px 0px 0px 0.5em; width: 0.5em;"
>
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mn"
><span
class="mjx-char MJXc-TeX-main-R"
style="padding-top: 0.372em; padding-bottom: 0.372em;"
>3</span
></span
><span class="mjx-strut"></span>
</span>
</span>
</span>
<span class="mjx-mtr" style="height: 1.2em;">
<span class="mjx-mtd" style="padding: 0.2em 0.5em 0px 0px;">
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mi"
><span
class="mjx-char MJXc-TeX-math-I"
style="padding-top: 0.225em; padding-bottom: 0.298em;"
>a</span
></span
><span class="mjx-strut"></span>
</span>
</span>
<span class="mjx-mtd" style="padding: 0.2em 0.5em 0px 0.5em;">
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mi"
><span
class="mjx-char MJXc-TeX-math-I"
style="padding-top: 0.446em; padding-bottom: 0.298em;"
>b</span
></span
><span class="mjx-strut"></span>
</span>
</span>
<span class="mjx-mtd" style="padding: 0.2em 0px 0px 0.5em;">
<span class="mjx-mrow" style="margin-top: -0.2em;">
<span class="mjx-mi"
><span
class="mjx-char MJXc-TeX-math-I"
style="padding-top: 0.225em; padding-bottom: 0.298em;"
>c</span
></span
><span class="mjx-strut"></span>
</span>
</span>
</span>
</span>
</span>
<span class="mjx-mo"
><span
class="mjx-char MJXc-TeX-size3-R"
style="padding-top: 1.256em; padding-bottom: 1.256em;"
>]</span
></span
>
</span>
</span>
</span>
</span>
Admittedly, that's a lot of HTML, but that's just what MathJax outputs, which I'm fine with for now. Notice the aria-label attribute: it contains the raw LaTeX code for the matrix. Let's try to fix that.
Accessibility
At the moment, screen readers would read the LaTeX code from the aria-label attribute and ignore everything inside the inner HTML element. That is obviously not great, so I went looking for a solution.
First of all, a span doesn't have semantic meaning in the context of accessibility, so the aria-label might be ignored by assistive technologies.[4][5] An attribute that can help in such situations is the role attribute. Take the MDN description of role="img", for instance, which appears to be just what we need:
"Any set of content that should be consumed as a single image (which could include images, video, audio, code snippets, emojis, or other content) can be identified using role="img"."[6]
Modifying the HTML output of MathJax is possible yet feels somewhat "hacky":
let { htmlNode } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  speakText: true,
  // don't need the html string
  html: false,
  // need the html node though, since I want to modify it
  htmlNode: true,
});
// add the role="img" attribute
htmlNode.firstChild.setAttribute('role', 'img');
// back to string
let html = htmlNode.outerHTML;
The trick was to tell MathJax to output an htmlNode instead of an HTML string, so we can use DOM operations to modify the output. The same thing would've been possible by running the string output through an HTML parser and modifying the result, but it would require way more effort. Luckily, they seem to be using jsdom, so we don't have to worry about any of that.
The last step is to turn the aria-label nonsense into something that a screen reader can actually pronounce, because this makes absolutely no sense:
<span class="mjx-chtml">
<span
class="mjx-math"
role="img"
aria-label="\begin{bmatrix}
1 & 2 & 3\\
a & b & c
\end{bmatrix}
"
>
...
</span>
</span>
Fortunately, there is a drop-in replacement for mathjax-node called mathjax-node-sre[7], which uses a speech rule engine[8] to generate speech strings. I'm using it along with the mathspeak ruleset, like this:
// import the drop-in replacement
const mathJax = require('mathjax-node-sre');
let { htmlNode } = await mathJax.typeset({
  math: latexStr,
  format: 'inline-TeX',
  // don't need the html string
  html: false,
  // need the html node though, since I want to modify it
  htmlNode: true,
  // this adds an aria-label describing the math
  speakText: true,
  speakRuleset: 'mathspeak',
});
Take a look at what the aria-label for our matrix looks like now. It is certainly better than raw LaTeX code, or than using an image or SVG instead:
<span class="mjx-chtml">
<span
class="mjx-math"
role="img"
aria-label="Start 2 By 3 Matrix
1st Row 1st Column 1 2nd Column 2 3rd Column 3
2nd Row 1st Column a 2nd Column b 3rd Column c
EndMatrix"
>
...
</span>
</span>
The rendered version looks like this:
[Rendered matrix: first row 1, 2, 3; second row a, b, c]
I must admit it took some digging through libraries and their documentation to get to this point, but I'm happy with where I've arrived. Here is the full handlebars helper I'm using:
const handlebars = require('handlebars');
// drop-in replacement for mathjax-node with speech rule support
const mathJax = require('mathjax-node-sre');

handlebars.registerHelper('latex', async function (value) {
  // the raw latex string between {{#latex}} and {{/latex}}
  const latexStr = value.fn();
  let { htmlNode } = await mathJax.typeset({
    math: latexStr,
    format: 'inline-TeX',
    // don't need the html string
    html: false,
    // need the html node though, since I want to modify it
    htmlNode: true,
    // this adds an aria-label describing the math
    speakText: true,
    speakRuleset: 'mathspeak',
  });
  // for a11y purposes, also add the role="img" attribute
  htmlNode.firstChild.setAttribute('role', 'img');
  return new handlebars.SafeString(htmlNode.outerHTML);
});
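Since the whole point of the speech ruleset is that no raw LaTeX ends up in an aria-label, a small build-time check can catch regressions. The following is only a sketch under my own assumptions: the function name findRawLatexLabels is hypothetical, and it scans plain HTML strings with a regular expression rather than using anything from MathJax.

```javascript
// Hypothetical build-time check: collect aria-label values from rendered HTML
// and return those that still look like raw LaTeX source.
function findRawLatexLabels(html) {
  const labels = [];
  // [^"]* also matches newlines, so multi-line labels are captured too.
  for (const match of html.matchAll(/aria-label="([^"]*)"/g)) {
    labels.push(match[1]);
  }
  // A backslash followed by letters (\begin, \frac, ...) is a strong hint
  // that the label is TeX source rather than a spoken description.
  return labels.filter((label) => /\\[a-zA-Z]+/.test(label));
}

const rawLabel =
  '<span class="mjx-math" role="img" aria-label="\\begin{bmatrix} 1 & 2 & 3\\\\ a & b & c \\end{bmatrix}"></span>';
const spokenLabel =
  '<span class="mjx-math" role="img" aria-label="Start 2 By 3 Matrix 1st Row 1st Column 1 2nd Column 2 3rd Column 3 2nd Row 1st Column a 2nd Column b 3rd Column c EndMatrix"></span>';

console.log(findRawLatexLabels(rawLabel).length); // 1
console.log(findRawLatexLabels(spokenLabel).length); // 0
```

A regular expression is good enough here because the check only needs to flag suspicious labels, not to parse the page reliably.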
Style and Fonts
One thing still left to do is to embed the CSS and fonts required by TeX. You can either use a CDN or self-host. I decided to self-host the fonts and inline the CSS on all pages that have LaTeX on them, for performance reasons.
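Inlining the CSS only on pages that contain math could be done as a post-processing step over the generated HTML. This is a rough sketch, not my actual build code: the function name inlineMathCss and the mathCss parameter are hypothetical, and the only detail taken from the output shown earlier is the mjx-chtml wrapper class.

```javascript
// Hypothetical post-processing step: inline the math stylesheet only into
// pages that contain rendered math, detected via the "mjx-chtml" wrapper class.
// `mathCss` stands in for whatever stylesheet string the build has on hand.
function inlineMathCss(pageHtml, mathCss) {
  if (!pageHtml.includes('class="mjx-chtml"')) {
    return pageHtml; // no math on this page, leave it untouched
  }
  // A replacer function is used so that "$" sequences inside the CSS are
  // inserted literally instead of being treated as replacement patterns.
  return pageHtml.replace('</head>', () => `<style>${mathCss}</style></head>`);
}

const mathPage =
  '<html><head><title>Math</title></head><body><span class="mjx-chtml"></span></body></html>';
const plainPage =
  '<html><head><title>Plain</title></head><body><p>Hello</p></body></html>';

console.log(inlineMathCss(mathPage, '.mjx-chtml{display:inline-block}').includes('<style>')); // true
console.log(inlineMathCss(plainPage, '.mjx-chtml{display:inline-block}') === plainPage); // true
```

Pages without a closing head tag pass through unchanged, so a real build would want to handle that case explicitly.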
Update Mar. 18, 2022: Since publishing this blog post, I've updated my blog to use the KaTeX package instead. Upgrading MathJax to version 3 was too much work for me, and it turns out KaTeX has everything I need (HTML output and accessibility via MathML) built in already.
References
- [1] The LaTeX Project
- [2] MathJax
- [3] GitHub: MathJax-node
- [4] MDN Web Docs - Using the aria-label attribute - Elements supporting aria-label
- [5] Léonie Watson, Short note on aria-label, aria-labelledby, and aria-describedby, July 12, 2017
- [6] MDN Web Docs - ARIA: img role - Description
- [7] GitHub: mathjax-node-sre
- [8] GitHub: speech-rule-engine