Ideas for a Blog LaTeX Parser

March 16, 2021 4 minute read

After writing a few blog posts using $\LaTeX$, I have gotten used to writing math in Kramdown and MathJax.¹ However, since I plan to post regular reviews and summaries of the mathematics books that I am reading, I want a more efficient way of writing LaTeX.

Perhaps the most vexing issue is the clash between Markdown syntax and LaTeX, the latter of which is notorious for its frequent use of the backslash. This causes no problems when the backslash precedes a character with no reserved use in Markdown, such as any letter. For instance, \lambda prints $\lambda$. However, a new line, which is usually two consecutive backslashes, gets ugly in Markdown. Each backslash must be escaped by another backslash, resulting in \\\\ for a newline within LaTeX math environments.

This is also the case for my preferred math mode delimiters: \( \) and \[ \]. I prefer these delimiters because I first learned LaTeX this way. During a conversation with my math teacher Mr. Odden, I learned that these delimiters have yet another advantage over their native TeX counterparts $ and $$: having distinct symbols for opening and closing math mode helps catch syntax errors. By contrast, if one were to accidentally miss a $ sign at some point in a paragraph, all the wrong strings would be sent into math mode, making a mess. The issue is that the brackets () and [] all serve syntactic purposes in Markdown, which would usually be fine, except that Markdown registers any backslash preceding them as an escape. As a result, I have to input \\( \\) and \\[ \\] instead, which is quite a hassle.

There are other inconveniences of LaTeX in Markdown too, such as the underscore _ (which is used for subscripts in LaTeX, and for italic/bold text in Markdown) sometimes causing problems within MathJax math environments and needing to be escaped.²
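To make the escaping rules above concrete, here is a minimal sketch in Python. The function name `escape_for_kramdown` is hypothetical, and the rule it applies (escape a backslash only when the next character is not a letter, and always escape a bare underscore) is a simplifying assumption drawn from the examples in this post, not Kramdown's full escaping behaviour.

```python
import re

# A backslash followed by a non-letter, or a bare underscore.
# Backslashes followed by letters (e.g. \lambda) are left untouched.
_TOKEN = re.compile(r"\\([^A-Za-z])|_")


def escape_for_kramdown(latex: str) -> str:
    """Escape LaTeX math source so it survives Markdown processing.

    Assumption: every bare underscore is escaped, even though the
    problem only shows up some of the time.
    """

    def repl(match: re.Match) -> str:
        ch = match.group(1)
        if ch is None:
            # Bare underscore: _ -> \_
            return "\\_"
        if ch == "\\":
            # LaTeX newline: \\ -> \\\\
            return "\\" * 4
        # Delimiters and other symbols: \[ -> \\[
        return "\\\\" + ch

    return _TOKEN.sub(repl, latex)


if __name__ == "__main__":
    print(escape_for_kramdown(r"\[ a_1 \\ \lambda \]"))
```

Doing everything in a single regular-expression pass avoids escaping the same backslash twice, which is easy to get wrong when chaining several `str.replace` calls.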

Of course, I have already tried Pandoc (to convert between LaTeX files and Markdown).³ However, even Pandoc cannot address the escaping problems and compatibility with MathJax. That is why I am thinking about creating a simple LaTeX to Kramdown-with-MathJax parser in Python. The concept is simple: I do not need it to do many things. It also does not need to be extensible or customizable; I only need it to convert my own LaTeX documents. Here is an initial list of features that I would like my parser to have.

Note: While writing this post and brainstorming the LaTeX parser project, I read up on MathJax’s documentation and was surprised to learn that it is in fact very customizable. I also decided to stop supporting the $ and $$ math mode delimiters so I can finally type the dollar sign in peace.

- Make a two-way parser between LaTeX and Kramdown & MathJax. Optimize formatting for the LaTeX ⟶ Kramdown & MathJax direction. The Kramdown & MathJax ⟶ LaTeX conversion is mostly for practicality, most notably for interchanging \ and \\.
- Change \chapter, \section, \subsection, and \subsubsection into the corresponding Markdown headings, and back again. Make the heading levels that correspond to the LaTeX sectioning commands customizable.
- Identify and change the math mode delimiters \( \) and \[ \] to \\( \\) and \\[ \\]. MathJax does not really like newlines in the code, so we will have to remove all newlines in equation, gather, and align environments. We replace align* with aligned and gather* with gathered, and simply remove equation*, because * can cause trouble with Markdown and the environment is redundant since all math environments are already enclosed in math mode delimiters.
- Potentially also deal with tables and/or lists (itemize and enumerate).
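As a rough illustration of the heading and environment features listed above, here is a sketch in Python for the LaTeX ⟶ Kramdown direction only. The names `HEADING_LEVELS`, `ENV_REPLACEMENTS`, `convert_headings`, and `convert_display_math` are hypothetical, and the default heading levels are just one possible choice. The sketch assumes that each sectioning command sits on its own line and that each starred environment is already enclosed in math mode delimiters, so it only renames or drops the environment and flattens the newlines.

```python
import re

# Hypothetical default mapping from sectioning command to Markdown heading level.
HEADING_LEVELS = {"chapter": 1, "section": 2, "subsection": 3, "subsubsection": 4}

# Starred environments and their replacements; None means "drop the wrapper".
ENV_REPLACEMENTS = {"align*": "aligned", "gather*": "gathered", "equation*": None}


def convert_headings(text: str, levels: dict = HEADING_LEVELS) -> str:
    """Turn \\section{Title} lines into Markdown headings such as '## Title'."""
    names = "|".join(re.escape(name) for name in levels)
    pattern = re.compile(r"^\\(" + names + r")\*?\{(.+?)\}[ \t]*$", re.MULTILINE)
    return pattern.sub(
        lambda m: "#" * levels[m.group(1)] + " " + m.group(2), text
    )


def convert_display_math(text: str) -> str:
    """Rename or drop starred environments and remove newlines inside them."""
    names = "|".join(re.escape(name) for name in ENV_REPLACEMENTS)
    pattern = re.compile(
        r"\\begin\{(" + names + r")\}(.*?)\\end\{\1\}", re.DOTALL
    )

    def repl(m: re.Match) -> str:
        # Collapse every run of whitespace (including newlines) to one space.
        body = " ".join(m.group(2).split())
        new_name = ENV_REPLACEMENTS[m.group(1)]
        if new_name is None:
            return body
        return f"\\begin{{{new_name}}} {body} \\end{{{new_name}}}"

    return pattern.sub(repl, text)
```

Passing a different `levels` dictionary, for example `{"section": 1, "subsection": 2}`, is one way to make the heading levels customizable. Escaping the backslashes for Markdown would be a separate pass applied afterwards.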

Here is my current MathJax configuration as of writing this post:

MathJax.Hub.Config({
  extensions: ["tex2jax.js"],
  jax: ["input/TeX", "output/HTML-CSS"],
  tex2jax: {
    inlineMath: [ ["\\(","\\)"] ],
    displayMath: [ ["\\[","\\]"] ],
    processEscapes: true
  },
  TeX: {
    Macros: {
      mbb: ["\\mathbb{#1}",1],
      mbf: ["\\mathbf{#1}",1],
      mcal: ["\\mathcal{#1}",1],
      mfk: ["\\mathfrak{#1}",1],
      eps: "\\varepsilon", // The better Epsilon
      N: "\\mathbb{N}", // Natural Numbers
      Z: "\\mathbb{Z}", // Integers
      Q: "\\mathbb{Q}", // Rational Numbers
      R: "\\mathbb{R}", // Real Numbers
      C: "\\mathbb{C}", // Complex Numbers
      F: "\\mathbb{F}", // Arbitrary Field
      set: ["\\{ #1 \\}",1], // Normal Brackets Set
      Set: ["\\left\\{ #1 \\right\\}",1], // Dynamically Scaled Brackets Set
      setbar: ["\\middle\\mid"], // Bar for Dynamically Scaled Brackets Set
      func: ["#1 \\colon #2 \\to #3",3], // Function/Mapping
      floor: ["\\left\\lfloor #1 \\right\\rfloor",1], // Floor/Greatest Integer Function
      ceil: ["\\left\\lceil #1 \\right\\rceil",1] // Ceiling Function
    },
    equationNumbers: { autoNumber: "AMS" },
    extensions: ["AMSmath.js", "AMSsymbols.js"] // there is also "AMScd.js"
  },
  "HTML-CSS": { availableFonts: ["TeX"] }
});
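For the two-way conversion to work, the LaTeX side would presumably need the same shorthand macros in its preamble. The following Python sketch shows one way to generate matching \newcommand lines. The `MACROS` dictionary is a hand-copied subset of the configuration above, and the helper name `to_newcommand` is hypothetical; nothing here reads the MathJax configuration automatically.

```python
# Hand-copied subset of the MathJax macros above: name -> (body, number of arguments).
MACROS = {
    "mbb": ("\\mathbb{#1}", 1),
    "eps": ("\\varepsilon", 0),
    "R": ("\\mathbb{R}", 0),
    "func": ("#1 \\colon #2 \\to #3", 3),
}


def to_newcommand(name: str, body: str, nargs: int = 0) -> str:
    """Build a LaTeX \\newcommand line equivalent to one MathJax macro."""
    # The optional [n] argument count is only emitted when the macro takes arguments.
    args = f"[{nargs}]" if nargs else ""
    return f"\\newcommand{{\\{name}}}{args}{{{body}}}"


if __name__ == "__main__":
    for macro_name, (macro_body, macro_nargs) in MACROS.items():
        print(to_newcommand(macro_name, macro_body, macro_nargs))
```

Keeping the macro list in one place and generating the preamble from it is a design choice that would keep the blog and the LaTeX documents from drifting apart.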

And here is my latest MathJax configuration (currently in use).

I will also experiment with Pandoc some more to see whether it can be incorporated into the solution. If it can, the parser will be much easier to make.

¹ See Kramdown, a flavor of Markdown used in Jekyll. Also see MathJax, a JavaScript-based LaTeX rendering library for the web.
² I am also afraid that the habit of typing an extra backslash to escape special characters will become ingrained in my muscle memory, which would be troublesome when I try to write in LaTeX again.
³ See Pandoc, example 5.

Darren Zhu