Text files and me V - Markdown

If it seems like this series has been going on forever, it’s because it has.

My Christmas present to myself is finishing this series off with a discussion of Markdown.

Canonical (Gruber’s) Markdown

Because this is a historical journey, the right way to start this post would be to talk about my introduction to Markdown. If you’d asked me six months ago about how I learned about Markdown, I would’ve said I read about it in 2004 or 2005 in something Merlin Mann wrote. That may be true, but when I started searching through blog posts and mailing list archives to write this introduction, I became less sure of it. A few things I am sure of:

  1. Around that time, Cory Doctorow wrote several posts on Boing Boing about how programmers and other computer types used plain text files to organize and simplify their lives. GTD was part of it, but not the whole story; there was also discussion of how their text files were formatted.
  2. The Cory posts introduced me to both Merlin and Danny O’Brien, who were at the time doing a lot of writing and talking about productivity hacks. I’m pretty sure Markdown was in the mix, but I haven’t tracked down any specific post or article I can point to.
  3. A lot of lightweight markup formats were getting attention back then. As a Perl programmer, I knew of POD, which had been designed to allow scripts to carry along their own documentation but had been extended to allow more complicated writing (including all the text of some of O’Reilly’s Perl books). Other formats floating around were reStructuredText, Setext, wiki markup, and Markdown. There was also David Mertz’s Text Processing in Python book that nearly led me to create my own format.
  4. I wanted to start a blog but didn’t want to write entries in HTML and didn’t want to use some goofy in-browser editor. Blosxom kept its posts in plain text files stored in a set directory structure.[1] Markdown was one of the formats supported by Blosxom,[2] and this blog was initially delivered via Blosxom and Markdown. I also used Blosxom/Markdown locally to keep class notes for continuing education classes. It was an almost perfect system for classes that didn’t require sketches in the notes.

However the introduction came about (and it may have come through more than one of those threads), I started using Markdown in 2004 when I started blogging.[3] I chose Markdown over the other lightweight formats because

When I started using it, Markdown still had one serious flaw, a flaw I didn’t notice until I’d been using it for a while. The problem was in code blocks. If the code you wanted to show in your document included one of a dozen or so backslash sequences— \\, \(, \), and \*, to name a few—you had to escape the leading backslashes with backslashes. So, for example, if you had a script with a regular expression that looked for asterisks

if (/\*/) { print "Found one! "; }

you’d have to put it in Markdown as

if (/\\*/) { print "Found one! "; }

This wasn’t very convenient, and was distinctly un-Markdown-like. I complained about it on the Markdown mailing list and John Gruber agreed that it was a problem. He delayed the release of Version 1.0.1 so he could include the fix in that release. As this was the last release of Gruber’s Markdown, I take credit for instigating the last significant change in the canonical version of Markdown.

(People who’ve been on the Markdown mailing list for only a few years or less may find this story unbelievable. First, there’s the matter of Gruber participating in the list. This in itself sounds doubtful. To combine that with the notion that he’d actually make a change at the request of a user pushes my account into the realm of pure fantasy. But it did happen. You could look it up.)

MultiMarkdown

So there I was, writing blog posts and class notes in Markdown that was then getting converted to HTML. The writing experience was so pleasant that I began to resent writing my reports for work in LaTeX. And I started to scheme.

Anyone who’s written in both LaTeX and HTML can’t help but notice the similarities between the two. In a broad sense, they both focus on structure, with the layout being handled elsewhere—style files for LaTeX and CSS files for HTML. And in a narrow sense, there’s very little difference between

\section{Background information}

and

<h1>Background information</h1>

You can see where this is going. I began reading Markdown.pl, looking for a simple way to change its output from HTML to LaTeX. There didn’t seem to be one, in part because of that secret weapon. Because Markdown allows embedded HTML, the ability to convert Markdown to LaTeX must include the ability to convert HTML to LaTeX. This was a big challenge, but I had done something similar with a script that turned reports written in SGML into troff documents.

Which brings us to Fletcher Penney and MultiMarkdown. Unlike my introduction to Markdown, my introduction to MultiMarkdown is razor-sharp in my mind. I was in a hotel near Pittsburgh on a business trip, and I had spent the day in a lab examining broken pieces of steel. I opened my laptop, connected to the hotel internet, and began reading my mail, which included the Markdown mailing list. There in the list was Fletcher’s announcement of MultiMarkdown, and it was like a bright flash of light. I’d been doing much thinking about the problem of converting Markdown to LaTeX, but I’d been approaching it wrong. Fletcher had the right approach and he’d solved the problem.

Unlike Markdown, MultiMarkdown wasn’t a single script. It was a system. First, there were his extensions to the Markdown format, the most prominent being a syntax for tables and header lines that provided meta information (title, author, date, etc.) about the file itself. Handling these was built into his Markdown.pl file, a revision of Gruber’s, but with most of Gruber’s code intact. Like the original Markdown.pl, this file converted Markdown to HTML. Many people have adopted MultiMarkdown as their Markdown processor just for these extensions.

But it was the “multi” part of MultiMarkdown, the part that allowed conversion to LaTeX, that most impressed me. This ability came from the second part of the MMD system, a set of XSL transformation files. Fletcher’s insight was to recognize that converting directly from Markdown to LaTeX was unnecessary and that the tools for converting from HTML to LaTeX had already been built. These tools were Saxon, Sablotron, xsltproc, and the other XSLT processors. Markdown already generated XHTML by default. Because XHTML is XML, it can be turned into any other format by one of these XSLT processors and a set of transformation files. So he wrote the transformation files.
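To make the idea concrete, here is a minimal sketch of the same principle: once the document is well-formed XML, a small tree walk can re-emit it in another format. This is not Fletcher's XSLT and not a substitute for it. It swaps in Python's standard `xml.etree.ElementTree` module, the tag-to-LaTeX table is a hypothetical illustration covering only a handful of elements, and it does no escaping of LaTeX special characters. The sample input also omits the XHTML namespace that a real file would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML tags to LaTeX wrappers; a real
# stylesheet would cover many more elements.
WRAPPERS = {
    "h1": ("\\section{", "}\n\n"),
    "h2": ("\\subsection{", "}\n\n"),
    "p": ("", "\n\n"),
    "em": ("\\emph{", "}"),
    "strong": ("\\textbf{", "}"),
}

def to_latex(element):
    """Recursively convert one XHTML element and its children to LaTeX."""
    before, after = WRAPPERS.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(to_latex(child))
        # Text that follows the child's closing tag belongs to the parent.
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)

xhtml = (
    "<body><h1>Background information</h1>"
    "<p>Some <em>important</em> text.</p></body>"
)
latex = to_latex(ET.fromstring(xhtml))
print(latex)
```

Unknown tags fall through with empty wrappers, so their text survives even when no rule exists for them.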

I don’t want to give the impression that writing XSL transformation files is easy, because it isn’t. The syntax is verbose and clumsy, and items get nested so deeply it seems impossible to figure out the long chain of steps used to get to them. Fletcher’s XSL files were very complex, took a lot of effort, and worked amazingly well. I could not, unfortunately, use them as-is.

The problem was they generated LaTeX files meant to be processed through the memoir class. I used the standard article class with a set of customized style files that not only set my reports’ margins and font choices the way I liked, but also put my company’s logo on the title page and handled other boilerplate items. I’d spent a lot of time getting those style files just right and wasn’t interested in changing.

So I opened up Fletcher’s XSL files and began editing. The reason I can speak with authority on the complexity of these files is the time I spent inside them. Fortunately, most of the necessary changes were in the generation of the LaTeX preamble, the list of \documentclass and \usepackage commands that start every LaTeX document.

Once I got the kind of LaTeX output I wanted, I edited Fletcher’s Markdown.pl file to add a feature that even MultiMarkdown was missing: equations. I’d discovered Davide Cervone’s jsMath package for adding equations to HTML, and it seemed like the perfect solution because it used LaTeX notation.[4] My changes to Markdown.pl consisted of adding functions that would transform lines like

\[ \Phi(x) = \int_{-\infty}^x e^{-\xi^2/2} d\xi \]

into

<div class="math">\Phi(x) = \int_{-\infty}^x e^{-\xi^2/2} d\xi</div>

which would generate

$$\Phi(x) = \int_{-\infty}^x e^{-\xi^2/2} d\xi$$

I chose the \[...\] notation for display equations because it was the same as LaTeX’s. Similarly, \(...\) was used to enclose inline equations.
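The transformation described above can be sketched with a pair of regular expressions. This is an illustration in Python, not the Perl that went into Markdown.pl. The function name is hypothetical, and wrapping inline equations in a `span` with the same `math` class is an assumption, since only the `div` form for display equations is shown here.

```python
import re

# \[ ... \] on a line of its own becomes a display-math div.
DISPLAY = re.compile(r"^\\\[\s*(.+?)\s*\\\]$", re.MULTILINE)
# \( ... \) inside running text becomes an inline-math span (assumed markup).
INLINE = re.compile(r"\\\(\s*(.+?)\s*\\\)")

def mark_equations(text):
    """Wrap LaTeX-delimited equations in HTML elements, leaving the TeX intact."""
    # Function replacements keep backslashes in the TeX from being
    # read as escape sequences by re.sub.
    text = DISPLAY.sub(lambda m: '<div class="math">' + m.group(1) + "</div>", text)
    text = INLINE.sub(lambda m: '<span class="math">' + m.group(1) + "</span>", text)
    return text

source = r"\[ \Phi(x) = \int_{-\infty}^x e^{-\xi^2/2} d\xi \]"
print(mark_equations(source))
```

In a real Markdown processor, a step like this would have to run before backslash escapes are handled, or the delimiters would be gone by the time it looked for them.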

Since HTML wasn’t the end format for my reports, I had to make additions to the XSL files to turn the XHTML back into LaTeX. This was actually pretty easy because everything between the delimiters remained intact.

Now, you may be thinking, “Doesn’t MultiMarkdown have equation support already?” It does, but it didn’t back when I first started using it. Also, when Fletcher added equations, he chose ASCIIMathML as the equation format, a decision I disagreed with for various reasons. So my MultiMarkdown system is quite a bit different from Fletcher’s or anyone else’s, but it’s been working nicely for five years or so. And because it’s written using standard utilities that are very unlikely to change, I could probably keep using it until I retire.

With MultiMarkdown 3, Fletcher’s brought all the conversions together into a more tightly integrated package. He tells me that MMD 3 has hooks for doing the sort of customizations I did by editing the XSL files. I keep telling myself to give it a try, but it’s just so easy to keep doing what I’ve been doing.

One way I may start experimenting with MMD 3 is through Fletcher’s Mac app, MultiMarkdown Composer. I bought it as soon as it hit the Mac App Store; even if I never use it, it can’t begin to pay Fletcher back for the time MultiMarkdown has saved me.

A sample report

To give you a sense of my report-writing workflow, here’s what one looks like:

Title: Sample Report
Format: complete
Author: Dr. Drang
Date: December 25, 2011
Client: Jane P. Client
        That Company
        1234 Wacker Drive
        Chicago IL 60606

# Introduction #

Kind or another is to me so great that nothing but the earth
rotates in the long run. If the light be sent through its
sides one half as long as he thinks prudent, the sum say it
may be prolonged in any direction or in other words, such a
scheme is quite impossible, for the same reason. that its
density is inappropriately applied to the nature of what is
important to note is, that the most obvious and mechanical
actions, also into ether forms. For instance, the mechanical
energy. The meteorite that falls.


# Description #

And again by the ether. In the ether. An exchange between
different conditions of the same rate as when they are
separated. The ether in this particular. There is no
resistance; that a mass of matter is, then, proportional to
the amount of transfer motion instead of the whole series
without assuming imponderables, or fluids or forces.
Mechanical motion only, by pressure, from any known in
physical science. would have been discovered by weighing
([this photo](#p20111225-001)).

Of them. It implies a body may be ever so gently. If they do
not thus with other velocities observed in masses of matter.
is its significance has not been shortened. Some years ago.
The inertia of the bodies possess inertia. That it should
swing ten times ([this photo](#p20111225-003)). From the
visible ones at the most remote part as at the rate of
186,000 miles long, if 186,000 times faster than such comet,
and 900,000 times a second, and the pitch of the vibrating
body will be heard in the ether. Molecular cohesion exists
between very wide ranges. When strong, so if one is
concerned only.

Make all the observed movements of double stars testify to
its activity among the thousands of them, but the records
of. Newton'quoted at the head of this intervening space must
therefore be impact and freedom alternating with each other
at near distances, and cohere if contiguous, and electric
bodies operate at greater distances, as well ([this
photo](#p20111225-010)). In which case the assumption was
that the ether is selective and it is possible to produce
rays of circularly polarized light.


# Conclusions #

Into heat and radiated away. One may say. Electrical action
is not heat-transmission. There is only one fifth that of a
mass of matter that can be asserted concerning such events
is, that the magnetic field is to be reckoned as millions of
miles from the earth, and moon, or sun, the quantity of
matter involved is not changed, so much as the whole solar
system is the mechanism each one of the same element as to
retard the rotation and the atom or say the
hundred-thousandth part.


![p20111225-001][]
![p20111225-003][]
![p20111225-010][]


[p20111225-001]: 20111225-001.jpg "A caption of a photo." plate=yes portrait=no
[p20111225-003]: 20111225-001.jpg "Another caption of a photo." plate=yes portrait=yes
[p20111225-010]: 20111225-001.jpg "Yet another caption of a photo." plate=yes portrait=no

Most of the header stuff is standard MultiMarkdown. The client section is my addition—the name and address of the client appear on the title page of the report.
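For readers who haven't seen this kind of header before, here is a rough sketch of how it can be read. It assumes the header is a run of `Key: value` lines that ends at the first blank line, and that an indented line continues the previous key, which is how the client address is laid out in the sample. The function name and the choice to lowercase keys are illustrative, not taken from MultiMarkdown's source.

```python
def parse_metadata(text):
    """Split a document into a metadata dict and the remaining body.

    Assumes the header is a run of "Key: value" lines ended by the first
    blank line, and that indented lines continue the previous key.
    """
    header, _, body = text.partition("\n\n")
    metadata = {}
    key = None
    for line in header.splitlines():
        if line[:1].isspace() and key is not None:
            # Indented line: one more line of the current value.
            metadata[key].append(line.strip())
        else:
            key, _, value = line.partition(":")
            key = key.strip().lower()
            metadata[key] = [value.strip()]
    return metadata, body

sample = """Title: Sample Report
Format: complete
Author: Dr. Drang
Date: December 25, 2011
Client: Jane P. Client
        That Company
        1234 Wacker Drive
        Chicago IL 60606

# Introduction #
"""
meta, body = parse_metadata(sample)
print(meta["client"])
```

Each value is kept as a list of lines, so a multi-line entry like the client address can be placed on the title page line by line.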

The rest of the report is pretty much straightforward Markdown. The photographs included in the report (which will be processed as plates in LaTeX) are all bunched down at the bottom of the file. They’re referred to by their file names preceded by a “p.” The “plate” and “portrait” attributes given to the photos are obviously not standard XHTML, but my reports aren’t meant to be rendered as XHTML—they’re just an XML way station on the route to LaTeX. As long as my XSL files know how to handle them, I can add whatever attributes I like. The “plate” and “portrait” attributes act as flags that control the LaTeX that places the photos on the page.
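As an illustration of how those extra attributes might be pulled apart, here is a small sketch that parses one of the reference lines from the sample. It is not the code in the modified Markdown.pl. The regular expression and function name are hypothetical, and it assumes each definition has the form `[id]: file "caption" key=value ...` on a single line.

```python
import re

# [id]: file "caption" key=value key=value ...
REFERENCE = re.compile(r'^\[([^\]]+)\]:\s+(\S+)\s+"([^"]*)"\s*(.*)$')

def parse_image_reference(line):
    """Parse one reference definition into its id, file, caption, and attributes."""
    match = REFERENCE.match(line.strip())
    if match is None:
        return None
    ref_id, filename, caption, rest = match.groups()
    # Anything after the caption is read as whitespace-separated key=value flags.
    attributes = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    return {"id": ref_id, "file": filename, "caption": caption, "attributes": attributes}

line = '[p20111225-003]: 20111225-001.jpg "Another caption of a photo." plate=yes portrait=yes'
print(parse_image_reference(line))
```

Once the flags are in a dictionary, they can be written out as attributes on the image element for the stylesheet to act on.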

References to plates in the body of the text use the standard hash prefix to link to other places within the same HTML document. As you might expect, I have a few TextMate commands for quickly inserting all the necessary information for a plate.

The Markdown file is converted to a LaTeX report by this shell script, md2report:

 1:  #!/bin/bash
 2:  
 3:  base=${1%.md}
 4:  mdname=$base.md
 5:  texname=$base.tex
 6:  # echo base:    $base
 7:  # echo mdname:  $mdname
 8:  # echo texname: $texname
 9:  if [ -f "$mdname" ]; then
10:    mmmd "$mdname" | SmartyPants \
11:    | xsltproc --novalid ~/xslt/xhtml2article-report.xsl - \
12:    | addsignature | separateplates > "$texname"
13:  else
14:    echo "No such file: $mdname"
15:  fi

The real work of the script is done in Lines 10-12, which form a pipeline that processes the file through

  1. My fork of MultiMarkdown (mmmd). We now have an XHTML file.
  2. SmartyPants. This turns straight apostrophes and quotation marks into proper typographical marks.
  3. The XSLT processor using my XSL file. We now have a LaTeX file.
  4. A simple script that adds my signature to the end of the report.
  5. Another simple script that adds a LaTeX \clearpage command after every 10th plate. I’ve found that this is necessary to keep pdflatex (which we’ll get to in a minute) from getting bound up.
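The last step is simple enough to sketch. The version below is a guess at the technique, not the actual separateplates script, which isn't shown here. In particular, it assumes each plate ends with a line containing `\end{figure}`, and that marker is purely hypothetical, because the LaTeX the XSL file generates for a plate isn't given.

```python
def separate_plates(latex, every=10, marker="\\end{figure}"):
    """Add a clearpage command after every `every`-th line containing `marker`.

    The marker is an assumption about how a plate ends in the LaTeX source.
    """
    output = []
    count = 0
    for line in latex.splitlines():
        output.append(line)
        if marker in line:
            count += 1
            if count % every == 0:
                # Force LaTeX to flush the plates collected so far.
                output.append("\\clearpage")
    return "\n".join(output) + "\n"

plates = "\n".join("\\end{figure}" for _ in range(25))
result = separate_plates(plates)
print(result.count("\\clearpage"))
```

To act as a pipeline filter like the one in md2report, a script built on this would read standard input and write the result to standard output.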

After md2report, I run the resulting LaTeX file through pdflatex a couple of times and—voila!—I have a PDF. With this line

md2report report; pdflatex report; pdflatex report

in the Terminal, I can simply tap ↑ and ↩ every once in a while to get a PDF of the report with the latest edits.

Update 12/26/11
The md2report script and its description have been edited to reflect reality. I initially posted an “experimental” version of the script that I had tried out years ago but which never worked right and which I never deleted from my laptop.

PHPMarkdown Extra

When I moved the blog from Movable Type to WordPress several years ago, I needed a Markdown processor written in PHP. The only real choices were PHPMarkdown and PHPMarkdown Extra by Michel Fortin. Michel’s done such a good job with them that no one else even tries.

I went with PHPMarkdown Extra because it had table and footnote support, but of course I couldn’t leave well enough alone and had to make my own additions. As with MultiMarkdown, I added the ability to format equations, first with jsMath and then with MathJax. I call the result PHPMarkdown Extra Math (PHPMEM) and have made it available in a GitHub repository.[5]

PHPMEM is installed as a WordPress plugin on my server, taking the place of Michel’s script. It’s also the processor behind my blog previewing system, a set of scripts and CSS files that render my posts locally in the style of the blog before I publish them on the server.

(And if you’re wondering how I publish the posts, the answer is TextMate’s venerable Blogging Bundle, introduced way back in 2006 and still working like a charm.)

Marked

Brett Terpstra’s Marked is the latest addition to my Markdown menagerie. It watches the Markdown source file as you edit it, converts it to HTML using the processor of your choice (MultiMarkdown by default), and renders the result in its own web view window. It comes with a few premade CSS style files and can also use CSS that you provide. It’s a very flexible system, and I can think of a few scripts I’ve written over the years that I wouldn’t have written if Marked had been available, my blog preview system being the most prominent example.

Today I use Marked to render short documents that aren’t going to be posted to the web and don’t need the elaborate formatting of my engineering reports. My use of Marked is still in its infancy—I’m sure there’ll be more posts about how I use it in the months to come.

The wrapup

I started this series almost a year ago, and as it’s traced my use of text files over the past 15 years we’ve gone through many formats, many scripts, and many processors. If you were to read the whole series in one sitting, it might seem overwhelming and unnecessarily complicated. Certainly some of the things I’ve done over the years were done just because I like to tinker.

But surprising as it may seem, the overarching theme of these 15 years has been to simplify my writing process—both for my reports for clients and for the blog. I’ve gone from word processors to SGML to LaTeX to Markdown, and with each step I’ve been able to spend more time writing and less time formatting. Yes, there’s a lot of machinery working behind the scenes, but the construction of that machinery was done years ago and its maintenance takes up almost no time today. I spend all of my time on the content.


  1. Whenever I hear how clever Jekyll is, I think of Blosxom. 

  2. It might be more accurate to say that Blosxom was a blogging engine supported by Markdown. If you look in Markdown’s source code, you’ll see special cases carved out for both Blosxom and Movable Type. 

  3. The earliest posts you’ll find here are dated August 18, 2005. This is due to a screwup that happened during one blog conversion or another, probably the conversion from Blosxom to Movable Type. 

  4. More recently, I’ve changed to Cervone’s latest equation-handling library, MathJax. The details of the two systems differ, but the fundamentals are the same. 

  5. My fork of MultiMarkdown isn’t available on GitHub because it has company-specific parts that I’ve never stripped out. And now it’s so old compared to MMD 3, I doubt that it’d be useful to anyone but me.