Syntax highlighting

After some persistent comments in the embedded gist article, I’ve decided to try my hand at adding syntax highlighting to the code I post here. I think I have it working, but don’t be surprised if some things look a little weird for a while.

Update 12/17/10
Basically, everything in this article has been superseded by this one. I’m now using a different syntax highlighting system that doesn’t require any changes to PHPMEM.

There are several syntax highlighting libraries to choose from, both server-side and client-side. I decided to go with the Python-based Pygments because it seemed to be mature, widely used, and relatively easy to customize for my needs. Because it’s on the server side, I needed to modify PHP Markdown Extra Math (PHPMEM) to call Pygments’ pygmentize command-line tool and incorporate its output. And, of course, I needed to make changes to the site’s CSS to fit the output to the blog’s overall style.

The JavaScript function I used in the gist article now renders like this:

javascript:
1:  // Change straight quotes to curly and double hyphens to em-dashes.
2:  function smarten(a) {
3:    a = a.replace(/(^|[-\u2014\/(\[{"\s])'/g, "$1\u2018");      // opening singles
4:    a = a.replace(/'/g, "\u2019");                              // closing singles & apostrophes
5:    a = a.replace(/(^|[-\u2014\/(\[{\u2018\s])"/g, "$1\u201c"); // opening doubles
6:    a = a.replace(/"/g, "\u201d");                              // closing doubles
7:    a = a.replace(/--/g, "\u2014");                             // em-dashes
8:    return a
9:  };

You can select the code for copying by dragging your mouse along it; the line numbers won’t be selected unless your mouse wanders out into that column.

As I’m writing, the section above looks like this in my text editor:

The JavaScript function I used in the gist article now renders like this:

    ::: javascript linenos
    // Change straight quotes to curly and double hyphens to em-dashes.
    function smarten(a) {
      a = a.replace(/(^|[-\u2014\/(\[{"\s])'/g, "$1\u2018");      // opening singles
      a = a.replace(/'/g, "\u2019");                              // closing singles & apostrophes
      a = a.replace(/(^|[-\u2014\/(\[{\u2018\s])"/g, "$1\u201c"); // opening doubles
      a = a.replace(/"/g, "\u201d");                              // closing doubles
      a = a.replace(/--/g, "\u2014");                             // em-dashes
      return a
    };

You can select the code for copying by dragging your mouse along it; the line numbers won't be selected.

The line with the three colons at the top of the code block is the syntax directive [1]. It tells PHPMEM to send the code block to Pygments and have it format the code as JavaScript with included line numbers. The syntax directive itself is not included in the output.
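
For a sense of what handling that directive involves, here is a minimal Python sketch. It is not PHPMEM's actual code (which is PHP). The name `split_directive` and the exact grammar it accepts are assumptions inferred from the example above: three colons, a language name, and an optional options word.

```python
import re

# Hypothetical pattern inferred from the example: ":::", a language name,
# then an optional word of options such as "linenos".
DIRECTIVE = re.compile(r"^:::\s*(?P<lang>\S+)(?:\s+(?P<opts>\S+))?\s*$")


def split_directive(block):
    """Return (language, options, code); language is None if there is no directive."""
    first, _, rest = block.partition("\n")
    match = DIRECTIVE.match(first.strip())
    if match is None:
        # No directive: hand the block back unchanged.
        return None, "", block
    # The directive line is dropped, so it never appears in the output.
    return match.group("lang"), match.group("opts") or "", rest
```

Calling `split_directive("::: javascript linenos\nreturn a\n")` gives back the language, the options word, and the code with the directive line removed.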

If I leave out the linenos option, the code is rendered as you would expect:

javascript:
// Change straight quotes to curly and double hyphens to em-dashes.
function smarten(a) {
  a = a.replace(/(^|[-\u2014\/(\[{"\s])'/g, "$1\u2018");      // opening singles
  a = a.replace(/'/g, "\u2019");                              // closing singles & apostrophes
  a = a.replace(/(^|[-\u2014\/(\[{\u2018\s])"/g, "$1\u201c"); // opening doubles
  a = a.replace(/"/g, "\u201d");                              // closing doubles
  a = a.replace(/--/g, "\u2014");                             // em-dashes
  return a
};

And if I leave out the entire syntax directive line, it’s rendered the way code segments have always been rendered here, as plain text with no highlighting:

// Change straight quotes to curly and double hyphens to em-dashes.
function smarten(a) {
  a = a.replace(/(^|[-\u2014\/(\[{"\s])'/g, "$1\u2018");      // opening singles
  a = a.replace(/'/g, "\u2019");                              // closing singles & apostrophes
  a = a.replace(/(^|[-\u2014\/(\[{\u2018\s])"/g, "$1\u201c"); // opening doubles
  a = a.replace(/"/g, "\u201d");                              // closing doubles
  a = a.replace(/--/g, "\u2014");                             // em-dashes
  return a
};

This ensures (I think) that this new formatting won’t interfere with the code in my old posts. They’ll continue to be unhighlighted and will have line numbers only if I included line numbers in the code blocks when I wrote the post.
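
The fallback described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PHPMEM does it: `render_block` and the `highlight` callback are hypothetical names, and the plain-text branch assumes the usual Markdown output of an escaped block wrapped in pre and code tags.

```python
import html


def render_block(block, highlight):
    """Highlight a block that opens with a ':::' directive; otherwise emit it as plain code."""
    first, _, rest = block.partition("\n")
    if first.lstrip().startswith(":::"):
        # Directive present: pass the directive line and the code to the highlighter.
        return highlight(first.strip(), rest)
    # No directive: old-style output, escaped but otherwise untouched.
    return "<pre><code>" + html.escape(block, quote=False) + "</code></pre>"
```

Because the check looks only at the first line of the block, code in older posts never reaches the highlighter unless it happens to start with three colons.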

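Some things I don’t know how to do yet: start the line numbering at something other than one, and highlight PHP code that isn’t enclosed in <?php and ?> tags.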

I suspect I can handle both of these by writing my own Pygments processing script instead of using the stock pygmentize script that comes with the library.

Update 12/6/10
No need to write my own script; pygmentize can handle both cases. To get line numbers to start at something other than one, include a linenostart=n option along with linenos. For example,

 ::: python linenos,linenostart=45

To highlight PHP without enclosing it in <?php and ?>, include a startinline option. For example,

 ::: php startinline

As suggested by the first example, options must be separated by commas with no whitespace.
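
To make the option handling concrete, here is a rough Python sketch of how a directive might be turned into a pygmentize command line. The function name `pygmentize_args` is hypothetical, and passing the options word straight to `-O` is my assumption about the simplest mapping, not a description of PHPMEM's code. The `-f`, `-l`, and `-O` flags themselves are standard pygmentize options.

```python
def pygmentize_args(directive):
    """Turn a '::: lang opt,opt=val' directive into a pygmentize argument list."""
    parts = directive.split()
    if len(parts) < 2 or parts[0] != ":::":
        raise ValueError("not a syntax directive: %r" % directive)
    # -f picks the formatter (HTML here), -l picks the lexer by language name.
    args = ["pygmentize", "-f", "html", "-l", parts[1]]
    if len(parts) > 2:
        # Assumption: the comma-separated options word is handed to -O as is.
        args += ["-O", parts[2]]
    return args
```

With the two directives above, this yields `-l python -O linenos,linenostart=45` and `-l php -O startinline`. In this sketch, anything after a space inside the options would be silently ignored, which is one way a no-whitespace rule can arise.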

The changes in PHPMEM are in the “pygments” branch of its repository. I’ll merge them into the master branch when I feel more comfortable with the reliability of the new code.

Update 12/6/10
If you’re reading this via RSS, you may be wondering where the syntax highlighting is. It’s in the CSS, which gets stripped out of the feed, so there’s no highlighting in Google Reader or NetNewsWire. (h/t @PhilGeek)

The same is true with Instapaper. In fact, Instapaper is worse because it separates the line numbers from the code:

Yuck.


  1. Why three colons? Because that’s what the CodeHilite extension to Python Markdown uses. I figured there was no point in coming up with something new.