Improved Markdown table commands for TextMate

A few years ago, I made a small TextMate bundle with commands for manipulating MultiMarkdown-style tables [1]. One of the commands, Normalize Markdown Table, didn’t work too well if the table included non-ASCII characters. As I said in the post:

I’m sure there’s some clever Python module I can use to get around this problem, but I don’t know what it is yet. Suggestions are welcome.

It’s taken three years and five months, but someone has finally stepped up. I got an email today from Christoph Kepper with a new Normalize Markdown Table script that fixes the problem.

To review briefly, the Normalize Markdown Table command takes a table that looks like this,

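|Left align|Right align|Center align|
|:---------|----------:|:----------:|
|This|This|This|
|column|column|column|
|will|will|will|
|be|be|be|
|left|right|center|
|aligned|aligned|aligned|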

and turns it into one that looks like this,

| Left align | Right align | Center align |
|:-----------|------------:|:------------:|
| This       |        This |     This     |
| column     |      column |    column    |
| will       |        will |     will     |
| be         |          be |      be      |
| left       |       right |    center    |
| aligned    |     aligned |   aligned    |

The idea is that rather than trying to get the columns to align as you type, you just create the table initially with misaligned separators. Then you select the table, choose the Normalize Markdown Table command, and it turns into the nicely formatted one.

Two things to note:

  1. Both forms are equally valid tables, and your Markdown processor will produce the same output whichever one you use. The Normalize Markdown Table command is purely for the writer’s benefit; it makes the table easier to read before it’s processed.
  2. The column alignment relies on the use of monospaced fonts, which is what all right-thinking people use in TextMate.

A problem with the Normalize Markdown Table command arose when the table included non-ASCII characters, like this:

|Author|Book|
|:--|:--|
|Theodore von Kármán|Mathematical Methods in Engineering|
|Stephen Timoshenko|Theory of Elasticity|
|Jacob Pieter Den Hartog|Mechanical Vibrations|

It would normalize to this:

| Author                  | Book                                |
|:------------------------|:------------------------------------|
| Theodore von Kármán   | Mathematical Methods in Engineering |
| Stephen Timoshenko      | Theory of Elasticity                |
| Jacob Pieter Den Hartog | Mechanical Vibrations               |

The column separators didn’t align because the á is two bytes long but takes up only one character space. Christoph’s improved code results in

| Author                  | Book                                |
|:------------------------|:------------------------------------|
| Theodore von Kármán     | Mathematical Methods in Engineering |
| Stephen Timoshenko      | Theory of Elasticity                |
| Jacob Pieter Den Hartog | Mechanical Vibrations               |

which is just what we want.
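To make the byte-versus-character mismatch concrete, here is a small sketch. It is written for Python 3 rather than the Python 2 the script targets, and it imitates a Python 2 byte string by encoding the name explicitly. The variable names are made up for illustration.

```python
# Python 3 sketch: padding by bytes versus padding by characters.
# "\u00e1" is the precomposed character á.
name = "Theodore von K\u00e1rm\u00e1n"
raw = name.encode("utf-8")   # roughly what a Python 2 str would hold

print(len(name))  # number of characters
print(len(raw))   # number of bytes; each á takes two

# Width of the longest name in the column, "Jacob Pieter Den Hartog".
width = len("Jacob Pieter Den Hartog")

# Padding the byte string counts bytes, so the visible text ends up short.
padded_by_bytes = raw.ljust(width).decode("utf-8")

# Padding the text string counts characters, which is what alignment needs.
padded_by_chars = name.ljust(width)

print(repr(padded_by_bytes))
print(repr(padded_by_chars))
```

The byte-padded version comes up two spaces short, one for each á, which is exactly the misalignment in the broken table.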

Here’s the new code:

 1:  #!/usr/bin/python
 3:  import sys
 5:  def just(string, type, n):
 6:      "Justify a string to length n according to type."
 8:      string = unicode(string, 'utf-8')
 9:      if type == '::':
10:          return string.center(n)
11:      elif type == '-:':
12:          return string.rjust(n)
13:      elif type == ':-':
14:          return string.ljust(n)
15:      else:
16:          return string
19:  def normtable(text):
20:      "Aligns the vertical bars in a text table."
22:      # Start by turning the text into a list of lines.
23:      lines = text.splitlines()
24:      rows = len(lines)
26:      # Figure out the cell formatting.
27:      # First, find the formatting line.
28:      for i in range(rows):
29:          if set(lines[i]).issubset('|:.-'):
30:              formatline = lines[i]
31:              formatrow = i
32:              break
34:      # Delete the formatting line from the content.
35:      del lines[formatrow]
37:      # Determine how each column is to be justified. 
38:      formatline = formatline.strip('| ')
39:      fstrings = formatline.split('|')
40:      justify = []
41:      for cell in fstrings:
42:          ends = cell[0] + cell[-1]
43:          if ends == '::':
44:              justify.append('::')
45:          elif ends == '-:':
46:              justify.append('-:')
47:          else:
48:              justify.append(':-')
50:      # Assume the number of columns in the format line is the number
51:      # for the entire table.
52:      columns = len(justify)
54:      # Extract the content into a list of lists.
55:      content = []
56:      for line in lines:
57:          line = line.strip('| ')
58:          cells = line.split('|')
59:          # Put exactly one space at each end as "bumpers."
60:          linecontent = [ ' ' + x.strip() + ' ' for x in cells ]
61:          content.append(linecontent)
63:      # Append cells to rows that don't have enough.
64:      rows = len(content)
65:      for i in range(rows):
66:          while len(content[i]) < columns:
67:              content[i].append('')
69:      # Get the width of the content in each column. The minimum width will
70:      # be 2, because that's the shortest length of a formatting string and
71:      # because that matches an empty column with "bumper" spaces.
72:      widths = [2] * columns
73:      for row in content:
74:          for i in range(columns):
75:              widths[i] = max(len(unicode(row[i], 'utf-8')), widths[i])
77:      # Pad each cell to its column width and rejoin the cells into rows.
78:      formatted = []
79:      for row in content:
80:          formatted.append('|' + '|'.join([ just(s, t, n) for (s, t, n) in zip(row, justify, widths) ]) + '|')
82:      # Recreate the format line with the appropriate column widths.
83:      formatline = '|' + '|'.join([ s[0] + '-'*(n-2) + s[-1] for (s, n) in zip(justify, widths) ]) + '|'
85:      # Insert the formatline back into the table.
86:      formatted.insert(formatrow, formatline)
88:      # Return the formatted table.
89:      return '\n'.join(formatted)
92:  # Read the input, process, and print.
93:  unformatted = sys.stdin.read()
94:  print normtable(unformatted).encode('utf-8')

Christoph’s improvements boil down to just three changes:

  1. He added Line 8, which decodes the incoming UTF-8 byte string into a Unicode string, so the justification methods pad by characters and the function returns Unicode.
  2. He did the same thing in Line 75, which makes the len function return the number of characters rather than the number of bytes.
  3. He added the encode method to Line 94, so the output would be handled properly.
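Those three changes follow a common pattern: decode bytes to text on the way in, do all the measuring and padding on text, and encode back to bytes on the way out. Here is a stripped-down sketch of that pattern in Python 3. It is not the bundle's code; the function name `pad_table_bytes` is hypothetical, it left-aligns everything, and it assumes every row has the same number of cells and that there is no format line.

```python
def pad_table_bytes(data):
    """Align the bars of a simple pipe table given as UTF-8 bytes."""
    # Decode first, in the spirit of Lines 8 and 75: work on text, not bytes.
    text = data.decode("utf-8")

    # Split each line into stripped cells.
    rows = [[cell.strip() for cell in line.strip("|").split("|")]
            for line in text.splitlines()]

    # Column widths are measured in characters because the cells are text.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]

    # Pad every cell to its column width and rebuild the lines.
    lines = ["| " + " | ".join(c.ljust(w) for c, w in zip(row, widths)) + " |"
             for row in rows]

    # Encode only at the very end, in the spirit of Line 94.
    return "\n".join(lines).encode("utf-8")


sample = "|K\u00e1rm\u00e1n|Methods|\n|Timoshenko|Elasticity|".encode("utf-8")
print(pad_table_bytes(sample).decode("utf-8"))
```

Keeping the decode and encode at the edges means nothing in the middle has to care how many bytes a character takes.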

I’ve put the new Normalize Markdown Table command into my Text Tables bundle, which you can download as a zip file. After you unzip the file, double-click on the resulting TextMate bundle, and it will install itself into your TextMate system. The bundle includes two other commands: one for turning tab-separated tabular data (like what you’d get if you copied a set of cells from a spreadsheet and pasted them into TextMate) into a nearly complete Markdown table, and one for turning a tab-separated table into a neatly aligned space-separated table (with no pipe characters as column separators). These two commands are described here and here.
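For a rough idea of what the tab-separated conversion involves, here is a sketch in Python 3. It is not the bundle's actual command: the function name `tsv_to_markdown` is hypothetical, and the choice to insert a left-aligned format line after the first row is an assumption made for illustration.

```python
def tsv_to_markdown(text):
    """Turn tab-separated lines into an unaligned pipe table."""
    # Ignore blank lines, which would otherwise become empty rows.
    lines = [line for line in text.splitlines() if line.strip()]

    # Swap tabs for bars and add a bar at each end of every row.
    rows = ["|" + "|".join(line.split("\t")) + "|" for line in lines]

    if rows:
        # Assumption: treat the first row as the header and follow it
        # with a default, left-aligned format line.
        columns = lines[0].count("\t") + 1
        rows.insert(1, "|" + "|".join([":--"] * columns) + "|")

    return "\n".join(rows)


print(tsv_to_markdown("Author\tBook\nStephen Timoshenko\tTheory of Elasticity"))
```

The result is valid but ragged, which is where Normalize Markdown Table comes in.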

One last thing. To show how awesome he is, Christoph didn’t send me the full source code of his improved command; he just sent me the diff between it and my original. To show how awesome I am, I decided to use patch to apply his update straight from the email. Unfortunately, my awesomeness wasn’t up to the challenge; for reasons I can’t explain, patch refused to apply the second of his three changes, and I had to do that one by hand. Time to turn in my Unix merit badge.

  1. It’s also the style used in PHP Markdown Extra and some other Markdown processors that implement tables.