A line-numbering text factory for BBEdit

You know how you get into a certain mode of thinking and you miss the obvious? And then, when you finally do see the obvious, you feel stupid and embarrassed? Whenever I want to automate a process, I assume I should write a script. That’s my first thought and, too often, my last. Usually it works, but tonight I realized that a script I’d written was wasting my time.

The process I wanted to automate was adding line numbers to source code in blog posts. When I include a script in a post, the Markdown looks like this:

    python:
    1:  #!/usr/bin/python
    2:  
    3:  print "Hello, world!"

The code is preceded by the name of the language, which sets the syntax highlighting rules, and each line of the source code starts with its line number followed by a colon and two spaces. I get the effect I want,

python:
1:  #!/usr/bin/python
2:  
3:  print "Hello, world!"

through some JavaScript.

Here’s how I’ve been adding line-numbered source code for ages:

  1. Copy the code from the original and paste it into the post.
  2. Select the code just pasted (BBEdit will select the just-pasted code for you if you paste via ⌥⌘V—that saves one step).
  3. If there are tabs in the source code, turn them into spaces (this is usually needed only for AppleScript).
  4. Run a filter on the selection that numbers the lines. In TextMate, that filter was written in Python; in BBEdit, it was in AppleScript.
  5. Indent the lines so Markdown knows they’re code.
  6. Add the language line above the code.
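
To make the moving parts concrete, here is a minimal Python sketch of what steps 3 through 6 amount to when rolled into one function. It is an illustration only, not the TextMate filter mentioned above: the name `number_code`, the four-space tab width, and the four-space indent are assumptions, and the line numbers are right-justified so the colons line up.

```python
def number_code(source, language, tab_width=4, indent=4):
    """Return source as a numbered, indented block with a language line.

    Hypothetical helper: the name and default widths are assumptions.
    """
    # Step 3: turn tabs into spaces.
    lines = source.expandtabs(tab_width).splitlines()

    # Step 4: number the lines, right-justifying so the colons line up.
    width = len(str(len(lines)))
    numbered = [f"{n:>{width}}:  {line}" for n, line in enumerate(lines, start=1)]

    # Step 6: put the language line above the code.
    block = [f"{language}:"] + numbered

    # Step 5: indent everything so Markdown treats it as code.
    pad = " " * indent
    return "\n".join(pad + line for line in block) + "\n"
```

Used as a filter, a function like this would be fed the selected text and the language name, for example `number_code(sys.stdin.read(), "python")` after importing `sys`, with the result written to standard output.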

Now, it’s not like this was an especially laborious process; each step was performed by keyboard shortcuts. But for some reason—some mode of thinking I was in for years—I never thought to combine most of these steps until tonight. And to do it without creating a big, complicated script.

The answer was BBEdit’s Text Factories, a system that allows you to combine several text transformations into a single command. The transformations can be from BBEdit’s built-in set or scripts you write yourself. I ended up using a combination of both.

Here’s the factory I came up with:

[Screenshot: the “Source code line numbering” text factory]

Four of the five steps are from BBEdit’s built-in commands in the Text menu. The other, which is the fourth step in the series, is this little Perl script,

perl:
1:  #!/usr/bin/perl -p 
2:  
3:  if ($.==1 && /^\s*1:  #!.+\b(.+)$/) {
4:    print "$1:\n";
5:  }

which I call “Prepend Language.pl” and keep in the Text Filters directory. If the code being posted starts with a shebang line, the filter grabs the name of the executable that runs it and puts that name, followed by a colon, above the source code. It is by no means perfect (it doesn’t work on itself because of the -p at the end of the shebang line), but it’ll work for a lot of the scripts I post here. And it won’t be any worse than what I was doing.
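
One possible way around the -p limitation is to look at the first word after the `#!` instead of the last word on the line. The Python sketch below illustrates that idea under stated assumptions; it is not a drop-in replacement for the Perl filter. The name `sniff_language` is hypothetical, and its rule (take the first word after `#!`, skip a leading `env`, keep only the final path component) is just one reasonable choice. It does not try to handle options passed to `env`.

```python
import re

# Matches a numbered shebang line such as "1:  #!/usr/bin/perl -p".
# Group 1 captures everything after "#!".
SHEBANG = re.compile(r"^\s*1:  #!\s*(.+)$")


def sniff_language(first_line):
    """Return the interpreter name from a numbered shebang line, or None.

    Hypothetical helper: the name and the matching rule are assumptions.
    """
    match = SHEBANG.match(first_line.rstrip("\n"))
    if match is None:
        return None

    words = match.group(1).split()
    if not words:
        return None

    # Keep only the last path component: "/usr/bin/perl" becomes "perl".
    name = words[0].rsplit("/", 1)[-1]

    # "#!/usr/bin/env python" names the interpreter in the second word.
    if name == "env" and len(words) > 1:
        name = words[1].rsplit("/", 1)[-1]

    return name
```

With this rule, a first line of `1:  #!/usr/bin/perl -p` yields `perl`, and a first line with no shebang yields `None`, which corresponds to the case where the language line has to be added by hand.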

Now the process for adding code here is

  1. Copy the code from the original and paste it into the post, preserving the selection.
  2. Apply the text factory through ⌃⌥⌘N, the shortcut I assigned to it.
  3. There is no Step 3. (Unless the script has no shebang line, in which case I’ll have to add the language line manually.)

I confess it feels a little unmanly to use a text factory instead of a script (anyone can make a text factory), but I can’t argue with the efficiency.


4 Responses to “A line-numbering text factory for BBEdit”

  1. Chris Poole says:

    awk could help too:

    awk -v OFS='' '{print NR,":\t",$0}'
    

    It wouldn’t take much more to sniff the file extension and/or hashbang line, and decide what the language is, as part of a slightly larger shell script.

  2. Dr. Drang says:

    It’s true that awk can make quick work of line numbering, Chris, but your one-liner is a little too quick to get the formatting I like. First, you put a tab after the colon, and I’m dead set against having tabs in my Markdown files because the size of the tab is inconsistent. (It’s 8 characters in the Terminal, but I’ve never had an editor set to use an 8-character tab.) Second, I like the line numbers to be right justified so the colons line up. The text factory does that with a simple checkbox; my old Python script does it by counting the lines first and then going back to put in the line numbers; your awk script leaves them left justified.

    If you’re going to use traditional Unix tools to number the lines, nl is your best bet. But you’ll still have to tell it the width of the number field to avoid excessive white space at the beginning of each line.

    There are two problems with trying to get the language from the extension:

    1. I often don’t use extensions. I never use them for scripts I expect to run from the command line.
    2. BBEdit’s text filters work by assuming the input is stdin, not a file. There really isn’t a filename to parse.

    My Perl script does sniff the shebang (hashbang) line. It’s just that that won’t work for AppleScripts, HTML snippets, etc.

  3. Chris Poole says:

    I agree about the tab issue; my one-liner was just an attempt at showing the power of unix tools, to get an 80% solution very quickly. (Something that I know isn’t lost on you.) nl is new to me though, so thanks for the tip.

    As for the file extension sniffing, I mention it only in that my editor of choice (Emacs) uses this method (see variable auto-mode-alist), as well as the hashbang line sniffing (interpreter-mode-alist). It works effectively for me, since I tend to either use a hashbang with no extension (for things living in ~/bin), or a file extension (for other, perhaps larger, coding projects). For me, these methods together would probably correctly match 95% of my scripts/programs.

    Emacs uses the following regex to sniff the interpreter given in the hashbang line:

    #![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)
    

    It’s stored in auto-mode-interpreter-regexp, and not thrown by the addition of the -p. Seems a little excessive though?

  4. Dr. Drang says:

    I fixed (I think) the formatting of the regex, Chris. The \t and \n characters were getting substituted with actual tabs and newlines. This is an interesting pattern—gives me ideas on how I can improve mine to be more flexible. Thanks for pointing it out.