AppleScript syntax highlighting (finally)

I added syntax highlighting to the source code posted here back in late 2010. After a brief time using Pygments on the server, I switched to Highlight.js. I liked it the best of the JavaScript-based highlighters because it was cleanly written and easy to understand. The only problem with Highlight.js was that it didn’t support AppleScript. Initially, that didn’t bother me because I didn’t do much AppleScripting. That’s changed, and I expect to do more coding in AppleScript as I continue to explore BBEdit.

I was lamenting Highlight.js’s lack of AppleScript support on Twitter this afternoon with Brett Terpstra and Gabe Weatherhead. I had decided it was time to bite the bullet and write the AppleScript definitions myself.

I really need to teach highlight.js how to handle AppleScript.
  — Dr. Drang (@drdrang) Sun Sep 2 2012 5:14 PM CDT

Imagine my delight to find this tweet waiting for me when I got back from supper:

@drdrang @ttscoff @macdrifter I made some not too long ago. I just put them on github. github.com/nathan11g/high…
  — Nathan Grigg (@nathangrigg) Sun Sep 2 2012 6:33 PM CDT

Nathan Grigg is one of those people whose blog doesn’t get nearly the attention it should. As other bloggers rewrite the same rumors about how thin the next iPhone will be, Nathan puts out truly original material that’s clear, concise, and actually helpful. For example, I don’t know if I’d’ve been able to write my recent series of scripts on Markdown reference links in BBEdit without this post of Nathan’s.

Nathan would be, I think, the first to say that his AppleScript definitions for Highlight.js, a branch of his fork from Ivan Sagalaev’s original repository, are just a first step, but that’s the step I was most dreading. AppleScript has a huge set of keywords, and Nathan put in the effort to add them all. He has comments and strings working, too. My fork has a first cut at function definitions.

I’ve updated my styleCode function, first described here, to allow AppleScript code to be entered. Although Highlight.js has a language detection feature that can figure out (reasonably accurately) which syntax definitions to apply, I’ve never used it. I always start the code with a line that declares the language.
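
A minimal sketch of the language-line idea, for illustration only: this is not the actual styleCode function. The helper names `splitLanguageLine` and `stripLineNumbers` and the list of known languages are assumptions. It pulls an optional declaration such as `applescript:` off the top of a code block and strips `N:` line-number prefixes so the bare code could be handed to a highlighter.

```javascript
// Hypothetical helpers; names and the language list are assumptions.
const KNOWN_LANGUAGES = new Set(["applescript", "python", "perl", "javascript"]);

// Split a code block into an optional language declaration and the code.
// Only names in KNOWN_LANGUAGES count, so a first line like "try:" in
// Python code is not mistaken for a declaration.
function splitLanguageLine(text) {
  const lines = text.split("\n");
  const match = /^\s*([A-Za-z][\w-]*):\s*$/.exec(lines[0]);
  if (match && KNOWN_LANGUAGES.has(match[1].toLowerCase())) {
    return {
      language: match[1].toLowerCase(),
      code: lines.slice(1).join("\n"),
    };
  }
  return { language: null, code: text };
}

// Remove a leading " 12:  " style prefix (number, colon, two spaces)
// from each line, leaving the code's own indentation intact.
function stripLineNumbers(code) {
  return code
    .split("\n")
    .map((line) => line.replace(/^\s*\d+:(?: {2})?/, ""))
    .join("\n");
}

const sample = [
  "applescript:",
  ' 1:  tell application "BBEdit"',
  " 2:    set myText to contents of front document",
  " 3:  end tell",
].join("\n");

const parsed = splitLanguageLine(sample);
console.log(parsed.language);
console.log(stripLineNumbers(parsed.code));
```

When no recognized declaration is present, `language` comes back as `null` and the block is left unchanged, which corresponds to the unhighlighted case.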

A script like this

    applescript:
     1:  tell application "BBEdit"
     2:    set myText to contents of front document
     3:    set myRef to do shell script "~/bin/bbstdin " & quoted form of myText & " | ~/bin/getreflink"
     4:    
     5:    if myRef is not "" then
     6:      if length of selection is 0 then
     7:        -- Add link with empty text and set the cursor between the brackets.
     8:        set curPt to characterOffset of selection
     9:        select insertion point before character curPt of front document
    10:        set selection to "[][" & myRef & "]"
    11:        select insertion point after character curPt of front document
    12:        
    13:      else
    14:        -- Turn selected text into link and put cursor after the reference.
    15:        add prefix and suffix of selection prefix "[" suffix "]" & "[" & myRef & "]"
    16:        select insertion point after last character of selection
    17:      end if
    18:    end if
    19:    
    20:  end tell

will now display like this,

applescript:
 1:  tell application "BBEdit"
 2:    set myText to contents of front document
 3:    set myRef to do shell script "~/bin/bbstdin " & quoted form of myText & " | ~/bin/getreflink"
 4:    
 5:    if myRef is not "" then
 6:      if length of selection is 0 then
 7:        -- Add link with empty text and set the cursor between the brackets.
 8:        set curPt to characterOffset of selection
 9:        select insertion point before character curPt of front document
10:        set selection to "[][" & myRef & "]"
11:        select insertion point after character curPt of front document
12:        
13:      else
14:        -- Turn selected text into link and put cursor after the reference.
15:        add prefix and suffix of selection prefix "[" suffix "]" & "[" & myRef & "]"
16:        select insertion point after last character of selection
17:      end if
18:    end if
19:    
20:  end tell

whereas code without the language line will look like this:

 1:  tell application "BBEdit"
 2:    set myText to contents of front document
 3:    set myRef to do shell script "~/bin/bbstdin " & quoted form of myText & " | ~/bin/getreflink"
 4:    
 5:    if myRef is not "" then
 6:      if length of selection is 0 then
 7:        -- Add link with empty text and set the cursor between the brackets.
 8:        set curPt to characterOffset of selection
 9:        select insertion point before character curPt of front document
10:        set selection to "[][" & myRef & "]"
11:        select insertion point after character curPt of front document
12:        
13:      else
14:        -- Turn selected text into link and put cursor after the reference.
15:        add prefix and suffix of selection prefix "[" suffix "]" & "[" & myRef & "]"
16:        select insertion point after last character of selection
17:      end if
18:    end if
19:    
20:  end tell

I’ve gone back and added the applescript: line to code in recent posts, but I probably won’t bother doing the same to older posts unless I happen to link to them and see that their AppleScript could use a little spiffing up. One of the nice things about dynamic syntax highlighting is that the highlighting will improve as I add new features to the definitions file.


4 Responses to “AppleScript syntax highlighting (finally)”

  1. has says:

    DIY syntax highlighters never work well for AppleScript since they don’t know how to tokenise it correctly due to white space being allowed in keywords, e.g. ‘do shell script’ is a single (osax-defined) keyword, but your JS highlights the ‘script’ portion to indicate a language keyword. Ditto the ‘and’ in ‘prefix and suffix’ further down. The ability of apps and osaxen to redefine existing terms to their own ends doesn’t help either.

    If you want accurate highlighting, you have to get the AppleScript interpreter to pretty-print it for you, and you should do this on the machine you compiled the script on to ensure you have all the required apps and osaxen available to decompile the script.

    e.g. Try:

    http://www.script-factory.net/software/ScriptEditor/AppleScriptHTML/en/

    Or, if you really want to roll your own, you can use the -[NSAppleScript richTextSource] category method provided by AppKit to retrieve an attributed string and munge that yourself.

  2. Dr. Drang says:

    has,
    In the tradeoff between getting accurate syntax highlighting and maintaining readable Markdown input in the blog database, I favor the latter. If I ever shift to a static blog, AppleScriptHTML—if it can be called from the command line—would be a good tool to call during the “baking” process.

    Strictly speaking, what you’ve said about AppleScript could be said about most languages. The language’s own tokenizer is the true arbiter of its syntax; everything else is an approximation. AppleScript may be more difficult than other languages for third-party highlighters to handle, but it isn’t unique.

    Perl, in particular, gives syntax highlighters the fits because it’s so flexible and context-sensitive. When Perl was my main language, I noticed that the more adept I got with Perl and the more of its features I used, the less likely it was for my editor to get the highlighting right.

  3. has says:

    There’s a difference between struggling with a baroque grammar and struggling with one that is intentionally incomplete. If a standalone Perl highlighter can’t pick out language keywords correctly, that could be addressed by writing a better parser with a more complete understanding of Perl’s grammar rules. At worst, you’d need to replicate Perl’s own parser in full, which wouldn’t be at all fun but would be possible in principle. (The sane option, of course, would just be to reuse Perl’s existing parser, although I don’t know if Perl has the ability to output an AST, which is what you need.)

    OTOH, it’s effectively impossible to write a remotely competent standalone AppleScript parser - or even run AppleScript’s own parser completely standalone - because AppleScript itself does not, and cannot, describe its own code semantics. Instead, many of its grammar rules (what’s a property name, what’s an element name, what’s a command name and what its parameter names are, etc.) are derived from external sources - app and osax dictionaries - so can be different for every script. To work reliably, you really have to format at the point of the script’s authoring, not at the point of the blog’s rendering, because even if you did implement a full AS parser (much simpler than writing a Perl parser, btw) there’s no guarantee the machine doing the rendering will have access to all - or even any - of the dictionary dependencies that were available to the machine that compiled the script. I mean, you could construct an external database of extracted SDEFs for it to look up, but I think the logistics of managing that would make parsing Perl look sensible. Far better just to go with AppleScript’s flow than fight its way of doing things every step of the way.

    If you don’t want the pre-tagged source code mucking up the rest of your article (which is not unreasonable), you could always stick each pre-formatted code chunk in a separate file/table and have your renderer merge it into the article prior to serving it (subject to your ability to rejig your blogging app’s rendering pipeline to perform the extra step). That’s how I’d do it.

  4. has says:

    BTW, handy tip for reading AS code in general: add underlining to the various ‘keyword’ styles in your editor’s preferences. Not so pretty, but it lets you see exactly where each token begins and ends, since the spaces within a multi-word keyword will be underlined as well.

    (It’s just a shame AS can’t do background colours too, as that’d be much nicer on the eye.)
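
The tokenizing problem raised in the first comment can be illustrated with a small sketch. The term lists below are made up for the example, and `markTerms` is a hypothetical helper, not part of Highlight.js. Matching single words alone flags the `script` in `do shell script`; adding the multi-word terms and trying longer alternatives first treats the phrase as one token. This only works for terms listed in advance, so it does not address the deeper point that the real terminology depends on the app and osax dictionaries available when the script was compiled.

```javascript
// Illustrative term lists only; real AppleScript terminology comes from the
// language plus whatever application and osax dictionaries are loaded.
const SINGLE_WORD_KEYWORDS = ["tell", "set", "to", "script", "and", "of", "end"];
const MULTI_WORD_TERMS = ["do shell script", "prefix and suffix", "quoted form"];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every occurrence of a term in angle brackets. Terms are sorted
// longest first so a multi-word term wins over the words inside it.
// String literals are not treated specially in this sketch.
function markTerms(line, terms) {
  const sorted = [...terms].sort((a, b) => b.length - a.length);
  const pattern = new RegExp(
    "\\b(?:" + sorted.map(escapeRegExp).join("|") + ")\\b",
    "g"
  );
  return line.replace(pattern, (term) => "<" + term + ">");
}

const line = "set myRef to do shell script cmd";

// Single words only: "script" is marked on its own.
console.log(markTerms(line, SINGLE_WORD_KEYWORDS));

// With multi-word terms included: "do shell script" is one token.
console.log(markTerms(line, SINGLE_WORD_KEYWORDS.concat(MULTI_WORD_TERMS)));
```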