# BBEdit's Extract and new feature blindness

When I use a piece of software for a long time, I tend to get comfortable with a certain set of features and often don’t bother to learn the latest stuff when a new release comes out. So it was with BBEdit’s Extract button. It came with Version 11, which was released all the way back in October of 2014, and yet I didn’t use it until this weekend. That was dumb. It helped me solve an RSS mystery yesterday, and now that I’ve seen what it does, I realize I could’ve been using it for a year and a half.

This has been the first weekend in a few months in which I didn’t work (or at least feel guilty about not working). So I had some time to catch up on personal projects. In addition to this improvement to my homemade RSS aggregator, I also started looking into the feed from Lance Mannion’s blog. It hasn’t appeared in the aggregator for a long time, even though I know he’s still writing. I decided to find out why by taking a detailed look at his feed.

Grabbing the feed was easy:

curl http://lancemannion.typepad.com/lance_mannion/rss.xml > mannion-rss.xml


Based on problems I’ve had with other feeds, I thought the publication dates of the entries were the likely culprit. I opened BBEdit’s Find window and typed in the search criterion you see in the screenshot above. Normally, I’d click the Find All button and work my way through the Search Results window that appears:

This is a perfectly usable result. It is, after all, what I’ve been doing for years without any complaints. But this time I decided to give the Extract button a try, which opened this window:

There’s nothing in this window that wasn’t in the Find All window, but I found it less cluttered and easier to focus on the results.1 More important, though, I thought about how many times in the past 18 months this would have been exactly what I wanted. Times when I really did want to pull out a subset of lines from a large file and create a new document from them. And I thought about how many other new features I’ve been blind to.
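Outside of BBEdit, the same idea (pull every match out of a big file and collect the matches in a new document) can be roughed out in a few lines of Python. This is only a sketch: the `<pubDate>` pattern is an assumption about what a reasonable search would look like, not the search from the Find window, and `extract_matches` and the sample feed text are made up for illustration.

```python
import re

def extract_matches(text, pattern):
    """Return every match of pattern in text, one match per line."""
    regex = re.compile(pattern)
    return "\n".join(m.group(0) for m in regex.finditer(text))

# Assumed pattern: whole <pubDate> elements in an RSS feed.
PUBDATE = r"<pubDate>.*?</pubDate>"

# Made-up sample standing in for the downloaded feed file.
sample = """<item><title>One</title>
<pubDate>Mon, 02 May 2016 10:00:00 GMT</pubDate></item>
<item><title>Two</title>
<pubDate>Tue, 03 May 2016 11:30:00 GMT</pubDate></item>"""

extracted = extract_matches(sample, PUBDATE)
print(extracted)
```

For a real feed, the text would come from reading the saved file (mannion-rss.xml in this case) instead of the inline sample.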

I need to go back and read those release notes.

1. The main result being that the publication dates are completely fucked. I have no idea what Typepad theme Lance is using, but it’s producing entries with dates that make no sense at all. Luckily, he also generates an Atom feed that includes both a publication date and an updated date. The Atom feed’s publication dates are just as wrong as the RSS feed’s, but the updated dates are correct. I’ve switched to checking the Atom feed and now have a special case in my aggregator to ignore the publication date of Lance’s entries and look only at the updated dates. Once again, I feel sympathy for Brent Simmons and understand why he got out of the aggregator business. ↩︎
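A special case like the one described in this footnote might look roughly like the sketch below. It is not the aggregator's actual code. The feed URL in `BAD_PUBDATE_FEEDS` is a placeholder, `entry_time` is a hypothetical helper, and the entry is treated as a plain dict with `published_parsed` and `updated_parsed` keys, which is how feedparser exposes parsed dates.

```python
import time

# Placeholder URL: feeds whose publication dates can't be trusted.
BAD_PUBDATE_FEEDS = {"http://example.com/atom.xml"}

def entry_time(feed_url, entry):
    """Pick the timestamp to sort an entry by.

    entry is a dict-like object that may have 'published_parsed'
    and 'updated_parsed' values (time.struct_time).
    """
    if feed_url in BAD_PUBDATE_FEEDS:
        # Ignore the publication date unless there is nothing else.
        return entry.get("updated_parsed") or entry.get("published_parsed")
    return entry.get("published_parsed") or entry.get("updated_parsed")

# Made-up entry with a nonsense publication date and a sane updated date.
entry = {
    "published_parsed": time.strptime("2011-01-01", "%Y-%m-%d"),
    "updated_parsed": time.strptime("2016-05-28", "%Y-%m-%d"),
}
```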

# Making feedparser more tolerant

My last post ended with a Monty Python video from YouTube. The video displayed and played perfectly here on ANIAT, and it probably did the same in professionally programmed feed readers, but it didn’t show up at all in my homemade RSS aggregator.

This isn’t unique to videos posted here. Regardless of the site, embedded videos almost never appear in the aggregator. In the six months or so that I’ve been using it, I’ve gotten around this problem by simply clicking a link and going to the original article. But I had a little time today and decided to dig in and fix the problem. It required only one new line of code.

My aggregator is written in Python, and it uses the feedparser library to intelligently parse all the many types of feeds and put them into a common, clean, and simple format that’s easy to work with. As it happens, though, feedparser sanitizes the HTML content of a feed to reduce the possibility of security risks getting through. Some of the HTML elements it filters out are elements commonly used for embedding videos.

The solution came from reading this older post at Rumproarious and the answer to this Stack Overflow question. You can add to feedparser’s set of acceptable elements early in the script and they’ll make it through the parsing step without being sanitized away.

fp._HTMLSanitizer.acceptable_elements |= {'object', 'embed', 'iframe'}


This adds three HTML elements to feedparser’s whitelist. Because acceptable_elements is a set, the addition is really a union, hence the |=.
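To see why a single line is enough, here is a small stand-alone sketch that doesn't touch feedparser at all. The `Sanitizer` class and its three starting elements are stand-ins invented for the example. The point is only that `|=` updates a class-level set in place, so every instance created afterward sees the new elements.

```python
class Sanitizer:
    # Stand-in whitelist, not feedparser's real one.
    acceptable_elements = {"a", "p", "img"}

    def allows(self, tag):
        """True if the tag would survive sanitizing."""
        return tag in self.acceptable_elements

before = Sanitizer().allows("iframe")

# In-place set union on the class attribute, as in the line above.
Sanitizer.acceptable_elements |= {"object", "embed", "iframe"}

after = Sanitizer().allows("iframe")
print(before, after)
```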

Now feeds look the way they should in the aggregator.

There is some risk to this, of course, but I’m trusting that those of you whose feeds I subscribe to won’t start adding malicious code to your feeds.

One of the things I dislike about a lot of Mac workflows written by others is that they rely on the user putting some critical information on the clipboard before the workflow is invoked. Something I dislike more is when I do the same thing, even though I know better.

I sometimes include YouTube videos here, and I like to have the embedded video centered. YouTube makes the embedding code easy to get at: click the Share button (the one with the swoopy arrow next to it), then the Embed button, and the embedding code will appear, already selected for you to copy and paste into the source code of your web site.

For the example video, the embedding code is

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/npjOSLCR2hE?showinfo=0" frameborder="0" allowfullscreen></iframe>


(The width and height may differ from one person to another, depending on preferences you set, as will other settings like “privacy-enhanced mode.” These settings can be changed by clicking the “Show More” button below the video and will persist if you’re signed in.)

Since I like to center the videos, I created a center CSS class in my style sheet and add that to the embedding code. So the code I’d embed here is

<iframe class="center" width="560" height="315" src="https://www.youtube-nocookie.com/embed/npjOSLCR2hE?showinfo=0" frameborder="0" allowfullscreen></iframe>


After doing this by hand several times, I decided to automate the process. But my first pass at the automation wasn’t very smart. It assumed I had copied the embedding code from YouTube onto the clipboard, and it simply added the class="center" part through a search-and-replace operation.

This was foolish, exactly the sort of thing that bothers me when I see it in other people’s workflows. There’s no reason to do the clicking and copying on the YouTube page. Everything in the embedding code is boilerplate except the ID string of the video itself (npjOSLCR2hE in the example), and that can be extracted from the URL of the page through AppleScript—no need for me to do any clicking or copying.

Here’s the Keyboard Maestro macro that does the work:

The first step is an AppleScript one-liner that gets the URL of the frontmost Safari document. If you use Chrome, you’ll have to change the AppleScript to something like

tell application "Google Chrome" to get the URL of active tab of front window


The URL is put into the variable YTID, and the second step does a regular expression search-and-replace to leave YTID with just the video’s ID string. In a YouTube URL, the ID string is the value of the parameter v. The macro’s second step uses a find regex of

.+[?&]v=([^&]+).*


to capture the v parameter. The URL is then replaced with just the captured ID string.
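The same regex can be tried out in Python to check that it behaves as described. This is a sketch outside of Keyboard Maestro: `video_id` is a hypothetical helper, and it returns `None` for URLs with no v parameter, a case the macro doesn't handle.

```python
import re

# The find regex from the macro's second step.
YT_ID = re.compile(r".+[?&]v=([^&]+).*")

def video_id(url):
    """Return the value of the v parameter, or None if there isn't one."""
    m = YT_ID.fullmatch(url)
    return m.group(1) if m else None

print(video_id("https://www.youtube.com/watch?v=npjOSLCR2hE"))
```

Because the character class `[^&]+` stops at the next ampersand, any parameters that follow v are dropped along with the rest of the URL.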

After we have the ID string, it’s a simple matter to spit out the boilerplate code with the ID string in its proper place. Because this is a relatively long string, I use “Insert text by pasting” instead of “Insert text by typing.”1 This leaves the embedding code on the clipboard, which is messy, so the final step deletes it, putting the clipboard back to what it was before the macro was invoked.
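The boilerplate step amounts to dropping the ID into a fixed template. A Python sketch of that step, using the centered embedding code shown earlier, could look like this. The names `EMBED` and `embed_code` are made up, and the width, height, and privacy-enhanced domain are just the settings from the example and may differ for other accounts.

```python
# Template copied from the centered example; only the ID varies.
EMBED = ('<iframe class="center" width="560" height="315" '
         'src="https://www.youtube-nocookie.com/embed/{vid}?showinfo=0" '
         'frameborder="0" allowfullscreen></iframe>')

def embed_code(vid):
    """Return the centered embedding code for a video ID."""
    return EMBED.format(vid=vid)

print(embed_code("npjOSLCR2hE"))
```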

Using this macro does require some preparation—the video you want to embed must be the frontmost tab of the frontmost Safari window. But this is usually the case. Even when it isn’t, a single ⌘-click on the video’s tab will put Safari in the proper state without leaving your text editor. Either way, it’s less work than clicking around in Safari and using copy/paste.

Does the macro work? Let’s see.

Update 5/28/2016 6:28 PM
Ed Cormany had a good suggestion for a more native version of the first step of the macro:

@tjluoma @drdrang @keyboardmaestro ah, i found it! not a Safari action, but it’s a global token.

Ed Cormany (@ecormany) May 28 2016 5:14 PM

Like Ed, I had looked in Keyboard Maestro’s set of Safari actions to find one that grabbed the URL of the current Safari page but couldn’t find it. Unlike Ed, I didn’t think to look through the built-in variables. Now that I see it, though, it makes perfect sense—the frontmost URL is just a string, more appropriate for a variable than an action.

So if you’re a Safari user, as I am, you can use this for the first step instead of the AppleScript:

That’s what I’ve done on my computers, and I’ve changed the downloadable file, too. If you’re a Chrome user, there’s a similar built-in variable to get the URL of its current page.

1. I generally prefer Keyboard Maestro to insert text via simulated typing even though it’s slower, because some text fields don’t allow pasting. In this case, pasting won’t be a problem because I’ll be using the macro from within a text editor, where pasting is always allowed. ↩︎

# Path expansion in LaTeX

On Linux, your home directory is typically /home/username; on OS X, it’s /Users/username. If you work on different operating systems, or if you have different usernames on different computers, and you want certain things to work the same way across all your machines, these differences can lead to small annoyances. Recently, I learned a way to eliminate one of those annoyances when working in LaTeX.

On both of my computers, I have a PDF of my signature stored in a file called, cleverly enough, signature.pdf. It’s in the same subdirectory of my home directory on both machines, but because I use different usernames on these machines,1 it’s /Users/name/graphics/signature.pdf on my office computer and /Users/drdrang/graphics/signature.pdf on my notebook.

In shell scripts that access the signature file, this difference poses no problems. The file can be addressed as ~/graphics/signature.pdf on both computers, because the tilde expands to the home directory. Similarly, in Python scripts, it’s os.environ['HOME'] + '/graphics/signature.pdf' on both computers.
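Spelled out, the Python version might be something like the following sketch. `signature_path` is a hypothetical helper, and it takes the environment as an optional argument only so it can be tried with either username without switching machines. With no argument, it reads the real environment.

```python
import os
import posixpath

def signature_path(env=None):
    """Build the signature file's path from the HOME environment variable."""
    env = os.environ if env is None else env
    return posixpath.join(env["HOME"], "graphics", "signature.pdf")

# Same code, two different home directories.
notebook = signature_path({"HOME": "/Users/drdrang"})
office = signature_path({"HOME": "/Users/name"})
print(notebook)
print(office)
```

The standard library's `os.path.expanduser("~/graphics/signature.pdf")` is another way to get the same result and is closer in spirit to the shell's tilde.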

The problem is in LaTeX files. I have a signature macro defined this way on my MacBook Air:

\signature{\vspace{-.625in}\hspace{-.125in}\includegraphics{/Users/drdrang/graphics/signature.pdf}\\
\vspace{-.125in}My Real Name}


Because it uses an absolute path, it has to be defined slightly differently on my iMac at the office. It would be nice if I could define it the same way on both machines, but

\signature{\vspace{-.625in}\hspace{-.125in}\includegraphics{~/graphics/signature.pdf}\\
\vspace{-.125in}My Real Name}


doesn’t work because the tilde has special meaning (nonbreaking space) in LaTeX. You might think escaping the tilde with a backslash (\~) would fix the problem, but it doesn’t.

The solution, which I found in an answer to this Stack Exchange question, is to use the primitive \string command to get the underlying TeX engine to treat the tilde literally rather than as a special character. My signature command is now

\signature{\vspace{-.625in}\hspace{-.125in}\includegraphics{\string~/graphics/signature.pdf}\\
\vspace{-.125in}My Real Name}


Kpathsea, the system TeX and LaTeX use for searching paths, understands the tilde, so now I have just one signature definition on both computers.

Similarly, the LaTeX code for other graphic elements that I commonly use in reports—the company logo, for example—has been unified across machines, and I no longer have to keep two slightly different versions in sync with one another.

1. Why have different usernames? Mainly because I do all my blogging from my MacBook Air, and it’s generally easier to maintain my pseudonymity if my username is drdrang on that machine. On my iMac at the office, it’s easier—and less weird looking—to have my real name as my username. Of course, I often do real work from my MacBook Air, and that’s where the annoyance described in this post arises.