2020 and the return of the HPDA

If you read yesterday’s post, you may have noticed this tin can filled with index cards on my desk.

Tea can with index cards

The tea was a gift from my daughter and is long gone, but the can it came in seemed too nice to throw away. So I kept it around, figuring I’d eventually find a use for it. I didn’t think that use would take me back 15 years.

I’ve been one of the lucky people who’ve been able to work throughout the pandemic, and I’ve been able to work safely from home most of the time. My trips to the office have been mainly for dealing with bills and the mail and for shuttling books and file material back and forth from home. My time there has generally been limited to a couple of visits a week for no more than a few hours.

So when I’m at the office, I typically want to whip through a short set of tasks as quickly as I can and leave. At some point in late spring, I found myself writing a checklist on an index card and stuffing it in my back pocket before I left for work. While at the office, I’d check off the tasks; when there was nothing left to check, it was time to go home.

It struck me that when I needed to be efficient and complete, I went from digital to analog. Instead of pulling out my phone, which has the entire world in it, it was better to pull out that one card with just what I needed and nothing more. I had rediscovered the focus that comes with a stripped-down version of Merlin Mann’s Hipster PDA.1

The completely stripped-down HPDA didn’t last long. Sometimes I learned things at work that needed to be acted upon when I got home, and it wasn’t always easy to write them down on a single flimsy index card. So I retrieved my Levenger Pocket Briefcase from a back shelf in my closet and began using it as a writing surface and to hold a few extra cards.

Levenger pocket briefcase with task list card

As you can see in the photo, the handwritten task list gave way to a printed one. I started using the HPDA at home as well as at work, so the tasks it contained needed to be expanded and organized by project. But even with the expansion, it’s still just one card (sometimes the flip side, too, but I try to avoid that) where I can immediately see the things I need to do and mark them off as I do them. A new card is printed every morning, and the items that were checked off during the day are transferred to my task manager every evening.

That task manager is Things, which I’ve been using to track my work across Mac, iPad, and iPhone for a couple of years. It syncs well, it looks good, and it fits the way I think. The formatting and printing are done through a few scripts and configuration files repurposed from this setup that I put together about 10 years ago.

I’ll detail those scripts and configuration files in a later post. This post was for thinking about how 2020 taught me that what I was doing in 2005 and 2010 is still worthwhile and probably shouldn’t have been abandoned. And to explain why I have a tin can full of index cards on my desk.

  1. Remember when he was Merlin Mann? 

Myself and a shelf

David Sparks often talks about getting his external SSDs out of the way by velcroing them to the underside of his desk. I like the idea but have always been afraid to try it because, unlike David, I’m still using spinning hard disks for external storage. I worry that the weight and vibration would work them loose eventually.

But this week I got both of my external disks—one for Time Machine and the other for a nightly backup—off the top of my desk and down underneath it. I went with a more prosaic solution: a shelf.

Here’s my home desk:

Home desk

I bought it back in the late 90s when CRTs were still a thing. Amazingly, the center opening of the hutch is just wide enough for a 27″ iMac. Until this week, the hard disks were tucked back behind the Mac. I could sometimes hear the chik-chik-chik when they were reading or writing, and at night I could see the area under the Mac flash when their lights blinked.

I bought this set of shelves from Amazon (yes, that’s an affiliate link) to mount under the desk. There are three coated steel shelves in the set: one 17″ long, one 13″ long, and one 7″ long. They’re all 5″ wide and have a lip along the outstanding edge.

After looking over the possibilities, I decided to mount the 17″ shelf to the far side of the modesty panel that runs from side to side. I crept under the desk on my back and got to work.

First, I stuck a magnetic level to the underside of the shelf and held it up against the modesty panel to mark the holes with a pencil:

Positioning shelf

This view is looking up from the floor. My head is basically up against the wall behind the desk. I should also mention that what you see in this photo is the 13″ shelf, even though I wanted to install the 17″ one. I realized I had grabbed the wrong shelf after marking the holes. I hauled myself out from under the desk to get the right shelf and repeated the process but didn’t take new photos.

After using an awl to punch starter divots into the panel at the pencil marks, I drilled pilot holes for the mounting screws. I was worried the drill wouldn’t fit between the panel and the wall, but it did. Like the iMac in the hutch, it had a fraction of an inch to spare.

Drilling pilot holes

I like to think I held the drill more square to the panel when I wasn’t using one hand to take a photo. I had just enough foresight to shift my head off to the side before drilling so the sawdust didn’t fall into my eyes and mouth.

After installing the shelf with three screws, I rechecked the level and was pleasantly surprised.

Final check of level

I unmounted the hard disks, rerouted the cables, and set them up on the shelf. If the shelf had been too narrow for the larger disk (it wasn’t supposed to be, but who knows how accurate an unknown vendor’s reported dimensions are?), I could have flipped it on its side.

Disks on shelf

The disks have never been loud, but now that they’re under the desk and on the other side of the modesty panel, I don’t hear them at all. And their blinking lights are well out of my sight.

I haven’t decided what to do with the other two shelves. There’s more room on the modesty panel if I need more hidden space. I could also put them on the side panels for things that need to be handy. The one thing I don’t like about this desk is its lack of drawers.

The $18 I spent on the shelves is undoubtedly more than David spent on his velcro, but peace of mind is worth something.

Fake survey entries, with and without typos

Here’s a quick one on how I made the synthetic data files used in yesterday’s post.

First, I got the canonical iMac color names from this post by Stephen Hackett. I figured Stephen was about as authoritative a source as any on the subject. I copied all the text of the post, pasted it into a BBEdit document, and used some of the techniques discussed yesterday (how meta!) to boil it down to just the list of thirteen colors.

The list became part of this script:

 1:  #!/usr/bin/env python
 2:
 3:  import random
 4:
 5:  # Initialize
 6:  numEntries = 100
 7:  numTypos = 15
 8:  colors = ['Bondi Blue', 'Blueberry', 'Lime', 'Tangerine',
 9:            'Strawberry', 'Grape', 'Graphite', 'Sage', 'Ruby',
10:            'Indigo', 'Snow', 'Blue Dalmation', 'Flower Power']
11:  wgts = [3, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, .5, .5]
12:
13:  # Make list of random iMac colors
14:  canEntries = random.choices(colors, k=numEntries, weights=wgts)
15:
16:  # Write list to file, one per line
17:  ctext = '\n'.join(canEntries)
18:  with open('cancolors.txt', 'w') as f:
19:    f.write(ctext)
20:
21:  # Misspell a few of the entries
22:  badEntries = canEntries[:]
23:  alphabet = [ chr(97+i) for i in range(26) ]
24:  typos = [ random.randrange(numEntries) for i in range(numTypos) ]
25:  for e in typos:
26:    color = badEntries[e]
27:    letter = random.randrange(1, len(color))
28:    color = color[:letter] + random.choice(alphabet) + color[letter+1:]
29:    badEntries[e] = color
30:
31:  # Write misspelled list to file, one per line
32:  ctext = '\n'.join(badEntries)
33:  with open('colors.txt', 'w') as f:
34:    f.write(ctext)

The first section of the script initializes the variables used later. In addition to the list of colors, there’s the number of entries we’ll generate, the number of errors we’ll introduce, and a set of weights we’ll use for the random selection of colors.

The wgts variable created in Line 11 is a list of relative likelihoods for each color. It’s parallel to the colors list, so you can see that Bondi Blue is three times as likely to be chosen as Blueberry, Tangerine is twice as likely, Flower Power is half as likely, and so on. The wgts variable is like a probability mass function for the colors, except that it isn’t normalized to a sum of one. There wasn’t really a need to weight the colors, but I wanted the output to look realistic, and certainly some iMac colors were more popular than others. The weights weren’t based on any data, just my arbitrary choices with a little boost given to Bondi Blue because it was the original.
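To see what the weights mean as probabilities, here's a short sketch that divides them by their sum (19) to get a proper probability mass function and then multiplies by numEntries to get the expected count of each color in a 100-entry list. The names pmf and expected are just for illustration; they aren't in the script.

```python
# Relative weights from the script, parallel to the colors list
colors = ['Bondi Blue', 'Blueberry', 'Lime', 'Tangerine',
          'Strawberry', 'Grape', 'Graphite', 'Sage', 'Ruby',
          'Indigo', 'Snow', 'Blue Dalmation', 'Flower Power']
wgts = [3, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2, .5, .5]

# Dividing by the total normalizes the weights so they sum to one
total = sum(wgts)
pmf = {c: w/total for c, w in zip(colors, wgts)}

# Expected number of each color in a list of numEntries random picks
numEntries = 100
expected = {c: numEntries*p for c, p in pmf.items()}

print(f"{total} {pmf['Bondi Blue']:.3f} {expected['Bondi Blue']:.1f}")   # 19.0 0.158 15.8
```

Any single run will scatter around these expected values, which is why a random list won't match them exactly.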

The next section generates a list of colors through a random selection process. Line 14 uses the choices function from Python’s random module to generate a list of numEntries colors. The list is then joined together with linefeed characters and written out to the cancolors.txt file.

The next section makes a new list of entries with typos. Line 22 copies the list of properly spelled entries into a new list that we’ll add typos to. Line 23 creates a list of lowercase letters from which we’ll choose at random to make the typos, and Line 24 generates a random list of numTypos integers that represent the entries we’ll be messing up.

The loop in Lines 25–29 replaces random letters at random locations. The index of the letter to be replaced is chosen in Line 27 using the randrange function. Note that the random index chosen starts at 1 rather than 0. I did this because I thought it was more realistic for typos to come after the first character. Line 28 then swaps out the character at that index for a randomly chosen letter. Finally, Line 29 puts the misspelled word back in the list of entries with typos. When the loop is done, the list is joined together with linefeeds and written out to the colors.txt file.
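One quirk of the loop: it can produce slightly fewer than numTypos misspellings, because randrange in Line 24 can pick the same entry twice and random.choice in Line 28 can pick the letter that's already there. If an exact count matters, here's a variation (a sketch; misspell and add_typos are hypothetical names) that draws distinct entries with random.sample and never substitutes a letter for itself. It uses string.ascii_lowercase, which holds the same 26 letters Line 23 builds with chr, and a seeded random.Random so the output is repeatable.

```python
import random
import string

def misspell(word, rng):
    """Replace one character after the first with a different lowercase letter."""
    pos = rng.randrange(1, len(word))      # skip index 0, as the original script does
    options = [c for c in string.ascii_lowercase if c != word[pos]]
    return word[:pos] + rng.choice(options) + word[pos+1:]

def add_typos(entries, numTypos, seed=None):
    """Return a copy of entries with exactly numTypos distinct entries misspelled."""
    rng = random.Random(seed)
    bad = entries[:]                       # leave the original list untouched
    for i in rng.sample(range(len(entries)), numTypos):   # no repeated indices
        bad[i] = misspell(bad[i], rng)
    return bad

good = ['Bondi Blue', 'Lime', 'Tangerine', 'Ruby'] * 25
bad = add_typos(good, 15, seed=1)
print(sum(g != b for g, b in zip(good, bad)))   # 15
```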

I should note here that my Macs have a recent version of Python 3 (installed via Anaconda) and that my PATH environment variable is such that the Anaconda Python is the default. If you want to play with this script, you may need to change the shebang line to

#!/usr/bin/env python3

or do something else to make sure you’re running it through Python 3 instead of Python 2. The random module in Python 2 doesn’t have a choices function (nor do versions of Python 3 before 3.6).
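If you're stuck with a Python 3 older than 3.6, a weighted picker can be built from running totals of the weights and a binary search, which is roughly how CPython's own choices handles weights. This is a sketch under that assumption, and weighted_choices is a hypothetical name; in the script it would stand in for Line 14 as weighted_choices(colors, wgts, numEntries).

```python
import bisect
import itertools
import random

def weighted_choices(population, weights, k):
    """Pick k items with replacement, each with probability proportional to its weight."""
    cum = list(itertools.accumulate(weights))   # running totals of the weights
    total = cum[-1]
    hi = len(cum) - 1                           # guard against float round-off at the top end
    # A uniform draw on [0, total) lands in exactly one item's slice of the running totals
    return [population[bisect.bisect(cum, random.random() * total, 0, hi)]
            for _ in range(k)]

colors = ['Bondi Blue', 'Blueberry', 'Lime']
print(weighted_choices(colors, [3, 1, 1], 5))
```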

As is usually the case, this little script took longer to explain than it did to write.

A few text utilities

If anything can stir me from a blogging lull, it’s a Jason Snell post about one of my favorite programs, BBEdit. And when responding to his post allows me to talk about Unix text processing utilities along with BBEdit, well, strap in.

Jason uses a particular example—collecting and counting listener survey responses to use on The Incomparable’s Family Feud-style Game Show episodes—to explain some of BBEdit’s powerful text filtering commands. He extracts, counts, sorts, and reformats, all within the friendly confines of BBEdit, where every step you take is both visible (to check if you’ve done it right) and reversible (in case you haven’t).

One of the cleverest things Jason does, which I think he undersells, is nibble away at the dataset as he processes it. In this screenshot

BBEdit processing from Six Colors

we see that “Delete matched lines” is checked. By deleting each set of entries as he finds them, he makes it easier to develop the criteria for finding the next set. And with “Copy to clipboard” checked, he hasn’t lost the entries he’s just found—they’re ready to be pasted into a new document for checking and counting. This nibbling technique is one I’ve never used but will keep in mind the next time I’m faced with this type of problem.
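The nibbling idea isn't tied to BBEdit. Here's a minimal Python sketch of it, assuming each set of entries can be described by a regular expression: the matching lines are collected (the "Copy to clipboard" step) and dropped from the pool (the "Delete matched lines" step) before the next pattern is tried. The function name, the patterns, and the sample responses are all made up for illustration.

```python
import re

def nibble(lines, patterns):
    """Pull out lines matching each (label, regex) pair in turn, removing them as we go.

    Returns a dict of label -> matched lines, plus whatever is left over.
    """
    remaining = list(lines)
    groups = {}
    for label, pattern in patterns:
        rx = re.compile(pattern, re.IGNORECASE)
        groups[label] = [ln for ln in remaining if rx.search(ln)]        # keep the matches
        remaining = [ln for ln in remaining if not rx.search(ln)]        # shrink the pool
    return groups, remaining

responses = ['Bondi blue', 'bondi', 'Tangerine', 'tangerene', 'Graphite', 'Snow']
groups, left = nibble(responses, [('Bondi Blue', r'bondi'), ('Tangerine', r'^tang')])
for label, found in groups.items():
    print(label, len(found))
print('left:', left)
```

Because each pass only sees what earlier passes left behind, the later patterns can be looser without grabbing lines that were already counted.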

Most of the text analysis I do is, thankfully, easier than Jason’s. When I’m counting up lines of text based on certain criteria, I’m usually working with machine-made text: log files, EXIF data from a folder of photos, output from a data acquisition system. These files don’t have misspellings, typos, or inconsistent capitalization and can usually be processed in one fell swoop. My model for processing this kind of data is Doug McIlroy’s famous six-line shell script.

To make an example that parallels the data in Jason’s post, I generated a 100-line text file with random selections from the thirteen colors of the iMac G3. The first ten lines look like this:

Blue Dalmation
Bondi Blue

I named the file cancolors.txt because all the colors are in their canonical form, with no deviations in spelling or capitalization.

Following McIlroy’s lead, I can count the entries and sort by popularity with this simple shell pipeline,

sort cancolors.txt | uniq -c | sort -nr

which gives me this output:

  16 Bondi Blue
  13 Tangerine
  12 Graphite
  11 Snow
  11 Ruby
   8 Indigo
   7 Sage
   5 Strawberry
   5 Blueberry
   4 Grape
   3 Lime
   3 Flower Power
   2 Blue Dalmation

(When I said the colors were chosen randomly, I didn’t mean they were chosen randomly with the same probability. I’ll write a short post later about how I generated the list and weighted the entries.)

The first sort command reads the lines of cancolors.txt and sorts them in alphabetical order (without changing cancolors.txt itself). The sorted lines are piped to uniq, which reduces identical adjacent lines to single lines and prints them. The -c option tells uniq to precede each line with the number of times it was repeated.1 Finally, the second sort sorts the output of uniq, where the -n option tells it to sort numerically instead of alphabetically, and the -r option tells it to reverse the order.
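For comparison, here's the same tally as a Python sketch using collections.Counter (count_sorted is a hypothetical name). The sort key is chosen to match the output above, where equal counts (Snow and Ruby, Strawberry and Blueberry) come out in reverse alphabetical order: when the numbers tie, sort falls back to comparing whole lines, and -r reverses that comparison too.

```python
from collections import Counter

def count_sorted(lines):
    """Tally identical lines and order them like `sort | uniq -c | sort -nr`."""
    counts = Counter(lines)
    # Highest count first; ties broken by reverse alphabetical order of the line
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

# To use the real file: lines = open('cancolors.txt').read().splitlines()
sample = ['Bondi Blue', 'Snow', 'Ruby', 'Bondi Blue', 'Snow', 'Ruby', 'Lime']
for color, n in count_sorted(sample):
    print(f'{n:4d} {color}')
```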

Depending on what I need to do, I could use this output as is or process it further. If I wanted to pop it into two columns of a spreadsheet, as Jason does, I could either put the output into BBEdit and process it there through Find/Replace or just continue the shell pipeline:

sort cancolors.txt | uniq -c | sort -nr | perl -ple 's/^ +(\d+) (.+)/$2\t$1/' | pbcopy

The perl command gets rid of leading spaces, reverses the order of the number and the color, and puts a tab character between them. The pbcopy puts the result onto the clipboard, ready for pasting into a spreadsheet.
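The perl substitution carries over directly to Python's re.sub, with \2 and \1 as the backreferences. A sketch that handles one line of uniq -c output (swap_count is a hypothetical name):

```python
import re

def swap_count(line):
    """Turn a `uniq -c` line like '  16 Bondi Blue' into 'Bondi Blue<TAB>16'."""
    # Same pattern as the perl one-liner: leading spaces, the count, one space, the rest
    return re.sub(r'^ +(\d+) (.+)', r'\2\t\1', line)

print(swap_count('  16 Bondi Blue'))   # prints Bondi Blue, a tab, then 16
```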

This is undoubtedly much faster than Jason’s method, but that’s only because the color names were in canonical form before I started. What if the dataset comes from humans instead of machines and contains inconsistencies?

I do often get that kind of data at work. Usually, these are inspection notes, where different inspectors—including me—use inconsistent terms and capitalization while reviewing or measuring different parts of a building or machine. In these cases, I usually correct the data before processing, creating a new input file with consistent spelling and capitalization that I can process as above. The tools I use to correct the data depend on the situation but usually include some combination of command-line utilities, BBEdit, and Numbers.

As an example, I’ve introduced random typos into the file of color names. This noncanonical file is called colors.txt and has lines like this:

Bondi Blue
Bindi Blue
Bondi Blue

If there are just a few typos, I can open this file in BBEdit, sort the lines, and scroll through it. When the file is sorted, typos are usually easy to find and fix. A way to see all the mistakes at once is to use this pipeline at the Terminal,

sort colors.txt | uniq

which will give an output like this:

Bindi Blue
Blue Dalmation
Bondi Blue
Flower Power

This reduces the correct spellings down to a single line and shows all the typos, usually adjacent to the correct spellings.
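In Python, sorted(set(...)) gives the same list of distinct lines as sort | uniq. One caveat: Python orders strings by code point, while sort follows the locale's collation rules, so entries that differ in capitalization may not land in the same order. A sketch using lines like the ones above (distinct_sorted is a hypothetical name):

```python
def distinct_sorted(lines):
    """One copy of each distinct line, in sorted order, like `sort | uniq`."""
    return sorted(set(lines))

entries = ['Bondi Blue', 'Bindi Blue', 'Bondi Blue', 'Flower Power', 'Blue Dalmation']
for line in distinct_sorted(entries):
    print(line)
```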

When there are more than a handful of typos, I usually copy all the sorted lines from BBEdit and paste them into an empty spreadsheet in Numbers for editing.2 The advantage of making the corrections in Numbers is that it’s easy to select several rows at once—including both correct and incorrect spellings—and enter the correct spelling with a single Paste. With the corrections made, I paste the text back into BBEdit.

I should mention that recent versions of BBEdit have made correcting several lines at once easier by allowing rectangular selection even when Soft Wrap is turned on (my default). The Move Line Up and Move Line Down commands are also very helpful in bringing lines together for correction in a single step. Still, I tend to use Numbers when there are lots of corrections to make; it’s a longstanding habit.3

I save the corrected data to a new file and run the McIlroy-style pipeline on it. This is the kind of approach to a problem you learn in math class: reduce the new problem to one that’s already been solved. I don’t know if it ends up being faster than what Jason does, but it’s what I’m comfortable with.
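The same reduction can be scripted once the corrections are known. Here's a sketch, assuming a hypothetical table that maps each misspelling to its canonical spelling (the sort of pairs that come out of a cleanup in Numbers). After the table is applied, counting is the already-solved problem.

```python
from collections import Counter

# Hypothetical correction table: misspelling -> canonical spelling
corrections = {
    'Bindi Blue': 'Bondi Blue',
    'Tangerime': 'Tangerine',
}

def canonicalize(lines, corrections):
    """Replace known misspellings; lines not in the table pass through unchanged."""
    return [corrections.get(ln, ln) for ln in lines]

raw = ['Bondi Blue', 'Bindi Blue', 'Tangerine', 'Tangerime', 'Snow']
fixed = canonicalize(raw, corrections)
print(Counter(fixed).most_common())
```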

  1. The -c means “count.” I can’t tell you how many times I’ve mistakenly written the command with an -n (for “number”) option. Fortunately, there is no -n option, and I get an error instead of an incorrect result. 

  2. Excel would be fine for this, too, but since Excel takes a day and a half to open, I usually use Numbers. 

  3. Bouncing back and forth between a text editor and a spreadsheet is hardly original with me. Although I didn’t learn it from them, John Gruber and Jason talked about doing this in an episode of The Talk Show a few years ago. I’d give you a link, but I couldn’t identify the show from the descriptions in the episode list.