Two simple things

There’s a time for powerful, complex programs and there’s a time for simple little utilities. I had two reminders this week that speedy little programs have a power of their own.

I was on the phone with a client on Tuesday, discussing the failure of a piece of equipment back in 2011. We had a set of records about the failure, some of which included unambiguous dates, like April 27, and others that referred to events by the day of the week. My first instinct to reconcile the two was to bring up Fantastical and flip back to get a calendar for April 2011. That certainly would have worked, but there would have been a lot of clicking or arrow key pushing to get back that far. And, of course, Fantastical fades away when you click on any other window and will force you to scroll back in time again if you need to have another look.[1]

The obvious choice (which was obvious only because my mouse passed over an open Terminal window as it made its way up to the Fantastical icon in the menubar) was cal, the venerable Unix utility[2] that does nothing more than display calendars. I tapped on the Terminal window to bring it forward and typed

cal 4 2011

The immediate response was

     April 2011
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

and I was able to resolve the two types of dates without any interruption of the phone call. My client probably thought I was prepared.
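If cal isn't handy, the weekday of a single date can also be worked out with plain shell arithmetic. The sketch below is a hypothetical helper, `weekday`, using Sakamoto's day-of-week method for the Gregorian calendar; it is not part of cal and is only meant to show the same date-to-weekday reconciliation in code.

```sh
#!/bin/sh
# weekday YEAR MONTH DAY -> three-letter day name (Gregorian calendar).
# Hypothetical helper based on Sakamoto's day-of-week method.
# Pass MONTH and DAY without leading zeros (08 and 09 would be read as octal).
weekday() {
  y=$1 m=$2 d=$3
  # Month offsets for Jan..Dec; shift so that $1 holds this month's offset.
  set -- 0 3 2 5 0 3 5 1 4 6 2 4
  shift $((m - 1))
  t=$1
  # January and February are counted as months of the previous year.
  if [ "$m" -lt 3 ]; then
    y=$((y - 1))
  fi
  n=$(( (y + y/4 - y/100 + y/400 + t + d) % 7 ))   # 0 = Sunday
  set -- Sun Mon Tue Wed Thu Fri Sat
  shift "$n"
  echo "$1"
}

weekday 2011 4 27   # → Wed, matching the cal output above
```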

This morning I was going through the data files of a set of tests run by someone else at my company. The files had a few columns of data that I had no use for and were missing a column of time stamps. I happened to know that the data were collected at 20-second intervals, so I wrote a one-liner that put a column of times in the clipboard:

perl -e 'for($i=0;$i<=900;$i+=20){print "$i\n"}' | pbcopy
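For comparison, here is a minimal sketch of the same column of times written as a plain POSIX shell loop, with no Perl. The function name `time_column` is hypothetical. On a Mac its output could be piped to pbcopy just like the one-liner's; where GNU seq is installed, `seq 0 20 900` prints the same column.

```sh
#!/bin/sh
# time_column STEP LAST -> prints 0, STEP, 2*STEP, ... up to LAST, one per line.
# Hypothetical helper doing the same job as the Perl one-liner.
time_column() {
  step=$1 last=$2
  i=0
  while [ "$i" -le "$last" ]; do
    echo "$i"
    i=$((i + step))
  done
}

time_column 20 900 | head -n 3   # first three values: 0, 20, 40
```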

Putting that into the first column of the data was the perfect job for paste, and getting rid of the unnecessary columns was right up cut’s alley. The pipeline went this way:

pbpaste | paste - data-raw.txt | cut -f 1,5- > data-cleaned.txt

The - as the first argument of paste told it to read the column of times from standard input (i.e., from pbpaste) and put it before the other columns in data-raw.txt. The -f 1,5- option to cut told it to pull out columns 1, 5, 6, 7, and so on (i.e., everything other than columns 2–4) and send them to data-cleaned.txt.
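To see what each stage does, here is a small sketch run on made-up data: a hypothetical five-column data-raw.txt with columns a through e, and printf standing in for pbpaste. After paste, the times are field 1 and the original columns a, b, c are fields 2–4, so `cut -f 1,5-` keeps the times plus columns d and e.

```sh
#!/bin/sh
# Work in a scratch directory so a real data-raw.txt isn't overwritten.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical stand-in for data-raw.txt: five tab-separated columns a..e.
printf 'a1\tb1\tc1\td1\te1\na2\tb2\tc2\td2\te2\na3\tb3\tc3\td3\te3\n' > data-raw.txt

# printf plays the role of pbpaste: one time value per line on stdin.
# paste puts stdin ("-") first; cut keeps field 1 and fields 5 onward.
printf '0\n20\n40\n' | paste - data-raw.txt | cut -f 1,5- > data-cleaned.txt

cat data-cleaned.txt   # rows: 0 d1 e1 / 20 d2 e2 / 40 d3 e3 (tab-separated)
```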

This pipeline was easy to write because the data file was in tab-separated-values format, and tabs are the default column separators for cut and paste. One of the few advantages of having data sent to you in an Excel spreadsheet is that copying a bunch of cells and pasting them into a text editor gives you a nice TSV file that lots of programs understand.
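If the file had used some other separator, both tools accept a `-d` option to change it. The sketch below uses a hypothetical comma-separated file. It assumes no field contains a quoted comma, because cut and paste know nothing about CSV quoting.

```sh
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical comma-separated version of the same kind of data.
printf 'a1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n' > data-raw.csv

# -d, makes paste join with commas and cut split on commas.
printf '0\n20\n' | paste -d, - data-raw.csv | cut -d, -f 1,5- > data-cleaned.csv

cat data-cleaned.csv   # 0,d1,e1 then 20,d2,e2
```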

After cleaning the data, I did some elementary analysis in IPython using the pandas library, which I’m finally getting around to learning. Pandas understands TSV, too, but it’s definitely not a simple utility. A topic for another day.

Update 11/24/13
The great thing about posting tips and tricks on the internet is that there's always someone (usually several someones) who knows even better tips and tricks. This morning I woke up to a handful of improvements in my Twitter stream.

Building Twenty and Josh Asch pointed out that I could have prevented Fantastical from fading away by clicking the little anchor icon in its lower left corner. I always wondered what that was for. And Alexandre Chabot suggested typing "April 2011" in Fantastical's entry field; that starts the creation of a new entry, which I don't want, but because of Fantastical's instant feedback, it scrolls to that month, which I do want.

David Cross suggested jot as a substitute for my Perl one-liner. No question, jot is exactly the right tool for the job, but for some reason I never remember to use it when I need to generate a sequence. Maybe writing this will help me remember next time. David suggested

jot 100 0 900 20

where the 100 is the number of values to generate, 0 is the starting point, 900 is the ending point, and 20 is the step size. You may notice that specifying all four terms overdetermines the sequence, creating a conflict between the first and fourth arguments. jot resolves the conflict by using the lower of the two counts: the one given as the first argument and the one implied by the other three (46 here, counting both 0 and 900). By choosing an excessively high number for the first argument, David forced jot to generate its sequence by using the other three. In other words,

jot 100000000 0 900 20

would’ve worked just as well. If your mathematical background has left you with a distaste for overdetermined systems, you can tell jot to ignore the first argument by using a hyphen:

jot - 0 900 20
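The rule for the four-argument case can be made concrete with a rough imitation. The function below, `jot_like`, is hypothetical: it is not jot, handles integers only, and simply encodes my reading of the man page's rule that the lower of the given and computed counts wins, with "-" meaning "work the count out from the other three."

```sh
#!/bin/sh
# jot_like REPS BEGIN END STEP: integer-only imitation of how jot resolves
# four overdetermined arguments (hypothetical helper, not jot itself).
# REPS may be "-" to mean "compute it from BEGIN, END and STEP".
jot_like() {
  reps=$1 begin=$2 end=$3 step=$4
  # Number of values implied by BEGIN, END and STEP (both ends included).
  computed=$(( (end - begin) / step + 1 ))
  # When both counts are available, the lower one is used.
  if [ "$reps" = - ] || [ "$computed" -lt "$reps" ]; then
    reps=$computed
  fi
  i=0
  while [ "$i" -lt "$reps" ]; do
    echo $((begin + i * step))
    i=$((i + 1))
  done
}

jot_like 100 0 900 20 | tail -n 1   # 900: the computed count (46) wins
jot_like 5 0 900 20 | tail -n 1     # 80: the given count (5) wins
```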

Hyphens as arguments typically mean “use standard input or output instead of a file.” I can’t think of another case in which it means “ignore me,” but I’ll bet there’s someone out there who can.

  1. There’s always the Mac’s own Calendar, which won’t fade away but which I didn’t even consider because I seldom have it open. The damage Apple did to my opinion of Calendar/iCal in Lion will take a long time to repair. 

  2. The man page says it came with Version 5, which was released in 1974. Almost forty years old and still fundamentally the same program. Stagnant, I guess.