# Irons in the fire

I have a few productivity-related projects going on at the moment. My progress on all of them is being slowed by some unavoidable and some avoidable roadblocks:

1. Several projects at work (i.e., things I get paid for) need my attention.
2. I can’t decide which productivity project should get the most attention, so I keep bouncing between them.
3. There’s much to read on each topic, and I’m having a hard time separating the wheat from the chaff.
4. It’s especially difficult to read or write code when the NBA playoffs are on.

## NumPy, SciPy, and Matplotlib

My recent post on Gnuplot has prompted one reader, Barron Bichon, to ask, via email, why I don’t use Matplotlib¹ instead. It’s a good question. Since virtually all of my scripting is done in Python, Matplotlib, a system for generating plots in various output formats from within Python, seems like it should be a perfect fit.

So, for that matter, do NumPy and SciPy, which Matplotlib builds off of. In fact, the availability and maturity of the NumPy/SciPy combo was one of the reasons I chose Python over Ruby when I was deciding which language to move to when I left Perl. But I had trouble installing these libraries several years ago and gave up trying.

Barron’s email, though, included a short, exceptionally clear Python/Matplotlib script for generating the normalized amplitude response plot in my accelerometer post, and it resparked my interest in all those libraries. I found what purported to be a very simple installer for all of them: Chris Fonnesbeck’s SciPy Superpack.

The Superpack worked beautifully on both my computers, and now I have a NumPy/SciPy/Matplotlib system wherever I work. It does require Xcode 4.3.2, which is the one that installs itself in /Applications rather than /Developer, and the command-line tools that have to be installed from within Xcode. But once you have those, it’s clear sailing—or at least it was for me.

Of course, I still have no idea how to use these tools. I think I’ll start by rewriting some of my older Gnuplot code. Expect a post or two comparing the two plotting systems.
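
For a sense of what a script like Barron’s involves, here is a minimal sketch. It is not his code. It assumes the textbook amplitude ratio for a damped single-degree-of-freedom system, which may not be the exact expression from the accelerometer post, and the function names and output filename are made up for illustration.

```python
import math


def normalized_amplitude(r, zeta):
    """Amplitude ratio of a damped single-degree-of-freedom system.

    r    -- frequency ratio (forcing frequency / natural frequency)
    zeta -- damping ratio (fraction of critical damping), assumed > 0
    """
    return 1.0 / math.sqrt((1.0 - r**2) ** 2 + (2.0 * zeta * r) ** 2)


def response_curve(zeta, r_max=2.0, steps=200):
    """Return (ratios, amplitudes) sampled evenly from 0 to r_max."""
    ratios = [r_max * i / steps for i in range(steps + 1)]
    return ratios, [normalized_amplitude(r, zeta) for r in ratios]


def plot_curves(zetas, filename="response.pdf"):
    """Plot one curve per damping ratio and save it (needs Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for zeta in zetas:
        ratios, amps = response_curve(zeta)
        ax.plot(ratios, amps, label=f"zeta = {zeta}")
    ax.set_xlabel("Frequency ratio")
    ax.set_ylabel("Normalized amplitude")
    ax.legend()
    fig.savefig(filename)
```

Calling `plot_curves([0.1, 0.3, 0.7])` would write a three-curve plot to `response.pdf`. The calculation is kept in plain Python so it can be checked without Matplotlib installed; with NumPy the list comprehensions would collapse into array expressions.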

## XSLT and lxml

I have a few scripts that scrape web pages for information and rework it into more usable forms. These scripts use a combination of the Beautiful Soup library and some embarrassing ~~hackwork~~ heuristically defined coding. I’d like to get these scripts cleaned up and maybe adapt them to scrape other pages.

I’ve been thinking about using XSLT to process the pages. It worked well in my 5by5 After Dark RSS filter. But after fiddling around with it for a couple of weeks, I’ve decided that pure XSLT just isn’t for me. I like its functional aspects and the clean way XPath works to define the elements within a document, but I can’t see myself using XSLT’s verbose and clumsy syntax on a regular basis.

Enter lxml, a Python library that, at first glance at least, combines XSLT’s power with Python’s syntax. Again, I’ll probably start out by reworking some old code.
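
As a rough illustration of path-based selection from Python, here is a minimal sketch that filters feed items by title. The feed text, the URLs, and the `matching_links` function are invented for the example, and this is not the code from the After Dark filter. It falls back to the standard library’s ElementTree, which accepts the same calls used here, when lxml isn’t installed.

```python
try:
    from lxml import etree  # preferred when installed
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, same calls

# A made-up feed standing in for a real one.
FEED = """<rss version="2.0"><channel>
<title>Example Network</title>
<item><title>After Dark 12</title><link>http://example.com/ad12</link></item>
<item><title>Regular Show 40</title><link>http://example.com/rs40</link></item>
<item><title>After Dark 13</title><link>http://example.com/ad13</link></item>
</channel></rss>"""


def matching_links(xml_text, phrase):
    """Return the links of feed items whose title contains phrase."""
    root = etree.fromstring(xml_text)
    links = []
    # The path expression picks out every <item> under <channel>.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        if phrase in title:
            links.append(item.findtext("link", default=""))
    return links


print(matching_links(FEED, "After Dark"))
# → ['http://example.com/ad12', 'http://example.com/ad13']
```

The `findall` paths are only a subset of XPath. With lxml itself, an element’s `xpath()` method accepts full XPath 1.0 expressions, so the title test could move into the path rather than the loop.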

## PDF project notes

I’ve had this serverless wiki system for some time, and while it’s worked OK for keeping project notes at work, it’s been too limited. Some common types of notes don’t fit in an HTML environment:

- Most of my notes, especially those taken in the field when I’m out inspecting equipment, are handwritten.
- Much of the background information I use (product manuals, standards, and regulations) comes as PDF files.
- Drawings and other documents sent to me by my clients are almost always in PDF form.

I’ve decided to shift to a PDF-based notes system. My handwritten notes can be scanned in² with a few keywords added as annotation to make the pages searchable. The Markdown-formatted notes I do create directly on the computer can be turned into PDFs via Marked. And all of these notes can be combined into a single PDF by PDFpenPro, with annotations and links between pages as appropriate.

Not only is PDF a better format for combining the disparate notes I collect, it will also help on those occasions when my clients need me to send them a copy of “my file.” Depending on the size of the file, I can pop it into an email, share it on Dropbox, or burn it to a disc and FedEx it.

Although some of the work of creating a useful PDF of notes will have to be done by hand, a lot of it can be automated through Folder Actions and AppleScript. This might also be a good time to finally give Hazel a try.

(I should mention here that Gabe’s recent post on using PDFs on an iPad fits in very well with my prospective PDF notes system, or it will in a year or so when I bow to the inevitable and get an iPad. I have it bookmarked and Instapaper’d for later study.)

The great thing about writing out your thoughts is that the process of writing forces you to clarify. It’s obvious now that I need to work first on getting a PDF notes system up and running. It’s also obvious that this can start with a lot of “by hand” operations that don’t get automated until I’ve had more experience putting such notes together—the automation is a convenience, not an essential element.

Deciding about Matplotlib is the second priority. My reports for work often include plots, and a better way to produce them would certainly make my life easier. Here, there is no “by hand”—Matplotlib is all programming, so I just have to dive in and start coding with the documentation at my side.

Finally, the lxml coding is clearly something to do when my time is more free. If I can’t muster more than three paragraphs about it, it can’t be all that important.

1. The Matplotlib folks don’t seem to have settled on a consistent form of capitalization. The home page sometimes has it all lowercase, and sometimes capitalized like a normal proper noun. I’m going with conventional capitalization here.

2. I don’t have much experience with this and expect to lean heavily on David Sparks’s wonderfully timed Paperless ebook.

## 4 Responses to “Irons in the fire”

1. sapporo says:

Very much looking forward to your comparison between Gnuplot and Matplotlib!

2. I’m looking forward to your Numpy/Scipy/Matplotlib posts. I’ve predominately used R for static visualizations, but the maturation of the iPython and Pandas projects has moved me entirely into Python now. iPython has MathJax support and a special browser-based interactive notebook, which are really helpful. Pandas integrates with iPython and provides some key data structures and high performance data manipulations (sort, pivot, melt, etc.).

3. The SciPy Superpack is a great find. I wish this had been around when I struggled through the manual install. Like the other commenters, I’m looking forward to reading about your impressions of Matplotlib.

4. Josh says:

I’ll chime in as another voice looking forward to Matplotlib posts. I’m just trying it out as well (coming from R).