Python and fileinput

In last week’s post about Awk, I mentioned that my Perl has gotten rusty through lack of use. Considering how much Perl code I wrote back in the late ’90s and early ’00s, it’s surprising how clumsy I’ve become with it. I can still read and tweak older programs, but writing a new Perl script from scratch is pretty much impossible. It just doesn’t flow anymore.

Which is fine. I’m happy with Python and prefer it overall. But there are certain aspects of Perl that are very nice, and when I see them in my old code I miss them.

Mostly, I miss the while(<>) construct. Like many things in Perl, this does different things under different circumstances, but it almost always does exactly what you want it to. It reads sequential lines of text, which is about as prosaic as you can get, but there’s magic in where it gets the text.

Let’s use this little program, which we’ll call initial.pl, as an example:

perl:
#!/usr/bin/perl
while(<>) {
  print substr($_,0,1) . "\n";
}

It prints out the first letter of every line. If we make it executable and call it from the command line this way

initial.pl /usr/share/dict/words

it’ll get its lines from the spelling dictionary. If we call it this way

initial.pl /usr/share/dict/words /etc/passwd

it’ll get its lines first from the spelling dictionary and then from the password file, as if the two files were concatenated into one. If we call it in a pipeline, like this

ls | initial.pl

it knows to get its lines from standard input rather than open a file.

In short, the while(<>) construct is ideally suited for making Unix command line tools.

The Python language itself doesn’t have a file reading feature that can figure out from context where its input is coming from. The usual file reading examples you see in Python tutorials look like this:

python:
#!/usr/bin/python
for line in open('/usr/share/dict/words'):
  print(line[0])

This, obviously, isn’t as flexible as the Perl version,[1] because we have the file name hard-coded into the source. Here’s a better way:

python:
#!/usr/bin/python
import sys
for line in open(sys.argv[1]):
  print(line[0])

Now it reads the input file name from the command line, so we can call

initial.py /usr/share/dict/words

or

initial.py /etc/passwd

and it will work. Of course, it won’t work if you try to pass it both files, nor can it handle standard input. To get those features you have to add a much more Byzantine scaffolding of logic.
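To give a sense of what that scaffolding involves, here’s a minimal sketch of doing it by hand. The helper name `input_lines` is my own invention for illustration, not something from the Standard Library, and the sketch skips error handling and any other special cases a real tool would want to cover.

```python
#!/usr/bin/python
import sys

def input_lines(paths):
    """Yield lines from each named file in turn, or from stdin if no files are named.

    input_lines is a hypothetical helper written for this example.
    """
    if not paths:
        # No file arguments: assume we're in a pipeline and read standard input.
        for line in sys.stdin:
            yield line
        return
    for path in paths:
        # Read the files one after another, as if they were concatenated.
        with open(path) as f:
            for line in f:
                yield line
```

Looping over `input_lines(sys.argv[1:])` instead of `open(sys.argv[1])` would let initial.py take several files or a pipe, but that’s a fair amount of code to carry around in every little script.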

Or simply use the fileinput module. This is a curiously underpromoted part of the Python Standard Library that gives you all the magic of Perl’s while(<>). Rewriting initial.py like this

python:
#!/usr/bin/python
import fileinput
for line in fileinput.input():
  print(line[0])

we get a program that works exactly like initial.pl.

The fileinput module has several useful functions in addition to input(). There’s lineno(), which acts like the Perl special variable $., giving you the cumulative line number of the last line read. It also has filename(), filelineno(), isfirstline(), and isstdin(), which all do exactly what you would guess from their names.
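Here’s a small sketch of those helper functions working together. The function name `annotate` and the output format are made up for this example. The `files` argument is passed straight to `fileinput.input()`, which falls back to the command line arguments (and then standard input) when it’s `None`.

```python
#!/usr/bin/python
import fileinput

def annotate(files=None):
    """Return the input lines, each labeled with its line numbers.

    annotate is a hypothetical helper; only the fileinput calls are real.
    """
    out = []
    for line in fileinput.input(files):
        if fileinput.isfirstline():
            # Add a header whenever a new input source begins.
            if fileinput.isstdin():
                source = 'standard input'
            else:
                source = fileinput.filename()
            out.append('== %s ==' % source)
        # lineno() keeps counting across files; filelineno() restarts with each file.
        out.append('%d (%d): %s' % (fileinput.lineno(),
                                    fileinput.filelineno(),
                                    line.rstrip('\n')))
    # Close the module-level state so input() can be called again later.
    fileinput.close()
    return out
```

Calling `annotate(['a.txt', 'b.txt'])` on a pair of hypothetical files should show the first number climbing steadily while the number in parentheses starts over at 1 when the second file begins.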

The fileinput module has been part of the Standard Library (which means it’s a library that comes with Python) for ages, but it doesn’t get the publicity it deserves. It is featured in one of Doug Hellmann’s Python Module of the Week articles, but it’s not in Guido’s Python Tutorial despite how useful it is. In most Python books I’ve seen, it’s hidden away in that long list of libraries that come with Python. It’s not until you see it in action that you appreciate all the work it’s doing for you.

I confess I haven’t used fileinput nearly as often as I should have, often hard-coding filenames into my scripts and then changing the code when I need to run it against a different input. I hope this post shames me into using it as much as I used to use while(<>).


  1. Nor does it have the kind of error handling that a well-written program would have. I didn’t want to clutter it up with code that isn’t germane to our topic.