Formatting deposition transcripts
August 18th, 2005
One of the odd things I do on a regular basis is read deposition transcripts (not for fun, it’s part of my job). A deposition is testimony given under oath prior to trial; like trial testimony, it’s recorded by a court reporter on a stenotype machine (court reporters are also known as stenographers). I prefer having the transcripts emailed to me in plain text format so I can quickly search for words or phrases, but there’s nothing like having a printed copy when you need to read a transcript from start to finish.
Most stenographers will provide a printed version of the transcript in a condensed form, usually called a miniscript or some clever (and probably trademarked) variation on that term. By “condensed” I don’t mean zipped or gzipped or bzipped, I mean printed 4-up, like this:
+--------+--------+
| | |
| | |
| Page 1 | Page 3 |
| | |
| | |
+--------+--------+
| | |
| | |
| Page 2 | Page 4 |
| | |
| | |
+--------+--------+
The pages are printed on both sides of the paper, so each sheet contains 8 pages of transcript, a nice savings of paper.
For some time, I’ve been making my own miniscripts from the text transcripts by
1. inserting troff commands into the files;
2. running the files through groff, the GNU version of troff, to create a PostScript file; and
3. running that PostScript file through psnup to create a 4-up version.
The second and third steps are purely mechanical, requiring no thought, but the first can be tricky. Deposition transcripts are done in many styles: sometimes they’re double-spaced, sometimes not; sometimes the page numbers are at the bottoms of the pages at the right margin, sometimes they’re at the top and flush left; sometimes there are several blank spaces at the beginning of each line, sometimes not. I wrote some notes to myself to help with the process, but it always seemed more time-consuming than it should be.
(What is consistent from stenographer to stenographer is that the pages and lines of a transcript are always numbered so testimony can be cited with precision. Whatever stylistic transformations may be done, these numbers must be preserved.)
Late last week, I decided the time had come to automate the process in a Perl program. Some design decisions were easy:
- Blank lines (or lines with only whitespace) would be eliminated, making every transcript single-spaced.
- A fixed number of leading spaces would be chopped from each line, the number coming from the user as a command-line option.
These were simple, each requiring only one line of code (plus a few
lines to handle the command line). The user (me) would know how many
spaces to cut by a quick examination of the file. But handling the
tops and bottoms of the page was more difficult. Troff needs a command
(.bp) to break a page, and it seemed like every
deposition I ran into needed a different rule for where to put that
command.
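The two easy cleanup steps can be tried in isolation. The following is a minimal sketch, assuming a made-up, double-spaced sample with four leading spaces per line; the sample text and the value of $blanks are illustrative only and do not come from a real transcript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: double-spaced, one whitespace-only line,
# and 4 leading spaces on every line of testimony.
my $sample = "    1  Q. State your name.\n\n"
           . "    2  A. John Doe.\n   \n"
           . "    3  Q. Thank you.\n";
my $blanks = 4;    # the count a user would choose after inspecting the file

(my $clean = $sample) =~ s/\n(\s*\n)+/\n/g;   # drop blank and whitespace-only lines
$clean =~ s/^ {$blanks}//mg;                  # chop the fixed leading indent

print $clean;
```

The line numbers at the start of each testimony line survive both substitutions, which is the property that matters for citing the transcript.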
Ultimately, my model for handling the page boundaries was Larry Wall’s rename script. It was in the pink version of Programming Perl, but has since been moved to the Perl Cookbook. The genius of this program was that it took advantage of the user’s knowledge of Perl to adapt itself to changing conditions. My program has a somewhat dumbed-down version of this flexibility.
Here it is.
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
my $usage = <<'USAGE';
dep24up -- Create a 4-up PostScript version of a deposition transcript
           so it can be printed out nicely.
usage: dep24up [options] file
options:
  -h     : print this message
  -b n   : number of blank spaces to remove from beginning
           of each line (default: 0)
  -p sss : Perl regexp that defines a page boundary
           (default: '^(\s{10,}(Page\s+)*(\d+))\s*$')
  -e     : page boundary regexp is at end of page, rather than
           at beginning
  -d     : create a troff version of the transcript for
           debugging, but don't process it into PostScript
notes:
  Page boundaries are processed after blank lines and initial
  blank spaces are removed. Only $1 from the -p regexp is
  preserved. The output file has the same base name as the
  input file, with a '-4up.ps' extension (or '.rf' if the -d
  option is used).
USAGE
# Handle command line
my %opt;
getopts('b:dehp:', \%opt);
my $file = shift;
die $usage if ($opt{h} || ! $file);
my $blanks = $opt{b} || 0;
my $page = $opt{p} || q(^(\s{10,}(Page\s+)*(\d+))\s*$);
my $atend = $opt{e};
my $debug = $opt{d};
# Slurp in whole file
open(TS, $file) or die "No $file: $!\n";
undef $/;
my $ts = <TS>; # ts = transcript
# Simple cleanup
$ts =~ s/\n(\s*\n)+/\n/g; # weed out blank lines
$ts =~ s/^ {$blanks}//mg; # strip beginning blank spaces
# Add codes at page boundaries
if ($atend) {
    $ts =~ s/$page/$1\n.bp\n\.sp |.5i/mg;   # $1 goes before .bp
} else {
    $ts =~ s/$page/.bp\n\.sp |.5i\n$1/mg;   # $1 goes after .bp
    $ts =~ s/\.bp\n\.sp \|\.5i\n//;         # delete inadvertent .bp before 1st page
}
# Add overall formatting commands at beginning
my $prolog = <<'PROLOG';
.ft BMR
.ps 18
.vs 28
.po .5i
.ll 7.5i
.sp |.5i
.nf
.na
PROLOG
$ts = $prolog . $ts;
# Output
my ($base, undef) = split(/\./, $file, 2);
if ($debug) {
    open OUT, "> $base.rf" or
        die "Can't open $base.rf for writing: $!\n";
} else {
    open OUT, "| groff | psnup -4 -c -q -m18 > $base-4up.ps" or
        die "Can't run pipeline: $!\n";
}
print OUT $ts;
close(OUT) or die "Error closing output: $! (exit status $?)\n";   # notice a failed pipeline
I think the code is pretty straightforward Perl. There’s a header comment for each section of the program; only a few lines seemed to merit their own comments. Here are a few additional notes.
I like having a usage string at the beginning of my programs; it documents the program like an introductory comment, and pulls double duty as the help message.
Most of my utility programs written in Perl—and all of the recent ones—have a command-line handling section very much like this one. I prefer single-dash options and always use -h for help. My choice for the default page boundary regexp was based on what I thought would handle most cases I’ll run into. My guess is that this will be the part of the code that will need the most tweaking as I get experience using the program.
Redefining the input record separator, $/, to change the way the file input operator, <>, works is a standard Perl idiom, and I didn’t think it was worthy of comment. I did think a reminder that “ts” and “TS” were short for “transcript” would be helpful if/when I need to revise the code.
Once I had decided to turn the page boundary definition into a pair of user-defined options, the section that adds the troff page-break commands pretty much wrote itself. The .sp |.5i part puts a half-inch margin at the top of each page.
The prologue sets the font to Bookman (BMR = Bookman Roman), a font that I associate with children’s books. It’s standard on PostScript printers and is legible at small sizes. It starts at 18 points (that’s the .ps 18), but will end up less than 9 points after the 4-up reduction. The 28-point leading (.vs 28) gives me something between single- and double-spacing, which I have found to be easy to read. The .nf and .na commands turn off “filling” and “adjusting,” which is necessary to preserve the numbered-line structure of the transcript.
The output file is determined by the presence or absence of the -d option. I suppose I could have used the File::Basename module, but it seemed like overkill for this program. The options to psnup tell it to do a 4-up transformation (-4) with column-major page numbering (-c) and an extra 18 points of margin (-m 18). Normally, psnup spits out the page numbers of the new file as they are created, but -q suppresses that.
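To see which lines the default page-boundary pattern mentioned in the notes would treat as page numbers, it can be run against a few candidates. This is a sketch only: the four sample lines are hypothetical and were chosen to show one case of each kind, not taken from an actual transcript.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The default -p pattern, copied from the program.
my $page = q(^(\s{10,}(Page\s+)*(\d+))\s*$);

# Hypothetical lines of the kinds a transcript might contain.
my @lines = (
    (" " x 40) . "12",          # bare number pushed toward the right margin
    (" " x 15) . "Page 12   ",  # "Page" label with trailing spaces
    "12  A. I don't recall.",   # a numbered line of testimony
    "0012",                     # flush-left page number
);

# 1 if the line would be treated as a page boundary, 0 otherwise.
my @is_boundary = map { /$page/ ? 1 : 0 } @lines;

printf "%-4s |%s|\n", ($is_boundary[$_] ? "page" : "text"), $lines[$_]
    for 0 .. $#lines;
```

The first two lines match and the last two do not. The flush-left number is the kind of case that calls for overriding the default with -p.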
With a transcript file named depo.txt, a command of
dep24up -b 8 -p '^(\d{4})\s*$' depo.txt
will strip the first 8 spaces from each line and recognize the top of
a page as being a sequence of four digits, flush left. The output file
will be named depo-4up.ps.
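The effect of those two options can be traced on a small input. The sketch below applies the same substitutions the program uses, in the same order, to a made-up two-page excerpt (the excerpt is hypothetical), and prints the troff source that would be passed on to groff.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical excerpt: 8 leading spaces, 4-digit page numbers at the page tops.
my $ts = join "\n",
    "        0001",
    "         1  Q. State your name.",
    "         2  A. John Doe.",
    "        0002",
    "         1  Q. And your address?",
    "";

my $blanks = 8;                  # from -b 8
my $page   = q(^(\d{4})\s*$);    # from -p '^(\d{4})\s*$'

$ts =~ s/\n(\s*\n)+/\n/g;               # weed out blank lines (none in this sample)
$ts =~ s/^ {$blanks}//mg;               # strip the 8 leading spaces
$ts =~ s/$page/.bp\n.sp |.5i\n$1/mg;    # break the page before each page number
$ts =~ s/\.bp\n\.sp \|\.5i\n//;         # drop the break before the first page

print $ts;
```

After the run, page 0001 starts the text with no break before it, and a .bp followed by .sp |.5i sits immediately ahead of 0002. Because the page numbers are at the tops of the pages, -e is not needed.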



