Base64

The other day I asked a new coworker to send me his vCard. I had an entry in my Address Book for him, but I wanted to make sure I had all his contact info. As I imported it into Address Book, I noticed that the various parts of his name were screwed up, so I opened the vCard in TextMate to see what was wrong.

vCards are just text files in a standard (or sometimes not-so-standard) format, so they’re pretty easy to read in a text editor. In this case, my coworker’s name (N) fields were arranged like this:

N:John A. Doe;P.;E.;;


instead of the way they should be:

N:Doe;John;A.;;P.E.
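For reference, the vCard N property holds five semicolon-separated components in a fixed order: family name, given name, additional names, honorific prefixes, and honorific suffixes. That is the order in RFC 6350, and earlier vCard versions use the same one. The Python sketch below uses a hypothetical parse_n helper to show where each piece of the two lines above lands. It assumes the value contains no escaped semicolons, which is true of both examples.

```python
# Component order of the vCard N property.
N_COMPONENTS = ("family", "given", "additional", "prefix", "suffix")


def parse_n(line: str) -> dict[str, str]:
    """Map the semicolon-separated parts of an N: line to component names.

    Hypothetical helper; assumes no escaped semicolons ("\\;") in the value.
    """
    name, _, value = line.partition(":")
    # Ignore any parameters, e.g. "N;CHARSET=UTF-8:..." still counts as N.
    if name.split(";")[0].upper() != "N":
        raise ValueError(f"not an N property: {line!r}")
    parts = value.split(";")
    # Pad short values so every component gets an entry.
    parts += [""] * (len(N_COMPONENTS) - len(parts))
    return dict(zip(N_COMPONENTS, parts))


print(parse_n("N:Doe;John;A.;;P.E."))
print(parse_n("N:John A. Doe;P.;E.;;"))
```

In the Outlook version, the whole name lands in the family-name slot, and the P.E. is split between the given and additional names.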


With my usual prejudice, I blamed the mix-up on Microsoft (he’s an Outlook user) and was about to move on when I noticed this entry:

X-MS-CARDPICTURE;TYPE=JPEG;ENCODING=BASE64:


which was followed by several lines that looked like this:

CAApACkDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAA
AgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkK
FhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWG
h4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl
5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREA
AgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYk
NOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOE


I wonder what that is, I thought. If it’s a photo of him, why isn’t it given the standard PHOTO label? And if it isn’t his photo, what is it? It didn’t show up when I imported the vCard into Address Book.

I knew that most scripting languages have a library for encoding and decoding Base64, but I didn’t want to write a script; I just wanted a single command to decode the data.

The answer was the base64 command[1], which I found by typing

apropos base64


in the Terminal. The apropos command, which searches the one-line descriptions of the installed man pages for a keyword, is a great way to learn about other commands.

Base64 is an encoding designed to turn binary files into ASCII text so they can be stored in text files and transmitted via protocols that expect text only. The encoded text doesn’t use every ASCII character; it’s restricted to just 64 of them, plus a padding character.

There are different ways to choose the 64 characters of a Base64 representation, but the most common one uses the 26 uppercase letters, the 26 lowercase letters, the 10 digits, +, and /, which stand for the values 0 through 63 in that order. Each character represents one 6-bit value, and three bytes are 24 bits, so it takes four Base64 characters to represent three bytes. The encoder goes through the binary file, reading in bytes in groups of three and writing out Base64 characters in groups of four. If the number of bytes in the binary file isn’t a multiple of three, the last group is short and = signs are added at the end of the encoding: two of them if one byte is left over, one if two bytes are left over.
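To make the three-bytes-in, four-characters-out bookkeeping concrete, here is a minimal Python sketch of an encoder written from scratch. The b64_encode name is mine, not a library function, and in practice you’d just use Python’s standard base64 module; the sketch checks itself against that module.

```python
import base64

# Standard alphabet: A-Z are values 0-25, a-z 26-51, 0-9 52-61, '+' 62, '/' 63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_encode(data: bytes) -> str:
    """Encode bytes three at a time into four 6-bit characters, padding with '='."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to three bytes into a 24-bit integer, zero-filling a short chunk.
        n = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        # Slice the 24 bits into four 6-bit values, most significant first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 3 bytes -> 4 real characters; 2 bytes -> 3 plus '='; 1 byte -> 2 plus '=='.
        real = len(chunk) + 1
        out.append("".join(chars[:real]) + "=" * (4 - real))
    return "".join(out)


for sample in (b"Man", b"Ma", b"M"):
    encoded = b64_encode(sample)
    # Cross-check against the standard-library encoder.
    assert encoded == base64.b64encode(sample).decode("ascii")
    print(sample, encoded)  # TWFu, TWE=, TQ==
```

The three samples show all three cases: a full group, a group with two bytes (one =), and a group with one byte (two =).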

Take the JPEG image snowman.jpg, for example. It can be encoded in Base64 through the command

base64 -b 60 -i snowman.jpg -o snowman.txt


which generates a text file with this content:

/9j/4AAQSkZJRgABAQAAAQABAAD/4QBARXhpZgAATU0AKgAAAAgAAYdpAAQA
AAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/
wAARCABjAGMDAREAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQF
.
.  [hundreds of lines of this stuff]
.
6F+j9Th/YlCfJHnnieaUuVc0neW7tforrZ2uzxvSNJ037KtubOB4Zkj81JE8
0MWYAtmQsyscn5lIbJyDnmvwvEUKM4tzpQk3BXbim3/X567n9HQxFelj0qda
pBKKdozla9ova9t0tNjy3VoIrfUb+GBPLiiucRoGbagbbkAEnAOTmvwjOacI
ZvVUIqK56istrWk/0Wu+h/TWSValTJsvc5uTlTouTdrtuMbu5njv9P6ivIqf
xY/4v/kjvo/x6n+Kr+olaHUf/9k=


The -b 60 part of the command tells it to put a line break after every 60 characters. This makes the encoded text easier to deal with in a text editor.
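If you’d rather do the same thing from Python, here is a minimal sketch. The encode_wrapped helper is hypothetical: it calls the standard library’s base64.b64encode and slices the result into 60-character lines. I’m not claiming it reproduces the command’s output byte for byte; ending the text with a newline is this sketch’s own choice.

```python
import base64


def encode_wrapped(data: bytes, width: int = 60) -> str:
    """Base64-encode data, inserting a newline after every `width` characters.

    Similar in spirit to `base64 -b 60`. Ending the output with a newline is
    a choice made here, not a claim about what the real tool does.
    """
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical usage, mirroring the command-line example:
# with open("snowman.jpg", "rb") as src, open("snowman.txt", "w") as dst:
#     dst.write(encode_wrapped(src.read()))
```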

Converting back to binary (decoding) is done through the same base64 command, but with the -D option:

base64 -D -i snowman.txt -o snowman2.jpg


The command is smart enough to ignore the linefeeds.
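Python’s base64.b64decode also skips the newlines by default, because with validate=False it discards characters outside the Base64 alphabet. The hypothetical decode_wrapped helper below takes a stricter approach, sketched here as one reasonable design: it strips whitespace explicitly and then decodes with validate=True, so line breaks are tolerated but any other stray character is reported as an error.

```python
import base64


def decode_wrapped(text: str) -> bytes:
    """Decode Base64 text that has been broken into lines.

    All whitespace (spaces, tabs, CR, LF) is removed first, then the rest is
    decoded strictly, so a stray non-Base64 character raises binascii.Error
    instead of being silently skipped.
    """
    return base64.b64decode("".join(text.split()), validate=True)


print(decode_wrapped("TWFu\nTWFu\nTQ==\n"))  # b'ManManM'
```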

The 4:3 ratio of the encoding can be seen in the file sizes:

$ ls -l snowman*
-rw-r--r--  1 drdrang  staff  14915 Mar  4 20:43 snowman.jpg
-rw-r--r--  1 drdrang  staff  20426 Mar  4 20:42 snowman.txt


The file size ratio is a bit bigger than 4:3 because of the newline characters introduced by the -b 60 option, but it’s close. You can also see why the encoded version is padded with a single = sign: 14,915 is one byte shy of being a multiple of three, so the last group has only two bytes, which encode to three characters plus one =.
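The 4:3 relationship and the padding rule come down to two lines of integer arithmetic. These hypothetical helpers count characters before any line breaks are added, so they describe the unwrapped encoding, not the size of snowman.txt.

```python
def encoded_length(n_bytes: int) -> int:
    """Characters in an unwrapped Base64 encoding: four per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def padding_count(n_bytes: int) -> int:
    """Number of '=' signs: 0, 2, or 1 when n_bytes % 3 is 0, 1, or 2."""
    return (3 - n_bytes % 3) % 3


print(14915 % 3, padding_count(14915))  # 2 1
```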

So back to my vCard questions. I copied the encoded data from my coworker’s vCard, made a new text file from it, and ran base64 -D on it to reconstruct the original image file. Turned out to be our company logo. Kind of a letdown.
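I did the extraction by hand, but it can also be sketched in Python. This is not a real vCard parser. The hypothetical extract_base64_property helper assumes the layout seen in the excerpt above: the data starts after the property’s colon and continues over the following lines until a blank line or the next property. The hypothetical looks_like_jpeg helper just checks for the FF D8 FF bytes that begin a JPEG file.

```python
import base64


def extract_base64_property(vcard_text: str, prop: str) -> bytes:
    """Find `prop` in vCard text and decode its Base64 value.

    Assumes the data starts after the property line's colon (possibly empty)
    and runs over the following lines until a blank line or the next
    property, which always contains a colon.
    """
    lines = vcard_text.splitlines()
    for i, line in enumerate(lines):
        if ":" not in line:
            continue
        head, value = line.split(":", 1)
        # Drop parameters such as ;TYPE=JPEG;ENCODING=BASE64 to get the bare name.
        if head.split(";", 1)[0].strip().upper() != prop.upper():
            continue
        chunks = [value.strip()]
        for cont in lines[i + 1:]:
            # ':' is not in the Base64 alphabet, so a colon means a new property.
            if not cont.strip() or ":" in cont:
                break
            chunks.append(cont.strip())
        return base64.b64decode("".join(chunks))
    raise KeyError(prop)


def looks_like_jpeg(data: bytes) -> bool:
    """JPEG files start with the SOI marker FF D8, followed by another FF marker."""
    return data[:3] == b"\xff\xd8\xff"


# Hypothetical usage; "coworker.vcf" and "picture.jpg" are made-up names.
# with open("coworker.vcf") as f:
#     data = extract_base64_property(f.read(), "X-MS-CARDPICTURE")
# if looks_like_jpeg(data):
#     with open("picture.jpg", "wb") as out:
#         out.write(data)
```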

1. Normally, when I reference a command-line utility here, I link to its man page entry on Apple’s site. Weirdly, Apple’s site has no man page for the base64 command; the entry under that name is for the Tcl package. You’ll have to type man base64 in the Terminal to see how it works.