March 4th, 2012 at 10:27 pm by Dr. Drang
The other day I asked a new coworker to send me his vCard. I had an entry in my Address Book for him, but I wanted to make sure I had all his contact info. As I imported it into Address Book, I noticed that the various parts of his name were screwed up, so I opened the vCard in TextMate to see what was wrong.
VCards are just text files in a standard (or sometimes not-so-standard) format, so they’re pretty easy to read in a text editor. In this case, my coworker’s name (N) fields were arranged like this
N:John A. Doe;P.;E.;;
instead of the way it should be
With my usual prejudice, I blamed the mixup on Microsoft (he’s an Outlook user) and was about to move on when I noticed this entry,
which was followed by several lines that looked like this:
CAApACkDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAA
AgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkK
FhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWG
h4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl
5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREA
AgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYk
NOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOE
I wonder what that is, I thought. If it’s a photo of him, why isn’t it given the standard PHOTO label? And if it isn’t his photo, what is it? It didn’t show up when I imported the vCard into Address Book.
I knew that most scripting languages have a library for encoding and decoding Base64, but I didn’t want to write a script; I just wanted a single command to decode the data.
The answer was the base64 command, which I found by typing
apropos base64
in the Terminal. The apropos command is a great way to learn about other commands.
Base64 is an encoding designed to turn binary files into ASCII so they can be stored in text files and transmitted via protocols that expect text only. The encoded text file doesn’t incorporate every ASCII character; it’s restricted to just 64 of them.
There are different ways to choose the 64 characters of a Base64 representation, but the most common one uses the 10 digits, the 26 lowercase letters, the 26 uppercase letters, +, and /. Each character represents one 6-bit value, so it takes four Base64 characters to represent three bytes. The encoder goes through the binary file, reading in bytes in groups of three and writing out Base64 characters in groups of four. If the number of bytes in the binary file isn’t a multiple of three, one or two padding characters (usually = signs) are put at the end of the Base64 encoding.
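To make the three-bytes-in, four-characters-out grouping concrete, here’s a rough Python sketch of an encoder written by hand. The name encode_base64 is made up for this illustration, and it assumes the common alphabet with + and / and = padding. For real work you’d use the standard library’s base64 module, which the last line uses as a cross-check.

```python
import base64

# The common Base64 alphabet: uppercase, lowercase, digits, then + and /.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_base64(data: bytes) -> str:
    """Hand-rolled encoder (hypothetical helper, for illustration only)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes short of a full group
        # Pack the zero-padded group into one 24-bit integer.
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Slice the 24 bits into four 6-bit values, most significant first.
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came purely from the zero padding become "=".
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


print(encode_base64(b"Man"))  # TWFu  (full group, no padding)
print(encode_base64(b"Ma"))   # TWE=  (one byte short, one = sign)
print(encode_base64(b"M"))    # TQ==  (two bytes short, two = signs)

# Cross-check against the standard library.
assert encode_base64(b"snowman") == base64.b64encode(b"snowman").decode("ascii")
```

The zero bytes tacked onto a short final group are only there to fill out the 24 bits; the = signs tell the decoder how many of them to throw away.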
A JPEG image, snowman.jpg for example, can be encoded in Base64 through the command
base64 -b 60 -i snowman.jpg -o snowman.txt
which generates a text file with this content:
/9j/4AAQSkZJRgABAQAAAQABAAD/4QBARXhpZgAATU0AKgAAAAgAAYdpAAQA
AAABAAAAGgAAAAAAAqACAAQAAAABAAAAY6ADAAQAAAABAAAAYwAAAAD/2wBD
AAEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEB
AQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/
wAARCABjAGMDAREAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQF
.
.
[hundreds of lines of this stuff]
.
6F+j9Th/YlCfJHnnieaUuVc0neW7tforrZ2uzxvSNJ037KtubOB4Zkj81JE8
0MWYAtmQsyscn5lIbJyDnmvwvEUKM4tzpQk3BXbim3/X567n9HQxFelj0qda
pBKKdozla9ova9t0tNjy3VoIrfUb+GBPLiiucRoGbagbbkAEnAOTmvwjOacI
ZvVUIqK56istrWk/0Wu+h/TWSValTJsvc5uTlTouTdrtuMbu5njv9P6ivIqf
xY/4v/kjvo/x6n+Kr+olaHUf/9k=
The -b 60 part of the command tells it to put a line break after every 60 characters. This makes the encoded text easier to deal with in a text editor.
Converting back to binary (decoding) is done through the same base64 command, but with the -D option:
base64 -D -i snowman.txt -o snowman2.jpg
The command is smart enough to ignore the linefeeds.
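The same round trip can be sketched with Python’s base64 module. This is only an illustration of the idea, not how the base64 command works internally: the sample bytes stand in for a real image file, and the 60-character wrapping is done by hand with a slice.

```python
import base64

raw = bytes(range(256)) * 2  # stand-in for the contents of a binary file
encoded = base64.b64encode(raw).decode("ascii")  # one long line of text

# Break the text every 60 characters, in the spirit of -b 60.
lines = [encoded[i:i + 60] for i in range(0, len(encoded), 60)]
wrapped = "\n".join(lines) + "\n"

# By default (validate=False), b64decode discards characters that are not
# in the Base64 alphabet, so the line breaks don't get in the way.
decoded = base64.b64decode(wrapped)

print(len(lines))      # 12
print(decoded == raw)  # True
```

Passing validate=True would make b64decode reject the newlines instead of skipping them, so the wrapped text would have to be joined back into one line first.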
The 4:3 ratio of the encoding can be seen in the file sizes:
$ ls -l snowman*
-rw-r--r-- 1 drdrang staff 14915 Mar 4 20:43 snowman.jpg
-rw-r--r-- 1 drdrang staff 20426 Mar 4 20:42 snowman.txt
The file size ratio is a bit bigger than 4:3 because of the newline characters introduced by the -b 60 option, but it’s close. You can also see why the encoded version is padded with a single = sign: 14,915 is one byte shy of being a multiple of three.
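The arithmetic behind the padding is short enough to check in a few lines of Python. The function name base64_length is made up for this sketch, and it counts only the Base64 characters themselves, not any line breaks added afterward.

```python
import base64


def base64_length(n_bytes):
    """Return (characters, padding) for n_bytes of input, line breaks excluded."""
    groups = -(-n_bytes // 3)        # ceiling division: groups of three bytes
    padding = (3 - n_bytes % 3) % 3  # = signs needed to fill out the last group
    return 4 * groups, padding


chars, padding = base64_length(14915)
print(chars, padding)  # 19888 1

# Cross-check with the standard library on 14,915 zero bytes.
sample = base64.b64encode(bytes(14915))
assert len(sample) == chars
assert sample.endswith(b"=") and not sample.endswith(b"==")
```

Since 14,915 = 3 × 4,971 + 2, the last group has two real bytes and needs one = sign. The 19,888 characters are exactly 4/3 of 14,916, the next multiple of three.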
So back to my vCard questions. I copied the encoded data from my coworker’s vCard, made a new text file from it, and ran base64 -D on it to reconstruct the original image file. Turned out to be our company logo. Kind of a letdown.