Timing a Pythonista endeavor

I’ve known about John D. Cook’s blog, The Endeavor, for quite a while, but for some reason I didn’t subscribe to its RSS feed until recently. Now that I’ve corrected that omission, it takes every bit of strength I have to keep from responding to each of his posts. They’re all interesting and well-written.

In this post yesterday morning, he asked this question,

Does the base 10 expansion of 2^n always contain the digit 7 if n is large enough?

and wrote a little Python program that suggested the answer may be “yes” for [n > 72]. The program isn’t a proof, of course, it just tests up to [n = 10,000].

The program was, I thought, a little odd in that he created a set of all the digits in [2^n] rather than just doing a string comparison. I thought it worth looking into the relative speeds of doing it each way.1 Because my MacBook Air was all the way on the other side of the room, I decided this was a good chance to give Pythonista a try.

Here’s my first crack at the comparison. I copied the code from John’s post, pasted it into a new Pythonista script, and added some new functionality.

python:
 1:  #!/usr/bin/python
 2:  
 3:  def digits(n):
 4:    s = set()
 5:    while n > 0:
 6:      s.add(n%10)
 7:      n /= 10
 8:    return s
 9:  
10:  def numerals(n):
11:    return '%d' % n
12:  
13:  for i in range(71, 2000):
14:    p = 2**i
15:    if 7 not in digits(p):
16:      print i, p
17:  
18:  for i in range(71, 2000):
19:    p = 2**i
20:    if '7' not in numerals(p):
21:      print i, p

John’s original code had only the digits function and the loop in Lines 13-16. Also, his range in Line 13 had an upper limit of 10,000, not 2000; I lowered it to get reasonable run times on my iPhone 5.

As I said, it seemed more natural to me to turn the number [2^n] into a string and look for instances of the character 7. My function numerals and the loop in Lines 18-21 solved the problem that way.

I should mention at this point that although Pythonista can run code that uses either spaces or tabs for indentation, code that’s written in Pythonista uses tabs. There is, at present, no way to tell the Pythonista editor to insert spaces when the Tab key is tapped. If you’re writing code from scratch, this is no problem, but because my script was a mix of pasted code (that used spaces) and original code (that used tabs), it threw indentation errors until I went in and replaced all my tabs with spaces.

I had a brief Twitter discussion about this with Pythonista’s developer, Ole Zorn, and he acknowledged the problem. I hope he adds the ability to auto-expand tabs, not just because I prefer spaces, but also because pasting and editing is probably a pretty common use case.

Once I got my script running, it was immediately clear that the string method was much faster than the set method. But how much faster? I decided to give the standard timeit library a whirl. After fighting a bit with timeit’s so-called convenience functions (which I’ll discuss in a bit), I ended up with this script:

python:
 1:  #!/usr/bin/python
 2:  
 3:  from timeit import timeit
 4:  
 5:  def digits(n):
 6:    s = set()
 7:    while n > 0:
 8:      s.add(n%10)
 9:      n /= 10
10:    return s
11:  
12:  def numerals(n):
13:    return '%d' % n
14:  
15:  def time_digits(n):
16:    for i in range(72, n):
17:      p = 2**i
18:      if 7 not in digits(p):
19:        print i, p
20:  
21:  def time_numerals(n):
22:    for i in range(72, n):
23:      p = 2**i
24:      if '7' not in numerals(p):
25:        print i, p
26:  
27:  for i in range(1, 5):
28:    print i*1000
29:    t = timeit('time_digits(i*1000)',
30:               'from __main__ import time_digits, i',
31:               number=1)
32:    print 'Digits: %f' % t
33:    t = timeit('time_numerals(i*1000)',
34:               'from __main__ import time_numerals, i',
35:               number=1)
36:    print 'Numerals: %f' % t
37:    print

The results were

1000
Digits: 1.845313
Numerals: 0.054590

2000
Digits: 8.365773
Numerals: 0.238395

3000
Digits: 21.571197
Numerals: 0.653563

4000
Digits: 43.310401
Numerals: 1.419421

which shows that the string method is more than an order of magnitude faster than the set method. This didn’t surprise me, there’s a lot of work being done in the digits function.

Just for fun, I later ran the same script on my 2010 MacBook Air. The results were

1000
Digits: 0.233390
Numerals: 0.008770

2000
Digits: 1.266341
Numerals: 0.031867

3000
Digits: 3.823505
Numerals: 0.078629

4000
Digits: 8.306719
Numerals: 0.160169

which is less than an order of magnitude faster than the iPhone 5. I choose to see this as evidence that I have a fast phone, not that I have a slow laptop.

What I disliked about the timeit function was that I needed to include the from __main__ import code chunks as the second argument. When I first tried this script, I didn’t include those arguments and got errors like

NameError: global name 'time_digits' is not defined

I found the answer in this Stack Overflow discussion, but I wasn’t happy with it. The documentation refers to timeit as a “convenience function,” but there’s nothing convenient about having to import the names of all your global variables and functions whenever you want to time them. It’s completely non-intuitive to have to import from __main__ when I’m already in __main__.

It’s not that I don’t understand the reasons behind it—I just think the library should take care of the importing for me when I’m using a convenience function in what must be the most common case: timing a function at the top level. This, like the clumsy way the subprocess module works, is a case of Python being too Pythonic for its own good.

In summary:


  1. Looking at the post’s comments today, I see that I wasn’t the only one to consider this.