November 24th, 2012 at 3:58 pm by Dr. Drang
I’ve known about John D. Cook’s blog, The Endeavor, for quite a while, but for some reason I didn’t subscribe to its RSS feed until recently. Now that I’ve corrected that omission, it takes every bit of strength I have to keep from responding to each of his posts. They’re all interesting and well-written.
In this post yesterday morning, he asked this question,
Does the base 10 expansion of 2^n always contain the digit 7 if n is large enough?
and wrote a little Python program that suggested the answer may be “yes” for [n > 72]. The program isn’t a proof, of course, it just tests up to [n = 10,000].
The program was, I thought, a little odd in that he created a set of all the digits in [2^n] rather than just doing a string comparison. I thought it worth looking into the relative speeds of doing it each way.1 Because my MacBook Air was all the way on the other side of the room, I decided this was a good chance to give Pythonista a try.
Here’s my first crack at the comparison. I copied the code from John’s post, pasted it into a new Pythonista script, and added some new functionality.
python: 1: #!/usr/bin/python 2: 3: def digits(n): 4: s = set() 5: while n > 0: 6: s.add(n%10) 7: n /= 10 8: return s 9: 10: def numerals(n): 11: return '%d' % n 12: 13: for i in range(71, 2000): 14: p = 2**i 15: if 7 not in digits(p): 16: print i, p 17: 18: for i in range(71, 2000): 19: p = 2**i 20: if '7' not in numerals(p): 21: print i, p
John’s original code had only the
digits function and the loop in Lines 13-16. Also, his
range in Line 13 had an upper limit of 10,000, not 2000; I lowered it to get reasonable run times on my iPhone 5.
As I said, it seemed more natural to me to turn the number [2^n] into a string and look for instances of the character
7. My function
numerals and the loop in Lines 18-21 solved the problem that way.
I should mention at this point that although Pythonista can run code that uses either spaces or tabs for indentation, code that’s written in Pythonista uses tabs. There is, at present, no way to tell the Pythonista editor to insert spaces when the Tab key is tapped. If you’re writing code from scratch, this is no problem, but because my script was a mix of pasted code (that used spaces) and original code (that used tabs), it threw indentation errors until I went in and replaced all my tabs with spaces.
I had a brief Twitter discussion about this with Pythonista’s developer, Ole Zorn, and he acknowledged the problem. I hope he adds the ability to auto-expand tabs, not just because I prefer spaces, but also because pasting and editing is probably a pretty common use case.
Once I got my script running, it was immediately clear that the string method was much faster than the set method. But how much faster? I decided to give the standard
timeit library a whirl. After fighting a bit with
timeit’s so-called convenience functions (which I’ll discuss in a bit), I ended up with this script:
python: 1: #!/usr/bin/python 2: 3: from timeit import timeit 4: 5: def digits(n): 6: s = set() 7: while n > 0: 8: s.add(n%10) 9: n /= 10 10: return s 11: 12: def numerals(n): 13: return '%d' % n 14: 15: def time_digits(n): 16: for i in range(72, n): 17: p = 2**i 18: if 7 not in digits(p): 19: print i, p 20: 21: def time_numerals(n): 22: for i in range(72, n): 23: p = 2**i 24: if '7' not in numerals(p): 25: print i, p 26: 27: for i in range(1, 5): 28: print i*1000 29: t = timeit('time_digits(i*1000)', 30: 'from __main__ import time_digits, i', 31: number=1) 32: print 'Digits: %f' % t 33: t = timeit('time_numerals(i*1000)', 34: 'from __main__ import time_numerals, i', 35: number=1) 36: print 'Numerals: %f' % t 37: print
The results were
1000 Digits: 1.845313 Numerals: 0.054590 2000 Digits: 8.365773 Numerals: 0.238395 3000 Digits: 21.571197 Numerals: 0.653563 4000 Digits: 43.310401 Numerals: 1.419421
which shows that the string method is more than an order of magnitude faster than the set method. This didn’t surprise me, there’s a lot of work being done in the
Just for fun, I later ran the same script on my 2010 MacBook Air. The results were
1000 Digits: 0.233390 Numerals: 0.008770 2000 Digits: 1.266341 Numerals: 0.031867 3000 Digits: 3.823505 Numerals: 0.078629 4000 Digits: 8.306719 Numerals: 0.160169
which is less than an order of magnitude faster than the iPhone 5. I choose to see this as evidence that I have a fast phone, not that I have a slow laptop.
What I disliked about the
timeit function was that I needed to include the
from __main__ import code chunks as the second argument. When I first tried this script, I didn’t include those arguments and got errors like
NameError: global name 'time_digits' is not defined
I found the answer in this Stack Overflow discussion, but I wasn’t happy with it. The documentation refers to
timeit as a “convenience function,” but there’s nothing convenient about having to import the names of all your global variables and functions whenever you want to time them. It’s completely non-intuitive to have to import from
__main__ when I’m already in
It’s not that I don’t understand the reasons behind it—I just think the library should take care of the importing for me when I’m using a convenience function in what must be the most common case: timing a function at the top level. This, like the clumsy way the
subprocess module works, is a case of Python being too Pythonic for its own good.
- The Endeavor: good.
- Pythonista: good, but needs to auto-expand tabs.
timeitlibrary: frustrating. Needs a Kenneth Reitz makeover.
Looking at the post’s comments today, I see that I wasn’t the only one to consider this. ↩