Range rage

I understand why Python’s range function works the way it does, and I usually use it correctly, but I still tend to mess up the three-parameter version. Even worse, I mess up its NumPy cousin, arange, which I find more useful than range itself, almost every time I use it. Today I decided to take action.

The root of the problem is list indices. Python inherited C’s zero-based indexing scheme. The first five items of an array named a are

a[0], a[1], a[2], a[3], a[4]

not

a[1], a[2], a[3], a[4], a[5]

as they would be in a language built for scientists and engineers, like, say, Fortran.1

C does this because it’s close to the metal,2 and the index really represents an offset from the address of the start of the list. Thus the memory address of a[0] is the same as the address of a itself, a[1] is one element past the address of a, and so on.
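The offset idea can be checked from Python itself. What follows is only a rough sketch, assuming NumPy is installed; the array `a` and the helper `address_of` are made up for illustration, and `ndarray.ctypes.data` is used just to read the address of the first element of a view.

```python
import numpy as np

a = np.arange(5, dtype=np.int32)   # five 4-byte integers in one contiguous block


def address_of(arr, i):
    """Memory address of arr[i], read from a view that starts at element i."""
    return arr[i:].ctypes.data


base = a.ctypes.data               # address of the start of the array's data
offsets = [address_of(a, i) - base for i in range(len(a))]
print(offsets)                     # byte offsets of a[0] through a[4]
```

Each offset should be the index times `a.itemsize`, which is the sense in which a zero-based index is "how far from the start."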

I don’t know why Guido decided Python, which is decidedly not close to the metal, should use the same indexing scheme as C, but I suspect it has something to do with C being the mother tongue of most computer science types of his generation.

The list of numbers generated by range fits in with this zero-based mindset. The single-parameter version, range(5), returns

[0, 1, 2, 3, 4]

which are the indices of a five-element list. The default starting value of range is zero.

The two-parameter version allows you to set the starting value, so range(1, 5) returns

[1, 2, 3, 4]

which maintains the same end value. This is a little tricky, because the second parameter represents neither the end value nor the number of elements, but there is a consistency of sorts with the one-parameter version.
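The one- and two-parameter behavior described above can be sketched like this. The `list()` calls are an assumption about the Python version: they are needed on Python 3, where range is lazy, and harmless on Python 2, where range already returns a list. The names `items`, `indexed`, and `count` are invented for the example.

```python
one_arg = list(range(5))      # starts at 0, stops before 5
two_arg = list(range(1, 5))   # starts at 1, still stops before 5

items = ["a", "b", "c", "d", "e"]
# range(len(items)) yields exactly the valid indices of items
indexed = [items[i] for i in range(len(items))]

# The stop value is excluded, so with a step of 1 the length is stop - start
start, stop = 1, 5
count = len(range(start, stop))
```

So the second parameter is best read as "the first value you don't get."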

The three-parameter version allows you to set the step value, so

range(0, 10, 2)

returns

[0, 2, 4, 6, 8]

As with the one- and two-parameter versions, the second parameter, which the documentation calls the “stop” value, never appears in the list. To get 10 in the list, we have to use range(0, 11, 2) or range(0, 12, 2).
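Here is a small sketch of the three-parameter case, assuming Python 3 (so that `/` is true division) and a positive integer step. The helpers `range_length` and `inclusive_range` are hypothetical names, not anything in the standard library.

```python
import math


def range_length(start, stop, step):
    """Number of values range(start, stop, step) produces for a positive step."""
    return max(0, math.ceil((stop - start) / step))


def inclusive_range(start, stop, step=1):
    """Like range, but stop itself can appear (integers, positive step only)."""
    return range(start, stop + 1, step)


evens_to_8 = list(range(0, 10, 2))      # stop=10 is excluded
evens_to_10a = list(range(0, 11, 2))    # a stop of 11 or 12 lets 10 through
evens_to_10b = list(range(0, 12, 2))
evens_to_10c = list(inclusive_range(0, 10, 2))
```

Bumping the stop by one is the whole trick for integers; it is the floating-point case where things get slipperier.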

As I said, I usually get this wrong, but since I seldom use range, my cognitive deficiency doesn’t hurt me too often. I do, on the other hand, use the NumPy version, arange, quite often. When I want to plot a function over a uniformly spaced set of x values, arange is just the ticket.

Or it would be, if I didn’t keep mistaking the stop value for where the generated array actually stops. I can’t tell you how often I’ve written arange(0, 1, .1) and been disappointed when it creates

array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

and doesn’t include the 1.
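The disappointment is easy to reproduce. A minimal sketch, assuming NumPy is installed; the variable names are made up for the example.

```python
import numpy as np

x = np.arange(0, 1, .1)

n_points = len(x)             # ten values, not eleven
last = x[-1]                  # roughly 0.9
has_one = bool(np.any(np.isclose(x, 1.0)))   # the stop value never shows up
```

Because the step is a float, comparisons against the generated values are safer with `np.isclose` than with `==`.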

If you’re familiar with NumPy, you might think linspace would be my salvation. But while linspace does stop on the stop value, I still have an off-by-one issue with its third parameter, which I keep thinking should be the number of intervals, not the number of generated values. So I do linspace(0, 1, 10) and am disappointed when the result is

array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])

instead of

array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])

which requires linspace(0, 1, 11).
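The points-versus-intervals mixup can be made visible by asking linspace for the step it used, via its `retstep` argument. This is just a sketch, assuming NumPy is installed; `linspace_by_intervals` is a hypothetical wrapper, not a NumPy function.

```python
import numpy as np

# retstep=True returns the spacing along with the array
ten_points, step10 = np.linspace(0, 1, 10, retstep=True)      # 9 intervals
eleven_points, step11 = np.linspace(0, 1, 11, retstep=True)   # 10 intervals


def linspace_by_intervals(start, stop, intervals):
    """Hypothetical wrapper: the third argument counts intervals, not points."""
    return np.linspace(start, stop, intervals + 1)


tenths = linspace_by_intervals(0, 1, 10)
```

With ten points the step comes out as 1/9; it takes eleven points to get a step of 0.1.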

Today I decided to combat this problem by writing an array-generating function that works the way my brain does. It’s called fromtoby, and it always takes three parameters: the value to start from, the value to go to, and the step to go by.

Here’s fromtoby.py:

#!/usr/bin/python

from __future__ import division   # makes b/2 true division on Python 2
from numpy import arange

def fromtoby(f, t, b):
  # Pad the stop value by half a step so t itself lands in the array.
  return arange(f, t + b/2, b)

if __name__ == "__main__":
  print(fromtoby(0, 1, .1))

By saving it in a directory on my $PYTHONPATH, I can

from fromtoby import fromtoby

and say things like

x = fromtoby(0, 1, .01)

to get x equal to

array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
        0.09,  0.1 ,  0.11,  0.12,  0.13,  0.14,  0.15,  0.16,  0.17,
        0.18,  0.19,  0.2 ,  0.21,  0.22,  0.23,  0.24,  0.25,  0.26,
        0.27,  0.28,  0.29,  0.3 ,  0.31,  0.32,  0.33,  0.34,  0.35,
        0.36,  0.37,  0.38,  0.39,  0.4 ,  0.41,  0.42,  0.43,  0.44,
        0.45,  0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,
        0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,
        0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,
        0.81,  0.82,  0.83,  0.84,  0.85,  0.86,  0.87,  0.88,  0.89,
        0.9 ,  0.91,  0.92,  0.93,  0.94,  0.95,  0.96,  0.97,  0.98,
        0.99,  1.  ])

which is, finally, exactly what I want on the first try.
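A few sanity checks on the half-step padding are sketched below. The function is redefined here only to keep the block self-contained, and it assumes Python 3, where `/` is already true division; the test values (`downward`, `uneven`, and so on) are invented for illustration.

```python
import numpy as np


def fromtoby(f, t, b):
    """Same idea as the post's function: pad the stop by half a step."""
    return np.arange(f, t + b / 2, b)


tenths = fromtoby(0, 1, .1)        # 11 values, ending at about 1.0
hundredths = fromtoby(0, 1, .01)   # 101 values
downward = fromtoby(1, 0, -.25)    # negative steps work: 1, .75, .5, .25, 0
uneven = fromtoby(0, 1, .3)        # stops at the last value not past t
```

Padding by half a step rather than a whole one means that when the span isn't an even multiple of the step, the array stops short of the end value instead of overshooting it.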


  1. Don’t mock Fortran, especially if you never programmed in it. It’s a product of its time, and it’s still a valuable tool when raw numerical speed is of the essence. 

  2. Hi, Merlin! 


16 Responses to “Range rage”

  1. Jaosn says:

    As someone who uses R right now and is a few days into learning Python, I find this drives me kind of bonkers. I don’t mind indexing starting at 0, but that the default functions are not some form of from-to-by, which makes a metric ton more sense to humans, is definitely enraging.

  2. Carl says:

    People often defend this behavior on the basis of a letter by Dijkstra,\* but those people are nuts. This is confusing to newbies and it still manages to trip up experienced programmers every once in a while.

    Lua is a very Python-like language with a starting point of 1 for arrays, but it doesn’t have the same depth of science and math tools. Also, it conflates its array type with its dictionary type (a mistake also made by PHP), which leads to issues when you accidentally do use “0” as an index.

    \* http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

  3. Carl says:

    The comments box at this site is my nemesis.

    Funky Markdown rules + no preview button → comments with weird escaping errors.

  4. Dr. Drang says:

    If you want to make a link, Carl, just make a link.

  5. Ben says:

    This is a great idea. Embarrassingly, my default workflows for this sort of thing involve a lot of trial and error with print statements.

    Added to my secret scripts folder.

  6. Lauri Ranta says:

    In Ruby indexing starts from 0 and both ends of ranges are inclusive. There’s just no equivalent to range(5).

    p 1.upto(2).to_a # [1, 2]
    p Range.new(1, 2) # 1..2
    p (0.2..0.6).step(0.2).to_a # [0.2, 0.4, 0.6]
    
  7. Barron Bichon says:

    a) There is no way I will ever be able to read fromtoby as anything other than “from Toby”, and I thank Toby for providing me with the array I want.

    b) Yay Fortran!

  8. Dr. Drang says:

    Lauri,
    The upto method is one of the things I’m jealous of Ruby for, and it influenced the naming of fromtoby (it wasn’t because I had a friend named Toby, Barron). But there’s also a three-dot range, isn’t there? The one that excludes the endpoint? I’m not jealous of having to remember two range operators.

  9. Clark says:

    Another reason conceptually to prefer the C/Python form over the Fortran form is to think of the index as an addition. This makes a ton of sense in classic C where you’d often just increment a pointer to get array elements. Say *(string + 5) or the like. But it makes sense even for Python since you can think of an addition to get the other elements. So the first element would have an increment of 0.

    Some might say Python shouldn’t follow C, but think of the slice operator in Python, which arguably is doing exactly what that pointer increment does. So if I do string[:3] what am I doing conceptually? It’s really shorthand for string[0:3]. If you had a Fortran-like array it would really make slices conceptually very weird.

  10. Dr. Drang says:

    To me, Clark, thinking of the index as addition is the same as thinking of it as an offset. Maybe there’s a subtlety I’m missing, but I don’t see any significant difference. And as you can see from the rest of the post, my real problem isn’t starting at zero, it’s ending before the “stop” value.

    And while I agree that having string[:3] as a synonym for string[1:3] would be a little weird, I think having string[1:3] mean “characters 1 through 3 of string” is perfectly natural.

    Again, I can see how the C conventions are useful for system programming and other types of programming. I just don’t think they’re great for numerical analysis, and I don’t think they’re natural for people who aren’t steeped in CS customs.

    This is, by the way, very similar to a complaint I have about scripting languages using integer division by default. Only CS types think 1/2 = 0; the rest of us think 1/2 = 0.5. Perl got this right from the beginning, Python only recently figured it out, and Ruby is still wrong.

  11. Clark says:

    Effectively you are right. There’s no difference between an offset and addition. I guess I didn’t phrase things well. (Sorry - home sick. I hate to think what I’ll think tomorrow of what I’m typing today. Normally I’m much wiser than to write while sick)

    Perhaps thinking of the negative slicing makes more sense. After I hit send I realized that would have been a better example.

    What should we expect string[1:-1] to give us? As soon as you allow negative slicing you really want a symmetry to positive slicing. The point really just is that if you think of slicing as offsets (or addition) rather than selecting countable members then it makes a ton more sense.

    With regards to integer division I can see the point. Once again put in context it makes a lot of sense as integer division is usually used in practice along with the modulus operator. So 5/2 = 2 and 5%2 = 1. That makes handling bits and the like much, much easier. I think the issue is that people want numbers to act like floats and not integers. Which I agree. But in programming where there is such a different utility to each I understand why that is. (BTW - did Python 3 change something with regards to division? Under 2.x it functions the way I expect)

    My point was more that this wasn’t just an arbitrary CS convention but there were very practical reasons for this. Unlike say the charge of an electron which was arbitrary and chosen very unfortunately.

    I agree about data analysis. Which is why many systems, like the Fortran inspired IDL, assume floats most of the time. I do think that the Python styled slicing is pretty useful in data analysis. Back in the day when I was still doing that sort of thing (and before all these newfangled free tools were available) I still thought in offsets most of the time rather than counts ala a database. That’s often because values corresponded to real values and what I cared about were the real values and less their database oriented index. So even then I often got frustrated with how IDL did things. (In fact my first job at LANL was writing an interpretive language inspired by SQL, Fortran, and C that would hide most of the back end database stuff from the scientists analyzing our X-ray data)

  12. Josh says:

    Clark, in Python 3 the division of two integers will result in a float, unlike the integer division in Python 2.x. You can bring this behavior into Python 2.x by using “from __future__ import division” as in the fromtoby.py example in the post.

    In Python 3 (and Python 2.x with the “from __future__ ...” statement) “//” performs integer division.

    Python 2.7.2 (default, Jun 20 2012, 16:23:33)
    [GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 3/2
    1
    >>> from __future__ import division
    >>> 3/2
    1.5
    >>> 3//2
    1

  13. Josh says:

    Wow, Markdown hammered that comment. Next time I’ll remember to use this and preview before posting.

  14. Barron Bichon says:

    Josh, I’d never seen that before. Very helpful. Thanks.

  15. Carl says:

    But you can’t trust that dingus. Whatever version of Markdown Dr. Drang had server side, it doesn’t do the right thing to backslash plus asterisk. That should just be an asterisk after conversion according to the dingus, but for whatever reason, his server Markdown leaves it with the backslash.

  16. Dr. Drang says:

    Carl,
    The Markdown used here is PHPMEM, Michel Fortin’s PHP Markdown Extra with a couple of additions by me to handle MathJax equations. It may be the first Markdown port, and it’s of excellent quality. Why your backslashed asterisk (and again, as a stylistic matter, you should use links for linking, not footnotes) got messed up is something of a mystery to me, but I doubt it’s the fault of PHP Markdown Extra.

    If I had to lay money, I’d say it’s some unholy WordPress problem. I have never understood what WordPress does with comments, but I’m certain it doesn’t just hand them off to markdown.php. I once tried to follow the chain of processes that deal with comments and got lost and scared. I vowed never to go there again.

    If, however, a WordPress expert can give me a little guidance, I’d be happy to try to clean up the processing path to make comments work as predictably as possible.