Sampling

June 21, 2012 at 10:43 PM by Dr. Drang

Recently, I’ve had to inspect, measure, or in some way test sets of devices taken randomly from some larger population. I’ve been using Python and its random library to make the choices for me.¹ The library makes these scripts very easy to write.

Say I have a hundred devices, each identified by a unique serial number, and I want to run a test on ten of them. This is how I choose the ten:

python:
 1:  #!/usr/bin/python
 2:  
 3:  from random import sample, seed
 4:  
 5:  # Setting the seed like this will give the same set of samples every
 6:  # time the script is run. Omitting this line will give a different
 7:  # set every time the script is run.
 8:  seed(1)
 9:  
10:  # This is where I'd enter the real serial numbers. For illustration,
11:  # I'm just using 300-399 with some extra leading zeros.
12:  devices = map(lambda x: "%05d" % x, range(300, 400))
13:  
14:  testUnits = sample(devices, 10)
15:  
16:  print '\n'.join(testUnits)

The most difficult part is entering the list of serial numbers. I’m usually given the list in a spreadsheet, so I copy it out of there, paste it into my script, and use a regex find/replace to turn it from a column of numbers into a comma-separated Python list of strings. I don’t want to waste your time with stuff like that here, so I just set the list of serial numbers to 00300 to 00399 in Line 12.
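The paste-and-regex step can also be left to Python. What follows is only a minimal sketch, written for Python 3 rather than the Python 2 of the script above. It assumes the spreadsheet column has been pasted between triple quotes; the name `pasted` and the serial numbers in it are placeholders invented for illustration.

```python
# Hypothetical spreadsheet column pasted between triple quotes.
# These serial numbers are placeholders, not real data.
pasted = """
00300
00301

00302
00303
"""

# One serial number per line: strip stray whitespace and skip blank lines.
# Keeping the values as strings preserves the leading zeros.
devices = [line.strip() for line in pasted.splitlines() if line.strip()]

print(devices)
```

The resulting `devices` list can be handed to sample the same way as the generated list in Line 12.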

I don’t usually set the seed (Line 8), but in more complicated scripts that rely on random number generation or sampling, setting the seed can be very helpful for debugging because it generates the same set every time the script is run. You can keep the seed line in the script until you know it’s running correctly, then comment it out (or change its argument) to generate a new set of values. The argument to seed can be any hashable Python object: numbers are probably the most commonly used seeds, but you can also use strings:

python:
 8:  seed('corn')
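To see the repeatability for yourself, here is a small sketch, again in Python 3 and using the same illustrative 00300 to 00399 serial numbers. It reseeds with the same string and draws twice; the names `first` and `second` are just for this example.

```python
from random import sample, seed

# Same illustrative serial numbers as the main script: 00300 through 00399.
devices = ["%05d" % x for x in range(300, 400)]

seed('corn')
first = sample(devices, 10)

# Reseeding with the same value resets the generator to the same state,
# so the second draw repeats the first.
seed('corn')
second = sample(devices, 10)

print(first == second)  # True
```

One caveat if you run this on a recent interpreter: Python 3 is stricter about seed types than Python 2 was. As of Python 3.11, the seed must be None, an int, a float, a str, bytes, or a bytearray, so numbers and strings like these are still fine.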

The key line is Line 14, which uses the aptly named sample function to draw a random subset of items from the given list. The output is the ten selected serial numbers, printed one per line.

As a practical matter, I sometimes generate a few more samples than I plan to test. I do this only if I suspect that some of the devices I’ve been given can’t be tested for one reason or another. If a device in the list turns out to be untestable, it’s nice to have an extra serial number or two to use as a random replacement.
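One way to handle the extras, sketched here in Python 3, is to draw them in the same call and split the result. The counts and the names `n_test`, `n_spare`, `test_units`, and `spares` are assumptions for illustration. The split leans on the documented behavior that sample returns its list in selection order, so any slice of it is itself a valid random sample.

```python
from random import sample

# Same illustrative serial numbers as the main script: 00300 through 00399.
devices = ["%05d" % x for x in range(300, 400)]

n_test = 10   # units planned for testing
n_spare = 2   # hypothetical number of extras to hold in reserve

# Draw everything at once so no device can appear in both lists.
drawn = sample(devices, n_test + n_spare)
test_units = drawn[:n_test]
spares = drawn[n_test:]

print("Test units:")
print('\n'.join(test_units))
print("Spares:")
print('\n'.join(spares))
```

If a test unit turns out to be untestable, the next unused entry in `spares` takes its place.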

¹ Yes, this is related to the confidence limits calculations I did in the SciPy v. Octave post of a few days ago. But random is part of the Python Standard Library, so there’s no need for SciPy.

And now it’s all this

I just said what I said and it was wrong
Or was taken wrong
