June 21st, 2012 at 10:43 pm by Dr. Drang
Recently, I’ve had to inspect, measure, or in some way test sets of devices taken randomly from some larger population. I’ve been using Python and its random library to make the choices for me.1 The library makes these scripts very easy to write.
Say I have a hundred devices, each identified by a unique serial number, and I want to run a test on ten of them. This is how I choose the ten:
python:
 1:  #!/usr/bin/python
 2:
 3:  from random import sample, seed
 4:
 5:  # Setting the seed like this will give the same set of samples every
 6:  # time the script is run. Omitting this line will give a different
 7:  # set every time the script is run.
 8:  seed(1)
 9:
10:  # This is where I'd enter the real serial numbers. For illustration,
11:  # I'm just using 300-399 with some extra leading zeros.
12:  devices = map(lambda x: "%05d" % x, range(300, 400))
13:
14:  testUnits = sample(devices, 10)
15:
16:  print '\n'.join(testUnits)
The most difficult part is entering the list of serial numbers. I’m usually given the list in a spreadsheet, so I copy it out of there, paste it into my script, and use a regex find/replace to turn it from a column of numbers into a comma-separated Python list of strings. I don’t want to waste your time with stuff like that here, so I just set the list of serial numbers to 00300 to 00399 in Line 12.
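One possible way to skip the regex step, sketched here on the assumption that the spreadsheet column pastes in as one serial number per line: drop the pasted text into a triple-quoted string and let split() build the list. The variable name pasted and the four serial numbers are made up for illustration.

```python
# Hypothetical serial numbers, pasted straight from a spreadsheet column.
pasted = """
00300
00301
00302
00303
"""

# split() with no arguments breaks on any run of whitespace, so the
# blank lines at the start and end of the pasted text are discarded.
devices = pasted.split()

print(devices)
```

Because the values stay strings, the leading zeros survive, which they would not if the serial numbers were entered as integers.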
I don’t usually set the seed (Line 8), but in more complicated scripts that rely on random number generation or sampling, setting the seed can be very helpful for debugging because the script then generates the same values every time it’s run. You can keep the seed line in the script until you know it’s running correctly, then comment it out (or change its argument) to generate a new set of values. The argument to seed can be any hashable Python object: numbers are probably the most commonly used seeds, but you can also use strings:
python:
8:  seed('corn')
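A quick check of that reproducibility, as a sketch rather than anything definitive: reseeding with the same value before each draw should return the same sample, whether the seed is a number or a string. The helper name draw is hypothetical, and the code is written so it should run under Python 3 as well as Python 2.

```python
from random import sample, seed

devices = ["%05d" % x for x in range(300, 400)]

def draw(seed_value, count=10):
    # Hypothetical helper: reseed the generator, then draw `count` units.
    seed(seed_value)
    return sample(devices, count)

# The same seed gives the same units, call after call.
print(draw(1) == draw(1))
print(draw('corn') == draw('corn'))
```

The particular units chosen for a given seed can differ between Python versions, so the repeatability holds only for reruns on the same interpreter.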
The key line is Line 14, which uses the aptly named sample function to draw a random subset of items from the given list. The output is
00313
00384
00376
00325
00349
00344
00365
00378
00309
00302
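Two properties of sample matter for this kind of work: it draws without replacement, so no serial number can come up twice, and it returns a new list rather than altering the one it was given. A minimal sketch that checks both, reusing the illustrative 00300 to 00399 serial numbers (the name original is my own):

```python
from random import sample

devices = ["%05d" % x for x in range(300, 400)]
original = list(devices)  # copy kept for comparison afterward

testUnits = sample(devices, 10)

# Ten units, none repeated, all taken from the device list.
print(len(testUnits))
print(len(set(testUnits)) == len(testUnits))
print(set(testUnits) <= set(devices))

# The device list itself is unchanged by the draw.
print(devices == original)
```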
As a practical matter, I sometimes generate a few more samples than I plan to test. I do this only if I suspect that some of the devices I’ve been given can’t be tested for one reason or another. If a device in the list turns out to be untestable, it’s nice to have an extra serial number or two to use as a random replacement.
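Drawing the extras might look like the sketch below. It relies on sample returning its picks in selection order, which the Python documentation says makes any sub-slice a valid random sample too. The names planned, spares, drawn, and replacements are hypothetical, as is the choice of two spares.

```python
from random import sample

devices = ["%05d" % x for x in range(300, 400)]

planned = 10  # units to be tested
spares = 2    # hypothetical number of extras held in reserve

drawn = sample(devices, planned + spares)

# The first ten picks are the test units; the rest are stand-ins to be
# used, in order, if one of the ten turns out to be untestable.
testUnits = drawn[:planned]
replacements = drawn[planned:]

print('\n'.join(testUnits))
print('Spares: ' + ', '.join(replacements))
```

Drawing everything in one call keeps the spares from duplicating any of the ten test units, which a second, separate call to sample would not guarantee.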