Loops and arrays

I mentioned in yesterday’s post that Bruce Ediger solved the conditional probability problem using Monte Carlo simulation. This is when you use a random number generator to numerically simulate many repetitions of the problem and track the conditions of interest.

Ediger wrote his program in Go and posted it on GitHib. It’s pretty straightforward code, and you don’t need to know Go (I don’t) to understand how it works. I wrote a similar script in Python:

python:
 1:  #!/usr/bin/env python3
 2:  
 3:  import numpy as np
 4:  import sys
 5:  
 6:  # Initialize
 7:  N = int(sys.argv[1])
 8:  rng = np.random.default_rng()
 9:  pA = .75
10:  pB = .25
11:  
12:  # Generate N pairs of uniform random variates and count
13:  onlyOne = 0
14:  onlyA = 0
15:  for i in range(N):
16:    A = rng.random()
17:    B = rng.random()
18:    if A < pA:
19:      if B >= pB:
20:        onlyA += 1
21:        onlyOne += 1
22:    else:
23:      if B < pB:
24:        onlyOne += 1
25:  
26:  # Estimate probability that if only one hits, it's A
27:  p = onlyA/onlyOne
28:  
29:  print(f'Probability estimate: {p:.2%} ({onlyA:,d}/{onlyOne:,d})')

The variable N, which is passed as the first argument to the script (see Line 7), is the number of simulations; it’s the number of times we go through the loop that starts on Line 15. The varaiables onlyOne and onlyA keep track of, respectively, the number of simulations in which only one shooter hits the target and the number in which only Shooter A hits the target. Within each pass through the loop, random variables A and B are set via the random function, which generates a uniform random floating point number in the half-open interval [0, 1).1 The conditions of interest are tracked by Lines 18–24, where Shooter A hits the target if A is less than pA (0.75) and Shooter B hits the target if B is less than pB (0.25).

Finally, the probability we want is estimated by the division on Line 27 and returned as a percentage on Line 29. Overall, a pretty simple script if you’re used to doing this sort of thing.

This script is called monte-carlo-loop.py, and I ran it inside the time command to get both the probability estimate and the running time.

$ time python monte-carlo-loop.py 1_000_000
Probability estimate: 90.01% (562,294/624,714)

real    0m0.499s
user    0m0.402s
sys     0m0.033s

That’s a million simulations. Python disregards underscore characters in numbers, so I used 1_000_000 as the argument instead of 1000000 to make it easier to read. The script took about half a second to run on my M4 MacBook Pro and got, for all practical purposes, the theoretically correct answer of 90%.

Note that I’m using a loop to go through each simulation. You may recall from my post on the Electoral College that it’s often advantageous to work at a higher level of abstraction and deal with arrays of values as a single unit. This is a big part of what NumPy is for.

With that in mind, we can rewrite the Monte Carlo simulation this way:

python:
 1:  #!/usr/bin/env python3
 2:  
 3:  import numpy as np
 4:  import sys
 5:  
 6:  # Initialize
 7:  N = int(sys.argv[1])
 8:  rng = np.random.default_rng()
 9:  pA = .75
10:  pB = .25
11:  
12:  # Generate N pairs of uniform random variates and count
13:  A = rng.random(N)
14:  B = rng.random(N)
15:  onlyA = np.logical_and(A < pA, B >= pB).sum()
16:  onlyB = np.logical_and(B < pB, A >= pA).sum()
17:  
18:  # Estimate probability that if only one hits, it's A
19:  p = onlyA/(onlyA + onlyB)
20:  
21:  print(f'Probability estimate: {p:.2%} ({onlyA:,d}/{onlyA + onlyB:,d})')

In this code, A and B are arrays of length N, with each item being a random floating point number in the interval [0, 1). We then filter these arrays according to whether their values are less than the target values of pA and pB. For example, let’s say A is an array of 10 random numbers:

[0.67931298, 0.86084747, 0.66868163, 0.07347088, 0.37776693,
 0.86578819, 0.53811295, 0.84018319, 0.99293766, 0.39668558]

Then the result of A < .75 would be a 10-element array of Booleans:

 [True, False,  True,  True,  True, 
  False,  True, False, False, True]

The logical_and function takes two such arrays and outputs another Boolean array with elements that are True if both corresponding elements of the arguments are True and False otherwise.

Because Boolean arrays can be thought of as arrays of ones and zeros, the sum function acts to count the number of True values. This is how we make the (scalar) onlyA and onlyB variables in Lines 15 and 16.

The equivalent in this script of the onlyOne variable in the looping script is the sum of onlyA and onlyB, so that’s what shows up in the denominator of Line 19.

I called this script monte-carlo-array.py and ran it like this:

$ time python monte-carlo-array.py 1_000_000
Probability estimate: 89.99% (562,827/625,462)

real    0m0.091s
user    0m0.060s
sys     0m0.026s

Basically the same result but considerably faster. On the other hand, the array code uses a lot more memory because it has to make the various million-element arrays. The tradeoff between space and speed is common; what’s great about modern computers is that they have both.


  1. The opening square bracket means that 0 is included in the interval. The closing parenthesis means that 1 is not. 


Target practice and conditional probability

Bruce Ediger of Information Camouflage solved a nice little probability problem on his blog the other day. He got the problem from Greg Ross at Futility Closet, a blog I hadn’t run across before. Here it is:

Marksman A hits a certain small target 75 percent of the time. Marksman B hits it 25 percent of the time. The two of them aim at that target and fire simultaneously. One bullet hits it. What’s the probability that it came from A?

Implicit in this setup is that the two shooters are statistically independent. Why? First, because that makes the most sense; and second, because you can’t solve the problem otherwise.

Ross solves the problem in the traditional way, working out the intersectional probabilities

P ( A hits B misses ) = ( 3 4 ) ( 3 4 ) = 9 16

and

P ( B hits A misses ) = ( 1 4 ) ( 1 4 ) = 1 16

Since these two intersection events, which we’ll call AB and BA to make things more compact,1 are mutually exclusive, the probability of their union, which is when exactly one of the shooters hits the target, is just the sum of these two probabilities. Therefore, the conditional probability we want is

P ( AB   |   AB BA ) = 9/16 9/16+1/16 = 9 10

Ediger does a Monte Carlo simulation and gets roughly the same answer2 but is unsatisfied:

This is a completely unintuitive result to me.

I can’t disagree. Luckily, this problem is simple enough that it can be visualized in a way that might make it more intuitive.

We’ll sketch out a Venn diagram of the problem, but we’ll use boxes instead of the usual ovals, and we’ll size the boxes so their areas represent the probabilities of the corresponding events.

The universe of all possibilities will be a 1×1 box, and we’ll consider each axis to represent one of the shooters. Event A is a blue box that takes up ¾ of the horizontal range. Event B is a red box that takes up ¼ of the vertical range. Where they intersect (both shooters hit the target) is purple—it’s both blue and red. The uncolored portion represents neither shooter hitting the target.

Graphical probability

By subdividing the space into grids, we see that there are 10 squares representing exactly one shooter hitting the target and 9 of them are blue. There’s our answer, and you can see why it’s so.


  1. Overbars are typically used to mean “not,” and putting events adjacent means taking their intersection. 

  2. “Roughly” because there’s almost always some fuzz around a numerical solution. 


House of cards redux

You may remember this post from back in October. It’s about a puzzle in the November issue of Scientific American in which the goal was to determine if you could build a house of cards of the following design with exactly 100 cards. And if so, how many stories would the house have?

Triangular houses of cards

The answer was yes, an 8-story house built this way would have 100 cards. My solution hinged on the recognition that the numbers of cards for increasing story count is related to triangular numbers. I got the answer by using the equation for triangular numbers and the quadratic formula.

The folks at SciAm went about it a different way. They set up a table of story counts and card counts and took the first and second differences of the card counts. Like this:

Difference table

They reasoned that since the second differences were constant, the formula for the card count must be of the 2nd degree. They then determined the three coefficients of that 2nd degree formula via simultaneous equations.

At the end of the post, I said that although I preferred my solution, I wanted to learn more about difference tables. About a month ago, I was in a bookstore and saw J.F. Steffensen’s Interpolation on the shelf. Difference tables play a big role in the book, so I bought it. It helped that it’s a Dover reprint, so it was inexpensive.

I’ve been skimming the book to figure out which parts I want to dig into and found a really simple way to solve the house of cards puzzle with no algebra whatsoever. Start by building the difference table as SciAm did it. Then recognize that the constancy of the 2nd difference column lets you extend the table to whatever length you want.

Extended difference table

The part with the yellow background is the part of the table we did before. Since the 2nd difference column is constant, we can just write 3’s for every entry. Then we work backwards to fill in the rest. We extend the 1st difference column by adding 3 (the 2nd difference value) to each preceding value:

11+3=14 14+3=17 17+3=20 20+3=23

Then we fill in the Cards column by adding the 1st difference values to the preceding card count:

26+14=40 40+17=57 57+20=77 77+23=100

That 100 corresponds to Row 8 of the table, so an 8-story house of cards has 100 cards. I went further than was necessary to solve the puzzle just to show how easy it is. The numbers were small enough that I could do all the addition in my head.

It’s true that having a formula allows you to get the card count directly for any number of stories, while using the difference table this way forces you to calculate all the intermediate values. In general, it’s better to have the formula. But for this specific puzzle, the difference table is faster.


More tolerant weight recording

You may remember the Weight Today shortcut I wrote about last month. I run it from my iPhone every morning to record my weight in both the Health app and a CSV file I use for graphing my progress (or lack thereof). It’s a simple shortcut, and yesterday it proved to be too simple.

I made a typo when entering my weight and tapped the Done button before I could stop myself. I opened up a text editor to fix the entry in the CSV file, but got distracted and forgot to fix the entry in Health. Not a big deal, but later that day I saw that a 3-mile walk had, according to my watch, burned only 1 calorie. I complained on Mastodon about the watch screwing up, and this morning Juande Santander-Vela (without knowing of my weight recording error) correctly deduced the cause:

maybe you’ve recorded a new weight and it is too small? I remember having had the opposite problem, where merely moving incurred a lot of calories, and it was because 80,2 kg had been recorded as 802 kg…

Upon reading this, I immediately opened the Health app and saw that it had my weight as 1.4 lbs. That explained the minuscule calorie count.

Shortly thereafter, April had a suggestion:

Can you incorporate some logic in your Shortcut to check for a minimum weight to avoid this?

Great idea. I’ll collect the most recent weight, compare it to the value just entered, and if the difference between the two is too much, I’ll ask for the weight again. I’ll put a while loop—or whatever Shortcuts calls a while loop—into the shortcut and that’ll take care of the problem.

Except I couldn’t find anything like a while loop in Shortcuts. I asked on Mastodon if I’d missed something, and the answer was no. Shortcuts can loop a specific number of times and it can loop through the elements of a list, but it can’t loop until a condition is met. Two workarounds were suggested:

  1. Use a Repeat loop with an unreasonably high count and stop the shortcut when the condition is met. This was suggested both by Tom and by Ian Bjorhovde,1 but I couldn’t bring myself to do it this way. Too ugly.
  2. Use recursion. This came from Leon Cowle, and that’s what I went with. Recursion seems kind of highfalutin for Shortcuts, and rewriting Weight Today that way made me think I was doing an exercise in SICP.

Here’s the new, more tolerant version of Weight Today:

StepActionComment
1 Weight Today Step 01 Get today’s date.
2 Weight Today Step 02 Format it as yyyy-mm-dd.
3 Weight Today Step 03 Get all the weights recorded in Health over the past 7 days, put them in reverse chronological order, and limit the list to the most recent.
4 Weight Today Step 04 Get the numerical value of the most recent weight.
5 Weight Today Step 05 Ask the user for today’s weight.
6 Weight Today Step 06 Calculate the difference (absolute value) between today’s weight and the most recent.
7 Weight Today Step 07 If the difference is more than 3…
8 Weight Today Step 08 Rerun the shortcut
9 Weight Today Step 09 If the difference is 3 or less…
10 Weight Today Step 10 Log today’s weight into the Health app.
11 Weight Today Step 11 Make a new CSV line.
12 Weight Today Step 12 Add the line to the CSV file.
13 Weight Today Step 13 Done.

As you can see, I chose 3 lbs as the cutoff for a realistic weight change. Anything more than that and I’m asked to try again. In the 6 months or so that I’ve been tracking my weight, I’ve never had a change of more than 2 lbs, so 3 lbs seemed like a decent choice. Of course, this doesn’t prevent all typos, just excessive ones.

I think the comments explain the shortcut pretty well. If you want to use something like it, but without the CSV stuff, just delete Steps 11 and 12. One last thing: I learned that the fabs function used in Step 6 was available in Shortcuts through a Reddit post by gluebyte.


  1. Whose uncle (I think it was uncle) Reidar Bjorhovde was a structural engineering researcher of some renown.