Loops and arrays
January 20, 2025 at 9:41 AM by Dr. Drang
I mentioned in yesterday’s post that Bruce Ediger solved the conditional probability problem using Monte Carlo simulation. This is when you use a random number generator to numerically simulate many repetitions of the problem and track the conditions of interest.
Ediger wrote his program in Go and posted it on GitHib. It’s pretty straightforward code, and you don’t need to know Go (I don’t) to understand how it works. I wrote a similar script in Python:
python:
1: #!/usr/bin/env python3
2:
3: import numpy as np
4: import sys
5:
6: # Initialize
7: N = int(sys.argv[1])
8: rng = np.random.default_rng()
9: pA = .75
10: pB = .25
11:
12: # Generate N pairs of uniform random variates and count
13: onlyOne = 0
14: onlyA = 0
15: for i in range(N):
16: A = rng.random()
17: B = rng.random()
18: if A < pA:
19: if B >= pB:
20: onlyA += 1
21: onlyOne += 1
22: else:
23: if B < pB:
24: onlyOne += 1
25:
26: # Estimate probability that if only one hits, it's A
27: p = onlyA/onlyOne
28:
29: print(f'Probability estimate: {p:.2%} ({onlyA:,d}/{onlyOne:,d})')
The variable N
, which is passed as the first argument to the script (see Line 7), is the number of simulations; it’s the number of times we go through the loop that starts on Line 15. The varaiables onlyOne
and onlyA
keep track of, respectively, the number of simulations in which only one shooter hits the target and the number in which only Shooter A hits the target. Within each pass through the loop, random variables A
and B
are set via the random
function, which generates a uniform random floating point number in the half-open interval [0, 1).1 The conditions of interest are tracked by Lines 18–24, where Shooter A hits the target if A
is less than pA
(0.75) and Shooter B hits the target if B
is less than pB
(0.25).
Finally, the probability we want is estimated by the division on Line 27 and returned as a percentage on Line 29. Overall, a pretty simple script if you’re used to doing this sort of thing.
This script is called monte-carlo-loop.py
, and I ran it inside the time
command to get both the probability estimate and the running time.
$ time python monte-carlo-loop.py 1_000_000
Probability estimate: 90.01% (562,294/624,714)
real 0m0.499s
user 0m0.402s
sys 0m0.033s
That’s a million simulations. Python disregards underscore characters in numbers, so I used 1_000_000
as the argument instead of 1000000
to make it easier to read. The script took about half a second to run on my M4 MacBook Pro and got, for all practical purposes, the theoretically correct answer of 90%.
Note that I’m using a loop to go through each simulation. You may recall from my post on the Electoral College that it’s often advantageous to work at a higher level of abstraction and deal with arrays of values as a single unit. This is a big part of what NumPy is for.
With that in mind, we can rewrite the Monte Carlo simulation this way:
python:
1: #!/usr/bin/env python3
2:
3: import numpy as np
4: import sys
5:
6: # Initialize
7: N = int(sys.argv[1])
8: rng = np.random.default_rng()
9: pA = .75
10: pB = .25
11:
12: # Generate N pairs of uniform random variates and count
13: A = rng.random(N)
14: B = rng.random(N)
15: onlyA = np.logical_and(A < pA, B >= pB).sum()
16: onlyB = np.logical_and(B < pB, A >= pA).sum()
17:
18: # Estimate probability that if only one hits, it's A
19: p = onlyA/(onlyA + onlyB)
20:
21: print(f'Probability estimate: {p:.2%} ({onlyA:,d}/{onlyA + onlyB:,d})')
In this code, A
and B
are arrays of length N
, with each item being a random floating point number in the interval [0, 1). We then filter these arrays according to whether their values are less than the target values of pA
and pB
. For example, let’s say A
is an array of 10 random numbers:
[0.67931298, 0.86084747, 0.66868163, 0.07347088, 0.37776693,
0.86578819, 0.53811295, 0.84018319, 0.99293766, 0.39668558]
Then the result of A < .75
would be a 10-element array of Booleans:
[True, False, True, True, True,
False, True, False, False, True]
The logical_and
function takes two such arrays and outputs another Boolean array with elements that are True
if both corresponding elements of the arguments are True
and False
otherwise.
Because Boolean arrays can be thought of as arrays of ones and zeros, the sum
function acts to count the number of True
values. This is how we make the (scalar) onlyA
and onlyB
variables in Lines 15 and 16.
The equivalent in this script of the onlyOne
variable in the looping script is the sum of onlyA
and onlyB
, so that’s what shows up in the denominator of Line 19.
I called this script monte-carlo-array.py
and ran it like this:
$ time python monte-carlo-array.py 1_000_000
Probability estimate: 89.99% (562,827/625,462)
real 0m0.091s
user 0m0.060s
sys 0m0.026s
Basically the same result but considerably faster. On the other hand, the array code uses a lot more memory because it has to make the various million-element arrays. The tradeoff between space and speed is common; what’s great about modern computers is that they have both.
-
The opening square bracket means that 0 is included in the interval. The closing parenthesis means that 1 is not. ↩
Target practice and conditional probability
January 19, 2025 at 10:26 PM by Dr. Drang
Bruce Ediger of Information Camouflage solved a nice little probability problem on his blog the other day. He got the problem from Greg Ross at Futility Closet, a blog I hadn’t run across before. Here it is:
Marksman A hits a certain small target 75 percent of the time. Marksman B hits it 25 percent of the time. The two of them aim at that target and fire simultaneously. One bullet hits it. What’s the probability that it came from A?
Implicit in this setup is that the two shooters are statistically independent. Why? First, because that makes the most sense; and second, because you can’t solve the problem otherwise.
Ross solves the problem in the traditional way, working out the intersectional probabilities
and
Since these two intersection events, which we’ll call and to make things more compact,1 are mutually exclusive, the probability of their union, which is when exactly one of the shooters hits the target, is just the sum of these two probabilities. Therefore, the conditional probability we want is
Ediger does a Monte Carlo simulation and gets roughly the same answer2 but is unsatisfied:
This is a completely unintuitive result to me.
I can’t disagree. Luckily, this problem is simple enough that it can be visualized in a way that might make it more intuitive.
We’ll sketch out a Venn diagram of the problem, but we’ll use boxes instead of the usual ovals, and we’ll size the boxes so their areas represent the probabilities of the corresponding events.
The universe of all possibilities will be a 1×1 box, and we’ll consider each axis to represent one of the shooters. Event is a blue box that takes up ¾ of the horizontal range. Event is a red box that takes up ¼ of the vertical range. Where they intersect (both shooters hit the target) is purple—it’s both blue and red. The uncolored portion represents neither shooter hitting the target.
By subdividing the space into grids, we see that there are 10 squares representing exactly one shooter hitting the target and 9 of them are blue. There’s our answer, and you can see why it’s so.
House of cards redux
January 18, 2025 at 2:46 PM by Dr. Drang
You may remember this post from back in October. It’s about a puzzle in the November issue of Scientific American in which the goal was to determine if you could build a house of cards of the following design with exactly 100 cards. And if so, how many stories would the house have?
The answer was yes, an 8-story house built this way would have 100 cards. My solution hinged on the recognition that the numbers of cards for increasing story count is related to triangular numbers. I got the answer by using the equation for triangular numbers and the quadratic formula.
The folks at SciAm went about it a different way. They set up a table of story counts and card counts and took the first and second differences of the card counts. Like this:
They reasoned that since the second differences were constant, the formula for the card count must be of the 2nd degree. They then determined the three coefficients of that 2nd degree formula via simultaneous equations.
At the end of the post, I said that although I preferred my solution, I wanted to learn more about difference tables. About a month ago, I was in a bookstore and saw J.F. Steffensen’s Interpolation on the shelf. Difference tables play a big role in the book, so I bought it. It helped that it’s a Dover reprint, so it was inexpensive.
I’ve been skimming the book to figure out which parts I want to dig into and found a really simple way to solve the house of cards puzzle with no algebra whatsoever. Start by building the difference table as SciAm did it. Then recognize that the constancy of the 2nd difference column lets you extend the table to whatever length you want.
The part with the yellow background is the part of the table we did before. Since the 2nd difference column is constant, we can just write 3’s for every entry. Then we work backwards to fill in the rest. We extend the 1st difference column by adding 3 (the 2nd difference value) to each preceding value:
Then we fill in the Cards column by adding the 1st difference values to the preceding card count:
That 100 corresponds to Row 8 of the table, so an 8-story house of cards has 100 cards. I went further than was necessary to solve the puzzle just to show how easy it is. The numbers were small enough that I could do all the addition in my head.
It’s true that having a formula allows you to get the card count directly for any number of stories, while using the difference table this way forces you to calculate all the intermediate values. In general, it’s better to have the formula. But for this specific puzzle, the difference table is faster.
More tolerant weight recording
January 15, 2025 at 3:04 PM by Dr. Drang
You may remember the I wrote about last month. I run it from my iPhone every morning to record my weight in both the Health app and a CSV file I use for graphing my progress (or lack thereof). It’s a simple shortcut, and yesterday it proved to be too simple.
shortcutI made a typo when entering my weight and tapped the Done button before I could stop myself. I opened up a text editor to fix the entry in the CSV file, but got distracted and forgot to fix the entry in Health. Not a big deal, but later that day I saw that a 3-mile walk had, according to my watch, burned only 1 calorie. I complained on Mastodon about the watch screwing up, and this morning Juande Santander-Vela (without knowing of my weight recording error) correctly deduced the cause:
maybe you’ve recorded a new weight and it is too small? I remember having had the opposite problem, where merely moving incurred a lot of calories, and it was because 80,2 kg had been recorded as 802 kg…
Upon reading this, I immediately opened the Health app and saw that it had my weight as 1.4 lbs. That explained the minuscule calorie count.
Shortly thereafter, April had a suggestion:
Can you incorporate some logic in your Shortcut to check for a minimum weight to avoid this?
Great idea. I’ll collect the most recent weight, compare it to the value just entered, and if the difference between the two is too much, I’ll ask for the weight again. I’ll put a while loop—or whatever Shortcuts calls a while loop—into the shortcut and that’ll take care of the problem.
Except I couldn’t find anything like a while loop in Shortcuts. I asked on Mastodon if I’d missed something, and the answer was no. Shortcuts can loop a specific number of times and it can loop through the elements of a list, but it can’t loop until a condition is met. Two workarounds were suggested:
- Use a Repeat loop with an unreasonably high count and stop the shortcut when the condition is met. This was suggested both by Tom and by Ian Bjorhovde,1 but I couldn’t bring myself to do it this way. Too ugly.
- Use recursion. This came from Leon Cowle, and that’s what I went with. Recursion seems kind of highfalutin for Shortcuts, and rewriting that way made me think I was doing an exercise in SICP.
Here’s the new, more tolerant version of
:Step | Action | Comment |
---|---|---|
1 | Get today’s date. | |
2 | Format it as yyyy-mm-dd. | |
3 | Get all the weights recorded in Health over the past 7 days, put them in reverse chronological order, and limit the list to the most recent. | |
4 | Get the numerical value of the most recent weight. | |
5 | Ask the user for today’s weight. | |
6 | Calculate the difference (absolute value) between today’s weight and the most recent. | |
7 | If the difference is more than 3… | |
8 | Rerun the shortcut | |
9 | If the difference is 3 or less… | |
10 | Log today’s weight into the Health app. | |
11 | Make a new CSV line. | |
12 | Add the line to the CSV file. | |
13 | Done. |
As you can see, I chose 3 lbs as the cutoff for a realistic weight change. Anything more than that and I’m asked to try again. In the 6 months or so that I’ve been tracking my weight, I’ve never had a change of more than 2 lbs, so 3 lbs seemed like a decent choice. Of course, this doesn’t prevent all typos, just excessive ones.
I think the comments explain the shortcut pretty well. If you want to use something like it, but without the CSV stuff, just delete Steps 11 and 12. One last thing: I learned that the fabs
function used in Step 6 was available in Shortcuts through a Reddit post by gluebyte.
-
Whose uncle (I think it was uncle) Reidar Bjorhovde was a structural engineering researcher of some renown. ↩