Monte Carlo and the Two Child Problem

In the previous post about the Two Child Problem, we thought about how the probabilities would change under different rules. In this post, let’s write those rules into a program and see how the probabilities change in a Monte Carlo (no relation to Monty Hall) simulation.

To review, the Two Child Problem is this:

Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?

The answer depends on what rules we think the questioner is following. We’ll look at three cases:

  1. The questioner would never pose this problem if Mr. Smith had two daughters. The problem is restricted to families with at least one son and the question is always about the probability of two sons.
  2. The questioner isn’t restricted at all. He simply tells us about one child, chosen at random, in a two-child family and asks us if the other child is of the same sex.
  3. The questioner is biased toward boys. If there’s at least one boy in the family, that’s what he tells us; if the family has two girls, he tells us there’s at least one girl. In either case, he asks for the probability that the other child is of the same sex.

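In Monte Carlo simulation, we use the computer to generate lots of random events and then combine the counts of those random events to estimate probabilities. For the Two Child Problem, we’ll simulate “families” by generating pairs of letters: G for girls, B for boys. The counts we need to keep track of are:

  * $n$, the total number of families
  * $n_{sons}$, the number of families with at least one son
  * $n_{2sons}$, the number of families with two sons
  * $n_{2daughters}$, the number of families with two daughters

Note that $n_{sons} + n_{2daughters} = n$.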

For the first case, we’re eliminating from consideration the families with two daughters, so the probability will be

$$\frac{n_{2sons}}{n_{sons}}$$

For the second case, we include all the families. Since we’re choosing the “revealed” child at random and asking if the other child is of the same sex, it’s equivalent to going through the list of all the families and picking out the boy-boy and girl-girl families. The probability will be

$$\frac{n_{2sons} + n_{2daughters}}{n}$$
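One way to check that equivalence is to simulate the questioner literally: for each family, reveal one child at random and test whether the other child matches. The sketch below is not part of the original program. The function name `same_sex_fraction`, the seed, and the sample size are illustrative choices, and it is written for Python 3.

```python
import random

def same_sex_fraction(n, seed=None):
    """Estimate P(other child matches a randomly revealed child) two ways."""
    rng = random.Random(seed)
    sexes = ('G', 'B')
    direct = 0   # matches found by revealing a random child
    counted = 0  # families that are simply boy-boy or girl-girl
    for _ in range(n):
        family = (rng.choice(sexes), rng.choice(sexes))
        revealed = rng.randrange(2)  # index of the child we are told about
        other = 1 - revealed         # index of the child we are asked about
        if family[revealed] == family[other]:
            direct += 1
        if family in (('B', 'B'), ('G', 'G')):
            counted += 1
    return direct / n, counted / n

direct, counted = same_sex_fraction(10000, seed=1)
print(direct, counted)
```

The two fractions come out identical on every run, because "the other child matches the revealed one" is true exactly when the family is boy-boy or girl-girl, whichever child happens to be revealed.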

The third case is a little trickier. Recognize first that if the family has any boys, the questioner will ask about boys and the probability will be calculated as in the first case. The questioner will ask about girls only if the family has two girls, so the probability of having two children of the same sex under that condition is 1. We use conditional probability to combine these situations:

$$P(\text{same sex}) = P(\text{same sex} \mid \text{at least one boy}) \cdot P(\text{at least one boy}) + P(\text{same sex} \mid \text{two girls}) \cdot P(\text{two girls})$$

With our variables, this becomes

$$\left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right)$$

With a little algebra this formula reduces to that of the second case, which means that these two sets of rules are equivalent, even though they don’t seem to be.
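Spelled out, the algebra is a single cancellation. Assuming $n_{sons} > 0$, so that the first fraction is defined, the two $n_{sons}$ factors cancel and the remaining terms share the denominator $n$:

$$\left(\frac{n_{2sons}}{n_{sons}}\right) \cdot \left(\frac{n_{sons}}{n}\right) + 1 \cdot \left(\frac{n_{2daughters}}{n}\right) = \frac{n_{2sons}}{n} + \frac{n_{2daughters}}{n} = \frac{n_{2sons} + n_{2daughters}}{n}$$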

Here’s a Python program that implements these ideas.

 1:  #!/usr/bin/python
 2:  
 3:  from __future__ import division, print_function
 4:  from random import choice
 5:  
 6:  n = 10000
 7:  sexes = ('G', 'B')
 8:  families = []
 9:  
10:  for i in range(n):
11:    families.append((choice(sexes), choice(sexes)))
12:  
13:  nsons = len([x for x in families if 'B' in x])               # at least one son
14:  n2sons = len([x for x in families if x == ('B', 'B')])       # two sons
15:  n2daughters = len([x for x in families if x == ('G', 'G')])  # two daughters
16:  
17:  print('''If we restrict ourselves to families that have at least one son,
18:  the probability of having two sons is %d/%d = %5.3f''' % (n2sons, nsons, n2sons/nsons))
19:  
20:  print()
21:  
22:  print('''If we choose the "revealed" child at random, the probability of having
23:  two children of the same sex is %d/%d = %5.3f''' % (n2sons+n2daughters, n, (n2sons+n2daughters)/n))
24:  
25:  print()
26:  
27:  print('''If we "reveal" boys in every case except when there are two daughters,
28:  the probability of having two children of the same sex is
29:  (%d/%d)*(%d/%d) + 1*(%d/%d) = %5.3f''' % (n2sons, nsons, nsons, n, n2daughters, n, n2sons/nsons*nsons/n+n2daughters/n))

We use the choice function from the random module to generate 10,000 simulated families as a list of tuples. Lines 13-15 then filter the list according to certain criteria and count the number of families left. Line 17 onward does the calculations according to the formulas above and prints the results.
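To see what the filtering step does, it can help to run the same three list comprehensions on a list small enough to check by eye. The five families below are made up for illustration and are not output from the simulation.

```python
# A fixed, made-up list of five families so the counts can be checked by hand.
families = [('B', 'B'), ('B', 'G'), ('G', 'B'), ('G', 'G'), ('B', 'B')]

# Same filters as in the program: keep the matching families, then count them.
nsons = len([x for x in families if 'B' in x])               # at least one son
n2sons = len([x for x in families if x == ('B', 'B')])       # two sons
n2daughters = len([x for x in families if x == ('G', 'G')])  # two daughters

print(nsons, n2sons, n2daughters)  # prints: 4 2 1
```

Every family lands in either the "at least one son" count or the "two daughters" count, so the first and last numbers add up to the five families in the list.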

Here’s a sample of the output.

If we restrict ourselves to families that have at least one son,
the probability of having two sons is 2520/7535 = 0.334

If we choose the "revealed" child at random, the probability of having
two children of the same sex is 4985/10000 = 0.498

If we "reveal" boys in every case except when there are two daughters,
the probability of having two children of the same sex is
(2520/7535)*(7535/10000) + 1*(2465/10000) = 0.498
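For comparison with the sampled values, the exact probabilities can be found by applying the same three formulas to the four possible families, each listed once. This is a sketch that assumes, as the simulation does, that boys and girls are equally likely and that the two children are independent. It uses exact fractions rather than floats and is written for Python 3.

```python
from fractions import Fraction
from itertools import product

# The four equally likely families: GG, GB, BG, BB.
families = list(product('GB', repeat=2))
n = len(families)
nsons = sum(1 for f in families if 'B' in f)
n2sons = families.count(('B', 'B'))
n2daughters = families.count(('G', 'G'))

case1 = Fraction(n2sons, nsons)
case2 = Fraction(n2sons + n2daughters, n)
case3 = Fraction(n2sons, nsons) * Fraction(nsons, n) + Fraction(n2daughters, n)

print(case1, case2, case3)  # prints: 1/3 1/2 1/2
```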

Based on the reasoning of the earlier post, the answers are what we expected. But thinking the problem through from a Monte Carlo perspective does give a different view of what the various rules mean.

The mantra of Richard Hamming’s book Numerical Methods for Scientists and Engineers is

The purpose of computing is insight, not numbers.

I think this exercise is a good illustration of that. We didn’t really have to write the Monte Carlo program; just working out how we were going to write it gave us an understanding of the similarities and differences in the three sets of rules.