Two-Child Problem: A Brief Explanation

Estimated read time (minus contemplative pauses): 6 min.

The Infinite Monkey Cage podcast episode “Science’s Epic Fails” (1/31/2017) ends with a rushed discussion of the classic two-child probability problem, posed as follows by one of the show’s two hosts:

You bump into someone in the street, they have two children, one is a boy. What are the chances that the other is also a boy?

A guest on the show says 1/2. The host who stated the question corrects him: it’s 1/3. Another person—the other host, I presume—interjects that it depends on how the question is posed.

The interjector is correct: the answer to a two-child problem may be 1/2 or 1/3, depending on how it’s posed. The above formulation is ambiguous, so there are no grounds for answering 1/2 or 1/3. I suspect that the host claims 1/3 because that’s (incorrectly) the default expected answer for two-child-type problems. And I suspect the guest answered 1/2 on the rationale that we generally assume even odds for a randomly chosen child being a boy or a girl; however, when the answer is indeed 1/2, it’s not due to that rationale.

I’ll give a brief and, I hope, intuitive explanation of both answers, but if you’re still not convinced or would like to go deeper, check out my more extensive post, which also features a variation in which the observed child is born on a Tuesday (the “competing” probabilities there become 13/27 and 1/2): Two-Child Problem (when one is a girl named Florida born on a Tuesday).

Two-Child Problem = 1/2

Let’s first consider a 1/2 formulation. For convenience, I’ll use frequencies and will imagine the person we bump into is named Tina. To disambiguate the statement of the problem, we need only add that, when we bump into Tina, she has a boy with her whom we know to be one of her two children (how we know this isn’t important, so long as it doesn’t give us more evidence than the problem grants; for example, it shouldn’t be because we see her at a mother-son picnic, for reasons that will be apparent in a moment).

Imagine there are 80 instances in which we bump into Tina. If we conveniently assume that birth rates of boys and girls are practically the same, then the 80 instances can be grouped into four equally probable scenarios:

In 20 instances, Tina has two boys.
In 20 instances, Tina has a girl and a boy, and the boy was born first.
In 20 instances, Tina has a girl and a boy, and the girl was born first.
In 20 instances, Tina has two girls.

We can abbreviate this sample space as follows*:

BB = 20
BG = 20
GB = 20
GG = 20

To be clear, what we’re imagining here is obviously not that we actually bump into Tina on 80 different occasions. Rather, we’re imagining bumping into her exactly one time in each of 80 different possible worlds. It’s like tossing two fair coins 80 times.
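For anyone who likes to see the bookkeeping spelled out, here is a minimal Python sketch of that starting sample space. It assumes, as above, that each child is equally likely to be a boy or a girl; the variable names (outcomes, total_instances, counts) are my own labels for illustration.

```python
from itertools import product

# Assumption (same as in the text): each child is independently a boy ("B")
# or a girl ("G") with equal probability, so the four birth orders are
# equally likely.
outcomes = ["".join(pair) for pair in product("BG", repeat=2)]

# Spread the 80 imagined encounters evenly over the four outcomes.
total_instances = 80
counts = {outcome: total_instances // len(outcomes) for outcome in outcomes}

for outcome, count in counts.items():
    print(f"{outcome} = {count}")
```

The first letter stands for the first-born child, so BG and GB are listed separately.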

(*If you have a problem with including BG and GB in the sample space, check out my post explaining why both must be counted: Omega Hungers: Skeptics about Heads-Tails and Tails-Heads in the Sample Space. I also point out there that both need to be included even if the two children were born at precisely the same instant. Conceiving of one being born before the other is just a conceptual convenience. This is similar to constructing the sample space of a tossed quarter and nickel, whether they are flipped sequentially or simultaneously:

QT, NT
QT, NH
QH, NT
QH, NH

I’ll also note that it doesn’t matter if Tina is the stepmother or adoptive parent of one or both children, nor does it matter if Tina identifies strictly as a caregiver. The models constructed here need only accord with our evidence in the mathematically relevant ways.)

Note that the above sample space still doesn’t reflect what we actually know about Tina’s children. Namely, we know that at least one of them is a boy. So we can remove GG from the sample space, leaving us with:

BB = 20
BG = 20
GB = 20

Now for the crucial move, which reflects another assumption: the odds are even for bumping into Tina with either of her children. For example, if she has a boy and a girl, then, for all we know, we could have just as easily bumped into Tina with a girl. This is our most reasonable presumption, as we have no evidence for giving a higher probability to bumping into Tina with one of her particular children. (This is why I ruled out running into her at a mother-son picnic.)

With that in mind, we update the sample space one last time:

BB = 20 (because every time we bump into her in this scenario, she’ll be with a boy)
BG = 10 (because she’ll only be with a boy half the time in this scenario, and 10 is half of 20)
GB = 10 (same reasoning as in the BG scenario)

We can now calculate the probability that Tina has two boys by asking what proportion of the time she has two boys given an instance in which we’ve seen her with a boy. That is, we’ll see her with a boy 40 times, and 20 of those times, she has two boys. Put another way, she has two boys 20 out of the 40 times we see her with a boy. Or you can put it in possible-world terms: in 20 out of the 40 possible worlds we’re in, Tina has two boys. That’s half the time: 20/40 = 1/2.

And so, the probability that Tina has two boys is 1/2.
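The same count can be reproduced with exact fractions. This is only a sketch of the reasoning above, under the stated assumption that either of Tina’s children is equally likely to be the one with her; the names prior and chance_child_seen_is_boy are hypothetical labels of mine.

```python
from fractions import Fraction

# The 80 imagined encounters, split evenly over the four family types.
prior = {"BB": 20, "BG": 20, "GB": 20, "GG": 20}

def chance_child_seen_is_boy(family):
    """Chance the child with Tina is a boy, assuming either child is
    equally likely to be the one accompanying her."""
    return Fraction(family.count("B"), 2)

# Expected number of encounters, per family type, in which we see a boy.
seen_with_boy = {
    family: count * chance_child_seen_is_boy(family)
    for family, count in prior.items()
}

total_seen_with_boy = sum(seen_with_boy.values())
p_two_boys = seen_with_boy["BB"] / total_seen_with_boy

print(seen_with_boy["BB"], seen_with_boy["BG"], seen_with_boy["GB"])  # 20 10 10
print(p_two_boys)  # 1/2
```

Note that GG drops out on its own here: its weight for "seen with a boy" is zero, so there is no need to remove it by hand.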

Two-Child Problem = 1/3

So when is a two-child problem’s answer 1/3? This is easier to demonstrate, and is in fact the explanation given on The Infinite Monkey Cage (though it fails due to ambiguity).

Once again, we bump into Tina. But this time she has no child with her. We ask, “Is at least one of your two children a boy?” She replies, “yes” (and we believe her).

So, we are again back at the following sample space:

BB = 20
BG = 20
GB = 20

And, again, we can further adjust the sample space based on what we’ve learned. But the numbers are different this time:

BB = 20 (because every time we ask Tina if she has a boy, she says “yes”)
BG = 20 (same as above)
GB = 20 (same as above)

(Notice that this is the same sample space we’d presume had we bumped into Tina participating in a mother-son picnic.)

There are now 60 instead of 40 instances of learning that one of Tina’s children is a boy. Twenty of those instances are BB scenarios, which gives a proportion of 20/60 = 1/3.

And so, the probability that Tina has two boys is 1/3.
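If you’d rather check both answers by brute force, here is a rough simulation sketch that runs the two formulations side by side. It assumes fair, independent births and, for the 1/2 formulation, that the child seen with Tina is picked at random from her two. The function name simulate and its parameters are my own, and the results are estimates that will wobble a little around 1/3 and 1/2.

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(two boys) under both formulations of the problem."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    said_yes = said_yes_two_boys = 0
    seen_boy = seen_boy_two_boys = 0

    for _ in range(trials):
        # Assumption: each child is a boy or a girl with equal probability.
        family = [rng.choice("BG"), rng.choice("BG")]
        two_boys = family == ["B", "B"]

        # 1/3 formulation: we ask "Is at least one of your children a boy?"
        if "B" in family:
            said_yes += 1
            said_yes_two_boys += two_boys

        # 1/2 formulation: a randomly chosen child is with Tina, and it's a boy.
        if rng.choice(family) == "B":
            seen_boy += 1
            seen_boy_two_boys += two_boys

    return said_yes_two_boys / said_yes, seen_boy_two_boys / seen_boy

p_asked, p_seen = simulate()
print(f"asked 'at least one boy?': {p_asked:.3f}")
print(f"seen with a boy:           {p_seen:.3f}")
```

The only difference between the two branches is how we come to learn about the boy, which is the whole point: same families, different evidence-gathering procedure, different answer.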

And that’s that.

Conclusion

Two-child problems are often posed ambiguously. When that happens, I (and others) think the proper answer is, “the question is ambiguous,” but the standard default assumption seems to be a 1/3 scenario. I suppose that, superficially, a 1/3 answer just makes the question seem more counterintuitive and thus both more fun and more instructive for thinking about conditional probability; though, as I think has been demonstrated here, the well-posed 1/2 formulation is at least as interesting, if not more so.

Finally, it’s possible that my careful attempts at clarity and heading off objections (“What if both children were born at the same instant?”) overcomplicate my explanation. I think I must leave those in, however, as they at least give an acknowledging nod to the sorts of nagging counter-intuitions that are a large part of what makes probability difficult. Indeed, the problem’s intuitive, psychological dimensions are where its real value lies, rather than it just being a fun brain-teaser. There’s still more to say about the problem in that respect. But this is deeper than I’m allowing myself to go today. For that, and for diagrams and yet more examples (including with coins), see again my more in-depth post: Two-Child Problem (when one is a girl named Florida born on a Tuesday).



3 Replies to “Two-Child Problem: A Brief Explanation”

  1. The underlying issue with these sorts of problems is that probability theory deals with “events.” Whether or not a fact is true is not an event. How that fact came to be known is, and the method often can provide different information when that fact is true. The classic example is called Bertrand’s Box Paradox. But the name does not refer to a specific probability problem; it refers to how confusing facts with events can produce a paradox. I’ll modify your “bump in the street” problem into an illustration.

    But first, a clarification on the interpretation of language. You imply that “Mr. Smith has two children, and one of them is a boy” means that he has exactly two children, but could have one or two boys. Why not two or more children, and exactly one boy? Or exactly one boy and one girl? Except in the context of a probability problem, the preferred meaning is exactly one of each.

    My point is that you are inserting the “at least” interpretation prejudicially. That can be justified for a probability problem; without it the question would be deterministic, and if applied to the number of children the question is unanswerable. Your point in this article is that how the “at least one of exactly two” is determined is important. But there are other interpretations that may be just as, or even more, valid. When first presented with this problem, most people think that only one child’s gender is known. So “the other” has a 50:50 chance to be a boy.

    As a simplification, suppose you live in a town populated only by families of two children. This will not affect the results, since we could just apply a filter to keep track of different-sized families separately. It just makes it easier to provide the example.

    Say you “bump into” 80 people a day. You’d expect 20 to have two boys, 20 to have two girls, and 40 to have one of each. But 60 have at least one boy, and 60 have at least one girl. Assuming that, in each case, you learn about only one gender, which may or may not apply to both children:

    1A) In how many cases do you expect to learn “at least one boy” ?
    1B) In how many cases do you expect to learn “at least one girl” ?
    2A) In how many of the cases counted in #1A do you expect the family to have two boys?
    2B) In how many of the cases counted in #1B do you expect the family to have two girls?
    3A) What are the chances that a family counted in #1A has two boys?
    3B) What are the chances that a family counted in #4 has two girls?

    If “1/3” is the correct answer to the Two Child Problem as you stated it, then the answers to my “A” series of questions are 60, 20, and 33%. But then the “B” series has 20, 20, and 100%. The paradox is that with the information you are given, #1A and #1B have to have the same answers, as do #2A and #2B, and #3A and #3B.

    The only way to eliminate this paradox is to assume that “40” is the answer to both #1A and #1B. That same assumption makes “20” the answer to #2A and #2B, and “50%” to #3A and #3B.

    Yes, the question as you stated it, and as the Two Child Problem is often stated, is ambiguous. But the point of probability is to describe situations that are ambiguous with relative chances. The apparent intention of the Two Child Problem is to describe the ambiguity between the gender arrangements BB, BG, GB, and GG this way. But if you don’t know how you learned a fact, it must be treated the same way. The answer is 50%. This does not mean that we have deduced the method by which we learned it, it means we used probability.

    1. Hi JeffJo,

      Thanks for the response. Part of what I like about this problem is that it provides an opportunity for talking about assumptions in the context of probability, a point I touch on several times here (and I’ve now added a note about assuming “at least one”). I go more into this in my longer post about the problem (which you’ve already commented on). There, I bring up, for example, the idea that if you’re running into this problem in an intro gloss to probability, it likely assumes a 1/3 scenario (I give an example). But a more thorough textbook might demonstrate 1/2 and 1/3 scenarios (I believe I give an example of that as well; if not, I can).

      I also consider there that you might take into account the frequency with which people tend to mean a 1/3 versus a 1/2 scenario—but at that point it becomes a question about “what does the person posing the question expect me to think?” and is getting pretty silly, though I find it an interesting silliness, as it points to what I find most instructive about the problem, more about which in a moment.

      At any rate, here are my answers to your A/B series questions:

      In the 1/2 scenario, where I “bump into” the family and learn that they have “at least one boy” by seeing them with a boy:
      1A) 40
      1B) 40
      2A) 20
      2B) 20
      3A) 50% (I assume you mean, “a family chosen at random,” and not “exactly one family” or “at least one family”)
      3B) 50% (I assume you mean “1B” rather than “4”)

      In the 1/3 scenario, where I “bump into” the family (who are walking with no child), and I ask, “do you have at least one boy”?:
      1A) 60 say “yes”
      1B) 20 say “no”
      2A) 20 of those who say “yes” have two boys
      2B) 20 of those who say “no” have two girls
      3A) 33% of those who say “yes” have two boys
      3B) 100% of those who say “no” have two girls

      The same answers you gave. I don’t see a problem with this, though I acknowledge that I changed the initial (or we might say “generic”) statement of the problem to clarify how we get information. I happen to think that the best way to deal with the problem is to clarify it in this, or some similar, way. (As you’ve noted in a previous comment on a different post, Martin Gardner conceded this point after having given the 1/3 answer in his column in 1959.)

      If faced with a real-world situation in which clarification is impossible, I’d say this uncertainty should be built into the forecaster’s (or whoever’s) recommendations. Sometimes we just don’t have enough information. And making that fact salient is, I think, the chief value of this toy problem—for example, by the disagreement it sometimes engenders among those who might argue about “what probability is meant for, should be meant for, can do, can’t do, etc.”

    2. PS: I was just reminded of this thread and some clarifying thoughts occurred to me.

      What bugs me about the radio show’s formulation is that it borrows a common formulation of this problem, in which YOU bump into someone with a child, and this adds to your knowledge about that person’s children (though I’ve rarely seen it specified how you learn that the observed child has a sibling). In these “bumped into” cases, in which you see the person with a child, the answer is usually 1/2 (unless you’re at a father-daughter picnic or something). But the radio show doesn’t specify this—it just says you bump into someone, hinting (though not specifying) that you learn one child is a boy due to that boy being with the person you bump into. If this is indeed how the question is interpreted, then the answer is 1/2. If, rather, we interpret the question the alternative way I do above, where we explicitly ask the person (who’s walking alone), “Do you have precisely two children and, if so, is at least one of them a boy?” and the response is “yes and yes,” then the answer is 1/3.

      In short, it seems to me that the radio show’s formulation is trying to give an unambiguous expression of the problem, but needs to fill in at least one more detail: e.g., the person was with a boy, in which case the answer would be 1/2.

      That said, in formulations that say nothing about how one’s knowledge is gained, my intuition (these days) is with the “textbook” answer of 1/3. For example, as appears in Blitzstein and Hwang’s Introduction to Probability, with a given answer of 1/3: “A family has two children, and it is known that at least one is a girl. What is the probability that both are girls, given this information?” (p 45)

      I’m inclined to agree that the answer there is 1/3, given only that information. On the other hand, I’ve seen at least one textbook that seems to challenge such formulations as too ambiguous to allow for a clear answer—i.e., Grinstead and Snell’s Introduction to Probability (p 175-178), who note that “the ‘textbook’ solution” yields an answer of 1/3, but then follow the lead of Bar-Hillel and Falk’s article, “Some Teasers Concerning Conditional Probabilities” (Cognition, vol. 11 (1982), pp 109-122), who don’t give a definitive answer to the classic statement of the problem. Grinstead and Snell also point out that “It is not so easy to think of reasonable scenarios that would lead to the classical 1/3 answer. … the apparent paradoxes could easily be resolved by clearly stating the model that is being used and the assumptions that are being made.”

      Similarly, Gardner has noted that an “ambiguity arises from a failure to specify the randomizing procedure,” so the problem “must be very carefully stated to avoid ambiguity that prevents a precise answer” (The Colossal Book of Mathematics, pp 277-282).

      Again, though, I’m inclined to think that, if all we have is the info given in the classic statement of the problem, the answer really should be 1/3. But the fact that there is disagreement there gives me pause. (I’m referring here, mind you, to the classic statement; I quite disagree with Grinstead and Snell about it not being easy to come up with a reasonable 1/3 scenario). In fact, what I like about this problem is that such a seemingly simple little conditional probability problem would have such discussions surrounding it. What this might imply for messier real-world applications is what most fascinates me here.
