Bertrand’s Box Paradox (with and Without Bayes’ Theorem)

Estimated read time (minus contemplative pauses): 15 min.

Joseph Bertrand (1822–1900)

Bertrand’s Box Paradox has been mentioned in comments a couple of times here at Untrammeled Mind, so I figured I’d give a quick explanation. It’s a straightforward probability problem, if at first glance counterintuitive. Which is to say it doesn’t take much to resolve the so-called paradox.

I’ll first give a short explanation, then a slightly longer one. As a cool added bonus, the slightly longer explanation will lead us to discover Bayes’ theorem, whose innards I’ll then take the opportunity to briefly, gently probe with Bertrand’s Box Paradox as a guide.

Bertrand’s Box Paradox comes from Joseph Bertrand’s 1889 book Calcul des probabilités. I’ll lift it from Wikipedia:

There are three boxes:

(1) a box containing two gold coins,
(2) a box containing two silver coins,
(3) a box containing one gold coin and one silver coin.

The “paradox” is in the probability, after choosing a box at random and withdrawing one coin at random, if that happens to be a gold coin, of the next coin drawn from the same box also being a gold coin.

So, to summarize: You randomly choose a box, then randomly pull a coin from that box. The coin is gold. You don’t put the coin back in the box. What’s the probability the second coin you pull from that same box is also gold?

Short Explanation:

To aid mental visualizing, I’ll sketch the boxes:

(1) [g,g]
(2) [s,s]
(3) [s,g]

You know you’ve got either box (1) or (3), so forget about box (2). We presume the same likelihood for choosing any given box. This suggests the intuitive response of there being an equal likelihood of having chosen box (1) or (3), and thus one-to-one (i.e., 50-50) odds for the next coin being gold or silver. In other words, you might now think the answer is 1/2. But this is wrong.

Notice that, given that box (1) contains twice as many gold coins as does box (3), you are twice as likely to have chosen box (1). In other words, there’s a 2:1 ratio in favor of box (1). That’s 2-to-1 odds, which means that, on average, for every three times you play this game and first pull a gold coin, two out of three of those runs will be a box (1) scenario, and one out of three will be a box (3) scenario. So the probability that the second coin is gold is two out of three, or 2/3.

In short, the first coin you pull will be gold twice as often in the box (1) situation as in the box (3) situation. So you’re twice as likely to be in the box (1) situation as in the box (3) situation. That’s 2:1 odds, which translates to a 2/3 probability for having chosen box (1) and a 1/3 probability for having chosen box (3).
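For readers who like to check this sort of claim by brute force, here is a minimal Python sketch that plays the game many times and keeps only the runs where the first coin is gold. The function name `simulate`, the trial count, and the fixed seed are illustrative choices, not part of the original problem.

```python
import random

# Boxes as described above: (1) two gold, (2) two silver, (3) one of each.
BOXES = [("g", "g"), ("s", "s"), ("s", "g")]


def simulate(trials, seed=0):
    """Estimate P(second coin is gold | first coin is gold) by simulation."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    first_gold = 0
    both_gold = 0
    for _ in range(trials):
        box = list(rng.choice(BOXES))  # choose a box at random
        rng.shuffle(box)               # random draw order within the box
        first, second = box
        if first == "g":               # keep only runs that start with gold
            first_gold += 1
            if second == "g":
                both_gold += 1
    return both_gold / first_gold


estimate = simulate(100_000)
print(round(estimate, 3))  # should land near 0.667 rather than 0.5
```

Roughly half of the runs get thrown away (the ones that start with silver), which is exactly the conditioning step the short explanation describes.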

Slightly Longer Explanation: 

I’ll now explain Bertrand’s Box Paradox more thoroughly with a tree diagram, some simple math, and a little common sense. Hopefully it’s clear; if not immediately, then with a little persistence. I tend to describe the same thing in different ways with the hope that, taken as a whole, the result is understanding rather than algorithmic dependence (both for myself and the reader). (If this stuff is new to you and my delivery is confusing, I apologize; if it’s familiar and my delivery seems patronizing, I apologize. Striking the right balance is tough to do. So I follow the rule of trying to explain things the way I think I’d like to have seen them explained when I first encountered them—also surprisingly tough to do.)

First, let’s define our starting sample space and give the boxes names instead of numbers: {GG, SG, SS}

So, “GG,” for example, means “the box with two gold coins.” Also, I’ll use lower case to symbolize the coins—so, “g” means “gold coin.”

Let’s now use theoretical frequencies to run the game 60 times. Each box carries the same probability of being chosen, so we end up with:

[Figure: tree diagram. 60 games branch into 20 GG, 20 SG, and 20 SS, with SS crossed out.]

SS is crossed out because we already know we’re not in an SS situation. It’s at this point that one’s intuition pipes up, “Ah, two boxes, both equally likely to have been chosen. The probability of GG must be 1/2.” This fails, however, to account for the evidence the gold coin provides. We must extend our branches:

[Figure: the tree extended by the first coin pulled. GG: gold 20 times. SG: gold 10 times, silver 10 times.]

In total, we pull a gold coin 30 times: 20 of those happen in a GG scenario, and the other 10 happen in SG. We then set a ratio of the desired outcomes against all outcomes that meet the broader condition that we pulled a gold coin:

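\displaystyle \frac{\text{number of times we pull gold in a GG scenario}}{\text{number of times we pull gold in any scenario}}

Now plug in the corresponding numbers from the tree:

\displaystyle \frac{20}{20+10}=\frac{20}{30}=\frac{2}{3}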
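The same frequency bookkeeping can be sketched in a few lines of Python. The names `GAMES`, `GOLD_SHARE`, and `gold_counts` are illustrative, and exact fractions are used so nothing gets rounded along the way.

```python
from fractions import Fraction

GAMES = 60  # same hypothetical number of games as in the text

# Fraction of first pulls that are gold for each box.
GOLD_SHARE = {"GG": Fraction(2, 2), "SG": Fraction(1, 2), "SS": Fraction(0, 2)}

games_per_box = Fraction(GAMES, len(GOLD_SHARE))  # 20 games per box

# Expected number of games in which the first pull is gold, per box.
gold_counts = {box: games_per_box * share for box, share in GOLD_SHARE.items()}

total_gold = sum(gold_counts.values())
answer = gold_counts["GG"] / total_gold

print({box: int(n) for box, n in gold_counts.items()})  # {'GG': 20, 'SG': 10, 'SS': 0}
print(total_gold, answer)  # 30 2/3
```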

And there you have it. But I’d like to break this down a little further and repeat the above method, but using probabilities. Essentially, you can think of this as the same as what we did above, but playing one game instead of 60. This amounts to the exact same proportions everywhere, but with the relevant fractions simplified and the math more transparent.

For example, instead of starting with 60, we start with 60/60 = 1. Instead of choosing GG 60/3 = 20 times, we choose it (60/3)/60 = 20/60 = 1/3 times (or we might say, “1/3 of the time”), which is just the probability of selecting GG. You can then find the 2/3 answer with the same ratio we used above (which I represent in the next diagram below), or you can do what amounts to essentially the same thing by multiplying the probabilities on the branches on a given path in order to figure out the probability of that path’s happening, and then make an appropriate ratio (I’ll do that in a moment).

Let’s go right to the full tree:

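[Figure: the full tree in probabilities. Each box is chosen with probability 1/3; the first coin is then gold with probability 2/2 from GG and 1/2 from SG.]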

We can interpret the above in multiple ways. We can say, “On average, 1/3 of the times we play this game, we choose box GG, at which point we always pull a gold coin; and, on average, 1/3 of the times we play this game, we choose box SG, and we then pull a gold coin about 1/2 those times.” Those fractions are, of course, probabilities. Which is to say that we can multiply them to get the probability of a given path. In other words, we can also, and more efficiently, interpret the above by noting that there’s a (1/3) × (2/2) = 1/3 probability of choosing box GG then pulling a gold coin; and there’s a (1/3) × (1/2) = 1/6 probability of choosing box SG then pulling a gold coin.

We can now find the probability that we chose box GG given the evidence of a gold coin by setting a ratio of our desired outcome’s probability against the sum of all the probabilities corresponding to the outcomes that meet our condition of having pulled a gold coin. I’ll make this explicit. Allow me to introduce some notation for readability. If I write P(x), read that as “the probability that x happens.” Notice that the components of the ratio follow the stories told by our branch paths:

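\displaystyle P(\text{chose GG, given a gold coin})=\frac{P(\text{choose GG then pull gold})}{P(\text{choose GG then pull gold})+P(\text{choose SG then pull gold})}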

To recap, the numerator has the outcome we’re hoping for, while the denominator has the outcome we’re hoping for PLUS the (relevant possible) outcome(s) we’re not hoping for. Or, more simply put, the numerator has the probability of this evidence arising given the situation we’re hoping for (or at least that makes our hypothesis true), and the denominator has the probability of this evidence arising period. In specific terms, this translates, respectively, to the probability of the first coin being gold when we’ve chosen box GG and the probability of the first coin being gold period.

Now we just plug in the numbers. This introduces nothing new into the conversation except for a visual representation of the math we’ve already noticed:

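\displaystyle \frac{\frac{1}{3}\times 1}{\left(\frac{1}{3}\times 1\right)+\left(\frac{1}{3}\times\frac{1}{2}\right)}=\frac{1}{1+\frac{1}{2}}=\frac{1}{\frac{3}{2}}=\frac{2}{3}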

And there you have it![1]
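As a cross-check on the branch arithmetic, here is a small Python sketch that multiplies along each path and then forms the ratio. The names `P_BOX`, `P_GOLD_GIVEN_BOX`, and `path` are made up for illustration.

```python
from fractions import Fraction

# Branch probabilities read off the tree: P(box), then P(gold first | box).
P_BOX = Fraction(1, 3)
P_GOLD_GIVEN_BOX = {"GG": Fraction(2, 2), "SG": Fraction(1, 2)}

# Multiply along each path: choose the box, then pull a gold coin.
path = {box: P_BOX * p for box, p in P_GOLD_GIVEN_BOX.items()}

p_gold = sum(path.values())            # all paths that end in a gold first coin
p_gg_given_gold = path["GG"] / p_gold  # desired path over all qualifying paths

print(path["GG"], path["SG"], p_gold, p_gg_given_gold)  # 1/3 1/6 1/2 2/3
```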

Hey, That’s Just Bayes’ Theorem:

The above amounts to an application of what we’ve come to call “Bayes’ theorem” (or “law” or “rule”). Often with conditional probability problems like these, where the aim is to revise a probability (or “degree of belief”) estimate based on new evidence or data, folks will trot out Bayes’ theorem, plug in some numbers, and say, “And there you have it!” Here, though, we’ve gotten to the theorem from the inside out. The theorem really isn’t anything magical, but it is powerful. And when you understand it, it’s intuitive enough that it doesn’t need to be rotely memorized.

My goal today isn’t to talk at length about Bayes’ theorem, but rather to say enough to show how our above work matches up with it. I make this more complicated than it needs to be for the sake of continuing with the idea of working from the inside out. (For a more in-depth overview of Bayes’ theorem, see my post, “An Easier Counterintuitive Conditional Probability Problem [with and Without Bayes’ Theorem],” where I also apply the theorem twice to the same hypothesis. For a more thorough primer, along with some fun examples, check out Dan Morris’s book, Bayes’ Theorem Examples: A Visual Introduction For Beginners. For an overview of Bayesian statistics and Bayesian inference more generally, see this Wikipedia entry, where you’ll find links to other useful articles.)

A common way to represent Bayes’ theorem is as follows.

Where:

H = hypothesis
E = evidence
~ = not
P(H|E) = something like, “The probability my hypothesis is true given this evidence” or, for short, “the probability of H given E.”

\displaystyle P(H|E)=\frac{P(E|H)\times P(H)}{P(E|H)\times P(H)+P(E|{\sim}H)\times P({\sim}H)}

This is essentially an abstraction of what we produced at the end of the last section. To review: the numerator has the probability of this evidence arising given the situation we’re hoping for (or at least that makes our hypothesis true), and the denominator has the probability of this evidence arising period. In specific terms, this translates, respectively, to the probability of the first coin being gold when we’ve chosen box GG and the probability of the first coin being gold period.

Notice that, since the denominator is really just the probability of getting the evidence at all, it is equivalent to P(E). If P(E) is already known or easily calculated, you can skip the longer formula and plug P(E) into the denominator. Some problems are much more easily worked out that way, so Bayes’ theorem is often represented as:

\displaystyle P(H|E)=\frac{P(E|H)\times P(H)}{P(E)}

In fact, this is Bayes’ theorem (though I’ll refer to it here as “abbreviated”). We derive the longer version, when useful, by breaking up P(E) with the Law of Total Probability.

Arguably, Bertrand’s Box Paradox is most easily addressed with the abbreviated form, given that we can infer P(E) to be 1/2, due to the situation’s general symmetry and there being a total of 3 gold and 3 silver coins in the three boxes; but let’s let that 1/2 figure emerge from the inside out, using the longer version of the formula. I’ll do this two ways.
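For reference, here is a sketch of how the abbreviated form would go if we simply accept P(E) = 1/2, writing the hypothesis (box GG was chosen) as GG and the evidence (the first coin is gold) as g:

\displaystyle P(GG|g)=\frac{P(g|GG)\times P(GG)}{P(g)}=\frac{1\times\frac{1}{3}}{\frac{1}{2}}=\frac{2}{3}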

First, I’ll run it in a way that matches the longer version of the theorem above. Some of the numbers will at first differ from what we saw at the end of the last section (for reasons that will be obvious), but will quickly line up as expected. Then I’ll run it again with a slight variation, but in a way that is intuitively streamlined; in fact, the numbers there will more closely match what we did at the end of the last section.

Here’s the first way. The hypothesis here is that box GG was chosen, or “GG” for short. The evidence is that the first coin pulled was gold, or “g” for short. Start by plugging a verbal representation of this into Bayes’ theorem:

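\displaystyle P(\text{chose GG}\,|\,\text{first coin gold})=\frac{P(\text{first coin gold}\,|\,\text{chose GG})\times P(\text{chose GG})}{P(\text{first coin gold}\,|\,\text{chose GG})\times P(\text{chose GG})+P(\text{first coin gold}\,|\,\text{didn't choose GG})\times P(\text{didn't choose GG})}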

Tighten this up with the symbolic representation:

\displaystyle P(GG|g)=\frac{P(g|GG)\times P(GG)}{P(g|GG)\times P(GG)+P(g|{\sim}GG)\times P({\sim}GG)}

If need be, take a moment to compare these steps with our earlier formulation and convince yourself that this story’s on the right track. Now we find the relevant probabilities for the equation.

P(GG) is the probability of choosing box GG before conditioning on our current evidence. That’s 1/3.

P(~GG) is the probability of choosing either box SG or SS (again, before introducing our evidence). That’s 2/3.

P(g|GG) is the probability of the first coin being gold when we’ve chosen box GG. That’s clearly 1.

P(g|~GG) is the probability of the first coin being gold when we’ve chosen either box SG or SS. In other words, this asks us to look at what can happen when we don’t select box GG, and find the proportion of time that our first pull is a gold coin among all the ways things can go, win or lose, in that condition. I’ll sketch the relevant sample space as:

{(choose SG then pull silver), (choose SG then pull gold), (choose SS then pull silver)}

There are other ways to model and represent and think about this, but this’ll do. Now we’re looking for the ratio of

…the probability of selecting box SG (i.e., 1/3) then pulling a gold coin (1/2), which comes out to (1/3) × (1/2) = 1/6

to

…the probability of selecting box SG then pulling a silver coin (1/6) PLUS the probability of selecting box SG then pulling a gold coin (also 1/6) PLUS the probability of selecting box SS (1/3) (in which case we always pull a silver coin) = (1/6) + (1/6) + (1/3) = 2/3.

We take that ratio: (1/6)/(2/3) and get 1/4.

So, P(g|~GG) is 1/4. (This makes sense, given that, of the four coins in SG and SS, one is gold. Yeah, I am indeed making this more complicated than it needs to be—you know, for learning.)

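Now plug in the numbers:

\displaystyle P(GG|g)=\frac{1\times\frac{1}{3}}{\left(1\times\frac{1}{3}\right)+\left(\frac{1}{4}\times\frac{2}{3}\right)}=\frac{\frac{1}{3}}{\frac{1}{3}+\frac{1}{6}}=\frac{\frac{1}{3}}{\frac{1}{2}}=\frac{2}{3}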

And again we get a final answer of 2/3. And, as expected, the denominator is 1/2 = P(E). (You might notice that there are more efficient ways to do the math as I go along here. But I’d like to make clear the sorts of steps one might take when the numbers aren’t so convenient.)
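The steps of this first run can be sketched in Python as follows. The function name `bayes_two_way` and the variable names are hypothetical; the sketch just mirrors the longer form of the theorem, with P(g|~GG) worked out from the SG and SS boxes.

```python
from fractions import Fraction


def bayes_two_way(p_h, p_e_given_h, p_e_given_not_h):
    """Longer form of Bayes' theorem with H and ~H in the denominator."""
    p_not_h = 1 - p_h
    numerator = p_e_given_h * p_h
    p_e = numerator + p_e_given_not_h * p_not_h  # Law of Total Probability
    return numerator / p_e, p_e


# P(g|~GG): weight SG's and SS's gold chances by how likely each box is
# within the ~GG condition (each is 1/3 out of a total of 2/3).
p_sg, p_ss = Fraction(1, 3), Fraction(1, 3)
p_g_given_not_gg = (p_sg * Fraction(1, 2) + p_ss * 0) / (p_sg + p_ss)

posterior, p_e = bayes_two_way(Fraction(1, 3), Fraction(1), p_g_given_not_gg)
print(p_g_given_not_gg, p_e, posterior)  # 1/4 1/2 2/3
```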

Finally, I’ll run it again with the same hypothesis and evidence, but this time, rather than using ~GG in the denominator, I’ll use the competing hypothesis that we chose SG, or “SG” for short.

Technically, we should also include the third hypothesis, “chose box SS” (or “SS” for short), but we could skip it here because its term comes out to 0, given that P(g|SS) = 0. The idea, though, is that our unconditioned probabilities add up to 1. That is: P(GG) + P(SG) + P(SS) = 1/3 + 1/3 + 1/3 = 3/3 = 1.[2] This is consistent with what we’ve done so far, where P(GG) + P(~GG) = 1, and where ~GG is just SG or SS, and P(SG) + P(SS) = 2/3.[3]

In fact, we can generalize Bayes’ theorem to the case where “E” is an evidencing event that can occur in conjunction with one of the mutually exclusive and exhaustive hypothesis events H1, H2, H3, …, Hn; in other words, Hi, where i is a counting number from 1 to n. So, for P(Hi|E) we can use the usual numerator of P(E|Hi)×P(Hi), and for the denominator we sum the same kind of product over every hypothesis (indexed here by j):

\displaystyle \sum_{j=1}^{n}P(E|H_j)\times P(H_j)

(If you’re not familiar with sigma notation or need a quick refresher, here’s a quick and clear tutorial.)

I first put our problem strictly in terms of GG in order to stick with the way the theorem is popularly encountered. In our new formulation, we end up with the following (I’ll include SS for the sake of thoroughness):

\displaystyle P(GG|g)=\frac{P(g|GG)\times P(GG)}{P(g|GG)\times P(GG)+P(g|SG)\times P(SG)+P(g|SS)\times P(SS)}

I’ve skipped the verbal representation this time, but the story is still there and should be consistent with what we’ve done so far.

We already know all these probabilities, but let’s review:

P(GG) = 1/3
P(SG) = 1/3
P(SS) = 1/3
P(g|GG) = 1
P(g|SG) = 1/2
P(g|SS) = 0

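Plug in to get:

\displaystyle P(GG|g)=\frac{1\times\frac{1}{3}}{\left(1\times\frac{1}{3}\right)+\left(\frac{1}{2}\times\frac{1}{3}\right)+\left(0\times\frac{1}{3}\right)}=\frac{\frac{1}{3}}{\frac{1}{3}+\frac{1}{6}+0}=\frac{\frac{1}{3}}{\frac{1}{2}}=\frac{2}{3}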

And again we get 2/3.
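The generalized form lends itself to a short Python sketch that handles any number of mutually exclusive, exhaustive hypotheses. The function name `posteriors` and the dictionary layout are illustrative assumptions; the numbers are the six probabilities reviewed above.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return P(H_i|E) for every hypothesis, given P(H_i) and P(E|H_i)."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    p_e = sum(joint.values())  # the sum in the denominator
    return {h: joint[h] / p_e for h in priors}


priors = {"GG": Fraction(1, 3), "SG": Fraction(1, 3), "SS": Fraction(1, 3)}
likelihoods = {"GG": Fraction(1), "SG": Fraction(1, 2), "SS": Fraction(0)}

result = posteriors(priors, likelihoods)
print(result["GG"], result["SG"], result["SS"])  # 2/3 1/3 0
```

Because every hypothesis shares the same denominator, one call gives the updated probability for all three boxes at once, and the results add up to 1.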

There are of course yet other ways to model this problem. For example, you might try it with the hypothesis that the second coin pulled will be gold (which can happen in the SG condition, provided the first coin was silver).
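One way to explore that alternative is to list the six equally likely (first coin, second coin) outcomes and count. The Python sketch below does so; the helper `prob` and the outcome encoding are illustrative assumptions rather than a standard recipe.

```python
from fractions import Fraction

BOXES = {"GG": ("g", "g"), "SG": ("s", "g"), "SS": ("s", "s")}

# Six equally likely outcomes: (first coin, second coin), one per
# choice of box and choice of which coin comes out first.
outcomes = []
for coins in BOXES.values():
    outcomes.append((coins[0], coins[1]))
    outcomes.append((coins[1], coins[0]))


def prob(event):
    """Probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_first_gold = prob(lambda o: o[0] == "g")
p_second_gold = prob(lambda o: o[1] == "g")
p_both_gold = prob(lambda o: o == ("g", "g"))

print(p_second_gold)               # 1/2 before seeing any coin
print(p_both_gold / p_first_gold)  # 2/3 once the first coin is gold
```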

As always, I invite you to comment with suggestions for improvement, corrections, and questions.

Closing Remarks:

It shouldn’t be surprising that different routes can lead to the same answer depending on how evidence is modeled, especially when it comes to problems where the rates of outcome are reliably predictable. For example, suppose you want to get a 3♥ out of a well-shuffled deck of 52 playing cards. You pull a card and get a 7♣. You don’t need Bayes’ theorem to know that if you pull another card from the remaining 51 cards, the probability of 3♥ is now 1/51. But if you do use the theorem, you’ll need to decide whether to represent E as 7♣ (whose probability is 1/52) or as ~3♥ (i.e., 51/52).
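Here is a hedged Python sketch of both choices for the card example. It assumes the hypothesis H is “the second card pulled is the 3♥,” which is one reasonable way to set things up but not the only one; the function name `bayes` is likewise just for illustration.

```python
from fractions import Fraction


def bayes(p_h, p_e_given_h, p_e):
    """Abbreviated Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e


# H: the second card pulled is the 3 of hearts. Before any card is seen,
# P(H) = 1/52 (each card is equally likely to be in the second position).
p_h = Fraction(1, 52)

# Option 1: E is "the first card is the 7 of clubs".
# If H is true, the first card is one of the other 51 cards, so P(E|H) = 1/51.
option_1 = bayes(p_h, Fraction(1, 51), Fraction(1, 52))

# Option 2: E is "the first card is not the 3 of hearts".
# If H is true, the first card certainly isn't the 3 of hearts, so P(E|H) = 1.
option_2 = bayes(p_h, Fraction(1), Fraction(51, 52))

print(option_1, option_2)  # 1/51 1/51
```

The two models use different values for P(E|H) and P(E), yet the ratio comes out the same.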

Such toy problems are for getting comfortable with the tools and sharpening your chops. Real-world questions (“Did the defendant do it?”) are where you’ll find the real and messy action. Bayes’ theorem is only as good as the numbers that go into it, so a big chunk of its value may lie not only in its use as a way to generate a series of probability estimate updates as new bits of evidence are introduced, but also in the care it urges us to take in developing the numbers that often come from the subjective evaluation of that evidence.

That’s a discussion for another day, and one for which I’m excitedly accumulating notes and instructive readings like this article: “Bayes and the Law”[4]; and this book: Bayesian Data Analysis[5]; and many others.

UPDATE 4/27/20: A big thanks to the VSauce2 channel for linking this post as a source in their explanation of this problem! Check it out: “The Easiest Problem Everyone Gets Wrong.”

Post Script & Bonus Question:

Just as I was about to send this post to press, I discovered Allen Downey’s blog, Probably Overthinking It. Downey is author of the Think X series (e.g., Think Bayes; looks good, though I haven’t read it, partly because I don’t know Python, which seems to be a prereq—maybe it’s time to learn!). In a blog post, he poses a problem formally similar to Bertrand’s Box Paradox:

Suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of Bowl #1?

Find that, plus a trickier variation involving M&M’s, here: “My Favorite Bayes’s Theorem Problems.” Solutions are in a separate post: “All Your Bayes Are Belong to Us!” I found his solution to the M&M’s problem especially instructive. Without giving too much away: his formulation of the hypotheses allowed for tidy calculations that could have otherwise been cluttered. (The post includes some other problems I haven’t looked at yet.)
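For readers who want to check their own answer to the cookie problem, here is a minimal Python sketch that applies the same recipe used for the boxes. The names `BOWLS` and `p_bowl_given_cookie` are made up for illustration, and the sketch assumes, as the problem states, that each bowl is equally likely to be picked. The result is deliberately not printed here in the text.

```python
from fractions import Fraction

# Cookie counts taken from the problem statement.
BOWLS = {
    "Bowl 1": {"chocolate chip": 10, "plain": 30},
    "Bowl 2": {"chocolate chip": 20, "plain": 20},
}


def p_bowl_given_cookie(bowl, cookie):
    """P(bowl | cookie), assuming each bowl is equally likely to be picked."""
    prior = Fraction(1, len(BOWLS))
    joint = {
        name: prior * Fraction(counts[cookie], sum(counts.values()))
        for name, counts in BOWLS.items()
    }
    return joint[bowl] / sum(joint.values())


# Run this only once you've committed to your own answer.
print(p_bowl_given_cookie("Bowl 1", "plain"))
```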



Footnotes:

  1. For fun, here’s yet another way to do the tree, in which we toss out box SS and update boxes GG and SG each to 1/2. Notice that so long as the initial numbers leading to GG and SG are the same (except for 0), they cancel out and you end up with 1/(3/2), just as we did in the previous diagram.

    [Figure: tree diagram with box SS removed and boxes GG and SG each updated to 1/2.]

  2. We can also achieve this by renormalizing the GG and SG probabilities so they add to 1; i.e., by making each 1/2, which we can do once we’ve removed SS from the picture.
  3. This is just like when you can treat “the die didn’t land 1” as either P(~1) = (1 – P(1)) = (1 – 1/6) = 5/6; or as P(2) + P(3) + P(4) + P(5) + P(6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 5/6. Notice, again, that P(1) + P(2 or 3 or 4 or 5 or 6) = P(1 or ~1) = 6/6 = 1.
  4. by Norman Fenton, Martin Neil, Daniel Berger (Annu Rev Stat Appl. 2016 Jun; 3: 51–77. Published online 2016 Mar 9. doi: 10.1146/annurev-statistics-041715-033428)
  5. 3rd edition (2013) by Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, Donald B. Rubin.

18 Replies to “Bertrand’s Box Paradox (with and Without Bayes’ Theorem)”

  1. I have this box with the inscription Raton ET Bertran on the top of it. Giving to me from my grandmother cause I like to put my lil rings and niknaks in it.I just wanted to know what is it?

  2. What Bertrand called the “paradox” isn’t the problem itself, it is how you can demonstrate that the answer “1/2” can’t be right.

    Suppose, instead of picking a coin at random and revealing it, say I pick a coin at random and take it out of the box without revealing it. What is the probability that the coin still in the box is the same kind?

    If I were to reveal it, and it turns out to be gold, we have the same original problem. If I were to reveal it, and it turns out to be silver, we have an identical problem that must have the same answer. Since those are the only possibilities, and they have the same answer, we don’t need to look at the coin to answer the question. The answer to my question is the same as the answer to the original problem.

    If that answer is 1/2, we have a problem. The probability that a random box has two identical coins is 2/3, not 1/2. Taking a coin out, without looking at it, cannot change that. By reductio ad absurdum, the answer 1/2 can’t be right. Note that if I look at both coins, and tell you that I will intentionally take out a gold one if I can, then the probability does change to 1/2.

    This approach can be applied to other problems, invalidating what are sometimes accepted as correct answers. Mr. Jones has two children, and we are told that at least one is a boy. What is the probability that he has two boys? Many “experts” will say it is 1/3; Bertrand’s Paradox says that can’t be right. An identically-worded problem that uses “girl” in place of “boy” has to have the same answer. The chances that a two-child family has two children of the same gender is 1/2. Learning the gender of one can’t change that, unless somebody looked at both with the explicit intent to say “at least one is a boy” if possible.

  3. The probability of Mr. Jones having 2 boys given that at least 1 is a boy is in fact 1/3. there are four ways Mr. Jones can have two children. BB,GG,BG,GB. Three of the ways have at least 1 boy, BB,BG,GB. 1 of those ways is BB or 1 out of 3 ways possible to have at least 1 boy. It’s the same for at least 1 girl. You are correct in that the chances of both kids having the same gender is 1/2 BB,GG (2) out of BB,GG,BG,GB (4)

  4. The probability of the second coin being gold is NOT 2/3 its 1/3….The mistake your making is that you are eliminating the SS from your calculations. You can’t do that.. Sure on paper it seems like you can but in practice its not possible to eliminate the SS box because you don’t know which one it is. If you have 3 boxes in front of you with one drawer open and a gold coin showing and you now want to eliminate the SS box, exactly how do you do that without looking inside? There is a 1/3 probability of picking the box with both gold coins, seeing the gold coin doesn’t change the probability. It only changes the probability if after you see the gold coin someone removes the box containing the SS like you describe in your calculations. then and only then would you have a 2/3 probability.

    1. Thanks for the comment! I unfortunately don’t have time at the moment to go back and review the post, but from what you’re describing, it sounds like the box I’m eliminating as SS is a box that I know has at least one gold coin in it (where we are pulling twice from the same box).

      I haven’t thought about these questions in years, so I’m hesitant to get into a discussion about it (my obsessions lie elsewhere at the moment) — that said, I’d be curious to see if you can get your variation into the Wikipedia entry on this question, which arrives, as I did, at 2/3:

      https://en.wikipedia.org/wiki/Bertrand%27s_box_paradox

    2. This right here! Something is off about the “Bayesian” math that is considered probability gospel. You can’t change the probability of an event that already occurred by merely looking at a data point that reveals how part of the event turned out. It’s more like a mathematical derivation of figuring out how often you will see what is left over. Once I have picked a box in the Bertrand’s Box paradox, I’m truly down to 50/50 if I just eliminate the 1/3 possibility that I even got the SS box.

  5. Thanks for the response and I understand not wanting to go down this rabbit hole at the moment but you might want to reconsider. You can claim the status as the man who fixed this mistake, Wikipedia is wrong, all the Youtube videos explaining this are wrong also. It’s an easy mistake to make.. Remember that a box is picked at random and a drawer opened and its a gold coin… that will only happen 50% of the time. 6 drawers with 3 gold coins and 3 silver. half the time you pick a box at random and open a drawer it will be silver! so a 2/3 probability of Gold will only happen 1/2 the time. 2/3 X 1/2 = 1/3. Anyway cool website and thanks for the response..

    1. Sorry, I’m confused. Are you saying that I am claiming to have corrected Wikipedia etc. or that you have corrected them?

      For whatever it’s worth (perhaps not much), my explanation matches Wikipedia’s (according to my quick glance at it just now), and one of the more popular YouTube videos (by VSauce2) on this topic cites my blog post as a source: https://youtu.be/ytfCdqWhmdg

      Thanks again for the comment, much appreciated.

      EDIT: Ah, I think you’re saying I could claim to be the one who corrected all those sources etc., if only I’d revisit the rabbit hole. Got it. Thanks.

  6. No I am not saying you edited Wiki. I am not saying I edited Wiki. I only commented on your blog because you put a lot of work into this so it seems you care about the topic. I am not trying to be insulting. I also recognize that the wisdom of the crowds is usually correct, so I am humbly pointing out an error.
    Forget the math for a moment and actually, physically play the game. Have someone randomly pick one of the boxes and open the drawer, you will find about half the time you play the drawer contains a silver coin. Are you just going to reject that random event? If you do, then your not playing with three boxes. That would be the same as playing with two boxes; one with 2 gold coins and one with 1 gold, 1 silver coin.

    1. No offense taken! I’m just not sure what you’re saying. It seems you’re saying that I, Wikipedia, most YouTube videos and such are getting it wrong when we arrive at a 2/3 result. If that’s what you mean, fair enough.

  7. I know listing equally likely outcomes and counting results is a more basic type of calculation than multiplying probabilities, but in this case its a very easy way to show the probability of 2/3

    Before we open any drawers, here are the possible outcomes (labelling each box A, B and C and each coin 1 and 2):
    A1A2
    A2A1
    B1B2
    B2B1
    C1C2
    C2C1
    Only the first two win, showing the odds of getting two gold coins before we open any drawers is, indeed, 2/6 or 1/3
    But since we DID get a gold coin, the only equally likely outcomes remaining are
    A1A2
    A2A1
    B1B2
    Since only the first two give us a win, the odds of winning after showing a gold coin in trial 1, is 2/3

    1. Thanks for this. I’m a big fan of listing things out and find that doing so with simpler problems helps develop intuitions for when too much is going on to take the ‘listing out’ approach (though, I admit, I have spent hours doing it with combinatorics problems such that I end up with many pages taped together on the ground, with many branches extending across them). If I ever made a math website, I’d probably call it “inelegant math” :)
