*Estimated read time (minus contemplative pauses): 18 min.*

I recently received a nice email from probability expert Henk Tijms, author of several intriguing looking books on probability, most recently *Basic Probability: What Every Math Student Should Know* (2019). It’s a short and reasonably priced book that seems to do exactly what it promises. From the preface:

This book is written for high school and college students learning about probability for the first time. … Key features of the book are conditional probability and Bayesian probability, striking applications of the Poisson distribution, and the interface between probability and computer simulation.

As someone who taught myself probability, I love that the book dives right into a straightforward discussion of combinatorics and essential results from calculus. The details of these relationships are often skipped over or taken for granted, so knowing about them in advance would have saved me some hassle—such as when I first endeavored to watch watch lectures online geared towards math or STEM majors, or, for one of many specific examples, when I first encountered the 100 Prisoners 100 Boxes problem.

That problem, by the way, appears in Professor Tijms’s other (!) 2019 book, *Surprises in Probability: Seventeen Short Stories*. Which brings me to the problem I’d like to talk about today.

In his email to me, Tijms shared an interesting problem from that book. From the email (slightly edited):

The Seven Dwarfs Dormitory Problem

Each of the seven dwarfs sleeps in his own bed in a shared dormitory. Every night, they retire to bed one at a time, always in the same sequential order, with the youngest dwarf first and the oldest last. On a particular evening, the youngest dwarf has had too much to drink and is in a jolly mood. He does not choose his own bed, but randomly chooses one of the other six beds to fall asleep on. As each of the other dwarfs retires, he chooses his own bed if it is not occupied, and otherwise chooses another unoccupied bed at random. What is the probability that the oldest dwarf sleeps in his own bed?The answer is 5/12.

How to solve it? Consider first a slightly different version of the problem: the youngest dwarf chooses randomly one of the seven beds, including his own bed. This problem is the famous lost boarding pass problem in disguise:

One hundred passengers are lined up to board a full flight. The first passenger lost his boarding pass and decides to choose a seat randomly. Each subsequent passenger (responsible enough to not lose their boarding pass) will sit in their assigned seat if it is free and, otherwise, will randomly choose a seat from those remaining. What is the probability that the last passenger will get to sit in their assigned seat?The answer to this problem is 1/2. This answer remains true whatever the number of passengers is. A simple intuitive explanation without mathematical formula is as follows. The key observation is that the last free seat is either the seat of the first passenger or the seat of the last passenger. This is an immediate consequence of the fact that any of the other passengers always chooses their own seat when it is free. Each time a passenger finds their seat occupied, the passenger chooses a free seat at random and then the probability of the first passenger’s seat being chosen is equal to the probability of the last passenger’s seat being chosen. It is also true that the first passenger chooses their own seat with the same probability that they choose the seat of the last passenger. Thus the last free seat is either the seat of the first passenger or the seat of the last passenger, each with a 50% chance.

The answer 5/12 for the first seven dwarfs dormitory problem is now easily explained. The probability is 5/6 that the youngest dwarf does not choose the bed of the oldest dwarf. Suppose that the youngest dwarf chooses the bed of dwarf number 3 (the dwarfs are numbered as 1 to 7 according to the order they go to bed). Then the problem is reduced to a five-passenger lost boarding pass problem with dwarf number 3 in the role of the first passenger. In this conditional reduced problem the oldest dwarf chooses his own bed with probability 1/2, and this same probability applies when the youngest dwarf chooses the bed of the

kth dwarf for anykwithkbetween 2 and 6. Conclusion: the probability that the oldest dwarf will sleep in his own bed is

(5/6)x(1/2)=5/12.

Nice! Maybe everything has already clicked for you, or maybe you’ve seen this problem before, such as when it was posed to readers in 2018 at *FiveThirtyEight*. Personally, I had to do a bit more thinking for it to solidly click. In case it’d be of help, I’ll share my thought process.

More specifically, I needed to think more about the Lost Boarding Pass problem. Which, again, is essentially:

*100 people board a 100-seat plane. Each person is assigned a seat, but the first passenger to board picks a seat at random. Each subsequent passenger sits in their own seat unless it’s occupied, in which case they randomly choose an open seat. What is the probability that the last passenger to board finds their seat empty? The answer is 1/2. This answer holds for any number of passengers greater than or equal to two.*

Honestly, I struggled at first to find 1/2 intuitive, despite seeing explanations such as Tijms’s. Before saying why, I’ll need to say a little more about the problem.

Let’s reduce it to four passengers. We could give them names and have them board in whatever order they happen to line up. Instead, just call them 1, 2, 3, 4, which is the order they’ll board in.

For convenience, let’s arbitrarily label their assigned seats A, B, C, and D, such that passenger 1 is assigned seat A, and so on in order.

Now, passenger 1 boards and gets us into this interesting mess by choosing a seat at random. Everyone else sits in their assigned seat if its open. Otherwise, they pick a seat at random. *What is the probability that passenger 4 finds their seat open?*

An intuitively appealing, but technically wrong, approach is to count up the “successful” outcomes (i.e., ones in which 4 finds their seat open), and to put that in ratio with the number of ways things can go in total. The reason this is technically wrong is that not every outcome has the same probability.

For example, there’s a 1/4 probability that passenger 1 sits in seat A. Then everyone else goes to their assigned seat, including passenger 4. Another successful outcome is if passenger 1 sits in seat B, then passenger 2 sits in seat A, with a probability of (1/4)(1/3) = 1/12.

We could easily list all the possibilities, or make a probability tree, and work out the respective probabilities as above. (I’ll do all that in a moment.) But we can’t just say: there are four successful scenarios and four unsuccessful scenarios, therefore the probability of succeeding is 50%.

To see what I mean, imagine you have a biased coin that lands Tails 60% of the time. It’d clearly be wrong to determine from the sample space of {HH, HT, TH, TT} that the probability of getting two Heads in a row is 1/4. The actual answer is (2/5)(2/5) = 4/25, or 16%.

If the coin were fair, it’d be a different story. Each event would have the same outcome, and you’d be able to look the possible outcomes and correctly say “you get HH one in four tries,” or choose whichever wording your philosophical commitments (if any) require.

Anyway, this gets at the heart of what thwarted my intuition in the Lost Boarding Pass case. I kept thinking, “But the person getting on the plane *could* get in their own seat…. or not…. and so could the next person… and many of these paths have different probabilities!” And the usual explanations I see for this problem weren’t getting my intuition past this (though the math is straightforward enough).

And then I noticed that there *is* a way you can evaluate the results by comparing successes and failures in the sample space. It turns out that, for every successful outcome, there is a corresponding *equiprobable* failure where passenger 4 switches seats with whoever is in seat A! And it’s obvious that this is generalizable no matter how many passengers there are!

Here are all the possible arrangements for four passengers. Keep in mind that this is not merely a matter of listing all the ways to permute four numbers. That’s because each passenger will take their own seat if it’s open. This rules eliminates many permutations (16, in fact, in the present example).

Successful outcomes are green, failures are red. Probabilities are to the right (I’ll say a little more below about how I calculated them):

A | B | C | D | Probability |

1 | 2 | 3 | 4 | 1/4 |

4 | 2 | 3 | 1 | 1/4 |

2 | 1 | 3 | 4 | 1/12 |

4 | 1 | 3 | 2 | 1/12 |

3 | 1 | 2 | 4 | 1/24 |

4 | 1 | 2 | 3 | 1/24 |

3 | 2 | 1 | 4 | 1/8 |

4 | 2 | 1 | 3 | 1/8 |

Notice that we have equiprobable pairs of success–failure. In a basic arithmetic sense, this supports the answer of 1/2 because the green figures add up to 1/2 (which is to say that the red figures do as well). But the more important insight are the pairs themselves. In particular, how the rules of the game play out so that they emerge.

Let’s examine some of the scenarios more closely, including how the probabilities are calculated. (Interestingly, the story that arises here is essentially the same explanation that I’ve struggled to internalize, such as the one Tijms gives; the diagrams bridge the gap for me.)

In the first row, passenger 1 goes to seat A—i.e., their assigned seat—with probability 1/4. Then the rest of the passengers go to their own seat, each with probability 1.

In the second row, passenger 1 goes to seat D with probability 1/4. Then the rest of the passengers go to their own seat, each with probability 1. Except for passenger 4, who must go to seat A (with probability 1).

In the third row, passenger 1 goes to seat B with probability 1/4. Then passenger 2 (who has three seats to choose from) goes to seat A with probability 1/3, after which the following passengers go to their assigned seat, each with a probability of 1. This gives (1/4)(1/3)(1)(1) = 1/12.

In the fourth row, passenger 1 goes to seat B with probability 1/4. Then passenger 2 (who has three seats to choose from) goes to seat D with probability 1/3, after which passenger 3 goes to their assigned seat C (with probability 1), leaving open only seat A for passenger 4. This gives (1/4)(1/3)(1)(1) = 1/12.

And so on for the rest.

We multiply, by the way, because these are “and” events. This feature is more obvious with a probability tree that starts with four equiprobable branches, where passenger 1 sits in seat A, B, C, or D. From each of those outcomes, extend branches to show where passenger 2 sits, and so on. Then multiply together the probabilities of each step along each path:

This time I gave the final probabilities the same denominator. You can see that some events weigh more than others, which is to say that there are not 24 equiprobable events of which 12 are successes. Rather, there are eight events of systematically varying probabilities, of which four are successful, and those four account for half the “weight” of the eight events.

It’s also interesting to note that the most-likely winning scenario is the one in which passenger 1 sits in their assigned seat. While the *least*-likely winning scenario is the one with the most branches—i.e., the one in which only passenger 4 sits in their own seat. (For the corresponding least-likely way to fail, nobody gets their own seat.)

Notice also that the number of permutations for these four passengers, including the 16 invalid ones with a probability of zero that are not represented in the tree, come out to 4! = 24. That gives the common denominator. This generalizes to any number of passengers because, for the path with the most branches, you have, where *p* is the number of passengers: *p* options, then *p−*1 options, down until you hit that two-branch split at the end, after which the last (or *p*th) passenger always has one option.

I could spend more time contemplating that tree, but let me get to what I find to be the real intuition-building observations—which, again, I’ve encountered before and and in some cases even intuited, but their collective meaning didn’t come into full focus until I’d played around with some diagrams.

After any passenger sits in seat A, all following passengers except for passenger 4 are guaranteed their assigned seat. This obviously generalizes to as many passengers as you like (as long as there are at least two).

After any passenger (whose not passenger 4) sits in seat D, all following passengers will find their assigned seats open, except for 4, who must go to seat A. In other words, if anyone but passenger 4 sits in seat D, it means nobody has yet to sit in seat A, which means seat A will then be waiting for passenger 4.

Those two observations, along with the diagrams, are enough to *feel* the probability is 50%. That is, if those observations make sense, it’s not hard to convince yourself that the following makes sense, and is generalizable to as many passengers as you like.

After all passengers are boarded, there will be someone in seats A and D. One of those people will definitely be passenger 4. In other words, it is impossible for passenger 4 to end up anywhere but seat A or D. This is because, if anyone ever occupies seat A before passenger 4 boards, then passenger 4 will get seat D. But if nobody occupies seat A before passenger 4 boards, then passenger 4 clearly has to sit in seat A.

You can see all this in the diagram. And, again, in any branch in which someone sat in seat A so that passenger 4 gets seat D, had that someone sat instead in seat D, everything else would have played out exactly the same, except passenger 4 would have to sit in seat A. In other words, once again, the two scenarios are nearly identical, with the same probability every step of the way, except the passengers in seats A and D are switched (see again the table).

Since every possible successful path has an equiprobable instance of switching in this way, the probability of passenger 4 ending up in D or A in any instance of such a pair is 1/2.

In other words, for any given possible arrangement, it’s a coin toss whether the seating for A and D have passenger 4 in A or D. This intuitively amounts to half the overall probability going to passenger 4 succeeding.

All this can of course be formalized mathematically—for that, see Chapter 7 of Tijms’s aforementioned *Surprises in Probability*. But what I’m concerned with here is the intuition, which, for me at least, is now solidified and easily extended to The Seven Dwarfs Dormitory Problem.

Namely, there’s a 5/6 probability that dwarf 1 leaves the bed of dwarf 7 open. In that case, it is now just a Lost Boarding Pass problem involving some number of “passengers,” where that number depends on which bed dwarf 1 stole.

For example, in the above email excerpt, Tijms notes that if dwarf 1 takes dwarf 3’s bed, you end up with a five-passenger Lost Boarding Pass problem. In other words, dwarf 2 gets into his own bed (with probability 1), and then dwarf 3 is now “assigned a boarding pass” for dwarf 1’s empty bed (whether or not dwarf 3 realizes it). Whether dwarf 7 gets his bed will depend on what dwarfs 3 through 6 do, exactly as with a five-person Lost Boarding Pass scenario.

More generally, after dwarf 1 makes his choice, so long as he doesn’t choose dwarf 7’s bed (with 5/6 probability), there will always then follow a Lost Boarding Pass scenario. As we’ve shown above, any such scenario will have a 1/2 probability for dwarf 7 finding his own bed empty.

This gives (5/6)(1/2) = 5/12.

To drive this home, break it up this way. There’s a 1/6 chance dwarf 1 takes dwarf 2’s bed, followed by a 1/2 chance dwarf 7 gets his own bed. That’s a probability of (1/6)(1/2) = 1/12. The same goes for dwarfs 3, 4, 5, and 6. Add those up:

(1/6)(1/2) + (1/6)(1/2) + (1/6)(1/2) + (1/6)(1/2) + (1/6)(1/2) =

1/12 + 1/12 + 1/12 + 1/12 + 1/12 = 5/12.

That’s pretty satisfying. But it’d be nice to look more closely at what’s going on here by relating it to the above four-passenger Lost Boarding pass example. Along the way, I’ll offer some less elegant, though I think instructive, solutions.

The basic goal is to condition on the knowledge that passenger 1 did in fact steal someone else’s seat. This knowledge demands a revision of of the original sample space, which, again, is represented by the following tree:

To update that, eliminate the path of passenger 1 selecting seat A. That leaves three options for passenger 1, each of which is still equally probable and must add up to one, so change (or “normalize”) those from 1/4 to 1/3, and adjust the final probabilities accordingly (once again, using a common denominator for easy adding):

We can calculate the probability of passenger 4 finding their seat open by adding the green figures in the tree: 2/18 + 1/18 + 3/18 = 6/18 = 1/3. This is the same answer we get by using the more elegant solution of (2/3)(1/2) = 1/3. (Notice that this is analogous to the (5/6)(1/2) = 5/12 in the dwarf case.)

Now for some deeper observations.

The probability is reduced here from 1/2 to 1/3. We expect a reduction, because we’ve lost the most probable way to win, which was 1/4. (Also notice that the last row is now the only one that doesn’t have a corresponding “switch” scenario between seats A and D.)

Of course, you can’t just subtract 1/4 from 1/2 to get the updated probability. Actually, you subtract 1/6, or, more specifically, (1/3)(1/2). This follows from the aforementioned point that you can add up the new branch’s probabilities separately:

(1/3)(1/2) + (1/3)(1/2) = 1/6 + 1/6 = 1/3.

This is essentially the same as (2/3)(1/2) = 1/3.

(Recall that I did this above with the dwarf case to get 5/12.)

Where the subtraction comes in is when we add up all the options, including the case where passenger 1 choose seat A (which naturally has the same conditioned probability as the last row, which is its corresponding “switch” outcome):

(1/3)(1/2) + (1/3)(1/2) + (1/3)(1/2) = 1/6 + 1/6 + 1/6 = 1/2. But we want to get rid of that last 1/6, so we subtract it: 1/2 − 1/6 = 1/3.

(We can also easily relate this mathematically to the more elegant solution (2/3)(1/2) = (3/3 − 1/3)(1/2) = 1/2 − 1/6 = 1/3.)

This generalizes. So, in the dwarf case, we can also subtract 1/2 of the updated 1/6 probability: 1/2 − (1/6)(1/2) = 1/2 − 1/12 = 5/12.

That said, there is a way in which we can in fact simply subtract the original 1/4 from 1/2. But we have to keep our odds straight, which is to say our ratio of wins to losses. In other words, we start with a 1/2 : 1/2 wins-to-losses ratio (or 1 : 1 or, popularly, “50 : 50”).

We can subtract 1/4 from the wins like this:

(1/2 − 1/4) : 1/2 = 1/4 : 1/2.

If we normalize that to add up to 1, we get 1/3 : 2/3, which is our new probabilities for winning and losing. And it’s of course the same ratio as 1/4 : 1/2 (i.e., 1 : 2).

(To normalize, divide 1/4 and 1/2 by their sum, 3/4.)

This will work with any scnario. Here it is with the dwarf case:

(1/2 − 1/7) : 1/2 = 5/14 : 1/2 = 5/12 : 7/12.

(To normalize, divide 5/14 and 1/2 by their sum, 12/14.)

You have to be careful about the proportions when making these moves. We’ve been starting with 1/2 : 1/2. But suppose you multiply those each by 2 to get 1 : 1, and you use that as a starting point. Then you can’t just subtract, say, 1/4. You have to subtract (2)(1/4) = 1/2. This gives (1 − 1/2) : 1 = 1/2 : 1 = 1 : 2. To get 1 : 2 to add up to one (as any probability distribution must), divide both sides by their sum to get 1/3 : 2/3. And there again we have the conditioned probabilities for winning and losing, given that passenger 1 doesn’t choose their own seat.

For fun, I’ll wrap up by addressing these scenarios with Bayes’s theorem, which is:

I’ll fill this in as follows for the four-passenger Lost Boarding Pass case.

**P(L)** = probability the last in line gets their own seat. There’s no conditioning here, so this is regular old 1/2.

**P(F)** = probability the first in line chooses their own seat. This is 1/4 here. But we want the negation of this: the probability the first in line does *not* choose their own seat. Represent this as **P(¬F)**. This is 3/4 for the present case, as there are four seats to choose from, three of which are not passenger 1’s.

**P(L|¬F)** = probability that the last in line gets their own seat given that the first in line doesn’t choosing their own seat. We might also word this as “conditional on the first in line not choosing their own seat,” or any number of other ways, depending on our perspective. This is what we’re trying to calculate. (Let’s pretend we don’t already know it’s 1/3.)

**P(¬F|L)** = probability we should update P(¬F) to on learning that the last in line got their own seat. Since we know that one of the winning paths was followed, we need only look at how much weight P(¬F) carries within those probabilities. The easiest way to do this is to look at the weight P(F) carries overall, and subtract that from 1/2, since we know all the probabilities for all the winning outcomes add up to 1/2.

In other words, P(F|L) + P(¬F|L) = 1/2 in the original, unconditioned tree. But we need to update these to account for the conditioning. We know that half that equation comes from the P(F), because it starts with a probability of 1/4. Which is to say that P(F|L) = 1/4 and P(¬F|L) = 1/4, and these add to 1/2. But we need them to add to 1. So divide everything by 1/2 (or just double everything) to get P(F|L) + P(¬F|L) = 1/2 + 1/2 = 1. In other words, we have our answer: **P(¬F|L)**= 1/2.

This reasoning is more illuminating when applied to the dwarf case, which I’ll do in just a moment. First, let’s pop the above figures into Bayes’s theorem:

Looks good.

Now for the dwarf case. The **P(L)** is still 1/2, and **P(¬F)** is clearly 6/7 (because he starts with seven beds to choose from, and six of those aren’t his).

The harder question is **P(¬F|L)**, but it’s not too bad. Once again (and as always), P(F) + P(¬F) = 1/2. But the proportions are different than above, because P(F) starts here as 1/7. In other words, before conditional updating, P(F|L) = 1/7 and P(¬F|L) = 5/14. This is because 1/7 + 5/14 = 1/2. (Or notice that 1/2 − 1/7 = 5/14.)

So we now normalize that so that it adds up to 1, which, again, means dividing each of those figures by their sum of 1/2 (or just double them). This gives (1/7)/(1/2) + (5/14)/(1/2) = 2/7 + 5/7 = 1. The updated **P(¬F|L)** is 5/7.

Pop those into the formula:

We can of course do all this with 100 passengers as well, conditional on passenger 1 not picking their own seat:

(98/99)(1/2) = 49/99

1/2 − (1/99)(1/2) = 49/99

(1/2 – 1/100) : 1/2 = 49/100 : 1/2 = 49/99 : 50/99

For Bayes’s formula, I’ll input my steps for **P(¬F|L)** right into the formula:

And there you have it.

I won’t get any deeper into Bayes’s theorem here, but will note that it’s not nearly as mysterious as it’s often made out to be, and that if you sit and think about it for a while, it makes a lot of sense. Once you understand it, you needn’t memorize it or its common extensions. See, for instance, my discussion of the famous Bertrand’s Box Paradox, where Bayes’s theorem emerges from simply thinking through a fairly simple conditional probability problem.

I’ll also note that there’s nothing “Bayesian” about the present problem in the philosophical sense. It is, rather, a straightforward conditional probability scenario that Bayes’s formula is a tidy way to give us yet another means for thinking about.

Also be sure to have a look at Tijms’s latest book *Basic Probability: What Every Math Student Should Know* and let us know what you think.

I’d like to thank Professor Tijms for inspiring me to finally spend some time with this fun problem!

*Enjoy or find this post useful? Please consider pitching in a dollar or three to help me do a better job of populating this website with worthwhile words and music. Let me know what you'd like to see more of while you're at it. Transaction handled by PayPal.*