*Estimated read time (minus contemplative pauses): 28 min.*

# 1. A Counterintuitive Probability Problem

Mathematician Gil Kalai recently posted the following intuition-bending probability problem at his blog:

*You throw a dice until you get 6. What is the expected number of throws (including the throw giving 6) conditioned on the event that all throws gave even numbers.**

(*The problem is originally due to Elchanan Mossel. The term “a dice” is used here to denote a single die. Some people dislike this usage, though it does not obscure the question.)

Most people give the initially intuitive answer: three. But that’s wrong. I’ll get to the right answer below. Solutions, along with commenters’ expressions of bewilderment and contention, have been posted at Kalai’s blog, Math with Bad Drawings, and at Mind Your Decisions’ blog and YouTube channel. It’s interesting to note that one YouTube commenter accepted the correct solution, and even gave an explanation of it in their own words, then later reverted back to the incorrect answer of three.

Methods for getting the correct solution are often hard to follow due to involving complicated-looking math or relying on background knowledge. I’ll share a solution that I think makes the problem more intuitive, and that only requires a basic understanding of probability, along with a little precalculus. Still, I’ll be sure to review the most relevant concepts as I go along, just in case it’s helpful.

# 2. The Meaning of “Expectation” and Basic Probability Rules

First, we need to be clear about what the problem is asking. I’ll start by clarifying what is meant here by “expect,” given that several commenters have expressed confusion about the word.

In the present context, “expect” has a technical meaning, though it is related to the ordinary usage of the word. The long and short of it is that “expect” here means what you expect to see *on average* over several trials. That in mind, I’ll first consider the ordinary usage, and will tease out an intuitively serviceable technical meaning from that. I’ll also cover some simpler probability problems and will review basic probability rules. This won’t be rigorous or formal, as the central aim is to train intuitions.

If you already know this stuff, skip to the next section: “Solving the Problem.”

Let’s start with a simple variation on the problem at hand: *How many times would you expect to flip a fair quarter to see a Heads, including the Heads flip? *

(I might not always put the “including the Heads” part, but consider it assumed unless otherwise noted.)

The ordinary usage of “expect,” yields several interpretations. For starters, we might think, “Well, as many flips as it takes to exceed a 50% chance of getting at least one Heads.” Ok, so how many would that be? The probability of getting Heads in one flip is obviously 50%. The probability of getting at least one Heads in two flips can be worked out in various ways. The most intuitive is to work out all the possible outcomes in the sample space, than make a ratio of the desired outcomes to all possible outcomes:

**H**,**H**

**H**,T

T,**H**

T,T

That comes out to four possible outcomes, three of which have *at least *one Heads. Therefore, there is a 3/4, or 75%, chance of getting at least one Heads in two flips of a fair coin. So perhaps you might expect to require two flips in order to feel confident, but not overly confident, that you’ll see a Heads.

Before moving on, let’s look at two other ways to solve this problem. Remember that when you want to find the probability of two independent events happening, you multiply their individual probabilities. A coin flip is independent because the result of one coin flip doesn’t depend on any other coin flips. So, the probabilities of the above example are as follows (I’ll denote the probability of something as “P(something)”):

H,H = P(H)×P(H) =(1/2)×(1/2) = 1/4 = .25 = 25%

H,T = P(H)×P(T) =(1/2)×(1/2) = 1/4 = .25 = 25%

T,H = P(T)×P(H) =(1/2)×(1/2) = 1/4 = .25 = 25%

T,T = P(T)×P(T) =(1/2)×(1/2) = 1/4 = .25 = 25%

Notice that if you add them all up you get: 1/4 + 1/4 + 1/4 + 1/4 = 4/4 = 1 = 100%. The probability of all possible outcomes—i.e., the sample space—should always add up to 1.

Now, to find the probability that one or another mutually exclusive events—i.e., events that can’t happen together—will happen, you add them together. So, the probability of getting Heads or Tails in a single flip is P(H) + P(T) = 1/2 + 1/2 = 1. We can use this to answer the question regarding the probability of getting at least one Heads in two flips, by simply adding together all the observations that satisfy that condition:

H,H = P(H)×P(H) =(1/2)×(1/2) = 1/4 = .25 = 25%

Or: H,T = P(H)×P(T) =(1/2)×(1/2) = 1/4 = .25 = 25%

Or: T,H = P(T)×P(H) =(1/2)×(1/2) = 1/4 = .25 = 25%

We now add those up to once again get 75%.

There’s yet another way to calculate this probability. It’s the most important way, because we’ll need it when we have too many possibilities to list out. Remember that the probabilities of a sample space sum to 1. This means we can ask the opposite, or complementary, question, and then subtract that answer from 1: *What is the probability of getting *zero* Heads in two flips?* Well, there’s one scenario where that happens: T,T. That probability is 1/2 × 1/2 (which from here on I’ll denote as (1/2)^2). We then subtract that from 1 to get 3/4.

Ok, now back to “expectation.” 75% sounds reasonable. But what if the stakes were higher? What if you your life depended on seeing a Heads? Then how many flips would you expect? “Hmmm… now I’d like to hit over 80%, or even 99%, certainty of seeing a Heads, just to be safe.” We can figure that out by recognizing that we need to subtract from 1 a certain number of Tails—namely, the number of Tails that has a 1% chance of happening. We can use this equation:

1 − (1/2)^x = .99

Where x is some number of Tails in a row, such that (1/2)^x equals .01.

We could do a guess and check to get this answer pretty quickly:

(1/2)^3 = 1/8 = 7/8 = .875

(1/2)^4 = 1/16 = 15/16 = .9375

…and so on.

An quicker way is to use a logarithm. Just pop “log base .5 of .01” into a calculator like so: log(.01)÷log(.5) ≈ 6.6439. Let’s round to 7 to test:

1 – (1/2)^6 = 63/64 = .984375

1 – (1/2)^7 = 127/128 = .9921875

It works out.

But, of course, 99% is so high that we wouldn’t really *expect* seven flips, though we might expect no more than seven flips. This high probability just bolsters our certainty that we’ll see one or more Heads (though I still wouldn’t bet my life on it!). Notice that getting *exactly* one Heads is a very different question. For example, getting exactly one Heads in two flips has a probability of 1/2, because two out of the four possible scenarios have exactly one Heads. Getting exactly one Heads in four flips can be calculated in the following way. I won’t elaborate much on this because it’s not necessary for the problem at hand, but it’s worth having a look at.

The numerator will be 4-choose-1^{1}, or 4, because any single flip can be a Heads. The denominator will be however many possible outcomes there are, which is this case is 2^4, or 16.^{2}. So, the probability of getting exactly one Heads in four flips is 4/16, or 1/4. To bear this out, here are the possible outcomes that fulfill the desired outcome:

**H**,T,T,T

T,**H**,T,T

T,T,**H**,T

T,T,T,**H**

Each has a probability of (1/2)^4, or 1/16. We add them up to get 4/16, or 1/4. Notice that there must be 12 “losing” scenarios that I didn’t list here, such as H,H,T,T.

Ok, this is all good for reviewing probability rules and helping us warm up our probability intuitions. Now, how do we determine how many flips to “expect,” in the technical sense, before seeing a Heads?

Expectation, in this sense, is not about a single trial. It’s about what you’d likely see on average if you do the trial over and over and over again. In fact, this applies to the above examples as well. When we say that the probability of getting at least one Heads in two flips is 75%, we’re saying that if you flip a coin twice, then flip it twice, then repeat, repeat, repeat, over and over again, and then you look at all the sets of two flips, you’re likely to see, on average, that about 75% of them have at least one Heads. And about 25% of them will be T,T.

That said, here’s the solution. The probability of a quarter landing Heads is 1/2. That means that there will be about one Heads for every two flips in a large set of flips. So, you expect two flips, on average, in order to see a Heads. Notice that 2 is just the reciprocal of 1/2. (A convenient, and highly intuitive, feature of geometric distribution that we’ll take for granted.)

To be abundantly clear, this does *not* mean that if you flip a coin 100 times, you should literally expect to see 50 Heads and 50 Tails. Indeed, the probability of getting exactly 50 Heads in 100 flips is only about 8%. But this is still a higher probability than getting exactly 49 Heads (which means you landed 51 Tails), or any other number of Heads. In other words, the most likely outcome is 50 Heads, though its likelihood is still fairly low. I’ve popped some of this into Desmos to demonstrate, though you need not spend much time on these diagrams; just notice the outcomes.

The diagrams show, from top to bottom, out of 100 flips: The probability of getting 49 Heads and 51 Tails (or vice versa). The probability of getting 50 each of Heads and Tails. The probability of getting from 40 to 60 Heads inclusive.^{3}

The key thing to notice is that the probability of getting exactly 50 Heads is fairly low, but the probability of getting from 40 to 60 Heads is around 96%. This means that if you flip a fair quarter 100 times over and over again, you’ll likely see an average of about 50% Heads across all those trials of 100 flips. For example, suppose one trial has 40 Heads and another has 60; that’s an average of 50 Heads per trial. Also across those 100 trials, which will have much variation, the average for *exactly *50 Heads is expected to be about 8%.

Ok, let’s translate this into a simple question about rolling a die: *How many times would you expect to roll a die to see a 6?* The probability of getting a six in a single throw is 1/6. Therefore, on average, you’ll have about six throws for every appearance of a 6. In other words, you can expect an average of 6 throws in order to see a 6 (as usual, this includes the throw that gives the 6).

This concept is commonly applied for assessing the “expected value” of some event. For example, a game in which you win $12 every time a die lands on a 6. In 60 rolls, you’d expect about 1/6 of them to be a 6. Which means you’d expect to get 60/6, or 10, payments of $12 dollars. So the expected value of 60 rolls would be $120 dollars. This averages out to $2 per roll. That is, the probability of getting a 6 multiplied by the payout for getting a 6: (1/6)×12 = 2.

That’s a simple example. Usually we’d include some probability for loss. Such as: You’ll have to pay a dollar for any non-6 roll. So this means for every six rolls, you can expect to pay $5, but earn $12. In other words, you’ll net $7 per six rolls. This makes the expected value of a given roll 1/6 of $7, or about $1.17. I think most people would play this game if given, say, 120 trials, in which case you’d expect to earn $140: i.e., it should yield about 20 instances of a 6, meaning getting paid $12 about 20 times, which means $240; subtract from the 100 instances of paying $1: $240 – $100 = $140; this is also what you get if you multiply 120 and $7/6.

But would you really expect to earn $1.17 if given only one roll? Only in the technical sense. In the ordinary sense of “expect,” you’d probably expect to lose a dollar.

In a more thorough explanation of expectation, I’d pick apart some other examples, such as the expected number of flips of a fair coin to see your first Heads (it’s the exact same process for figuring out the rolls of a die expected to see your first 6, which I’ll cover below; I figured it doesn’t hurt to see it spelled out with a simpler example). I won’t go deep into this here, but will try to motivate the intuition for how it works, and how it relates to expectation as an average. The idea is that if you play this game hundreds of times, you’d expect to get Heads on the first flip about half the time, on the second flip about a quarter of the time, and so on, with expected outcomes, and their relative frequencies as follows:

**H** – about 1/2 the time (which is just the probability of getting a Heads in one flip);

T**H** – about 1/4 of the time (which is just the probability of getting a Tails and then a Heads);

TT**H** – about 1/8 of the time;

TTT**H** – 1/16;

TTTT**H** – 1/32;

… and so on. If you played the game 10,000 times, you’d have 5,000 wins in the first flip; 2,500 on the second flip; 1,250 on the third; and so on, ad infinitum. The idea now is to do what you always do with an average: add up all the results you have, then divide by the number of results you’ve added up. You do the same thing by simply adding up as follows, where you sum by multiplying the probability of winning in one flip, in two flips, in three flips, and so on:

(1/2)×(1) + (1/4)×(2) + (1/8)×(3) + (1/16)×(4) +… and onward forever. The result is two flips. That is, you’d expect it to take two flips on average to see your first Heads. You can also use this method to figure out the expected number flips to expect on average before seeing two Heads in a row (it’s six).

(To translate this into expected value, imagine you’re getting $1 for each flip each time you play the game. On average, you’ll get $2 per game—e.g., playing the game 10,000 times: 5,000 times you’ll get $1 + 2,500 times you’ll get $2 + 1,250 times you’ll get $3, etc.; divide the resulting sum by 10,000, and the answer will be $2. Notice that this is a weighted average, as encountered in school; in this case, it’s like saying 1 counts as 1/2 your final grade, 2 counts as 1/4, and so on…. resulting in a final average, or grade, of 2. Or just think of it as the sum of possible outcomes: (1/2)×($1) + (1/4)×($2) + (1/8)×($3) + (1/16)×($4) +… and onward forever comes out to an average of $2. ^{4})

At any rate, this gets at some of the difficulty between relating single trials to large sets of trials (which I’ll call an “experiment”), a difficulty which I think served as a source of confusion for many of the commentators I observed discussing the problem at hand. Hopefully that confusion is now resolved, and we can agree that “expect” here is about the result you’ll likely see on average.

We’re now ready to solve the problem in an (I hope) intuitive way.

# 3. Solving the Problem

I’ll first solve it using an experiment involving 120 trials; this will conclude with a computer simulatione. Then I’ll solve it just using math.

Once again, Kalai states the problem as follows:

*Test your intuition: You throw a dice until you get 6. What is the expected number of throws (including the throw giving 6) conditioned on the event that all throws gave even numbers.*

Here’s my statement of it:

You play a game in which you roll a fair die. The rules of the game are:

- If you roll a 6, you win and the game ends.
- If you roll a 1, 3, or 5, you lose and the game ends.
- If you roll a 2 or a 4, you roll again.
- Repeat the above, starting with Step 1, until the game ends (i.e., until you win or lose).

Question: How many times can you expect to roll the die, on average, in those sequences that include a 6? In other words, what will the average number of rolls be in a winning sequence? Put yet another way: What is the expected number of rolls, on average, you’ll need to throw in order to see a 6 (including the 6)?

Some comments on my statement of the problem. Anything I’ve added to the content of Kalai’s statement, I consider to be either blatantly obvious (e.g., it’s a standard 6-sided die), or clearly implied by the original statement. I’ll belabor this a little, given that I’ve seen many complaints about Kalai’s statement being ambiguous. It isn’t.

We’re throwing a fair, six-sided die. This means we have to consider the likelihood of any one of the die’s sides coming up in any given throw. This also means we’ll end up with a large set of outcomes if we run the trial (or “game”) many times, and many of those outcomes will terminate with a 1, 3, or 5. Those trials will be thrown out, and along with them many instances of 2 and 4. But it’s not as though those numbers don’t exist on the die and won’t come up in a trial. It’s just that we’ll only tally from the trials that have a 6 in them—I call the collection of these trials the “winning” set. That’s the set we’re interested in. It will include every 6 that was rolled, but not every 2 or 4, though it will include some of the 2’s and 4’s rolled (i.e., the 2’s and 4’s that preceded a 6). Indeed, 6 is the only number on the die that can make the winning set on its own; this gives it a big advantage. A major theme here will be that we should expect more 6’s than 2’s or 4’s in the winning set.

That said, here are some examples of how the game can go:

You roll a 1 and lose (a probability of 1/6).

You roll a 6 and win (a probability of 1/6).

You roll a 2 and then a 3 and lose (a probability of (1/6)^2 = 1/36).

You roll a 4 and then a 6 and win (a probability of (1/6)^2 = 1/36).

Notice that the probability of getting a single 6 is higher than that of getting a 4 and then a 6. Again, we should expect more 6’s than 2’s or 4’s. (Remember, the intuitive wrong answer assigns 2, 4, and 6 equal outcomes of 1/3 each, and thus predicts getting equal numbers of 2’s, 4’s, and 6’s in the winning set.)

[**I’ll try to say the key idea a couple of other ways:**

You are more likely (twice as likely, in fact) to get a 6 on your first roll than you are to get some number of 2’s and 4’s and then a 6. Per the conditioning rule, all other instances of 2’s and 4’s are thrown out along with the sequences containing an odd number—e.g., sequences such as {2, 2, 1} and {4, 2, 4, 3} must be thrown out.

But the main thing is simply that you’re most likely to have a 6 as the first roll among the winning sequences (e.g., {6} is more likely than {2, 4, 4, 6}). The result is that there are fewer 2’s and 4’s among the “winning” sequences than there are 6’s, and thus a smaller average number of rolls to see 6 than there would be if 2’s, 4’s, and 6’s were represented equally.

The mistake is to think that 2’s, 4’s, and 6’s each have the same probability of showing up in the conditioned space. They do not. You are not being asked to roll a three-sided fair die, but rather a three-sided die weighted in favor of 6.

Another way to put this is: while it’s obvious that you must “ignore” the odd-numbered rolls, it’s not so obvious that you must also ignore a proportion of 2’s and 4’s along with those odd-numbered rolls. But you must, or else you’ll over-count the 2’s and 4’s.

Once you convince yourself of this, the problem ceases to be counterintuitive.]

## 3.1 Solving by Application

Now, consider you play the game 120 times. (Note that I’m using “trial” and “game” interchangeably.) In those 120 games, you expect the following results (I could parse out the sample space in other ways, but this way seems instructive):

[1] 1/6 of the 120 trials will be a 6. (YOU WIN)

[2] Some fraction of the 120 trials will have at least one 2 or 4, and then terminate in a 6. (I’ll figure this out below.) (YOU WIN)

[3] 1/2 of the 120 trials will be a 1, 3, or 5 (that’s P(1) + P(3) + P(5) = 3/6 or 1/2). (YOU LOSE)

[4] Some fraction of the 120 trials will have at least one 2 or 4, and then terminate in a 1, 3, or 5. (I’ll figure this out below.) (YOU LOSE)

The probabilities above should add up as follows: [1]+[2]+[3]+[4] = 1 (or 100%).

The probabilities for [2] and [4] require a bit of precalculus. I don’t have space here to thoroughly explain these techniques, but hopefully what I do explain will suffice. I’ll provide links to explanations for anyone who wants to delve deeper.

For [2], let’s first consider some examples of how things can go. You could roll:

2, 6

2, 4, 6

2, 2, 4, 6

Notice again that these outcomes are all less likely than rolling a 6. The third trial, for example, has a probability of (1/6)^4 or 1/1296, or about .078%. Now, let’s state what can happen in [2] more specifically. To be included in the [2] condition, you could get:

2 or 4, then a 6

Or: 2 or 4, then 2 or 4, then a 6

Or: 2 or 4, then 2 or 4, then 2 or 4, then a 6

Or… well, this can go on forever!

Putting the above in terms of probabilities, where P(2 or 4) = 2/6 = 1/3, and the P(6) = 1/6, the [2] condition may be satisfied as follows:

(1/3)×(1/6)

Or: (1/3)×(1/3)×(1/6)

Or: (1/3)×(1/3)×(1/3)×(1/6)

Or: (1/3)×(1/3)×(1/3)×…… (1/6), with infinitely many (1/3)’s

This creates an infinite geometric series. We now must add it all up, all infinitely many of it. Luckily, there are two simple ways to do it. First, we need to recognize what our first term is in the series. It’s (1/3)×(1/6), or 1/18. Then we need to recognize what that term is being multiplied by each time we perform a new roll of the die. We see that this is 1/3. So, the above can be expressed as:

(1/18)×(1/3)^0 + (1/18)×(1/3)^1 + (1/18)×(1/3)^2 + (1/18)×(1/3)^3…. and so on indefinitely.

(Remember: A number raised to the zero power equals 1.)

We have two ways to add this up. We can use a sigma calculator (this one’s from Wolfram|Alpha):

Or, we can use use a simple formula: a/(1–r), where *a* is the first term and *r* is the common ratio, which we’ve observed here to be 1/3.

Applying the formula: (1/18)÷(1 – 1/3) = (1/18)÷(2/3) = (1/18)×(3/2) = 1/12.

So, [2] will contain 1/12 of our 120 trials.

Now we do the same for [4]. This time I’ll just use the formula. Let’s figure out the first term and the common ratio, where P(2 or 4) = 1/3, and P(1 or 3 or 5) = 1/2. To be included in [5] if you must get:

(1/3)×(1/2)

Or: (1/3)×(1/3)×(1/2)

Or: (1/3)×(1/3)×(1/3)×(1/2)

Or: (1/3)×(1/3)×(1/3)×… (1/2), with infinitely many (1/3)’s

The first term is (1/3)×(1/2) = 1/6, and the common ratio is 1/3. So, using the formula:

(1/6)÷(1–1/3) = (1/6)÷(2/3) = (1/6)×(3/2) = 1/4.

So, [5] will contain 1/4 of the trials.

Let’s add these up and make sure they come out to 100%.

[1] = 1/6 = 20 trials

[2] = 1/12 = 10 trials

[3] = 1/2 = 60 trials

[4] = 1/4 = 30 trials

1/6 + 1/12 + 1/2 + 1/4 = 1 (or 100%)

And: 20 + 10 + 60 + 30 = 120

Looking good.

Ok, so we know all the trial outcome types are accounted for in our sample space. Now we need to refine that space according to the conditions of the problem. The problem is asking how many rolls to expect in order to see a 6 within a trial terminating with a 6. In other words: What is the probability of getting a 6 in a given roll within the winning set—that is, within [1] and [2] combined? This is a key point. Remember that the usual probability of getting a 6 in a given roll (i.e., 1/6) is equal to the probability of getting a 6 within the sets [1], [2], [3], and [4] combined. (Perhaps that’s an obvious point, but it becomes less obvious in the problem at hand, where the probability will not be 1/6.) ^{5} The same concept applies here, but we’re only interested in [1] and [2].

In [1] and [2], we have 30 trials total. Twenty of those were trials in which a 6 was rolled in the first throw. The other 10 trials also contain a 6, but they also each contain at least one or more 2’s or 4’s. (In a moment I’ll consider how many 2’s or 4’s there are.) Now we just have to calculate what fraction of all the winning trials yielded a 6 in the first roll. That is 20 out of 30 trials, or 2/3. So, within the context of the problem at hand, there is a 2/3 chance of rolling a 6.

This means that for every three trials, two yielded a 6 in the first throw. This also means that for every three individual throws in the set of all winning trials, there are two 6’s. All this boils down, on average, to one 6 per 1.5 throws. Notice that this is just the reciprocal of 2/3.^{6} So that’s the answer. You can expect to throw 3/2 times to see a 6, given that the preceding throws, if any, are even.

If this is correct, there must be about twice as many 6’s in the winning set than there are 2’s and 4’s combined. In other words, 1/3 are 2’s and 4’s combined. Those two outcomes are equally likely, so they must each have a probability of 1/6 in the context of this set.

Looking at 2’s and 4’s is useful, I think, because it seems that a big part of why this problem is counterintuitive has to do with overestimating their quantity. As I noted above, there should mostly be 6’s here: every trial here is guaranteed to have a 6 in it. Not so for 2 or 4.

That said, let’s calculate how many of each number to expect in the winning set. We have 30 6’s total. That is 2/3 of 45, so there must be 45 throws total, 15 of which are 2’s and 4’s (i.e., after subtracting the 30 6’s). Two and 4 are equally likely, so there must be 7.5 of each on average.

Let’s put this to the test! I programmed the game into Excel and ran the 120-trial experiment 11 times, with the following results:^{7}

I expected…

- …20 wins with a 6 on the first roll. The average was 20.18.
- …10 wins with a 6 after getting some number of 2’s or 4’s. The average was 9.64.
- …30 wins total. The average was 29.82.
- …66.67% (i.e., around 2/3) of the winning trials to contain a 6 only. The average was 67.65%. Notice the wide range of values for this column, going from 53.33% to 92.59%! But the final average still comes close to the expectation—it overshoots by less than one percentage point (about .98 percentage points, actually). The wide range here has to do with the wide range of 2’s and 4’s (see below), due, I suppose, to throwing so many of those out, leaving a small sample of them. The star here really is 6. It is always well in the majority, usually by a large margin.
- …7.5 of each 2 and 4. I got 7.27 and 6.73, respectively.
- …15 2’s and 4’s total. I got 14 on average.
- …a 50/50 split of 2’s and 4’s. I got close, at around 52/48. Also notice that the range for the 2’s is 14–1=13; and the range for the 4’s is also 14–1=13.
- …6’s to be twice as many as the 2’s and 4’s combined. It was 3.19 times as many. Well, at least it wasn’t half as many (which is what the most common wrong answer predicts). Notice the impact of Experiment 5 here, where 6’s are a whopping 13.5 times bigger! Without that, the average would have been 2.16.
- …6’s to make up 66.67% (or 2/3) of all outcomes. The average here is 68.05%. That’s a difference of 1.38 percentage points. Not bad.
- …an average of 1.5 throws to see a 6. The average here is 1.46. Had I rounded everything to one decimal point, it would have been 1.5, but I opted for a little more precision.

Had I not been convinced before, I would be now! Though it’s still possible I could be making an error in my math somewhere. I’m open to critique.

### 3.2 Solving with Basic Probability Math

Drawing from the probabilities I worked out earlier in this section, we know that, for any given run of this game, there’s a 1/6 chance of getting a 6 on the first throw, and a 1/12 chance of getting some sequence of 2’s and/or 4’s that terminates in a 6. Add those up and you get 1/4. So, there’s a 1/4 chance of winning. In other words, 1/4 of all outcomes (including those with 1’s, 3’s, and 5’s) will be winners.

We now calculate what fraction of the winning trials consists of a 6 only. That is (1/6)÷(1/4) = (1/6)×(4/1) = 4/6 = 2/3. Take the reciprocal to get 3/2 = 1.5.

### 3.3 Solving with Slightly More Complicated Probability Math

[*NOTE: I’m tacking this on some time later. I don’t think I really need it, given that, as I’ve repeatedly emphasized here, I’m interested in bolstering intuitions, not presenting formulaic shortcuts. But I figured I’d include it to be a little more mathematically thorough.*]

Finally, there are at least three other relatively quick and clear ways to calculate the answer. I’ll present them without much elaboration (hopefully they are made sufficiently intuitive by having explored the problem in other ways; though I also realize that for someone new to probability, a deeper explanation of things like expectation and, perhaps especially, conditional probability—and how those things relate to basic, run-of-the-mill probability—would be helpful).

First, you can use a more straightforward calculation for expectation. There’s more than one way to do this. I’ll do it like this:

(the probability of rolling a 6 on first try × one roll) + (probability of getting at least one 2 or 4 and then a 6 × however many rolls this takes) + (probability of getting at least one 2 or 4 and then an odd number × however many rolls this takes) + (the probability of getting an odd number on first try × one roll)

I input this into Desmos as follows to get 1.5:

Note that *n* represents the number of rolls, and the upper limit should be infinity, but Desmos doesn’t have that option, so I used 100 as a stopping point, which is plenty high enough. So, getting a 6 in four rolls would look like: (1/3)^3 × (1/6). Note also that this answers the question of how many rolls to expect on average in a given trial. This includes those trials that end in an odd number. I find this less intuitive than the other explanations I explored, particularly because it seems we’re failing to condition properly. (See more on this below, in the third example, where I solve for *E*).

This brings me to the second way to cleanly calculate the expectation, but in a way that makes the conditioning more obvious, where again *n* is the number of flips it takes to get the job done:This time, I’ve divided (the probability of getting to a 6 either in one flip, or after some number of 2’s and/or 4’s × however many rolls it takes to accomplish this) by (the probability of getting a 6 after getting no or some 2’s and/or 4’s). This equals (the expected number of rolls to a 6, either on the first roll, or when preceded only by 2’s and/or 4’s). This follows a standard approach to conditional expectation, but strikes me as less intuitive than the other ways I’ve explored.

The third way I’ll look at is pretty straight forward and uses a similar method one might use to calculate, say, the number of expected flips to get *n* heads in a row (a fascinating topic in its own right, and one in which Fibonacci numbers make an exciting appearance, but one I won’t delve into here). It’s generally referred to as “first-step analysis.”

Suppose we want to calculate the number of flips expected to get a Heads. Assume this takes, on average, *E* flips. In a given trial, however, we could get Heads on the first try and be done with it. We could also get Tails on the first flip, in which case we now expect the whole process to start over, given that flip results are independent of one another. We can use this to solve *E *(assuming that P(H) = P(T) = 1/2):

*E* = P(H)(1 flip) + P(T)(1 flip + *E* flips)

*E* = (1/2)(1) + (1/2)(1 + *E*)

A little algebra shows that *E* = 2. You can easily do likewise to confirm that you’d expect to roll a fair die 6 times, on average, in order to get a 6. Things get a bit more complicated if we look for say, two Heads in a row or three 6’s in a row. I’ll save that discussion for another day*, and will instead apply this to the problem at hand.

*E* = P(6)(1 roll) + P(2 or 4)(1 roll + *E* rolls) + P(1, 3, or 5)(1 roll)

*E* = (1/6)(1) + (1/3)(1 + *E*) + (1/2)(1)

[*EDIT: I do this for two Heads in a row in this post: “Gambler’s Ruin & Random Walk: Probability, Expectation, Steal the Chips.”]

Some algebra gets us to 1.5 expected flips. Again, while this method strikes me as perfectly intuitive in the case of, say, getting 3 Heads in a row, it is less intuitive to me here, as it’s not immediately obvious to me that we should be including the P(1, 3, or 5)(1 roll). But we need to include it with this approach (as with the first example in this section), because we are construing the problem as asking how long a given trial will last, on average. In other words P(2 or 4)(1 roll + *E* rolls) includes the possibility of ending on a 1, 3, or 5. Each of these results will happen just as often as ending on a 6. So, we treat the problem generally here, simply noting that to get any non-2 or non-4 result after a certain number of rolls will be the same expectation for getting a 6 after a certain number of 2’s and 4’s (if any). This observation makes this approach more intuitive as a valid way of getting to the right answer, but I am of course partial to the other methods I explore in this post for really understanding what’s going on with this problem.

[NOTE: If anything here is still mysterious, I also recommend checking out the comments section below. The essential thing I point out there is that the proper way to understand this problem—that is, as one where we’re given the condition for landing our first 6 given no odd outcomes—is that we end up throwing out all sequences containing a 1, 3, or 5. Several of those sequences will also contain some 2’s and/or 4’s. If we fail to throw out those 2’s and 4’s, we don’t fully condition, and end up with the wrong answer of 3. So, we get the correct answer of 1.5 not by under-conditioning due to “failing to throw out the odd results,” but rather we get 1.5 by thoroughly conditioning, which is to say by throwing out all sequences containing odd results.)]

The second and third approaches I took in this section are essentially those taken by Presh Talwalker at his blog.

# 4. Errors?

Did I get something wrong here? Conceptually? Computationally? If so, let me know!

*Enjoy or find this post useful? Please consider pitching in a dollar or three to help me do a better job of populating this website with worthwhile words and music. Let me know what you'd like to see more of while you're at it. Transaction handled by PayPal.*

*Or click the banner to shop at Amazon (at no extra cost: it just gives me some of what would have gone to Amazon).*

#### Footnotes:

- If you need a refresher on combinations, here’s a Khan Academy video.
- If you need a refresher on permutations, you can check out this Khan Academy video
- Go here for a refresher on sigma notation. In the denominator, I’ve put the formula for the aforementioned nCr, which, again, you can brush up on at Khan Academy.
- This brings to mind the St. Petersburg Paradox where, instead of summing
*n*(1/2)^{n}, where*n*is the number of flips from 1 to infinity, you sum 2(1/2)^{n}, which yields startling results.^{n} - This convenient assumption is allowable due to the outcome of the die rolls being independent. I can also calculate this like so (using the Wolfram|Alpha sigma calculator):
For discussions that include more solutions with explanations, see these Mathematics Stack Exchange threads: What is the expected value of the number of die rolls necessary to get a specific number? and On average, how many times must I roll a dice until I get a 6?

You have to be careful with this assumption. Problems involving playing cards, for example, tend to involve dependent events. Consider this question: You have a normal deck of 52 playing cards. You start taking cards off the top. How many cards do you expect to remove in order to see an Ace? The answer isn’t 13, but 10.6. This problem is from Frederick Mosteller’s Fifty Challenging Problems in Probability with Solutions. I recommend the book, but you can also find a discussion of that problem, with beautiful excel simulations, here: Fifty Challenging Problems in Probability; and there’s a short and sweet explanation in this PDF: When Some Tricks Are Trickier Than Others A Collection of Probability Tricks

- A convenient, and highly intuitive, feature of geometric distribution.
- This modest program got the job done:
1. In cell A1, I entered: =RANDBETWEEN(1, 6)

2. In cell B1, I entered: =IF(OR(A1=2,A1=4),RANDBETWEEN(1,6),IF(A1=6,”WIN”,”LOSE”))

3. I then copied and pasted B1 into C1 through H1 (only one of the 1,320 trials made it to H1; it ended in a 1).

4. I then copied and pasted that row, from A1 to H1, so that it covered 120 rows.

5. I then ran the game 11 times, copying and pasting each result into its own sheet for tallying.

Dan’s formulation as a game reveals the fallacy in all these claims of conditional expected times 3/2:

“1. If you roll a 6, you win and the game ends.

2. If you roll a 1, 3, or 5, you lose and the game ends.

3. If you roll a 2 or a 4, you roll again.

4. Repeat the above until the game ends (i.e., until you win or lose).

Question: How many times can you expect to roll the die, on average, in those

sequences that include a 6? In other words, what will the average number of

rolls be in a winning sequence?

This is not the right experiment. The def of conditioning on all throws prior to 6 being even is that you IGNORE any run in which an odd number appears before a 6, rather than averaging in its length. Then the problem indeed reduces to a 3-sided die with sides 2,4,6, and the expected time is 3.

Thanks for your comment, Albert.

What you’re saying would be correct if the question were about, for example, the chance of having rolled a 6 given that a rolled die’s outcome was even: it could be a 2, 4, or 6, so the answer’s 1/3.

The question here, rather, is about the chance of an outcome being a 6 given a sequence of throws consisting either only of a 6, or of some quantity of even-number outcomes preceding a 6. The resulting sequences, or sets, could look like: {6}, {2,6}, {2,2,2,4,6} and on and on forever.

If we count up all the numbers in these sets, the proportion of 6’s to all the numbers is now not 1/6 or 1/3, but 2/3. Why? Because you’re guaranteed to have a 6 in a given set. Indeed, most of the sets will likely consist only of a 6, given that it’s far more likely to get a 6 than to get, say, a bunch of 2’s and 4’s and then a 6. This is on top of the fact that fewer 2’s and 4’s will be included than usually expected, as we’re not counting in our sample space those instances that end in an odd, e.g.: {2,2,4,1}.

So, we throw out the sequences containing odd outcomes, but we don’t exactly ignore them, as they help us figure out how many non-6 even numbers to expect in those sequences we do count.

It’s been a while since I’ve thought about this problem, but I think this captures the gist of the thing. I’d direct you to my article for a more thorough account (at least have a look at the simulation; you might also want to design your own).

Perhaps you don’t think the wording of the initial statement of the question supports this interpretation. I think it does, unambiguously so. In fact, I think you and I do have essentially the same interpretation, but you are keeping too many 2’s and 4’s after throwing out the sequences containing odd numbers. On top of this, we have other good reasons to believe that this is the interpretation intended by the person who initially posed the question (see the links at the beginning of my article). Hopefully, though, the controversy surrounding this problem isn’t just about what the question is actually asking.

That in mind, and going a step further, suppose we were to agree on the execution of that interpretation, as I formulate it in the quote you included from my article. Would you then agree that the answer should be 2/3 (rather than 3)? I slightly edited that breakdown just now, after noticing Step 4 seemed ambiguous. I’ll include that here, plus a little more for context:

Kalai states the problem as follows: You throw a dice until you get 6. What is the expected number of throws (including the throw giving 6) conditioned on the event that all throws gave even numbers.

Here’s my statement of it:

You play a game in which you roll a fair die. The rules of the game are:

Question: How many times can you expect to roll the die, on average, in those sequences that include a 6? In other words, what will the average number of rolls be in a winning sequence? Put yet another way: What is the expected number of rolls, on average, you’ll need to throw in order to see a 6 (including the 6)?

Yes, I’m finally persuaded of the 3/2 conditional expected time answer. It was engaging to watch several eminent colleagues get sucked into a shared intuitive fallacy with me, as seems to be common with this problem.

Thanks for your extended response.

Here’s how I like to intuit this problem:

Let X be a random variable describing the number of rolls of a fair die up to and including the first 1, 3, 5, or 6. This is a geometric random variable with p = 4/6, so EX = 1/p = 1.5.

I think it’s fairly intuitive (by symmetry) that E[X|roll ended in 6] = EX. All that remains is to convince yourself that E[X|roll ended in a 6] captures the event space of interest.

very nice approach!