In a post called “(Counterintuitive?) Dice Probability: Sue vs Bob,” I promised to cover two more dice problems that build on the tools developed in that post. They are:
How many rolls of a fair die are expected in order to see a 6 (including the roll that yields the 6)?
How many rolls expected to see all six sides of a fair die?
The first question is a good opportunity for getting a handle on expectation, and the second is a nice introduction to the so-called coupon collector’s problem. I’m going to split these into two posts, starting with the first question.
For most of this, the only prerequisite you’ll need is an intuitive handle on the most basic rules of probability—i.e., for adding the probabilities of mutually exclusive events and for multiplying the probabilities of independent events.
Expected Rolls to See 6: A Brief Review of Expectation
“Expected” means average. So we might restate the question: If you roll a fair die over and over again, how many rolls would it take, on average, to see your first 6? There are other ways to word or conceptualize this. I’ll shuffle freely between them.
Expectation generally amounts to a weighted average, a concept I’ll review here from the ground up.
That in mind, a good start is to imagine how things can go, and with what probabilities:
1st roll is 6 with probability 1/6
1st roll is not-6 then 2nd roll is 6 with probability (5/6)(1/6) = 5/36
1st and 2nd rolls are not-6 then 3rd roll is 6: (5/6)(5/6)(1/6) = 52/63
1st, 2nd, and 3rd rolls are not-6 then 4th roll is 6: (5/6)(5/6)(5/6)(1/6) = 53/64
The pattern is clear. It’s easy to see, for example, that your fifth role yields your first 6 with probability 54/65. Generalizing this into a formula is no problem. Where the number of rolls is n, and 6 happens on the nth roll (with probability 1/6), and all the rolls before that (i.e., n–1) are each not-6 (with probability 5/6):
(We could also easily deal with the case of a non-fair die by, say, replacing 1/6 with p and 5/6 with 1-p. I’ll do this below.)
Notice that, in our formula, when we get 6 on the first roll, 5/6 is raised to the power of 0, which equals 1.
How do we use the formula to find the expected number of rolls to see a 6? We essentially do what we usually do for finding an average. We add up a bunch of values and then divide by however many values we added up. It’s not yet clear what those values will be, but it should be intuitive that they’ll have something to do with numbers of rolls: 1 roll + 2 rolls + 3 rolls… Hopefully it’s also intuitive that, while this is on the right track, it’s incomplete.
The critical thing to notice is that the probability of each possible outcome is different. Or don’t ‘weigh’ the same, let’s say. You’ll see your first 6 on the second roll far more often than you’ll see it on the 237th roll. So, this is a weighted average. But weighted averages are just averages, which is to say they work the same as any other average. To see this, let’s work up from some simpler examples.
Suppose you want the average of these eight numbers: 2, 3, 3, 3, 3, 7, 15, 15.
Add them up and divide by 8:
Better yet, simplify the numerator by multiplying the repeated numbers:
In that form, we’ve weighted the 3 and 15 according to their representation in the set. The 3 weighs four times as much as the 2 and twice as much as the 15.
This is how test scores are differently weighted as well. If you have three tests, where the final counts for 50% of your class grade and each of two midterms counts for 25% (notice that this sums, as it must, to 100%), then we can say that the final weighs twice as much as a given midterm.
Suppose someone in that class scores 90, 92, and 95 on their midterms and final, respectively. One way to average these is to enter each into the numerator, add them up, and divide by 4. But you must enter the final score twice, as it weighs twice as much (thus our dividing by 4 and not 3):
Or you can multiply, as we did above, and just multiply the 95 by 2:
Here’s where I’ll start making it look like what we need for the die question. Notice that each score has a common denominator of 4. This means we can break it up like so:
We can simplify the last term by cancelling the 2 against the 4 (i.e., 2/4 = 1/2):
To really emphasize what’s going on here, pull each fraction out from each term:
Notice that this is just saying (25% times 90) plus (25% times 92) plus (50% times 95). Those are the weights given at the start! So we obviously could have just started with that: (.25)(90) + (.25)(92) + (.5)(95) = 93.
Notice that one way to describe this is to say that we’re adding up a quarter each of 90 and 92, and half of 95. And, again, notice that this all adds up to 100%, which as a multiplier is just 1. That is, 1/4 + 1/4 + 1/2 = 1. Or, if you prefer: .25 + .25 +.5 = 1.
There’s yet more worth noticing here. For example, that the following setup works just as well, as it results in the same fractions once simplified:
All that matters is that the ratio between weights stays proportional. Notice that the weights here are technically 3/12, 3/12, and 6/12—which still comes out to .25, .25, and .5.
In other words, when finding an average, the weights in the numerator must add up to the value in the denominator. This is nothing mysterious or new. It’s exactly how you’ve been doing averages since elementary school. It’s just that, when all elements in the set weigh the same, they have an invisible 1’s in front of them, which of course add up to the number of elements in the set. That number goes in the denominator.
But, for instance, in the example above, the weights add up to 3 + 3 + 6 = 12. So 12 goes in the denominator. To make it look like an elementary school average, simple list out all the elements in the numerator (each of which is now being multiplied by an invisible 1):
We can do this with fractional amounts as well. For example, we could say that 90 happens .25 times; 92 happens .25 times; and 95 happens .5 times. You can, if you wish, add up that number of instances (.25 + .25 + .5) to get 1, then divide by 1:
There’s just no need to do that once we’ve gotten down to the decimal or fractional amounts, because those will just add up to 1.
Expected Rolls to See 6: Solved as Weighted Average
Assuming weighted averages are now no more mysterious than your typical elementary school average, let’s return to the die formula:
Recall that this will tell us the probability of seeing 6 (or whichever side you like) on the nth roll.*
*More technically, it’s the probability mass function of a geometric distribution with parameter p = 1/6, where there are n-1 failures and one success. To go a little deeper into that, here’s a neat video from an excellent YouTube channel:
For example, seeing your first 6 on the eighth roll—so that n = 8—would mean we got seven failures and then a success:
What we now want is the expected—i.e., average—number of rolls to see the first 6 (including the roll of the 6). We know that we can land our first 6 in our first or second or third or fourth or… (this has no upper bound, but let’s hold off on worrying about that).
Again, we intuit that, some way or another, we’ll be adding those roll amounts together: 1 + 2 + 3 + …
Or maybe this isn’t intuitive. Or, at least, why we’re adding the number of rolls together may not have been obvious at first glance. Hopefully it’s now clearer after having reviewed weighted averages above, which is, in fact, essentially what we need here. (And, as above, I’ll shuffle between different ways of talking about the phenomena in question. I find it helps with understanding what’s going.)
We know that some fraction of the time we’ll land 6 in one roll, some fraction of the time in two rolls, and so on. These fractions get smaller and smaller as n grows because, again, we’ll land our first 6 on the second roll far more often than on the 237th. These fractions are also just the probability of that particular outcome occurring—e.g., of landing our first 6 on the eighth roll. The fraction (or ‘proportion of time’) that a given event happens also is the weight of that given outcome, relative to the other outcomes (something that happens 1/6 of the time ‘weighs’ twice as much as something that happens 1/36 of the time).
So this is nothing but a weighted average, where the thing being assigned a weight is any given number of rolls that can occur. A weight is just the probability of a given number of rolls occurring.
In other words, you’ll see 6 on your first roll 1/6th of the time, because there’s a 1/6 probability of getting a 6 in one roll. Which is to say this event accounts for 1/6 the weight of all possible outcomes combined (that grand total of combined weights, as always, is 1, as this is simply the probability that one of the possible outcomes occurs). Simply put, getting 6 on your first roll weighs 1/6.
(Weighs 1/6 of what? Of the set of all possible outcomes, also known as the sample space. Each possible outcome, or event, is a subset of that sample space.)
You’ll see 6 on your second roll (5/6)(1/6) = 5/36th of the time, and so on.
I’m going to repeat myself with slightly different language, because I’ve so often seen such explanations fail to click. I’d rather over- than under-explain.
Another way to put things is that 1/6 and 5/36 are just the respective probabilities of the above events occurring.
We can also say that those probabilities represent the proportion of time we expect those events to happen. Which in terms of averages and expectation, is to say that the probability of an event happening is the weight of that event, and vice versa. In yet other words, the representation of the event in the set is just the proportion of time that it happens, which we also call its probability.
Hopefully I’ve sufficiently belabored the point.
Now, just as with the weighted test scores above, we just need to add up 1 + 2 + 3 + … , but with each of those numbers (each of which represents a quantity of die rolls) multiplied by its probability of occurring. This will give us the expected rolls to see our first 6. In other words:
There’s no theoretical end to this series. So we obviously aren’t going to try to list out every single possible number of rolls. One way to get a precise answer is to add up to a comparatively small number of rolls for n. Up to 100 rolls should be plenty. I popped that into Desmos to get:
(Here’s a clear and simple review of sigma notation: Math Is Fun: Sigma Notation.)
Very close to 6. Let’s call it 6. Technically the upper bound should be ∞ but Desmos doesn’t have that feature. But 100 is plenty.
Expected Rolls to See 6: Two Shortcut Formulas
To make totally sure, though, we can use an even simpler formula to find the answer:
Where p is the probability of success. To be clear, this gets you the number of expected failures. In this case:
So we expect five failures, and then a success on the sixth attempt.
That formula can be derived from the above sigma formulation, but I think explaining how extends beyond the scope of this post, as it requires some calculus, which I’m not assuming a knowledge of here.** Besides, there’s a simpler way to derive it involving yet another method—and my favorite, in fact—for solving the question of how many rolls we expect for landing our first 6. I’ll get to that method in a moment.
**Here’s a proof in case you’re interested. Notice that n starts here at 0, so there are n failures instead of n-1 failures. And, to further generalize, let p = probability of success, while 1-p is probability of failure:
I adapted the above from the Wikipedia entry “Geometric Distribution,” where it’s noted, as a matter of good math hygiene, that “The interchange of summation and differentiation is justified by the fact that convergent power series converge uniformly on compact subsets of the set of points where they converge” (accessed 2/4/20).
First, here’s another simple formula for solving the problem:
That’s just the reciprocal of 1/6. I’ll derive this one, as well as the previous one, after covering the method I referred to above as my favorite. Here goes.
Expected Rolls to See 6: First-Step Analysis
I’ve seen the shortcut-formula approach mislead with trickier problems, or at least make it too easy to make unwarranted assumptions (see the problem I refer to as counterintuitive in the “Further Reading” section below). Here is, I think, a safer approach, called first-step analysis, that essentially amounts to thinking things through logically. Which is to say that it might take a bit of sitting and staring for it to click (a good thing to do, as there are trickier problems made easier by using this sort of approach as a starting point; I link at the end of this post).
This approach is my favorite because it’s almost as simple as using a little formula, but is still transparent and thus more adaptable to similar problems and less prone to mistakes. From this approach, I will also be able to derive the above two shortcut formulas.
Let E represent the expected number of rolls to see the first 6 (including the roll that lands 6). You get the 6 on the first roll with probability 1/6. If not, you roll not-6 with probability 5/6, and then you reset your expectation (as each outcome is independent), so now you expect 1 + E rolls overall.
Notice what happens if, above, we replace 1/6 with the more general p. The final output is 1/p. That’s the second shortcut formula I gave above. And there you go: I’ll now consider the derivation of 1/p to be justified.
Finally, notice what happens if we take this same approach but don’t include the success roll in E, so that E is just the expected number of failures. A little adjusting is required, but not much. If you first roll a 6, then you have zero failures with probability p; alternatively, you end up with a failure on the first roll and the process starts over, so you now expect E + 1 failures with probability 1–p.
This works out to the first shortcut formula mentioned above: (1-p)/p, which in the present case evaluates to an expectations of five failures preceding the first roll of 6. And there you go again, no calculus required.
This seems a good place to stop. Here are some other things to check out.
If you’d like to go deeper into expectation of this sort—for example, how many rolls expected to see two 6’s in a row?—check out this paper, which also derives the 1/p formulation (and then some!): “How Many Coin Flips on Average Does It Take to Get N Consecutive Heads?”
Not covered in that paper, however, is that if you want to use a counting method, as with the above sigma notation examples, for getting two Heads in a row with a fair coin, you can use the Fibonacci numbers; for three Heads, the tribonacci. To see the latter in action, check out this post and scroll down to point 19: “Anthropic Bias (Ch 2, Part 2): Fine-Tuning in Cosmology & 100 Heads in a Row.”
For more involved examples that uses first-step analysis, see my post “Gambler’s Ruin & Random Walk: Probability, Expectation, Steal the Chips.”
For a fairly simple die problem that nevertheless has proved to be a counterintuitive head-scratcher even for probability nerds (until it clicks), see my post “Counterintuitive Dice Probability: How many rolls expected to get a 6, given only even outcomes?” (currently the all-time most visited post on this website). I solve the problem several ways, including with first-step analysis.
In the near-ish future, I’ll write up an explanation of the second question I mentioned at the start of this post, and will link that here: How many rolls expected to see all six sides of a fair die? (This is a coupon collector’s problem, which you can read about at Wikipedia: Coupon Collector’s Problem.)
UPDATE: Here’s the post with that second question, as well as two follow-up posts involving the median:
Enjoy or find this post useful? Please consider pitching in a dollar or three to help me do a better job of populating this website with worthwhile words and music. Let me know what you'd like to see more of while you're at it. Transaction handled by PayPal.