As promised in my recent post, “Dice Probability & Expectation: Expected Rolls to See a 6,” here’s an explanation for the dice version of a coupon collector’s problem. It goes like this:
How many rolls are expected to see all six sides of a fair die?
I’ll start off the same way I did in the above-linked post. Namely, by imagining how things will go in practice. I won’t go into as much detail as I did there, so if anything is mysterious here, you might want to have a look at that post.
Here goes.
The problem says we need to see sides labeled 1, 2, 3, 4, 5, and 6, though not necessarily in that order. Actually, it’s worth noting that calculating the expectation for getting each side in that order is very easy. I expect six rolls to see my first 1, then six more rolls to see my first 2, and so on. That’s 36 rolls total. (This is no different, by the way, than just the expectation for “seeing your first 6” six times in a row, as each roll is an independent event. This is intuitive. But I’ll say a little more to bear it out below when I bring up linearity of expectation.)
Since I don’t need to see all six sides in order for the coupon-style problem, I should expect fewer than 36 rolls. I know with certainty, for example, that I’ll collect a side on my first roll.
Because the sides are likely to be collected out of order, I find it easier to imagine six slots. Each slot needs to be filled by a distinct number. I’ll label the slots A, B, C, D, E, F. The slots must be filled in order. You don’t have to think of the problem this way, but that’s how I’ll think of it here.
Slot A will be filled on the first roll with 100% probability.
Now to slot B. There’s a 1/6 probability we’ll get the same thing that’s already in A, and a 5/6 probability we’ll get something new.
With that in mind, here’s the probability of fulfilling the mission in just six rolls:
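$$ \frac{6}{6} \cdot \frac{5}{6} \cdot \frac{4}{6} \cdot \frac{3}{6} \cdot \frac{2}{6} \cdot \frac{1}{6} = \frac{6!}{6^6} = \frac{720}{46656} \approx 0.0154 $$

(That’s each roll landing on a side we haven’t collected yet: 6/6 for the first roll, then 5/6, and so on down to 1/6 for the last.)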
That’s one scenario that could happen. It’s pretty unlikely—about a 1.54% chance—but maybe more likely than I would have initially guessed.
Or we could fill all slots in seven rolls. What’s the probability of that? It means we rolled a duplicate at some point, repeating either the first, second, third, fourth, or fifth roll. So there are five ways to choose a repeated roll. Multiply that by the six possibilities for the sixth roll. And multiply that by the number of permutations for… all right, you get the idea. This is getting complicated. But not nearly as complicated as figuring out what it’ll be for 10 rolls. Let’s not.
(Though, I admit, I already have. As noted at the end of this post, I’ll be doing a follow-up post where I address the difficult question of finding the median number of rolls in this problem. In that post, I’ll say a little more about using combinatorics—i.e., mathematically assisted, but still brute-force, counting—to address it.)
As you know, expectation is a weighted average of the outcomes in a sample space, where each outcome’s weight is its probability of occurring. So we need probabilities. And due to the above difficulties, it’s not going to be practical to compute this as (6 times the probability of getting it in six rolls) + (7 times the probability of getting it in seven rolls) + … and so on, unlike with the problem in the previous post, where those probabilities were easy to calculate.
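In symbols (calling the total number of rolls N), that would mean computing every term of

$$ E[N] = \sum_{k=6}^{\infty} k \cdot P(N = k), $$

and, as we just saw, those P(N = k) values are a pain to pin down.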
So now what?
I’ll start over. This is very inelegant, but if I were to make a math website, it’d be called Inelegant Maths. My interest is much more in the journey to finding the answer than it is in the answer itself.
Anyway, slot A is a guaranteed gimme.
Slot B can now be filled five different ways. If slot A holds a 1, we now need the expectation for hitting either a 2, 3, 4, 5, or 6. Well, the expectation for getting any one of those in particular is six rolls. But what’s the expectation for getting any one of five sides? Let’s linger on that question for a moment.
The expectation for hitting a 1, 2, 3, 4, 5, or 6 is one roll.
What’s the expectation for hitting a 2 or 3? Recalling, again, that expectation is the weighted average of all possible outcomes, where the weight of each outcome is its probability of occurring, it’ll go like this.
You roll a…
…2 or 3 on first roll with probability 2/6
…non-2-or-3 on first roll with probability 4/6, then a 2 or 3 on second roll with probability 2/6
…non-2-or-3 on first two rolls with probability (4/6)(4/6), then a 2 or 3 on third roll with probability 2/6
…non-2-or-3 on first three rolls with probability (4/6)(4/6)(4/6), then a 2 or 3 on fourth roll with probability 2/6
The pattern’s clear. Getting a 2 or 3 in one roll has a 2/6 probability; in two rolls, a (4/6)(2/6) probability; in three rolls, a (4/6)(4/6)(2/6) probability. This generalizes to (4/6)^(n-1) (2/6), where n is the number of rolls in question.
Now we just need to add up the (infinite) series that comes out of this. Just like in the previous post, I can do this with sigma notation (keeping the denominators here at 6 for transparency):
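$$ \sum_{n=1}^{\infty} n \left(\frac{4}{6}\right)^{n-1} \left(\frac{2}{6}\right) = 3 $$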
We could do this for each slot, then add the results. Even better, we can use one of the short formulas derived in the previous post. The simplest is 1/p where p is the probability of success (in a geometric distribution): 1/(2/6) = 6/2 = 3.
So simple. And intuitive in retrospect. Getting a 2 or 3 means that two out of six rolls (i.e., one out of three) will succeed, on average. So we expect it to take three tries to get there. (Well, this is especially intuitive given that we already expect six rolls to see any particular number, which has a 1/6 chance of landing on each roll.)
From here it’s simple. We expect it to take one roll to fill A. To fill B? Well, there are five outcomes out of six that’ll fill it. That gives 1/p = 1/(5/6) = 6/5 = 1.2 rolls. Do this for all of them:
A → 1/(6/6) = 1 roll
B → 1/(5/6) = 1.2 rolls
C → 1/(4/6) = 1.5 rolls
D → 1/(3/6) = 2 rolls
E → 1/(2/6) = 3 rolls
F → 1/(1/6) = 6 rolls
Add those to get 14.7 rolls. You’ll often see this represented as follows, with the 1/p values already reciprocated:
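$$ \frac{6}{6} + \frac{6}{5} + \frac{6}{4} + \frac{6}{3} + \frac{6}{2} + \frac{6}{1} = 6\left(1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}\right) = 14.7 $$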
Again, we can add these up because of the well-known linearity of expectation, which says that, for random variables* X and Y, the expectation of X plus the expectation of Y is equal to the expectation of X + Y. The amazing thing is that this works even when events are dependent. So you can solve some very cool problems with it (I’ll share a fun one at the end of this post).
*I’m realizing that I haven’t explained what a random variable is. I don’t think I need to get into the details of that here, but will say this.
A random variable is a function that maps the outcomes of a sample space to the real number line. In other words, when rolling a die, the sample space contains the outcomes, “die lands with 1 face up,” “die lands with 2 face up,” and so on. Random variable X (let’s call it) maps these outcomes to number values, such as: {1, 2, 3, 4, 5, 6}. These are the values that X can take on as the result of a given trial (or roll, in the case of a die). This set of values is called the support of random variable X. Note that these are integer values. This makes X a discrete random variable (it’s also possible to have a continuous random variable).
A discrete random variable can take on a finite or countably infinite set of values (while a continuous random variable can take on any real value).
You’ll see notation like, P(X = 1) = 1/6, for a die. That just says “the probability that random variable X takes on the value 1 is 1/6.” Anything not in the support has a probability of 0.
Another example would be the number of rolls (or trials) it takes to get your first 6, with support {1, 2, 3, …}. You’ll see a support like this listed, for example, at the top of the Wikipedia entry on “Geometric Distribution.”
For more on this, see the Brilliant entry on Discrete Random Variables. Or go deeper in my favorite probability textbook, which has the best explanations of these things I’ve seen: Introduction to Probability by Blitzstein & Hwang (buy here or read online for free here; see Chapter 3, page 103).
I hope I haven’t said anything wrong or misleading here. If so, let me know.
If you’d like to see a proof of linearity of expectation, check out the Brilliant entry on the topic: Linearity of Expectation. For fun, though, I’ll demonstrate here an example that makes linearity more obvious, where we see the insides of it rather than just going right to the short formula.
Return to the question of getting all six sides in order (while noticing the arbitrariness of this: it could just as well be asking for seeing a 6 six times). Also recall that this isn’t asking for rolling 1, 2, 3, 4, 5, 6 in six rolls. It’s simply asking for the expected number of rolls for seeing your first 1, and then a 2, and then a 3, and so on. The answer is 36 rolls.
For just getting a 1 and then a 2, then, the answer should be 12. We can confirm this with the short formula 1/p: 1/(1/6) + 1/(1/6) = 12. But to demonstrate linearity of expectation, I’ll return to sigma notation:
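Here, X is the number of rolls up to and including the first 1, and Y is the number of additional rolls up to and including the first 2 after that:

$$ E[X] + E[Y] = \sum_{n=1}^{\infty} n \left(\frac{5}{6}\right)^{n-1} \left(\frac{1}{6}\right) + \sum_{n=1}^{\infty} n \left(\frac{5}{6}\right)^{n-1} \left(\frac{1}{6}\right) = 6 + 6 = 12 $$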
Since linearity of expectation tells us that we’ll get the same answer for [Expectation of X] + [Expectation of Y] as for [Expectation of X + Y], we can also do it like this:
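One way to write that out (the spirit of it, anyway) is as a single weighted average over both counts at once, i.e., a double sum:

$$ E[X + Y] = \sum_{x=1}^{\infty} \sum_{y=1}^{\infty} (x + y) \left(\frac{5}{6}\right)^{x-1}\left(\frac{1}{6}\right) \left(\frac{5}{6}\right)^{y-1}\left(\frac{1}{6}\right) = 12 $$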
The latter will just get more and more tedious for larger numbers, so thank goodness for linearity (and for computers, I guess).
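Speaking of computers, a quick simulation makes a nice sanity check on both numbers. This is just a rough sketch (the function names and trial count are my own choices, nothing canonical), but with enough trials it lands close to 14.7 and 12:

```python
import random

def rolls_to_collect_all_sides(num_sides=6):
    """Roll a fair die until every side has appeared at least once."""
    seen = set()
    rolls = 0
    while len(seen) < num_sides:
        seen.add(random.randint(1, num_sides))
        rolls += 1
    return rolls

def rolls_to_see_1_then_2(num_sides=6):
    """Roll until the first 1 shows up, then keep rolling until a 2 shows up."""
    rolls = 0
    for target in (1, 2):
        while random.randint(1, num_sides) != target:
            rolls += 1
        rolls += 1  # count the roll that hit the target
    return rolls

trials = 100_000
print(sum(rolls_to_collect_all_sides() for _ in range(trials)) / trials)  # ~14.7
print(sum(rolls_to_see_1_then_2() for _ in range(trials)) / trials)       # ~12
```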
And thank extra goodness for the surprising fact that events need not be independent for the linearity property to hold!
Due to this generality, you can use it to solve this classic problem involving dependent events:
A bunch of folks leave their hats at a hat-check. The hat-check-booth guy is distracted by drink and forgets to tag the hats. He notices his mistake when folks start handing in their tickets at closing time. He lets chance sort it out, hands hats back at random, hopes folks are as drunk as he is. They are and nobody notices.
Given n people and n hats, what’s the expected number of people who got their own hat back?
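The punchline: give each person an indicator variable (call it X_i) that’s 1 if person i gets their own hat back and 0 otherwise. Since the hats go back at random, each E[X_i] is 1/n. The indicators are thoroughly dependent on one another, but linearity doesn’t care:

$$ E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] = n \cdot \frac{1}{n} = 1 $$

On average, exactly one person walks out with their own hat, no matter how many hats were checked.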
Here’s a nice explanation from John Tsitsiklis (whose probability textbook I also like a lot):
There’s also a cool variation that asks the probability of nobody getting their own hat back. If you’ve got some basic combinatorics under your hat, this video by Ari Novick provides a lovely explanation:
Finally, I’ll soon publish a post looking at the median number of rolls for seeing your first 6, as well as the much harder question of the median number of rolls for seeing all six sides. The latter poses some frustrating, though also instructive (and maybe kinda fascinating), difficulties. Along the way, I’ll explain/derive the geometric distribution’s cumulative distribution function and median, among other things, mostly for the sake of intuition-training.
Update. That post is here: “Dice Probability: Median Rolls to See All 6 Sides (Coupon Collector’s Problem).”
