*Estimated read time (minus contemplative pauses): 7 min.*

In April, I posted an exploration of a formula for approximating the probability of having collected all the faces of a die within a given number of rolls: “Dice Probability: Median Rolls to 6 Sides (Coupon Collector’s) with Stirling Numbers of the Second Kind.” The formula is:

Where *n* is the number of faces of the die, *m* is the number of rolls, and S_{2}(*m*,*n*) is a Stirling number of the second kind, which can be calculated here. In that post, I think through how to derive that formula and link some resources for learning more about Stirling numbers.

I’ll use a similar method here, but without as much explanation, so you might want to visit that post if anything here doesn’t make sense. (If that doesn’t help, ask for clarifications in the comments.)

Things brings us to the point of the present post. A reader has emailed me essentially asking about the difference between the above formula and a formula I’ll call the *L-formula*:

Where *n* is the total number faces (or coupons or prizes or whatever) available, *m* is the number trials, and L is the number of faces (or whatever) one wishes to collect.

(Thanks for sending it in!)

One difference between the L-formula and the one I explored in the last post is that the L-formula is more general by allowing for *n* and L to differ. I didn’t consider this possibility in my previous post, but the work we’re about to do here could also be done there in order to similarly extend that formula.

The more important difference is that the previous post’s formula deals with the probability of collecting *n* objects WITHIN *m* trials, while with the L-formula collects L (out of *n*) objects in EXACTLY *m* trials.

In other words, if we want to collect all six faces WITHIN 13 throws, this means collecting them in six, seven, eight, nine, … or 13 throws. The probability for that, again, is found with:

But the L-formula says that we will, on our 12th throw, have collected five faces, and on the 13th throw will collect the sixth and final face. I already calculated this in a yet earlier post using a Markov chain approach, which yielded ~0.076.

[That yet earlier post: “Dice Probability: Median Rolls to See All 6 Sides (Coupon Collector’s Problem).”]

Let’s verify that result with the L-formula, with *n* = 6; *m* = 13; and L = 6:

Perfect! No doubt, were we to use the L-formula to calculate the probability of getting the sixth face on the sixth throw, PLUS getting it on the seventh throw, PLUS getting it on the eighth throw, all the way up to getting it on the 13th throw, it would all add up to ~0.514. In short, this we’d apply the L-formula eight times and adding up the results.

But back to the calculation we just did. Notice that it contains S_{2}(12,5). This does indeed bear out the aforementioned observation that we need to separate out the first 12 throws to distribute them among the first five collected faces. Then we can take that probability, which by the way, is ~0.456, and multiply that by the probability of getting the final, missing face on the next throw, which is 1/6, to get, again: ~0.076.

Rather than thinking this particular example through any further, however, I’d like to consider an example in which we aren’t collecting all available faces, so that *n* and L are different (otherwise that just always makes the third term in the L-formula come out to 1, which is not illuminating).

How about: What is the probability of collecting three out of the six faces in exactly eight rolls of the die? This would make the third term but: 1/(6 – 3)! = 1/6.

I’ll think this through in the same way I thought through the formula in the previous post, with the hope of making it all intuitive. The final output should be the same thing we get when we use the formula directly, which is, for *n* = 6; *m* = 8; and L = 3:

Here goes. Again, I’ll use the same conceptual approach I used in my previous post, but streamlined.

First, let’s figure out the number of ways to succeed. An example of a successful sequence would be to roll the following faces: 1, 2, 1, 1, 2, 2, 2, 3. Here we’d collected two faces (i.e., faces 1 and 2) by the seventh roll, and on the eighth roll we collected the third face (a 3). Another successful sequences is: 4, 5, 5, 4, 5, 5, 5, 2. How many such sequences are there?

Imagine that we have six boxes, numbered one through six. These numbers correspond to the faces of the die. The idea is that every time you roll a die, that roll goes into one of the boxes. We want to have collected exactly two faces in the first seven throws, then a new face on the eighth throw. In other words, we’ll distribute the seven throws between two boxes (no box can be left empty).

The eighth throw is fixed, so I will first focus only on the first two boxes.

Here’s an example of a successful distribution, or *partitioning* (to use the technical term), of seven throws into two boxes (which matches the first example given a moment ago of a successful sequence):

**Box1**[roll1, roll3, roll4]; **Box2**[roll2, roll5, roll6, roll7].

Another example:

**Box1**[roll1, roll3, roll4, roll5, roll6, roll7]; **Box2**[roll2].

How many such partitions are there? The number of ways to partition seven rolls into two boxes is S_{2}(7,2) = 63.

But, of course, this isn’t the total number of ways things can go here, because the order in which faces are collected matters (when order matters, we actually call it a *composition*). In other words, for that first example, we can switch the box labels to get a new sequence of rolls:

**Box2**[roll1, roll3, roll4]; **Box1**[roll2, roll5, roll6, roll7].

Switching the labels like this turns 1, 2, 1, 1, 2, 2, 2 into 2, 1, 2, 2, 1, 1, 1.

And recall that the box numbers correspond to face numbers on the die. So our scenario can be any two of the six boxes, such as:

**Box3**[roll1, roll3, roll4]; **Box5**[roll2, roll5, roll6, roll7].

This changes the sequence to 3, 5, 3, 3, 5, 5, 5.

There are six ways to label the first box and then five ways to label the second box, giving 6 × 5 = 30 possibilities. But I’m going to get there in a generalizable way, where we are permuting two out of six objects: 6!/(6–2)! = 6!/4! = 30.

[This is just the common formula taught for *n*P*r* = *n*!/(*n *– *r*)!. But there’s no need to memorize this, as it’s intuitive that when you permute six objects as 6!, you can isolate the first two objects by dividing by 4!. Or you can think of it as attempting to permute two objects, but over-counting by a factor of 4!, which needs to be divided out. In other words, you get: (6 × 5 × 4 × 3 × 2 × 1)/(4 × 3 × 2 × 1), which cancels the 4 × 3 × 2 × 1 in the numerator, leaving just 6 × 5.]

So far, we have counted (6!/4!) × S_{2}(7,2) = 30 × 63 = 1890 ways to succeed in having collected two faces in seven throws. We now need to deal with the eighth throw, which goes into the third box. That box can be labeled four ways (i.e., can be any one of the remaining four faces). So we update:

(6!/4!) × S_{2}(7,2) × 4 = 30 × 63 × 4 = 7560.

So, the total number of successful outcomes is our 7560. We can use that as our numerator, which we can set in ratio to the total number of possible outcomes (i.e., all successful PLUS all unsuccessful outcomes), because every outcome of eight throws has the same probability of occurring. (We’ve assumed here a fair die, or a “fair game” in general, e.g., for collecting coupons.) The total number of possible outcomes will be our denominator.

The denominator, then, is just 6^{8}, because for each throw, there are six ways things can go. For example, we could get this sequence of faces: 1, 2, 5, 2, 6, 1, 3, 2. This fails because it collects five faces when we want only three (with the third face coming up on the last throw).

This gives us 7560/6^{8} ≈ 0.0045. This is exactly the probability we got by plugging directly into the L-formula. We can also explicitly match what we’ve done here to the parts of the L-formula. Right now we have:

This can be rearranged like this, starting with bringing the 4! to the denominator:

Then separate the fractions and simplify:

That’s where we ended up when plugging into the L-formula we started with, which, again, looked like this:

*Enjoy or find this post useful? Please consider pitching in a dollar or three to help me do a better job of populating this website with worthwhile words and music. Let me know what you'd like to see more of while you're at it. Transaction handled by PayPal.*