*Estimated read time (minus contemplative pauses): 12 min.*

While recently browsing YouTube videos on counterintuitive probability problems involving pairs (e.g., of coins, dice, frogs, siblings), I was struck by a recurring theme. A great many commenters on those videos claim—incorrectly—that the videos get sample-spaces wrong by double-counting mixed sets. For instance, by including Heads-Tails and Tails-Heads as distinct outcomes when flipping two coins. (See the end of this post for links to some of those videos.)

What struck me is not that so many commenters are giving incorrect advice. History shows us just how wrong smart people can go with probability—18th-century mathematician Jean Le Rond d’Alembert, for instance, apparently made this very same error himself. I’d bet that anyone who’s spent a lot of time with probability has felt their intuitions twisted inside out and in a knot plenty of times (that’s a big part of the fun!).

What struck me, rather, is the *confidence* of so many of these commenters. Despite their own obvious lack of practice with formal probability, they unwaveringly assert that experts are getting very wrong something that, in our current era (unlike during d’Alembert’s)^{1}, is utterly basic. Many such commenters (let’s call them “skeptics”) simply state matter-of-factly that, for example, Boy-Girl and Girl-Boy shouldn’t both be counted. Others give some interesting reason or another. For example, one argues on the grounds of commutativity, analogizing with multiplication, as in: 2 × 3 = 3 × 2.

Notice above my reference to a lack of “formal” probability training. On the other hand, most thinking creatures, such as ourselves, do seem to develop what we might call an *informal* probability sense while going through life assessing the likelihoods of all sorts of things occurring. Such a sense tends to be fragile under the scrutiny of clever behavioral economists, but generally serves its owner well enough over the years, at least as far as that owner can tell. Indeed, I’m sure some of these skeptics are very smart, and so all the more confident about what their naive intuition—or what we might call an analytic acuity yet untuned to the probability domain—tells them about sample spaces.

Maybe that acuity has helped them dominate hard problems in other domains, or at least to stand out against their similarly untrained peers. Whatever the case, the net result seems to be that they simply don’t realize how much they don’t know about probability, even when it comes to something as simple as two coins. (Simple-seeming questions involving even just one coin or die can be quite tricky! See, for example, my post “Counterintuitive Dice Probability: How many rolls expected to get a 6, given only even outcomes?”… the answer is 1.5 rolls, on average, for a single die. As I’ll show below, the present question is easier.)

Perhaps you’re a skeptic about mixed results belonging in the sample space. If so, I’m sure that after thoroughly analyzing the problem at hand—fully engaging your System 2, to borrow Daniel Kahneman’s metaphor—you’ll revise your opinion. One of formal probability’s key roles is to correct our informal probability sense, which tends to be reinforced by seemingly passable results in life overall, by folk probability, and perhaps by what comes naturally to human cognition—in sum, by common sense. For today’s case, only a small correction is needed.

As a point of contrast, while the skeptic’s confidence here strikes me as similar to that with which many folks reject other basic ideas in math (such as the uncontroversial statement that .9-repeating equals 1), I see the mixed-results rejection as particularly surprising. For example, .9-repeating’s value is a theoretical (or a priori etc.) question best answered with pure math. Well, I could cut a 1-foot piece of fabric into three equal parts and try to convince a skeptic that (.3-repeating of a foot) plus (.3-repeating of a foot) plus (.3-repeating of a foot) equals (.9-repeating of a foot) equals (1 foot), but there’s no fooling them: this still comes down to an appeal to math.^{2}

On the other hand, closely investigating some coins, or better yet chucking them in the air a bunch of times and counting the results, provides empirical evidence involving nothing more than counting with, and comparing, whole numbers. In other words, I’m surprised that skeptics would so vehemently reject something so easily demonstrable! Indeed, I hope a persuasive investigation would come easily without having to toss any actual coins (though I encourage you to do so if you’re a skeptic).

Holding onto that hope, I’ll now attempt to persuade the skeptic. (I also touch briefly on this topic in my post on the Two-Child Problem, which is usually the context in which sample spaces for sibling pairs come up.) Note that my principal motivator here is my fascination with, and wish to understand, the skeptic’s confidence. This may be especially apparent at moments that might otherwise come across as overcomplicating things. I’m sure there’s some interesting, instructive reason why smart people nearly always get this simple thing wrong at first, and often with confidence.

Before diving in, a note to the skeptic. If you remain unconvinced (or turn agnostic) after reading this, I’d love to know why! And please stick to your guns—declare a change in belief only if you mean it.

Here goes, starting with (I hope) a clear statement of the problem.

The intuition I hope to correct is that which leads the skeptic to claim, for example, that it’s wrong to count both: BG and GB in a sample space of boy-girl sibling pairs; TH and HT in a sample space involving two tossed coins; or 1,2 and 2,1 when rolling two dice. The skeptical reasoning is that each of such cases should be counted only once, as those cases—e.g., TH and HT—constitute identical outcomes. Further, this claim implies that, for example, the probability of getting a mixed Heads-Tails result is identical to getting double-Heads, so that the sample space *should* be: {HH, HT, TT}, with each outcome assigned a 1/3 probability (a mathematical result of which I’d assume even most skeptics to be suspicious).

It’s not hard to produce intuitive reasons for rejecting the skeptical claim. For instance, just imagine tossing a dime and a quarter, or two different colored dice. In the former case, you might notice that there is only one way to get two Heads (the dime shows Heads and the quarter shows Heads), but two ways to get a mixed result of Heads and Tails (i.e., the dime shows Heads while the quarter shows Tails, OR the dime shows Tails while the quarter shows Heads).
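The dime-and-quarter counting can be sketched in a few lines of code. This is only an illustrative enumeration, assuming two fair coins that are tracked separately; the names `FACES`, `outcomes`, and `ways` are made up for the example.

```python
from collections import Counter
from itertools import product

FACES = ("H", "T")

# Track each coin separately: every (dime, quarter) pair is one equally likely outcome.
outcomes = list(product(FACES, repeat=2))

# Now forget which coin is which: sorting the faces collapses HT and TH into one label.
ways = Counter("".join(sorted(pair)) for pair in outcomes)

for label, count in sorted(ways.items()):
    print(f"{label}: {count} way(s) out of {len(outcomes)}")
```

The mixed label ends up with two of the four equally likely pairs behind it, while each double gets only one.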

That approach helps eliminate what I think is a potentially misleading aspect of how sample spaces are represented—e.g., for tossing two coins: {TT, TH, HT, HH}. That is, skeptics often point out that order doesn’t matter in these cases, to which I’ve seen many non-skeptics reply that order does matter. The truth, though, is that the skeptic is correct! Order doesn’t matter.

The sample space is represented as it is (i.e., by including both TH and HT) as a shorthand for saying that there are two ways to get a mixed result of Heads and Tails. The sample space, therefore, may look exactly the same whether we flip one coin twice, two coins sequentially, or two coins at *exactly the same time*. Indeed, this will be true even if one coin lands, impossibly, precisely at the same time and within the exact same region of space as the other. We model our sample spaces as simply as we can, with attention given only to the relevant features of the events in question.

At this point, a skeptic might be even more encouraged than usual to bring up a variation on a complaint mentioned above that goes: “If you count 1,2 and 2,1 when rolling two dice, then you also have to count 1,1 twice.” The variation is: “If there are two ways to get mixed Heads-Tails results with a dime and quarter, then there are also two ways to get Heads-Heads results: {Dime-Heads, Quarter-Heads} and {Quarter-Heads, Dime-Heads}; and there are in fact now *four* ways to get mixed results, so that you should, for example, count both {Dime-Heads, Quarter-Tails} and {Quarter-Tails, Dime-Heads}.” It seems to me that the intuition guiding this complaint (assuming it sincerely survives this strange reasoning) turns, again, on how sample space outcomes are *listed*. In other words: “If you’re going to list ‘Tails’ both on the left and right of ‘Heads,’ while calling them two different results, then you should do the same with double-Heads outcomes.”

Of course, in the case of double-Heads, as well as with the newly proposed dime-and-quarter mixed-results, differently ordered pairs (e.g., {Dime-Heads, Quarter-Heads} and {Quarter-Heads, Dime-Heads}) in fact do represent exactly the same events. I emphasize again the fact that a given representation is designed to show only the probabilistically relevant features of a given event, in a given context. The sample space under discussion here is *only* concerned with Heads and Tails, not coin denomination or at what time a coin was tossed. (I use denomination here, and will use time below, only to help train the skeptic’s intuition.) This in mind, the four presently relevant outcomes with a dime and quarter are:

- Both coins land Heads;
- The dime lands Heads, the quarter lands Tails;
- The dime lands Tails, the quarter lands Heads;
- Both coins land Tails.

Each outcome has a 1/4 chance of occurring, whether you flip the coins one at a time or at the exact same instant. So, order of tossing, and certainly order of listing, doesn’t matter. What does matter is how many ways there are to get each of the above outcomes, given our pre-established context. Clearly there are twice as many ways to get a Heads-Tails mix as, say, a Heads-Heads result. (Note that we could have pre-established a context in which denomination *does* matter, but there would still be a 1/4 probability for each of the four outcomes; you just wouldn’t be able to add outcomes 2 and 3 together to get 1/2.) Try it for yourself a bunch of times with real coins, writing down the results *only* in terms of Heads and Tails. I’d expect you to get *about* twice as many mixed results as double-Heads results (and likewise for double-Tails), even if you list all mixed results as “HT.” In fact, I’ll do this right now 100 times using a dime and a quarter:

- TT
- HT
- HH
- HT
- TT
- HH
- HT
- HT
- HH
- HT
- HH
- HT
- HT
- TT
- HT
- TT
- TT
- TT
- TT
- HT
- HT
- HT
- HT
- HT
- TT
- TT
- HT
- HT
- TT
- HT
- TT
- HT
- HT
- HT
- HT
- HT
- HT
- HH
- TT
- HT
- HH
- TT
- HT
- HH
- HH
- HH
- TT
- HH
- HH
- HT
- TT
- TT
- TT
- HT
- TT
- HT
- TT
- HT
- HT
- TT
- TT
- HH
- HT
- HT
- HT
- HT
- TT
- HT
- TT
- HT
- TT
- HH
- HH
- HH
- HT
- HH
- HT
- HT
- HH
- HT
- TT
- HT
- HT
- HT
- HT
- TT
- TT
- TT
- HT
- HT
- HT
- HT
- TT
- HH
- HH
- HT
- HT
- HT
- HT
- HT

**Results:**

HH = 19

HT = 52

TT = 29

I didn’t track denominations, and I represented cases of {Dime-Heads, Quarter-Tails} and {Dime-Tails, Quarter-Heads} simply as “HT,” though I could have developed some arbitrary system for, say, listing the former as “HT” and the latter as “TH,” the sum of which would have been 52. Our representations of sample spaces often reflect just such an arbitrary organizing principle.
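For readers who would rather not flip by hand, the same tally can be approximated in code. This is a minimal sketch, assuming Python’s pseudo-random generator is an acceptable stand-in for physical coins; the function name `flip_pairs` and the trial count are illustrative choices, not part of the hand experiment above.

```python
import random
from collections import Counter

def flip_pairs(n_trials, seed=None):
    """Flip a simulated dime and quarter n_trials times, recording faces only."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_trials):
        dime = rng.choice("HT")
        quarter = rng.choice("HT")
        # Denomination is discarded: both mixed cases are written down as "HT".
        tally["".join(sorted(dime + quarter))] += 1
    return tally

tally = flip_pairs(100_000, seed=0)
total = sum(tally.values())
for label in ("HH", "HT", "TT"):
    print(f"{label}: {tally[label] / total:.3f}")
```

If the skeptic’s {HH, HT, TT} model were right, each share would settle near one third. With fair, independent flips, the mixed share should instead settle near one half.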

The same goes for the birth sex of sibling pairs, at least in the world of probability puzzles, where the female:male sex ratio is assumed to be 50:50. In other words, the skeptic is essentially correct even when saying that *birth order* doesn’t matter! We use birth order as an organizing marker for counting because it makes intuitive sense and is easy to model. But, again, even if the children were born at the exact same instant (say pulled out simultaneously in a cesarean, or if the mother’s body was teleported by a Star Trek-style transporter while leaving twins behind), the sample space would still be {GG, GB, BG, BB}. We could justify this space with a more complicated story, analogous to the dime and quarter example, but instead we rely on the simpler idea of birth order.

This in mind, I’ll give an even more obvious example in which a quarter is flipped twice. I consider this the most useful approach for correcting the skeptic’s intuition. (I don’t consider flipping coins 100 times to quite do the trick, as I could have gotten lucky; it’s still better than a computer simulation, however, which some reject on the grounds that either the simulation must have been programmed wrong, or that the pseudo-randomness programmed by humans was constructed in accordance with theoretical rules.)

In this example, actual flips are indeed distinguished temporally. So, the order of the results emphasizes the higher expected frequency of mixed results. This is just a convenience I’m relying on in order to get your intuition headed in the right direction (in case you’re not already convinced); the sample space is the same as in any other two-fair-coin-flips scenario. Here are the possible outcomes:

- You flip and get Heads, you flip again and get Heads;
- You flip and get Heads, you flip again and get Tails;
- You flip and get Tails, you flip again and get Heads;
- You flip and get Tails, you flip again and get Tails.

Again, twice as many opportunities for mixed Heads-Tails results as for either double-Heads or double-Tails (given that each of those four outcomes is equally likely). This should be obvious. I could demonstrate it with other sorts of diagrams (e.g., a tree), but I think this should do it.
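For anyone who does want the tree, here is one rough way to print it. It is a sketch only, assuming a fair coin and independent flips; `HALF`, `mixed`, and `path_probability` are names invented for the illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # assumption: a fair coin
mixed = Fraction(0)

for first in "HT":
    print(f"first flip: {first} ({HALF})")
    for second in "HT":
        # Assumption: the flips are independent, so branch probabilities multiply.
        path_probability = HALF * HALF
        print(f"    second flip: {second} -> {first}{second} ({path_probability})")
        if first != second:
            mixed += path_probability

print(f"P(mixed) = {mixed}")
```

Two of the four branches end in a mixed result, so their probabilities add to 1/2, while each double sits at the end of a single 1/4 branch.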

Finally, we’ve so far been discussing cases in which pairs of objects share identical (relevant) outcome possibilities. Notice what happens, however, when their possible outcomes are different. For example, if you flip a coin while choosing a child at random (with gender as the relevant outcome), the sample space is: {GH, GT, BH, BT}, where each pair has a 25% probability. Notice that there’s no need to list, for example, “TG”: GT and TG are indeed identical outcomes, since there’s no chance of a child’s gender coming up Tails or of a coin landing Girl. Likewise for flipping a coin and a die: {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}, where each outcome has a 1/12 probability.
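The contrast between the two kinds of pairs can be checked with a short enumeration. This is a sketch under the assumption of a fair coin and a fair six-sided die; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

COIN = ("H", "T")
DIE = (1, 2, 3, 4, 5, 6)

# One coin and one die: neither object can produce the other's results.
coin_die = [f"{c}{d}" for c, d in product(COIN, DIE)]
print(coin_die, Fraction(1, len(coin_die)))

# Collapsing order changes nothing here, since "H1" and "1H" describe the same event.
unordered_coin_die = {frozenset(pair) for pair in product(COIN, DIE)}

# Two coins share their outcomes, so collapsing order merges HT with TH.
unordered_two_coins = {tuple(sorted(pair)) for pair in product(COIN, repeat=2)}

print(len(unordered_coin_die), len(unordered_two_coins))
```

Ignoring order leaves all 12 coin-and-die outcomes intact, but shrinks the four two-coin outcomes to three labels, one of which (the mixed one) stands for two equally likely cases.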

If you’re still skeptical, I suggest you sit with some coins or dice for a while and track results, paying attention to the possibilities each coin’s physicality seems to grant (I don’t recommend having 2 children 100 times, but you could conduct a survey of two-child families). With two dice, you’ll notice 7 coming up as the sum of each pairing more often than other numbers, especially 2 or 12. Why? Because there’s only one way to get a sum of 2 (snake eyes) or of 12 (double-6). But there are six ways to make a 7. What are they?
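If you’d like to check your count before reaching for the dice, the number of ways to make each sum can be tallied as below. This is a sketch that assumes two fair, distinguishable dice (say, one red and one green); it prints only the counts, so the list of pairs that make 7 is still yours to work out.

```python
from collections import Counter
from itertools import product

# Treat the dice as distinguishable: 36 equally likely (red, green) pairs.
rolls = list(product(range(1, 7), repeat=2))
ways_per_sum = Counter(red + green for red, green in rolls)

for total in range(2, 13):
    ways = ways_per_sum[total]
    print(f"sum {total:2d}: {ways} of {len(rolls)} {'#' * ways}")
```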

—–

Some of the aforementioned YouTube links (steer clear o’ the rabbit-hole… in pretty much any video with a probability problem I’ve seen, *most* of the commenters are skeptics giving bad advice; even when they are correct about a given solution being wrong, they tend to get there for wrong reasons, or at least reasons they clearly can’t justify):

Counter-Intuitive Probability: The Snake Eyes Riddle

Can you solve the frog riddle? – Derek Abbott (I think I noticed fewer sample-space complaints here, because the skeptics were more interested in correcting other things in the video; nearly always incorrectly, I’m afraid.)

Video Response to “Ted Ed’s Frog Riddle is Wrong” (This one I share as a follow-up to the above video. You won’t see many gripes about sample-space order, but you’ll see plenty of other confident gripes.)


#### Footnotes:

- Though Galileo got it right circa 1620, casually knocking it out in response to a question posed to him, I believe, by the Grand Duke of Tuscany, about why some numbers come up more than others in a game, *passadieci*, involving three dice: “Sopra le Scoperte dei Dadi.”
- I’ve tried this sort of thing and it doesn’t help. In fact it sometimes makes things worse! It’s often still a useful exercise, however, as it asks us to compare our intuitions about the world with our intuitions about math, a domain in which you can have things like .3-repeating of a foot (i.e., 4 inches!) of paper, or be square-root-of-2 meters tall, or in which all flat triangles have interior angles that sum to 180 degrees, even though you can’t draw one no matter how hard you try, because your best efforts will turn out to have zig-zagging crevices grossly visible even to a good squint: What constitutes a triangle? The ink and paper molecules? Or the math? The math, of course. And so the triangle’s a conceptual, or abstract, rather than physical object.