Part II of my survey of Chapter 2 of Nick Bostrom’s 2002 book, Anthropic Bias: Observation Selection Effects in Science and Philosophy. Part I is here. This is part of a series in which I go chapter by chapter through the book, zeroing in on key ideas that grab my attention. Honestly, though, it’s really Part II of my post, “Nassim Taleb’s Fat Tony Example / And: Is it possible to flip 100 Heads in a row?“
I pick up here where I left off in Part I, at the section called “Surprising vs. Unsurprising Improbable Events” (p 23). You can follow the bulk of this post without having read Part I, especially given that my commenting on this chapter is really just an excuse for me to vent my ongoing puzzlement regarding the distinction between surprising and unsurprising improbable events. In particular, I obsess here over the question of whether it’s literally, physically possible to get 100 heads in a row from a fair coin. I’ve grappled with this question before, in the above-mentioned post, “Nassim Taleb’s Fat Tony Example / And: Is it possible to flip 100 Heads in a row?”
Bostrom starts this section with a puzzle. Bayes’ theorem tells us that if (A) the probability of multiverse theory being true increases given the existence of our (fine-tuned, presumably rare, life-containing) universe, then (B) the probability of our (fine-tuned, presumably rare, life-containing) universe’s existence increases given multiverse theory being true. This seems fine, but suppose the universe of interest contains only chaotic light rays, rather than life. Call that universe E*. It would seem wrong to say that the probability of multiverse theory being true increases given E*. Yet, it does seem fine to say that the probability of E* existing increases given multiverse theory being true. As Bostrom puts it:
This presents the anthropic theorizer with a puzzle. Somehow, the “life-containingness” of α1 must be given a role to play in the anthropic account. But how can that be done?
Several prominent supporters of the anthropic argument for the multiverse hypothesis have sought to base their case on a distinction between events (or facts) that are surprising and ones that are improbable but not surprising (see e.g., John Leslie (Leslie 19892) and Peter van Inwagen (van Inwagen 19933)). (Bostrom p 23)
Bostrom then motivates the intuition for distinguishing surprising and unsurprising improbable outcomes. Surprising: flipping a fair coin and getting 100 heads in a row; one person winning three lotteries, each of which consists of a thousand tickets. Unsurprising: flipping a coin 100 times and getting some un-patterned result (even though the “random-appearing,” un-patterned result has the same probability as getting 100 heads in a row); someone winning a lottery which consists of a billion tickets (i.e., unsurprising because it was guaranteed that someone would win).
These examples are meant to be analogous to the distinction between the fine-tuned universe (surprising) and the one filled with only electromagnetic radiation (unsurprising). The key difference between the two is the fact that surprise (which Bostrom will soon point out is a psychological concern) moves us to seek an explanation—e.g., “the lottery was rigged.”
It’s common for people to dismiss surprise in these cases as a failure to submit one’s intuitions to the greater and more rational truths unveiled by probability theory. Bostrom cites, for example, Stephen Jay Gould, who points out that something’s gotta happen, and whatever that is—no matter what it is—will amount to an improbable outcome. On the other hand, Bostrom also cites Peter van Inwagen, who asserts that this “must be one of the most annoyingly obtuse arguments in the history of philosophy” (Bostrom p 25)4
I’d like to pause and explore the distinction at issue, as I find it to be one of the most fascinating and perplexing features [sic] of probability. What follows are my own thoughts unless otherwise noted. In particular, I explore my intuitive doubt towards the possibility of getting 100 heads in a row from a fair coin. I admit that the intuitions motivating that doubt come from a naiveté that’s in tension with what a reasonable understanding of probability recommends. But those intuitions are strong, and I’m happy to explore, indulge, defend, and interrogate them publicly. Maybe someone can then correct my naiveté. At the very least, I’ll have disclosed the lens through which I examine Bostrom’s book and will have demonstrated how stubborn the intuition is.
NOTE: I just finished writing the below. It’s a thorny thicket that gets thornier and thicketier the more I wonder and wander within it. I exit only barely less bewildered than before, though I suspect any improvement temporary. It was fun, anyway, if a little maddening in its repetitiveness (surely it can use much trimming, which maybe I’ll return to do, but its repetitiveness is a natural consequence of my competing intuitions—one naive and the other, uh, less naive—as they take turns reminding each other of who’s really in charge of casting my beliefs).
(1) Gould is of course right that, given a sequence of 100 coin flips, something has to happen, and whatever happens will be as improbable as anything else that happens. In other words: good luck predicting the outcome ahead of time. But I’m not convinced this means we shouldn’t be—or could avoid being—surprised at seeing a fair coin yield 100 heads in a row rather than some random-appearing sequence. There are many, many, many more random-appearing (i.e., to the human eye) sequences than patterned ones.
We don’t need to go as far as 100 flips. The above van Inwagen quote directly follows an example he gives involving only 20 coin flips, which has only a probability of 1/1048576 of occurring. But if 100 heads doesn’t sound farfetched to you, imagine I’m talking here of 100 trillion heads or as many as it takes for you to bet anything and everything that the coin’s not fair—even though any random-appearing sequence of heads and tails will have the same probability of occurring as getting all heads.
(2) Surprise in such cases results, to some large degree, from our tendency to segment the sample space into seemingly non-random and seemingly random classifications. The latter has far more members. So many more, that one could flip a coin 100 times for years and years without ever seeing a seemingly non-random outcome. So of course to suddenly see one would be surprising enough to stoke our innate desire for coherent stories—to urge questions like, “What does it mean?” and “Who or what had a hand in this?”; in short, to seek explanation. If it occurs in nature, and we’re convinced the coin is fair, getting such a pattern would seem a miracle.
Of course, what counts as a pattern is arguably a function of the limitations of human imagination. If randomly drawn alphabet letters sounded out a complicated and profound phrase in Klingon, I wouldn’t recognize it. But someone who knows Klingon would see the pattern. And the profounder the phrase, particularly when one is convinced that it did not come from human intervention, the greater urgency there’d likely be for making sense of it as a message from God or suddenly conscious computer or the realm of Sto-vo-kor or from nature (which may also send messages in its own language of physical laws, regularities, cycles, periodicities…).
Of course, I’m not the first to make these sorts of observations. See, for example, Pierre-Simon Laplace’s comments in his 1852 book, Philosophical Essay on Probabilities (p 9), in which he refers to “extraordinary”5 classes in probability—i.e., classes containing only a very small number of events among a larger set of classes in which all possible events have been mentally6 arranged. For example, getting 100 heads in a row or randomly choosing letters that spell out “Constantinople,” which would not be considered extraordinary were that word “not used in any language” (encountering such an arrangement of printer’s letters laid out on a table, he notes, we would conclude the more likely scenario of someone having deliberately put them there).
Richard von Mises, in his 1957 essay “Definition of Probability,” 7 borrows Laplace’s example for a similar discussion, describing a game in which there are 26 cards, each with one letter of the alphabet (he notes that there are 26¹⁴ possible ways to fill in the 14 letters, though it seems to me it’s more like 52¹⁴ if you include uppercase letters—i.e., von Mises and Laplace start “Constantinople” with an uppercase C; to be clear, if you now look for the number of ways to reorder those 14 symbols, you need to divide in order to account for the repeating n’s, o’s, and t’s: 14!/[2!2!3!] ≈ 3.63 × 10⁹).
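To make those counts concrete, here’s a minimal Python sketch (my own illustration, not von Mises’s or Laplace’s):

```python
from math import factorial

# Ways to fill 14 slots from a 26-letter alphabet (repetition allowed)
print(26 ** 14)   # ≈ 6.45 × 10¹⁹

# ...and from 52 symbols, if upper and lower cases are distinguished
print(52 ** 14)   # ≈ 1.06 × 10²⁴

# Distinct rearrangements of the 14 letters of "Constantinople",
# dividing out the repeated o's, t's, and n's (2, 2, and 3 of them)
print(factorial(14) // (factorial(2) * factorial(2) * factorial(3)))  # 3,632,428,800
```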
I won’t go into what those authors conclude, but will say that they seem more optimistic than I am about there being solid theoretical and practical solutions to such conundrums.
(3) If our universe is fine-tuned, which may be another way of saying our universe is meaningful to minds more or less like ours in a way that E* couldn’t be, then the thoughts I explore here will apply to that case as well. But I will focus now on the harder case of coin flips. Harder, for one thing, because I know our universe exists, but have not seen anything close to 100 coin flips. Also harder, because I actually know with impressive precision the probability of a real-life fair coin flip, and yet I still cannot make intuitive sense of surprisingly improbable results involving coin flips (though such results are theoretically manageable).
Bluntly put, I believe, like it or not, that there is some physical limitation on the number of heads or tails that can be gotten in a row from a fair coin.
Assume from here on that all coins are fair unless otherwise indicated.
(4) I might be able to intuitively accept 100 heads in a row on conditional grounds: if a coin landed 99 heads in a row, there’d be a 50% chance the next flip lands heads. I have reservations about this, even as a theoretical construct; not only is this a very big “if,” but in my more skeptical moments, it feels something like, “if a square-circle existed, then the ratio of its diagonal to its diameter would be…” But it seems to me that something along these conditional lines is headed in the right direction for finding an intuitively satisfying resolution. Even if impossible, what’s the harm of allowing for 100 flips in a row just as we can, for example, find the theoretical difference in height between someone negative-two-inches tall and someone 42,000-feet tall? We won’t encounter such people, and it doesn’t hurt the model to leave boundary conditions for height unspecified.
That said, I do often encounter the idea that accepting the possibility of 100 heads in a row is not just a theoretical convenience, or even requirement, for our math to work out. Understandably so. It certainly would be bizarre to learn from a probability textbook that there are physical constraints on which of 100 coin flips’ 2¹⁰⁰ permutations are actually possible (especially if the textbook recommended removing those from the space of possible outcomes, and thus from our denominator!). Rather, we should and do take each of those permutations to be just as likely as any other—just as physically possible, you could say.
I include myself in that “we”—meaning I’m in a perpetual state of cognitive dissonance about this. Maybe it means I have a healthy skepticism about math’s relation to the real world.
There are probability models that cross over into something like the conditional territory I describe above, wherein the model allows for physically impossible outcomes for the sake of computability. To calculate how many flips of a coin to expect before getting your first heads, there’s no upper bound on failures in the model, as you could get infinitely many. I doubt many people think infinitely many tails could ever happen. And not just because the universe will end before the game is done, but because that is strictly a mathematical construct, without which we’d need to designate a finite upper bound, and that is not something we can do; nor do we need to worry about it (any more than we need to worry about putting a restriction on human height).
Here’s a stronger claim. I doubt anyone thinks 10,000 tails will happen*. Or 9,999. Or 9,998. I don’t know, however, where to stop this descent towards 1.
Where the real upper bound on consecutive results is (if there is one), then, maybe isn’t a question for math, but for our intuitions about how math maps onto the world.
(*I include here folks who like to talk about splintering paths of reality in which everything that can happen will happen. That is, even if I believed in splintering realities [I don’t], I wouldn’t accept the event of 10,000 failures in a row in this game, anymore than I’d accept that there is, as I type this, about to splinter a reality in which my head morphs into a spherical cube.)
(5) “Fairness” need not be about the coin per se. Rather, it characterizes the nature of a certain condition in which the coin finds itself—a condition amounting to a certain kind of treatment by, and interaction with, the world around it. We call that treatment and interaction “fair.” If we flip a coin under sufficiently similar conditions, we’ll get the same result repeatedly. For example, see the paper “Dynamical Bias in the Coin Toss,” by Persi Diaconis, Susan Holmes, and Richard Montgomery8 for a description of a machine that can consistently yield the same flip result.
We may think of this as a nulling of the coin’s fairness or as an exploitation of it. Either way, it comes down to a behavior consistent with the symmetrical properties of the coin; how that behavior manifests—how the coin’s behavioral tendencies appear to observers like us—will depend on the context given by the coin’s external conditions. When we think of the flip as random, we aren’t only thinking of the unpredictability or variability of something within the coin, but of the totality of the conditions involved in the coin’s movements. In total, this involves a vast set of tiny events we call, “flipping the coin.”
(6) What I wrote above regarding “a healthy skepticism about math” reminds me of some comments from Robert Gallager in his wonderful introductory lecture to his Spring 2011 MIT 6.262 Discrete Stochastic Processes course (that link is to YouTube; quotes are presented here in order of relevance):
If you really want to be serious, as far as your study of mathematical probability theory, you really have to take a course in measure theory at some point. But you don’t have to do it now. In fact, I would almost urge most of you not to do it now. Because once you get all the way into measure theory, you’re so far into measure theory that you can’t come back and think about real problems anymore. You’re suddenly stuck in the world of mathematics, which happens to lots of people. So anyway, some of you should learn about all this mathematics. Some of you shouldn’t. Some of you should learn about it later. So you can do whatever you want … (starts @ 39:02)
… I’ve gone for 50 minutes and nobody has asked a question yet. Who has a question? Who thinks that all of this is nonsense? How many of you? I do. OK, I’ll come back in another 10 minutes. And if nobody has a question by then, I’m just going to stop and wait. OK, so anyway. If you look at a union of events … (starts @ 40:05)
… Students are given a well-specified model, and they calculate various things. This is in mathematical probability. Heads and tails are equiprobable in that system. Subsequent tosses are independent. Here’s a little bit of cynicism. I apologize for insulting you people with it. I apologize to any faculty member who later reads this. And I particularly apologize to businessmen and government people who might read it. Students compute, professors write papers, business and government leaders obtain questionable models and data on which they can blame failures. Most cynical towards business leaders because business leaders often hire consultants. Not so much to learn what to do, but so they have excuses when what they do doesn’t work out right. When I say the students compute, what I mean is this in almost all the courses you’ve taken up until now—and in this course also—what you’re going to be doing is solving well-posed problems. You solve well-posed exercises because that’s a good way to understand what the mathematics of the subject is about. Don’t think that that’s the only part of it. (starts @ 17:30)
(7) Events so rare. Zoom out and imagine a spectrum of highly improbable events running from mundanely unsurprising (even though they might be too complex to calculate!) to seemingly (to me, at least) physically impossible (despite often being trivial to calculate and far more likely than mundanely unsurprising events).
(8) Stepping on That Pebble. A mundanely unsurprising event.
You’re walking along a sidewalk and step on a tiny pebble. Who knows where its particles have been. But some unfathomably complex course of events has resulted in their being where they are now and in you—a still more unfathomably complex arrangement of particles and events—stepping, when your foot meets the pebble, in precisely the region of space you do with precisely the force you do while thinking precisely the thoughts you’re thinking and smelling the smells you’re smelling and wearing the shoe-wise-arranged particles you’re wearing and on and on.
Imagine taking stock of all the universe’s stuff at some point just 10 million years ago (much less 13.8 billion) and noting that some subset of that stuff would, at some precise instant, come together as it did just now, when the massive and intricate complex of events called “you stepped on that pebble” occurred. The probability of such a prediction being correct must be so minuscule as to be practically, maybe even theoretically and thus literally, incalculable.
There’s nothing surprising about an event of this sort—of the sort that is also occurring with every click of my keyboard as I type this and with every blink of my eyes and every speck of dust and yawn and with nearly everything anyone ever does.
(9) Coin Toss Given Everything That’s Ever Happened. Another mundanely unsurprising event.
While cocking your wrist to chuck a coin, you ask your friend, “heads or tails?” What’s implied, though, is “heads or tails, given the Big Bang happening as it did and the galaxies and in particular our galaxy forming as it did and our planet forming as it did and life evolving here as it did and not to mention all the species going extinct as they did and humans crawling out of the primordial goo as they did and fighting and loving and generally doing as they did so that history unfolded as it did not to mention that the butterflies flapped their wings as they did and the weather did as it did and humans responded to it as they did and then there’s the colonizing and the famine and the wars and the fleeing and migrating and our parents meeting and their genetic material mingling as it did and here we are now after having crossed paths and become friends as we did—given all that, which is a kind of shorthand for talking about unfathomably many tiny events bringing us to this moment in time—given all that which, now that it’s happened, and given that I actually do flip the coin and it actually does land and is not stolen by a passing humming bird and does not spontaneously combust, we might as well call all of that φ, with probability 1, and then ask how the coin will land given φ or in other words φ-and-heads or φ-and-tails where heads and tails are independent events but the coin making it into the air at the moment it does depends on a lot?”
(10) 100 Heads in a Row. A seemingly (to me) impossible, but easily calculable, event.
The probability of flipping 100 heads in a row is 2⁻¹⁰⁰ (in case you’re rusty with negative exponents, that’s equal to (1/2)¹⁰⁰). That’s about a 0.0000…(28 zeros in all)…0008% probability. But there is some number of flips that is supposed to make you 99% confident in its happening. That number doesn’t make me confident. More on that shortly.
(11) Massive Lottery. An unsurprising improbable event, but not mundanely so.
In a lottery wherein 2¹⁰⁰ people are each assigned a number, each person has the same chance of winning, namely 2⁻¹⁰⁰ (exactly the same as the chance of flipping 100 heads in a row). Each of these 1,267,650,600,228,229,401,496,703,205,376 (that’s 1 nonillion 267 octillion 650 septillion 600 sextillion 228 quintillion 229 quadrillion 401 trillion 496 billion 703 million 205 thousand 376) people hopes to win. But the chances are so low that believing you’d lose is rational and justified, maybe even epistemically obligatory. Some number will be drawn and whoever wins will be surprised. More soon on this one, as well.
(12) Well-Ordered Deck of Shuffled Playing Cards. An event so improbable that, should it occur, I’d reverse my disbelief in ghosts and God before believing it happened by chance.
Suppose you thoroughly shuffle a deck of cards, then reveal the cards one-by-one from the top, getting a typical ordering you’d find in a newly opened deck: A♠, 2♠, 3♠, 4♠, … ,4♥, 3♥, 2♥, A♥.
This is no less likely than any other order. But far, far, far, far less likely than some random-appearing order. I’d be willing to bet, at least a small amount, it’s never happened and never will, but maybe not for the same reason I’m skeptical about landing 100 heads in a row. My intuition is that a coin’s behavior is restricted by fairness. With playing cards, I sense no such restriction. All the cards are there, just waiting to be arranged. A repeatedly shuffled deck will eventually come up well-ordered, like every other possible ordering of those cards. But I’d still be astonished to see it. Totally freaked out.
I say this even though the probability of getting 100 heads is much greater than that of getting the well-ordered card ordering. In fact, my intuition is blushing at this thought. Suppose you have 100 coins lined up in a row. Is the card case no different than “shuffling” the coins by flipping each one and setting it back down in the row? Why should there be any restriction at all on any given sequence? My initial sense is that there’s something lost when moving between 52 distinct cards and 100 identical coins. Intuitively, we’re asking 100 identical coins for a token event repeated ad nauseam, but of the cards we ask only for natural permutations. This is a sore confusion I’ll need to think more about.
The first thing I notice is that if I reflect for a moment on certain kinds of examples, the correcting intuition I’m trying to stoke is extinguished—for example, if you mix equal amounts of green and red liquid into a large container of water, you can shake it up all you like, but you’ll never see the molecules arranged so that the red liquid, the green liquid, and the original water perfectly separate. So I’ll avoid that thought—which I think may be a misleading comparison in the present context—and will focus on the playing cards. But first, an aside on why the thought may not be so off the mark.
On the other hand, maybe the comparison isn’t so misleading and there is something in play here like the fairness restriction I intuit with coins, but that, in both the coin and card cases, just boils down to entropy. As I’ve heard Sean Carroll say in a lecture on time and entropy:
If you shuffle cards, they tend to disorganize, they tend to become more random. It is never the case that you shuffle cards and they spontaneously order themselves: A, Q, K, J, 10, and so forth. (The Great Courses. Sean Carroll, Mysteries of Modern Physics: Time [2013]. Lecture 4: “Times Arrow,” at 20 minutes 10 seconds.)
Though what Carroll means here by “never” is not so clear. In the next lecture in that series, he says:
It’s not absolutely impossible for entropy to spontaneously decrease. It’s just really, really unlikely. It’s very, very improbable that it would ever happen in the life-time of the universe from macroscopically sized objects.9 (Lecture 5: “The Second Law of Thermodynamics,” at 7 minutes.)
An example is given in Lecture 12, in reference to a freshly broken egg encountered on a sidewalk:
It’s even possible that the particular microstate of the egg will cause it to leap up and un-break—it will Humpty Dumpty itself back into the form of an unbroken egg. That is not likely, it’s an extremely small number of microstates that would behave that way, but there’s some probability for that to happen even if it’s really, really small. (Lecture 12: “Memory, Causality, and Action,” at 8 minutes 54 seconds)
What he seems to mean by “never,” then, is that the expected time to see such a thing occur is such that there aren’t enough years left in the universe’s lifespan for it to happen. At any rate, this brings up the idea that at some arbitrarily large number of, say, heads in a row, there simply isn’t enough time in the universe to flip a coin enough times to see that result. It seems to me this requires a kind of side-stepping of the Gambler’s Fallacy, as there should be no principled reason to think the coin wouldn’t do something along those lines on the first try—e.g., landing heads indefinitely for as long as we can flip it.
(I’m also reminded of a passage in James Gleick’s 2011 book The Information: A History, A Theory, A Flood, reproduced in this footnote.10)
That said, let’s try to fathom the unfathomable numbers involved in card permutations.
First, imagine three distinct cards: A, 2, 3. You shuffle them. There are three cards, so there are 3! (i.e., “three factorial”) ways to arrange them; that’s just 3 × 2 × 1 = 6. Every permutation has the same probability, so the probability of getting A, 2, 3 is 1/6. We could have calculated this as 1/3! to begin with.
So, for four cards we get 1/4! = 1/(4 × 3 × 2 × 1) = 1/24. For five cards we get 1/120. For six cards we get 1/720. Let’s skip to 15 cards: 1/1,307,674,368,000. That denominator is over 1.3 trillion. For just fifteen objects lined up in a row. (From here on I’ll use scientific notation, which would put the previous denominator at 1.307674368 × 10¹²; this just asks you to move the decimal 12 places to the right.)
Skip to 29 cards: 1/(8.841761993739701954543616 × 10³⁰). That’s a denominator of 8 nonillion 841 octillion 761 septillion 993 sextillion 739 quintillion 701 quadrillion 954 trillion 543 billion 616 million. For just 29 objects. My third-favorite number! It’s also where the probability gets smaller than that of landing 100 heads in a row.
With a deck of 59 cards, we’d actually get a denominator greater than the number of atoms in the visible universe (i.e., approximately 1.3868 × 10⁸⁰ is greater than the atom estimate of 10⁸⁰), but luckily we only have 52 cards, which gives us a denominator of just 52!, or 8.0658 × 10⁶⁷. That’s a probability of 1/52!, or about 0.0000 [63 more zeros] 124. So, that’s the probability of the deck coming up in any particular order.
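For anyone who wants to check that arithmetic, here’s a quick Python sketch (mine):

```python
from math import factorial

print(factorial(15))           # 1,307,674,368,000 — about 1.3 trillion
print(factorial(29))           # ≈ 8.84 × 10³⁰, already bigger than 2¹⁰⁰ ≈ 1.27 × 10³⁰
print(factorial(52))           # ≈ 8.07 × 10⁶⁷
print(factorial(59) > 10**80)  # True: 59 cards already beat the atom estimate
print(1 / factorial(52))       # ≈ 1.24e-68, the chance of any one particular 52-card order
```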
How many times would you need to shuffle the deck to feel 99% confident in getting some particular order? A good start for figuring this out is to work out the probability of not getting the desired order in n shuffles. We can then subtract that from 1 to get the probability of the desired outcome not not happening—in other words, of happening.
Let’s make this explicit by returning to the simple three-card case. On average, you’ll see the order you want about one out of every six shuffles, because there are six ways to order three cards. And you’ll see an order you don’t want about five out of every six shuffles. We could figure out how many shuffles it’d take on average to see the order you want, but, remember, we’re interested in a different question here. Namely: How many shuffles will it take to have a 99% probability of seeing the desired order at least once? Let’s call any desired order D.
One shuffle won’t be enough, as that’s just a 1/6 ≈ 16.67% chance of getting D. With two shuffles, we could get D the first time but not the second, the second time but not the first, or both times. Rather than calculate that, I’ll consider the one situation that fails: we don’t get D in either shuffle: (5/6)(5/6) = 25/36 ≈ 69.44%. So the probability of getting D at least once in two shuffles is the complement of that: 1 – (5/6)² ≈ 30.56%. We can easily perform this same operation for three shuffles by updating the exponent, which increases the chances for D: 1 – (5/6)³ ≈ 42.13%. And for four shuffles: 1 – (5/6)⁴ ≈ 51.77%. And so on. Recall that this is for three cards being shuffled however many times is in the exponent.
Generalize this by adjusting the probability of failure. For example, with four cards, the probability of getting D is 1/4! = 1/24, so the probability of not getting D is just 1 – 1/24 = 23/24. So we can input that, instead of 5/6, into the equation we were using for three cards. For instance, the probability of getting at least one occurrence of D in five shuffles of four cards is: 1 – (23/24)⁵ ≈ 19.17%. In general, the probability of getting at least one occurrence of D in n shuffles of x cards is 1 – (1 – 1/x!)ⁿ.
We could keep turning the knobs on this and trying larger exponents until the output is 99%. Instead, let’s build an equation. To make things easier, I’ll start with the simple example of three cards. What I want to know is: what does n need to be in order to get 1 – (5/6)ⁿ = .99? Again, this is asking how many shuffles we need in order to feel 99% confident in seeing D at least once. I can move things around a little to turn this into a straightforward logarithmic calculation: (5/6)ⁿ = .01. This is just n = log(.01)/log(5/6) ≈ 25.26.
So, we need about 25.26 shuffles for feeling 99% confident (by which I mean for assigning a probability of 99%). Rounding up to 26, we get 1 – (5/6)²⁶ ≈ 99.13%.
Now generalize this for whatever number of cards. The numerator will stay log(.01), as we’re looking for 99% (if we were looking for, say, 98%, we’d change that to .02, etc.), but the denominator will change based on how many cards are being shuffled. We already worked out above that the denominator should be the log of 1 – 1/x!, where x is the number of cards; in other words, n = log(.01)/log(1 – 1/x!). I’ll make this into a function f, and will pop a bunch of examples into Desmos. We also might as well clean up our result by forcing an output of the next highest whole number (i.e., with a ceiling function). Here’s what we get for three cards, four cards, five cards…:
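Here’s a minimal Python sketch of that function f (my own), with a few outputs:

```python
from math import ceil, factorial, log, log1p

def shuffles_for_99_percent(x: int) -> int:
    """Fewest shuffles of x distinct cards giving at least a 99% chance
    of seeing one particular ordering at least once."""
    # P(missing it every time in n shuffles) = (1 - 1/x!)^n; solve for n at 1%.
    # log1p keeps precision when 1/x! is tiny.
    return ceil(log(0.01) / log1p(-1 / factorial(x)))

for x in (3, 4, 5, 6, 15):
    print(x, shuffles_for_99_percent(x))
# 3 -> 26, 4 -> 109, 5 -> 551, 6 -> 3,314, 15 -> about 6.02 trillion
```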
Desmos gets shaky after 15 cards. I soon started getting different answers there and at Wolfram|Alpha, and both calculators soon petered out. But we can see that the numbers grow fast and, besides, popping 15 into the formula represents the precise answer, and that’s interesting in itself. Anyway, to feel at least 99% confident about getting D from 15 distinct cards, you’d need to shuffle about 6,022,021,699,600 times, which Wolfram|Alpha tells me is about 20 times the number of stars in our galaxy and about 57 times the number of people who’ve ever lived (on Earth, I presume). That’s at just 15 cards, so we can also assume that getting D with 52 cards would require an unfathomably humongous number of shuffles.
If that’s not daunting enough, keep in mind that, because shuffles are independent goings-on, if your first shuffle of, say, three cards doesn’t produce D, you should not now expect 25 more shuffles to hit 99% for D. Rather, it stays at 26. It’s as if that first shuffle didn’t happen. That’s event independence.
I’m imagining now the possibility that very few deck orderings have ever repeated across all the hands of cards ever played. Maybe it’s never happened, or maybe it’s happened only once—like, the same permutation came up in a game in 1602 Paris and then again in 1807 Shanghai, and no other permutation has repeated since. Maybe my intuition is wrong on this and it’s like a magnified version of the Birthday Problem.11
But I don’t think so. A few minutes of playing around with small numbers of cards, which I slowly increased, showed that as the permutations grow, the probability of anything even approaching 1% for seeing a repeated permutation any time soon becomes quickly elusive, but I couldn’t make the numbers very big before my calculators rebelled (e.g., for 15 cards, seeing at least one repeat within 100,000 shuffles is about 0.3816%; I couldn’t get an answer for 1,000,000 shuffles; see this footnote for the equation I used12). It’d make for an interesting Fermi problem to think more deeply about this question, starting with thinking of a way to estimate the number of rounds that have ever been played with 52-card decks.
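I won’t reproduce the footnoted equation here, but a standard birthday-problem approximation gives essentially the same answer for the 15-card case (this is a sketch of mine, not necessarily the same equation as in the footnote):

```python
from math import expm1, factorial

def prob_of_a_repeat(num_cards: int, num_shuffles: int) -> float:
    """Approximate chance that some ordering shows up at least twice in
    num_shuffles independent shuffles of num_cards distinct cards,
    using the usual birthday-problem approximation 1 - exp(-k(k-1)/(2N))."""
    n_orderings = factorial(num_cards)
    k = num_shuffles
    return -expm1(-k * (k - 1) / (2 * n_orderings))

print(prob_of_a_repeat(15, 100_000))    # ≈ 0.0038, i.e., about 0.38%
print(prob_of_a_repeat(15, 1_000_000))  # ≈ 0.32
print(prob_of_a_repeat(52, 10**18))     # ≈ 6e-33 — effectively zero
```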
At any rate, I do have the intuition that with enough shuffles, you’ll get a repeat. In fact, it’s intuitively obvious to me that you must get a repeat by (52! + 1) shuffles with a standard deck13. To see this, observe that three cards can be ordered six ways, which means that on the seventh shuffle, you’re guaranteed a repeat if you haven’t seen one yet. This obviously doesn’t mean you’re guaranteed to see all six permutations by then—no amount of shuffles guarantees that. Still, I have a strong intuition that you’ll eventually see all six permutations. In fact, as there’s a 1/6 probability for any given permutation, I’d expect about 14.7 shuffles on average to see all six of them. This is easy to calculate:14 6 × (1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) = 14.7.
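That 14.7 is the classic coupon-collector expectation: N times the Nth harmonic number, for N equally likely outcomes. A small Python sketch of it (mine), which also covers the coin cases I get to below:

```python
def expected_trials_to_see_all(n_outcomes: int) -> float:
    """Coupon collector: expected number of independent, equally likely
    draws needed to see every one of n_outcomes at least once."""
    return n_outcomes * sum(1 / i for i in range(1, n_outcomes + 1))

print(expected_trials_to_see_all(6))        # 14.7    (all orderings of 3 cards)
print(expected_trials_to_see_all(2 ** 3))   # ≈ 21.7  (all sequences of 3 coins)
print(expected_trials_to_see_all(2 ** 4))   # ≈ 54.1  (all sequences of 4 coins)
print(expected_trials_to_see_all(2 ** 25))  # ≈ 600.8 million (be patient: ~34 million terms)
```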
How different is this from, say, lining up 100 coins and “shuffling” them? There are 2¹⁰⁰ permutations. Let’s make that number smaller, using three coins, which means 2³ = 8 permutations. After nine flips, you’re sure to see a repeat. With four coins, you’re sure to see a repeat by 2⁴ + 1 = 17 flips. And so on, up to 2¹⁰⁰ + 1. Does this convince me that every possible permutation will eventually repeat? For small numbers, sure. For three coins I’d expect it to take about 22 flips on average to see all eight permutations; for four coins about 54 flips; for 25 coins about 600,822,143 flips. Actually, 25 is already pushing it for me, but I can deal with that.
Once we’re asking about 100 coins (my calculators couldn’t handle much above 25), my intuition rebels. (I’m tempted to say that the threshold for my intuition rebelling seems to match that of my calculators.) Rather, intuitively, I feel that any repeating sequences will only be those that demonstrate fairness behavior. But wait. If I rule out the possibility of all heads, all tails, and any other patterned outcome, then we would have to see a repeat within 2¹⁰⁰ – (ruled-out sequences) + 1 “shuffles.” That’s absurd. But my intuition is devoted to this absurdity.
That devotion can be tested by extending the absurdity into a coin-flip variation on the aforementioned Klingon example. Suppose a thoroughly random-seeming sequence of billions of coin flips, converted to 0’s and 1’s, turns out to have profound meaning for some super-intelligent mind—that of a powerful digital computer, maybe. Perhaps those 0’s and 1’s amount to a program that generates some patterned or ordered engagement with the world. Perhaps any computer running this program would become conscious. Or, if we’re in a computer simulation, maybe the sequence turns out to be what I’ll now dub a Genie Sequence or God Sequence or even just a subset of the KEY Sequences (i.e., unlocking DOORS and WINDOWS in the program, between levels, worlds, universes, dimensions). This isn’t the Matrix, mind you: in the Matrix, you have a non-computer-simulated body somewhere “out there.” In the simulation, we’re purely made up of such 0’s and 1’s. Are all such “meaningful patterns” off limits? Are they off limits precisely because one could accidentally create a Genie Sequence? Is something analogous going on in our actual world, even if we’re not in a computer simulation?
Even backing off of this sci-fi scenario and returning to the simple idea that all meaningful patterns would be cut off, I think I have a satisfying response. There’s nothing to stop a fair coin from flipping a random-appearing pattern. Even if it were practically hopeless to try to guess it ahead of time. As I’ll point out shortly, I think if someone were to try to predict 100 flips in a row, their best bet would be a random-appearing sequence. Which is to say I think “Constantinople” could come from the cards. What I’m skeptical about, however, is a fair coin yielding physically repetitive results indefinitely. This is consistent with the idea that there is a distinction between the physical realm of material particles and the metaphysical realm of meaning, though they may be bridged by simple orderings and patterns.
(13) It seems, then, that I don’t have the intuition that there’s anything like a fairness restriction in the playing cards case. Make the deck as huge as you like, all permutations will come up with enough shuffles. This distinguishes cards from coins. The notion of asking the coin for repetitions of the same token event seems to be on the right track. I don’t think this is enough, but I’ll return to it after working through some other thoughts.
(14) My intuition may face yet further embarrassment. Suppose we assign a number to each of the 2¹⁰⁰ permutations you can get from a sequence of 100 coin flips: S1, S2, … , S(2¹⁰⁰). We can now refer to each sequence just by its number. Put each number on a card and into a big hat. If we draw cards from the hat enough times, with replacement, all numbers will eventually come out. I have no problem with that idea, even though each number has the same probability of coming out as its corresponding sequence. I don’t sense a fairness resistance there.
Suppose, however, that we have 200 cards: 100 depict an “H” and the other 100 a “T.” This is a bit tedious, but the 200 cards need to be alternated in a stack: HTHTHTHT… Now, take two cards from the top, shuffle them, put one on your left and the other on your right (keep them facedown for suspense). Do this 99 more times. My intuition is just barely convinced that, done enough times, you will end up with a stack that’s all H’s (and thus the other stack is all T’s).
What is different between these cases and the coin case?
Perhaps the difference lies near the fact that you cannot make a machine that can ensure all heads in the above card case. In theory, if I practiced enough, I could learn to flip heads consistently. I cannot do that with well-shuffled cards. There’s nothing about the cards themselves that carries the fairness bias. At least nothing I can see.
On the other hand, we might say that any technique for landing heads repeatedly is not analogous to being “well-shuffled.” The aforementioned machine, for example, requires that the coin start heads up if you want it to land heads (or you could calibrate it to start tails up, I suppose). You could not, however, make a machine that ensures the coin lands heads if you’re flipping it onto a highly bouncy surface. In other words, the machine stacks the deck, as it were.
Finally, my intuition commits me to getting all heads being impossible in the following case. One hundred cards are each labeled “heads” on one side, “tails” on the other. Put the cards into a large box. Shake up the box, then grab a card at random and note whether heads or tails is face up. You can now either put the card back or not. My claim is that you can pull cards as many times as you like, and you’ll never get 100 consecutive heads. Which is to say that my intuition is committed to the claim that you’ll get all possible permutations from the cards when they are printed only on one side, but once they’re printed on both sides, and which side you see is randomly determined, then you will not see all possible permutations. Why? For the same reason you wouldn’t were you to play this game with coins instead of cards in the box.
Yes—my intuition’s flushed.
(15) Here’s a slightly weaker claim about landing 100 heads in a row.
No matter how many people you get together, it’s unlikely that someone will correctly call 100 coin flip results. There are different ways to play this game. One is to have each person predict the results ahead of time. With 2¹⁰⁰ people collaborating to ensure that all 2¹⁰⁰ possible sequences are guessed, one person will guess the correct sequence; and with half that many people—i.e., 2⁹⁹—there’s a 50% chance of someone getting it right.
In the example I’ll explore here, assume they don’t collaborate.
Another approach would be to have each guess made and shouted out just before each flip. Here, if everyone can somehow hear everyone else, it wouldn’t help at all unless each keeps track of who’s guessed what. Even then, it’s hard to imagine this going well. For example, it’s hard to imagine that, at any particular instance, everyone would be willing to simultaneously yell “heads” when they hear everyone else is yelling “heads.” But, out of the 100 flips’ 2¹⁰⁰ permutations, half of them—i.e., 2⁹⁹ of them—will have heads in the first result. And half of those—i.e., one quarter or 2⁹⁸ overall—will have heads as the first two results. And so on.
(This reveals an efficient way to collaborate when allowed. Assign each person a number from 1 to 2¹⁰⁰, then for the first flip ask the bottom half of the group to call “heads” and the top half to call “tails”; for the second flip, of all who called heads, ask the bottom half of that group to call “heads” and the top half to call “tails,” and do precisely the same for the group that initially called “tails.” In other words, make a human probability tree. This is also a very quick way to list out possible permutations of a coin toss or truth table, etc. Check out this footnote for a simple example.15)
At any rate, assume nobody knows what anyone else has guessed until the results are counted.
What sort of guess would you give in this game? Is any guess as good as any other? It’s often noted that the human mind is not naturally suited to recognizing or producing appropriately randomly behaved results. Thus the common story of a statistics teacher recognizing when students have faked their homework assignment to record a hundred coin flips. They get the statistical behavior wrong—neglecting, for example, to include sufficiently long strings of consecutive heads and tails. There’s a reason experienced statisticians can see this error.
Computers are even better than instructors at this. Consider this fun webpage, where you’ll find these instructions: “Press the ‘f’ and ‘d’ keys randomly. As randomly as you can. I’ll try to predict which key you’ll press next.” I tried it and once managed to keep it at 50% for a good while, but it took practice and concentration. I then very quickly managed to hover at 50% by flipping a quarter to determine what to input.16
All this in mind, in strategizing for the guess-100-flips game, I’d hesitate to guess a sequence that would make a teacher immediately think, “ah, most likely a cheater.” Rather, I’d guess something that better maps to the expected behavior of a fair coin. Have a look at two sequences of 200 coin flips, one fake and one real. Which would you say is real? Spend no more than 60 seconds deciding.
Sequence #1
THHHHTTTTHHHHTHHHHHHHHTTTHHTTHHHHHTTTTTTHHTHHTHHHT
TTHTTHHHHTHTTTHTTTHHTTTTHHHHHHTTTHHTTHHHTHHHHHTTTT
THTTTHHTTHTTHHTTTHHTTTHHTHHTHHTTTTTHHTHHHHHHTHTHTT
HTHTTHHHTTHHTHTHHHHHHHHTTHTTHHHTHHTTHTTTTTTHHHTHHH
Sequence #2
THTHTTTHTTTTTHTHTTTHTTHHHTHHTHTHTHTTTTHHTTHHTTHHHT
HHHTTHHHTTTHHHTHHHHTTTHTHTHHHHTHTTTHHHTHHTHTTTHHTH
HHTHHHHTTHTHHTHHHTTTHTHHHTHHTTTHHHTTTTHHHTHTHHHHTH
TTHHTTTTHTHTHTTHTHHTTHTTTHTTTTHHHHTHTHHHTTHHHHHTHH
Anecdotally, people tend to guess that the one with shorter strings of consecutive heads or tails is the real list. This aligns with most people’s attempts at faking results. That in mind, Sequence #1 seems a good bet, with two strings of eight heads, compared with Sequence #2’s single string of five heads. Sequence 1 is the correct answer.17
For 100 flips, then, you might want to include a string or two of five or six heads and tails in a row, one of seven heads and/or tails, and one of eight heads or tails. It seems to me that, while the game is still hopeless and I’d bet against anyone winning (I wonder how many repeated answers would be given), experienced statisticians are at an advantage. Something like Sequence #1 above would be a better bet than Sequence #2. But if all heads really is just as likely as a well-behaved, “random-appearing” sequence, then there should be no such advantage. So maybe there isn’t one. But I’d be willing to bet that if someone does win, the winning sequence will be well-behaved and “random-appearing.” And I’d be willing to bet that certain aspects of the results will be predictable; which is to say that the actual sequence will be well-behaved and “random-appearing.” It won’t look like a cheater made it.
Of course, my more reasonable intuitions respond to such thoughts with the standard, “Yes but there are far more ‘well-behaved’ random sequences than there are patterned-appearing ones, so it’s really just a good bet that ‘non-well-behaved’ results are fakes.” To which my naive intuition replies, “My point exactly.”
The above-cited paper gives the following formula for estimating the expected longest run of heads in n flips: log₂(n) – 1. Here are some example results. For 100 flips, expect the longest run of heads to be about 5.64; 200 flips yield a longest run of about 6.64; 1,000 flips yield about 8.97; 2,000 flips yield about 9.97.
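And here’s a quick simulation sketch (my own) for checking that estimate against simulated flips:

```python
import random
from math import log2

def longest_run_of_heads(n_flips: int) -> int:
    """Flip a simulated fair coin n_flips times; return the longest run of heads."""
    longest = current = 0
    for _ in range(n_flips):
        current = current + 1 if random.random() < 0.5 else 0
        longest = max(longest, current)
    return longest

n = 2000
samples = [longest_run_of_heads(n) for _ in range(1000)]
print(sum(samples) / len(samples))  # average longest run over 1,000 simulated sessions
print(log2(n) - 1)                  # ≈ 9.97, the estimate above
```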
In a moment, I’ll share the real-world results of a 2,000-flip sequence. First, a few more thoughts on the intuition I’m indulging here. (I keep referring to my intuition, though some of my concerns are, I hope, intellectual. My aim is to mediate so the two get along better—I like them both.)
(16) I suppose my worry comes down to a reasonable idea of fairness. The expected response to a coin landing heads indefinitely would not be, “Well, that’s as likely as any other string of outcomes, so I have no evidence the coin isn’t fair.” Rather, nearly anyone would consider it heads-biased. And not because of the low probability of the specific, all-heads sequence observed. But because that’s just not how a fair coin behaves. No appeal to averages is needed to bear this out. There’s no need to say, “This violates the theoretical expectation that, when flipping a fair coin 100 times, I’d see a mixture of exactly half heads and half tails about 8% of the time, and a mixture containing between 40 and 60 heads about 96% of the time.” These averages don’t rule out the minuscule probability that you’re in a 100-heads-in-a-row world, particularly given that the probability of seeing that sequence really just is the probability of seeing any other. All you need is the intuition that a fair coin simply doesn’t behave that way—an intuition that predates quantified probability theory.
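(Those two percentages are easy to check with exact binomial counts; a minimal sketch, mine:)

```python
from math import comb

total = 2 ** 100
print(comb(100, 50) / total)                             # ≈ 0.08: exactly 50 heads in 100 flips
print(sum(comb(100, k) for k in range(40, 61)) / total)  # ≈ 0.96: between 40 and 60 heads
```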
We’ve formalized this age-old intuition into a probability calculus that accounts for what we expect from a fair or non-fair coin—it accounts for it so well that, when results are counterintuitive, it’s often our intuition, and not the math or model, that needs correcting; especially when we’re outside of a domain as simple as that of coin flips. Though, even in that domain, I worry that we may get lost in the models, however useful they may be. (For a counterintuitive problem involving just a single die, and for which good models are invaluable, see my post, “Counterintuitive Dice Probability: How many rolls expected to get a 6, given only even outcomes?“)
The danger of getting lost seems, unsurprisingly, to increase as models account for more complex and unfathomable cases. At some point, unfathomable doesn’t get more unfathomable—our intuitions simply drop out. Here’s an example of what I mean by this.
I remarked above that with just 59 distinct objects—e.g., cards—there are more ways to order them in a row than there are atoms in the observable universe. I recently mentioned this to someone who said, “Well, that’s just the observable universe, so not really that many atoms, right?” We’re talking something like 200 billion galaxies and an estimated 10⁸⁰ atoms. And yet there are about 1.4 times that many ways to seat just 59 people in a row (just pop 59! into Google search to see that).
What if we make that number much bigger? Not too much bigger, as with the uber-complex system called the “human brain.” I’ll go someplace smaller, but still unfathomable: chess.
In his 2012 book, The Signal and the Noise: Why so Many Predictions Fail—But Some Don’t, Nate Silver writes:
The number of possibilities in an entire game of chess, played to completion, is so large that it is a significant problem even to estimate it, but some mathematicians put the number as high as 10^(10⁵⁰). These are astronomical numbers: as Diego Rasskin-Gutman has written, “There are more possible chess games than the number of atoms in the universe.”18
But 10 to the 10⁵⁰ power isn’t just bigger than 10⁸⁰. It is much much much much bigger! Instead of 10 multiplied by itself 80 times (i.e., the number of atoms), we’ve got 10 multiplied by itself 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times (i.e., the number of chess games).
Silver doesn’t point this out, but there’s no need to. Once we’ve established “greater than the number of atoms in the universe,” extra orders of magnitude amount to slightly different-looking shapes on the paper saying pretty much the same thing: “it’s a big number.” I mean, 10⁸² is a hundred times bigger than 10⁸⁰, but it doesn’t feel that way thus notated. And if we can so easily toss around such numbers, maybe my friend was right: the number of atoms in the visible universe isn’t really all that big. I mean, if you can surpass that by permuting just 59 distinct playing cards, how big a number can it really be? And if that’s not so big, then why would a coin landing heads 100 or even 10,000 times in a row be such a big deal?
My point here is that these quantities really are “all that big,” and we—by which I mean you, me, everyone—are not good at conceiving of them—of feeling them. Nor can we feel averages (which may or may not be an artificial notion in its own right; either way, it’s a highly useful one for mapping phenomena in the world). So, most people, when trying to produce a random-appearing series of heads and tails, will overcorrect and thus fail to include relatively small strings—like eight heads in a row—that aren’t all that uncommon on average given many flips, because the assumption is that those strings are too big; while simultaneously being insensitive to what that intuition might mean for huge numbers, where it might actually be on the right track: “100 heads in a row? Sure, it’s possible. Why not?”
This seems at first paradoxical, but not so much so when we consider just how alien the domain of large numbers is to human conception. Interestingly, this might make us unduly impressed when someone nails a relatively small number of predictions in a row when, if thousands were trying, at least one person was likely to get lucky.
This doesn’t just apply to the good luck of financiers. Things can also go disastrously in the other direction as well. An oft-mentioned example is that of Sally Clark, who was wrongly convicted in 1999 of having killed her two infants on the faulty grounds that the probability was extremely low for two siblings to die of SIDS. The expert who testified to this fact treated the deaths as independent and in general failed to properly condition, thus committing the “prosecutor’s fallacy.” Another possible example is that of a nurse now serving 17 consecutive life sentences on similarly faulty grounds of rareness, discussed in these 2016 articles: “Ben Geen: Thrill-Seeking Killer Nurse or Innocent Victim of Statistics?” and “Miscarriage Watchdog to Take Another Look at Ben Geen Case.”
The point here, I suppose, is that statistically rare events happen more often than we might intuit: car accidents; getting struck by lightning; three kids in the same neighborhood happening to get cancer; winning the lottery twice; 12 heads in a row; coincidences surprising enough to inspire a belief in fate (despite living among countless uninspiring coincidences, such as Stepping on That Pebble). Meanwhile, we seem to think some far less likely events aren’t so impressively rare: 30 heads in a row (maybe because no one’s ever seen it happen, and so it’s an easily shrugged away fiction; someone who actually saw it happen might be driven mad). Add to this the notion of surprising versus unsurprising improbable events, so that some incredibly improbable event is promised to happen, yet certain of those outcomes would be shocking (e.g., 100 heads in a row) while others would be mundane (e.g., 100 random-appearing flip outcomes), and this all becomes a confusing mess (at least to me).
That noted, I’d here like to consider what we can expect from a fair coin by looking at relatively short strings, even when they are parts of bigger strings. Note that what I mean by “short” and “bigger” is vague, or contextual. When aiming for 100 heads in a row, a billion tosses is a short string.
(17) Suppose we manage a billion tosses. This, plus another 73.7 million flips, is about the expected (i.e., average) number of flips needed to see 29 heads in a row. We can bear out this theoretical number with a simulation, or just by using this formula: 2ⁿ⁺¹ – 2, where n is the length of the run of heads you’re waiting for.19 Relatively short runs are obviously easier to test and to get a feel for, but trying this empirically is potentially misleading. It’s true that long runs are just made up of strung-together, independent shorter runs. But we can’t do enough of those shorter runs to bear out what happens in shorter runs given tremendously many instances of them.
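For runs short enough to actually test, here’s a small simulation sketch (mine) to compare against 2ⁿ⁺¹ – 2:

```python
import random

def flips_until_run_of_heads(run_length: int) -> int:
    """Simulate fair-coin flips until run_length heads occur in a row;
    return how many flips it took."""
    flips = streak = 0
    while streak < run_length:
        flips += 1
        streak = streak + 1 if random.random() < 0.5 else 0
    return flips

run_length = 10
trials = [flips_until_run_of_heads(run_length) for _ in range(10_000)]
print(sum(trials) / len(trials))   # empirical average waiting time
print(2 ** (run_length + 1) - 2)   # the theoretical 2,046
```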
On the other hand, if my naive intuition is correct, any long run will be made up of a concatenation of runs short enough to test. That is, the behavior expected of relatively short runs in isolation would play out repeatedly over the course of any long run. But wait—this is in fact the case. If, for example, running a computer simulation of a very long run, it would be perfectly fine to simulate one flip at a time and string those together. Doing this (or flipping a single coin) 2,000 times is no different than enacting 1,000 flips twice, or 20 flips 100 times, or 5 flips 400 times, and so on. You can of course put years between each flip, and use a new quarter for each experiment. It’s a challenge to drop the sense that something magical happens given huge numbers of flips enacted in short succession; but my intuition claims that such magical thinking is just what it’s rejecting.
A reminder here that this is all absurd. If you’ve landed 26 heads in a row, there’s no reason you can’t do this again. Things always start over with the next flip. Coin flips are memoryless. And the sun will rise tomorrow, but not because we’ve seen it do so repeatedly until now, but for other, deterministic reasons. And one day it won’t rise again (also for deterministic reasons). And Becca will be in the elevator tomorrow morning, everything-bagel and almond milk latte in hand, as she has been the last 417 mornings Orson has entered the elevator; but she won’t be on Monday because she’s moving to Vegas, though Orson doesn’t know this. On the other hand, if you’ve seen a coin land heads 26 times, this is grounds for updating your beliefs about the fairness of the coin. Becca may move to Vegas on a whim, but the sun won’t decline to rise tomorrow on a whim, nor will the coin decide to be heads-biased for a while. The coin is memoryless in terms of token outcomes, but not in terms of its bias.
That baffling reminder in place, I’ll continue. It should take, on average, about two flips to see a single heads; six flips to see two consecutive heads; 14 flips to see three; 2,046 flips to see ten; and so on. But, again, if you’ve flipped a coin 2,036 times and haven’t yet seen a 10-headed string, don’t expect it to be around the corner. You should still expect it to take about another 2,046 flips. My intuition doesn’t have a problem with independence. What it has a problem with is precisely what seems to disrespect independence: the idea that, in enough flips—in some calculable number of flips, in fact—something utterly uncharacteristic of “fair” should be expected to happen. But why does this depend on quantity of flips?
One might reasonably and even correctly argue that it would be uncharacteristic of “fair” for 100 heads in a row not to happen by a certain number of flips, given that this would show unfair preference by the coin in the grander scheme of things. I suppose it begs the question, then, for me to appeal to some idiosyncratic definition of fairness in order to question implications of some definition of fairness I happen to not like—that is, a more mathematically reasonable definition.
So I'll just say that my intuition doesn't respond positively to the idea that, for as long a finite string as you'd like to see, there is some mathematically supported number of flips that will justify 99% confidence in seeing it. This is in part because of, not despite, the fact that any given substring in that longer sequence is independent of all other substrings.
(18) For fun, I just rolled a die looking for two 6’s in a row. It happened on the 47th roll. Then I did it again, but in five rolls. If I keep doing this, it should average to about 42 rolls. With a good amount, but not too much, variety. I won’t see a string of a thousand 1’s, nor a string with a hundred instances of 1-3-5-2-4-6 repeated over and over.
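A little sketch of that 42-roll average (mine, assuming a fair die):

```r
# Sketch: average number of rolls of a fair die until two 6's in a row.
# Theory: 6 + 6^2 = 42.
rolls_until_double_six <- function() {
  prev_six <- FALSE
  n <- 0
  repeat {
    n <- n + 1
    six <- sample(6, 1) == 6
    if (six && prev_six) return(n)
    prev_six <- six
  }
}

set.seed(3)
mean(replicate(20000, rolls_until_double_six()))  # ~42
```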
So, I'll reemphasize here that, while behavior over relatively short strings is a nice guide to understanding fair behavior, what to expect in the long run must be taken in the context of longer strings that may or may not be made up of characteristically "random-appearing" shorter strings. My intuition suspects the former. At least, while averages are of course useful for doing well on average, for concrete predictions a narrow perspective is better (e.g., you wouldn't call 3.5 when rolling a die once, even though that is the average outcome of a fair die).
(19) We can verify expected behaviors by doing runs as long as we can manage. (I’d prefer actual runs to simulations.) Unfortunately, impressively long runs are physically out of reach.
The average number of coin flips expected to get 100 heads in a row is 2,535,301,200,456,458,802,993,406,410,750 flips. And, as always, if you’ve flipped 2,535,301,200,456,458,802,993,406,410,749 times and haven’t yet seen 100 in a row, expect another 2,535,301,200,456,458,802,993,406,410,750 flips to go.
I’m not going to worry about how many flips are needed to be 99% confident for getting 100 heads in a row. But I did work out what it takes to reach 99% confidence for getting three heads in a row: 57 flips.
For fun, here's how I did it. I brute-force listed out the first several ways to get the first several results, looking for a pattern. The emerging sequence was like the Fibonacci numbers, but you add three numbers instead of two (a nicely consistent result, given that Fibonacci numbers are what you'd use for dealing with two heads in a row, something I similarly worked out). I put that sequence into the OEIS, saw that it's the Tribonacci numbers (new to me), then grabbed an explicit formula from Wolfram|Alpha's "Tribonacci Number" entry. I adjusted the formula a little to suit my purposes and put it into Desmos to get the below answer. I assume the Tetranacci numbers will yield the number of ways to get four heads, and so on. Also: What a great time to be an autodidact!
The resulting formula (unpacked piece by piece below) yields approximately 0.9904.
Going from left to right, n represents the number of flips. Raise 1/2 to the nth power because 1/2 is the probability for heads and for tails. For example, the probability of getting THHH is (1/2)^4 = 1/16. (Notice that there is just one way to flip a coin four times so that the only instance of three heads in a row happens over the last three flips.)
The beautiful beast to the right of the (1/2)^n (tamed by a nearest integer function [x]) gives Tribonacci numbers starting at 0 (because there are zero ways to get three heads in one or two flips); this formula outputs how many ways there are to get each result (e.g., there are four ways to see your first instance of three heads in a row in six flips: TTTHHH, HTTHHH, HHTHHH, THTHHH; seven ways to see it in seven flips; 13 ways in eight flips; 24 ways in nine flips…; notice that each of these ends in THHH). We then add up those probabilities for seeing it in 3 flips + 4 flips + 5 flips + 6 flips + … + 57 flips = (1)(1/8) + (1)(1/16) + (2)(1/32) + (4)(1/64) + … + (1.2087971295 × 10^14)(1/2)^57 ≈ 99.04% probability for seeing 3 heads in a row by 57 flips.
Interesting that, given 57 flips, there are 1.2087971295 × 10^14 ways to see the rare event of not getting three heads in a row until the last three, finally successful flips.
Ok, so that's 57 coin flips to be 99% confident of getting three heads in a row at some point, though it will take 14 flips on average to get three heads in a row. It's 23 flips for being 99% confident of seeing two heads in a row, by the way, though what it actually takes should average out to six flips. We could also calculate 99.99999% confidence for these. We could also do so for biases other than 50%—say, 1% in favor of success. In other words, with a 100-sided die, there is some number of rolls that should make you 99.99999% confident in seeing a million 17's in a row.20
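If you'd rather skip the explicit Tribonacci formula, here's a sketch of my own shortcut (not the Desmos route described above) that gets the same numbers straight from the recurrence:

```r
# Sketch: probability of seeing a run of r heads somewhere in n fair flips,
# via the "no run yet" recurrence (the r = 3 case is the Tribonacci pattern).
prob_run <- function(n, r) {
  # q[m + 1] = P(no run of r heads in m flips)
  q <- c(rep(1, r), numeric(max(0, n + 1 - r)))
  if (n >= r) {
    for (m in r:n) {
      j <- 1:r
      q[m + 1] <- sum((1/2)^j * q[m + 1 - j])  # condition on where the first tails lands
    }
  }
  1 - q[n + 1]
}

prob_run(57, 3)                                     # ~0.9904
min(which(sapply(1:60, prob_run, r = 2) >= 0.99))   # 23 flips for two heads in a row
min(which(sapply(1:60, prob_run, r = 3) >= 0.99))   # 57 flips for three
```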
Ironically, though not surprisingly, with longer runs of heads, my confidence goes down as the confidence-percentage goes up. I can't get myself to believe that a fair coin would ever stop acting the way a fair coin acts in short-ish strings, and my skepticism is only bolstered by being told that I should ever be 99.9999% confident in seeing it happen.
(20) John Edmund Kerrich (1903–1985) was a mathematician who recorded the results of 2,000 coin flips while interned by Nazis in 1940s Denmark. Here are the results (1 is heads; 0 is tails):
00011101001111101000110101111000100111001000001110001010101001000010011000100001110101000100001011010111010000110100101000001111101111100110110010101101010000011000111001111101101010110100110110110110011111000011101100010100100000101001111110111010111000110001100011000110011010010000100001110111100011111110000000001101011010011111011110010010101100111011011100100000100011001011001111101001111000100000100110101110101011001111101100100000110101111111010001111110010111111001110011111111010000100000000011111001010101111000011101110010001101000011111100010100111111110110111011011101101001011011001101010011011111110010111000111101111111000001001001010011101110110110111111000001010101010101010010011110110111001110000000100110101001100100010000110010111100010011010110110111001101001010100000010000000010110011010110111110001011001010000111001100111110010101101000011000100110001001000110010000100101000011100000011101101111001110011010101101001011010000011101101000100011100100111000010100000000101001000101100001001010001111110110111101010101000001100010100000100000000010000001100100011011101010110110001101110101100100101110001011011010101101100000101101110101010100001110011100011010011101110110001101110000010011110001110100001010000111110100001111111111110101010010011000101111001010100011111100011010101001101001011111000011101111011001100111111010000011101010111101101011100001000101101001100110100001011111011110101100110111100000101100100011011010111110101110010100110110010001100001100001010011000110100111010000011001100011101011100001110101110111101011011011110011110111000110110100000101111010011101100100111000111101100001111001111101101011101110011011100011001111001011101010010010101000110101110110001111100000110000000100111010111000101110100010111111011100000111111101100000001010111111011100010000110000110001111101001110110000000011110111000111010100010110001101110100011101111000001000011010000010100001010100010110001011110000101110010111010010110010110100011000001110000111
In point (15), I used the formula to calculate that the longest run we'd expect in n = 2,000 flips is about 9.97. The longest run here is 12 heads (in red). In isolation (i.e., if you just flip a coin 12 times), getting 12 heads in a row has a probability of 1/4096. Getting 10 heads (or tails) in a row has a probability of 1/1024, but we don't see one of those here, nor do we see an 11-string (with 1/2048 probability; I just ran a 3,000-flip simulation in R and 11 was the longest string I saw; nothing about it was surprising). Strings of seven (1/128) and eight (1/256) are easy to find: there are six strings of eight tails, and two of eight heads (not including within the string of 12). None of these results is surprising, especially in the context of 2,000 flips.
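If you want to check runs yourself, rle() does the work (a small sketch of mine; the Kerrich string above would need to be pasted in as one long "0"/"1" string first):

```r
# Sketch: longest run of identical outcomes in a sequence of flips.
longest_run <- function(x) max(rle(x)$lengths)

set.seed(5)
longest_run(sample(c(0, 1), 2000, replace = TRUE))   # a simulated 2,000-flip run

# For a pasted "0"/"1" string s, such as the sequence above:
# longest_run(strsplit(s, "")[[1]])
```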
The probability of getting exactly this sequence from a fair coin is (1/2)^2,000, or about 1 in 10^602. Hysterically unlikely. But something had to come up. There's nothing surprising here.
How seriously would the results be taken (by the non-superstitious) had the reported results been all heads or any other seemingly predictable pattern? Not very, certainly, and reasonably so, given that so many other explanations would be far more likely. However, the string that did come up has the same small probability as any of those results.
With that in mind, suppose that this were the precise sequence we've all memorized and are on the lookout for, and that is the subject of discussions on coin flips like this one and Taleb's Fat Tony example. Someone's claiming to have documented this exact sequence in just 2,000 coin flips would then be highly suspicious. I'd bet a fair amount of money, in fact, that this sequence will never come up again in the lifetime of the Earth, even if humans made it their central project to see it come up again, not because it's impossible, but because it is so so so unlikely. I say this even though the sequence exemplifies what I mean by "fairness." To be clear, I'm not just claiming that no single coin will live this sequence. Feel free to draw a border around any sequence of 2,000 flips of 2,000 distinct coins, even if they happen in different countries, so long as they are non-arbitrarily chosen (i.e., are immediately sequential, or, if possible, simultaneous), and the Kerrich sequence will not repeat.
(21) I repeat: If I can get 12 in a row, why not another 12? Boils down to: If you can get one heads, why not another? But if my intuition is right (and I’m not saying it is), this does not generalize. Because it’s not how a fair coin behaves.
Building our expectations about coin flips onto concepts about a single flip provides the easiest rule—it most compresses a sequence of 100 or 1,000 flip results as a whole, so it appears to be one event repeating over and over and over. A self-contained, atomic microcosm that still somehow holds all the rules. But it doesn’t. It’s a fiction. If my intuition is right, that is (I’m not saying it is).
There’s a comfort in a rule like this—for avoiding dangers, solving problems, extracting meaning; in short, for repeating tokens of event types. Another sort of comfort is the idea that nature, or some such fundamental entity, even if it is more complicated than we can understand, may express (fine-tuned) patterns simple enough for us to decode, whether explicitly (e.g., indefinitely many heads from a fair coin, which amounts to a kind of nature-violating miracle) or implicitly (e.g., predictable averages over repeating tokens of events; itself a kind of miracle, but a fancier and scientifically approved one, even if it admits of getting 10,000 heads in a row).
In more cynical moments, this all strikes me as part of humanity's long search for meaning where there is none. More specifically, as part of the drive to turn the question of how there is a universe into the question of why there is one, which often means inserting a sentient meaning-creator where there is none.
(22) Alternative thought. There is a pattern there, but it's of a higher order than what we can perceive in the usual, basic sense. Rather, we are able to get at the pattern—itself regulated by physical laws—with mathematical tools such as averages. For a fair coin, this means behaving in a certain way, a way that, if our models don't capture it, leaves them incomplete, failing to capture a constant in nature that here I'm calling "fairness."
Maybe this does mean that facile patterns—of the sort basic human senses are able to discern—are indeed naturally ruled out of the sample space. This isn't necessarily to say that such patterns don't exist in nature; on the contrary, it's reasonable to think that one condition for the evolution of lifeforms is the presence of easily tracked patterns in nature, or at least sequences of events similar enough that they can be represented (e.g., phenomenologically) as repeating events. But those events don't repeat randomly. The moon's orbit isn't random and the more-or-less stable integrity of my tea cup isn't random (though some random event, even internal to the tea cup, could disrupt that integrity).
Another sort of constant finds its expression in average behavior, even if underlain by what we call "randomness." The average tendencies we pick out are of a higher order than the phenomena that appear as more basic—i.e., lower-order—patterns, but I would imagine not that much higher. Just a step above those such as all heads or HTHTHTHT or any similar sort of result we're equipped to perceive, in a basic (or lower-order) sense, as a "pattern." Maybe, then, just as nearly all numbers are not rational, nearly all—or perhaps all—physically possible outcomes of 100 coin flips are not recognizably patterned (and, of course, even were some short or long string to appear patterned, this is just an appearance in the sense that you cannot use the apparent pattern to predict what will happen next, unlike that of an orbiting satellite; though you can use the mathematically graspable higher-order pattern to predict averages, provided you're right about the relevant probabilities in play).
Thus, if the so-called patterned outcomes were to be extracted from the human-concocted sample space, this would not be on account of an arbitrary and misguided whim of human psychology in an effort to hyper-self-correct. Rather, the fact that they appear patterned would be a real indication that including them is just wishful or misguided thinking, that they are in fact impossible, such that, should they occur, the physical laws of fairness would be violated: a miracle. Leaving them in the sample space would amount to a kind of superstition, which would be corrected by taking them out, or at least by acknowledging that they are indeed impossible, though we can do nothing but leave them in.
(One can easily imagine a series of in-again/out-again hyper-self-corrections as we view ourselves to be thinking too small by leaving them out, then view ourselves as bringing the universe down to the level of human thought by leaving them in; one way or another, we'll view the model-mediated engagement of the human mind with the mysteries of the natural universe as a kind of blasphemy in need of correction and supplication.)
On this view, and on any view that posits a "fairness boundary," the KEY sequences and profound phrases in Klingon are possible so long as the fairness boundary isn't violated. It needn't be, as those examples needn't follow the sort of pattern human minds recognize; which, again, is a kind of adaptive capacity selected for naturally (and maybe in some ways artificially). Seeing such patterns—i.e., patterns from which what specifically happens next can be reliably predicted—in long fair sequences, or thinking one even could see them, is a misfire of what is in other ways a beneficial adaptation.
I’ll leave this alternative thought here. On the other hand, maybe I’ve just given the purest expression of what my naive intuition believes about the world.
(23) My basic claim, if I understand it, is that there's some natural restriction imposed by a coin's bias—or, better put, the bias we assign the coin reflects how the coin physically behaves in the world in response to natural conditions (I say "natural" because it's possible to impose sufficiently similar conditions repeatedly, such that the coin lands heads repeatedly; e.g., see again the aforementioned work of Persi Diaconis: "The Not So Random Coin Toss"). Indeed, this natural restriction or pressure just is the bias—a number we generate as a function of that pressure. And so on. There are outliers, but those too are reasonably restricted. To what? I don't know. But I confidently claim 30 billion heads in a row will never happen. Similarly, there's a natural pressure that makes the probability zero of meeting a biological human who's 200 billion years old and 17 trillion feet tall—there is an answer to the question, "If you aged one year and grew one inch, why not one more of each?" ("Why not, indeed?," asks the Transhumanist. Fair enough.)
"Natural pressure" may be the wrong term, as would be any term that invokes the idea of some sort of ongoing physical pressure; in other words, it casts coin flips as something other than memoryless, when they indeed are memoryless. It would be ridiculous to suggest that there's some force field that ensures a coin doesn't land heads more than, say, 28 times in a row, such that if you landed 28 heads, stuck the coin in a drawer for 30 years, came back and flipped it, you'd be promised a tails. That's an obviously egregious instance of the Gambler's Fallacy, but no different than expecting tails 30 seconds later.
If I’m not speaking of some natural pressure or force, then what am I saying? I’m saying if it lands heads a 29th time, I’d be at least agnostic about its fairness, even if it happened 30 years later. If I saw another 29 heads come from that coin, I’d reject that it’s a fair coin. If it then began behaving like a fair coin… I don’t know what I’d think. Magic? Nanobots? Ghosts? Hypnosis?
Here’s another attempt. I’m clearly not positing an explicit restriction on any particular coin toss, but rather a behavioral tendency that emerges from the conditions composed of the coin’s physical attributes (including its body, its environment, and the physical forces that mediate the relations between those things). The result amounts to a restriction of the sort of behavior the coin may exhibit. The coin will not, mid-toss, morph into a dragonfly or start singing the blues or spin in place indefinitely or land heads 100 times in a row—unless some unusual outside force intervenes. But those things won’t happen “naturally” on a typical Wednesday afternoon in Flushing.
(24) More intuition-probing with coin flips and lotteries. Suppose 2^100 people put a unique number into a hat. Each person has a probability of 2^−100 of seeing their number pulled from the hat. Now suppose each of those people flips a coin 100 times. The probability of any one of them getting 100 heads in a row is 2^−100. It is very unlikely that anyone will get 100 heads in a row. In other words, each person has the same probability for winning the lottery as they do for flipping 100 heads in a row. And yet, it's a sure thing that one of them will win the lottery, though it's a safe bet that not one of them will flip 100 heads.
Any given contestant will think it hopeless that they win the lottery. But one of them will win. It could happen to me, thinks contestant number 4,037, and she's right to think so! Still, the winner will—and should—be surprised to win. Similarly, each person will think it hopeless for getting 100 heads in a row; the difference here is that no one should think it might happen: it's a safe bet that not one of those 2^100 people will be surprised.
But wait. What’s wrong with this comparison? For one thing, in the case of the lottery, there’s a probability of 1 that someone’s name will be drawn from the hat (provided the drawing happens, etc.). But not so for the coin-flip. What’s that probability?
While true that, from each person's perspective, the probability of getting 100 heads is the same as winning the lottery, the bigger question here is about the probability that at least one person in the group flips 100 heads. There are 2^100 distinct instances of the coin being flipped 100 times and, mathematically speaking, one or two or all of those instances could yield 100 heads. Let's look at some numbers.
The probability for any given person winning the lottery is 2^−100 ≈ .000000000000000000000000000000788861. The probability of at least one person in 2^100 people winning the lottery in a single draw is 1 (we are promised that exactly one person will win; winning and losing are not independent events given that you only need to find out who won in order to learn who lost).
The probability that at least one person out of 2^100 gets 100 heads in a row is ≈ .63. Not bad.
Here's how I worked that out. Please check my math. Each person has a probability of 2^−100 of getting 100 heads in a row, which means each person has a probability of 1 − 2^−100 of failing to get 100 heads (i.e., of getting at least one tails). The probability that each of the 2^100 people fails to get 100 heads in a row is (1 − 2^−100)^(2^100). So, the probability that not all 2^100 people fail to get 100 heads in a row (i.e., at least one person gets 100 heads in a row) is 1 − (1 − 2^−100)^(2^100) ≈ 1 − 1/e ≈ .63. About 63%. And if we up it to 2^103 people flipping 100 times each, we cross over into 99% confidence: 1 − (1 − 2^−100)^(2^103) ≈ 1 − e^(−8) ≈ .9997.
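To check those numbers yourself (a small sketch of mine), lean on log1p/expm1, since 1 − 2^−100 rounds to exactly 1 in double precision and the naive formula just returns 0:

```r
# Sketch: the probabilities above, computed without underflow.
p <- 2^-100
-expm1(2^100 * log1p(-p))   # at least one of 2^100 flippers gets 100 heads: ~0.632
-expm1(2^103 * log1p(-p))   # with 2^103 flippers: ~0.9997
```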

There's another wrinkle here. The lottery analogy might be better if each of our participants were competing in a distinct lottery with a 1-in-2^100 chance of winning. Our confidence would then go way, way down in one of our participants winning a lottery. Each person's winning or losing would now be independent, and would match the probability of the coin game, at least for those in our initial set of 2^100 people. But does this appease my intuition? Or does it just recreate the problem, where now I'd like to see each person in each of the distinct lotteries flip 100 coins? That is, it's unlikely that we'll see one of our subjects win the lottery (much less all of them), but within the ranks of each of those lotteries, someone will win. And they can all also, within each rank, reproduce the coin-flip experiment. And each person playing all the games has a 2^−100 chance of winning their own lottery, and a 2^−100 chance of landing 100 heads.
This means, overall, that 2^100 people will win a lottery. But of the 2^200 people playing in total (that is: 2^100 × 2^100), it'll require much work to convince me that even one will get 100 heads, despite the above equation putting the probability of at least one set consisting of 100 heads in this scenario at practically 100% (Wolfram|Alpha spits out 1.0000000… with row after row of 0's).
There’s more to unpack here, but I’ll leave it at this. My central point is that there’s something different about an individual having a probability of 2-100 for winning a lottery (that someone is sure to win), and an individual hoping to flip 100 heads in a row with that same probability. This admittedly confused dive I’ve taken into this problem hasn’t helped.
A recurring complication for me is that we can always come up with an equation to instill "confidence" in as many consecutive heads as we like. At some point, I lose all faith in the model. If not at 100 heads, then certainly by a nonillion. But I have no problem with someone winning a lottery that carries the same odds. And if you can build a die with 2^100 faces, then I have no problem with the idea that each of them will eventually come up with enough rolls. This is similar to the card shuffling example. All of those results should be included in the sample space as genuine possible outcomes.
This brings me again to the embarrassing thought that my notion of a coin being fair may require excluding many perfectly reasonable permutations of 100 coin results from the sample space. Or, at least, to the idea that we should leave them in as low-probability events, as with the case of a sample space that includes the .0001-inch tall adult. Though my deeper concern is that it's more like leaving in a contradiction, as with the event that it both rains and doesn't rain today.
At any rate, my intuition always gets the best of me when it comes to coins, even with relatively short sequences of flips. Imagine 1,000 people in a room. Each flips a quarter n number of times. What’s the probability at least one person gets all heads? The probability shrinks fast (this is from Desmos; Wolfram|Alpha holds out longer before rounding to 0):

These numbers will of course be smaller for smaller numbers of people. A hundred people have about a 95.82% chance of at least one person getting five heads in a row, and about a 9.31% chance of getting ten in a row. But even with 1,000 people, the numbers start to look hopeless pretty fast.
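Those percentages are easy to reproduce (my sketch, same formula as above):

```r
# Sketch: probability that at least one of N people, each flipping n times,
# gets all heads.
p_at_least_one <- function(N, n) 1 - (1 - 0.5^n)^N

p_at_least_one(100, 5)     # ~0.9582
p_at_least_one(100, 10)    # ~0.0931
p_at_least_one(1000, 20)   # ~0.00095, already hopeless at 20 heads
```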
I'll wrap up this point by reiterating the nagging feeling that there's some important distinction between these two examples: With four people in the raffle, each has a 1/4 probability of winning, and one is sure to win. But if each flips a coin twice, each with a 1/4 probability of getting two heads, there's a (3/4)^4 ≈ 31.64% chance that no one gets two heads (equivalently, only a 1 − (3/4)^4 ≈ 68.36% chance that at least one person does). I realize that there are differences in the designs of these situations that account for this, but no exploration of those differences has satisfied my skeptical intuitions.
(25) I keep talking about all heads, but anything said of all heads goes for all tails. That doubles the probability for getting 100 consecutive identical results in 100 flips—e.g., the probability of getting all heads in two flips is 25%, and the probability of getting all heads or all tails in two flips is 50%: {HH, HT, TH, TT}. Fair enough. Assume I'm aware of this but don't view the probability increase in long-sequenced cases as important. In the case of 100 flips, we go from 2^−100 to 2^−99; an unimpressive increase of precisely 1/1,267,650,600,228,229,401,496,703,205,376 (i.e., 2^−100).
We can make similar adjustments by including some great quantity of seemingly patterned outcomes—e.g., HTHTHT…
I haven't thought about how to calculate the quantity of those, but I think a place to start is with a restriction on how long a run of consecutive identical outcomes the string can contain. For example, alternating 27 heads and 27 tails might be fine, while alternating 28 of each isn't. This restriction, which is due to the usual fairness bias, reduces the possible patterns considerably.
(26) I don’t expect a tremendously long and patterned coin-toss outcome any more than I’d expect pebbles plopped upon a piano’s strings to produce a Chopin étude, or an ant to trace a photo-realistic likeness of Winston Churchill in the sand21. And yet something must happen in those cases: the pebbles will strike some order of notes and the ant will trace something into the sand. But never those things, not in a nonillion years of constant trying.
(27) I'll allow myself just a few seconds to talk about monkeys and typewriters and Shakespeare. It's a ridiculous idea that only carries any plausibility at all if you conceive of the monkeys as a theoretical machine that randomly generates letters, with some probability of producing any given letter. Otherwise, there's nothing to stop the monkey from banging on the same cluster of letters for an eternity or, more likely, just pooping all over the typewriter. Which, by the way, is what some British researchers observed when they tried an approximation of this experiment back in 2003, when they left a computer in a zoo enclosure inhabited by six macaques. The researchers apparently described it as more "performance art" than science, but I appreciate their efforts. (I haven't looked up the team's report on their experiment, but rather am getting my info from this 2003 Wired article: "Monkeys Don't Write Shakespeare.")
That said, even with a random letter-generator, my intuition would need several other restrictions on the scenario in order to change from a flipping-n-heads-in-a-row scenario to a shuffling-a-deck-of-cards scenario (not to be confused with Laplace’s “Constantinople” example). I’d get into those, but my few seconds are up.
(28) And no matter how many times a fair coin is flipped, its particles won't mid-flip rearrange themselves into a tiny dragonfly and buzz away22. Or even just into a two-headed coin. Nor should we expect the coin to phase through the floor, passing through it without disturbing the floor's physical integrity. Talk of ridiculously low probabilities—of bizarre freak occurrences—comparable to those noted in these last few points will come up as serious propositions in later chapters of Bostrom's book.
(29) It would be fascinating to empirically test what happens with huge numbers of coin flips. How about a webpage that allows people to enter coin flip results? Set it up so that each person inputs one coin result at a time, time-stamped finely enough so that the master sequence is reported as a long list of sequentially input results. Publish results monthly—or maybe even yearly, and even then only including a sub-string of the results—via email to subscribers, perhaps even on different days of the month (or year), randomly chosen for a given user each update, to dissuade inputters from being influenced by subtle confirmation or "pattern-seeking" biases (namely, you shouldn't know how the master sequence ends when you're inputting your results; e.g., you might move faster to input the heads you just landed if the master list ends ….HHHHH). And so on.
This is mathematically no different than having one person flipping a coin and reporting results. Some folks could opt to roll a die and call even numbers heads and odd numbers tails. Doesn’t matter. The main criterion is that the result be from a 50-50 physical process, rather than from a computer program, unless the program is using some 50-50 physical process to determine its output sequence. I suppose a machine could drop a thousand quarters into a kind of blender that spits them out onto a trampoline and records the results as they come out. There would be something deeply satisfying about humans alone collaborating on this project, however. No?
Still, I'd of course lend just as much credence to the robot-trampoline. Love to see it! (Something like the machine featured in this Numberphile video: Fair Dice (Part 2).)
As for the people-flippers, if one billion honest players each entered twenty coin flips per day for a year (assuming 365 days per year), that would amount to 7.3 trillion coin flips. The longest run of consecutive outcomes expected in that many flips, on average, is about 42. I'd be surprised to see 42. But seeing it would increase my confidence in ever seeing 100 in a row.
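The "about 42" figure comes from what I take to be the same rough approximation used back in point (15), namely log2(n/2), which gave the 9.97 figure for 2,000 flips (a one-line sketch):

```r
# Sketch: expected longest run, using the log2(n/2) approximation from point (15).
n <- 1e9 * 20 * 365    # 7.3 trillion flips
log2(n / 2)            # ~41.7, i.e., a longest run of about 42
```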
How would we test for honesty? We can’t. The idea here is to set aside the predictive models and let the coins do what they will. If at some point 42 tails in a row happens, even if that’s the very first string of 42 results, we accept it. We can’t say, “That should happen later,” otherwise we commit the Gambler’s Fallacy: there’s no physically dictated or God-ordained point in a 7.3 trillion flip sequence at which 42 tails will happen. Though the average number of flips to expect to get 42 tails in a row is about 8.8 trillion. (And, as always, if you’ve done 8.8 trillion flips and haven’t seen 42 yet, expect to need, on average, another 8.8 trillion flips to see a 42-long string).
Imagine there were some God-ordained point at which 42 heads should happen. Before calling “heads” or “tails” when flipping one of the coins in your pocket, you’d want to first know how many times that coin has been flipped or jostled around or dropped or spun since being minted. Same goes for any die in the casino. Paradigmatically absurd. If you’ve gotten 8 heads (not hard to imagine), you’re now just as likely to get 9 heads as you are to get a tails. And so on ad infinitum.
In that light, I am embarrassed to say that I intuitively believe there to be some restriction on how many heads can be landed in a row, so that the ad infinitum rule is false. But of course that rule is true. Whatever I've flipped just now, there is a 50% chance of flipping that same thing again, not a 0% chance. And of course that rule is false, given that fairness constrains a coin's behavioral tendency, when that coin is equally biased for heads as it is for tails, so that it will never land more than some unknown number of heads in a row—on some level, that must be just what we mean by the bias we call "fair." A strange sort of bias; that is, one ensuring that the coin won't land more than some number of heads before landing tails. It's a higher-order constraint.
After looking as closely as I can at this claim, after vigorously scratching my head over its weirdness (I’ve just about dug through the skull into the squishy brain tissue)—I feel, simultaneously, that it cannot be wrong and cannot be right.
Wrong at the lower-order, right at the higher-order.
(30) But even if you convince me that 10² or 10³ and so on heads in a row are possible, at some point this becomes a merely theoretical possibility23, certainly not a literal physical possibility, and not only because the coin will disintegrate (replace it with a new one; it need never be the same coin over any two flips) or because the universe will collapse. It can't be that a fair coin can literally, physically produce a run of heads of any finite length you like, because there is no largest finite number. And I reject outright any notion involving literally infinite strings of all heads, even as a subset of a bigger infinity. Partly because it is just too foreign to me—maybe I don't yet know enough about the relevant math (e.g., measure theory) to ingest and discuss this thoroughly. The point, though, is that I absolutely, emphatically reject the idea of a fair coin ever producing a run of heads as long as the biggest number our most powerful supercomputer can concoct.
(31) Two final comments about getting surprising sequences from large numbers of coin flips. Think of these as questions—as challenges—for those who think there is no real-world, physical restriction on which sequences of many flips are possible.
(31.1) I'd bet that anyone who sees a coin land heads 100 times in a row would bet on something other than the coin being fair or fairly flipped. It's a common recommendation, in fact, and one I mention others giving in my first writing on this topic (i.e., regarding Taleb's Fat Tony example), and I touch on it here when I write of statistics professors who can spot faked coin flip results from their students. Should a student turn in HHHHH… or HTHTHT… or some such result, the professor would no doubt suspect a blatantly cheating smart-ass (and rightly so!).
But on what grounds? It can't be because the result is too improbable. It's been hammered into me since my first college psychology course that any two equal-length sequences of coin flips are equally likely. So why should one set of results raise suspicion when another set of results, with the exact same probability as the suspicious set, does not? Easy: the suspicious (or "surprising") results don't obey the laws of fairness. But we have to be careful here. We can't say, "Fairness bias says you'd have to flip a coin more than 100 times to get these results." That's the Gambler's Fallacy: imagine the student used a 1,000-year-old coin and said, "Well, the coin is old and has been jostled around for centuries, so these results were due." Fairness bias means something else—something that is observable in relatively small sequences of results. No?
(31.2) That said, as someone who is kinda obsessed with watching lectures and reading on these things, I do often encounter qualificatory language from experts hinting at my point of view—e.g., "it's mathematically possible to get xyz, but it won't ever happen." Though, on other days, they might simply say "possible" without qualification. I suspect they mean "mathematically" or "logically" possible there, but who knows. I've also had this discussion with non-experts who say they really do mean literally, physically possible. And sometimes with experts (though, really, what qualifies one as an expert in this particular domain?), who really do seem to mean literally, physically, "it will eventually happen with enough flips" possible.
I ran into what is maybe such an example today from someone with a PhD in math referring to 200 heads in a row. From a Quora entry on Gambler's Ruin: "There's a small subtlety here: is there a positive chance that none of us wins, and we just keep playing forever? It should be quite intuitive that this is not so—after all, if we keep flipping the coin long enough, eventually we will get a straight sequence of 200 Heads or 200 Tails, at which point one of us will have surely won." The idea here is, roughly, that if you and I play a game where I give you a dollar when a coin lands heads, and you give me a dollar when it lands tails, one of us will eventually win if we play enough times, assuming we start with a finite amount of money (in this example, there's $200 between us combined). Maybe the claim isn't that we'd ever literally get 200 heads in a row, but rather: We need not worry about getting exactly symmetrical results forever, so that neither of us wins, on the grounds that if (and that's a big IF) the game goes unwon over a huge number of flips, then we must be in a world where 200 heads in a row is indeed possible. However, just like when people talk about strange results in physics and math more generally, I'm not always clear when speakers mean those things as literal descriptions of the concrete world, and when they mean them as an interpretation of an idealized model—and I'm not sure the speaker is always clear on these things, either, or is ever prepared to admit to the more extreme interpretations of their beliefs. I'll be writing more about this soon, with examples from lectures, public talks, interviews, etc.
At any rate, I have not encountered anyone bluntly saying, “More than x number of heads in a row is physically possible,” as I, on behalf of my naive intuition, am claiming here. Though I tend to assume this is at least weakly implied by statements like, “mathematically possible, but won’t ever happen.” (Consider a made-up example that isn’t too far off from clearly tongue-in-cheek examples I’ve encountered in actual lectures: “Our model allows for negative human heights, but those have such a low probability of occurring that we’d be very unlucky to encounter one,” where the “low probability” is actually around what it would take to get ten heads in a row, which is to say much higher than what it takes to get 100 heads.)
My question then becomes: Where is the line between "will rarely happen" and "will never happen"? Between real-world possible and real-world impossible? We don't know where to put that boundary, but there must be one. I can see leaving this open-ended, as we do with a geometric expectation (e.g., in counting how many tails we expect on average before getting our first heads, we set the upper bound for tails to infinity), or with something more empirical, like height, where there are, for example, theoretically infinitely many heights even just between 107.1 inches (i.e., 8'11.1″, which was Robert Wadlow's height) and 107.2 inches. We may not be able to put a precise upper bound on height, but we can approach probabilities about heights empirically, e.g., "The tallest human on record is such and such a height, while the average adult male or female human in such and such a region falls within such and such a range," as well as theoretically, e.g., "10 feet seems like a good bet for max height, according to such and such physical constraints." I believe there are people working on this very question, in fact. On my need-to-read list, for example, is Geoffrey West's 2017 book, Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies; West's work does, I believe, involve developing mathematical models to capture things like height and mass.
At any rate, coin flip results are discrete, so any literal boundary will be a whole number. If we could flip enough coins, we could work towards that number empirically. Though, again, if you’ve landed x heads just now, the probability for x + 1 is again 50%. So maybe our best mathematical model would build on what’s been observed in order to produce a projected estimate that may contain decimals. Would there be any point to trying to develop such a model? What would be the implications of success or failure to do so?
This is fun, but I think we’ve exhausted my naive intuition for today. I’ll need to sleep on things to see if I wake up with this baby tooth looser or yet more stubbornly rooted.
(32) Finally, a reminder that a central, or at least background, idea here has been about the distinction between surprising and unsurprising improbable events. I’ve used 100 heads in a row to represent the former, and 100 random-appearing coin flip results to represent the latter; both events have the same probability of occurring.
Finding ourselves in a universe like ours is improbable. More precisely, that exactly this universe should have come into existence and ended up in the state it’s in right now is incredibly improbable. Surprising or not? For that matter, consider a subset of what we call “exactly this universe”: is any given person’s having been born surprising? In other words, should it be surprising that this universe has taken on the very particular state of being a universe that contains “me”?
I’ve never met a small child amazed by the fact of her own existence. Nor any adult, really. After all, that someone was born isn’t surprising.
But what is or isn't surprising has consequences beyond what I've discussed here. Maybe the question "Did the defendant do it?" comes down to the question, "How surprised would we be to learn the defendant is innocent given the evidence?" How would it help or hurt to rightly point out that the probability of innocence is no more than that of the unsurprising event we just observed when the prosecutor flipped ten coins to get TTTTTHTHHH with 1/1024 probability, to which the defense attorney might reply, "Yes, but do it again and try predicting the result, with 1/1024 probability of your guess being correct."
As I've already noted, this may be the sole reason 100 heads in a row, or any other patterned result, is surprising. We are, by nature, in a kind of constant state of trying to predict what will happen next. And when, with coin flips, we're able to do that with 100% accuracy, that is surprising. Surprising, for one thing, because we've determined that our beliefs in that case should be felt and apportioned in degrees: .5 for heads and .5 for tails. Hey, maybe that's all there is to it!
The word count here is already high, so I’ll break this particular chapter of Bostrom’s book into three parts total. Part III will review the remainder of Bostrom’s discussion of surprising vs. unsurprising improbable events in the context of anthropic arguments and fine-tuning in cosmology. Coming soon-ish…

Footnotes:
- α represents “our universe” or “whichever universe happens to be ours.”
- Refers to Leslie’s book Universes, which I haven’t read. Leslie appears to have the most works cited by far in Bostrom’s bibliography.
- This is van Inwagen’s book Metaphysics, the fourth edition of which was published in 2015. I haven’t read it.
- (van Inwagen 1993, p 204).
- “Extraordinaire” in French.
- "par la pensée" (i.e., "by thought")
- Available in the 2011 volume Philosophy of Probability: Contemporary Readings. See pp 362–363.
- SIAM Review Vol. 49, No. 2 (Jun., 2007), pp. 211-235.
- Or maybe he says, “…from a macroscopically sized object.” But it is so improbable that we treat it like a law.
- Suppose the box of gas is divided by a diaphragm. The gas on side A is hotter than the gas on side B— that is, the A molecules are moving faster, with greater energy. As soon as the divider is removed, the molecules begin to mix; the fast collide with the slow; energy is exchanged; and after some time the gas reaches a uniform temperature. The mystery is this: Why can the process not be reversed? In Newton’s equations of motion, time can have a plus sign or a minus sign; the mathematics works either way. In the real world past and future cannot be interchanged so easily. …
[James Clerk Maxwell’s] point [in his letter to John William Strutt] was that in the microscopic details, if we watch the motions of individual molecules, their behavior is the same forward and backward in time. We can run the film backward. But pan out, watch the box of gas as an ensemble, and statistically the mixing process becomes a one-way street. We can watch the fluid for all eternity, and it will never divide itself into hot molecules on one side and cool on the other. The clever young Thomasina says in Tom Stoppard’s Arcadia, “You cannot stir things apart,” and this is precisely the same as “Time flows on, never comes back.” Such processes run in one direction only. Probability is the reason. (pp. 273–274 of Kindle edition)
- Also known as the "birthday paradox," it's the result that, counterintuitively, with only 23 random people in a room there is slightly over 50% probability that at least two share a birthday. A hint as to why this is so is that you can make 253 pairs of people given 23 people, intuitively yielding more opportunity for a match than one initially supposes. For a great presentation of the Birthday Problem, see this lecture by Joseph Blitzstein on YouTube: Lecture 3: Birthday Problem, Properties of Probability | Statistics 110. It's part of a lecture series that I recently discovered but wish I'd known about years ago. I highly recommend it to anyone looking to take their first steps towards thinking rigorously about probability, or just to get a solid review of first principles from an instructor who tries hard to make concepts intuitive. There's also a website that includes class handouts, including homework assignments with answers. There is also an EdX version of the class. I might as well also mention that the course follows an excellent and reasonably priced textbook, coauthored with Jessica Hwang: Introduction to Probability, which has its own Quora blog. There's yet more to say about the course, but I'll stop myself here.
The idea here is to figure out the probability of getting no repeats within 100,000 shuffles, then subtract that from 1. I figure that's 1 − (15!/15!) × ([15! − 1]/15!) × ([15! − 2]/15!) × … × ([15! − (n − 1)]/15!) for n shuffles. Notice that you must get a repeat by (15! + 1) shuffles. To see this, try it with a smaller number. There are six ways to arrange 3 cards; after 7 shuffles, you must have at least one repeat.
- That's 80 unvigintillion 658 vigintillion 175 novemdecillion 170 octodecillion 943 septendecillion 878 sexdecillion 571 quindecillion 660 quattuordecillion 636 tredecillion 856 duodecillion 403 undecillion 766 decillion 975 nonillion 289 octillion 505 septillion 440 sextillion 883 quintillion 277 quadrillion 824 trillion 1 shuffles.
- In a future post, I’ll venture an intuitive justification for why this works. I’ll just say here that it’s a species of “coupon collector’s problem” and is analogous to asking for the average number of rolls of a fair, 6-sided die to see all six faces. A popular question—e.g., at Math StackExchange.
- With three flips and eight people, numbered 1, 2, 3, 4, 5, 6, 7, 8. For the first flip have set {1, 2, 3, 4} guess heads and set {5, 6, 7, 8} guess tails. For the second flip, perform the same process on those subsets, so {1, 2} guess heads and {3, 4} guess tails and {5, 6} guess heads and {7, 8} guess tails. Finally, repeat the split so that {1} guesses heads, {2} guesses tails, {3} guesses heads, {4} guesses tails, {5} guesses heads, {6} guesses tails, {7} guesses heads, and {8} guesses tails. This ensures that each person has guessed a different sequence, and all possible sequences have been guessed.
- There’s a common tendency to characterize humans as bad at producing “genuinely” random lists. But it seems to me that this doesn’t mean that human minds aren’t randomness machines of a certain and important kind. More on this another day.
- Source: “The Longest Run of Heads” by Mark F. Schilling, The College Mathematics Journal, Vol. 21, No. 3 (May, 1990), pp. 196-207. Also available at JSTOR. Schilling got these from here: P. Révész, “Strong Theorems on Coin Tossing,” Proceedings of the International Congress of Mathematicians, Helsinki (1978) 749-754.
- Page 269 of Silver’s book. For the Rasskin-Gutman quote, Silver cites this 2010 NY Review of Books article by Garry Kasparov: “The Chess Master and the Computer.”
- This paper shows how to derive the formula: “How Many Coin Flips on Average Does It Take to Get n Consecutive Heads?“
- Even better, here’s a 120-sided die that I recently received as a gift and immediately started annoying people with. What’s the expected number of rolls to seeing all 120 sides? A question for another day.
- The ant example is due to Hilary Putnam.
- A variation on an example I heard my epistemology professor use in college, involving the logical possibility of a flipped coin transforming into a butterfly.
- In the aforementioned post on Nassim’s Fat Tony example, I toy with holding these ideas up to different sorts of possibility: practical, logical, etc. But it’s by no means a rigorous effort.