[NOTE: I’ve written an extended follow-up to this post, which I’ll also link at the bottom of this page: “Anthropic Bias (Ch 2, Part 2): Fine-Tuning in Cosmology & 100 Heads in a Row.”]
In his book The Black Swan1, Nassim Nicholas Taleb, a fellow urban slow-walker, imagines posing the following question to two characters, the rational & educated Dr. John and the intuitive & streetwise Fat Tony:
Assume that a coin is fair, i.e., has an equal probability of coming up heads or tails when flipped. I flip it ninety-nine times and get heads each time. What are the odds of my getting tails on my next throw? 2
Dr. John refers to the question as trivial and gives the mathematically correct answer of one half. Fat Tony calls Dr. John a sucker and says, “no more than 1 percent, of course … the coin gotta be loaded.”
This gets at a critical disconnect, noted often by Taleb in The Black Swan, that arises when we try to generalize a real-world-applicable probability calculus from neatly devised games (a mistake he aptly calls the ludic fallacy). This distinction between probability models and the real world is one I often struggle with in my ongoing attempts to understand the tense relations between formal probability, intuition (what I sometimes call informal probability)3, complexity, and epistemology (i.e., belief, opinion, knowledge). In short, I’m with Fat Tony: If I saw someone throw 99 Heads in a row, I’d think the game rigged.
But Taleb’s example urges us to go further than suspecting foul play. I presume the rational Dr. John would also be skeptical in that situation. What’s questioned in the example, rather, is whether we should accept such a scenario even on conceptual or theoretical grounds. This is what Fat Tony refuses to do by rejecting the thought experiment itself.
I’d like to explore this theme further, starting with a similar question: What is the probability of throwing 100 Heads in a row?
Some thoughts (I used Wolfram|Alpha for the math):
This is an easy question to answer. The probability of flipping a fair coin and getting 100 Heads in a row is 1 in 2^100. That’s 1 in 1,267,650,600,228,229,401,496,703,205,376.
Or, written out: 1 in 1 nonillion 267 octillion 650 septillion 600 sextillion 228 quintillion 229 quadrillion 401 trillion 496 billion 703 million 205 thousand 376
Or, in decimal form: .0000000000000000000000000000007888609052210118054117285652827862296732064351090230047702789306640625
In other words, the probability is very, very, very, very low. Not zero, but might as well be.*
[*Note: To be clear, even a probability of precisely zero does not mean impossible. We might, rather, think of it as meaning there’s no degree of certainty that it will happen. For an undercooked, but I think adequate, example, the probability is zero for randomly pulling the number 2 from a bag containing the real numbers between 1.5 and 2.5. But some number will come out of the bag—a number that also had a probability of zero of appearing. Same goes for, say, the probability of randomly selecting, on a NYC street, a man who’s precisely 5′ 8″ tall. For more on this, see this short video: “An Introduction to Continuous Probability Distributions“; and this Wikipedia entry, “Almost Surely,” which phrase I’ll use in a sentence: “A probability of zero means the event almost surely won’t happen.” And here’s a relevant and interesting quote from the article: “Any infinite sequence of heads and tails is a possible outcome of the experiment. However, any particular infinite sequence of heads and tails has probability zero of being the exact outcome of the (infinite) experiment.”4]
And the probability of getting at least one Tails in 100 flips is: 1 – (1/2)^100.
Or, as a fraction: 1,267,650,600,228,229,401,496,703,205,375/1,267,650,600,228,229,401,496,703,205,376. (Numerator and denominator differ only in their final digits: 5 versus 6.)
Or, as a decimal: 0.9999999999999999999999999999992111390947789881945882714347172137703267935648909769952297210693359375
In other words, very, very, very, very high. Practically 1.
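These figures are easy to verify with exact rational arithmetic. Here’s a quick sketch in Python (my own check, not part of the original argument), using the standard fractions module:

```python
from fractions import Fraction

# Probability of 100 Heads in a row from a fair coin: (1/2)^100
p_all_heads = Fraction(1, 2) ** 100
print(p_all_heads.denominator)  # 1267650600228229401496703205376

# Probability of at least one Tails in 100 flips: 1 - (1/2)^100
p_some_tails = 1 - p_all_heads
# Numerator and denominator differ only in the last digit (...375 vs ...376)
print(p_some_tails.numerator)   # 1267650600228229401496703205375
```

Exact fractions avoid any floating-point rounding, which matters when the quantities of interest differ only in their 31st significant digit.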
(For the record, seven is the number of flips you need to execute in order to breach 99% confidence of landing at least one Heads: 1 – (1/2)^7, or .9921875.)
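That seven-flip figure can be found by brute force; a tiny sketch in Python (my own, under the fair-coin assumption):

```python
# Find the smallest number of flips n giving at least 99% confidence
# of seeing one or more Heads: the first n where 1 - (1/2)^n >= 0.99.
n = 1
while 1 - 0.5 ** n < 0.99:
    n += 1
print(n, 1 - 0.5 ** n)  # 7 0.9921875
```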
These odds don’t stop many of us from saying, “It could happen.” I don’t believe it could happen, not in any predefined sample space. By which I mean: the coin flips could be spaced 100 years apart, or could happen in different countries; you could flip the 100 coins all at once, or you could randomly select 100 results from all the coin flips that have ever happened in this or any other world. So long as you are clear about which coins and which flips before you know the results of those flips, you’re never going to get 100 Heads. (It goes without saying, then, that it wouldn’t do to find 100 Heads that came up somewhere in the world over the last 24 hours and draw a circle around them.)
I’m of course even more skeptical about flipping one thousand or one million or one trillion Heads, though all of those are mathematically intelligible. The math is easy: each throw has a .5 chance of coming up Heads, even after 999,999,999,999 Heads in a row. But this doesn’t mean that getting even a mere 100 Heads in a row is not practically (perhaps also physically; distinctions are described below) impossible without rigging the game.5 It seems to me that it is, in fact, impossible.
There is a certain sense, however, in which the seemingly impossible—according to the probabilistic terms I’m outlining here—does happen in the real world: If you flip a coin 100 times, you will get some arrangement of Heads and Tails, and that arrangement will have a probability of (1/2)^100 of occurring, which is the same probability as getting all Heads. So why would I think that arrangement is possible but all Heads isn’t? Presumably for some reason in the ballpark of why Taleb doesn’t have a story where Fat Tony rejects 99 seemingly random results on the grounds that “that particular string of results has a probability of (1/2)^99 of happening!,” even though 99 Heads in a row would be just as random, provided the coin is fair.
(Note: As I’ll mention again below, this writing is motivated by a naive intuition that I can’t shake. That’s what this post is really about, at its core: any arrangement of Heads and Tails that seems organized to a human mind will arouse suspicion. And yet, any long sequence, even a “random-appearing” one, will be extremely rare; indeed, given enough flips (perhaps as few as 100), it will likely be the only time that exact sequence ever occurs on this or any other planet.)
At any rate, something is going to happen when you flip the coin. Just not what you happen to predict. Choosing one out of 2^100 outcomes, I’d say you have practically no shot at all of seeing that outcome come up; but, something with the same probability of occurring will come up. The number of those other outcomes is 2^100 – 1. Big enough that I’d bet anything on any sequence you predict ahead of time being hopeless (though, again, this doesn’t mean impossible, in at least some important sense I’m struggling to comprehend; there’s nothing obviously special about all Heads except that it’s so easy for the human mind to organize, so that we seem to stay on the lookout for it—it’s the prediction we keep failing to see come true; at the same time, what motivates this writing is the seeming impossibility of getting the exact same outcome over many many trials from a sample space whose other outcomes have a probability of greater than zero of occurring; those outcomes need not be equiprobable: a biased coin that produces Tails 99% of the time could, theoretically, land Heads 100 times in a row, and that occurrence could be predicted by some simple math). Further, any sequence involving what would count to a human as a recognizable pattern—i.e., that gives the impression a reliable prediction may be made—would also suggest foul play:
I don’t know what the probability is of getting one from the set of all seemingly patterned outcomes, but it must be extremely low. In fact, what I seem to be claiming here, if I understand myself correctly, is that one of the facts, or features, entailed by a coin’s being fair is that such sequences are impossible.6 Still, to be clear, and if we take seriously (or at least wish to remain consistent with) the theoretical possibility of such a sequence, there is no pattern here, no natural rule that’s being enforced, but only the observer-dependent illusion of one, even if that observer gets lucky enough to guess subsequent flips correctly by believing in the apparent pattern. The odds, however, are overwhelmingly in favor of the pattern going off the rails—these simply aren’t the sorts of results to expect (or hope for) from a random sequence of fair coin flips.7
And yet, everything that happens is in some way an extremely rare event—an unsurprising thing for me to say, given my belief that any event only ever happens once. Types of events, however, do repeat. When you walk along a sidewalk, no one could have expected well in advance that you—as that particular arrangement of those particular particles, with your unique history, etc.— would at that time step on that particular arrangement of particles in that particular region of space. So, while part of what’s at the heart of my exploration here is an attempt to make sense of how to treat highly rare events, I should make clear the importance both of significance and of whether the rare event in question is a token (e.g., a particular step upon the ground, which only happens once) or a type (e.g., generic steps on the ground, which happen often).
Expectation, or to avoid that technical term, hope seems to play a critical role here: you won’t get the sequence of flips you hope for. Go ahead and try it as many times as you like. But in the case of all Heads or any other apparent pattern, hope does not—or need not—precede a run of flips; i.e., we’d be surprised to see such a run regardless of not having hoped for or predicted it, and even if we’d never before imagined such an outcome. My claim about 100 Heads in a row never happening seems, then, to commit me to similar claims not just about all Tails, but about the emergence of any reliably predictable pattern. And yet, I’m faced with the conundrum that some equally unlikely sequence is sure to occur! Though this becomes less of a conundrum given the coin’s fairness: when Heads and Tails each account for in the ballpark of 40–60 flips out of the 100, there’s nothing remarkable going on. Unless, again, it does so too tidily; e.g.: THTHTHTHTHTHTHTHTH… . A fantastically remarkable result; and, I seem to claim, a practically impossible one.
So far, I’ve been focusing on the physical (or practical) possibility of flipping 100 Heads in a row. There may, however, also be a theoretical (or conceptual or logical) problem here. The way we know that a coin is fair is not just by declaring its possible flip outcomes (of which, frankly, there are more than two) as equiprobable in theory, but by flipping it many times and observing that it lands on each of its two faces roughly half the time.8
(Though, interestingly, not if that result is HTHTHTHTHTHT… or a similarly predictable pattern. Were a coin rigged to yield that pattern indefinitely, supposing it starts on Heads after its first, freshly minted flip, it would not at any point be a 50:50 coin, but would have a probability of 1 of landing next on the alternate side. This, though, only works for the observer who knows what the previous flip result was! For the ignorant observer, even if the coin’s tendency is known, the probability of the first flip observed—i.e., the next flip—will revert to 50:50.)
If you flip a coin 100 times and it lands only on one side, it’s by at least some definitions not a fair coin. Certainly this would be the case if you threw it another 100 times, and then another 100, and it continued to come up on the same side. These are supposed to be theoretically intelligible scenarios. We’re supposed to say that, after 300 throws, whatever those outcomes, the probability of getting Heads on the 301st throw is one half. But clearly this is not a fair coin.
It seems, then, that it is an oxymoron to invoke together the words “fair coin” and “100 Heads in a row.”
In other words, it may be logically incoherent to posit a coin that has an equal probability of coming up Heads or Tails, and to then describe a scenario in which that coin comes up only Heads for some huge number of flips. Just as it would not make sense to characterize a coin as heavily Heads-biased and then describe a scenario in which it comes up Heads only roughly half the time in some huge number of flips.
What counts as a huge number of flips? Five Heads from a fair coin is unremarkable. Slightly remarkable is the fact that I just now picked up a quarter to see how many Heads I might get. First try, I got five in a row. The sixth toss was Tails. Six in a row, by the way, is about the longest sequence you should expect (in the technical sense of the word) to get of Heads or Tails in 100 flips. This can be calculated using the formula log base 2 of 100 (where 2 comes from dividing 1 by the probability of getting Heads, and 100 is the number of flips)9. Using this formula, we see that we’d need about 10^30 flips in order to expect the longest string of Heads or Tails to be 100.
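As a sanity check on that log-base-2 rule of thumb, one can simulate longest runs directly. A sketch in Python (my own; it assumes a fair coin and counts runs of either face):

```python
import math
import random

def longest_run(flips):
    """Length of the longest run of identical outcomes in a sequence."""
    best = cur = 1
    for prev, nxt in zip(flips, flips[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

random.seed(1)
n, trials = 100, 5000
avg = sum(longest_run([random.randrange(2) for _ in range(n)])
          for _ in range(trials)) / trials
print(round(math.log2(n), 2))  # 6.64, the rule-of-thumb estimate
print(avg)  # the simulated average lands in the same ballpark
```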
(And we can use another formula to see that, theoretically, we’d expect it take 2.5353012005×10^30 throws on average to get 100 Heads in a row: (2^(100+1))–2. You can change the 100 here to however many throws you’re hoping to get in a row. For example, it’ll take, on average, (2^(10+1))–2 = 2046 throws to see 10 Heads in a row. For a rigorous analysis, see this PDF: “How Many Coin Flips on Average Does It Take To Get n Consecutive Heads?“)
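The waiting-time formula is likewise easy to check by simulation for small run lengths. A sketch (my own, under the fair-coin assumption; the PDF linked above gives the derivation):

```python
import random

def expected_wait(n):
    """Theoretical mean number of flips before first seeing n Heads in a row."""
    return 2 ** (n + 1) - 2

print(expected_wait(10))   # 2046
print(expected_wait(100))  # about 2.54 * 10**30

def simulated_wait(n, rng):
    """Flip until n Heads in a row appear; return the number of flips used."""
    flips = streak = 0
    while streak < n:
        flips += 1
        streak = streak + 1 if rng.random() < 0.5 else 0
    return flips

rng = random.Random(7)
trials = 20000
avg = sum(simulated_wait(3, rng) for _ in range(trials)) / trials
print(avg)  # hovers near expected_wait(3) == 14
```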
At any rate, on getting four Heads in a row, it’s obviously far too soon to get suspicious. And it would certainly be an instance of the Gambler’s Fallacy to assume any sort of interdependence between the flip results, or that mysterious forces—natural or otherwise—are influencing flip outcomes (cheaters notwithstanding). So, four Heads in, you should assign .5 to the likelihood of the fifth throw yielding Heads—which is to say that you now have a .5 chance of having thrown HHHHH, as well as of having thrown HHHHT. (To underscore the independence of the flip events, imagine throwing the first four Heads today, then returning ten years from now to throw the fifth toss.)
I’d say the same about six, seven, eight, nine throws. Twenty in a row should be doable—that’ll happen in about one in 1.05 million runs. Maybe even 30 is fine. But at some point, there must be a line where the likelihood of getting all Heads in a row becomes problematic. Where is that line? Certainly far before, say, one nonillion—there is no world in which that happens (feel free to enlist billions of coin-flippers to test this, so it’s not a matter of running out of time). Somewhere, the line from possible to impossible is crossed. There also may be some intermediate lines, such as between logically possible and practically impossible (meaning, we think maybe it could happen, but it’s certainly not something to expect to see; while impossible—whether physically, logically, or metaphysically—explicitly means it’s never going to happen on this or any other world, even in an infinite number of flips).
I think the possible–to–impossible line lives somewhere before 100 Heads (or, as noted above, before any apparently predictable sequence of 100 flips occurs). Where that line is, as far as I know, cannot be intelligibly said within current probability theory. That is, if I say it’s at 35, that would mean that, after 35 Heads, a Tails would be due. But that’s nonsense, and just as much an instance of the Gambler’s Fallacy as in the unremarkable cases above. There can be no number that demarcates possible and impossible: If you make it to one Heads you have a .5 chance of making it to two Heads, and on and on up to 36 and beyond. Put in other terms, there’s no clear line where you can definitively rule the coin unfair. (At which point perhaps you might try a different route and measure the physical properties of the coin itself.) Though you can say things like, “there’s a 1 – 2^(-5), or about a 97%, chance of getting at least one Heads in five flips of a fair coin.”
And yet, I maintain that there is at least some vague line between unremarkable and remarkable results; and, somewhere beyond that, between possible and impossible (if not before nonillion Heads, how about infinitely many? In what world could that be a fair coin?). In conclusion, then, I rule in favor of Fat Tony.
In a similar vein, Evelyn Lamb, in an article for Scientific American called “Has Anyone Ever Flipped Heads 76 Times in a Row?,” examines the 76 Heads in Rosencrantz and Guildenstern Are Dead and concludes, “After crunching the numbers, I am convinced that no one in the world has ever flipped heads 76 or 90 times in a row on a fair coin…” She also writes about the topic here: “Heads I Win, Tails You Lose,” where she links a nice dialog by Ben Orlin (at Math with Bad Drawings) that conveys a similar moral to that of Taleb’s Fat Tony example: “The Swindler’s Coin.”
Some closing thoughts:
This post falls within a larger line of questioning motivated by the following naive but, for me, unshakable observation: Getting two Heads in a row from a fair coin is unremarkable. Getting 100 trillion (or some even larger arbitrarily chosen number) in a row seems more than improbable, it seems impossible. If it is, then there must be some line, even if a fuzzy one, that cannot be passed. Where is that line and what are its implications?11 It is from this perspective that I’m interested in probability—a perspective that is in line with my interest in underlying, world-making concepts in general (the word I give this perspective, as a daily practice, is philosophy).
It’s often said that, to the person who has only a hammer, the whole world looks like a nail. Probability is a tool. Its form shapes how those who use it see the world. And a lot of people use it.12 Those who deal in purer and purer concepts engage in world-building. Shape the tools and you shape the world. Dismantle the tools, you dismantle the world. Barring that, one might at least try to understand the tools. It strikes me as meaningful (if insomnia-inducing) work. I’ll keep at it.
NOTE: I’ve written a follow-up to this post, in which I dig yet deeper in an effort to uproot the naive intuition in question; the intuition survived, but it was fun digging: “Anthropic Bias (Ch 2, Part 2): Fine-Tuning in Cosmology & 100 Heads in a Row.”
- Initially published in 2007. I reference here the 2010 Second Edition.
- Page 124.
- Intuition is one part of informal probability. Also included are psychological, political, and sociological concerns. For example, it’s possible to view certain accounts of racism and sexism as critiques about informal probability; e.g., about assigning an either outsized or otherwise inappropriate likelihood to a person fitting a particular stereotype. There’s a lot to unpack here. It would be nice to see some formal research into the cognitive and social psychology of this, of the sort we see described in current work being done on cognitive bias. I think the socio-political dimension will prove challenging to study, however, given that while basic cognitive biases can be picked out given discrepancies between intuitive and objective mathematical results, there may be, for example, good reasons for ignoring or looking past certain technically valid treatments of data when trying to effect positive social change.
- I ran into a fascinating proof today showing that trying to assign an infinitesimal, rather than zero, won’t always yield a non-zero probability for possible outcomes. From page 20 of the 2011 essay collection Philosophy of Probability: Contemporary Readings (ed. Antony Eagle): “But infinitesimals in fact do not succeed in preserving Regularity in all cases. Many possible outcomes will still receive probability zero, as Tim Williamson shows. Consider an infinite sequence of outcomes of independent tosses of a fair coin, I. If the probability function is regular, I should receive some infinitesimal probability, i. If we now consider I‾ , the infinite subsequence of I that includes all of I except the first toss, we should conclude that, as the coin is fair, the probability of I‾ is twice i. But as the events of tossing I and I‾ are structurally identical, and have the same measure, the probabilities of I and I‾ are very plausibly the same. The only value of i, infinitesimal or otherwise, such that 2i=i, is zero. So even here the possible event I must be assigned zero probability.”
- For rigging expertise, see the work described in Dynamical Bias in the Coin Toss by Persi Diaconis, Susan Holmes, and Richard Montgomery; SIAM Review Vol. 49, No. 2 (Jun., 2007), pp. 211-235. They made a machine that can consistently yield the same flip result.
- For later consideration: Alan Hájek’s “Fifteen Arguments Against Hypothetical Frequentism.”
- Since writing this, I’ve come across a couple of relevant passages. The first is from Chapter 3 of Pierre-Simon Laplace’s A Philosophical Essay on Probabilities, in which he defines “extraordinary”:
We arrange in our thought all possible events in various classes; and we regard as extraordinary those classes which include a very small number. In the game of heads and tails, if head comes up a hundred times in a row then this appears to us extraordinary, because the almost infinite number of combinations that can arise in a hundred throws are divided into regular sequences, or those in which we observe a rule that is easy to grasp, and into irregular sequences, that are incomparably more numerous.
The second is from Richard von Mises’ 1957 paper, “The Definition of Probability,” available on page 355 of Philosophy of Probability: Contemporary Readings:
Another example is given by Laplace in his famous Essai Philosophique: In playing with small cards, on each of which is written a single letter, selecting at random fourteen of them and arranging them in a row, one would be extremely amazed to see the word ‘Constantinople’ formed. However, in this case again, the mechanism of the play is such as to ensure the same probability for each of the 26^14 possible combinations of fourteen letters (out of the twenty-six letters of the alphabet). … What astonishes us [here] is the fact that fourteen letters, taken and ordered at random, should form a well-known word instead of unreadable gibberish. Among the immense number of combinations of fourteen letters (26^14 or about 10^20), not more than a few thousand correspond to words. The elements of the collective are in this case all the possible combinations of fourteen letters with the alternative attributes ‘coherent’ or ‘meaningless.’ The second attribute (‘meaningless’) has, in this collective, a very much larger probability than the first one, and that is why we call the appearance of the word ‘Constantinople’—or of any other word—a highly improbable event.
This isn’t to suggest, I don’t think, that these events shouldn’t astonish. And if it spelled the name of the game-player, it would be yet more astonishing. Even more astonishing would be to pull 14 cards 100 times in a row, each time after vigorously shuffling, and getting out again and again one’s name. I would say the game is certainly fixed. I’d believe I was hallucinating, or even in ghosts, before I’d believe such a thing happened by chance.
- It’s often claimed that we could also determine that a coin is fair by measuring it. If so, what I’m claiming is that you will not find, by whatever empirical means, a coin to be fair that could then land 100 Heads in a row, much less infinitely many Heads in a row (despite the standard tendency to declare this theoretically possible; it certainly isn’t physically possible).
- Due to Paul Erdős and Alfréd Rényi. I don’t remember where I found this formula, unfortunately. I’ve also encountered (log base 2 of n)–1, where 2 comes from 1/(probability of Heads). Subtracting 1 strikes me as odd with lower numbers; e.g., I’d expect 1 run of Heads in two flips: (0)(1/4)+(1)(2/4)+(2)(1/4) = 1; and that’s what you get when you don’t subtract 1. But both versions yield decimal estimates that are in about the same ballpark as numbers grow, of course. I found the second formula here: “The Longest Run of Heads” by Mark F. Schilling, The College Mathematics Journal, Vol. 21, No. 3 (May, 1990), pp. 196-207. Also available at JSTOR. That said, I think the formula I’ve used here is a simplified shortcut for something else Erdős and Rényi proved (published first, I believe, in 1970) and which I’ve encountered in several papers, including the one I just linked: the limit as n→∞ of Rn/(log base 1/p of n) = 1 with probability 1. It should come up in Rényi’s Probability Theory book as well, though I haven’t read it.
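The little expectation computed in this footnote can be checked by enumerating all four equally likely two-flip outcomes. A sketch (my own) of that check:

```python
from itertools import product

def longest_heads_run(flips):
    """Length of the longest run of Heads ('H') in a sequence of flips."""
    best = cur = 0
    for f in flips:
        cur = cur + 1 if f == 'H' else 0
        best = max(best, cur)
    return best

# The four outcomes of two flips, each with probability 1/4:
# TT -> 0, TH -> 1, HT -> 1, HH -> 2
outcomes = list(product('HT', repeat=2))
expected = sum(longest_heads_run(o) for o in outcomes) / len(outcomes)
print(expected)  # 1.0, i.e., (0)(1/4) + (1)(2/4) + (2)(1/4), matching log2(2)
```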
- One way to look for the line would be to flip a coin for the duration of our universe’s existence and see what the longest string of Heads is. A seemingly more accurate approach would be to flip a coin for an eternity, or better yet, infinitely many coins for an eternity, and so on. None of this is possible. But an a priori or purely mathematical approach won’t tell us either. Math as we currently understand it, and in which coin flips are understood to be mutually independent events, tells us there is no answer—all we can do is measure degrees of certainty for getting a particular result, for example.
To be clear, this isn’t meant to cast doubt on the usefulness of basic models, but is rather an expression of a deeply felt perplexity about the real world, which is magnified by the tidiness of our models—indeed, we can readily calculate how many tosses would be theoretically required in order to be 99% certain of getting 100 trillion Heads in a row at some point.
It would be nice were our probability models to map neatly onto the real world, but perhaps the best we can aim for is to be aware of their limitations (as we are, say, with Euclidean geometry) while taking care not to confuse them—nor our models more generally—with the unfathomably complex real world.
Probability is a model that permeates the broader models we rely on to create our world.10 By our world, or just world, I mean the world we construct and inhabit through sensory perception, math, language, embodied cognition, probability models, art, and so on. The real world on the other hand is unconstructed and model-independent; in a word, it’s reality: the thing that our world is constructed to help us navigate. Our world may be complicated, but it’s not necessarily complex. The real world, though it has enough regularities for, say, consciousness to have evolved, is largely a complex mess full of uncertainty, irrational numbers (figuratively speaking), and other things tough for us humans to make sense of. These two modes may influence one another, and may overlap in part, but they are distinct. You and I might live in different worlds, but we live in the same reality.
- As I remarked in Footnote 3, many people use probability informally. Here’s a nice summary of probability’s formal applications, taken from the second paragraph of the Stanford Encyclopedia entry “Interpretations of Probability“:
It plays a role in almost all the sciences. It underpins much of the social sciences — witness the prevalent use of statistical testing, confidence intervals, regression methods, and so on. It finds its way, moreover, into much of philosophy. In epistemology, the philosophy of mind, and cognitive science, we see states of opinion being modeled by subjective probability functions, and learning being modeled by the updating of such functions. Since probability theory is central to decision theory and game theory, it has ramifications for ethics and political philosophy. It figures prominently in such staples of metaphysics as causation and laws of nature. It appears again in the philosophy of science in the analysis of confirmation of theories, scientific explanation, and in the philosophy of specific scientific theories, such as quantum mechanics, statistical mechanics, and genetics. It can even take center stage in the philosophy of logic, the philosophy of language, and the philosophy of religion. Thus, problems in the foundations of probability bear at least indirectly, and sometimes directly, upon central scientific, social scientific, and philosophical concerns. The interpretation of probability is one of the most important such foundational problems.