Mathematical Beauty with Hardy & Whitehead
Recently, on page 14 of G.H. Hardy’s 1940 booklet A Mathematician’s Apology (or read it for free here), I came across reference to the following sentence in Alfred North Whitehead’s Science and the Modern World (1925):
There is an erroneous literary tradition which represents the love of mathematics as a monomania confined to a few eccentrics in each generation. (p 20)
The point Hardy aims to bolster with Whitehead’s words is that the aesthetic appreciation of mathematics is more widespread than is often assumed. Hardy offers the great popularity of chess and newspaper puzzles as evidence.
To demonstrate that at least a glimpse of mathematical beauty is accessible by non-experts, Hardy shares two proofs from antiquity that can be “mastered in an hour by any intelligent reader, however slender his mathematical equipment” (p 18); they are: there’s no greatest prime number, and is irrational. “A reader who cannot appreciate them,” writes Hardy, “is unlikely to appreciate anything in mathematics.”
I’m not a mathematician, nor do I aspire to be. Math is a hobby for me. One that started as a way to deepen my understanding of probability, which itself started as a philosophical fascination with Bayesian epistemology. But eventually, I must admit, I began to experience things in math that struck me as beautiful as, say, a strikingly colored butterfly. More so, in fact, given my tendency to favor the abstract over the concrete, for better or worse. More like a singing butterfly, then.
On reading the Hardy passage, I was reminded of my first such experience, and it was not due to the aforementioned proofs (though I did “appreciate” them on first seeing them; I might have said, “Oh, that’s neat”). It did involve a pattern, however, the importance of which Hardy emphasizes:
A mathematician, like a painter or a poet, is a maker of patterns… made with ideas. … The mathematician’s patterns, like the painter’s or the poet’s must be beautiful; the ideas like the colours or the words, must fit together in a harmonious way. Beauty is the first test: there is no permanent place in the world for ugly mathematics. (p 13–14)
Two years ago, I was trying to brute-force calculate the expected number of flips for seeing the first instance of two consecutive Heads in a given number of flips (rather than using a canned formula). To my delighted astonishment, the Fibonnaci numbers emerged. And something similar occurred a few months later when I was looking for three consecutive Heads: the tribonacci numbers!
I wrote about this in a post called “Anthropic Bias (Ch 2, Part 2): Fine-Tuning in Cosmology & 100 Heads in a Row” (point #19). On reflection of Hardy’s book, it occurred to me that I might give these beautiful results their own blog post, and, as a fun bonus, derive the closed formula for the Fibonnaci numbers.
(And see the Addendum for a follow-up dice-related question emailed in by a reader.)
Here goes.
Fibonacci Numbers from Coin Flips
I discovered the Fibonacci result while trying to calculate an expectation. I’m not going to concern myself with expectation here. If you’d like to see more on expectation with explanations of streamlined solutions (including so-called “first-step analysis”), see these posts, where you’ll find links to yet other posts:
Dice Probability & Expectation: Expected Rolls to See a 6
Gambler’s Ruin & Random Walk: Probability, Expectation, Steal the Chips
Or see this paper: “How many coin flips on average does it take to get n consecutive heads?” (which appear to be lecture notes by Paul Ginsparg)
In the present post, I’m only going to concern myself with the question of counting ways to get a desired outcome, as this is where the Fibonacci (or whichever) sequence arises. Once I’ve address that question, I’ll use the results to work out some probabilities (e.g., the probability of seeing two Heads in a row within 10 flips).
Before getting started, I should note that I have since run into the sorts of results I’m about to explore in the context of binary strings (as noted in the Wikipedia entry “Fibonacci Number“), and am easily able to find literature on this topic with respect to coins (as in this interesting paper by Martin Griffiths: “Fibonacci Expressions Arising from a Coin-Tossing Scenario Involving Pairs of Consecutive Heads“).
But it didn’t occur to me to seek out such things until after I’d noticed the relationship on my own. These results should be mentioned more. Also, those sources are elegant. But this is a story of inelegant, naive discovery.
To get started, here’s an easy question: How many ways are there to see your first instance of Heads in one flip, two flips, three flips, four flips, and so on?
One flip: H (one way)
Two flips: TH (one way)
Three flips: TTH (one way)
Four flips: TTTH (one way)
There’s obviously one way to see your first Heads in any given number of flips. How can we use this to express probabilities? For example: What is the probability of seeing your first Heads in one flip, two flips, three flips, and so on? If we assume a fair coin, then every given flip has a 1/2 chance of coming up Heads, and the same goes for Tails.
So the probabilities for the above are:
1/2
(1/2)2
(1/2)3
(1/2)4
So, the probability of getting your first Heads in one flip is 1/2. The probability of getting your first Heads on your second flip is 1/4. And so on.
Here’s a slightly more involved question: What is the probability of getting your first Heads within three flips?
That question is the same as asking for the probability of seeing your first Heads in one flip, OR in two flips, OR in three flips, and so on. As these are mutually exclusive events, we just add up the respective probabilities:
(1/2) + (1/2)2 + (1/2)3 = 7/8
We can bear this out as well by enumerating the way things can go and counting all the successful results. When flipping a coin three times, we have the eight following possible outcomes (the seven “winning” outcomes are in bold):
HHH
HHT
HTH
HTT
THH
THT
TTH
TTT
All we’ve shown here is the probability of getting at least one Heads in three flips. There’s actually a much easier way to do this. Notice that the only way to fail is to get three Tails. In other words, we can ask what the probability of getting zero Heads in three flips is (i.e., getting three Tails), then subtract that from 1:
1 − (1/8) = 7/8
I’m going to set this elegant approach aside for today, however. Instead, I’ll use sigma notation. (If you need to review that, see this MathIsFun.com page.)
For the above we get:
We can also use this to find, say, the probability of getting the first Heads on the second or third flip:
Or precisely on the third flip:
Or on the 15th through 23rd flip:
That’s a very low probability (about
because it entails getting at least 14 Tails in a row.Ok, let’s apply this approach to the more interesting case of seeing two Heads in a row. First, I want to note an important distinction about the sort of question we’re asking.
In the above example, the same question can be asked about getting your first Tails. If it’s a fair coin, the probabilities will be the same.*
[*We can easily calculate these for a biased coin. Taking the first approach, where we ask about getting our first Heads on the first or second or third, and so on, flip, we could use a geometric distribution. I do that in other posts, such as the aforementioned “Dice Probability & Expectation: Expected Rolls to See a 6,” where I derive the geometric distribution; you might also check out this post, where I derive the geometric distribution, its cumulative distribution function, and its formula for finding the median: “Dice Probability: Median Rolls to See All 6 Sides (Coupon Collector’s Problem).”
For something more like the second approach, where we look at the number of ways to get three Heads in three flips or two Heads in three flips, and so on, we could use a binomial distribution.
I won’t worry any further than this about biased coins in this post.]
If, above, we’d asked for the probability of seeing Heads or Tails in the first flip, the answer would be super easy: 100%.
All of this applies to seeing your first instance of two Heads in a row. You can replace that word with “Tails” and (if it’s a fair coin), all the probabilities are the same. That’s where the Fibonacci numbers arise, as we’ll see in a moment. But if we ask a different question, the answer is much easier to work out: What is the probability of seeing your first Heads OR Tails in a row in such and such number of flips?
This is easy because there are always two ways things can go, depending on whatever the first flip is. For example, if you want to know the probability of getting two Heads or Tails in 10 flips, there are exactly two ways for that to happen:
HTHTHTHTTT
TTHTHTHTHH
What I’m after here is a harder question. Namely, the number of ways to get two consecutive results given that we want to see only Heads (or only Tails). This means, for instance, that the following two results are valid:
TTTTTTTTHH
TTHTTTHTHH
How many of those are there? There are 34. How do I know? Because it’s the 9th number in the Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55…
I can use that knowledge to figure out the probability of seeing two Heads in a row within ten flips, or in precisely ten flips, and so on, by using sigma notation as we did above.
But more to the point. How do I know that the 9th number in that sequence is the number of ways to see your first instance of two Heads in a row in 10 flips of a coin? Let’s dig into.
There’s one way to get two Heads in a row in two flips:
H H
There’s also one way to see your first instance of a consecutive Heads on your third flip:
T H H
There are two ways for seeing your first instance of two Heads in a row at the end of four flips:
TTHH
HTHH
By now it’s clear that every winning sequence will end in THH. Which is to say that what we’re really concerned with above is how many ways the very first flip can go. There are two: Heads or Tails.
Now, how many ways are there for it requiring five flips to see your first instance of two Heads in a row? There are three:
TTTHH
THTHH
HTTHH
At this point, it’d be useful to introduce a better way to count. I’m going to opt for a kind of tree method, but involving how many ways there are for each flip to go. For example, here’s the tree for five flips:
Here’s how to read this. The first step, or throw, has two options: Heads or Tails. If you flipped a Heads, then the next throw has to be a Heads, so there’s one way things can go. If your first throw was a Tails, then your second throw can be Heads or Tails, which is to say it can go two ways. Throws three, four, and five are already settled as Tails, Heads, Heads.
What we do now is add up the numbers preceding the THH node: 1 + 2 = 3. That tells us that there are three paths to get to THH. I could have made a tree that branches first to T and H, then goes from there. But the above is more compact and better suited to what I’m trying to see.
Now for six flips:
Once again, add up the paths just before THH: 2 + 1 + 2 = 5. Notice also that there is a rule emerging. Whenever there’s a 2, the path branches to a 1 and a 2; when there’s a 1, it goes straight to a 2. Also notice that we still have the information from the previous scenario of five flips. We actually have it more than once. For example, the second step contains, ordered from bottom to top, {2, 1}, while the third step contains that plus an extra 2, which gives {2, 1, 2}. In that {2, 1, 2} we see what happened in the first step: {2, 1, 2}, in the second step: {2, 1, 2}, and in the third step: {2, 1, 2}.
This is to be expected of the Fibonacci’s. That is, in a given step, we’ll get the exact same thing as the last step, and then some extra.
All this suggests that I can go ahead and jump to 10 flips, and should see the Fibonacci sequence emerge:
The numbers in red are the totals for each column, which of course is indeed a Fibonacci sequence. One need only observe the bottom diagonal of 2’s to see why the 7th row contains all the information of everything that came before it. That is, you can see that the steps go (recall that I chose to leave the outcomes for two and three flips from this, each of which has only one possible outcome at every step):
2
2, 1
2, 1, 2
2, 1, 2, 2, 1
2, 1, 2, 2, 1, 2, 1, 2
2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1
2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2
In the last three rows, I made corresponding parts orange and bold to show how they are put together. This is of course happening throughout. There’s plenty else to observe here. Any instance of 2 leads to three new paths (i.e., given the rule that a 2 branches to either 1 or 2, and 1 + 2 = 3), while any instance of 1 leads to two new paths.
This is enough to convince me that the pattern will continue (which is nice to know if you’re interested in avoiding two Heads in a row; or, say, two 0’s in a row in a binary sequence). Without doing any more graphing, I can copy and paste from above to make the next one:
2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 2, 1
Those must add up to 55. But not only that, if you set it vertically, you have the next column should you wish to continue the graph (which keeping in mind that there is here, for the previous step, a distinct 2 for every 1 there, and a 1 and 2 for every 2 there).
All right. So that’s pretty great. It seems obvious now, but I was pretty much thunderstruck when I noticed all this two years ago.
The next step is to formalize what’s going on here, and then to develop that into a closed formula. The obvious formula is the recursive one that you usually see. That is, the first term is 1, the second is 1, and from the third onward, you’re adding the previous two, making the third one 2, the fourth term 3, the fifth 5, and so on. I’ll represent this recurrence relation as:
an = an−1 + an−2
Where n is the term number. So, for example, to find the 4th term, it would be:
a4 = a3 + a2
But you have to know a3 and a2. Well, the second term is 1 by definition, as is the first term. And so the third term is:
a3 = a2 + a1 = 1 + 1 = 2
We can use all this for the fourth term:
a4 = 2 + 1 = 3
And so on. How about a58? Well, we’d need the 57th and 56th terms, and so on down the line. While this recursive formula is really informative, it’s not as useful as an explicit formula, such that we can just plug in 58 and get out the answer.
We could just Google it and be done with it. But I’ll derive it using a difference equation. Difference equations are the discrete analog of differential equations. To see a post where I evaluate a second-order linear non-homogeneous recurrence relation (I’ll explain those terms in a moment): “Gambler’s Ruin & Random Walk: Probability, Expectation, Steal the Chips.”
This will be along the same lines as that, but it’s actually less involved because the Fibonacci recurrence relation is homogeneous, which just means that there are no constants in the recurrence relation. If we had, say, a -2 in there, it’d be non-homogeneous, and solving it would involve more steps. (Again, see the above post for that, where I also link video tutorial suggestions.)
All told, it’s a second-order linear homogeneous recurrence relation. Second-order because it goes back two terms. Linear because, essentially, we are producing new terms by adding or subtracting terms, rather than, say multiplying them; another way to put this is that all the terms appear separately and are first-degree polynomials (i.e., are raised to the first power).
We’ve also got initial conditions that we can input for working out an explicit—e.g., the number of ways to get two Heads in a row in one flip is zero.
I’ll go through all the steps for transforming the recurrence relation into a closed formula, and will try to clarify everything I’m doing. But in case you’d find it useful to watch someone work through similar problems real-time I’ll recommend three videos, a long one and two short ones.
Here’s a nice, 25-minute, video by TheTrevTutor. As a bonus, at the end of the video he derives the Fibonacci recurrence relation by simply puzzling out the number of ways to have a binary sequence in which there are no repetitions of the number 1. This is isomorphic to asking how many ways we can avoid having two Heads in a row before hitting the final THH sequence. He sets up the framework, by the way, to solve the recurrence relation, but worries that the algebra is too difficult to work through on the spot. I’ll work through that algebra below and will show all my steps:
And here are two videos by GSVUmath that manage to impart a lot of information in just 9.5 minutes total, called “Characteristic equation and characteristic roots of recurrence relations” and “Solving linear homogeneous recurrence relations,” respectively:
Ok, so the first thing to do is to replace the equation an = an−1 + an−2 with something that’s easier to work with. How about:
rn = rn-1+ rn-2
This move of trying rn as a solution is usually referred to as “guessing” (just like with the analogous move for differential equations). But it’s not really a guess in the everyday sense because students are given a chart telling them what to try in a given certain scenario. In cases like the present one, things tend to work out as hoped by following the chart.
But rn is not so convenient to work with. What would be a convenient value for n to take on? Obviously 2 would be nice. Let’s substitute that in for n to turn the expression into a second-degree trinomial.
Or, a more generalizable way to think about this is that we’re dividing each term by rn−2 in order to eliminate the n‘s. More generally: always divide by the term with the lowest exponent in order to get rid of all the n‘s. So if the recurrence relation had gone as far back as an−4, we’d find ourselves with an equation that contains rn−4, and would divide all the terms by that term, resulting in a 4th-degree polynomial.
For our present expression, here’s what we get. I’ll also go ahead and set it equal to 0 for easier manipulation:
r2 = r1+ r0
r2 = r + 1
r2 −r −1 = 0
This is the equation we’ll work with. It’s called our our “characteristic equation” (wherein the expression on the left of the equal sign is the “characteristic polynomial”).
If this had been something nice, like r2 −1, we’d easily factor it as (r + 1)(r – 1). But our expression is not so easily factored. So let’s pop it into the quadratic formula to solve for r.
The quadratic formula starts as ax2 + bx + c = 0, which can be transformed by completing the square to get:
For our purposes:
a = 1
b = 1
c = -1
Plug those in:
This gives two solutions for r, which are called our “characteristic roots”:
and
Now, we had “guessed” rn as a solution. So we’ll want to plug the above in for r and then use our initial conditions to see where that takes us. But there’s one little move here that might seem a little mysterious. There’s a theorem that says that any that for any solution of r, we can make a linear combination of those solutions, with each r term multiplied by some constant.
[This, by the way, will work a little differently depending on the number of solutions for r. For a little more on that, see first or third of the above-embedded videos. And for a proof of the theorem I’m relying on in the present case, search for Theorem 6.11 on page 17 of the following paper, though you might need to scroll back up to make sense of it all, as well as some knowledge of linear algebra: “Lecture Notes on Difference Equations” by Arne Jensen.]
In other words, we will plug both characteristic roots into rn, will put a constant in front of each of them, and will add them together:
Now for our initial conditions. Our first term is 1. For convenience we define a 0th term as well, which of course must be 0 (how many Heads in a row can you get in one throw?). The initial conditions are:
a0 = 0
a1 = 1
Plug those into our solution:
These clean up nicely:
We know that a0 = 0, which is to say that A + B = 0. Which means that B = -A. So we can replace B in the second equation with -A:
Since a1 = 1, we can solve for A:
Since A = , we know that -A = B =
.
Now replace A and B with those values and simplify the result with algebra. I’ll try to make my steps transparent:
And that’s our closed formula for the Fibonacci numbers. We can use that with sigma notation to play around with probabilities, as we did above. First, let’s test it out. To do this, I’ll define a function F as:
Now, using Desmos, I’ll plug in 0, 1, 2, 3, …. up to 10:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13
F(8) = 21
F(9) = 34
F(10) = 55
Looking good! Now to put this into sigma notation. Remember to multiply by (1/2)n, as that’s the probability of getting any particular sequence of n coin tosses. This is exactly as we did above for seeing our first Heads, but updated in order to reflect the number of ways to see your first instance of two Heads in a row given n flips, which, is actually given by F(n-1). For example, there are F(9) = 34 ways to see your first instance of HH at the very end of a 10-flip sequence.
In other words, our input is one less than the number of flips in question. I’ll reflect that in the formulation:
Or, to be more transparent, I’ll pop in the full formula:
Where n ≥ 1.
If you’d like to play around with that, you can paste the following into a calculator, e.g., at Desmos:
\sum_{n=j}^{ }\left(\frac{1}{2}\right)^{n}\cdot\frac{\left(1+\sqrt{5}\right)^{\left(n-1\right)}-\left(1-\sqrt{5}\right)^{\left(n-1\right)}}{2^{\left(n-1\right)}\sqrt{5}}
Let’s play around with it a little. The probability of getting two Heads in a row in one flip is of course 0:
Looks good. The probability of getting two Heads in a row in your first two flips is (1/2)(1/2) = 1/4:
That checks out as well. For three flips we expect to see the probability of THH, which is 1/8:
Checks out. The probability of getting two Heads in a row within three flips is the sum of getting it in 1, 2 or 3 flips, which is just 0 + (1/4) + (1/8):
Here’s one more. The probability of seeing two Heads in a row within 10 flips is:
That’s about 86%. We won’t hit 100%, not technically, but we cross confidently into 99% territory at 23 flips.
Ok, so there you have it!
Tribonacci Numbers from Coin Flips
Before wrapping up, I’ll say a little about tribonacci numbers, which are similar to Fibonacci numbers, except you add up the previous three numbers, rather than the previous two, to get, after defining the first three terms as 0, 0, 1:
1, 2, 4, 7, 13, 24, 44, 81, 149, 274…
The recurrence relation for this is:
an = an−1 + an−2 + an−3 with initial conditions of a0 = a1 = 0 , a2 = 1
A few months after I discovered the emergence of Fibonacci numbers from coin flips, I found myself working on the same problem, but was looking for the number of ways to see my first three Heads in a row.
Adding one more Heads makes things more complicated for sure, as is reflected in the more complicated closed formula for tribonacci numbers, which I’ll go ahead and insert into sigma notation:
If you’d like to play around with this in Desmos, just paste the following:
\sum_{n=j}^{}\frac{1}{2^n}\left[3\frac{\left\{\frac{1}{3}\left(19+3\sqrt{33}\right)^{\frac{1}{3}}+\ \frac{1}{3}\left(19-3\sqrt{33}\right)^{\frac{1}{3}}+\frac{1}{3}\right\}^{n-2}\left(586+102\sqrt{33}\right)^{\frac{1}{3}}}{\left(586+102\sqrt{33}\right)^{\frac{2}{3}}+4-2\left(586+102\sqrt{33}\right)^{\frac{1}{3}}}\right]
Notice that I’ve again reduced the exponent, this time to n−2, and for basically the same reasons as above. The brackets, [ ], signify rounding to the nearest integer. I have not indulged in the exercise of deriving that formula myself. I got it from Wolfram MathWorld‘s entry on tribonacci numbers.
Here’s how I found it. I started counting the way things can go, then input the first several results into The On-Line Encyclopedia of Integer Sequences, which took me to the tribonacci entry: A000073. I then looked for a closed formula that seemed easy to work with, and that took me to the above-linked MathWorld entry.
I’m not going to work out the closed formula here, but will say that if I were going to, I’d start with the characteristic equation: r3 − r2 − r − 1 = 0. And then take it from there. Maybe pop it into the cubic formula:
Or maybe a streamlined version of that formula. For more on this, see the excellent Mathologer video: “500 years of NOT teaching THE CUBIC FORMULA. What is it they think you can’t handle?”
To play with the cubic formula in Desmos, enter the following:
-\frac{b}{3a}+\sqrt[3]{\left(\frac{-b^{3}}{27a^{3}}+\frac{bc}{6a^{2}}-\frac{d}{2a}\right)+\sqrt{\left(\frac{-b^{3}}{27a^{3}}+\frac{bc}{6a^{2}}-\frac{d}{2a}\right)^{2} + \left(\frac{c}{3a}-\frac{b^{2}}{9a^{2}}\right)^{3}}}+\sqrt[3]{\left(\frac{-b^{3}}{27a^{3}}+\frac{bc}{6a^{2}}-\frac{d}{2a}\right)-\sqrt{\left(\frac{-b^{3}}{27a^{3}}+\frac{bc}{6a^{2}}-\frac{d}{2a}\right)^{2}+\left(\frac{c}{3a}-\frac{b^{2}}{9a^{2}}\right)^{3}}}
As for counting the coin sequences in question, you can do it the same way as counting two Heads, and can establish a similar rule. For example, let’s look just at getting three Heads in a row for the first time on the last three flips of a seven-flip sequence:
The final column has 7 entries, as expected. I’ve labeled the H’s with a “1” or a “2,” because we want to be sure to only allow for one way to proceed after getting two consecutive Heads. This observation also makes it easy to see that the next number will be, as expected, 13 (all those in the previous column branch off twice, except for the solitary H2).
To make this more compact and easier to count, as with two Heads case:
The red numbers are again the totals, from which we see emerging the expected sequence. That’s all I’ll do with today the tribonacci case.
Finally, I’m going to go out on an intuitive inductive limb and bet that tetranacci numbers will get you counts for four Heads, and so on for pentanacci, hexanacci, heptanacci, etc. For more on those, see Wikipedia: “Generalizations of Fibonacci Numbers.”
Addendum:
By coincidence, soon after it occurred to me to write this blog post, I received an email from a reader saying they’d taken a journey similar to the Fibonacci/tribonacci adventure I’ve mentioned in earlier blog posts, especially “Anthropic Bias (Ch 2, Part 2): Fine-Tuning in Cosmology & 100 Heads in a Row.”
But they’re playing with dice rather than coins. So they ask:
…I want to be able to create an equation/formula that would have three inputs and one output. The three inputs would be, the number of rolls/flips that occur, the number of consecutive rolls/flips, and also the number of faces that the dice has, and output the probability of such a thing happening for all three input variables. … I was wondering if you have either found such a formula, have made such a formula yourself, or wish to make such a formula?
I’ll think more closely about this interesting problem in a future post, particularly with respect to the sorts of patterns that arise, and for what it will take to develop closed formulas. In the meanwhile, share your own thoughts in the comments.
A good starting place might be this Mathematics StackExchange thread: “Probability of consecutive dice rolls.”

Or click the banner to shop at Amazon (at no extra cost: it just gives me some of what would have gone to Amazon).