*Estimated read time (minus contemplative pauses): 21 min.*

Today’s probability toy is borrowed from Joseph Blitzstein & Jessica Hwang’s 2015 *Introduction to Probability*, a fun book with loads of exercises designed to set your probabilistic intuitions straight. The problem exemplifies a variation of the prosecutor’s fallacy known as the *defense attorney’s fallacy*, and is undoubtedly adapted from a possible real-world instance of the fallacy committed by the defense during O.J. Simpson’s murder trial. Be warned, this gets morbid.

From Wikipedia:

…the prosecution presented evidence that Simpson had been violent toward his wife, while the defense argued that there was only one woman murdered for every 2,500 women who were subjected to spousal abuse, and that any history of Simpson being violent toward his wife was irrelevant to the trial.

The entry goes on to point out that the defense’s reasoning is fallacious, citing Gerd Gigerenzer, who discusses the case in his 2002 book *Calculated Risks: How to Know When Numbers Deceive You*. I haven’t read that book, though I don’t need convincing.^{1} Just how far off the defense’s reasoning is will become apparent in a moment.

First, some motivational preambling.

One answer I’ve often encountered to probability’s counterintuitive, or just plain difficult, nature is to disallow it as courtroom evidence. Maybe that would at times be best. I’m reminded of other real-world cases I’ve written about, such as one from the 1960s in which a Los Angeles couple was convicted of robbery in part due to an expert witness failing to properly condition when calculating the probability of having a mustache and a beard. The conviction was rightly overturned on the grounds that the earlier court “erred in admitting over defendant’s objection the evidence pertaining to the mathematical theory of probability and in denying defendant’s motion to strike such evidence.” (I say more about that here: “The Incredibly Small Probability of Having a Beard.”)

According to James Franklin’s fascinating 2001 (with new Preface added in 2015) book, *The Science of Conjecture: Evidence and Probability Before Pascal*, such difficulties have been around as a formal concern for juries at least since “the use of the formula ‘proof beyond reasonable doubt’ to express the standard of proof in criminal cases became established in English law around 1800” (Franklin, p 63). Fascinating because “before Pascal” means, essentially, before probability was mathematized—and because of the book’s in-depth investigation into probabilistic legal reasoning about, for example, how to assign degrees of belief to testimonies of witchery.

Philosopher Ian Hacking has argued that contemporary notions of probability didn’t much exist before roughly the time of Pascal; as he writes on page 1 of his excellent 1975 (with new Introduction added in 2006) book, *The Emergence of Probability: A Philosophical Study of Early Ideas about Probability, Induction and Statistical Inference*: “A philosophical history [of probability] must not only record what happened around 1660, but must also speculate on how such a fundamental concept as probability could emerge so suddenly.” (If you’re looking for a philosophically rigorous history of probability that includes, for instance, a chapter on the late-17th-century development of annuities in the Netherlands, the mathematics for which was “invented by politically influential students of Descartes” [Hacking, p 5], this book’s for you.)

But Franklin makes a strong case that such reasoning has been around much longer, and not just in a semantic sense, given the old age of the word “probability” (and its cognates). That, in fact, seems to be the route Hacking takes, zeroing in on earlier uses of the word “probable,” e.g., where something is probable if it is consistent with the beliefs of a certain number of well-respected scholarly authorities. Here’s how Franklin begins an Appendix called “Review of Work on Probability before 1660”:

The literature on probability before Pascal is dominated by Ian Hacking’s *The Emergence of Probability* (Cambridge, 1975). Entertainingly written and with a sound philosophical viewpoint, it well deserved its popular success. As history, it left something to be desired with regard to the ratio of evidence to conclusions. In particular, it claimed that there was very little concept of evidential support in the literature before 1660. (Franklin, p 373)

If I understand Franklin, pre-1660 probabilistic reasoning not only aligned with Hacking’s semantic observations, but also took the form of assigning a subjective credence about whether or not some event happened, especially given the evidence of testimony or confession (either of which may require follow-up with torture). Legal thinkers, though they weren’t outputting numbers from 0 to 1, painstakingly endeavored to resolve life-or-death decisions according to the strength of their belief, evidence, and proof, a strength whose intensity was to be properly tuned to the circumstances in question, in accordance with subtly defined and elaborately debated categories of gradation (e.g., “half-proof”). This amounts to a rigorous approach to what Franklin calls “ordinary language reasoning about probabilities” (Franklin, p xvii), a valid approach still common today. I’m oversimplifying, but this seems to be the gist.^{2}

That’s a tough job for dedicated scholars, and even tougher, at least in some important ways, for part-time jurors. How jurors form their beliefs, even when explicitly prescribed a standard of reasonable doubt, is largely a function of their probabilistic intuitions. How to properly temper and balance those intuitions is what’s at stake (along, of course, with the lives of the accused). Franklin goes on to note that, in earlier years of English canon law, “judges and juries were commanded to make up their minds ‘according to conscience’” (Franklin, p 66). This will no doubt always be how things go to some degree, at the end of the day. But a more promising route for arriving at that point—more promising, for instance, than the “beyond reasonable doubt” path—lies, I bet, in figuring out how to incorporate into the evaluation of legal propositions the same sorts of statistical understanding that today appear to be steering science in a more promising direction (most publicized is social psychology).

It’s a hard sell, though, because it means getting jurors to put a number—or at least number range—on their belief. Franklin, again:

But subtle distinctions among grades of presumptions, and fractions of proof, were ill adapted to explanation to juries. Eventually all probabilistic concepts in English law were reduced to one word, *reasonable*. The common understanding that the standard of proof in criminal trials should lie somewhere between suspicion and complete certainty came to be expressed solely in the formula “proof beyond reasonable doubt.” (Franklin, pp 62–63)

Fair enough. But I think today’s jurors are more used to the idea of grading beliefs, or could be more easily led in that direction, particularly if introduced to the idea at an early age and aided with user-friendly software. Not that it would be easy. Just easier than in 1800. The following story shouldn’t perplex the modern mind. If an American quarter were on trial, the degree of belief you should start with that the accused coin landed Heads is 1/2. On learning that a magician famous for fixing coins flipped it, you might update that to somewhere in the range of 3/5 to 4/5.

Coming up with numbers is hard, but is also, I’m convinced, much better than the easier program of proceeding with a vague sense that, “Yeah, the coin probably landed Heads. Otherwise, why’s it seem so nervous?” Of course it’s harder to put a number on it; that’s the point: to make you think harder. And the messier the real-life situation, the harder it is to put a number on it and the harder you should think.^{3}

Otherwise, we’re stuck with terms like “reasonable” and “probably” and “highly likely” and “serious possibility” and what do those really mean, anyway? For a compelling discussion of such terms turning out to mean very different things to smart, informed people in important jobs who mistakenly thought they were speaking the same language, see Philip Tetlock & Dan Gardner’s 2015 book *Superforecasting: The Art and Science of Prediction*, Chapter 3 (p 55). This may be a tough sell for Franklin as well, though, given his comments such as:

Even now, the degree to which evidence supports hypotheses in law or science is not usually quantified, and it is debatable whether it is quantifiable even in principle. Early writers on probability should therefore be regarded as having made advances if they distinguish between conclusive and inconclusive evidence and if they *grade* evidence by understanding that evidence can make a conclusion “almost certain,” “more likely than not,” and so on. Attempts to give numbers to those grades are not necessarily to be praised. One should not give in to the easy assumption that numbers are good, words bad. (Franklin, pp xix–xx)

The point is well taken. Real life doesn’t involve the clean probabilities one gets with games and dice, and we should thus take care to avoid what Nassim Taleb has called the *ludic fallacy*.^{4} But, again, putting a number (or number range) on it, even or especially as an expression of one’s subjective belief given a lack of clean probabilities, encourages us (or at least us non-scholars) to think harder and helps us to avoid talking past one another. It also facilitates the inclusion of useful statistics, when available, into one’s beliefs. The aforementioned Tetlock & Gardner book is a sustained testimony to this idea. So, rather than throwing numerical probability—including and *especially* subjective numerical probability—out of the courtroom^{5}, I think we should make its application compulsory, but easy, for jurors. How to do that? I have tinkered with and come across some ideas I won’t get into here. But I will opine that a good starting place is to develop intuitive ways to explain to non-probability-nerds the profound effects of conditioning.

For example, the case at hand doesn’t ask for the probability of a wife abuser murdering his wife. Rather, it asks for the probability of a husband having murdered his wife *conditional* on the knowledge that she has been murdered and he abused her. I believe that even the most math-allergic can intuitively feel, and act on, the significance of this distinction. Though to comfortably output a number, it’ll help to have more statistics than what’s noted in the above Simpson blurb.

Here’s a slightly altered expression of Blitzstein & Hwang’s adaptation (page 66) of the problem, which also shows up in a freely available homework assignment on the website of the course that uses that textbook. I’ll present my own (triple-solved and overthought) explanation:

A woman has been murdered, and her husband is accused of having committed the murder. It is known that the man abused his wife repeatedly in the past, and the prosecution argues that this is important evidence pointing towards the man’s guilt. The defense attorney says that the history of abuse is irrelevant, as only 1 in 1000 men who beat their wives end up murdering them.

Assume that the defense attorney’s 1 in 1000 figure is correct, and that half of men who murder their wives previously abused them. Also assume that 20% of murdered married^{6} women were killed by their husbands, and that if a woman is murdered and the husband is not guilty, then there is only a 10% chance that the husband abused her. What is the probability that the man is guilty? Is the prosecution right that the abuse is important evidence in favor of guilt?

One way to start would be to jump right into developing some notation. Starting, maybe, with the probability that a husband kills his wife given that he abuses her. Express this as: Probability that (Husband murdered wife GIVEN that husband Abused wife) = P(H|A) = 1/1000, where “H” means “husband murdered wife” and “A” means “husband abused wife,” and the vertical bar means “given.” (Assume that “husband” always refers here to the man on trial.)

We can similarly translate the rest of the prompt’s figures into workable notation, which I’ll do shortly. I’d prefer to start, however, by taking a bird’s eye view of the situation. For one thing, this will quickly show that P(H|A) could be as small a number as you like—it could be one in a trillion—but would still be irrelevant. We’re not trying to figure out the probability that a man will murder his wife given that he abuses her, nor even that he murdered his wife given that he abused her. We already know the wife has been murdered, so what we seek is the probability that her husband murdered her given that she’s been murdered and he abused her.

For the bird’s eye view, I’ll construct a tree diagram using theoretical frequencies. As I said, this gets morbid. In fact, allow me a moment to emphasize the topic’s gravity. I can’t know what it’s like to be a woman in such a position, though I’ve lived among such women. When I was around ten and 11, back in the early ’80s (in Meridian, Mississippi), my mother was the live-in social worker in two hideaway homes for battered wives and their children. I lived in those homes as well. There was many a memorable and, unfortunately, tragic moment witnessed and experienced during that time.

And now I’m sitting in Queens, NYC blogging about a toy probability problem that touches on that species of tragedy. But hopefully it’s more than just a toy. Probability matters, and this problem is a little closer to the complexity of parameters we get when the subject matter has real-life gravity. I’m not really interested in puzzles for their own sake; I’m drawn to problems that point meaningfully, directly or indirectly, to the real world, to human psychology and behavior, and so on. This problem strikes me as a little more direct than what I usually write about.

So I’ll resist the temptation to change the story to one of puppies stealing daisies and instead will consider what we can infer about 100 murdered wives, given what we know from the prompt—in other words by conditioning properly. The goal is to find the fraction of murdered wives in the condition of interest (i.e., abused and murdered by their husbands), and to then find what proportion that is of the fraction of abused wives who are murdered period (i.e., whether or not by their husbands). I’ll restate this goal several times and in different ways as we go along.

Suppose 100 married women are murdered. One in five were killed by their husbands. That’s 20 women. Half of those murderous husbands were abusers, which means 10 of those 20 murdered women were abused. Overall, this puts 10 of the 100 wives in the condition of interest—i.e., of being both murdered and abused by their husband. That’s our numerator.

We now need to figure out how many of those 100 murdered women were abused and *not* killed by their husband. We know 4/5 of the murdered women were not killed by their husband. That’s 80 women. And 1/10 of those—i.e., 8—were abused. This means 8 out of the 100 murdered women were abused by their husband but killed by someone other than their husband. Thus, 10 + 8 = 18 of the 100 women were abused, regardless of who killed them, and the proportion of abused and murdered women who were murdered by their husbands is 10 out of 18, or 10/18 = 5/9.

The probability, then, that the defendant murdered his wife, given that he abused her and she was murdered, is not 1/1000 or even 1/5, but 5/9. That’s about 55.56%. And that’s that.
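For readers who like to check this sort of counting by machine, here is a minimal Python sketch of the same frequency argument. The variable names are my own illustrative choices, and the cohort of 100 is the same hypothetical one used above; the rates are the prompt's.

```python
from fractions import Fraction

# Figures taken from the prompt; variable names are illustrative only.
murdered_wives = 100                    # hypothetical cohort of murdered married women
p_husband_did_it = Fraction(1, 5)       # 20% were killed by their husbands
p_abused_if_husband = Fraction(1, 2)    # half of wife-murdering husbands were abusers
p_abused_if_other = Fraction(1, 10)     # 10% abused when someone else is the killer

killed_by_husband = murdered_wives * p_husband_did_it      # 20 women
killed_by_other = murdered_wives - killed_by_husband       # 80 women

abused_and_killed_by_husband = killed_by_husband * p_abused_if_husband  # 10 women
abused_and_killed_by_other = killed_by_other * p_abused_if_other        # 8 women

# Everyone who was abused and murdered, regardless of who the killer was.
abused_total = abused_and_killed_by_husband + abused_and_killed_by_other  # 18 women

p_guilty = abused_and_killed_by_husband / abused_total
print(p_guilty, float(p_guilty))
```

Using `Fraction` rather than floats keeps the result exact, so the final ratio can be compared directly with the 10/18 worked out by hand.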

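Here’s a map from the bird’s-eye view. Or think of it this way: if you were handed 100 random case files of murdered married women, this is one system you could use for sorting them:

- 100 murdered married women
  - 20 killed by their husband
    - 10 abused by their husband (the condition of interest)
    - 10 not abused by their husband
  - 80 killed by someone else
    - 8 abused by their husband
    - 72 not abused by their husband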

As I said before, puzzling together the diagram exposes the irrelevance of the defense’s 1/1000 figure. Go ahead and try working it into the tree in a tidy and meaningful way. On disregarding that figure, though, the problem becomes easy to solve. I’m going to belabor this point because I think it’s the most critical thing to understand here. Thinking that the 1/1000 figure matters is what leads to a confusion that, as far as I can tell, doesn’t immediately evaporate on building the tree. How can we be sure I haven’t gotten the tree wrong by throwing out a vital piece of information?

The defendant might be one of the many men—let’s say 9,990—who abuse but do not murder their wives. But that’s not the precise group of men we’re interested in. Rather, we’re interested in the group of 18 abusers whose wives have indeed been murdered. Some small percentage of the 9,990 abusive men’s wives will be murdered by someone else; let’s call that about 0.08%, which is 8 women. In other words, I’m supposing that every abuser’s wife is murdered by him, by someone else, or by no one, with the following frequencies: 10/10000 abused wives are murdered by their husband; 8/10000 are murdered by someone else; 9982/10000 aren’t murdered and thus weren’t married to the man before us.

(The remaining 82 murdered wives weren’t abused and thus have nothing to do with this group of 10,000 couples. Again, imagine we’re drawing from a filing room of 10,000 “abused women” files; this room does not contain the 82 files of not-abused wives. Those women’s files will be in a much larger room that we can now disregard: the one containing the “not-abused women” files.)

The man before us is clearly in one of the two smaller groups, regardless of how small those groups may be: he’s either an abusive husband who murdered his wife, or an abusive husband whose wife was murdered by someone else. And so we may safely disregard the 1/1000 figure by conditioning on the knowledge that the wife was abused and murdered.

Put yet another way, we’re not looking for the proportion of husbands who murder their wife among husbands who abused their wife, which is 10/10000; you can also think of this as the {proportion of wives murdered by their abusive husband = 10} among {wives who are abused = 10,000}. Rather, we’re looking for the {proportion of wives murdered by their abusive husband = 10} among {wives who are abused *and* murdered = 18}, which is 10/18 given our present numbers. We’ll consider yet other interpretations of this story in a moment.
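To see concretely why the size of the defense's figure doesn't matter, here is a small Python sketch. It assumes, as the frequencies above do, that for every 10 abused wives murdered by their husband, 8 are murdered by someone else; the function name and the population size are hypothetical choices of mine.

```python
from fractions import Fraction

def share_killed_by_husband(abusive_couples, p_murder_given_abuse):
    """Among abused-and-murdered wives, the share killed by their husband."""
    by_husband = abusive_couples * p_murder_given_abuse
    # Assumption carried over from the frequencies above: for every 10 abused
    # wives murdered by their husband, 8 are murdered by someone else.
    by_other = by_husband * Fraction(8, 10)
    return by_husband / (by_husband + by_other)

# Shrink the defense's figure from one in a thousand to one in a trillion.
for p in (Fraction(1, 1000), Fraction(1, 10**6), Fraction(1, 10**12)):
    print(p, share_killed_by_husband(10_000, p))
```

Under that assumption the defense's figure cancels out of the ratio, so the share stays the same however small the figure is made. The sketch only restates the conditioning argument; it adds no new information about real-world rates.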

With that well established (I hope!), I’ll think the problem through again from start to finish, but in terms of probabilities that can be easily gathered from the above frequencies. In fact, I’ll translate the frequency tree into a probability tree. Think of this as starting with one murdered wife rather than 100. We know that 1/5 of murdered wives are murdered by their husband. And 1/2 of those wives were abused by their husband. This means 1/2 of 1/5, or 1/10, of wives who were murdered, were murdered by their abusive husband.

What proportion is that of all murdered married women who were abused (regardless of who murdered them)? To figure that out, we first have to figure out what fraction of murdered wives were abused by their husband but murdered by someone else. We’re given that 4/5 of murdered wives were murdered by someone other than their husband. And 1/10 of those women were abused by their husband. So 1/10 of 4/5, or 4/50, of murdered married women are in this condition.

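Here’s a probability tree:

- Married woman was murdered
  - Husband did it: 1/5
    - Husband abused her: 1/2 (the circled condition)
    - Husband didn’t abuse her: 1/2
  - Someone else did it: 4/5
    - Husband abused her: 1/10 (the boxed condition)
    - Husband didn’t abuse her: 9/10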

Multiply the probabilities of a given path’s branches to get the probability of the path, then use those numbers to find the proportion we’re looking for. The probability of being in the circled condition is (1/5) × (1/2) = 1/10. The probability of being in the boxed condition is (4/5) × (1/10) = 4/50. The probability of being in either of those conditions is the probability that the murdered wife was abused, whether or not she was killed by her husband: 1/10 + 4/50 = 9/50 (this perfectly matches the earlier frequency numbers, which were 10/100 + 8/100 = 18/100 = 9/50; all the probabilities here can be reconfirmed in this way).

Hence, once again, the probability of a murdered woman having been murdered by her husband given that he abused her is (1/10) ÷ (9/50) = 5/9.
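The path-multiplication recipe can be sketched in a few lines of Python. The dictionary layout and its labels are illustrative assumptions of mine; the branch probabilities are the ones given in the prompt.

```python
from fractions import Fraction

# Branch probabilities from the prompt; the keys are illustrative labels.
tree = {
    "husband": {"prior": Fraction(1, 5), "abused": Fraction(1, 2)},
    "someone else": {"prior": Fraction(4, 5), "abused": Fraction(1, 10)},
}

# Probability of each path ending in "abused": multiply along the branches.
paths = {who: branch["prior"] * branch["abused"] for who, branch in tree.items()}

# Probability that a murdered wife was abused, whoever the killer was.
p_abused = sum(paths.values())

# Share of that total belonging to the "husband did it" path.
p_husband_given_abused = paths["husband"] / p_abused

print(paths, p_abused, p_husband_given_abused)
```

Adding a third kind of killer, or a different piece of evidence, would only mean adding an entry to `tree`; the multiply-then-normalize step stays the same.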

Finally, now that we’re so well acquainted with the relevant math and scenarios, I’ll solve the whole thing again by translating the prompt into notation that’ll relate the problem to Bayes’ theorem, which is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

One way to read this is: “the probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B.” How it’s read in application depends on the nature and relation of events A and B. I won’t get into how to derive the theorem. It really just amounts to the same thing we did in the above two approaches, which is to say that the theorem’s structure has already emerged above, so we should expect to again see: (1/10) ÷ (1/10 + 4/50) = (1/10) ÷ (9/50) = 5/9.

Still, approaching the problem with Bayes’ theorem is useful for furthering our understanding of how the problem’s various parts relate to one another.

Here are some formalisms. There are other ways, but I find this interpretation intuitive. The second paragraph of the prompt, again, is:

Assume that the defense attorney’s 1 in 1000 figure is correct, and that half of men who murder their wives previously abused them. Also assume that 20% of murdered married women were killed by their husbands, and that if a woman is murdered and the husband is not guilty, then there is only a 10% chance that the husband abused her. What is the probability that the man is guilty? Is the prosecution right that the abuse is important evidence in favor of guilt?

Let…

M = married woman was murdered.

A = husband abused wife.

H = husband murdered wife.

¬H = husband did not murder his wife. (The “¬” symbol means “not.”)

| = “given.”

∩ = intersection of events, read as “and” or “but.”

P(husband murdered wife GIVEN that she was murdered AND her husband abused her) = **P(H|M∩A)**. This is what we’re looking for.

P(husband abused wife GIVEN that she was murdered AND her husband murdered her) = **P(A|M∩H) = 1/2**. It may seem redundant to condition on the intersection of M and H, but I do so because, while H implies M, the converse isn’t true, and showing M here and throughout helps maintain the tidy granularity and relation between the relevant events we’re conditioning on from step to step; namely, M∩¬H is quite different from just ¬H.

P(husband murdered wife GIVEN that she was murdered) = **P(H|M) = 1/5**.

P(husband didn’t murder wife GIVEN that she was murdered) = **P(¬H|M) = 4/5**.

P(husband abused wife GIVEN that she was murdered by someone BUT it wasn’t her husband) = **P(A|M∩¬H) = 1/10**.

P(husband abused wife GIVEN that she was murdered) = **P(A|M)**. This must be deduced from what we’ve done so far. It’s the probability that a murdered married woman was abused by her husband, regardless of who murdered her. So we’ll need the sum of two probabilities. I’ll get back to this one.

Plug all this into Bayes’ theorem, keeping in mind that, once again, we’re looking for the proportion of time wives are killed by abusive husbands among all instances of abused wives being killed, whether or not their husband was the murderer. I’m especially interested in the evidentiary significance of the abuse. That’s the evidence the defense rejected as irrelevant.

$$P(H \mid M \cap A) = \frac{P(A \mid M \cap H)\,P(H \mid M)}{P(A \mid M)}$$

We have all the numbers for the numerator, but before filling those in I want to make sure the notation’s story makes sense. Excluding the denominator, it says something like this. The probability that the man on trial murdered his wife given the evidence that she’s been murdered and he abused her *equals* the likelihood that a woman turns out to have been abused by her husband given that she was murdered and her husband did it *times* the probability that the husband is the culprit in the event that his wife was murdered. Can we make this less confusing?

For one thing, we can more simply conceptualize the numerator as just P(A|H)P(H), where the wife being murdered is implied by H; something like: the probability that the wife turns out to have been abused in those worlds in which her husband killed her *times* the probability that we actually are in one of those worlds. That’s a more natural way to express it verbally and helps convince me that it’s right to condition, as we do, on M∩H—i.e., keeping the “M” and “H” together, rather than relating events A, M, and H some other way. And, again, we include “M” in the actual notation for reasons already noted above, and which will become especially apparent when working out the denominator.

Keeping in mind the simpler conceptualization while conditioning on event M, it’s now clearer why the two probabilities in the numerator are multiplied. We want to make sure we’re considering only, and exhaustively, our “desired” scenarios (or worlds)—i.e., those in which the husband is the culprit in the event that his wife’s been murdered, which comprise 1/5 of the scenarios in which a married woman is murdered. Thus, as in the tree diagrams, we start with a prior probability of P(H|M) = 1/5 that the husband is the murderer (I realize that this number is listed second in the numerator; that ordering isn’t important). But we need to also account for the fact that the defendant abused his wife. We know that 1/2 of wife-murdering husbands abused their wife, which means we need to multiply, or “branch” off of, P(H|M) = 1/5 by P(A|M∩H) = 1/2.

I continue to belabor such details because it’s easy to get turned around when plugging numbers into this theorem when the story isn’t obvious, which is often true when there’s extra conditioning. This is more clearly emphasized by the diagrams, where the “murdered” branch comes before the “abused” branch, and is why I use language like “turns out to have been abused” when interpreting our notation—it better fits the court scenario, where the evidence of abuse comes after the evidence of murder and after suspicion against the husband. With the branches, we start with the wife being murdered, then go to the “husband did it” branch, then go to the “husband abused her” branch.

There are multiple valid ways to interpret this sequence (e.g., the final step being, “it’s learned that the husband abused his wife”). But it’s of course also easy for the defense to disorient and lead us astray by positing a probability tree that follows a more natural chronology: the husband abuses his wife and then, with probability 1/1000, murders her.

If how to organize the story and math were obvious, there’d be less room for fallacy. That said, I think we now have at least the foundation for a story that gets the math right and makes intuitive sense, especially if we chose to flesh it out more thoroughly (e.g., involving case files or courtroom drama).

The next part of the story has us divide by all cases in which a married woman is murdered and abused (whether or not her husband killed her). This amounts to the straightforward step of taking a ratio between the probabilities of the “desired” scenarios (i.e., our numerator) and the “desired” scenarios *plus* the relevant “undesired” scenarios = all possible relevant scenarios (i.e., our denominator). I’ve referred to this operation several times by now.

Pop in the numbers we have so far for the “desired” scenarios:

$$P(H \mid M \cap A) = \frac{(1/2)(1/5)}{P(A \mid M)}$$

We lack only P(A|M). Again, we can think of this as the fraction of all cases in which it turns out a murdered married woman was abused by her husband, regardless of who murdered her. This will be the sum of the fraction of time *a woman abused by her husband was murdered by her husband*, and the fraction of time *a woman abused by her husband was murdered by someone other than her husband*. That sum represents the probability of “all possible relevant scenarios” given our evidence. We already have the parts we need for this. The first part is the exact same expression we have in the numerator, and the second part is that same expression, but change “H” to “¬H”:

$$P(A \mid M) = P(A \mid M \cap H)\,P(H \mid M) + P(A \mid M \cap \neg H)\,P(\neg H \mid M)$$

Notice that P(H|M) + P(¬H|M) = 1. That is, there’s a 100% probability that, given a situation in which a married woman has been murdered, she was murdered either by her husband or by someone else. And we’ve ensured that, within each of those distinct worlds, the murdered woman turns out to have been abused by her husband. We’ve thus covered all the relevant possibilities we’re concerned with for P(A|M).

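We’ve already worked out all the parts of the denominator (this will quickly turn into the same numbers we saw in the probability tree):

$$P(H \mid M \cap A) = \frac{(1/2)(1/5)}{(1/2)(1/5) + (1/10)(4/5)} = \frac{1/10}{1/10 + 4/50} = \frac{1/10}{9/50} = \frac{5}{9}$$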

And that’s that.
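As a last check, here is a hedged Python sketch of the same Bayes calculation, plus the equivalent odds form (prior odds times a likelihood ratio). The function and variable names are hypothetical; the numbers are the prompt's. It also speaks to the prompt's second question by comparing the posterior with the 1/5 prior.

```python
from fractions import Fraction

def bayes_posterior(prior, like_if_true, like_if_false):
    """P(hypothesis | evidence) for a hypothesis and its complement."""
    numerator = like_if_true * prior
    denominator = numerator + like_if_false * (1 - prior)
    return numerator / denominator

prior = Fraction(1, 5)                  # P(H|M): husband did it, given a murder
p_abuse_if_guilty = Fraction(1, 2)      # P(A|M∩H)
p_abuse_if_innocent = Fraction(1, 10)   # P(A|M∩¬H)

posterior = bayes_posterior(prior, p_abuse_if_guilty, p_abuse_if_innocent)

# Odds form: the abuse evidence multiplies the prior odds of guilt
# by the likelihood ratio.
likelihood_ratio = p_abuse_if_guilty / p_abuse_if_innocent
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior_from_odds = posterior_odds / (1 + posterior_odds)

print(prior, posterior, likelihood_ratio, posterior_from_odds)
```

On these assumed figures, the history of abuse moves the probability of guilt from 1/5 to 5/9, because abuse is five times as likely to turn up when the husband is the killer as when he isn't. That is hardly what "irrelevant" looks like.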


#### Footnotes:

1. I have, however, encountered reference to Gigerenzer’s book in another book that also discusses this fallacy in the context of Simpson’s trial: *The Drunkard’s Walk: How Randomness Rules Our Lives* (2008) by Leonard Mlodinow.
2. I’m only in the fourth chapter. Perhaps I’ll be moved to update this once I’ve finished the book.
3. I’m now imagining a story in which the outcome of a trial hinges on whether a coin landed heads or tails, which, due to circumstantial convolutions, turns out to be something other than a 50-50.
4. See Taleb’s 2010 (2nd edition) *The Black Swan*, where he begins his definition of “ludic fallacy (or uncertainty of the nerd)” with: “the manifestation of the Platonic fallacy in the study of uncertainty; basing studies of chance on the narrow world of games and dice” (p 303). Interestingly, Franklin’s book currently has precisely one customer review at Amazon.com—a five-star rave by “N N Taleb.”
5. See, for example, this 2011 *The Guardian* article: “A Formula for Justice.” It discusses the worry of banning Bayes’ theorem from the courtroom. To be clear, this isn’t just about a formula. Bayes’ theorem is a fairly trivial result, particularly in its most basic form (though it is of course powerful, especially when manipulated a bit—e.g., for performing a sequence of likelihood ratio updates!). But it has come to represent a certain—namely, subjectively grounded—way of thinking about probability, and *that* is what’s, uh, on trial here.
6. I added the word “married” here to match the problem’s statement in the textbook. More importantly, it helps give a slightly tidier explanation. For example, if we say 80% of murdered women are *not* murdered by their husbands, this will include women who didn’t have husbands. I’d rather not deal with this small wrinkle, for reasons that will show up shortly.