*Estimated read time (minus contemplative pauses): 41 min.*

Today, I again heard someone reduce—or appear to reduce—the difference between Bayesian and frequentist probability to using or not using, respectively, Bayes’ theorem. This is inaccurate. Bayes’ theorem, also known as Bayes’ rule or law, is an intuitive extension of basic conditional probability. No one rejects conditional probability. All probabilities are conditional on, for example, reference class, as in: the probability of a human menstruating before age 12 is different than the probability of a human *female* menstruating before age 12.

Here’s an easier one to calculate. The probability that the fair die I just rolled landed 2 is different than the probability that the fair die I just rolled landed 2 given (i.e., *conditional on* the fact) that the die landed on an even number; the former is 1/6, the latter 1/3.
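To make the counting explicit, here is a minimal Python sketch (the function names are my own, for illustration) that computes both probabilities by enumerating the six equally likely faces of a fair die. Conditioning on “even” simply shrinks the set of faces we count over from six to three.

```python
from fractions import Fraction

FACES = range(1, 7)  # one fair six-sided die; each face equally likely


def prob(event, given=lambda face: True):
    """P(event | given), computed by counting equally likely faces."""
    conditioning = [face for face in FACES if given(face)]
    favorable = [face for face in conditioning if event(face)]
    return Fraction(len(favorable), len(conditioning))


def rolled_two(face):
    return face == 2


def rolled_even(face):
    return face % 2 == 0


print(prob(rolled_two))               # 1/6
print(prob(rolled_two, rolled_even))  # 1/3
```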

I could end this post here, maybe cite an authoritative source, say Deborah Mayo, which I’ll do below. First, I’ll approach the question in my typically untrammeled way, as if it were a gateway to a more interesting—illuminating, confusing—set of questions.

Those who tend to oversimplify the frequentist-Bayesian divide in the above way often point to the example of disease diagnosis to demonstrate the unique power of Bayes’ theorem. While diagnosis scenarios are a great way of introducing Bayes’ theorem, particularly when they call for an individual to be repeatedly tested for a disease, they usually don’t require a ‘Bayesian rather than frequentist’ way of thinking. In other words, popping spoon-fed numbers into a ready-made formula doesn’t suddenly transform a ‘frequentist’ interpretation of a problem into a ‘Bayesian’ one.

To see what I mean, check out this 2018 post, in which I demonstrate two iterations of cancer diagnosis using theoretical frequencies and rudimentary probability methods; then I do it all over again using Bayes’ theorem explicitly: “Cancer Screening: An Easier Counterintuitive Conditional Probability Problem (with and Without Bayes’ Theorem).” Both methods do essentially the same thing and arrive at the same (surprising) answers.

There’s nothing particularly Bayesian about that problem because all of the prompt’s probabilities are well defined. In solving it, frequentists and Bayesians alike will take for granted, for example, that if 1% of a population has a disease, then a randomly selected person from that population has a 1% chance of having that disease.
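Here is a small sketch of that point, assuming hypothetical test characteristics: only the 1% prevalence comes from the discussion above; the 90% sensitivity and 9% false-positive rate are invented for illustration and are not the numbers from the 2018 post. Counting people in an imagined population and applying Bayes’ theorem directly give the same answer, as exact fractions.

```python
from fractions import Fraction

# Only the 1% prevalence comes from the text; the other two rates are hypothetical.
PREVALENCE = Fraction(1, 100)
SENSITIVITY = Fraction(90, 100)     # P(positive | cancer), assumed
FALSE_POSITIVE = Fraction(9, 100)   # P(positive | no cancer), assumed


def by_frequencies(population=10_000):
    """Count positives in an imagined population ('natural frequencies')."""
    sick = population * PREVALENCE
    healthy = population - sick
    true_positives = sick * SENSITIVITY
    false_positives = healthy * FALSE_POSITIVE
    return true_positives / (true_positives + false_positives)


def by_bayes():
    """Same question via Bayes' theorem: P(C | +) = P(+ | C) P(C) / P(+)."""
    p_positive = SENSITIVITY * PREVALENCE + FALSE_POSITIVE * (1 - PREVALENCE)
    return SENSITIVITY * PREVALENCE / p_positive


print(by_frequencies(), by_bayes())  # both 10/109, roughly 9.2%
```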

But suppose we don’t know that percentage. The central point of contention (as I understand it) between, let’s say, inference that’s not in a *hardcore* Bayesian mold and inference that is in a hardcore Bayesian mold has to do with how we go about choosing what numbers to pop into Bayes’ theorem with respect to that unknown probability. (Someone not in the hardcore Bayesian mold need not be a frequentist—they may be a less subjectively inclined Bayesian. This should make more sense shortly.)

Here’s a common way to represent Bayes’ theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

A standard way to read this is: the probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B.

In particular, folks argue about how loose—or subjective—we should be in designating the ‘prior’ distribution represented here as P(A)—that is, the probability you’d assign event A prior to event B having occurred, or prior to having *learned* that event B has occurred; for example: the probability you’d assign to rolling 2 before learning that you’d rolled even; the probability you’d assign to having cancer before testing positive for cancer; the probability you’d assign to God existing before finding yourself in Heaven.

Some self-designated Bayesians like to characterize priors as ‘prior belief,’ though this isn’t necessary. Another way to put it is as ‘quantified uncertainty.’ A stricter or more philosophically neutral way is just ‘prior distribution’ (which, just as in frequentist statistics, may be well approximated with a normal, Poisson, gamma, etc. distribution).

At this point, it becomes apparent that there are roughly two phenomena under discussion here. While using Bayes’ theorem, and especially certain of its several variations, does amount to approaching a problem within a Bayesian technical or methodological framework (rather than a technically frequentist or ‘classical’ one), it is not the case that working within such a framework makes you a Bayesian. In other words, being a Bayesian seems to entail some sort of special philosophical commitment, whereas someone who works within a Bayesian framework need not be so committed (I imagine a *philosophical* frequentist—more about whom in a moment—could get along fine in such a framework).

On the other hand, I’ve heard folks speak of themselves as Bayesians due to that being the technical framework they prefer (or have been principally trained) to work in. So, I’ll endeavor to refer here to the more philosophically committed Bayesians as ‘hardcore’ Bayesians, while hoping not to build a straw man, and will use the unmodified ‘Bayesian’ as a broader category.

These confusing tangles acknowledged, a basic implication of what I’ve said so far, I suppose, is that there is no problem with one’s Bayesian priors being informed by a frequentist worldview. But the Bayesian methodological framework—and the philosophical view that framework is meant to encourage and serve—may also, I suppose, force us to think more deeply about our priors (prior beliefs, credences, biases, evidence, whatever) and thus more deeply about what we should be conditioning on.

Even something as simple and frequentist-friendly as *rolling a 2*, or *rolling a 2 given even* can look more interesting under a stark Bayesian light: rolling a 2 conditional on… the die being fair… my not being a con artist… my rolling it at all… my parents having met and mingled their genetic materials in just such a way that I’d be born… the die not being stolen mid-toss by a passing hummingbird… where each of these conditioning events must be estimated and is difficult to quantify, at least for some observer. And so enters *subjectivity*—a word that may refer to the differing evidence held by two different people, or, quite distinctly and loosest of all, the differing felt credence two people might experience in response to the exact same evidence.

I’d say the reliance, or *insistence*, on quantifying subjective belief constitutes the deepest groove in the philosophical divide between Bayesianism and frequentism. Consider how Ben Lambert, in his 2018 textbook *A Student’s Guide to Bayesian Statistics*, characterizes the Bayesian worldview, after pointing out that the frequentist definition of probability (more about which soon) is challenged by one-off events like presidential elections:

For Bayesians, probabilities are seen as an expression of subjective beliefs, meaning that they can be updated in light of new data.^{1}

Of course, a frequentist’s probabilities may also be updated in light of new data, but I imagine that the point at which this happens may depend on the practitioner, just as it might for a Bayesian. It also seems to me that frequentist methods can be used to quantify uncertainty in a presidential election. Lambert’s frequentists are people the likes of whom I’m sure don’t exist: some kind of robot abstracted from the starkest possible adherence to naive notions about frequentist probability—one whose probability assessments don’t budge after learning that a murder victim’s ex-boyfriend was a violent psychopath highly motivated to see the victim dead.

(I do still recommend the book and its accompanying YouTube videos as an introduction to the mathematical and inferential tools of Bayesian statistics, whose “central dogma,” as Lambert puts it (page 28), is Bayes’ theorem. To get the most from the book—and to avoid frustration—you’ll want to already be familiar with standard [i.e., frequentist] statistics, at least basic integral calculus, and maybe basic linear algebra.)

Lambert’s characterization of probabilities as “an expression of subjective beliefs” (page 18) is precisely what I think of when I think of hardcore Bayesianism. Notice that, in the above disease diagnosis example, you are given that the probability of cancer before screening is 1%. That is, you’re told what your belief is, and it’s not characterized as an especially subjective one. Instead, it’s stated as something that, at least for the sake of the problem, we can take for granted as validly derived ‘according to our best available data’ (and so on). This strikes me as no different than what frequentism recommends in most situations (e.g., using a sample proportion to approximate a population proportion). So, when I think of hardcore Bayesian ‘subjective belief,’ I imagine something far messier, far more *personally* subjective.

‘Personally’ need not translate to ‘whatever I happen to feel,’ but instead generally means something more like, ‘from my own observational vantage point, which usually will not involve all the relevant facts about a situation; and the facts I have will often be different than the facts others have’—in conjunction, perhaps, with a healthy dose of (seasoned) intuition.

To incline the diagnosis exercise more in the hardcore Bayesian direction, we might leave one or more priors out of the prompt and ask students to independently formulate thoughtful guesses, Fermi style (with or without access to Google). See what each student comes up with after, say, three iterations of disease testing. Then have them compare notes about how they formulated their priors and rerun the exercise as a group effort. See where that leads. Et cetera.
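A rough sketch of how that classroom exercise might play out, in Python. Everything here is hypothetical: the three students’ prevalence guesses, the test’s 90% sensitivity and 9% false-positive rate, and the simplifying assumption that repeated test results are independent given disease status. Each posterior becomes the next round’s prior.

```python
SENSITIVITY = 0.90     # P(positive | disease), hypothetical
FALSE_POSITIVE = 0.09  # P(positive | no disease), hypothetical


def update(prior, positive=True):
    """One application of Bayes' theorem to a yes/no disease hypothesis."""
    like_disease = SENSITIVITY if positive else 1 - SENSITIVITY
    like_healthy = FALSE_POSITIVE if positive else 1 - FALSE_POSITIVE
    evidence = like_disease * prior + like_healthy * (1 - prior)
    return like_disease * prior / evidence


def run(prior, results=(True, True, True)):
    """Posterior after each test result; each posterior becomes the next prior."""
    history = []
    p = prior
    for positive in results:
        p = update(p, positive)
        history.append(p)
    return history


# Hypothetical Fermi-style prevalence guesses from three students.
student_priors = {"Ana": 0.001, "Ben": 0.01, "Cy": 0.05}
for name, prior in student_priors.items():
    print(name, [round(p, 3) for p in run(prior)])
```

With these made-up numbers, each positive test multiplies the odds of disease by 10, so after three positives the students end up at roughly 0.50, 0.91, and 0.98: still well apart, which is exactly the sort of thing worth comparing notes about.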

Problems of that sort haven’t come up often in the textbooks I’ve read. But they do come up in discussions I’ve heard on podcasts where practitioners discuss this stuff, and in lectures I’ve watched online. (I’m blurring the line here between statistics and probability; I’ll say more, though probably not enough, about this distinction here and there.) In such discussions, plenty of folks who seem dedicated to Bayesian methods often talk like frequentists; that is, with language and imagery in tune with the common characterization of frequentism as a philosophical view in which probabilities, put simply, are imagined to be drawn from the theoretical limit of an infinite series of repeated trials. This is an extremely difficult, perhaps impossibly doomed, idea to pin down. It’s easy (-ish) to imagine flipping a fair coin infinitely many times to approach the theoretical ratio of 1:1 for Heads:Tails, but (as Lambert points out) what does this characterization mean for one-off events like presidential elections?

This just scratches the surface of (a naive sort of) frequentism’s well-known problems. Even the simple coin case is difficult to make sense of if looked at closely, as the perfectly fair coin whose edges don’t wear down after many quintillions of flips is itself a theoretical construct, as is the notion of flipping it infinitely many times to get to its ‘true probability distribution,’ whatever that means (I suppose it means ‘infinitely many flips’; but what does this have to do with a given flip or, for that matter, many quintillions of flips?).
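For what the limiting-frequency picture looks like in practice, here is a minimal simulation. Note that the “fair coin” is itself an idealization (Python’s pseudo-random number generator with a fixed seed), which rather underlines the point above: the relative frequency of heads drifts toward 0.5 as flips accumulate, but no finite run reaches the limit, and the simulation says nothing about any single flip.

```python
import random


def heads_fraction(n_flips, seed=0):
    """Relative frequency of heads in n_flips of an idealized fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (10, 1_000, 100_000):
    print(n, heads_fraction(n))
```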

In the real world, every event is identical only to itself, from coin flips to human doings.

So, uncertainty abounds in the world, which makes Bayesianism appealing. But frequentists are well aware of uncertainty and seem to be concerned not so much with satisfying some weird and unfathomable theoretical construct, but with outputting the best answer they can while minimizing and quantifying the chance for error, as we’ll see shortly when I share some more reasonable, or at least more charitable, characterizations of the frequentist worldview.

Among those steeped in or reliant on Bayesian methods, there’s disagreement about how much to rely on subjectivity, particularly of a deeply personal sort. Bayesian practitioners would generally, of course, push against the idea that the subjective dimensions of their methods are fatally problematic, for various reasons and to varying degrees, depending on where they fall on the Hardcore Spectrum—they might say that (strong; i.e., ‘informative’ or ‘non-vague’ or ‘non-diffuse’) priors can be avoided (at least early on*) or they might embrace the subjective nature of hard-to-pin-down priors while (persuasively! rightly!) pointing out that *all* probability is subjective but ‘at least we Bayesians admit it and try to account for it.’

For Bayesians, it seems, an appropriate application of Bayesian tools is one in which differing priors, no matter how subjective, are reliably transformed into roughly the same final (or ‘posterior’) probability over *repeated* observations and as evidence amasses.
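One standard way to see this convergence (swapped in here as an illustration, not drawn from any source above) is the conjugate Beta-binomial model for a coin’s bias. The sketch below assumes two observers with strongly opposed Beta priors and a simulated coin whose true bias of 0.6 is made up; as flips accumulate, their posterior means draw together.

```python
import random


def posterior_mean(alpha, beta, heads, tails):
    """Mean of the Beta(alpha + heads, beta + tails) posterior for a coin's bias."""
    return (alpha + heads) / (alpha + beta + heads + tails)


OPTIMIST = (20, 2)  # hypothetical prior, mean about 0.91
SKEPTIC = (2, 20)   # hypothetical prior, mean about 0.09
TRUE_BIAS = 0.6     # assumed only to generate simulated data

rng = random.Random(42)
flips = [rng.random() < TRUE_BIAS for _ in range(10_000)]

for n in (0, 10, 100, 1_000, 10_000):
    heads = sum(flips[:n])
    tails = n - heads
    a = posterior_mean(*OPTIMIST, heads, tails)
    b = posterior_mean(*SKEPTIC, heads, tails)
    print(f"n={n:>6}  optimist={a:.3f}  skeptic={b:.3f}  gap={a - b:.3f}")
```

Because both hypothetical priors carry the same total weight (20 + 2 pseudo-flips), the gap between the two posterior means here works out to exactly 18/(22 + n), so it shrinks steadily no matter what the flips turn out to be. Convergence of this kind depends on neither observer starting at a probability of exactly 0 or 1.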

[*A problem for another day is how Bayesians should quantify a prior assumption of innocence in a courtroom setting. A defendant literally presumed innocent wouldn’t be on trial. In particular, if you begin with a prior of ‘100% chance of innocence,’ you have no place else to go with Bayesian methods, no matter what evidence emerges. It seems to me that the solution lies in developing priors as part of evidence evaluation before doing any sophisticated formal calculations with those priors (though, admittedly, this process in itself may be at least informally Bayesian; see again my above reference to priors that are vague or diffuse; see below for more on ‘formal’ vs. ‘informal’). It also seems to me that this should include priors about the trustworthiness of lawyers and judges.

I’ve seen interesting suggestions for dealing with this problem but, again, it’s a discussion for another day. I thought it worth mentioning here as a real-world example of where I think formal Bayesian reasoning could be useful, provided the kinks can be worked out. I say “formal,” because there are those who’d argue that jurors—and indeed all of us—already are Bayesian epistemic agents, but informally so; in which case, giving jurors formal tools to work with would clean up their thinking. So maybe the idea is this: the most difficult priors to pin down are best formed within an informal, non-numerical Bayesian process (e.g., “here’s a bunch of evidence, what do you think?”), and then are cleaned up and quantified with formal Bayesian tools.

To be clear, though, the famous trials gone awry due to bad probabilistic practices did not go awry due to a failure to use Bayes’ theorem, but rather due to a failure in conditioning.]
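The ‘100% innocence’ problem can be seen in a few lines. This sketch rewrites Bayes’ theorem in terms of a likelihood ratio, P(evidence | guilty) / P(evidence | innocent); the priors and ratios are arbitrary illustrations, not a proposal for how courts should quantify anything. A prior of exactly 1 stays at 1 however damning the evidence.

```python
def posterior_innocence(prior, likelihood_ratio):
    """P(innocent | evidence) from Bayes' theorem.

    likelihood_ratio = P(evidence | guilty) / P(evidence | innocent).
    Dividing numerator and denominator of Bayes' theorem by
    P(evidence | innocent) gives prior / (prior + LR * (1 - prior)).
    """
    return prior / (prior + likelihood_ratio * (1 - prior))


for prior in (0.5, 0.99, 1.0):
    posteriors = [round(posterior_innocence(prior, lr), 4) for lr in (10, 1_000, 1_000_000)]
    print(prior, posteriors)
```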

Indeed, Bayesian statistics seems to make especially good use of our ever-growing computer power, which is often cited as what makes Bayesian statistics robustly doable (and thus more appropriate to the computer age than are frequentist methods), given its intimidating (or delectable, depending on your taste) mathematical rigor and, especially, its amenability to running loads of repeatable simulations. While the core of such methods is Bayes’ theorem, which can be manipulated into a variety of useful forms depending on what you want to do, this doesn’t mean that frequentists don’t use Bayes’ theorem or, more specifically, conditional probability such that probabilities are updated in light of new evidence.
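As a toy illustration of the simulation side, here is a random-walk Metropolis sampler, one common Markov chain Monte Carlo technique, chosen by me for illustration (real analyses typically use dedicated libraries). It estimates the posterior mean of a coin’s bias under a uniform prior from hypothetical data (7 heads in 10 flips), a case simple enough to check against the exact answer, (7 + 1)/(10 + 2).

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior of a coin's bias under a uniform Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_samples=50_000, step=0.1, seed=1):
    """Random-walk Metropolis sampler; returns the chain of sampled biases."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, tails)
    chain = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, heads, tails)
        # Symmetric proposal, so accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain


heads, tails = 7, 3                        # hypothetical data: 7 heads in 10 flips
chain = metropolis(heads, tails)[1_000:]   # discard burn-in
print(sum(chain) / len(chain))             # simulation estimate
print((heads + 1) / (heads + tails + 2))   # exact posterior mean, 8/12
```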

In other words, Bayes’ theorem in its simplest form—especially before a first round of conditioning one’s initial priors—is a starting place, and one at which frequentists and Bayesians can find themselves face-to-face. From there, though Bayesian statistics has developed its own tools, models, and culture, it’s not obvious to me at what point a dedicated wielder of such devices suddenly becomes a philosophically committed, or hardcore, Bayesian. My impression is that most utilizers of statistics—including those who are and aren’t trained statisticians—seem not to be dogmatically dedicated to any particular philosophical school, and will trade in both frequentist and Bayesian language as suits their pedagogical or conceptual needs.

It’s also my impression, though, that there seems to be—*seems to me to be*—a rise in people being careful that, or at least educated in such a way that, their language is distinctly hardcore Bayesian: you might be a Bayesian if you use the word ‘update’ a lot in casual conversation (or if you wear a headband that says *I ♥ Rationality*; for the record, ‘updating’ is what we all do—as formal or informal Bayesian epistemic agents—to our existing beliefs [or ‘priors’] conditional on new evidence; and ‘rational’ isn’t a technical term in Bayesian statistics, but self-designated Bayesians seem to like it a lot, as in the term ‘rational updating’^{2}).

That said, I often hear people use the word ‘prior’ these days while inhabiting otherwise firmly frequentist conceptual territory (I like that!), which shows us just how Bayesian the statistically inclined culture in general has become, though the technical tools most people are using still seem to be largely frequentist; perhaps this signifies that the Bayesian-frequentist lines need not be so stark (we’ll see how things go; some might argue that this is a symptom of tenaciously lingering frequentism of a sort that bogs down our progress as we attempt to undo and escape the trauma of the replication crisis, more about which below; it’s also worth noting here that a common trope these days is to hear a tenured professor say, ‘my graduate students are teaching me Bayesian statistics’).

I’d also bet that plenty of self-proclaimed hardcore Bayesians haven’t studied probability or statistics in depth. I don’t mean this as a putdown, though, just as I wouldn’t put someone down for believing in or accepting evolution despite not having closely studied the topic. It does worry me, however, insomuch as it means that those people might switch their view the moment it becomes fashionable to do so among whatever authorities convinced them to be Bayesians (or frequentists or pro-evolution, etc.) in the first place. (Am I such a person?)

Unsurprisingly, where I’ve especially noticed explicitly stated, nuanced philosophical grounding for one’s frequentism versus Bayesianism—which are not the only two options and there are competing views within each option—is among philosophers of probability, such as those featured in this 2011 collection: *Philosophy of Probability: Contemporary Readings* (Eagle (ed.), Routledge). It’s a fascinating read. A current Bayesian *might* be surprised at how sophisticated frequentist thinkers can be (regardless of whether their arguments are the most convincing in the end). Perhaps as Bayesian self-consciousness rises among those whose work relies on statistical tools, I’ll also notice more instances of explicitly stated, nuanced philosophical grounding for why they prefer those, rather than other, tools.

To be clear, what I (at my current stage of understanding) would prefer to see is people self-consciously, and conscientiously, traveling fluidly between various modes of thinking—taking what works and rejecting or setting aside what doesn’t (in a given context). Frequentist concepts, for example, strike me as unavoidable and reasonable given the human tendency—*necessity*—to hunt for patterns, particularly when there really are patterns (e.g., in the way symmetrical-appearing coins tend to behave); while the Bayesian framework, by making us sensitive to priors (biases, implicit beliefs, etc.) that might otherwise go unconsidered, may help us avoid the traps of that pattern-hunting tendency (e.g., in the aforementioned courtroom case).

Perhaps this expects too much clarity from the naturally blurry lines that separate philosophical grounding and practice, where the latter itself is already criss-crossed through with blurry demarcations—which, again, I like, and may be enough to hope or ask for, though I do think a solid philosophical grounding always helps (e.g., protects against vague disputes against supposed purists: at least with explicit philosophical grounding, it’s clearer not only what the disputes are, but when there actually is one).

At any rate, the Bayesian-oriented textbooks in my collection liberally make use of or reference techniques that I associate with frequentist methods (e.g., confidence intervals; more on this particular example below).
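To see how the two toolkits can produce nearly the same numbers while meaning different things, here is a sketch comparing a frequentist (Wald) 95% confidence interval for a proportion with a Bayesian 95% credible interval under a uniform prior, computed by a simple grid approximation. The data (120 successes in 200 trials) are hypothetical. The confidence interval’s 95% describes how often the procedure would cover the true value over repeated samples; the credible interval’s 95% is a posterior probability given the prior and these data.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Frequentist (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def credible_interval(successes, n, level=0.95, grid=20_000):
    """Equal-tailed credible interval under a uniform prior, by grid approximation."""
    xs = [(i + 0.5) / grid for i in range(grid)]
    log_dens = [successes * math.log(x) + (n - successes) * math.log(1 - x) for x in xs]
    peak = max(log_dens)
    weights = [math.exp(v - peak) for v in log_dens]  # rescale to avoid underflow
    total = sum(weights)
    lower_tail, upper_tail = (1 - level) / 2, (1 + level) / 2
    lo = hi = None
    running = 0.0
    for x, w in zip(xs, weights):
        running += w / total  # cumulative posterior probability up to x
        if lo is None and running >= lower_tail:
            lo = x
        if running >= upper_tail:
            hi = x
            break
    return lo, hi


print(wald_interval(120, 200))
print(credible_interval(120, 200))
```

With this much data the two intervals nearly coincide; with small samples or strong priors they can diverge.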

At least as telling, I suppose, is that, when posed with an actual problem, discussions about whether to proceed as a frequentist or Bayesian would be a silly distraction. To see what I mean, check out the problem I share in the Addendum at the end of this post.

[*Note: While taking a snack break from writing this post, I again heard someone reduce—or appear to reduce—the difference between Bayesian and frequentist probability to using or not using Bayes’ theorem.*]

Alright. I’ve wandered by now into complicated—philosophical, methodological, psychological, anthropological—terrain. I hope I’ve captured some of the topic’s complexity and messiness, or at least have gotten across (and vindicated) the substance of my own confusion about it.

Allow me, then, to snap the focus back to my main point.

Whatever separates frequentists and Bayesians, it’s not the use of Bayes’ theorem. But don’t take it from a rambling bystander with a weird, budding obsession for probability and whose study of statistics proper has been more dutiful than obsessive. Instead, check out some thoughts from philosopher of statistics and proudly confirmed frequentist Deborah Mayo in her excellent 2018 book, *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*:

Bayes’ Theorem is just a theorem stemming from the definition of conditional probability; it is only when statistical inference is thought to be encompassed by it that it becomes a statistical philosophy. Using Bayes’ Theorem doesn’t make you a Bayesian. (p 24)

She goes on to talk about the different goals of frequentist and Bayesian inference, citing statistician Larry Wasserman:

The Goal of Frequentist Inference: Construct procedure with frequentist guarantees [i.e., low error rates].

The Goal of Bayesian Inference: Quantify and manipulate your degrees of beliefs. In other words, Bayesian inference is the Analysis of Beliefs. (p 24)

Though these distinctions are “too crude,” notes Mayo, “they give a feel for what is often regarded as the Bayesian-frequentist controversy” (p 24). She goes on to refine the discussion. Read her book for that. And then check out the post on her book at statistician Andrew Gelman’s influential blog: “Several reviews of Deborah Mayo’s new book, *Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars*.”

The several expert reviews referenced at Gelman’s post are delivered with varying degrees of thoroughness and thoughtfulness. The most colorful response is from mathematical psychologist E. J. Wagenmakers, who didn’t read the book:

I cannot comment on the contents of this book, because doing so would require me to read it, and extensive prior knowledge suggests that I will violently disagree with almost every claim that is being made. Hence I will solely review the book’s title, and state my prediction that the ‘statistics wars’ will not be over until the last Fisherian is strung up by the entrails of the last Neyman-Pearsonite, and all who remain have been happily assimilated by the Bayesian Borg. When exactly this event will transpire I don’t know, but I fear I shall not be around to witness it. In my opinion, the only long-term hope for vague concepts such as the ‘severity’ of a test is to embed them within a rational (i.e., Bayesian) framework, but I suspect that this is not the route that the author wishes to pursue. Perhaps this book is comforting to those who have neither the time nor the desire to learn Bayesian inference, in a similar way that homeopathy provides comfort to patients with a serious medical condition.

Most of the reviews are more reflective than that, as are many of the user comments, many of which are from practitioners (at least one of whom, I recall, criticizes Mayo on the grounds of being a non-practicing theorist). I find Wagenmakers’ response of particular interest as it pushes against the commonly encountered idea (by me, at least) that today’s statisticians are happy to use frequentist or Bayesian methods when appropriate—which is to say that notions about frequentist-Bayesian tension are overblown.

It strikes me that they are overblown. Above, I said that hardcore Bayesianism seems to be on the rise. This is true, but even more on the rise, it seems, is a kind of frequentist-Bayesian mixed toolkit, which is actually one of the main points I’m trying to get across here. (Though, to be clear, my simplest point is that everyone uses conditional probability, in whatever form it may take!)

Or, as Rafael Irizarry puts it in this 2014 blog post, “I Declare the Bayesian vs. Frequentist Debate over for Data Scientists”:

Because the real [recently published *New York Times*] story (or non-story) is way too boring to sell newspapers, the author resorted to a sensationalist narrative that went something like this: “Evil and/or stupid frequentists were ready to let a fisherman die; the persecuted Bayesian heroes saved him.” This piece adds to the growing number of writings blaming frequentist statistics for the so-called reproducibility crisis in science.

Obviously, Mayo sees the debate as neither finished nor calmed, given her inclusion of the word ‘war’ in her book’s subtitle, which apparently informed the violent imagery of Wagenmakers’ (self-mocking!) response. Or is he mocking a common caricature of the Bayesian, a species that may no more exist than does Lambert’s robot frequentist? Still, in light of Wagenmakers’ reaction, Mayo has a point: the tendency Irizarry describes in the above quote seems not just the stuff of journalists. Frequentist statistics is increasingly cast as the culprit for the replication crisis, even though it was frequentist methods that enabled researchers to point out replication and reproducibility problems in the first place.

I would include Mayo and Irizarry among a growing—though, from my (possibly ill-informed) outsider’s perspective, seemingly still in the minority—group of critically minded folks pointing out that blaming overly pure frequentist methods in themselves (most famously, perhaps, involving the conscious or unconscious manipulation of *p*-values) for the replication crisis is flat wrong, and that switching over to a shiny and sexy new set of tools won’t fix things, particularly if done unquestioningly.

This seems obviously true to me. I can’t imagine it ever being a good idea to replace one unquestionable dogma with another unquestionable dogma. It seems clear that Bayesian statistics alone won’t stop people from promoting faulty results, and won’t guarantee against some other hungry group of young researchers coming along two generations from now and installing yet another new dogma while knocking down whatever findings replace the old ones that are currently being razed.

As I recently heard one concerned young-ish social psychologist put it, to paraphrase: we need to throw out all pre-crisis social psychology findings. To be fair, ‘hungry’ doesn’t appropriately characterize this person, so much as worried, but there seems to be a schedule that looks like this: grad students are excited by the crisis; mid-career researchers are worried, but relieved that they still have a chance to make their mark; older researchers range from ‘whatever, I’ll just retire’ to devastated; some of the youngest researchers seem to take delight in witnessing that devastation.

To be clear, most folks don’t seem to be recommending a mere switch to new tools. In fact, some put a premium not on that switch, but on cleaning up practices that have nothing to do with one’s statistical loyalties, such as preregistration and larger sample sizes, which seem like obviously great ideas for ethical, good-faith research but are still hackable under a Bayesian program. Indeed, what many point out (especially non-Bayesians; see again Mayo) is that it is cleaning up practices that matters more than switching tools, though most of those I’ve encountered also pine for the switch. And again I say: we’ll see how things go.

Another worry is that these discussions are happening within a context that is never going to allow for a genuine solution, due to the difficulty of pinning down not just the content of human nature, but what that thing even is or could be. This is made even more difficult by the restrictions that ethical practice imposes—indeed *should* impose (though I’ve heard persuasive accounts of things going too far in that regard)—on experimental controls.

How can we hope—in the laboratory or any other environment sufficiently controlled so as to allow human mental states and behaviors to be translatable into shareable data that may in turn be reliably translated into accounts that are both reliably world-representing and intelligible to human brains—to get at anything like some universal nugget of human nature, independent of culture, time, space? Maybe future researchers will increasingly concern themselves with local populations (valuable in itself); maybe maybe maybe; maybe what we lose to the strictures of experimental controls will always keep reliability and replicability out of reach for any questions that actually matter. Biases, artifacts, confounders, complexity, messiness abound, and that’s just that.

And in that murky fizzy fluid, there will be constantly shifting tides agitated by currents of irrelevant arguments over what the best tools and philosophies and (hiring) policies and such are for bypassing the very hard, time-consuming, life-sucking, tedious, nearly always disappointing and gloomily fruitless (from a given individual’s perspective) work that clean and conscientious and *genuinely important* research requires. In other words, I worry that any debate over which statistical tools to pledge one’s loyalties to distracts from real and harder problems (among them that the world is unforgivingly complex and humans tend to get bored easily).

Social psychology—which, though not the only observe-and-infer field under threat, may be the most under threat—might consider fashioning itself after certain still-strong areas of philosophy, at least in the short term: help identify the most important questions to ask of a certain kind (which kind?), then show, rigorously, why they can’t be answered, or won’t be answered any time soon. This would mean, perhaps, a shift to a focus on counterexamples (not merely showing a failure to reproduce a result, but showing support for an opposing hypothesis).

Some may find this a downer, and so would intermingle their practice with some intricate and wild speculations (known as ‘discussions’) supported by radical, but defendable, interpretations of respectable research outcomes (that we nevertheless can’t be sure we can rely on), particularly by means of comparing the results of research questions that superficially appear to be unrelated (the more ‘invisible’ a set of relations is, the more leeway there is for radical interpretation). As always, the same data would continue to inspire widely divergent interpretations among experts, but now within a highly conceptual framework—so conceptual, that discussions may lead to questions about the degree to which the framework is real, rather than about the interpretations (much less the research results) themselves. This may lead again to ruin, rendering the field worse than before. (As with the worst of contemporary philosophy and its theoretical offshoots.)

(This is also starting to sound like certain strains of contemporary physics. Is everything reverting back to philosophy, in its best and worst forms?)

The above thoughts are themselves speculative, and I haven’t thoroughly thought them through. Still, I think the basic idea holds: social psychologists should be more publicly (and perhaps personally) open about how severely their work is constrained by constructs and operationalization (e.g., ‘happiness’ must be operationalized as something measurable before it can be measured), by ethical restrictions, by limited observational/situational controls, and by an inability to predict/detect/account for confounds, so severely that many or most of the questions we care about are rendered impossible to answer. But this doesn’t make for sexy press releases or (I presume) grant applications.

Here’s some more speculation stemming from what strikes me as an easily observed, basic truth.

The real question underlying the debates I’m hearing among young (and young-ish) social psychologists seems to be not about how science should be practiced as such, but rather about the extent to which social psychologists should see themselves as political activists rather than as scientists (i.e., as objective, ‘data-driven’ researchers trying to understand the world around them). The answer to that question will no doubt play a huge role in shaping tomorrow’s research practices.

For example, I’ve heard practicing researchers debate whether they should share the results of a study if it does not support a particular political position. The assumption is that the political position (or dogma) must be correct, therefore something must be wrong with the study.

This is distinct, by the way, from researchers gravitating towards areas of social import. The problem arises when researchers study hypotheses that are practically universally assumed to be true from the outset, by social psychologists and perhaps by society (of a certain political bent), such that any contrary evidence must be rejected or concealed.

When I was studying at Columbia University, I was told by someone working in a psychology lab that research results are often concealed due to their having failed to support certain socio-political assumptions; the worry was about bad publicity, which makes me wonder how much of the debate I’m hearing is honest.

Preregistration may help with this. But I’m not hopeful. What I’m hearing are not discussions about accepting the results as they fall, but rather how to essentially apologize and say ‘I’ll do better next time’ when results don’t bear out the expected political position (again: is this an honest apology? I doubt it… but am not sure what’s worse here: apologizing from fear or sincerity).

(Might this conflict with, or bolster, a growing emphasis on counterexamples? Counterexamples to what?)

At any rate, conflating diversity (of a certain demographic or epidermal kind) with good science is a worrisome proposition, one I’ve heard worried practitioners push back against. I have many more thoughts on this, best saved for some other time.*

[*Well, here’s a taste. There is growing urgency for the important question of what constitutes the greater good: a society in which research teams are sexually and gender-wise and racially and ethnically (rather than cognitively or [dis]ability or background-wise or class-wise) diverse, or one in which cancer and other widespread afflictions are eradicated. These need not be mutually incompatible goals, but they are not the same goal—I’m by no means alone in pointing out that it’s easy to imagine plausible situations in which one trumps the other.

For example, if the best team to solve a scientific problem happens to be composed of black women, I see no point in insisting on introducing other team members solely to increase diversity. Many today talk as though it’s impossible for such a team to be as effective as a more diverse one, but this is a politically, rather than empirically, grounded position. Others seem to think that solving a scientific problem is less important than having a diverse team, because this in itself addresses more important problems having to do with social inequity or inequality; I think there is something to this, but the details obviously matter (a neighborhood may increase in diversity as it gentrifies, but does this make it a de facto good?).

(Note that how this discussion goes will depend on how one understands the word “diverse.” I do think it’s difficult to overestimate the power of *cognitive*, or what we might simply call *viewpoint*, diversity; the question, then, is where these notions of diversity overlap, etc.)

What many would likely say they are actually calling for is increased diversity not in teams of the sort I just mentioned, but in the overall field of science. I think it obvious that this has to happen, though there must be some clear picture here of the relation between the field and teams—I doubt our moral intuitions would be satisfied by a diverse field composed of un-diverse research labs; though this may depend on how funding is distributed and on what projects the labs are pursuing, and the means by which such things come, or are brought, about.

Of course, the possibility of such scenarios will not impede the flow of social progress. An inability to candidly discuss their finer details might.

That in mind, I was very moved to hear Edward Frenkel—a mathematician and creative soul whom I admire; whose 2013 book *Love and Math: The Heart of Hidden Reality* I love so much; who himself was, explicitly enough, denied entrance to a top Russian math program as a Jew—say in a discussion with Brady Haran on the *Numberphile* podcast episode “Coffin Problems” (12/3/2019) (at 1:08):

I personally think—and I’m not just being facile or facetious, but I really mean it—I think that underrepresentation of women and minorities in general in mathematics is the single *most* important problem that we have to deal with because a lot of other things are consequences of that, and definitely we have to make an effort to bring more young women and minorities to mathematics. And, not just bring, but be supportive, be compassionate, be aware of the plight, of the difficulty of being a minority. It’s not just something to say as a slogan, but something that requires sustained effort.

]

It’s not so difficult to imagine the psychological enterprise splintering in the way physical and cultural anthropology have. That would be better than a more dystopian direction in which the replication crisis is used for establishing a new order predicated on political activism disguised as science, rather than on the push and pull of constant disagreement—in research results, interpretation of given research results, and in intuitions about what seems most logical, and on and on—a push and pull that gives fit-and-start-and-lurch-and-lop to the machinery of scientific advancement (or, if you prefer: progress). (One reason this is a cause for concern is that, no matter the outward shape or mechanisms of a scientific field, governments will continue to seek out certain kinds of truth—e.g., the kind that enables mass destruction. Scientists must be both intellectually equipped and free to pursue complementary as well as opposing avenues of discovery.)

And now I must acknowledge that my understanding, such as it is, of these matters today comes mostly from paying attention to a small group of outspoken voices (who at least represent themselves, and their expert audiences, as representative; they also seem to be at least a little influential) and a handful of textbooks. It doesn’t come from personal experience.

Better to just concur with Irizarry: “their choice of technique is not the problem, it’s their lack of critical thinking.”

As for Mayo’s use of the word ‘war’ in her book’s title, check out her reference to it in a tweet:

In trying to decide if I dared to use "Statistical Wars" as part of the title of my book, I tried to identify essential features of cases where it seems apt to use "The ––Wars", e.g., the mommy wars, science wars. What are others? And what are the shared features, if any?

— ♕Deborah G. Mayo♕ (@learnfromerror) September 25, 2019

It will be fascinating to see where things head in the next 10 to 30 to 300 years.

For a less technical (in terms of mathematics and philosophy of science) introduction to Mayo’s ideas, I highly recommend her February 12, 2019 appearance on the *Sci Phi* podcast: Episode 58: Deborah Mayo. She’s also got a website: *Error Statistics Philosophy*.

I’ll close with some additional book excerpts.

The following measured comparison is from Gelman’s also excellent (and rather advanced) 2013 textbook *Bayesian Data Analysis* (3rd edition, coauthored with Carlin, Stern, Dunson, Vehtari, and Rubin, and in which the word ‘rational’ never appears):

A primary motivation for Bayesian thinking is that it facilitates a common-sense interpretation of statistical conclusions. For instance, a Bayesian (probability) interval for an unknown quantity of interest can be directly regarded as having a high probability of containing the unknown quantity, in contrast to a frequentist (confidence) interval, which may strictly be interpreted only in relation to a sequence of similar inferences that might be made in repeated practice. Recently in applied statistics, increased emphasis has been placed on interval estimation rather than hypothesis testing, and this provides a strong impetus to the Bayesian viewpoint, since it seems likely that most users of standard confidence intervals give them a common-sense Bayesian interpretation. One of our aims in this book is to indicate the extent to which Bayesian interpretations of common simple statistical procedures are justified.

Rather than argue the foundations of statistics—see the bibliographic note at the end of this chapter for references to foundational debates—we prefer to concentrate on the pragmatic advantages of the Bayesian framework, whose flexibility and generality allow it to cope with complex problems. The central feature of Bayesian inference, the direct quantification of uncertainty, means that there is no impediment in principle to fitting models with many parameters and complicated multilayered probability specifications. In practice, the problems are ones of setting up and computing with such large models, and a large part of this book focuses on recently developed and still developing techniques for handling these modeling and computational challenges. The freedom to set up complex models arises in large part from the fact that the Bayesian paradigm provides a conceptually simple method for coping with multiple parameters…^{3}
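To make the ‘repeated practice’ reading of a confidence interval concrete, here is a minimal Python sketch of my own (not from the book). It assumes a normal population with a known standard deviation; the true mean of 10, standard deviation of 2, sample size of 25, trial count, and seed are all arbitrary choices. It draws many samples, builds a 95% z-interval from each, and reports the fraction of intervals that contain the true mean. That long-run fraction is what the 95% describes; any single computed interval either contains the true mean or it doesn’t.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for a mean, assuming the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    half_width = z * sigma / len(sample) ** 0.5
    center = fmean(sample)
    return center - half_width, center + half_width

def coverage(true_mean=10.0, sigma=2.0, n=25, trials=10_000, seed=1):
    """Fraction of intervals, one per simulated sample, that contain true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high  # True counts as 1
    return hits / trials

if __name__ == "__main__":
    print(f"long-run coverage: {coverage():.3f}")
```

The printed coverage should sit close to 0.95, with the exact figure depending on the seed. Nothing in the code assigns a probability to the true mean itself, which is the gap a Bayesian interval is meant to fill.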

And this note is representative of the book’s generally fair, friendly, collaborative attitude about frequentist statistics:

Just as the Bayesian paradigm can be seen to justify simple ‘classical’ techniques, the methods of frequentist statistics provide a useful approach for evaluating the properties of Bayesian inferences—their operating characteristics—when these are regarded as embedded in a sequence of repeated samples.^{4}
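Here is a hedged sketch of that idea, a toy example of my own rather than one from the book: put a uniform Beta(1, 1) prior on a binomial proportion, form the 95% central credible interval from the Beta(k + 1, n − k + 1) posterior, and then ask the frequentist question of how often that interval would contain the true proportion across repeated samples. Because there are only n + 1 possible outcomes, the coverage can be computed exactly rather than simulated. The sample size of 20 and the proportions checked are arbitrary. To stay within Python’s standard library, the Beta quantile is found by bisection, using the identity that for integer parameters the Beta(a, b) CDF at x equals the probability that a Binomial(a + b − 1, x) count is at least a.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integer a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))

def beta_quantile(q, a, b, tol=1e-10):
    """Invert beta_cdf by bisection; the CDF increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def credible_interval(k, n, level=0.95):
    """Central credible interval from the Beta(k + 1, n - k + 1) posterior (uniform prior)."""
    tail = (1 - level) / 2
    a, b = k + 1, n - k + 1
    return beta_quantile(tail, a, b), beta_quantile(1 - tail, a, b)

def frequentist_coverage(p, n, level=0.95):
    """Exact probability, over repeated Binomial(n, p) samples, that the interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = credible_interval(k, n, level)
        if lo <= p <= hi:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(f"p = {p}: coverage = {frequentist_coverage(p, 20):.3f}")
```

The printed coverages will generally differ a little from 0.95, partly because a discrete outcome can’t hit a nominal level exactly. How far they drift for a given p is the kind of operating characteristic the quote has in mind.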

I like the book’s approach, which seems to be: ‘Here’s a bunch of cool stuff we can do with extensions and manipulations of Bayes’ theorem; add them to your tool box! Think hard about them while you do so.’

I’m not familiar with Gelman’s coauthors, but I often read Gelman’s blog, where he seems to take an admirably balanced approach to disputes in his field. I’ve especially enjoyed the conversation between him (one of the most influential voices in [at least the popular communication of ideas about] statistics) and Mayo (the only philosopher of statistics I know of who’s entered into the semi-popular domain of these discussions). I’m grateful to both of them for having this discussion publicly, as well as to all the knowledgeable folks taking the time to chime in along with them.

I also have a book called *The Complete Idiot’s Guide to Statistics*. It seems to do a nice job of teaching introductory (i.e., frequentist) statistics, which is to say that its approach is frequentist by default. It’d be incomplete, however, without a section on Bayes’ theorem, which it does indeed have. The word ‘Bayes’ appears 15 times in the book. (Neither ‘Bayesian’ nor ‘frequentist’ appears.) It’s possible that the author himself was a Bayesian who insisted on sneaking the topic in, or, more likely, it’s simply become understood (by authors and publishers and students) that, even in an intro book on frequentist statistics, Bayes’ theorem must be included. I take this for granted, but it apparently wasn’t always so.

There are older statistics textbooks that don’t explicitly mention Bayes’ theorem. They of course still discuss conditional probability. In fact, I saw a still-in-use college statistics textbook the other day that has a sidebar to an elementary conditional probability formula stating essentially that ‘sometimes people call certain applications of this formula “Bayes’ theorem”’; the formula they give, however, is incomplete (i.e., it’s a couple of manipulations away from being Bayes’ theorem). Though this sidebar comes across as obviously ad hoc, I take their apparent point to be that where conditioning becomes (hardcore) Bayesian is a matter of one’s project and outlook.
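For anyone curious what those ‘couple of manipulations’ are, here is the standard route from the elementary conditional probability formula to Bayes’ theorem (my sketch, not the textbook’s). A is a hypothesis or event, B is the observed evidence, A^c is the complement of A, and P(A) and P(B) are assumed to be positive. The first line is the formula nobody rejects; the second rewrites the joint probability; the third expands P(B) by the law of total probability.

```latex
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \\
P(A \cap B) &= P(B \mid A)\,P(A) \\
P(B) &= P(B \mid A)\,P(A) + P(B \mid A^{c})\,P(A^{c}) \\
\Rightarrow\quad P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^{c})\,P(A^{c})}
\end{align}
```

Applied to the die example from the start of this post, with A = ‘landed 2’ and B = ‘landed even,’ we have P(B | A) = 1, P(A) = 1/6, and P(B) = 1/2, so P(A | B) = (1 × 1/6)/(1/2) = 1/3. Nothing in those steps requires a Bayesian interpretation of probability.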

The attitude of ‘I’m a Bayesian who recognizes the need to teach you frequentist methods first’ comes up in one of my favorite statistics textbooks, Danielle Navarro’s *Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners* (Version 0.6.1; 2019; freely available online!).

The final chapter is on Bayesian statistics and begins like this:

The ideas I’ve presented to you in this book describe inferential statistics from the frequentist perspective. I’m not alone in doing this. In fact, almost every textbook given to undergraduate psychology students presents the opinions of the frequentist statistician as *the* theory of inferential statistics, the one true way to do things. I have taught this way for practical reasons. The frequentist view of statistics dominated the academic field of statistics for most of the 20th century, and this dominance is even more extreme among applied scientists. It was and is current practice among psychologists to use frequentist methods. Because frequentist methods are ubiquitous in scientific papers, every student of statistics needs to understand those methods, otherwise they will be unable to make sense of what those papers are saying! Unfortunately—in my opinion at least—the current practice in psychology is often misguided, and the reliance on frequentist methods is partly to blame. In this chapter I explain why I think this, and provide an introduction to Bayesian statistics, an approach that I think is generally superior to the orthodox approach. (Chapter 17)

The first problem presented in that chapter actually does do the thing I said I don’t run into very often in textbooks, wherein you’re asked to contemplate a problem without spoon-fed priors. The chapter embodies a more philosophically grounded Bayesian view (the word ‘rational’ shows up, for instance), of the sort I’ve hopefully not been uncharitable to here. (I, in honest moments, identify as a non-hardcore Bayesian, but am wary of, and thus might overcompensate for, the possibility that this is due to a sort of fashion-based social conditioning—which has influenced the books I’ve read and the ideas I have come to be intellectually comfortable with, etc.—rather than to the tools in themselves.) I love this chapter and the book as a whole and of course emphatically recommend it. Navarro seems compelled towards a policy of too much rather than too little information, which means you’ll learn a lot while also getting a sense of what you’re not learning. And her personable tone and candid style make it as entertaining and thought-provoking as it is instructive.

I especially recommend the book for getting a handle on using R for statistics (even if you’re not a psychology major); but even if you’re not so interested in R, it’s also worth reading through for Navarro’s insights on using a range of statistical tools in practice (particularly as a psychologist). The book’s goals mean that it’s light on math (for a free mathematical companion to the book, a great place to start is the JB Statistics YouTube Channel); the book is *not* light on concepts, however—large swaths within don’t refer to R at all. So check it out. And check out Navarro’s blog.

I’ll also include more excerpts from the book below, particularly drawing from Chapters 1, 9, 10, and 17: in Chapter 9, she quickly sketches a rather nuanced picture of the frequentist-Bayesian divide, a particularly nice thing to see given that this is an introductory text for a class not focused on probability theory (its brevity of course requires some questions to be left unanswered, which she herself makes clear). Across Chapters 9 and 10, she draws attention to some important conceptual differences between frequentist and Bayesian ideas in practice, most interestingly in her discussion of confidence intervals, of which she notes that “if you are going to use frequentist methods then it’s not appropriate to attach a Bayesian interpretation to them. If you use frequentist methods, you must adopt frequentist interpretations!” (Section 10.5.2).

I mentioned earlier that (for example) confidence intervals tend to be freely referenced in discussions and books on Bayesian statistics (and not only to compare them unfavorably to their Bayesian counterpart, credible intervals, though of course that is often the upshot). But, as Navarro points out (in her book by a self-proclaimed Bayesian on frequentist statistics), these are quintessentially frequentist objects, which have built into them the idea of repeated sampling—and are, I think it’s fair to say (feel free to correct me), an implication or applied extension of the technical core of frequentist statistics, the frequentist analog to Bayes’ theorem: the central limit theorem. Interestingly, though, I often encounter confidence intervals referred to in more Bayesian terms, as in: ‘we can be 95% certain that the true population mean…’

It seems clear that confidence intervals and similar tools are quintessentially frequentist tools (due to how the central limit theorem works, even if the values you have in play are derived from a Bayesian world view), in a sense that I earlier referred to as a kind of naive or uncharitable picture of frequentism. A more interesting question, though, is whether, for example, credible intervals are always better than confidence intervals (all else being equal), or if either may be preferable to the other in some situations (again, regardless of the philosophical bent of whoever’s employing them). My impression is that the latter is the case (see, for example, Gelman et al.’s discussion of the topic [e.g., page 95]), though it’d be just as interesting to learn that neither is ever clearly better than the other at the start of a problem (though it could still turn out that one appears in retrospect to be preferable: would such retroactive analysis be acceptable practice?).
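One concrete situation where the choice visibly matters (an illustration I’m supplying, with made-up counts) is estimating a proportion after 0 successes in 20 trials. The textbook Wald confidence interval, p̂ ± z·sqrt(p̂(1 − p̂)/n), which rests on a normal approximation, collapses to the single point [0, 0]. A uniform-prior credible interval, approximated here by sampling the Beta posterior with Python’s `random.betavariate`, still gives a sensible range. With 10 successes in 20, the two land close together. Other frequentist intervals (e.g., Wilson or Clopper–Pearson) also avoid the zero-count collapse, so this is a point about one default tool, not about frequentism as such. The draw count and seed are arbitrary.

```python
import random
from statistics import NormalDist

def wald_interval(k, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a binomial proportion,
    clipped to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = k / n
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

def credible_interval_mc(k, n, level=0.95, draws=100_000, seed=2):
    """Central credible interval under a uniform prior, estimated by sampling
    the Beta(k + 1, n - k + 1) posterior and reading off empirical quantiles."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(k + 1, n - k + 1) for _ in range(draws))
    cut = int((1 - level) / 2 * draws)  # number of draws trimmed from each tail
    return samples[cut], samples[draws - 1 - cut]

if __name__ == "__main__":
    for k in (0, 10):
        w_lo, w_hi = wald_interval(k, 20)
        c_lo, c_hi = credible_interval_mc(k, 20)
        print(f"{k:2d}/20  Wald [{w_lo:.3f}, {w_hi:.3f}]  credible [{c_lo:.3f}, {c_hi:.3f}]")
```

For 0/20 the Wald interval is [0, 0] while the credible interval runs from about 0.001 to about 0.16; for 10/20 both are centered on 0.5 with similar widths.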

As Navarro points out, however, it’s very easy to think of confidence intervals—and, I’d say, many such frequentist tools—in Bayesian terms. This suggests to me the possibility of my being more hardcore Bayesian than I realize. Indeed, I’m still not convinced we can’t get to Bayesian destinations by way of (advanced, thoughtful, dauntingly complicated) frequentist methods, though it’s possible I simply don’t realize the degree to which I’ve been influenced by, and take for granted, Bayesian thought, given the era in which I began studying probability.

To coax into relief my Bayesian within, I might do well to go back and look again at some of the older articles arguing against Bayesian, in favor of frequentist, outlooks, including in the aforementioned *Philosophy of Probability* volume and, of course, Mayo’s book (which seems to be particularly wary of *subjective* Bayesianism, and references phenomena such as “the new frequentist-Bayesian unificationists” who “take pains to show they are not subjective” [page 25]).

I’d do so while remembering that frequentist methods have saved billions of lives. Consider the trailblazing work of researcher and statistician Florence Nightingale, represented in the fantastic infographic she made in order to convince people of the importance of sanitary conditions for patient care.

Of course, Nightingale used her powers of critical thinking to update her ‘priors’ according to the growing body of evidence she systematically tracked down. That was the point. Even if the approach was closer to what we’d now call frequentist rather than Bayesian.^{5}

Which reminds me that I’ve yet again strayed far beyond the uncomplicated thesis of this post: Frequentists use Bayes’ theorem too.

I’ll give the last (pre-Addendum) words to Navarro, drawing from Chapters 1, 9, 10, and 17:

**Section 9.2.3**:

…you might be wondering which of them [i.e., the frequentist or Bayesian approach] is *right*? Honestly, I don’t know that there is a right answer. As far as I can tell there’s nothing mathematically incorrect about the way frequentists think about sequences of events, and there’s nothing mathematically incorrect about the way that Bayesians define the beliefs of a rational agent. In fact, when you dig down into the details, Bayesians and frequentists actually agree about a lot of things. Many frequentist methods lead to decisions that Bayesians agree a rational agent would make. Many Bayesian methods have very good frequentist properties. For the most part, I’m a pragmatist so I’ll use any statistical method that I trust. As it turns out, that makes me prefer Bayesian methods, for reasons I’ll explain towards the end of the book, but I’m not fundamentally opposed to frequentist methods. Not everyone is quite so relaxed. For instance…

**Section 9.7:**

Many undergraduate psychology classes on statistics skim over this content very quickly (I know mine did), and even the more advanced classes will often “forget” to revisit the basic foundations of the field. Most academic psychologists would not know the difference between probability and density, and until recently very few would have been aware of the difference between Bayesian and frequentist probability. However, I think it’s important to understand these things before moving onto the applications.

**Section 9** (introductory paragraphs to Chapter 9):

…the theory of statistical inference is built on top of probability theory.

**Section 17.1.4** (on Bayes’ theorem):

In the Bayesian paradigm, all statistical inference flows from this one simple rule.

**Section 17.3**:

The question that you have to answer for yourself is this: how do you want to do your statistics? Do you want to be an orthodox statistician, relying on sampling distributions and *p*-values to guide your decisions? Or do you want to be a Bayesian, relying on Bayes factors and the rules for rational belief revision? And to be perfectly honest, I can’t answer this question for you. Ultimately it depends on what you think is right. It’s your call, and your call alone. That being said, I can talk a little about why *I* prefer the Bayesian approach.

**Section 17.5.1, Footnote #268:**

So yes, in one sense I’m attacking a “straw man” version of orthodox methods. However, the straw man that I’m attacking *is* the one that is used by almost every single practitioner. If it ever reaches the point where sequential methods become the norm among experimental psychologists and I’m no longer forced to read 20 extremely dubious ANOVAs a day, I promise I’ll rewrite this section and dial down the vitriol. But until that day arrives, I stand by my claim that default Bayes factor methods are much more robust in the face of data analysis practices as they exist in the real world. Default orthodox methods suck, and we all know it.

**Section 10.4.1, Footnote #157**:

Psychology is hard.

**Section 1.2** (includes a nice discussion of Simpson’s Paradox):

…you should always think of statistics as a tool to help you learn about your data, no more and no less. It’s a powerful tool to that end, but there’s no substitute for careful thought.

**Section 1.3**, in response to the question “I don’t care about jobs, research, or clinical work. Do I need statistics?”

Okay, now you’re just messing with me. Still, I think it should matter to you too. Statistics should matter to you in the same way that statistics should matter to *everyone*: we live in the 21st century, and data are *everywhere*. Frankly, given the world in which we live these days, a basic knowledge of statistics is pretty damn close to a survival tool! Which is the topic of the next section…

*ADDENDUM: 100 Prisoners and 100 Boxes*

I like to tell myself that, rather than being a frequentist or Bayesian or any of the other designations that haven’t been mentioned here today, I freely talk and conceptualize and approach problems in whatever way helps me from moment to moment. And though, as admitted above, I may be more of a (hardcore) Bayesian than I realize or am willing to admit, the question often simply doesn’t occur to me when faced with a probability question, particularly when there’s no explicit conditioning going on.

With many problems, I’m mostly just trying to figure out how to count stuff, which first requires figuring out what in fact is being counted. For an example of this, see my recent exploration of a Gambler’s Ruin–type problem: “Gambler’s Ruin & Random Walk: Probability, Expectation, Steal the Chips.” Nothing Bayesian about it. It feels more frequentist, if anything.

Here’s a harder problem, quoted from an article by Peter Winkler (linked below):

The names of 100 prisoners are placed in 100 wooden boxes, one name to a box, and the boxes are lined up on a table in a room. One by one, the prisoners are led into the room; each may look in at most 50 boxes, but must leave the room exactly as he found it and is permitted no further communication with the others.

The prisoners have a chance to plot their strategy in advance, and they are going to need it, because unless *every single prisoner finds his own name* all will subsequently be executed. Find a strategy for them which has probability of success exceeding 30%.

Comment: If each prisoner examines a random set of 50 boxes, their probability of survival is an unenviable $1/2^{100} \approx 8 \times 10^{-31}$. They could do worse—if they all look in the same 50 boxes, their chances drop to zero. 30% seems ridiculously out of reach—but yes, you heard the problem correctly.

This is a hard problem that may even take work to intuitively understand after seeing the solution. As Winkler notes:

Devised by Danish computer scientist Peter Bro Miltersen, a version [of this puzzle] appeared in a prize-winning paper of his and Anna Gal’s.^{6} But Miltersen didn’t think there was a solution until one was pointed out to him over lunch by colleague Sven Skyum.

And in another paper I’ll link below, mathematician Peter Taylor writes:

But simple as it is, the solution is not, I think, easily found. For me as a mathematician, the absorbing problem is not one of finding the solution (I had to be told the answer) but of understanding it. Having been given the solution, understand how it works. That already is a wonderful little project.

On seeing a problem like this (which, purely speaking, is a math-centered probability problem rather than a statistics problem), it does not occur to me to think of it as frequentist, Bayesian, or otherwise. Rather, my first thoughts are about figuring out how many ways things can go favorably, which I aim to set in ratio to how many ways they can go favorably or not favorably. At least that’s the basic starting place (you might need, for example, to do this for specific ways of succeeding, then add them all up).

This involves two burning questions: How many ways can *what* go favorably? Once you figure out *what* that is, how do you count it?
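For this particular problem, the thing to count turns out to be arrangements of names in boxes, that is, permutations. The strategy described in the papers listed below (paraphrased here, so treat the details as my summary) numbers the prisoners and boxes the same way; each prisoner opens the box bearing his own number, then the box bearing the number of the name he found there, and so on. Each prisoner’s chain is one cycle of the permutation, and he needs exactly as many openings as that cycle is long, so everyone succeeds exactly when no cycle is longer than the number of boxes each may open. The minimal Python sketch below counts favorable permutations by brute force for a hypothetical scaled-down version, 6 prisoners who may each open 3 boxes, and sets that count in ratio to all 6! = 720 arrangements. Brute force is hopeless at 100 (there are 100! arrangements), which is one reason the papers reach for math instead.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def longest_cycle(perm):
    """Longest cycle in a permutation, where perm[i] is the name (number) in box i."""
    seen = [False] * len(perm)
    longest = 0
    for start in range(len(perm)):
        length, i = 0, start
        while not seen[i]:  # walk the cycle through start, unless already walked
            seen[i] = True
            i = perm[i]
            length += 1
        longest = max(longest, length)
    return longest

def count_favorable(n):
    """Arrangements in which every prisoner finds his name within n // 2 boxes
    when following the chain, i.e., permutations with no cycle longer than n // 2."""
    limit = n // 2
    return sum(1 for perm in permutations(range(n)) if longest_cycle(perm) <= limit)

if __name__ == "__main__":
    n = 6  # hypothetical scaled-down version: 6 prisoners, 3 boxes each
    favorable, total = count_favorable(n), factorial(n)
    print(favorable, total, Fraction(favorable, total))
```

This prints `276 720 23/60`, which matches 1 − (1/4 + 1/5 + 1/6), the small-case version of the standard formula for this strategy.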

Once you know what to count, you could count it by running a simulation to draw out theoretical frequencies or you could actually have people perform the strategy a hundred or a thousand times and see where things go. Or you can use some fairly straightforward math (if you know a little calculus and/or abstract algebra); for this problem, that’d be the precise and recommended and most illuminating way to go, and is the route taken in these three papers, listed in order from easiest to follow to hardest (according to me):

“The Condemned Prisoners and the Boxes” (2012) by Peter Taylor.

“Seven Puzzles You Think You Must Not Have Heard Correctly” (2006) by Peter Winkler.

“Introduction to the General Case of the 100 Prisoners Problem” (2018) by Timothee Schoen.
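And here is a minimal Monte Carlo sketch of the simulation route for the full 100-prisoner problem. It assumes prisoners and boxes are numbered 0 to 99 and that each prisoner follows the chain starting at the box with his own number (my paraphrase of the strategy in the papers above). The code plays out the box-opening for randomly shuffled boxes, records how often all 100 prisoners succeed, and compares that with the exact probability 1 − (1/51 + 1/52 + … + 1/100), the standard result for this strategy. The trial count and seed are arbitrary.

```python
import random
from fractions import Fraction

def prisoner_finds_name(boxes, prisoner, max_opens):
    """Chain strategy: open the box with your own number, then the box numbered
    by the name found inside, and so on. boxes[i] is the name (number) in box i."""
    box = prisoner
    for _ in range(max_opens):
        if boxes[box] == prisoner:
            return True
        box = boxes[box]
    return False

def all_survive(boxes, max_opens):
    """True only if every prisoner finds his own name."""
    return all(prisoner_finds_name(boxes, p, max_opens) for p in range(len(boxes)))

def simulate(n=100, trials=5_000, seed=3):
    """Estimated success probability of the chain strategy over random arrangements."""
    rng = random.Random(seed)
    boxes = list(range(n))
    wins = 0
    for _ in range(trials):
        rng.shuffle(boxes)  # a fresh, uniformly random arrangement of names
        wins += all_survive(boxes, n // 2)
    return wins / trials

def exact(n=100):
    """Exact success probability: 1 minus the sum of 1/k for k from n // 2 + 1 to n."""
    return 1 - sum(Fraction(1, k) for k in range(n // 2 + 1, n + 1))

if __name__ == "__main__":
    print(f"simulated: {simulate():.3f}")
    print(f"exact:     {float(exact()):.4f}")
```

The simulated rate should land near the exact value of about 0.312, comfortably above Winkler’s 30% bar.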

However hard this problem is, it’s nothing compared to the earlier-mentioned jury problem. So, while philosophical grounding doesn’t matter here, maybe it does there and everywhere else in the world, a messy, complex, fascinating place whose problems are not so well ordered as 100 boxes and 100 prisoners, despite ongoing efforts to bend the world into a tidy model: turning people into sets of check boxes and designing the environments those people navigate to be discrete virtual spaces composed of elements that can be counted and thus known. (Here we begin to see how a probability problem such as this could inform the statistical analysis of collected data: of discrete objects counted.)

Maybe the real world is best suited to Bayesian interpretations. Maybe I am a constantly updating Bayesian epistemic agent (who at the moment is employing that Bayesian machinery to decide on what’s for lunch). Better to be more rather than less aware of the inner workings of that process, even if my preferred formal tools are (sometimes) frequentist.

That in mind, the more I read and listen and ponder, the less I think the philosophical disputes matter so much as being… critically minded… unafraid of intuition and instinct (i.e., the affective dimensions of evaluation)… mathematically sound (e.g., solid axioms)… well practiced with an ever-expanding box of algorithmic tools wielded in service of a good-faith, ethically minded approach to data collection and analysis… a little lucky. No problem.


#### Footnotes:

- Lambert, Ben. *A Student’s Guide to Bayesian Statistics* (Page 18). SAGE Publications. Kindle Edition.
- I’m on board with rationality, in some sense or another. In the sense, I suppose, that makes possible what we call a ‘well-functioning democracy.’ But ‘rationality’ in that sense can’t *just* mean ‘logically consistent,’ as someone can be consistent in plenty of ‘anti-democratic’ ways. So what do we mean by this sense of ‘rational’?
- Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. *Bayesian Data Analysis* (Chapman & Hall/CRC Texts in Statistical Science) (Pages 3–4). CRC Press. Kindle Edition.
- Gelman, Andrew; Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki; Rubin, Donald B. *Bayesian Data Analysis* (Chapman & Hall/CRC Texts in Statistical Science) (Page 91). CRC Press. Kindle Edition.
- From the website *A Short History of Probability and Statistics*, citing the OED2: The term **FREQUENTIST** (one who believes that the probability of an event should be defined as the limit of its relative frequency in a large number of trials) was used by M. G. Kendall in 1949 in Biometrika XXXVI. 104: “It might be thought that the differences between the frequentists and the non-frequentists (if I may call them such) are largely due to the differences of the domains which they purport to cover.” By “non-frequentist” Kendall means what we’d now call ‘Bayesian,’ a word that doesn’t appear in the article, though the name ‘Bayes’ appears 11 times. The article is available at JSTOR: “On the Reconciliation of Theories of Probability,” *Biometrika*, Vol. 36, No. 1/2 (Jun., 1949), pp. 101-116.
- “The Cell Probe Complexity of Succinct Data Structures,” ICALP 2003.