Recently, Bruce Dale and Brian Dale published an article in The Interpreter that uses Bayesian analysis to explore correspondences between the Book of Mormon and Michael Coe’s book on ancient Mesoamerica, and the likelihood that Joseph Smith merely guessed those correspondences. The article has generated a lot of attention; the LDS Living Facebook post featuring it had been shared (last I looked) over 450 times. I think the idea behind the article is neat and worth exploring, but now that I’ve delved into the study, I have grave concerns about it.

I want to preface this by saying that I believe that the Book of Mormon is a true historical record, and that of all the potential locations, it likely took place in Central America. So I agree with the conclusions of the authors. But I do not agree with how they arrived at these conclusions. In this article, I explore some reasons why.

A Cool Idea…

The central idea of this study is fascinating: we might credit someone with a lucky guess, or even several lucky guesses. But the credibility of our “smart guesser” hypothesis becomes more and more strained with each new accurate guess. The Bayesian approach mathematizes this analysis: we start with the hypothesis that Joseph Smith was a smart guesser, and with each accurate guess, we update that hypothesis. As the number of accurate guesses grows, the statistics look increasingly dim for our original hypothesis, to the point that we must jettison it.
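
To make the mechanics concrete, here is a minimal sketch of odds-form Bayesian updating (my own illustration, not the Dales’ code); the prior and the per-guess Bayes factor below are made-up numbers.

```python
# Minimal sketch of odds-form Bayesian updating; all numbers are hypothetical.

def update_odds(prior_odds, bayes_factors):
    """Multiply the prior odds by each piece of evidence's Bayes factor in turn."""
    odds = prior_odds
    for bf in bayes_factors:
        odds *= bf
    return odds

# Odds in favor of "Joseph Smith was just a smart guesser," starting at even odds.
prior_odds = 1.0

# Suppose each accurate guess is judged five times more likely if he was NOT guessing
# (an invented factor); each guess then multiplies the guesser's odds by 1/5.
accurate_guesses = [1 / 5] * 10

print(update_odds(prior_odds, accurate_guesses))  # ~1e-07: ten such guesses already look damning
```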

One assumption that critics make is that a single wrong guess invalidates the project as a whole. That is, Joseph Smith could have gotten 100 things right and 1 thing wrong, and that 1 wrong thing casts the whole Book of Mormon as a fraud. The authors of this article (rightly) dismiss this assumption. Just because there is something in the text that doesn’t match up with what current researchers believe about ancient America doesn’t mean that the entire text must be thrown out, and much less so if the weight of the evidence actually supports the text.

In other words, even if we conclude that there are errors in the text, we must still grapple with the many things that were correct. If those many right guesses are astronomically improbable, it could be more likely than not, regarding those things that we think are mistaken, that Joseph Smith was truthful but made a mistake in the translation, or that the Book of Mormon narrators (primarily Mormon and Moroni) made an error, or that we are in error about what is true and false about ancient America.

… but it has problems.

The authors rightly reject the idea of throwing out the Book of Mormon text based on one piece of (seemingly) contrary evidence. They explain: “These practices of cherry-picking or overweighting/underweighting evidence cannot be allowed in scientific enquiry.” They are correct when they say, “No piece of evidence has infinite weight.” But the authors seem overconfident that theirs is a legitimate scientific inquiry, immune (by virtue of being made numerical) to overweighting/underweighting. By virtue of being “disciplined” and “formal” (their words), they seem to see their analysis as clean of personal bias and methodological flaw. This is far from the case.

Likelihoods are not assigned in any empirical way

The authors state several times that Bayesian analysis is used in medicine, in fraud detection, and in other areas, to assure readers that this is an established statistical method. But that doesn’t matter: I absolutely grant that it is a valid statistical method and can still argue that it is wrong for this project, for a host of reasons. The primary reason I think it is the wrong approach for this project is that we have no clear way of estimating the probability of an accurate guess. Consider an example the authors use from the field of medicine:

For example, if a disease is somewhat rare, then a randomly selected individual might have “skeptical prior odds” of 1:1000 against them having the disease. If the test has a likelihood ratio of 100 (a good medical test for screening), then our posterior odds following a positive test for the disease would be 1:1000 x 100 = 1:10 against the person actually having the disease.

When they talk about a medical test that has a “likelihood ratio of 100”, that likelihood ratio isn’t pulled out of nowhere. It is estimated rigorously through thousands of applications of the screening, tallying the verified true positives, false positives, false negatives, and so on, and then calculating the accuracy of the test. The likelihood ratio is an empirical value, or at least a value calculated from a stack of empirical evidence. Those likelihoods are not guesses made by the medical researcher based on his or her intuitions about the accuracy of the test.
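
For contrast, here is a rough sketch of how such a likelihood ratio is derived empirically; the counts below are invented for illustration and do not come from any real screening test.

```python
# Sketch of deriving a screening test's positive likelihood ratio from verified counts.
# All counts are invented for illustration.

true_positives  = 950     # diseased patients the test correctly flagged
false_negatives = 50      # diseased patients the test missed
true_negatives  = 9_900   # healthy patients the test correctly cleared
false_positives = 100     # healthy patients the test wrongly flagged

sensitivity = true_positives / (true_positives + false_negatives)   # 0.95
specificity = true_negatives / (true_negatives + false_positives)   # 0.99

# Positive likelihood ratio: how much more common a positive result is among the sick
# than among the healthy.
positive_lr = sensitivity / (1 - specificity)
print(positive_lr)   # ~95, in the neighborhood of the "likelihood ratio of 100" above
```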

The authors state that their Bayesian approach overcomes the problems of the method of parallels (e.g., parallelomania) in part because, “by using a numerical Bayes factor, the person performing the analysis explicitly estimates the strength of any given piece of evidence.” On what basis does the person perform this estimate? Personal intuition? They choose 50, 10, and 2 (and, conversely, 0.02, 0.10, and 0.50) as the likelihood ratios they assign each piece of evidence. Where did they get these numbers? Are these standard values when using Bayesian methods to estimate likelihoods that cannot be measured directly?

To their credit, the authors craft a sort of scale to use here: Specific correspondences are given a likelihood of 0.50, specific and detailed correspondences are given a likelihood of 0.10, and specific, detailed and unusual correspondences are given a likelihood of 0.02. Using this metric, they can give some rhyme and reason to their probability estimates. But my considered opinion is that this merely gives a veneer of rigor to what is ultimately their own guessing game: guessing how likely it is that someone guessed. And quite frankly, I think they missed the mark.

And I’d venture a guess that most use cases of Bayesian analysis don’t require researchers to “guess,” based on their own intuitions, the likelihoods included in the analysis. (Though I could be wrong; I’ve seen worse. Edit: And I’ve been told that I am, apparently, wrong on that; Bayesian analysis does often involve arbitrary assignments of probability. This doesn’t change my critique here, however.) My suspicion is that, to whatever extent these likelihood assignments actually matter to the analysis (spoiler: they don’t), the Bayesian analysis returns not the likelihood that Joseph Smith guessed these things, but the extent to which the researchers believe Joseph Smith guessed these things.

The authors make tons of unwarranted and sometimes untrue assumptions

Consider the first example in their paper of something they consider extremely unlikely as a guess:

One example of Bayesian “strong” evidence is the remarkably detailed description of a volcanic eruption and associated earthquakes given in 3 Nephi 8. Mesoamerica is earthquake and volcano country, but upstate New York, where the Book of Mormon came forth, is not. If the Book of Mormon is fictional, how could the writer of the Book of Mormon correctly describe a volcanic eruption and earthquakes from the viewpoint of the person experiencing the event? We rate the evidentiary value of that correspondence as 0.02. We assume a piece of evidence is “unusual” if it gives facts that very probably were not known to the writer, someone living in upstate New York in the early 19th century, when virtually nothing of ancient Mesoamerica was known.

I simply have no reason to believe that someone living in New York had never heard of a volcano, nor of earthquakes that accompany them, nor of ash clouds that darken the sun when a volcano erupts. In fact, the explosion of Mt. Tambora in 1815 is a defining event in the history of Joseph Smith’s life. Changes in weather and decreased sunlight resulted in crop failures in New England and are what forced the Smith family to move to Palmyra. And I’ve read no evidence that the people at the time weren’t wholly aware of what happened and why. Were the authors entirely unaware of this? What else are they unaware of? It doesn’t take a lot of imagination to see how a young boy of this time could have extrapolated what this event looked like through the eyes of those who lived nearby, and it doesn’t take an expert (or someone with personal experience) to guess correctly.

(Point of fact: Some argue that this is evidence of Joseph Smith guessing wrongly; he describes a darkness so thick that people could not light a candle for days, which some argue is not actually how ash clouds work, and if ash clouds did work that way, there would be no survivors to report the tale.) (Another point of fact: The Book of Mormon never states that there was a volcano during this event; that itself is a guess, which may or may not be right. We assume it is true, but we cannot be certain of it.)

In other words, the authors are merely guessing at these probabilities, and they are just as liable to get their guesses wrong as anyone else. In the case of the volcano, I believe they overweighted this evidence by a long shot. Would it be fair to say, as they did, “This practice of overweighting evidence cannot be allowed in scientific enquiry”? If this is how sloppy the authors are with the very first correspondence they chose to highlight, it casts a shadow on the rest of them. The authors are clear that these probability estimates are at times subjective and based on the intuitions and judgments of the researcher, and yet the tone of their paper rides on the credibility that a mathematical analysis offers them. They are using quantitative analysis to evaluate qualitative judgments. This can be done well. I am not sure it was done well here.

Some identified correspondences are particularly weak and flimsy

Again, I want to reiterate here that I believe that the Book of Mormon is historical and true. I just don’t see evidence presented in the paper that there is only a 1/10 chance that Joseph Smith would have included examples of “repopulating old or abandoned cities” if he were merely guessing, or why someone living in New York could not have dreamed up the idea that a community might return to a city they’d fled some time before (e.g., the people of Zeniff). Joseph Smith wasn’t asserting this as some characteristic of ancient American society. He was simply telling a story of a particular people.

The counterargument might be, “Well, sure, but we aren’t looking at any one piece of evidence for or against; we’re looking at the weight of all the evidence stacked up.” Well, sure! But the weight of all the evidence is a function of the weights assigned to the individual pieces of evidence. And while I have not gone through the 100+ correspondences in the paper individually (it turns out I don’t have to; more on that later), the few I have examined seem like they were weighted based on the biases and intuitions of the researchers, intuitions that often differ from my own. Maybe I am more cautious, but I just can’t justify assigning a number, no matter how arbitrary or considered, to the likelihood that Joseph Smith would have included a group of people repopulating an abandoned city if he were merely inventing this story.

In addition, some of their correspondences are frankly strained. Consider, for example:

“Clear enough,” the authors say. I’ve never once read the text that way, nor do I see it as plainly obvious that the text is referring to homosexuality. And even if I did, I have no reason or basis to assign this an evidentiary value of 0.5 as opposed to any other number. The authors recognize that this is a flimsier correspondence, so they gave it their lowest evidentiary value. But even their lowest evidentiary value still doubles the overall case against the “smart guesser” hypothesis. On another point, the certainty with which the authors conclude that this text is talking about homosexuality betrays their own analytical biases and unwarranted confidence in their own analysis and interpretation.

The authors aren’t measuring what they think they are…

But it turns out that it doesn’t matter which of the three probability estimates they assigned to this (or any other) piece of evidence. The authors themselves admit that even if they gave every piece of evidence in favor of the text the weakest likelihood ratio (and every piece of evidence against the text the strongest likelihood ratio), the conclusions would still come out in their favor (by trillions). So all this hand-wringing about ensuring that some evidences are weighted more strongly than others based on their relative evidentiary value is pointless; they could all be weighted the same, and the conclusions would be the same. The conclusions in this analysis turn far more on the difference in the number of evidences for and against than on the relative strengths of those evidences. And there were simply more correspondences than contradictions included in the analysis.
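
A quick back-of-envelope sketch makes the point; the counts below are hypothetical round numbers rather than the paper’s actual tallies, and every piece of evidence, for and against, is given the same modest weight.

```python
# Hypothetical illustration: when every piece of evidence carries the same weight,
# the result depends only on the difference in the counts.

bf_for     = 0.5   # same modest Bayes factor for every correspondence (favors the text)
bf_against = 2.0   # reciprocal factor for every contradiction (favors the guessing hypothesis)

n_for, n_against = 100, 20   # invented counts of correspondences vs. contradictions

odds_guesser = (bf_for ** n_for) * (bf_against ** n_against)
print(odds_guesser)          # ~8.3e-25, which is just 0.5 ** (100 - 20)

# With equal weights the whole analysis collapses to 0.5 ** (n_for - n_against):
# only the count difference matters, not the carefully argued per-item weights.
```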

Why? The authors explain that in order for something to be counted as evidence against the Book of Mormon, it has to be mentioned in both texts. So if, for example, Coe’s text discusses elements of religious practice that bear no resemblance to what we see in the Book of Mormon, the analysis is silent, because we cannot say that these elements didn’t exist in the Book of Mormon. Only direct, explicit contradictions are counted as evidence against. And my intuition tells me that this is automatically going to stack the “in favor” pile higher than the “against” pile, especially if one text gets into far more detail about political, cultural, and social practices than the other.

This is because the standard for an explicit contradiction is, whether we want it to be or not, higher than the standard for a correspondence. The authors may disagree with me on this, but consider: suppose person A and person B each tell a story about an evening they had together. Person A mentions sitting on the grass and watching the blue sky. Person B mentions sitting on the grass and watching the red and purple sunset. Have they contradicted each other? No, because both stories could be true. They could have sat and watched the sunset, which started with a blue sky and ended with a red and purple sky. We cannot mark this as a contradiction.

But consider: we do have a correspondence. Both persons have discussed watching the sky together. We can mark this as a parallel in their stories, and potential evidence that they are the same story. But we ignore the differences, because Person A does not explicitly state that the sky was never red, and Person B does not explicitly state that the sky was never blue. So we cannot use those differences as evidence against their stories being the same story.

In this way, we can start to see how a correspondence is easier to find than a contradiction. Contradictions between two stories are most likely to arise when one person knows the contents of the other story. Most storytellers naturally assert things that were the case, not things that were not. We would not expect Mormon or Moroni to explicitly contradict most elements of Coe’s analysis, even if it turns out that the Book of Mormon is true and Coe is a fraud. (I don’t think he is, just making a point.)

In short, all this Bayesian analysis measures is the relative likelihood of finding correspondences between two stories versus finding explicit contradictions. And it turns out to be far more likely to find correspondences than contradictions. My considered opinion is that this is about the only thing the study actually (successfully) measured. And this is especially true because, on top of this, there was an additional double standard: contradictions had to be explicit in the text, but correspondences did not. The authors could take merely “veiled” correspondences and include them in the analysis (consider the homosexuality example). So in addition to the natural propensity for correspondences between two texts to be easier to identify than direct contradictions, the authors artificially tilt the ratio even more.

… and what they are actually measuring, we can observe elsewhere too.

To press the point further, which of these parallels are unique to ancient America? How many ancient civilizations had “political factions organized around a member of the elite”? How many ancient civilizations had “foreigners move in and take over government, often as family dynasties”? How many ancient civilizations had a “city administrative area with bureaucrats and aristocrats”? How many ancient civilizations required tribute of conquered peoples? How many ancient civilizations had “political power is exercised by family dynasties”? I could go on. These all seem like the bread and butter of heavily peopled regions of the ancient world, in both the new world and the old.

In short, if we were to find a detailed textbook about the ancient peoples and civilizations of India and run a Bayesian analysis on the correspondences between ancient India and the Book of Mormon (weighed against explicit contradictions between the two books), could we draw the same conclusions the authors did? I’m certain we could find 100+ such correspondences. And for those things that didn’t line up, I’m certain we could leave them out of the analysis for the reasons mentioned above: unless the texts explicitly contradict each other, we ignore anything that doesn’t correspond. And so when we stack these against each other and discover umpteen-trillion-to-one odds against Joseph Smith having merely guessed his way to all these correspondences with ancient India, we might start to wonder a bit at whether this analysis is telling us what we think it is.

Yes, I know, Joseph Smith never claimed it took place in India, and so there’s no reason to pursue such analysis. We wouldn’t conclude that Joseph Smith wasn’t guessing about something he never claimed to be talking about. But my point is that what we are actually measuring is how much easier it is to find correspondences between two descriptions / stories than it is to find explicit contradictions — not whether or not Joseph Smith was guessing.

And perhaps worst of all, the study treats as “independent” correspondences that are not independent

An anonymous redditor (/r/mfchris) recently brought this issue to my attention:

Another egregious issue is that the Dales treated the probability of each correspondence as being statistically independent, which in the context of someone telling a complex political/religious story isn’t reasonable at all. The authors list 33 correspondences that fall under the political umbrella, and the majority all seem to fall under the basic umbrella of “the Nephite civilization was well developed and politically complex relative to the Native American populations in the northeastern US, and it turns out that some central American civilizations were too.”

While I am willing to grant that this may constitute some level of positive evidence in favor of the historical reality of the Book of Mormon, by treating each correspondence as independent the Dales are making the overarching political correspondence a magnitude of 33 times as important as it is, even if you accept their Bayesian priors. It’s like saying the odds of someone being as tall as me are 100:1, and the odds of someone being as heavy as me are 100:1, so the odds of someone being as big as me are 10000:1, except that this study does it to a far worse degree.

The fact that their conclusion is that the probability that Joseph Smith made it up is 2.69 × 10^-142 indicates that they did something horrifically wrong in their analysis, as such extreme values would almost never occur, even in empirically grounded applications; such studies in social sciences should almost always lead to less extreme conclusions given the higher uncertainty in quantification.
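
A small simulation illustrates the redditor’s height/weight point; the population parameters below are made up, but any pair of correlated traits behaves the same way: the joint probability is far larger than the product of the marginal probabilities, so multiplying odds as if the traits were independent overstates how rare the combination is.

```python
# Simulation of correlated traits (hypothetical population parameters).
# Shows that P(tall and heavy) is much larger than P(tall) * P(heavy).

import random

random.seed(0)
N = 200_000
tall = heavy = tall_and_heavy = 0

for _ in range(N):
    height = random.gauss(175, 7)                       # cm, invented distribution
    weight = 0.9 * height - 80 + random.gauss(0, 8)     # kg, correlated with height
    is_tall  = height > 190
    is_heavy = weight > 95
    tall += is_tall
    heavy += is_heavy
    tall_and_heavy += is_tall and is_heavy

p_tall, p_heavy, p_both = tall / N, heavy / N, tall_and_heavy / N
print(p_tall * p_heavy)   # what the independence assumption predicts
print(p_both)             # the actual joint frequency, several times larger
```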

In short, many of the correspondences in the article are related to each other. For an actual example, here are two separate correspondences: (1) “‘Capital’ or leading city-state dominates a cluster of other communities,” and (2) “Some subordinate city-states shift their allegiance to a different ‘capital’ city.” The first is assigned a likelihood of 0.02, and the second a likelihood of 0.1. But these are not independent observations; the existence of #1 dramatically increases the likelihood of #2. And yet, as used in the Bayesian analysis, the authors treat the likelihood of a guesser getting both right as at least an order of magnitude lower than getting either one right.

Add on top of this the also highly connected observations that “many cities exist” (0.1) and that there were “complex state institutions” (0.02), both of which are bound up with and implied by the earlier observations, and of course you are going to get increasingly ridiculous numbers in your final analysis. (Note: you can’t have “a capital city dominating a cluster of other communities” and not have “many cities exist.” These should not have been treated as independent observations. Most of these shouldn’t have been.) Add further “parts of the land were densely settled” (uh, yeah, cities), and on we go.

Other examples are similar: “Royalty exists, with attendant palaces, courts and nobles” (0.1) and “Some rulers live in luxury” (0.5) are not independent observations. Even if the likelihood that Joseph Smith could have guessed that “royalty exists with attendant palaces, courts, and nobles” was 1 in 10, it is simply wrong to conclude that the likelihood of his guessing both that and that “some rulers live in luxury” is, combined, 1 in 20. One tends to imply the other. Add on top of that the closely related observations of “elaborate thrones” and “gifts to the king for political advantage,” and we start to see the magnitude of the statistical errors here. Each one of these adds another multiplicative factor to the final results that shouldn’t (necessarily) be there.
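
Some back-of-envelope arithmetic on the royalty/luxury pair shows the size of the distortion; the marginal weights are the ones quoted above, but the conditional probability is my own hypothetical figure, not anything from the paper.

```python
# Royalty/luxury pair: independence vs. an (invented) conditional probability.

p_royalty = 0.1                # quoted weight: chance a guesser includes royalty, palaces, courts
p_luxury  = 0.5                # quoted weight: chance a guesser includes rulers living in luxury
p_luxury_given_royalty = 0.9   # hypothetical: a guesser who already wrote in royalty will
                               # very likely also have some rulers living in luxury

independent_joint = p_royalty * p_luxury                # 0.05, the 1-in-20 the analysis assumes
dependent_joint   = p_royalty * p_luxury_given_royalty  # 0.09, roughly 1 in 11 once dependence is honored

print(independent_joint, dependent_joint)
```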

This is not just an error in the priors of the Bayesian analysis. This is a fundamental error in statistical reasoning.

And in this case, we are really only measuring against Coe himself and his book.

Coe’s book undoubtedly makes thousands of statements of fact about Mesoamerican history, culture, and people that have no counterpart in the Book of Mormon. Yet none of those things count as “contradictions,” because they aren’t contradicted explicitly by the Book of Mormon text. And that’s where this analysis falls apart completely. What authors include and don’t include in an academic text depends on their audience and purpose. And Coe didn’t write his book with the Book of Mormon in mind. If he had, he could simply have quintupled the number of statements that explicitly contradict the Book of Mormon text and thrown off the analysis. If a numerical analysis relies on something as fundamentally non-numerical as this, it’s a bad numerical analysis.

Furthermore, this analysis doesn’t compare the Book of Mormon against Mesoamerica. It compares the Book of Mormon against what Coe believes about Mesoamerica. That’s a cool “gotcha” against Coe, perhaps, but as a numerical analysis, it doesn’t reveal anything at all about Book of Mormon historicity per se. More to the point, it doesn’t even compare the Book of Mormon against what Coe believes about Mesoamerica; it compares it only against what Coe decided to include in this book. After all, by the researchers’ own admission, Coe believes many more things about Mesoamerica that contradict the Book of Mormon than he included in the text. Which just shows that “contradicting the Book of Mormon” wasn’t the purpose of his text. So we wouldn’t expect the text to be full of contradictions. And that is exactly what we measured.

Conclusion

I have spent a number of years analyzing the research of social scientists (mainly psychologists), and I have come to the conclusion that the discipline is filled with excellent researchers who are horrible theoreticians. In other words, psychologists are rich in data but poor in theory. They often do not even know what it is they are actually measuring with their measurement instruments, and more to the point, they often do not know how to make the case that what they are actually measuring is what they intend to measure.

Furthermore, the social sciences are filled with dogmas that are taken as givens by researchers, and this leads to lopsided scrutiny of their research. Studies that seem to support conclusions favored by the academic establishment are given far less scrutiny than studies that call those conclusions into question. This means that less careful research can make it into even top-tier journals if its conclusions line up with the dogmas of the discipline. I’m currently writing a book on the importance of epistemic humility in research, and the dangers of assuming that our methods are more rigorous than they are.

Methodological rigor requires more than merely turning things into numbers. It requires using wisdom and experience to know when to turn things into numbers, and when not to. It requires being clear-headed about what we are measuring and what we are not. It requires that social scientists don’t rely on the veneer of objectivity that numerical analysis provides, especially when evaluating fundamentally non-numerical things. This propensity has wreaked havoc on the social sciences. Bruce and Brian Dale are not social scientists. But I’m seeing inklings of many of the same patterns here.

I believe the Book of Mormon is true. I cannot say the same for this analysis. I would have preferred a qualitative analysis instead, where all the correspondences they list can be presented as valuable and faith promoting, without the pretense of numerical objectivity.