Jockers, Witten & Criddle
"Reassessing Authorship of
  the Book of Mormon"

 Literary and Linguistic Computing

 23:4 (Dec. 2008) 465-491



Copyright © 2008 Oxford University Press on behalf of ALLC and ACH. All rights reserved.


[465]




Reassessing authorship of the
Book of Mormon using delta and nearest
shrunken centroid classification
Matthew L. Jockers
Department of English, Stanford University,
Stanford, CA 94305

Daniela M. Witten
Department of Statistics, Stanford University,
Stanford, CA 94305

Craig S. Criddle
Department of Civil & Environmental Engineering,
Stanford University, Stanford, CA 94305
 

(part of this page not reproduced, due to copyright restrictions)





M. L. Jockers et al.                                                                                   [466]


Background

Since its publication in March 1830, the origin of the Book of Mormon -- particularly its claim of ancient origins -- has been the subject of intense scrutiny and debate. Mormon prophet Joseph Smith Jr (1805-44) claimed that an angelic messenger delivered to him a record written from around 2200 BC to 421 AD by ancient Native Americans in 'reformed Egyptian.' Smith claimed to have used a seer stone to translate the record into English. By February 1831, two competing theories had appeared: Alexander Campbell (1831), founder of the Campbellite religious movement, proposed Smith himself as author while the Cleveland Advertiser (1831) proposed Sidney Rigdon. [1] Rigdon was a former Campbellite preacher who acquired ecclesiastical status on par with Smith's almost immediately after his rapid conversion in October 1830. Campbell (1844) eventually concluded that Rigdon was the probable author of the Book of Mormon, but his initial explanation held that Smith wrote the book by drawing from sermons and local folklore. [2]

One year later, in February 1832, a third candidate-author was named when Mormon missionaries Orson Hyde and Samuel Smith read passages from the Book of Mormon at a schoolhouse in Conneaut (New Salem), Ohio. Nehemiah King, who was present at these readings, claimed that Hyde 'had preached from the [novelistic] writings of Solomon Spalding' (Wright, 1833). [3] Spalding (often 'Spaulding') was a frustrated novelist who, in the years before his death in 1816 (from 1811 to 1815), shared his unpublished novel with his neighbors, family, and associates in Conneaut. In 1833, the Spalding allegations came to the attention of E. D. Howe, who joined with ex-Mormon Philastus Hurlbut to investigate the matter. Hurlbut collected affidavits from Spalding's former neighbors and family in Conneaut. [4] The witnesses recalled having heard much of the plot and several names from the Book of Mormon in a draft novel titled 'Manuscript Found,' a now-lost text that Spalding submitted for publication to a Pittsburgh publisher in late 1812. In Mormonism Unvailed [sic] (1834, 1977) Howe linked Rigdon to Spalding through this publisher. The resulting 'Spalding-Rigdon Theory' holds that Rigdon acquired the Spalding manuscript through his connections to the Pittsburgh publishing shop, added his own theology, and then revealed it to the public through Smith as the Book of Mormon. In the years after Howe's publication, others provided testimony supportive of the theory, including Spalding's widow, Spalding's daughter, the owner of the Pittsburgh publishing shop, and others who claimed that Spalding shared his work with them [5] or who claimed to have seen a copy of 'Manuscript Found' after Spalding's death. [6] Expansions to the theory followed, including a detailed analysis of Rigdon's life by William H. Whitsitt (1886, 1891), and of other likely collaborators, including Smith's second cousin Oliver Cowdery, a schoolteacher with editing experience (Deming, 1888), and Parley P. Pratt, a former disciple of Rigdon (Schroeder, 1901). [7]

Throughout the nineteenth and early twentieth centuries, the Spalding-Rigdon Theory was the favored explanation for the origin of the Book of Mormon, but Fawn Brodie's (1945) rejection of this theory in her controversial biography of Smith marked a turning point in the debate. Invoking witness tampering and 'false memory syndrome,' Brodie dismissed the affidavits collected by Hurlbut. She believed that a Spalding holograph discovered in Honolulu, Hawaii, and stored within a large envelope with the penciled-in title 'Manuscript Story-Conneaut Creek' (1810), was in fact the lost Spalding document known to the Conneaut witnesses as 'Manuscript Found.' [8] Despite having no evidence that the Honolulu manuscript was the same text that the Conneaut witnesses heard Spalding read to them (and subsequently recognized as a source text for the Book of Mormon), Brodie nonetheless concluded that Spalding could not have been an author of the Book of Mormon because the similarities between

__________
1 For Smith, see http://www.lds-mormon.com/campbell.shtml; for Rigdon, see http://www.sidneyrigdon.com/dbroadhu/OH/miscohio.htm#021531.

2 See also http://www.mun.ca/rels/restmov/texts/acampbell/mh1844/MTTBOM.HTM.

3 See corrected typescript of Wright's letter prepared by Dale Broadhurst in 2001 at http://solomonspalding.com/SRP/saga2/Ashtab3.htm#1833text

4 The eight affidavits collected by Hurlbut and published by Howe (1834) are available online at: http://www.mormonstudies.com/witness.htm. A Mormon response to the Spalding-Rigdon Theory is available at http://farms.byu.edu/display.php?table=review&id=584.

5 Solomon Spalding's wife and daughter were both named Matilda. For a statement from Matilda Spalding Davidson, Solomon's widow, see her letter to the editor of the Boston Recorder 19 April 1839, at http://www.solomonspalding.com/docs1/1897spld.htm. For statements from Matilda Spalding McKinstry, Solomon's daughter, see her interview with Jesse Haven, 1839, and a letter from John Haven, which appeared in the Quincy Whig, published by Benjamin Winchester in The Origin of the Spaulding Story, Concerning the Manuscript Found, 1840. Copies of both are available at http://www.mormonstudies.com/matilda2.htm.   For a statement from Robert Patterson, owner of the Pittsburgh publishing shop where Spalding allegedly took his manuscript, see the statement dated 2 April 1842, to Rev. Samuel Williams in Mormonism Exposed, 1842. See also Samuel Williams's self-published pamphlet (reprinted in the Baptist Home Mission Monthly, May 1883) available at http://www.solomonspalding.com/docs/1842Wilm.htm#pg16b. For statements from others who knew Solomon Spalding, see Abner Jackson's statement in Canton, OH, 20 December 1880, published in The Daily Evening Reporter (Washington, PA) Vol. 4, 7 January 1881. Jackson's father had business dealings with Spalding. His statement is available online at http://www.sidneyrigdon.com/dbroadhu/PA/penn1860.htm#010781. Joseph Miller knew Spalding in the last years of his life, while he lived in Amity, PA. Miller made five statements over more than three decades (1869, 1879, 1882, 1885, and 1890) available at: http://www.sidneyrigdon.com/dbroadhu/PA/penn1860.htm#040869;   http://www.sidneyrigdon.com/dbroadhu/PA/penn1860.htm#020579;   http://www.sidneyrigdon.com/dbroadhu/IA/sain1882.htm#011582; [sic]  http://www.solomonspalding.com/docs2/1885DicE.htm#pg240b;   and http://www.solomonspalding.com/docs1/1890GrgD.htm#pg441.

6 In mid-December of 1833, Philastus Hurlbut allegedly displayed a copy of Spalding's 'Manuscript Found' in or near Kirtland, OH. He was then arrested and incarcerated for threatening the life of Joseph Smith. After his release, Hurlbut never again displayed a copy of 'Manuscript Found' or claimed to possess it. The four witnesses who reported seeing a copy of 'Manuscript Found' were [J.] C. Dowen (Justice of the Peace), James A. Briggs (attorney for Hurlbut), Charles Grover, and Jacob Sherman. Their statements are available at: http://solomonspalding.com/SRP/SRP13p2.htm#Refs2; http://www.sidneyrigdon.com/dbroadhu/CA/natr1988.htm#120088-1c2; http://www.solomonspalding.com/docs/deming.txt; and http://www.solomonspalding.com/docs/deming.txt

7 A synthesis of historical facts supporting the Spalding-Rigdon theory can be found in Cowdrey et al. (2005). See also http://sidneyrigdon.com/dbroadhu/CA/natruths.htm   and http://www.sidneyrigdon.com/wht/1891WhtB.htm.

8 The document was found in 1884 by James H. Fairchild and is now stored at the Mudd Library of Oberlin College.




[467]


the Book of Mormon and the text found in Hawaii were 'not sufficient to justify the thesis of common authorship.' [9] Her rejection of the Spalding-Rigdon Theory was so widely accepted that the theory came to be regarded by most students of Mormon history as 'an historiographical artifact without credibility among serious scholars' (Bushman, 2005).

Among contemporary secular scholars of Mormonism, the theory of Smith as solitary author is a generally accepted explanation. Twentieth-century advocacy of this theory began with I. Woodbridge Riley (1902), who proposed that Smith drew inspiration from locally available source materials, including Ethan Smith's (1825) View of the Hebrews. Riley also speculated that Joseph Smith's writing was influenced by epilepsy-induced visions and that Smith created characters modeled on members of his family, including himself. Brigham Roberts (1857-1933), a Mormon leader and intellectual whose writings are collected in Studies of the Book of Mormon (Madsen, 1985), likewise concluded that Smith had the imagination and source material to produce the Book of Mormon on his own (1985). Smith's textual sources, Roberts argued, likely included View of the Hebrews and Josiah Priest's (1825) Wonders of Nature and Providence. Brodie (1945) advanced similar arguments, and followed in Riley's footsteps with speculation regarding Smith's psychology (1971). In recent work, David Persuitte (2000) provides textual parallels to strengthen connections to both Ethan Smith and Josiah Priest, while historian Dan Vogel (2004) expands the psychological speculation, suggesting that the Book of Mormon is best explained as the result of Smith family dynamics and Smith's willingness to engage in a pious fraud.

In addition to historical studies of Smith and the origins of the Book of Mormon (such as those noted above), there have been a smaller number of quantitative, or 'stylometric,' studies. A team of Brigham Young University researchers led by Wayne Larsen conducted the first among these (Larsen 1980). [10] Employing multivariate, cluster, and classification analysis, Larsen, Rencher, and Layton set out to test whether the Book of Mormon is the work of a single author (perhaps Smith) or of multiple authors (ancient or nineteenth century). Larsen's study included analysis of thirty-eight frequently occurring non-contextual words and forty-two rarely occurring non-contextual words. To generate frequency lists, the researchers first assumed that 'the writers of each verse, or partial verse, could be identified according to information given in the text' and thus they 'assigned' verses and partial verses to classes based on their 'careful scrutiny' of the text (1980). They concluded from statistical analysis of this material that the text was not the work of Joseph Smith and that many authors likely wrote it. Using samples of known writings from Solomon Spalding, Sidney Rigdon, and other Smith contemporaries, Larsen et al. claimed further that the multiple styles they detected in the Book of Mormon were not likely to be the work of any of these nineteenth-century authors.

Several problems are now apparent in the methodologies employed by Larsen et al. (1980). First, they grouped verses and partial verses from the Book of Mormon into clusters based on their understanding of speakers (or characters) in the Book of Mormon (i.e. Nephi, Alma, etc.). Because the characters had distinctive vocabulary 'wordprints' within these selections, they concluded that the Book of Mormon was a multi-authored work. [11] They further reasoned that because their selections did not match the styles of potential nineteenth-century authors, they could conclude that the text was not the work of a nineteenth-century author. However, their analysis did not exclude the possibility that their chosen selections were composites containing different fractional contributions from different nineteenth-century authors.

A further problem stems from Larsen's reliance upon context-sensitive words. Though Larsen claims to use only non-contextual words, his list of selected words is questionable. It includes words such as 'behold', 'forth', 'lest', 'nay', 'O', 'unto', 'wherefore', and 'yea' -- words that are common in scripture and thus contextual. They occur at a much higher frequency in the Book of Mormon than in the writings of nineteenth-century authors. Take the word 'unto', for example: it occurs 3,610 times in the Book of Mormon, a rate of 135 occurrences for

__________
9 The original basis for this argument is a statement made by James H. Fairchild in the New York Observer on 5 February 1885, immediately after his discovery of the Spalding Manuscript in Honolulu, Hawaii: 'The theory of the origin of the Book of Mormon in the traditional manuscript of Solomon Spaulding will probably have to be relinquished... Mr. Rice, myself, and others compared it with the Book of Mormon, and could detect no resemblance between the two in general or detail. Some other explanation of the origin of the Book of Mormon must be found, if an explanation is required.' Also see http://www.solomonspalding.com/docs2/1886Fair.htm#fairchild1.

What is often not mentioned is Fairchild's later retraction published by Schroeder (1901): 'With regard to the manuscript of Mr Spaulding now in the Library of Oberlin College, I have never stated, and know of no one who can state, that it is the only manuscript which Spaulding wrote, or that it is certainly the one which has been supposed to be the original of the Book of Mormon. The discovery of this Ms does not prove that there may not have been another, which became the basis of the Book of Mormon. The use which has been made of statements emanating from me as implying the contrary of the above is entirely unwarranted. JAMES H. FAIRCHILD.' Also relevant is the final statement of Rice (1886):

The Spaulding Manuscript recently discovered in my possession, and published by the Mormons, in no wise determines the question as to the authorship of the Book of Mormon, or of Spaulding's connection with the latter. It shows conclusively that this writing of Spaulding was not the original of the Book of Mormon -- nothing more in that regard. It gives the Mormons the advantage of calling upon their opponents to produce or prove that any other Spaulding Manuscript ever existed -- and that is the gist of the whole matter. Until lately I have been of the opinion that there was no tangible evidence that any other production of Solomon Spaulding, bearing upon the question, could be shown as having ever existed. But correspondence and discussions growing out of the publication of this document, have shaken my faith in that belief, and indeed produced quite a change of opinion on that subject.

Rice's statement was published in 'The Daily Bulletin,' Honolulu, 11 March 1886, Vol. IX, No. 1273. It is available online at: http://www.sidneyrigdon.com/dbroadhu/HI/mischawi.htm#031186

10 Less influential was Taves's (1984) study, which relied heavily on the disputed methodologies of Morton (1978).

11 A further problem with the Larsen study is the observation made by Burrows (1987) in his study of Jane Austen's fiction that authors can create distinct voices within their own prose. That said, differences between the 'voices' of a single author are generally smaller than those between different authors.




[468]


every 10,000 words. In the entire Chadwyck-Healey (2000) Early American Fiction collection, a collection of 875 novels spanning the period from 1789 to 1875, the word 'unto' appears 2,346 times, a rate of just 3.8 occurrences for every 10,000 words. [12] Even sympathetic scholars, such as the statistician D. James Croft (1981), caution against reading too much into Larsen's results. [13]
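The per-10,000-word rates quoted above follow from a simple normalization, which can be sketched as below. The function names are illustrative assumptions, not part of the original study, and the corpus size shown is back-computed from the quoted count and rate rather than taken from the article.

```python
def rate_per_10k(count: int, total_words: int) -> float:
    """Occurrences of a word per 10,000 running words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 10_000 * count / total_words


def implied_total_words(count: int, rate: float) -> float:
    """Corpus size implied by a raw count and a per-10,000-word rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 10_000 * count / rate


# Figures quoted in the text for the word 'unto'.
# The total is derived here, not reported by the authors.
bom_total = implied_total_words(3610, 135)

# How many times more frequent 'unto' is in the Book of Mormon
# than in the fiction collection, using the two quoted rates.
rate_ratio = 135 / 3.8
```

Normalizing by corpus size is what makes counts from texts of very different lengths comparable; the raw counts alone (3,610 versus 2,346) would understate the difference.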

In a paper from around 1988, [14] Mormon investigator John L. Hilton claimed that his group had significantly improved Larsen's techniques and that their results reconfirmed Larsen's conclusion that the Book of Mormon is a work of multiple, though ancient, authors. For his analysis of the Book of Mormon, however, Hilton chose to analyze subjectively grouped and edited selections from the Book of Mormon put together in the form of 5,000-word blocks of text. Like Larsen, Hilton assumed that characters such as Nephi and Alma can be viewed as candidate authors, and he selected blocks of text from what he referred to as 'didactic' sections for the characters 'Nephi' and 'Alma.' He then followed Larsen in assuming that each selection could only be the work of a single nineteenth-century author, not the work of multiple nineteenth-century authors. At best, one might hope to conclude from such an analysis that the chosen selections are not by the same author, but the methodology used does not exclude the possibility of multiple nineteenth-century authors. Hilton's methodology thus did not address a key aspect of the Book of Mormon authorship question.

In Appendix 3 of his essay, Hilton identifies the sources for his compilation: not a single manuscript, or the published 1830 version of the Book of Mormon, but instead, a composite compilation of selections from four sources based upon what he and his team judged to be the oldest. The provenance of this material is questionable. Also problematic is that Hilton's compilation of old Mormon manuscripts did not include significant sections and direct quotations from the King James Bible -- sections and quotations that are an acknowledged part of the 1830 Book of Mormon. [15] Most importantly, Hilton's analysis neglected to include a comparison with the work of Rigdon. This omission is difficult to understand given the other potential authors whose work Hilton analyzed. In our work, we include a large amount of newly available Rigdon text of certain provenance, adding to the limited amount available at the time of Larsen's study.

More compelling than the work of Hilton and Larsen is the work of statistician David Holmes (1985, 1991a,b, 1992). In separate papers from 1991 and 1992, Holmes investigated Book of Mormon authorship using a multivariate measurement of vocabulary richness. Holmes compared the Book of Mormon to thirteen writing samples from Joseph Smith, Joanna Southcott, and the King James Bible. [16] He measured the richness of noun usage in the various works: a technique that Holmes claims enables him to discriminate between the 'personal' and the 'prophetic' writings of Joseph Smith as well as between the personal writings of Smith and those of Joanna Southcott. Using this technique, Holmes further discriminates between the prophetic voice of Smith and that of Southcott. Holmes derives the 'signal' for Smith's prophetic voice from Smith's revelations as they are recorded in Doctrine and Covenants; the personal voice he derives from the letters and diary entries collected in Dean Jessee's The Personal Writings of Joseph Smith.

Detecting differences between Smith's prophetic and personal voice was a key discovery for Holmes. His technique appeared to prove effective in discriminating between authors and between authorial voices in different contexts. From this, Holmes argued that his multivariate measurements of vocabulary richness offered no evidence to support the argument that the Book of Mormon is a work of multiple authors. This conclusion stood in direct contradiction to the previous analyses by Larsen and Hilton. However, two problems are apparent in Holmes's work: first, his reliance upon the letters and diary entries collected by Dean C. Jessee in Personal Writings of Joseph Smith (Smith and Jessee, 2002) as a reliable source for Smith's personal voice and second, his reliance upon the Doctrine and Covenants as a reliable source for Smith's prophetic voice.

Though Holmes was careful to select 'only those letters written by Smith himself [in Smith's hand],

__________
12 The data cited here was extracted from the Early American Fiction (1789-1875) collection published by Chadwyck-Healey (2000).

13 See also Hilton (1988).

14 There are at least three different versions of Hilton's 'On Verifying Wordprint Studies: Book of Mormon Authorship' available. One version can be found as an undated PDF file in the online BYU Studies archive, see http://byustudies.byu.edu/. Another copy can be acquired through the Harold B. Lee Library of Brigham Young University under the citation Hilton, J. L. 'Book of Mormon "Word Print" measurements using "wrap-around" block counting,' FARMS. FARMS, the Foundation for Ancient Research and Mormon Studies, was recently renamed 'The Maxwell Institute for Religious Scholarship' and is located at Brigham Young University. A third source is a book recently published by FARMS entitled Book of Mormon Authorship Revisited: The Evidence for Ancient Origins. See Chapter Nine: 'On Verifying Wordprint Studies: Book of Mormon Authorship,' pp. 225-53.

15 One could argue that Hilton's decision to remove these biblical sections was reasonable given that the authors he was testing for could not have written these sections. In our analysis we leave these sections in as an additional test of our methodology. A full explanation is provided in the methodology section of this article.

16 Joanna Southcott was a late-eighteenth-century self-proclaimed prophet with no connection to the Book of Mormon. She was selected by Holmes as a figure similar to Smith who could be used for comparative purposes in his analysis.




[469]


or preserved in the handwriting of clerks who state specifically that Smith is dictating' (Holmes, 1991b), even this subset of Dean Jessee's collection is problematic. In the opening sentence of his introduction to Personal Writings of Joseph Smith, Jessee declares: 'it matters very little whether or not a person writes his own journals, letters, and speeches or delegates others to write for him' (Jessee, 2002). His point here is that even if written by others, the material reflects the mind of Smith if not the actual words as written. For authorship attribution analysis, however, we are less concerned with whether a document captures the 'spirit' of an attributed author and more specifically interested in whether the document is written by and in the natural style of the attributed author. With Smith, we cannot reasonably conclude that the documents attributed to him are indeed reflections of his individual literary style. On the contrary, in studying Smith and reading Jessee's collection of documents, one becomes immediately and acutely aware of how little we can, even blithely, attribute to Smith and Smith alone. Jessee notes the problems associated with claiming that Smith was the author of the words attributed to him: 'His philosophy' writes Jessee, 'was that "a prophet cannot be his own scribe."' [17] Indeed, even Jessee avoids use of the word 'author' preferring instead 'writings attributed to him [Smith].' Jessee points out that while Smith 'produced a sizable collection of papers, the question remains as to how clearly they reflect his own thoughts and personality [because] we inherit the limitations that produced them... the wide use of clerks taking dictation or even being assigned to write for him, and the editorial reworking of reports of what he did and said' (Smith and Jessee, 2002). Jessee notes further that the 'practice... of inserting eyewitness writings that have been changed from indirect to direct discourse... gives the impression that Joseph wrote them,' when in fact he did not. Referring to one particular case, Jessee writes that the 'impressions of Joseph Smith given... probably reflect the personality of the editor more than they do Joseph's.' Even for the twenty-three letters in Smith's hand, which Jessee republishes in facsimile form, we cannot easily assume that Smith is the sole author. Many of the letters in Jessee's collection show the handwriting of Smith alongside and intermingled with the handwriting of other authors, including Rigdon and Cowdery. Even in something as personal as a journal entry or letter, we see consistent evidence of collaboration and co-authorship. Unfortunately, such writing cannot be used as a reliable sample of known authorship. [18]

Second, and equally problematic, is Holmes's use of the Doctrine and Covenants as a reliable example of Smith's prophetic voice. This text of revelations is ascribed to Smith, but as is the case with many of his letters and diary entries, he did not write it unaided. Rather, he is reported to have dictated the revelations to one of his scribes. From 1829 to 1838, two of Smith's main scribes were none other than Sidney Rigdon and Oliver Cowdery, who according to the Spalding-Rigdon theory, participated in writing the Book of Mormon. In fact, the Church of Jesus Christ of Latter Day Saints (the Mormon Church) acknowledges that many sections of the Doctrine and Covenants were revealed jointly to Smith and Rigdon or to Smith and Cowdery. [19] The voice signals of one of these men or a mix of their signals could be the 'prophetic voice' Holmes ascribes to Smith. That Holmes would find similarities between the 'prophetic' voice of the Doctrine and Covenants and the Book of Mormon, therefore, is at best evidence of common authorship for the two texts but in no way demonstrates that Smith's 'voice' (divinely inspired or otherwise) is anywhere to be found. [20]


2 A New Approach

For many, the question of who wrote the Book of Mormon remains unresolved. Historical and stylometric research has so far not given us a reliable answer. We offer here a new approach that differs from past work both in source selection and methodology. We examine the entire 1830 Book of Mormon without any a priori assumptions, modifications or pre-selection, and compare it to new, candidate-author samples. Our methodology does not isolate word categories (i.e. contextual or non-contextual nouns), but instead uses the entire

__________
17 See http://www.lightplanet.com/mormons/people/joseph_smith/writings.html.

18 We are aware of objections made to the entire concept of 'authorship,' objections put forth by theorists such as Stanley Fish and Roland Barthes who have proclaimed that the idea of 'authorship' is flawed or 'dead.' It is not our intention to enter these highly theorized debates but rather to proceed from the assumption that writers do actually write documents and that the documents they write can, with appropriate evidence, be reasonably attributed to their authors. In the case of Joseph Smith, we do not believe that even the small number of letters written in his own hand can be reasonably attributed to him. Moreover, were we to concede the reliability of these few letters, we would still not have enough text to constitute an ample sample of known authorship. This is a regrettable circumstance, and we do hope that reliable material will be made available in the future for additional testing.

19 Sections 35, 37, 40, 44, 71, 73, 76, and 100 to Smith and Rigdon, and Sections 6, 7, 13, 18, 24, 26, and 110 to Smith and Cowdery.

20 In another paper, also published in 1991, Holmes includes the Book of Abraham along with the Doctrine and Covenants in order to develop Smith's 'prophetic' voice. The problems we identify with the Doctrine and Covenants apply similarly to the Book of Abraham.




[470]


corpus as a starting point and a mathematically based selection process to define the features of the author samples and the Book of Mormon that we will compare. Our work employs two techniques to determine the probability that each chapter of the Book of Mormon was authored by each of seven authors: Oliver Cowdery, Parley Pratt, Sidney Rigdon, Solomon Spalding, Isaiah-Malachi (from the Bible), Henry Wadsworth Longfellow, and Joel Barlow. The first five have known or alleged connections to the Book of Mormon. The last two are prominent, period-authors who were added as controls. [21]

The first technique, 'delta' (Burrows, 2002, 2003; Hoover, 2004a,b) is well-documented in the literature of computational linguistics, so we omit a detailed description here. The second is 'nearest shrunken centroids' (NSC). NSC is a statistical technique for classification in high-dimensional settings. The problem of authorship attribution is a classification problem because we seek to classify a text sample of unknown authorship into one of a fixed number of known author categories -- in this case a closed set of candidate authors: Rigdon, Spalding, Cowdery, Longfellow, etc. The problem is high-dimensional because we seek to perform classification on the basis of a very large number of words. The method is as follows. First, we compute average word frequency vectors, or centroids, for each known author, on the basis of the text samples of known authorship. Next, we shrink these centroids towards the overall average word frequency vector across all of the authors, in order to make our method more robust to small changes in word frequencies. Finally, we classify a text of unknown authorship by computing its word frequency vector and determining to which of the shrunken centroids it is most similar. NSC was initially intended for a completely different purpose: it was developed to assist cancer diagnosis by classifying patient tumor samples into cancer subtypes based on gene expression measurements. However, from a machine learning perspective, the problem of authorship attribution is surprisingly similar to that of cancer diagnosis: rather than classifying tumors by cancer subtype, we classify texts by author, and instead of using gene expression measurements to perform the classification, we use word frequencies. [22] This creates the seven author categories described above. The same kind of analysis is then done on a Book of Mormon chapter, and the resulting pattern is compared to the pattern of each of the seven potential author categories (Rigdon -- 90%; Longfellow -- 1%; etc.). On this basis, NSC assigns a probability that each potential author wrote each Book of Mormon chapter, just as it would assign a probability that a tissue sample manifests a particular cancer sub-type. More details regarding NSC can be found in Tibshirani (2002, 2003). [23]
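The three steps described above (compute centroids, shrink them, classify by nearest centroid) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it omits the per-word standardization, class priors, and probability estimates of the full method described by Tibshirani, and it takes the shrinkage amount delta as given, whereas in practice it is normally chosen by cross-validation. All function names and the toy frequency values are hypothetical.

```python
from math import copysign, sqrt


def soft_threshold(value: float, delta: float) -> float:
    """Move value toward zero by delta; values within delta become zero."""
    return copysign(max(abs(value) - delta, 0.0), value)


def mean_vector(vectors):
    """Component-wise average of equal-length word frequency vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def shrunken_centroids(samples_by_author, delta):
    """Per-author centroids, each pulled toward the overall centroid."""
    all_samples = [v for vs in samples_by_author.values() for v in vs]
    overall = mean_vector(all_samples)
    shrunken = {}
    for author, vectors in samples_by_author.items():
        centroid = mean_vector(vectors)
        shrunken[author] = [
            o + soft_threshold(c - o, delta)
            for c, o in zip(centroid, overall)
        ]
    return shrunken


def classify(text_vector, centroids):
    """Return the author whose shrunken centroid is nearest (Euclidean)."""
    def distance(author):
        return sqrt(sum((x - c) ** 2
                        for x, c in zip(text_vector, centroids[author])))
    return min(centroids, key=distance)


# Hypothetical relative frequencies of three words in known samples.
samples = {
    "author_a": [[0.050, 0.010, 0.002], [0.054, 0.012, 0.002]],
    "author_b": [[0.030, 0.025, 0.002], [0.028, 0.027, 0.002]],
}
centroids = shrunken_centroids(samples, delta=0.001)
predicted = classify([0.051, 0.011, 0.002], centroids)
```

In this toy data the third word has the same frequency for both authors, so after shrinkage its centroid component is identical across authors and it contributes nothing to the decision. That is the sense in which shrinkage makes the classifier less sensitive to small differences in word frequencies.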


3 Source Selection

Because several theories for the origin of the Book of Mormon propose multiple authorship, we cannot investigate it as if it were a single unified text written by a single author, but must instead break it into meaningful samples. Smith reportedly dictated the original document in a series of sessions with scribes. [24] These scribes allegedly wrote down everything he said, without punctuation or attention to grammatical form. A key scribe, Oliver Cowdery, is alleged to have provided the initial editing before publication. Subsequent editors altered punctuation, improved grammar, eliminated redundant phrases, and, in some cases, made changes in the text's content. Our investigation thus begins by excluding any analysis of punctuation (e.g. comma frequency) or of form (verse length, sentence length, etc.), and is limited to words alone. Since it was the published 1830 version of the text that was authorized by the Mormon Church, we examine the entire text, excluding only the chapter summaries that appear before First Nephi, Second Nephi, Jacob, Alma, Helaman, Third Nephi, Fourth Nephi, and Ether.
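Restricting the analysis to words alone amounts to discarding punctuation before counting. A minimal sketch of that step follows; the tokenization rule, the lowercasing, and the function name are assumptions made for illustration, since the article does not specify how the text was tokenized.

```python
import re
from collections import Counter
from typing import Dict

# Assumed rule: a word is a run of letters, optionally with one
# internal apostrophe (e.g. "prophet's"). Punctuation is dropped.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(text: str) -> Dict[str, float]:
    """Relative frequency of each word, ignoring punctuation and case."""
    words = WORD_RE.findall(text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: n / total for word, n in counts.items()}


freqs = word_frequencies("And it came to pass, that it was so.")
```

Because commas and periods never enter the counts, later editorial changes to punctuation do not affect the resulting frequency vector.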

We opted to use the chapter structure currently recognized by modern Mormon Church editors to create our text samples. This results in a total of 239 text segments for testing. This approach yields texts that are generally of adequate size (verses are too small and books too large), recognizes natural breaks in the narrative, facilitates cross-referencing

__________
21 Choosing appropriate controls to use in conjunction with the Book of Mormon was not a trivial matter. The rationale for our choice is delineated in the section of this article titled 'Source Selection.'

22 More specifically, NSC works by computing a vector of 'average word frequencies' for each author and shrinking this vector towards the overall average word frequency vector across all authors in order to reduce the variance and avoid over-fitting the data. A test set sample is then classified by computing its distance to the word frequency vector for each known author, while incorporating knowledge about the variance for each author.

23 See also: http://www-stat.stanford.edu/~tibs/PAM/index.html

24 In early July of 1828, Smith lost the first 116 pages of his alleged translation of the inscribed gold plates. Prior to this loss, Smith claimed to have translated the plates by means of the 'urim and thummim,' a Hebrew instrument of divination that, according to Smith, consisted of a pair of stones fastened to a breastplate and joined in a form similar to that of a large pair of spectacles. When Smith resumed the alleged translation after loss of the 116 pages, however, he reportedly dictated all of the 582 pages of the 1830 Book of Mormon (including large sections quoted verbatim from the Book of Isaiah) by gazing into a hat in which he held a 'seer stone' that allowed him to 'see' the words in English and to thus read from the gold plates that were often purportedly hidden to avoid theft. This seer stone was a stone previously used by Smith to look for gold treasures allegedly buried on the land of local farmers, a practice for which Smith was prosecuted (successfully) in a court of law. For a fee, Smith and his associates would dig for treasures at locations identified with the aid of the seer stone. Inevitably, the treasures were cursed and 'slippery,' preventing their recovery.




[471]                                             Reassessing authorship of the Book of Mormon


to online resources, [25] and avoids the chance that we have imposed our own bias. We consider it important that this method tests the entire corpus approved by Smith in 1830. Book of Mormon samples averaged 1,117 words and ranged in size from 95 to 3,752 words. Our candidate-author samples were equally varied and ranged from a small sample of 114 words to a large sample of 17,797 words with an average sample size of 2,172 words. [26]

For comparative purposes, we acquired digital versions of the Books of Isaiah and Malachi from the King James Bible as well as samples of known writings of Solomon Spalding, Sidney Rigdon, Parley Pratt, and Oliver Cowdery (Appendix A provides a detailed list of source materials). For control purposes, we selected two texts: Henry Wadsworth Longfellow's (1855) Song of Hiawatha and Joel Barlow's Columbiad (1825). Barlow and Longfellow were initially selected as control authors because both are roughly contemporary to the Book of Mormon, both deal to an extent with concepts found in the Book of Mormon, and both employ formulaic patterns consistent with patterns of verse seen in the Book of Mormon. [27] To further test the appropriateness of these control texts, we performed a series of simple hierarchical classification tests using frequently occurring non-contextual words and fifty novels of the same era (1789-1850). [28] The texts written by Longfellow and Barlow consistently clustered close to the Book of Mormon indicating that they were appropriate choices for use as control texts. The Isaiah and Malachi texts also served as pseudo-control texts, since large sections of the Book of Mormon are known to be almost verbatim extracts from them. [29] All of the known author samples were segmented in order to obtain estimates of the variance associated with each author's word use. In total, 239 chapters of the Book of Mormon and 217 samples of known authorship were tested. Using scripts developed for this project, each sample was tokenized in order to produce word counts and relative frequency data for each word within each sample. [30] We did not include Joseph Smith in the analysis because, as noted above, there is currently no reliable corpus of Joseph Smith text.


4 Methodology

As described in the previous section, our data consist of 239 samples of unknown authorship (corresponding to chapters from the Book of Mormon) and 217 samples written by seven known authors. We refer to this analysis as the 'seven-author case.' The number of text samples used for this analysis is as follows: Cowdery (nineteen), Pratt (fifty-three), Rigdon (twenty-three), Spalding (seventeen), Isaiah-Malachi (seventy), Barlow (twelve), and Longfellow (twenty-three). We used a set of 110 words or 'features,' obtained in three steps:
(1) We selected the words that occurred at least once in the samples from each author and also at least once in the Book of Mormon. This resulted in a set of 521 words.

(2) We selected the subset of these 521 words that have a mean relative frequency, across the 456 samples, of at least 0.1%. This resulted in a set of 114 words.

(3) We removed the words 'god,' 'ye,' 'thy,' and 'behold,' as these occurred at much higher frequencies in texts relating to biblical subject matter.

The resulting list of 110 words is available in Appendix B. [31]
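The three selection steps can be sketched in Python as follows. This is an illustrative reconstruction, not the project's own scripts: the function names, the input layout (one `Counter` of word counts per sample), and the toy samples are assumptions. The 0.1% threshold and the four excluded words come from the text above.

```python
from collections import Counter


def relative_freq(counts):
    """Convert raw word counts for one sample into relative frequencies."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def select_features(author_samples, target_samples, min_mean_freq=0.001,
                    excluded=("god", "ye", "thy", "behold")):
    """author_samples: {author: [Counter, ...]}; target_samples: [Counter, ...]."""
    # Step 1: words seen at least once for every author and in the target text.
    vocabularies = [set().union(*samples) for samples in author_samples.values()]
    vocabularies.append(set().union(*target_samples))
    common = set.intersection(*vocabularies)

    # Step 2: keep words whose mean relative frequency over all samples
    # reaches the threshold (0.1% in the study).
    all_samples = [c for samples in author_samples.values() for c in samples]
    all_samples += list(target_samples)
    freqs = [relative_freq(c) for c in all_samples]
    frequent = {
        w for w in common
        if sum(f.get(w, 0.0) for f in freqs) / len(freqs) >= min_mean_freq
    }

    # Step 3: drop words tied to biblical subject matter.
    return sorted(frequent - set(excluded))


# Toy samples (hypothetical), far smaller than the real corpus.
authors = {
    "A": [Counter("the and god the".split())],
    "B": [Counter("the and of god".split())],
}
target = [Counter("and the god it".split())]
features = select_features(authors, target)
```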

In order to compute delta scores and apply NSC, we first converted the 110 word counts for each text into relative word frequencies. For NSC, we formatted the data as a matrix of dimension 456 x 110 (number of samples by number of words). We subtracted out the mean from each column and divided the entries in each column by the standard deviation for that column. We then applied NSC to the data, using the 'pamr' (Prediction Analysis for Microarrays) package that is freely available on the R-statistical software website. [32]
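A minimal sketch of the column standardization described above, together with a delta score in the form usually attributed to Burrows (the mean absolute difference between two z-score vectors). It assumes the sample standard deviation and no constant columns; the paper specifies neither, and the function names and toy matrix are hypothetical.

```python
import statistics


def standardize_columns(matrix):
    """Subtract each column's mean and divide by its standard deviation."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # assumes no constant column
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in matrix]


def delta_score(z_a, z_b):
    """Mean absolute difference between two vectors of z-scores."""
    return statistics.mean([abs(a - b) for a, b in zip(z_a, z_b)])


# Toy matrix (hypothetical): three samples by two words.
matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = standardize_columns(matrix)
score = delta_score(z[0], z[2])
```

A smaller delta score indicates that two texts use the selected words at more similar standardized rates.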

Both delta and NSC involve the selection of tuning parameters. For both methods, this tuning parameter determines the number of words to include in the classifier. In order to determine the success rates of NSC and delta at classifying chapters of known authorship, and in order to select a value

__________
25 See for example: http://scriptures.lds.org/bm/contents

26 It warrants note that in our cross-validation tests, we did not observe a correlation between whether an author was correctly assigned and the length of a text sample. After NSC and delta author assignments were made, we further tested the results for any possible correlation between the size of a Book of Mormon chapter and the author assigned to that chapter. Again no correlation was observed.

Sample sizes varied as follows:
  • Average Book of Mormon sample size 1,117 (range 91-3,752)
  • Average Rigdon sample size 4,561 (range 226-17,797)
  • Average Spalding sample size 2,373 (range 777-8,515)
  • Average Cowdery sample size 1,600 (range 200-10,712)
  • Average Pratt sample size 3,024 (range 114-16,468)
  • Average Longfellow sample size 1,354 (range 668-2,188)
  • Average Barlow sample size 5,460 (range 2,843-6,984)
  • Average Isaiah/Malachi sample size 554 (range 134-1,131)


    27 For example, the Longfellow and Barlow texts frequently open verses with the word 'and' and frequently string together multiple short phrases. Also evident in the three texts is the use of epic or 'biblical' language with a propensity for repetition.

    28 Fifty novels written by fifteen different authors (twelve male, three female) were selected from the Chadwyck-Healey Nineteenth Century American Literature collection. Selection was based solely on publication date (i.e. chronological proximity to the 1830 publication of the Book of Mormon). From these texts we extracted word frequency data and employed hierarchical clustering ('hclust' function with complete linkage) to group the texts based on their similarity. The cluster dendrograms produced by R can be found in the online supporting materials at http://purl.stanford.edu/ir:rs276tc2764

    29 It has been suggested to the authors that the passages from Isaiah and Malachi that appear verbatim in the Book of Mormon might be removed from our candidate author samples and perhaps even replaced by similar books from the Bible. Our purpose in keeping these passages was intentional and meant to serve as an added test of our methodology. Had NSC failed to assign those chapters of the Book of Mormon that contain significant borrowings from Isaiah and/or Malachi to our Isaiah/Malachi sample, then we would have had grounds to question the effectiveness of our classifications overall. As it turned out, NSC effectively assigned to Isaiah/Malachi even those chapters where the direct borrowings were subtle.

    30 All data are available online at http://purl.stanford.edu/ir:rs276tc2764

    31 In order to avoid the problems that John Burrows (2005) identifies in his study of Shamela, our methodology selects for words common to all authors and then, to control for infrequent words that might be common to only one author and the target text, we further winnow the selection to contain only words that appear at a mean rate of at least 0.1%. On this point, Hoover (2002) is also instructive. In his analysis of frequent word sequences, Hoover culls from analysis words that are unique to a single text and words with 'obviously peculiar distributions' such as those that we found for the words: 'god', 'ye', 'thy', and 'behold.'

    32 http://cran.r-project.org/




    M. L. Jockers et al.                                                                                   [472]


    cross-validation. Roughly speaking, cross-validation is performed as follows, for a range of values of the tuning parameter:
    (1) Randomly split the samples of known authorship into two sets: a 'training set,' containing most of the samples, and a 'test set,' containing a smaller portion of the samples.

    (2) Perform the classification method of interest (either delta or NSC) for a given value of the tuning parameter, training on the training set and testing on the test set.

    (3) Compute the error fraction from the number of misclassified test set samples.
    Cross-validation allowed us to estimate the error that we would obtain if we tried to classify the samples of known authorship using NSC and delta. The above process was repeated multiple times, and the average misclassification error rate recorded. The lowest delta error rate of 11.1% was obtained using ninety words. This means that if we used delta to classify a new sample written by one of the seven known authors, then the probability of correct classification would be 88.9%. The lowest NSC error rate was obtained when all 110 words were included; the error rate was 8.8%. This means that we would expect to classify correctly a new sample written by one of the seven candidate authors 91.2% of the time. Since there are seven candidate authors, a classifier that selected an author completely at random would give a correct classification rate of 1/7 or 14.3%, and an average misclassification error rate of 6/7, or 85.7%. Therefore, the low error rates obtained using NSC and delta are impressive. The fact that NSC results in lower error rates indicates that this method is appropriate for authorship attribution, and may in this case be superior to delta.
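The cross-validation loop and the chance baseline can be sketched as follows. The split fraction, number of repeats, and the plain nearest-centroid stand-in classifier are assumptions made for illustration; the study applied the procedure to delta and NSC over a range of tuning-parameter values.

```python
import random


def cross_validate(samples, labels, train_and_predict,
                   test_fraction=0.2, repeats=50, seed=0):
    """Average misclassification rate over repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, int(len(samples) * test_fraction))
    error_rates = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        predicted = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        wrong = sum(p != labels[i] for p, i in zip(predicted, test_idx))
        error_rates.append(wrong / n_test)
    return sum(error_rates) / len(error_rates)


def nearest_centroid(train_x, train_y, test_x):
    """Stand-in classifier: assign each test vector to the closest class mean."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(rows) for col in zip(*rows)]
                 for y, rows in groups.items()}

    def closest(x):
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return [closest(x) for x in test_x]


# Toy data (hypothetical): two well-separated authors, five samples each.
samples = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
           [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]]
labels = ["A"] * 5 + ["B"] * 5
cv_error = cross_validate(samples, labels, nearest_centroid)

# Chance baseline for seven equally likely candidate authors.
chance_correct = 1 / 7
chance_error = 1 - chance_correct
```

The chance figures reproduce the 14.3% and 85.7% quoted above, which is the yardstick against which the 11.1% and 8.8% error rates should be read.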

    Using delta, five of the 239 chapters of the Book of Mormon were incorrectly assigned to control author Longfellow (none to Barlow), an error rate of 2.1%. Using NSC, only two chapters were assigned incorrectly to Longfellow (none to Barlow), an error rate of 0.8%. To provide best estimates of individual chapter authorship for the five authors who are linked historically to the Book of Mormon (Spalding, Rigdon, Cowdery, and Pratt) or who are known to have contributed (Isaiah-Malachi), we also performed a second delta and NSC analysis (hereafter referred to as the 'five-author case') in which we omitted the Barlow and Longfellow control texts. In the five-author case, the lowest NSC error rate was obtained using 108 words (listed in Appendix B).


    5 Results

    For each chapter of the Book of Mormon, using both NSC and delta, we compared the relative probability that a candidate author or a control author contributed to that chapter. We then established a 'ranking' for each of the seven authors (1-7) from most likely to least likely and calculated the percentage point difference between candidates in terms of their probability. In Alma forty-seven (Chapter 147), for example, the first place ranked candidate (using NSC) has a probability of 46.5% where the second place candidate is 46.3%. Given this close proximity, it would be impossible to conclude that one candidate is more likely than the other. In the majority of chapters, however, we do not observe this sort of close probability between first and second candidates. Most chapters (57%) show at least a fifty percentage point difference between first and second choice. Indeed, in forty chapters (17%), the difference between first and second most probable author is over ninety percentage points. Second Nephi twenty-two (chapter forty-four), for example, is a chapter known to contain strong borrowings from the Book of Isaiah. NSC ranks the probability of Isaiah-Malachi as the source for this chapter at 99.99%. In fact, twenty of the twenty-one chapters known to have been borrowed from Isaiah or Malachi are properly attributed at a probability at or above 91% certainty. [33] There was thus only one 'false negative' for chapters that are known to be derived from Isaiah-Malachi (Mosiah fourteen is borrowed from Isaiah fifty-three but was attributed to Longfellow). This is evidence for the effectiveness of NSC classification. Further evidence comes from a consideration of 'false positives' -- chapters attributed incorrectly to Isaiah-Malachi. There are twenty-one known Isaiah chapters and another sixteen that have some relationship to Isaiah or Malachi (about 15% of the chapters in the Book of Mormon). But delta assigns 47% to Isaiah-Malachi, while NSC assigns 27%. 
This indicates that both delta and NSC had 'false positives' for Isaiah-Malachi, but the NSC false positive error rate was about half that of delta.
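The ranking and the percentage-point margin between first and second place can be computed as in the sketch below. Only the two leading values (46.5% and 46.3%) are taken from the Alma example in the text; the author labels and the remaining five probabilities are hypothetical placeholders chosen so that the values sum to 100.

```python
def rank_candidates(probabilities):
    """Sort authors from most to least probable and return the 1st-2nd margin."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1]  # percentage-point difference
    return ranked, margin


# Probabilities in percent; labels and the last five values are placeholders.
chapter = {
    "author_1": 46.5, "author_2": 46.3, "author_3": 4.0, "author_4": 1.5,
    "author_5": 1.0, "author_6": 0.5, "author_7": 0.2,
}
ranked, margin = rank_candidates(chapter)
```

A margin this small (about 0.2 percentage points) corresponds to the case where no candidate can be preferred; margins of fifty points or more correspond to the majority of chapters reported above.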

    NSC and delta agree on the first place assignment for 147 of 239 chapters (62% agreement). In cases where there is not first place agreement between the two methods, there are seventy-six chapters in which the first place candidate of one method agrees with the second place candidate in the other. There are a total of 223 chapters (93%) in which the two methods name the same author in either the first or second place. In the 147 chapters where both methods agree on first place, there are two chapters assigned to Cowdery, two to Longfellow, four to Pratt, thirty-four to Spalding, forty-six to Rigdon, and fifty-nine to

    __________
    33 The twenty-one chapters from the Book of Mormon that use the same words in the same sequence as Isaiah or use slightly modified wordings are: 1 Nephi 20 (84% identical to Isaiah 48), 1 Nephi 21 (87% identical to Isaiah 49), 2 Nephi 7 (79% identical to Isaiah 50), 2 Nephi 8 (93% identical to Isaiah 51), 2 Nephi 12 (86% identical to Isaiah 2), 2 Nephi 13 (94% identical to Isaiah 3), 2 Nephi 14 (95% identical to Isaiah 4), 2 Nephi 15 (96% identical to Isaiah 5), 2 Nephi 16 (96% identical to Isaiah 6), 2 Nephi 17 (98% identical to Isaiah 7), 2 Nephi 18 (97% identity to Isaiah 8), 2 Nephi 19 (96% identical to Isaiah 9), 2 Nephi 20 (97% identical to Isaiah 10), 2 Nephi 21 (99% identical to Isaiah 11), 2 Nephi 22 (97% identical to Isaiah 12), 2 Nephi 23 (94% identical to Isaiah 13), 2 Nephi 24 (93% identical to Isaiah 14), Mosiah14 (identical to Isaiah 53), 3 Nephi 22 (identical to Isaiah 54), 3 Nephi 24 (includes all of Malachi 3), and 3 Nephi 25 (identical to Malachi 4).




    [473]                                             Reassessing authorship of the Book of Mormon


    Table 1 Number of chapters assigned to each author based on NSC probability assignments for the seven-author case (Barlow and Longfellow controls included)
    If chapter assignments were random, the expected number for each choice would be 239/7 = 34.


    Table 2 Number of chapters assigned to each author based on delta probability assignments for the seven-author case (Barlow and Longfellow controls included)
    If chapter assignments were random, the expected number for each choice would be 239/7 = 34.


    Isaiah-Malachi. In the seventy-six chapters where there is agreement between a first choice in one method and a second choice in another, there are forty-two cases, which are inverses of each other, that is there are forty-two cases in which the author listed as first place in one method is listed in second place in the other method. In these instances, there are nine cases in which Rigdon is paired with Cowdery, twenty cases in which Rigdon is paired with Isaiah-Malachi, eight cases in which Rigdon is paired with Spalding, two cases in which Spalding is paired with Pratt, and three cases in which Spalding is paired with Isaiah-Malachi.
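The agreement counts between the two methods can be tallied as in this sketch, which assumes each method supplies, for every chapter, a list of authors ordered from most to least likely. The function name and the toy rankings are hypothetical; the final lines check the percentages quoted in the text.

```python
def agreement_counts(nsc_rankings, delta_rankings):
    """Count chapters with the same first choice, and chapters where one
    method's first choice is the other method's second choice."""
    same_first = 0
    first_matches_second = 0
    for nsc, dlt in zip(nsc_rankings, delta_rankings):
        if nsc[0] == dlt[0]:
            same_first += 1
        elif nsc[0] == dlt[1] or dlt[0] == nsc[1]:
            first_matches_second += 1
    return same_first, first_matches_second


# Toy rankings (hypothetical) for three chapters.
nsc = [["Rigdon", "Spalding"], ["Rigdon", "Cowdery"], ["Pratt", "Rigdon"]]
dlt = [["Rigdon", "Cowdery"], ["Cowdery", "Rigdon"], ["Spalding", "Cowdery"]]
counts = agreement_counts(nsc, dlt)

# Percentages reported in the text, recomputed from the chapter counts.
first_place_pct = round(100 * 147 / 239)
first_or_second_pct = round(100 * (147 + 76) / 239)
```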

    Examining the NSC results (Table 1), we note the following for the most likely positions of the first and second most probable candidate: Rigdon is either the first or second most probable candidate author in 197 out of 239 chapters; Spalding is either the first or second most likely candidate in 110 out of 239 chapters; and Isaiah-Malachi is first or second in 101 chapters. Cowdery appears thirty-seven times in first or second place and Pratt appears twenty-four times. Barlow is never seen in first place and appears only once in second place; Longfellow is first in just two chapters and second in only six. Additionally, we note that for the least likely positions of the sixth and seventh most probable candidates, Rigdon never shows up in last (seventh) place and appears only twice in sixth place. Spalding never shows up in either seventh (last) or sixth place. Isaiah-Malachi appears four times in last place and fourteen times in sixth. Cowdery appears 116 times in sixth or seventh place, Pratt thirty-four times. Barlow appears in the sixth or seventh place 175 times and Longfellow 133 times. Table 2 shows similar results generated from the delta classification.

    Summing the probabilities for each candidate-author across the entire Book of Mormon allows visualization of the relative presence of each candidate-author's 'signal' in the overall text (Figs 1 and 2).

    In both classification methods, the signals for Rigdon, Isaiah-Malachi, and Spalding are dominant and the signals for control authors Longfellow and Barlow are comparatively small or altogether absent. The Pratt and Cowdery signals are present but small beside the signals for Rigdon, Isaiah-Malachi and Spalding. Both NSC and delta tend to agree closely in terms of the relative presence of the Rigdon and Spalding signals. The greatest disagreement between the two methods appears in relation to the Isaiah-Malachi signal where delta assigns 47% of the chapters to Isaiah-Malachi as a first choice while NSC




    M. L. Jockers et al.                                                                                   [474]



        Fig. 1 Overall attribution %s as assigned by NSC

        Fig. 2 Overall attribution %s as assigned by delta

        Fig. 3 1st & 2nd place NSC assignments for the 7-author case

        Fig. 4 1st & 2nd place delta assignments for the 7-author case

    assigns 28%. The actual Isaiah-Malachi percentage can be estimated at around 36 chapters, or 15% of all chapters. In other words, 15% of the Book of Mormon is derived from Isaiah-Malachi or contains excerpts from Isaiah-Malachi. This indicates that while both delta and NSC had false positives, NSC had many fewer and is closer to the actual or true value.

    Figures 3 and 4 show the number of chapters assigned to each author as either the first or second most likely attribution. Again, we note the dominance of Rigdon, Isaiah-Malachi, and Spalding in both first and second place assignment and the comparatively small presence of both the control authors and the other candidates.

    All of the above results are for the seven-author case. The five-author case gave highly similar results. For the first most likely attribution, identical results were obtained for 226 of the 236 chapters




    [475]                                             Reassessing authorship of the Book of Mormon


    (96% agreement). For the first and second most likely attributions, identical results were obtained for 223 of the 239 chapters (93% agreement).


    6 Discussion

    In the cross-validation tests, NSC was more effective, so our discussion and conclusions are based on the NSC data unless otherwise specified. Use of NSC enabled assignment of author probabilities based on the closeness of each chapter within the Book of Mormon to the known linguistic signals of a set of candidate nineteenth-century authors. These probabilities in turn made it possible to gauge the relative presence of one signal over another. The low signals for the control texts indicate that NSC effectively identified and did not select the control authors (Figs 5 and 6).

    At a macro level, the signals for Rigdon, Isaiah-Malachi, and Spalding dominate the Book of Mormon, with NSC assigning 85% of the text to one of these three candidate authors (delta assigns 93% to these three). Taken together, Rigdon and Spalding account for 57% of the first place assignments and 68% of the second place assignments. Isaiah-Malachi accounts for 28% of the first place assignments and 16% of the second place assignments. Overall only ten chapters lack a signal for Rigdon or Spalding in one of the two most probable positions and four of these ten chapters are chapters known to be derived from the Book of Isaiah, as discussed below. In other words, of the 239 chapters in the Book of Mormon, 229 show either the Rigdon or Spalding signal prominently. Together Rigdon and Spalding receive 64% of the combined first and second place assignments, Isaiah-Malachi receives 21% and other candidates or control authors receive 15% (Fig. 3).

    It is well-accepted that some chapters from the Old Testament books of Isaiah and Malachi served as source material for the Book of Mormon. Both delta and NSC correctly classified all twenty-one chapters of the Book of Mormon that contain strong borrowings from Isaiah and/or Malachi, and all were classified with a probability at or above 84% by NSC. [34] That said, both methods also detected the Isaiah-Malachi signal in chapters that are not obviously derived from the Old Testament. For NSC, there are forty-three such chapters and, in thirty-six of these, the probabilities strongly favor Isaiah-Malachi over the other possible candidates. [35] These are chapters where the style and word usage patterns are close to those in the Old Testament. Others (Walters, 1990; Tanner, 1998; Marquardt, 2000; Persuitte, 2000; Palmer, 2002) have spent considerable time tracing the direct correspondences between the Book of Mormon and the King James version of the Bible, so we will not delve into the specifics of these Isaiah-Malachi attributions other than to note that fifteen of these thirty-six chapters are directly related to Isaiah and one of the thirty-six is a chapter borrowed from the New Testament. [36] Discounting the sixteen chapters that have some connection to the Bible, this leaves twenty chapters attributed to Isaiah-Malachi for reasons that are not obvious. We note, however, that many of these twenty chapters have thematic similarity to Ethan Smith's View of the Hebrews (Walters, 1990; Persuitte, 2000), which is believed by many to be linked to the Book of Mormon through Cowdery, [37] and a future analysis might utilize Ethan Smith's text as a potential source. Figure 7A shows the relative presence of the Isaiah-Malachi signal across the entire Book of Mormon and Fig. 7B shows chapters attributed to Isaiah-Malachi.

    The prominence of the Rigdon and Spalding signals is significant and provides strong support for the Spalding-Rigdon authorship theory: that Rigdon acquired one or more manuscripts written by Spalding and then modified them, by incorporating his own theology, to create the 1830 version of the Book of Mormon. Figure 8A illustrates the presence of the Rigdon signal through each chapter of the Book of Mormon, and Fig. 8B shows chapters attributed to Rigdon. The graph shows a dominant Rigdon signal in First Nephi, the non-Isaiah fraction of Second Nephi, Jacob, Enos, Words of Mormon, Mosiah, Helaman, the non-Isaiah fraction of Third Nephi, Mormon, Ether, and Moroni, with an intermittently strong signal in the Book of Alma. Especially noteworthy here is the fairly regular distribution of the signal across the entire text. A gap in the Rigdon signal appears in sections known to

    __________
    34 Ibid.

    35 The thirty-six chapters not conventionally understood as being derived from Isaiah/Malachi that were assigned with a probability above 50% to Isaiah/Malachi are: 2 Nephi 28, 3 Nephi 9, 2 Nephi 26, 2 Nephi 29, 2 Nephi 3, Ether 4, 3 Nephi 21, 2 Nephi 7, 2 Nephi 27, Helaman 13, 2 Nephi 30, 2 Nephi 4, 3 Nephi 30, 3 Nephi 16, 3 Nephi 20, Mosiah 10, Mosiah 17, Mosiah 20, Jacob 5, Mormon 8, Ether 2, 3 Nephi 27, 2 Nephi 10, 3 Nephi 14, 1 Nephi 12, Mosiah 12, 2 Nephi 9, Jacob 3, Ether 3, 3 Nephi 11, 1 Nephi 2, 1 Nephi 7, Mosiah 24, Helaman 14, Mosiah 3, and 1 Nephi 13.

    36 The 2 Nephi 9, 2 Nephi 27, 2 Nephi 28, and 2 Nephi 30 make heavy use of Isaiah's phraseology, such as 'Holy One of Israel.' 'Holy One of Israel' appears twenty-seven times in Isaiah, but just five times in the rest of the Bible. The 2 Nephi 10, 2 Nephi 26, 2 Nephi 27, 2 Nephi 28, 2 Nephi 29, 2 Nephi 30, Jacob 5, 3 Nephi 20, and 3 Nephi 21 are prophetic chapters dealing with the theme of restoration and expounding on Isaiah. Excerpts from Isaiah are found in 1 Nephi 13, 2 Nephi 26, 2 Nephi 27, 2 Nephi 30, Mosiah 12, 3 Nephi 16, 3 Nephi 20, 3 Nephi 21, and Mormon 8. Verse 20 in 3 Nephi 20 and verse 23 in Mormon 8 both include admonitions to remember or search the words of Isaiah. The 3 Nephi 14 is a chapter borrowed from the New Testament (Matthew 7).

    37 Ethan Smith was pastor of the Congregationalist Church attended by the family of Oliver Cowdery.




    M. L. Jockers et al.                                                                                   [476]


    (this page not reproduced, due to copyright restrictions)





    [477]                                             Reassessing authorship of the Book of Mormon




    Fig. 7 (A) Chapter-by-chapter probability of Isaiah-Malachi as author (seven-author case). (B) Chapters for which the first place NSC assignment was Isaiah-Malachi (five-author case)

    be copied from Isaiah (Fig. 7), in portions of the book of Alma attributed to Spalding (Fig. 9), and it is sporadic in the first quarter of the text, in the section known to scholars of Mormonism as replacement material added after Smith's loss of 116 pages he claimed to have translated. [38] The lost pages contained material that would have ended near the beginning of the Book of Mosiah. It is generally held that Smith resumed his purported translation at Mosiah and continued through to the end of the Book of Mormon, returning at the end of the process to replace the lost pages. One possible scenario is that Smith and/or Rigdon prepared a replacement in fall of 1828 by drawing from source material at hand such as the Book of Isaiah (which features prominently in this part of the text) and perhaps from Ethan Smith's View of the Hebrews.

    Figure 9A illustrates the presence of the Spalding signal through each chapter of the Book of Mormon and Fig. 9B the chapters attributed to Spalding. Noteworthy here is (1) the small Spalding signal in sections of the Book of Mormon that were likely added to replace the 116 pages (i.e. the first quarter of the book -- First Nephi through Words of Mormon), and (2) the fact that the chapters with a dominant Spalding signal are primarily narrative and non-theological, and thus consistent with

    __________
    38 According to the official Mormon Church account, Smith received the gold plates upon which the Book of Mormon was inscribed from an angel on 22 September 1827. He is said to have begun his translation sometime between December 1827 and February 1828. In mid-June of 1828, the first 116 pages of the document were lost. Shortly after completing the first 116 pages of the document, Smith's scribe Martin Harris took the document home to show his wife. Stories differ as to whether the pages were lost, stolen, or destroyed by Harris's wife.




    M. L. Jockers et al.                                                                                   [478]


    descriptions of 'Manuscript Found,' the missing Spalding document that is alleged to be foundational to the Book of Mormon (Howe, 1834, 1977). The prominence of the Spalding signal in the Book of Alma is especially noteworthy. Dale Broadhurst has identified these chapters as likely Spalding contributions based on his careful comparison of phrases found both in the Book of Alma and Spalding's 'Manuscript Story.' [39] Similar thematic and linguistic patterns between the Book of Mormon and Spalding's 'Manuscript Story' have also been identified by Holley (1989). [40]

    Figure 10A shows the distribution of the Oliver Cowdery signal and Fig. 10B the chapters attributed to Cowdery. [41] The Cowdery signal is most prominent in the middle third of the book with a strong cluster of authorial assignments (fourteen first-place and four second-place) in the Book of Alma. Where



    Fig. 8 (A) Chapter-by-chapter probability of Sidney Rigdon as author (seven-author case). (B) Chapters for which the first place NSC assignment was Rigdon (five-author case)

    Cowdery is the most probable author, he is paired with Rigdon as second most probable author in all but two cases; where Cowdery is assigned as second most probable author (seventeen chapters), Rigdon is first most likely in fourteen of these. All of this suggests a strong correlation between Cowdery and Rigdon and the likelihood that if Cowdery contributed to the Book of Mormon, he may have done so in collaboration with Rigdon. The Cowdery signal appears only where the Rigdon signal is also

    __________
    39 The bar charts provided in panel B in Figs 5-10 were inspired by Broadhurst and can be compared to his charts available online. The attributions made by Dale Broadhurst and a detailed analysis are available at http://solomonspalding.com/SRP/MEDIA/SRPpap16.htm#Alma and http://solomonspalding.com/SRP/SRPpap10.htm

    40 See http://www.sidneyrigdon.com/vern/vernP0.htm#pg03

    41 The 1 Nephi 6, 2 Nephi 32, Mosiah 2, Alma 5, Alma 7, Alma 9, Alma 26, Alma 29, Alma 32, Alma 33, Alma 36, Alma 38, Alma 39, Alma 40, Alma 54, Alma 60, Alma 61, 3 Nephi 12, 3 Nephi 13, and Moroni 1.




    [479]                                             Reassessing authorship of the Book of Mormon




    Fig. 9 (A) Chapter-by-chapter probability of Solomon Spalding as author (seven-author case). (B) Chapters for which the first place NSC assignment was Spalding (five-author case)

    prominent and in many cases the difference between the strength of the two signals is marginal. Also noteworthy is that the Cowdery signal appears most prominently in the middle third of the book. His signal appears after the Book of Mosiah and near the beginning of the Book of Alma -- the point in the manuscript where Smith supposedly began to dictate with Cowdery as his scribe, and when the speed of translation reportedly increased significantly. [42] It is in these sections of the Book of Mormon, especially the third quarter of the Book of Alma, that we find the Cowdery signal -- in well-composed chapters that deal with such topics as the nature of faith (Alma thirty-two), atonement through Christ (Alma thirty-six), [43] and liberty (Alma sixty-one). Still, if Cowdery had a direct hand in the authorship of the Book of Mormon it was likely a lesser one. [44] It is more likely that his primary role was editorial given both the historical and stylometric data.

    Figure 11A shows the distribution of the Parley P. Pratt signal and Fig. 11B the chapters attributed to Pratt. [45] Pratt is the most likely author for nine chapters with five occurring in First Nephi, one in Mosiah, and two small chapters appearing, back-to-back, in Moroni (Fig. 11B). Pratt was an

    __________
    42 Cowdery was Smith's primary scribe from 7 April 1829 to 2 June 1829.

    43 NSC assigned Alma chapter thirty-six to Oliver Cowdery. This chapter is a chiasm: an inverted parallel literary form. According to Edwards (2004) the pattern of the chiasm found in Alma thirty-six 'establishes with 99.98% certainty that this chiasm occurred in this book by design and rules out the hypothesis that it occurred by chance.' According to J. W. Welch (2003), publications in the New England area describing use of chiasmus as a Biblical literary form were 'available for purchase in bookshops or from traveling salesmen' in 1825. During that time period, Oliver Cowdery worked as a traveling salesman, selling books and pamphlets (Cowdrey et al., 2005).

    44 According to History of the Church and section eight of the Doctrine and Covenants (another book of Mormon scripture), Cowdery attempted to translate a part of the Book of Mormon, but met with limited success. Both History of the Church of Jesus Christ of Latter-day Saints and the Doctrine and Covenants are credited to Smith, but the extent of Smith's actual contribution is unknown. Many historical accounts and revelations attributed to Smith were changed after-the-fact by others and/or co-produced with others, including Rigdon and Cowdery.

    45 The chapters are 1 Nephi 4, 1 Nephi 5, 1 Nephi 11, 1 Nephi 16, 1 Nephi 18, Mosiah 9, Mormon 2, Moroni 2, and Moroni 3.




    M. L. Jockers et al.                                                                                   [480]




    Fig. 10 (A) Chapter-by-chapter probability of Oliver Cowdery as author (seven-author case). (B) Chapters for which the first place NSC assignment was Cowdery (five-author case)

    early leader in the Mormon church and one of the original Quorum of the Twelve Apostles. In 1826, however, he was a wandering tin peddler who 'knew everybody in Western New York and Northern Ohio' (Schroeder, 1901; Shook, 1914). He lived near Rigdon's residence in Bainbridge, Ohio, and joined Rigdon's congregation. [46] During the same period, Rigdon is reported to have collaborated with 'two or three different persons' in 'adjacent places' to create the Book of Mormon. [47] Sometime around 1827, Pratt decided to sell all his goods and take up the ministry. It has been suggested that Pratt was 'the medium through whom Rigdon made the acquaintance of Smith when seeking a suitable tool for his purpose' (Williams, 1842; Eaton, 1882). [48] While traveling in 1830, ostensibly to see family, Pratt reported sudden inspiration that led him to Palmyra, New York, where he quickly converted to Mormonism and was baptized by Oliver Cowdery. He and Cowdery then reportedly delivered a copy of the published 1830 version of the Book of Mormon to Rigdon. Pratt's conversion is

    __________
    46 See http://www.solomonspalding.com/docs/1901schr.htm and http://solomonspalding.com/docs2/1914Shk1.htm#pgvii

    47 In Bainbridge, Rigdon reportedly became involved in what appears to be 'automatic writing': using a seancelike process to create the Book of Mormon. A description of that process is given in a letter to the editor titled 'The Mormon Bible' which appeared in the New Northwest on 9 September 1880. The letter reads: 'We are in receipt of a letter from Mr O. P. Henry, an Astoria subscriber, who says, in reference to an article in the Oregonian of recent date concerning the origin of the Mormon Bible, that his mother, who is yet alive, lived in the family of Sidney Rigdon for several years prior to her marriage in 1827; that there was in the family what is now called a "writing medium," also several others in adjacent places, and the Mormon Bible was written by two or three different persons by an automatic power which they believed was inspiration direct from God, the same as produced the original Jewish Bible and Christian New Testament. Mr H. believes that Sidney Rigdon furnished Joseph Smith with these manuscripts, and that the story of the "hieroglyphics" was a fabrication to make the credulous take hold of the mystery; that Rigdon, having learned, beyond a doubt, that the so-called dead could communicate to the living, considered himself duly authorized by Jehovah to found a new church, under a divine guidance similar to that of Confucius, Moses, Jesus, Mohammed, Swedenborg, Calvin, Luther or Wesley, all of whom believed in and taught the ministration of spirits. The New Northwest gives place to Mr Henry's idea as a matter of general interest. The public will, of course, make its own comments and draw its own conclusions.' See http://www.sidneyrigdon.com/dbroadhu/NW/miscnw04.htm#081680. Dale Broadhurst has confirmed several aspects of the above account, and compiled additional historical evidence pointing to Bainbridge as the likely location for production of the 1827 version of the Book of Mormon. 
    See http://sidneyrigdon.com/books/Hnry1942.htm and http://www.sidneyrigdon.com/books/Brew1945.htm

    48 See http://solomonspalding.com/docs/1882PatA.htm.




    [481]                                             Reassessing authorship of the Book of Mormon




    Fig. 11 (A) Chapter-by-chapter probability of Parley Pratt as author (seven-author case). (B) Chapters for which the first place NSC assignment was Pratt (five-author case)

    described in contradictory accounts, as is his role in delivering the Book of Mormon to Rigdon (Schroeder, 1901).

    In five of the nine chapters attributed to Pratt, Pratt is paired with Spalding in second place and in four with Rigdon in second place. Pratt receives fifteen second-place assignments: most of them (ten) as a second to Spalding, and three as a second to Rigdon. The largest proportion (one-third) of the assignments to Pratt as the second most probable author occurs in Alma, and there are two cases in First Nephi. If Pratt contributed to the Book of Mormon, he played a minor role and was likely most involved in First Nephi, where there are several first and second place Pratt assignments.

    In the stylometric studies cited earlier, Larson et al. (1980) and Hilton (1988) attempted to test the hypothesis that the Book of Mormon's purported ancient authors had dissimilar writing styles. Recent studies in cultural and linguistic evolution suggest another relevant hypothesis by demonstrating that writing styles in ancient texts tend to become increasingly divergent over time (Farmer, 2006). Our chapter-by-chapter analysis tested both hypotheses and found that the Book of Mormon




    M. L. Jockers et al.                                                                                   [482]


    does not display patterns consistent with the type of ancient record it purports to be. For example, two of the Book of Mormon's alleged principal authors were Nephi and Moroni. They allegedly lived about 1,000 years apart. NSC assigned many of their chapters to Rigdon. For example, NSC assigned both First Nephi ten and Moroni eight to Rigdon with >93% probability. The Book of Mormon also attributes many chapters to a single ancient author, but our results frequently disconfirmed this. For example, where the Book of Mormon attributes Mormon five, six, and seven to an ancient author named Mormon, NSC assigned chapters five and seven to Rigdon (89 and 92% probability, respectively) and chapter six to Spalding (72% probability). Chapters five and seven contain references to the future redemption of the House of Israel, a concept popular in the early nineteenth century and embraced by Rigdon, while chapter six is a war narrative similar to other such narratives penned by Spalding, a veteran of the American Revolutionary War. These results stand in contrast to claims that the Book of Mormon is of ancient authorship.


    7 Conclusions

    NSC has proved highly useful for authorship classification. It has a lower cross-validation error rate than delta, a lower rate of false positive assignments, and a probability-based output that enabled in-depth interpretation of the results, including speculation regarding possible connections between candidate authors. The NSC results are consistent with the Spalding-Rigdon theory of authorship. Evidence supporting this conclusion includes the prominence of signals for Spalding and Rigdon; the presence of strong Spalding signals in sections of the Book of Mormon previously linked to Spalding; the presence of a dominant Rigdon signal in most theological sections, and a strong Spalding signal in the more secular, narrative sections. Our findings are consistent with historical scholarship indicating a central role for Rigdon in securing and modifying a now-missing Spalding manuscript. The high number of Spalding-Rigdon pairings in first and second place strongly suggests that Spalding and Rigdon were responsible for a large part of the text. Pearson's chi-square test of independence was performed and indicates that the distribution of first-place assignments is significantly different from uniform (P < 2 × 10^-16). Similarly, the distribution of second-place assignments differs significantly from uniform (P < 2 × 10^-16). Clearly, far more chapters are attributed to Rigdon, Spalding, and Isaiah-Malachi than might be expected due to mere chance. Other connections detected through this work are also consistent with the historical record, including the likelihood of a lesser, largely editorial role for Cowdery and a possibly minor, if unexpected, role for Pratt.
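
    [Illustrative sketch, not part of the published paper.] A comparison of assignment counts against a uniform distribution, as described in the paragraph above, is commonly computed as a Pearson chi-square goodness-of-fit statistic. The Python sketch below shows one way such a check might be run. The function names are hypothetical, the counts in the usage note are invented placeholders rather than the paper's data, and the closed-form tail probability used here is valid only for an even number of degrees of freedom (which covers seven or five candidate categories).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only).

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def chi_square_uniform(counts):
    """Pearson statistic, df and p-value for H0: all categories equally likely."""
    k = len(counts)
    expected = sum(counts) / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    df = k - 1
    return stat, df, chi2_sf_even_df(stat, df)
```

    For example, `chi_square_uniform([50, 30, 10, 5, 3, 1, 1])` (made-up counts for seven candidates) returns the statistic, the degrees of freedom, and the tail probability; perfectly even counts give a statistic of zero and a probability of one.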

    Based on this evidence, we find the original claims of Howe (1834, 1977) and the more recent assertions of Cowdrey and coworkers quite plausible; it seems likely that the 1830 version of the Book of Mormon was the creation of Sidney Rigdon, a Reformed Baptist Preacher, who had motives, means, and opportunity to carry out the project (Cowdrey et al., 2005). We acknowledge that because our samples of Rigdon prose all come after 1830, some could argue that Rigdon's prose was influenced by the Book of Mormon and not vice versa. To raise such an objection, however, one would have to argue that Rigdon was so influenced by the Book of Mormon that he consciously or unconsciously adopted, even internalized, the most subtle and unremarkable linguistic patterns found in certain portions of the text, but not in others.

    Prior exposure to the Book of Mormon most certainly did not influence Solomon Spalding who died fourteen years before it was published. Yet our data strongly support the historical claim that a lost Spalding manuscript served as a source text for the backbone narrative of the Book of Mormon. The document that we used for samples of Spalding's writing ('Manuscript Story' also known as 'The Oberlin Manuscript') does not match the eyewitness descriptions of 'Manuscript Found,' the draft novel that Spalding read to friends and family in Conneaut, nor does it match the Book of Mormon. [49] The Spalding-Rigdon theory rests

    __________
    49 Several thematic similarities to the Book of Mormon have been suggested by Holley (1989), Broadhurst (http://www.sidneyrigdon.com/) and Chandler (http://mormonstudies.com/). Tom Donofrio (see http://www.mormonstudies.com/early1.htm) has shown that Spalding's Oberlin Manuscript and The Book of Mormon both contain phrases borrowed from David Ramsay (1749-1815), a friend and biographer of George Washington and author of History of the American Revolution. They also contain phrases from Mercy Otis Warren (1728-1814), author of History of the Rise, Progress and Termination of the American Revolution (1805). These borrowed phrases are concentrated in sections on war within the Book of Alma, where the Spalding signal that NSC detected is most pronounced (Fig. 9A and B, chapters 138-143 in particular). See: http://www.mormonstudies.com/early1.htm;   http://www.postmormon.org/exp_e/index.php/magazine/pmm_article_full_text/211.

    50 For a description of the Joseph Smith papers project, see http://www.josephsmithpapers.net.

    51 Since Van Wagoner's 1994 biography of Sidney Rigdon, Mormon history researchers have become increasingly aware of the pivotal role Rigdon played in the emergence of Mormonism. Recently, for example, Reynolds (2005) concluded that Rigdon was the likely author of 'The Lectures on Faith,' a series of seven lectures previously attributed to Joseph Smith. These lectures played a key role in the development of early Mormon theology.




    [483]                                             Reassessing authorship of the Book of Mormon


    heavily on the assumption that additional Spalding manuscripts once existed, and that material from one of these manuscripts provided the narrative framework for the Book of Mormon. This additional manuscript would be the one that the Conneaut witnesses and others identified as being the 'source' of the Book of Mormon. While not that manuscript, the Oberlin Manuscript nevertheless provides us with a reliable sample of Spalding's prose and the linguistic signal detected in it appears with significant regularity throughout the Book of Mormon.

    Of course, we have not considered every possible candidate-author who may have influenced the composition of the Book of Mormon. We have, however, selected from among the most likely candidates, excepting perhaps Joseph Smith. In the case of Joseph Smith, we had no reliable samples of prose to test. When reliably identified materials become available, their addition to this analysis would be worth considering. An effort to compile such writings is currently underway. [50]

    Knowledge of who likely constructed the Book of Mormon has significant implications for scholarship in Mormon history and for religious and cultural studies generally, as it addresses the foundation of an emerging world religion now estimated at thirteen million members. Our analysis supports the theory that the Book of Mormon was written by multiple, nineteenth-century authors, and more specifically, we find strong support for the Spalding-Rigdon theory of authorship. In all the data, we find Rigdon as a unifying force. His signal dominates the book, and where other candidates are more probable, Rigdon is often hiding in the shadows. [51]

    __________



    (part of this page not reproduced, due to copyright restrictions)







    [484-490]


    (these pages not reproduced, due to copyright restrictions)






    [491]                                             Reassessing authorship of the Book of Mormon


    Appendix B

    A total of 110 words used in NSC Classification for the seven-author case
    1 a
    2 after
    3 again
    4 against
    5 all
    6 among
    7 an
    8 and
    9 are
    10 as
    11 at
    12 away
    13 be
    14 because
    15 been
    16 before
    17 but
    18 by
    19 came
    20 children
    21 come
    22 day
    23 did
    24 do
    25 down
    26 earth
    27 even
    28 every
    29 father
    30 for
    31 forth
    32 from
    33 go
    34 great
    35 had
    36 hand
    37 have
    38 he
    39 her
    40 him
    41 his
    42 i
    43 if
    44 in
    45 into
    46 is
    47 it
    48 king
    49 know
    50 land
    51 made
    52 man
    53 many
    54 may
    55 me
    56 men
    57 might
    58 more
    59 my
    60 name
    61 no
    62 not
    63 now
    64 o
    65 of
    66 on
    67 one
    68 or
    69 our
    70 out
    71 over
    72 pass
    73 people
    74 power
    75 said
    76 say
    77 shall
    78 should
    79 so
    80 son
    81 that
    82 the
    83 their
    84 them
    85 then
    86 there
    87 therefore
    88 these
    89 they
    90 things
    91 this
    92 those
    93 thus
    94 time
    95 to
    96 up
    97 upon
    98 us
    99 was
    100 we
    101 were
    102 when
    103 which
    104 who
    105 will
    106 with
    107 words
    108 would
    109 you
    110 your

    A total of 108 words used in NSC Classification for the five-author case

    1 a
    2 after
    3 again
    4 against
    5 all
    6 among
    7 an
    8 and
    9 are
    10 as
    11 at
    12 away
    13 be
    14 because
    15 been
    16 before
    17 but
    18 by
    19 came
    20 children
    21 come
    22 day
    23 do
    24 down
    25 earth
    26 even
    27 every
    28 for
    29 forth
    30 from
    31 go
    32 great
    33 had
    34 hand
    35 have
    36 he
    37 her
    38 him
    39 his
    40 i
    41 if
    42 in
    43 into
    44 is
    45 it
    46 king
    47 know
    48 land
    49 made
    50 man
    51 many
    52 may
    53 me
    54 men
    55 might
    56 more
    57 my
    58 name
    59 no
    60 not
    61 now
    62 o
    63 of
    64 on
    65 one
    66 or
    67 our
    68 out
    69 over
    70 pass
    71 people
    72 power
    73 said
    74 say
    75 shall
    76 should
    77 so
    78 son
    79 that
    80 the
    81 their
    82 them
    83 then
    84 there
    85 therefore
    86 these
    87 they
    88 things
    89 this
    90 those
    91 thus
    92 time
    93 to
    94 up
    95 upon
    96 us
    97 was
    98 we
    99 were
    100 when
    101 which
    102 who
    103 will
    104 with
    105 words
    106 would
    107 you
    108 your
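
    [Illustrative sketch, not part of the published paper.] Word lists like the two above are typically turned into one numeric vector per text, holding the relative frequency of each listed word. The Python sketch below shows one plausible way to do that. The tokenization rule, the normalization by total token count, and the function name are assumptions made for illustration, since the paper's own preprocessing is not reproduced in this section. Only a small subset of the Appendix B words is used to keep the example short.

```python
import re
from collections import Counter

# Subset of the Appendix B words, chosen only to keep the example short.
WORDS = ["and", "it", "of", "pass", "that", "the", "to"]


def relative_frequencies(text, words=WORDS):
    """Return one relative frequency per listed word.

    Assumption: tokens are runs of ASCII letters after lower-casing, and
    each count is divided by the total number of tokens in the text.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return [0.0] * len(words)
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in words]
```

    Applied to each chapter and each candidate author's sample, a function of this kind would yield the rows of a text-by-word table on which a classifier can be trained.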


     

    Book of Mormon Authorship Attribution Charts
    (after Jockers, et al., 2008)





    More Book of Mormon Authorship Attribution Charts:

    Sidney Rigdon Chapters
    Rigdon/Spalding Chapters
    Rigdon/Cowdery Chapters
    Rigdon/Pratt Chapters
    Solomon Spalding Chapters
    Spalding/Cowdery Chapters
    Spalding/Pratt Chapters
    Oliver Cowdery Chapters
    Cowdery/Pratt Chapters
    P. P. Pratt Chapters
    Combination of Four Authors




     

    Transcriber's Comments




    The Stanford Researchers' 2008 Article

    The Book of Mormon computerized authorship study was conducted at Stanford University by Matthew L. Jockers, Daniela M. Witten and Craig S. Criddle in 2006-7. They submitted their report to Literary and Linguistic Computing in April 2008, and a pre-publication version of the "Reassessing Authorship" paper appeared on the Oxford Journals web-site that summer. The journal printed the paper in its Dec. 2008 issue, which was mailed out to subscribers early in 2009. Both the digitized pre-publication version and the final published version have subsequently been moved to the "Pay per View" section of the Oxford site. An on-line copy of the pre-pub paper is still available at Scribd.com, and snippet views from the published journal pages recently became available at the Google Books site.


    Criticism of the Jockers Study

    Even before the actual publication of the paper from Jockers, et al. (hereafter: Jockers' 2008 paper), a number of critics voiced their disapproval of both the study and its peer-reviewed report. For example, in Dec. 2008 Jeff Lindsay (a Mormon internet blogger) raised the concern that the Jockers study may have been "Rigged for Rigdon." His reasoning, which was frequently repeated by other pro-LDS writers, was that the inclusion of author-candidates such as Sidney Rigdon in the attribution of probable Book of Mormon authorship betrayed a bias on the part of the Jockers team -- a bias which carried over into the computerized study methodology and ensured that Sidney Rigdon, Solomon Spalding, Oliver Cowdery and Parley P. Pratt (all early 19th century Americans who did some writing meant for publication) would be credited with having written some parts of the sacred Mormon book.

    Additional criticism, offered by various Mormons possessing some knowledge of statistical methods, focused upon the appropriateness of Jockers' use of NSC (nearest shrunken centroid) classification as a means by which to assign authorship probabilities for various chapters in the Book of Mormon. The common reasoning behind this LDS disapproval seems to be that NSC methodology is best used to sort out the known contributors to a literary corpus, and that it should not be used to make authorship guesses, in cases where the list of contributing writers is undetermined.

    A rough analogy for the LDS argument can be stated in terms of a hypothetical anthology of short stories penned by, say, a half-dozen contemporary English writers. If the title-page of such an anthology lists only the names of the included authors, without specifying which parts of the book each wrote, then the anonymous short stories are at least confined to the authorship of a known list of author-candidates, and thus might be sorted out by various "word-printing" techniques -- including the employment of NSC methodology.

    If NSC classification techniques can only reasonably be applied to texts in which the anonymous authors are known to be confined to a certain list of author-candidates, then perhaps the LDS argument is a valid one. If NSC classification can be extended to sorting out authorship attributions for suspected candidates, then Jockers' methodology will be upheld in future additions to the professional literature.


    The Inclusion of Isaiah-Malachi

    In what was evidently a sort of experimental compromise, Jockers included the full text of the KJV Isaiah, coupled with the first four chapters of the KJV Malachi, as an artificial "known contributor" to the Book of Mormon. Since undisputed excerpts from Isaiah and Malachi are reproduced within the Book of Mormon text, the inclusion of this "known contributor" was evidently supposed to compensate for the above-cited problem of there being no agreed-upon "true" author-candidates among the list tested for Book of Mormon authorship. The inclusion of Isaiah-Malachi satisfied the Jockers team that the computerized methodology it utilized in the 2006-7 study could indeed recognize a contributing author with a high degree of accuracy. While this fact may have been pleasing to the Stanford researchers, their inclusion of Isaiah-Malachi inevitably resulted in the generation of a number of "false positive" authorship attributions, which only served to increase Mormon suspicions that practically the entire study results were composed of false positives.

    In its reporting, the Jockers team ranked the probable authorship of each Book of Mormon chapter, in terms of how much its demonstrated use of language matched the "word-prints" of the team's author-candidates (which included Spalding, Rigdon, Isaiah-Malachi, etc.). The computerized methodology employed also provided listings for the second most probable author and the third most probable author of each of the book's chapters. When the authorship attribution/distribution charts were constructed for the study results, each bar depicting a Book of Mormon chapter was assigned its designated numerical value, producing bars of various heights for each author-candidate, across the entire text of the Book of Mormon. The Jockers team did not produce a master compilation representing ALL of the authorship attribution probabilities on a single bar graph, but if such a graphic had been constructed, it might have looked something like the Spalding/Rigdon/Cowdery/Pratt combination chart drawn up by this commentator.

    Obviously the total value of authorship probabilities assigned to any particular Book of Mormon chapter cannot exceed 100%. Therefore, the "false positive" authorship probabilities generated by the inclusion of Isaiah and Malachi (as quasi-19th century writers) have to some degree skewed, or adulterated, the numerical values for the other, actual 19th century author-candidates in several instances.

    In a subsequent re-run of their computerized classification the Stanford researchers removed Isaiah-Malachi from their list of author-candidates. The outcome of this removal required the reassignment of a few Book of Mormon chapters to different probable authors, but did not result in any major changes.
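
    The 100% constraint and the effect of removing a candidate can be illustrated with a short Python sketch. It assumes, as is common for centroid-style classifiers, that each candidate receives a distance-like score and that probabilities are proportional to exp(-score/2), normalized over the candidate list. The candidate labels, the scores, and the function names are all invented for illustration and are not taken from the Jockers data.

```python
import math


def posteriors(scores):
    """Turn {candidate: distance-like score} into probabilities summing to 1.

    Assumption: probability is proportional to exp(-score / 2). The best
    score is subtracted first purely for numerical stability.
    """
    best = min(scores.values())
    weights = {k: math.exp(-0.5 * (s - best)) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def drop_candidate(scores, name):
    """Recompute the probabilities with one candidate removed from the list."""
    return posteriors({k: s for k, s in scores.items() if k != name})


# Invented scores for four hypothetical candidates on one chapter.
SCORES = {"A": 40.0, "B": 42.0, "C": 41.0, "D": 46.0}
with_all = posteriors(SCORES)
without_c = drop_candidate(SCORES, "C")
```

    Two properties follow from this normalization. Removing a candidate hands its share to the remaining ones, so every remaining probability rises. And because only the differences between scores matter, adding the same large amount to every score (a text that fits no candidate well) leaves the probabilities unchanged, which is the closed-list concern raised by the critics.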


    The Inclusion of Joseph Smith, Jr.

    The 2008 Jockers paper provides the Stanford team's reasoning on the decision to leave Joseph Smith out of the computerized study. At a later date Jockers was able to construct what might be a reliable Smith word-print and he has employed that addition in the above mentioned re-run of the computerized classification. By adding in Smith (and leaving out Isaiah-Malachi) the study results changed enough to warrant some new reporting by the Stanford team. Besides composing a paper on the probable "voice" of Smith in the published LDS revelations (the Doctrine and Covenants), Jockers has also prepared an update of his 2008 data for Book of Mormon authorship attribution. Although these new findings remain unpublished (at the end of 2010) a preview of the updated results may be seen in charts featured at the "Book of Solomon" web-pages. For example, the "Book of Solomon" authorship charts for Mosiah, Alma part 1, Alma part 2 and Ether each include color-coding in their topmost tier, which reflects the most recent Jockers data. The authorship attributions presented in these charts differ somewhat from what is found published in the 2008 Jockers paper. Most notably, Joseph Smith, Jr. is credited with having written Mosiah 13, Alma 20, Alma 29, Alma 37-38 and Alma 41. It must be stressed, however, that these are merely preliminary, unpublished findings, which are subject to alteration when finally published.

    When the updated Jockers data is finally published, it would be useful if those results also included the probable authorship of the "Preface" to the 1830 Book of Mormon. That text was ostensibly the composition of Joseph Smith, Jr. and its inclusion in the NSC classification might help circumvent criticisms of there being no known Book of Mormon authors included in an application of NSC classification methodology. On the other hand, if the "Preface" was not written by Smith, perhaps its actual authorship could be determined by further "word-print" analysis and testing.


    G. Bruce Schaalje's Criticism

    In addition to repeating previous LDS claims regarding the supposed inappropriateness of Jockers' use of NSC (nearest shrunken centroid) classification as a means by which to assign authorship probabilities for various chapters in the Book of Mormon, Bruce Schaalje (a BYU Statistics Professor) has added a new criticism of the 2008 Jockers paper. Schaalje performed some statistical pre-tests, in order to ascertain whether or not any of the author-candidates selected for testing by the Jockers team had actually produced writings similar to Book of Mormon contents. Schaalje's objective appears to have been to determine the reasonableness of including 19th century writers such as Solomon Spalding, Sidney Rigdon, Oliver Cowdery and Parley P. Pratt in the list of suspected Book of Mormon writers. If it could be demonstrated that such persons could not logically have been among the possible authors of Book of Mormon chapters, then their exclusion from NSC classification would be doubly justified.

    Among the preliminary statistical examinations the BYU statistician performed on Jockers' textual data (available on-line) were Schaalje's various "pca" (principal component analysis) tabulations and graphic renderings. By this means Schaalje was able to combine all of Jockers' selected texts (including the Book of Mormon chapters) into a single data pool. The contents of that literary data pool could then be sorted out by various means -- for example, by determining a mean value for the mass of frequently occurring non-contextual words found in the texts, and then comparing each individual text to that mathematical average. The actual contents and process are more complex than the description given here, but Schaalje's results consisted of all the principal coordinates determined for representation in two dimensional chartings of his (or Jockers') multivariate data. All the included texts could thus be ranked in order of their numerical deviation from the entire data pool's mean value, and then tabulations and charts prepared in which two paired measurements were expressed (as +/- x/y values) in plots on a four-quadrant chart. For example, the values of each text's greatest, and second greatest deviation from the mean could be depicted on a pca chart such as this:


    (view higher resolution image)

    The above chart was adapted from a preliminary graphic prepared by Bruce Schaalje for inclusion in an unpublished paper addressing the results of Jockers' Book of Mormon authorship attribution study. The data represented in the graphic are the 239 Book of Mormon chapters (plotted as blue circles) and the texts selected by Jockers as his 19th century author-candidates. Schaalje removed the pseudo-author Isaiah-Malachi and the actual authors Barlow and Longfellow from the chart generation process, in order to simplify the results therein represented.

    As can be seen in Schaalje's pc1/pc2 chart, nearly all of the Book of Mormon chapters cluster in a cloud of plots spanning the +/+, +/- and -/+ quadrants. Hardly any of the Book of Mormon plots fall into the -/- quadrant, where the majority of the 19th century writers' texts are located. Indeed, the various 19th century authors tend to cluster into small clouds of their own, which only partially overlap.

    Since so few of the Book of Mormon chapter plots fall anywhere near those of the 19th century writers, it appears reasonable for the observer to conclude that the output of those writers has little in common with the Book of Mormon. Put another way -- the Book of Mormon chapters must share some attributes not found in the 19th century writers' works; and the 19th century writers must be somewhat alike, in ways that they are not like the Book of Mormon.
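
    The kind of two-component projection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Schaalje's procedure: power iteration with deflation stands in for a library eigen-solver, the function names are hypothetical, and the small frequency table is made up. Each row is a text, each column a word frequency; the rows are centered on the pool mean, and each text is then scored on the first two principal components and labeled with its quadrant.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract each column mean, so every text is a deviation from the pool mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(ci, cj) / (n - 1) for cj in cols] for ci in cols]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration: multiply and renormalize until the direction settles."""
    v = [float(i + 1) for i in range(len(matrix))]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in matrix]), v


def first_two_components(rows):
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    l1, v1 = leading_eigenpair(cov)
    # Deflation removes the first component so the second can be found.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [(dot(r, v1), dot(r, v2)) for r in x]
    return scores, (l1, l2), (v1, v2)


def quadrant(pc1, pc2):
    """Label a point by the signs of its two scores, e.g. '+/-'."""
    return ("+" if pc1 >= 0 else "-") + "/" + ("+" if pc2 >= 0 else "-")


# Made-up table: five texts (rows) by three word frequencies (columns).
TOY = [[2.0, 1.0, 0.5], [3.0, 2.5, 0.4], [4.0, 2.0, 0.7],
       [5.0, 4.5, 0.3], [6.0, 5.0, 0.6]]
scores, variances, axes = first_two_components(TOY)
labels = [quadrant(a, b) for a, b in scores]
```

    Note that the sign of each component is arbitrary, so which quadrant a cluster lands in carries no meaning by itself; only the relative positions of the plotted texts do.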

    Taken at face value, Schaalje's findings might convince the non-specialist "layman" reader that there is no reason to pursue any additional testing of Jockers' author-candidates. However, professional analysts may reach different conclusions. When the principal component analysis is extended to produce pairings of the third, fourth, fifth, etc. greatest deviations from the mean, those additional charts appear to show over-all comparisons in which some Book of Mormon chapter plots commingle with those of the 19th century writers. In particular, Sidney Rigdon's texts' plots may be found in close proximity to some Book of Mormon chapters' plots on these supplementary pca charts. Clearly some additional research and analysis is called for, prior to our accepting Schaalje's findings as a preemptive disqualification of Jockers. Unfortunately Schaalje's paper is currently unavailable for consultation, having been withdrawn from its previous location on the web, where, in 2010, it appeared under the heading: "'Extensions to the nearest shrunken centroid classification method, with special reference to Book of Mormon authorship' submitted to Literary and Linguistic Computing, April 17, 2009."


    More on Schaalje's Rebuttal

    Although Professor Schaalje has withdrawn his pre-pub draft paper from public view, a copy may still be viewed on the web, via an arcane Google word search. The reader must first of all search for the following string of words at the Google site: "Jeff, amateur statistician or not, you hit the nail on the head." When the search engine returns its results, Google's "quick view" option will retrieve a cached version of Schaalje's paper.

    Glenn Thigpen, after consulting "Extensions to the nearest shrunken centroid classification method, with special reference to Book of Mormon authorship," provides the following summary:

    Author Candidates were Joseph Smith, Sidney Rigdon (1831-1846), Sidney Rigdon (1863-1873), Solomon Spalding, Oliver Cowdery, and Parley P. Pratt using the NSC methods from the Jockers study. The tests were run on the fifty-one papers known to be authored by Alexander Hamilton without Hamilton included as a possible author.

    Results:

    Sidney Rigdon - 28 of 51 with posterior probabilities ranging as high as 0.9999
    Parley P. Pratt - 12
    Oliver Cowdery - 11
    Joseph Smith Jr. - 0
    Solomon Spalding - 0

    Bruce added a "latent author" variable to test for the possibility that none of the candidate authors were actually the true author of the text. The math is beyond me but to quote from Bruce's paper "We propose a latent author with a distribution of literary features just barely consistent with the new text."

    Using the latent author method and the same candidate author set, two of the papers were attributed to Rigdon and the rest to the latent author.

    A third experiment was run, this time using Hamilton as an author candidate, using twenty-five of the 51 papers as a training text for Hamilton and using the remaining twenty-six texts for the actual tests. Hamilton was identified as the author of all of those texts.

    Please note that this information was taken from a draft preprint of the paper. Bruce has polished it up because he said that "there has been some movement" on the publication of the paper, so we would expect some differences in the final product. Bruce did not indicate that there were any changes to the conclusions and data.

    From what I can understand of the graphs that Bruce has produced on his paper, the two texts that Rigdon scored on against the latent author were included in the twenty-six final texts using Hamilton as an additional candidate.
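
    The Hamilton result summarized above, in which papers by a writer outside the candidate list were still assigned to Rigdon with scores as high as 0.9999, can be illustrated with a toy closed-set classifier. The sketch below is not the NSC procedure from either paper: it is a plain nearest-centroid classifier with no shrinkage step, and the feature vectors, candidate labels and function names are all invented for the example.

```python
import math

def centroid(rows):
    """Mean feature vector of several equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def closed_set_posteriors(text, centroids):
    """Turn squared distances to each centroid into scores that sum to 1.

    The normalization runs over the candidate list only, so the scores
    rank the candidates against each other and nothing else.
    """
    d2 = {name: sum((x - m) ** 2 for x, m in zip(text, c))
          for name, c in centroids.items()}
    nearest = min(d2.values())  # subtract the minimum to avoid underflow
    weights = {name: math.exp(-(v - nearest) / 2.0) for name, v in d2.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Made-up training vectors (two hypothetical word rates per text).
training = {
    "candidate_A": [[1.0, 2.0], [1.2, 1.8]],
    "candidate_B": [[4.0, 1.0], [4.2, 1.2]],
}
centroids = {name: centroid(rows) for name, rows in training.items()}

# A text by an outside writer, far from both candidates.
outsider = [9.0, 9.0]
posteriors = closed_set_posteriors(outsider, centroids)
print(posteriors)
```

    In this toy setup the outside text resembles neither candidate, yet the less distant one receives nearly all of the score, because the scores are forced to sum to one over the candidate list.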


    Additions by G. Bruce Schaalje

    On Aug. 29, 2010 Bruce Schaalje personally added some clarification to Glenn's previous comments, as follows:
    The real problem [with the 2008 Jockers' conclusions] is that Jockers and Criddle completely misunderstood the meaning of the phrase 'relative probability.' They admitted in one sentence of their paper:

    "For each chapter of the Book of Mormon, using both NSC and delta, we compared the relative probability that a candidate author or a control author contributed to that chapter"

    -- that the NSC method, like the Burrows’ Delta method, only produces relative probabilities. Actually Burrows doesn’t even go that far; he simply says that they are probability RANKINGS, and give very little information about even the relative sizes of the probabilities. They only give information about the probability orderings. But Jockers and Criddle then ignored the fact that they had at best relative probabilities, and interpreted their probabilities as ABSOLUTE PROBABILITIES. For example, Jockers and Criddle said, right after the above quoted sentence:

    "In Alma forty-seven (Chapter 147), for example, the first place ranked candidate (using NSC) has a probability of 46.5% where the second place candidate is 46.3%. Given this close proximity, it would be impossible to conclude that one candidate is more likely than the other. In the majority of chapters, however, we do not observe this sort of close probability between first and second candidates. Most chapters (57%) show at least a fifty percentage point difference between first and second choice. Indeed, in forty chapters (17%), the difference between first and second most probable author is over ninety percentage points."

    -- This is utter nonsense. These aren’t probabilities, and YOU CAN’T TALK ABOUT DIFFERENCES between relative probabilities. In other words, if author A gets an NSC 'probability' of 80% and author B gets an NSC probability of 10%, all that can be said is that author A is 8 TIMES more likely to be the author of the text than author B. Author A is NOT 70% more probable than author B. It could well be that author A's true probability of authorship is only 1%, implying that author B’s probability of authorship is 1/8%. Even if some author got a relative probability of 99%, it could still have an absolute probability of 1%. There is no way of knowing.

    Hence, graphs like Jockers’ and Criddle’s Fig. 8 (A) "Chapter-by-chapter probability of Sidney Rigdon as author" ARE COMPLETELY DEVOID OF MEANING. Relative probabilities have no meaning in and of themselves – they have to be compared to something, and even then they give NO IDEA of what the absolute probabilities are.

    The Hamilton [material] in our [as yet, unpublished] paper was not the main point. We just wanted to show how silly the relative probabilities can be if they are interpreted as absolute probabilities.

    The main message of our paper (Schaalje et al.) is that by positing an unknown (latent) author, some idea of the absolute probabilities can be derived. For Jockers' and Criddle’s Book of Mormon chapter data, it turns out that the absolute probabilities of authorship by Spalding or Rigdon are, almost without exception, very small. The deliberate archaic language of the Book of Mormon may have something to do with this, but that was not anything discussed by Jockers and Criddle.

    Another point that I have been trying futilely to get through to Dale [Broadhurst] is that it makes NO DIFFERENCE if, as Dale says,

    "When the principal component analysis is extended to produce pairings of the third, fourth, fifth, etc. greatest deviations from the mean, those additional charts appear to show over-all comparisons in which some Book of Mormon chapter plots comingle with those of the 19th century writers."

    The pca plot is one view of the data. DIFFERENCES THAT APPEAR IN ONE VIEW ARE REAL DIFFERENCES, even if we can make them disappear by viewing the data from another point of view. A difference that appears in one view of the data is a real difference, PERIOD. I won’t take time to go over my q-tip explanation again, but read about it in previous threads [at the MA&D message board] on the subject.

    One other thing. I didn’t remove our paper from the Web. Our department changed our homepage, and somehow the link to the preliminary version of our paper (for Literary and Linguistic Computing) got broken. I won’t put the paper back up, however, because we have made several changes to the paper in accordance with the reviewers’ comments (which we finally received).
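
    Schaalje's remark above about "views" of the data can be illustrated with a toy geometric example. The sketch below uses ordinary coordinate axes as stand-ins for principal components, and the two point groups are invented. It shows only that two groups separated along one axis stay at least that far apart in the full space, even when they overlap along the other axes; it says nothing about the actual Book of Mormon data.

```python
import math

def axis_gap(group_a, group_b, axis):
    """Gap between two groups along one axis; 0.0 if their ranges overlap."""
    a = [p[axis] for p in group_a]
    b = [p[axis] for p in group_b]
    return max(0.0, min(b) - max(a), min(a) - max(b))

def min_distance(group_a, group_b):
    """Smallest Euclidean distance between any point of A and any of B."""
    return min(math.dist(p, q) for p in group_a for q in group_b)

# Invented 4-dimensional points: the groups differ on axis 0 only.
group_a = [(-2.0, 0.1, 0.3, -0.2), (-2.5, -0.4, -0.1, 0.4), (-1.8, 0.2, 0.0, 0.1)]
group_b = [(2.1, 0.0, 0.2, -0.1), (1.9, 0.3, -0.2, 0.3), (2.6, -0.3, 0.1, 0.0)]

gap_first_axis = axis_gap(group_a, group_b, 0)   # clear separation
gap_third_axis = axis_gap(group_a, group_b, 2)   # ranges overlap
closest_pair = min_distance(group_a, group_b)    # distance in all 4 axes

print(gap_first_axis, gap_third_axis, closest_pair)
```

    A chart of the third and fourth axes alone would show the two groups mixed together, while the closest pair of points is still no nearer than the gap on the first axis.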


    A suggestion for future exploration

    Since there is currently a volatile disagreement over the appropriateness of applying NSC classification methodology to bodies of texts for which the authorship cannot be independently established, perhaps it would be best to place such disputation and inquiry "on hold," so that textual researchers can pursue other lines of investigation, free of the NSC controversy. One type of discovery seems relatively non-controversial, and that is the mapping of known Book of Mormon authors' voices across the book's entire text. At the very least, some additional effort devoted to this project might tell us more about the composition of the "Nephite Record" and how it varies in structure, content and language, chapter-by-chapter. Some laudable work has already been done along these lines and it could easily be extended and refined.

    Consider the three small selections of Jockers' charted data, taken from the middle of the Book of Mormon (illustrated in the excerpt provided below):



    Jockers attributes selection #1 (Alma 47-49) to Solomon Spalding, with a relatively high degree of probability and a relatively low degree of intrusion (secondary attributions) from other possible authors. In selection #2 (Alma 59-60), much the same can be said for an Oliver Cowdery authorship attribution. Finally, in selection #3, (3Nephi 23, 26) two chapters are exclusively assigned to Sidney Rigdon, with a very high degree of probability. Is it possible that these three different sections of the Book of Mormon text had three individual authors? Jockers thinks so -- and even if he is wrong in his identification of those three authors, he may be correct in concluding that the three excerpts from the Book of Mormon came from three different writers. This is a deduction that might be tested by various different computerized methodologies.

    The first step in such a determination is to examine whether the individual chapters within each of the three selections are homogeneous. Is there reason to conclude that Alma 47, 48 and 49 all came from the same pen? If so, do the contents of the three chapters resemble one another to such a degree that a common word-print can be derived by combining their texts? The same questions might be asked regarding Alma 59-60 and regarding 3Nephi 23, 26.

    Assuming for a moment that there is sufficient reason to attribute these three excerpts to three different authors, would it then be possible to derive three separate, unique word-prints from those selections? If word-printing can be successfully carried out, such that three different authorship "voices" are thus identified, what might be the results of comparing those three word-prints to the entire Book of Mormon?

    Such an experiment, carried to conclusion, would avoid the professed problems raised by Prof. Schaalje in his criticism of the 2008 Jockers paper. No matter the identity of those three derived "voices," they could be constructively compared with the remainder of the Book of Mormon, in an effort to discover more instances of their presence in the full text. Were such an experiment carried out, using NSC classification, what authorship distribution patterns might be expected in the final results? Would those charted sets of authorship attribution correspond closely with Jockers' 2008 assignments for Rigdon, Spalding and Cowdery? Or would the results provide a picture markedly different from the one obtained by charting Jockers' data?
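
    The comparison proposed above (pooling each excerpt into a "voice" profile, then scoring other chapters against those profiles) could be prototyped with Burrows' Delta, one of the two methods named in the title of the Jockers paper. The sketch below is only an illustration under stated assumptions: the word list, the sample strings and the function names are invented, and the three profiles themselves serve as the reference corpus for the z-scores, which is a simplification.

```python
from collections import Counter
from statistics import mean, pstdev

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    words = text.lower().split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocab]

def delta_scores(profiles, target, vocab):
    """Burrows-style Delta: mean absolute z-score difference per profile.

    Lower Delta means the target text is closer to that profile.
    """
    freqs = {name: rel_freqs(t, vocab) for name, t in profiles.items()}
    target_f = rel_freqs(target, vocab)
    scores = {name: 0.0 for name in profiles}
    used = 0
    for i in range(len(vocab)):
        column = [f[i] for f in freqs.values()]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # this word does not vary across the profiles
        used += 1
        z_target = (target_f[i] - mu) / sd
        for name, f in freqs.items():
            scores[name] += abs((f[i] - mu) / sd - z_target)
    return {name: s / max(used, 1) for name, s in scores.items()}

# Hypothetical function-word list and synthetic stand-in texts.
vocab = ["the", "and", "of", "that"]
profiles = {
    "voice_1": "the king and the army of the city and the people of the land",
    "voice_2": "that I say that you know that I write and that you read",
    "voice_3": "of all of this and of that and of more and of less",
}
chapter = "the captain and the men of the fort and the guards of the wall"

scores = delta_scores(profiles, chapter, vocab)
best_match = min(scores, key=scores.get)
print(scores, best_match)
```

    A real test would need a much longer word list and full chapter texts. Note also that Delta only ranks the supplied profiles against one another, so it is open to the same objection about relative scores that Schaalje raises.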

    Perhaps some initial hints of the probable results of this sort of experiment could be obtained in advance by consulting Schaalje's PCA charts. Do Alma 47, 48 and 49 cluster near one another in Schaalje's findings? Do Alma 59 and 60 plot next to each other? How about 3Nephi 23 and 26? Recall also, that Schaalje only released charts for pc1/pc2 data. If additional data sets for pc3/pc4, etc. were generated, where would the "voices" derived from the three suggested Book of Mormon excerpts plot out, in respect to the plots for Cowdery, Rigdon and Spalding? These seem to be questions that could be answered fairly easily, given a little time and expert effort. And the answers might help us determine whether Schaalje or Jockers has come closer to determining verifiable Book of Mormon authorship.


    A final word

    Readers of these comments should keep in mind that the 2008 Jockers paper identifies Solomon Spalding, Sidney Rigdon, Oliver Cowdery and Parley P. Pratt as contributors to the Book of Mormon, only in terms of relative probability. If more author-candidates were appended to Jockers' list, the new additions would likely score at least a few "hits," and thus reduce the probability percentages already obtained for the original author-candidates. If the "true writers" of the Book of Mormon have been somehow left off of Jockers' list, then of course they would score no probability percentages at all. It is also possible that no "true writers" of the book were included among Jockers' selection of 19th century writers, and that no matter what probability percentages were assigned in the 2008 paper, all those authorship figures are meaningless in the real world.
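
    The arithmetic behind this caution can be made concrete. The sketch below reuses the 80% and 10% figures from Schaalje's example quoted earlier; the raw scores, the fourth candidate "D" and the value of `candidate_mass` (the chance that the true author is on the list at all) are invented for illustration, since that last quantity is exactly what a closed candidate list cannot supply.

```python
def renormalize(scores):
    """Scale non-negative scores so they sum to 1 (relative probabilities)."""
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def absolute_probabilities(relative, candidate_mass):
    """Scale relative probabilities by the assumed chance that the true
    author is among the listed candidates."""
    return {name: p * candidate_mass for name, p in relative.items()}

# Hypothetical raw scores for one chapter and three candidates.
raw = {"A": 8.0, "B": 1.0, "C": 1.0}
relative = renormalize(raw)                # A: 80%, B: 10%, C: 10%

# Appending a fourth hypothetical candidate lowers every earlier
# percentage, while the 8-to-1 ratio between A and B is unchanged.
relative_plus = renormalize(dict(raw, D=10.0))   # A: 40%, B: 5%

# If the list as a whole had only a 1.25% chance of containing the true
# author, A's 80% would correspond to 1% and B's 10% to 1/8 of 1%.
absolute = absolute_probabilities(relative, candidate_mass=0.0125)

print(relative, relative_plus, absolute)
```

    Only the ratios between candidates survive a change in the candidate list; the percentages themselves, and the gaps between them, do not.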

    The Stanford researchers are confident that they have identified at least two or three of the true writers, based not only upon their topmost scores in the calculation of percentages for each Book of Mormon chapter, but also upon the size of the gaps registered in numerous instances where Spalding, Rigdon, Cowdery or Pratt came out as the most likely author. Say, for instance, that Rigdon is rated as having a 90% probability of having written a certain Book of Mormon chapter, and that the next nearest author-candidate for that chapter scored only a 5% probability. The 85-point gap between first and second place in the authorship attribution results would indicate the existence of some language phenomenon other than simple coincidence. The sum total of all such gaps' numerical values tells an informed statistician that something other than meaningless numerical randomness must be responsible for those sizeable separations.

    So, the Jockers team may have had the good luck to discover the "true writers" of a number of the Book of Mormon's chapters. Or -- future statistical analysts may uncover and document mistakes in the methods and conclusions associated with the 2008 paper. So far almost two years have passed without the appearance of a challenge to the Jockers team in the peer-reviewed professional literature. When this unchallenged status comes to an end, that news will be inserted into these comments.





     




    Last Revised: Aug. 30, 2010