JANE AUSTEN HAS NOT YET FOUND HER WORKS MUCH STUDIED through the processes of data-mining or “distant reading,” a relatively new investigative strategy made possible by computational analysis. Distant reading describes a process in which the raw textual data of multiple texts is fed to a computer to allow researchers to find textual patterns, such as frequencies of word usage, which an ordinary reader would not normally be able to perceive. In recent decades, distant reading has usually been employed to discover patterns within literary corpora which are larger than Austen’s six finished novels, corpora such as Irish novels or nineteenth-century census data. But now a team of literary and computational researchers at the University of Nebraska-Lincoln has created a site for public use, Austen Said: Patterns of Diction in Jane Austen’s Major Novels (austen.unl.edu), which, employing distant reading, should expand significantly what we can know about Austen’s practices of diction.1 For Austen Said, all six of Austen’s major novels have been encoded and then run through a program (TokenX) which determines unique frequencies of words.2 We distinguish each speaker by his or her speech in the novels (along with each speaker’s traits such as marital status, age, gender, and class) in ways that have found new and interesting aspects of Austen’s use of diction. Each word in the novels has been assigned to a given character or the narrator, or, as in the case of indirect speech, to a mix of characters or character and narrator. Importantly, we have found that the very act of determining exactly who speaks what where on a word-by-word basis uncovers new challenges to the interpretation and analysis of how Austen creates characters through speech.
The program we used, TokenX, works by creating frequency tables that sort for the unique utterances of a given character (or type of character, such as “female” or “male”).3 In some cases, one can make a kind of impressionistic picture of a given character through the list of his or her unique words. For instance, here are some of the words Darcy (and Darcy alone) uses in Pride and Prejudice (in these cases, twice each): accusations, carelessness, catch, consulting, defiance, existing, explaining, faithful, indirect, lessen, liberally, meanly, motives, offers, purposely, repetition, school, and support.
Words unique to the narrator of Pride and Prejudice are perhaps less surprising, as many of them set the stage or describe action, such as “followed” (25 times), “breakfast” and “listened” (20 times each), and “pause” and “seated” (15 times each).4
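The idea of a frequency table of unique utterances can be sketched in a few lines of Python. This is only a minimal illustration of the principle, not TokenX itself, whose internals are not described here; the function name `unique_word_counts` is hypothetical, and the sample sentences are invented for the demonstration and are not quotations from the novels.

```python
from collections import Counter, defaultdict
import re


def unique_word_counts(utterances):
    """Return {speaker: Counter} holding only the words no other speaker uses.

    `utterances` is an iterable of (speaker, text) pairs.
    """
    counts = defaultdict(Counter)
    for speaker, text in utterances:
        # Lowercase the text and keep alphabetic runs (straight apostrophes allowed inside).
        counts[speaker].update(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()))

    # Record which speakers use each word at least once.
    speakers_by_word = defaultdict(set)
    for speaker, counter in counts.items():
        for word in counter:
            speakers_by_word[word].add(speaker)

    # Keep a word for a speaker only if that speaker is its sole user.
    return {
        speaker: Counter(
            {w: n for w, n in counter.items() if len(speakers_by_word[w]) == 1}
        )
        for speaker, counter in counts.items()
    }


# Invented sample data (NOT text from the novels), for illustration only.
sample = [
    ("Darcy", "My motives were faithful; I offer no defiance."),
    ("Elizabeth", "Your motives, sir, are plain enough."),
    ("Narrator", "She listened, and after a pause she followed him."),
    ("Darcy", "Defiance was never my aim."),
]
unique = unique_word_counts(sample)
print(unique["Darcy"].most_common(3))
```

In this toy data “motives” drops out of both Darcy’s and Elizabeth’s tables because the two share it, which is the same restriction that governs the lists given above.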
Direct dialogue, however, only tells part of the tale. One cannot account for all the intricacies of Austen’s use of diction by simply coding direct dialogue and leaving the rest to the narrator, because the narrator is often speaking in the voice of her characters, using free indirect discourse (FID). Outside of direct dialogue, free indirect discourse is the most common, economical, and sophisticated way novels relay information about thoughts and speech.5 For instance, early in Pride and Prejudice we learn what Mr. Bingley thought of the Meryton assembly at which he meets the Bennet family: “Bingley had never met with pleasanter people or prettier girls in his life; every body had been most kind and attentive to him, there had been no formality, no stiffness, he had soon felt acquainted with all the room; and as to Miss Bennet, he could not conceive an angel more beautiful” (16).6 The narrator is speaking, but the language is really Bingley’s, including his use of clichés and hyperbole (e.g., “an angel more beautiful”). One cannot account for diction in Austen’s novels without paying attention to indirect speech, especially FID. Bingley needs to have credit for saying “an angel more beautiful,” not merely because it is his language but also because it is plainly not the language the narrator would choose for herself. Because FID always blends the speech of the narrator with that of a character, it creates opportunities for tremendous narrative flexibility, as narrative voice can vary in terms of how closely it mimics the language of characters, and a narrator can make ironic points at a character’s expense even as the narration seems to let characters speak or think for themselves.7
Austen’s employment of FID was revolutionary, for while earlier authors had used it to some degree, it remained to Austen to take advantage of the wide range of how FID could be deployed to manipulate our ironic understanding of her characters. FID’s use in the eighteenth-century European novel was rudimentary; one finds some use in Goethe, for instance, and in Frances Burney and Samuel Richardson and some other writers, but nowhere is FID sustained or complex. As many scholars have noted, it took Jane Austen, publishing six novels between 1811 and 1817, to discover and exploit the full potential of FID, including FID’s capacity to display complex ironies. As William Galperin has pointed out, Austen’s discovery of what FID could do was comparable in the history of the novel to the discovery of the atomic bomb in the history of warfare; thereafter, things were never the same, and FID became a basic feature of the novel as genre.8 The complexities of Austen’s FID create fascinating challenges to the coding interpreter; in what follows, we attempt to demonstrate some of the questions which arise when one is trying to come to a reasonable (but no more than reasonable) hermeneutic certainty about who said what in her novels.
Thus, our quest to identify who speaks what in the novels required going beyond the determination of direct dialogue alone. Because of Austen’s widespread employment of FID, there are multitudinous points at which both the narrator and a given character are speaking at once (approximately 20 to 30% of the narrator’s language is FID).9 Accounting for all of the words in Austen’s novels thus requires marking shared speech, both FID and simple indirect speech (when one character reports the speech of another). We found that this marking is finicky and tedious work: it is done literally word by word, and for accuracy we felt it was necessary to incorporate a second review of each and every choice. During this process, however, we were able to uncover many peculiarities in Austen’s narrative voice that would remain elusive even to the careful reader, peculiarities that only become evident through the act of making a choice about who is speaking every word. In some ways, coding texts as thoroughly as we have for the Austen Said project constitutes a new mode of reading, in that making the decisions about who speaks what (or what combination of characters are speaking) requires negotiating Austen’s ambiguities through a process of assumptions about characters and mind-states which Austen has herself created. Throughout, we relied on informed inference about what a character was likely to say or think, inferences themselves made possible by Austen’s precise rendering of character.
Inferring the mind-states of characters was vital in making these determinations because in FID the narrator renders not merely the point of view of a given character (focalization) but also the flavor of a character’s speech or thought. In the most direct form of FID, the narrator ventriloquizes for the character; in other words, the reader senses that by changing pronouns from third person to first person, one would have a rendering of the character’s direct speech or thought. For instance, when Colonel Brandon first calls on Elinor and Marianne in London, Elinor’s question to him in FID can readily be turned into direct dialogue: “she asked if he had been in London ever since she had seen him last” becomes (inferentially) “she asked, ‘have you been in London ever since I saw you last’” (SS 162). This simple form of FID, however, is by no means the only way in which Austen mixes the voices of her characters.
It is a significant challenge in many places in Austen’s narratives to distinguish readily among speech acts by the narrator, a character, a character focalized (either through the narrator or another character), and free indirect discourse. There are no necessary grammatical markers of FID. While some FID instances in Austen are clearly marked by introductory phrases or punctuation, many others are contained in phrases much less grammatically revealing. One of the major problems presenting itself here is the inherent subjectivity of the reader in the experience of FID in a given text; as Richard Aczel argues, “the phenomenon [that is] FID is ultimately a construction of the reader working on the basis of contextual cues” (478). Some instances of FID are clearly identifiable via punctuation—for instance, through exclamation marks, since the narrator herself never speaks with such emphasis. For example, we discover FID in the narrator’s review of Harriet’s first experience at Hartfield: “the humble, grateful, little girl went off with highly gratified feelings, delighted with the affability with which Miss Woodhouse had treated her all the evening, and actually shaken hands with her at last!” (E 25). Here the shift from the narrator’s straightforward assessment of Harriet as a “humble, grateful, little girl” to Harriet’s view of Emma’s “affability” is marked not only through the change in perspective by which Emma is referred to as “Miss Woodhouse” but also through Harriet’s naïve excitement, expressed through that last poignant exclamation mark. FID is also signaled by shifts in tense and person, such as the employment of third person in the past tense with a modality, as, for example, when Elizabeth Bennet “asked her [mother] if Charlotte Lucas had been at Longbourn” (PP 43). Yet such seemingly clear-cut examples of FID constitute merely a small fraction of potential FID instances, and others are a great deal more variable than we had previously assumed.
Throughout, we encountered the need to distinguish between focalization and FID. Focalization occurs when the narrator renders a character’s point of view, feelings, motives, or thought. All FID is a kind of focalization, but not all focalization is FID. To be FID, the speech must have something of the flavor of the exact words the character would have used in either speech or thought; FID must be a kind of ventriloquism by the narrator. Focalizations which only convey a character’s feelings are often not FID, as in this passage from Emma: “Part of [Mrs. Weston’s] meaning was to conceal some favourite thoughts of her own and Mr. Weston’s on the subject, as much as possible. There were wishes at Randalls respecting Emma’s destiny, but it was not desirable to have them suspected” (41). These sentences speak to Mrs. Weston’s motives, hoping to conceal from Emma that she wishes Emma would marry Frank Churchill, but it seems unreasonable to conclude that Mrs. Weston would have said to herself something like “part of my meaning is to conceal some favourite thoughts of my own.” During focalization, the narrating voice remains present in the text, merely invoking the channeled character’s feelings without giving us what the character was thinking or saying in his or her own words.
Most often, focalization without FID is found in narrative instances in which a character experiences strong emotional reactions or urges or has sudden realizations, such as at the crisis point of the Box Hill expedition, when the narrator tells us, “Emma could not resist” (370). Here, the reader is invited to partake in an instinctive reaction of Emma’s; in a sense, Austen lets us momentarily be Emma in a knee-jerk reaction directed at Miss Bates, a reaction which the reader likely understands all too well. However, even if Austen invites us into Emma’s head and heart, the major factor distinguishing FID remains missing: Emma’s voice qua voice does not merge with the narrator’s. In distinguishing between focalization and free indirect discourse, we therefore often depended upon a readerly judgment about the character and his or her personality (to whatever extent possible, of course), for example by asking, “Would the character have used those precise words, or have had this exact level of self-awareness, in describing this situation?” In the example above, the question would thus become “Would Emma have thought the precise words ‘I cannot resist’ in this situation prior to teasing Miss Bates?” No: instead, what is described is the flash of succumbing to temptation; the words in Emma’s mind are presumably not “I cannot resist” but some version of the clever and devastating thing she actually says, “‘Ah! ma’am, but there may be a difficulty. Pardon me—but you will be limited as to number—only three at once’” (370).
The task of distinguishing between focalization and FID is made thornier when the narrator is describing the thoughts or speech of her wittier heroines, because it is sometimes difficult to discern the narrator’s ironies from those of her characters. For instance, here is Emma self-confidently reviewing the progress of what she assumes is growing mutual love between Harriet and Mr. Elton:
as she had no hesitation in following up the assurance of [Mr. Elton’s] admiration, by agreeable hints, she was soon pretty confident of creating as much liking on Harriet’s side, as there could be any occasion for. She was quite convinced of Mr. Elton’s being in the fairest way of falling in love, if not in love already. She had no scruple with regard to him. (42)
Her rating of Harriet’s “liking” of Mr. Elton, “as much . . . as there could be any occasion for,” seems an instance of FID, as it expresses her condescending attitude towards Harriet, and the judgment that Mr. Elton is “in the fairest way of falling in love” also expresses her airy manipulations. But does she say to herself something like “I have no scruple with regard to him”? We could imagine a case for so determining, but we could also imagine a case for seeing the judgment as simple focalization. The answer relies in part on distinguishing between the ironies that the narrator and Emma would distinctly employ—and we would suggest then that “she had no scruple with regard to him” is almost beyond the coder’s determinative grasp.
Distinguishing FID for Catherine Morland in Northanger Abbey is easier, as Catherine’s mind is no match for that of Austen’s narrator. For instance, when Catherine reviews her folly in having been over-led by Gothic novels to suspect General Tilney of having murdered his wife, the narrator tells us that
Charming as were all Mrs. Radcliffe’s works, . . . it was not in them perhaps that human nature, at least in the midland counties of England, was to be looked for. Of the Alps and Pyrenees, with their pine forests and their vices, they might give a faithful delineation; and Italy, Switzerland, and the South of France, might be as fruitful in horrors as they were there represented. Catherine dared not doubt beyond her own country, and even of that, if hard pressed, would have yielded the northern and western extremities. (200)
The complex syntax and the creative pairing of “pine forests” with “vices” as well as the ironic characterization of such places as the South of France as “fruitful in horrors” show that the narrator’s voice is focalized on Catherine’s perspective, not engaging in free indirect discourse, and the final sentence confirms that the passage is simply focalized because it renders an impression of Catherine’s reasoning which includes the not-even-entirely articulated thought which would have been a thought had Catherine been “hard pressed.”10 The devastating irony of writing off Yorkshire and the West Country (“the northern and western extremities”) as possible sites for Gothic murders is not beyond Catherine’s feelings or more inchoate reasoning, but the exact expression of the idea in the narrator’s formulation is beyond her simpler mind.
Another complication we faced in this project is that free indirect discourse wavers in its intensity. Sometimes it seems as if one could simply replace the third person pronouns with first person ones, and there would be what could be inferred as the character’s direct speech, as in the earlier example of Elinor’s question to Colonel Brandon. In other cases, the FID seems less fully direct ventriloquism of a character; instead, the narrator’s speech has something of the flavor of—or only hints at—a character’s speech. To complicate the situation, the same passage of FID can change in intensity, as in the following passage which focalizes on Lydia’s perspective, with FID waxing and waning in its intensity:
Lydia was exceedingly fond of him. He was her dear Wickham on every occasion; no one was to be put in competition with him. He did every thing best in the world; and she was sure he would kill more birds on the first of September, than any body else in the country. (318)
Only the “her dear Wickham” of the first two sentences is FID, but very strong FID, as “my dear Wickham” is inferentially the exact thing Lydia goes about saying all the time, while the last sentence is also strong FID, complete with Lydia’s usual hyperbole and fixation on trivial measures of greatness (killing more birds than other people). We found scaling to be one of the more difficult parts of this project, for though FID clearly varies in intensity, ascribing it to a mathematical scale seems to offer a false scientific precision.11
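The reading of the Lydia passage just given can be written down as word-level data, which also shows how a figure such as the share of narratorial language that is FID might be computed. What follows is a minimal sketch under our own assumptions: the `Token` record, the mode labels, and the helper names are hypothetical and do not reproduce the project’s actual encoding, and no intensity scale is attempted, for the reason stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    speakers: frozenset  # everyone credited with this word
    mode: str            # hypothetical labels: "narration" or "fid"


def tag(text, speakers, mode):
    """Split `text` on whitespace and credit every word to `speakers`."""
    return [Token(word, frozenset(speakers), mode) for word in text.split()]


def fid_share(tokens, narrator="Narrator"):
    """Fraction of the narrator's words that are shared with a character as FID."""
    narrator_tokens = [t for t in tokens if narrator in t.speakers]
    if not narrator_tokens:
        return 0.0
    fid = sum(1 for t in narrator_tokens if t.mode == "fid")
    return fid / len(narrator_tokens)


# The passage, segmented according to the reading given in the text:
# only "her dear Wickham" and the final sentence are treated as FID.
tokens = (
    tag("Lydia was exceedingly fond of him. He was", ["Narrator"], "narration")
    + tag("her dear Wickham", ["Narrator", "Lydia"], "fid")
    + tag("on every occasion; no one was to be put in competition with him.",
          ["Narrator"], "narration")
    + tag("He did every thing best in the world; and she was sure he would "
          "kill more birds on the first of September, than any body else in "
          "the country.", ["Narrator", "Lydia"], "fid")
)
print(f"{fid_share(tokens):.2f}")
```

Because every word carries a set of speakers rather than a single name, the same record can hold words shared between narrator and character without forcing a choice between them.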
Even a clear differentiation between indirect and free indirect discourse is often a complex matter. Speech often travels through a variety of characters in Austen’s novels before being delivered to the reader, most often by one character ventriloquizing another’s words. In Pride and Prejudice, for example, Jane asserts that “‘they are very pleasing women when you converse with them. Miss Bingley is to live with her brother and keep his house’” (15), information that the context implies came to Jane from Miss Bingley herself. In other words, if we were to trace the origin of these words, Miss Bingley may have originally said the words “I will live with my brother and keep his house,” or she may have only indicated the substance of this remark in other words. Thus, while this speech instance certainly fulfills the requirements for the question “would the character have thought or said this” which would rule out an instance of focalization, it does not register as coming from Miss Bingley’s unique mind-state in any discernible way. In this example, the difference between indirect and free indirect discourse calls for guesswork.
We were on firmer ground in a similar instance of reportage in Pride and Prejudice when, within Mrs. Gardiner’s letter to Elizabeth, we learn that Lydia had told her aunt that “[s]he was sure they should be married some time or other” (322-23). Here, not only can we readily infer that Lydia used or thought these exact words, but the reader additionally is afforded a personal visit into Lydia’s stubborn and careless mind. Even further, the narrator here seems grammatically to introduce Lydia’s words (“she was sure that”) to prepare the reader explicitly for a narrative shift.
A narrative shift such as “She was sure that” is one we termed “FID-introductions-per-narrator.” Although such introductions are a highly reliable grammatical indicator of FID, they seem to occur in fewer than half of the cases of FID in Austen’s fiction. “FID-introductions-per-narrator” typically include phrases which indicate a given character’s contemplation of something in combination with, or with the implication of, the word “that.” This pattern could include any term synonymous with “thought” or “said” in a variation from the established tense of the narrative, including, for example, “remembered,” “recounted,” “had stated,” “had considered,” and “had questioned” (the past perfect indicating a deviation from the previously established narrative tense). In other words, much of Austen’s FID is introduced by a kind of formula: “character” “verb expressing thought or speech” “that” (e.g., “Mr. Gardiner added in his letter, that they might expect to see their father at home on the following day” [PP 298]).12
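The formula of character, verb of thought or speech, and “that” can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not a tool we used: the verb list is a small sample drawn from the examples above, the function name `find_introductions` and the `max_gap` parameter are hypothetical, and pronoun subjects or an implied “that” are not handled. At best such a pattern flags candidates for a human coder.

```python
import re

# Illustrative verb list taken from the examples in the text; not exhaustive.
REPORTING_VERBS = [
    "thought", "said", "added", "remembered", "recounted",
    "had stated", "had considered", "had questioned", "was sure",
]


def find_introductions(sentence, characters, max_gap=40):
    """Return (character, verb) pairs matching: character + verb + ... + 'that'.

    `max_gap` caps how many characters may separate the verb from 'that';
    the gap may not cross sentence-ending or clause-ending punctuation.
    """
    names = "|".join(re.escape(name) for name in characters)
    verbs = "|".join(re.escape(verb) for verb in REPORTING_VERBS)
    pattern = re.compile(
        rf"\b(?P<who>{names})\s+(?P<verb>{verbs})\b"
        rf"[^.;!?]{{0,{max_gap}}}?\bthat\b"
    )
    return [(m.group("who"), m.group("verb")) for m in pattern.finditer(sentence)]


example = ("Mr. Gardiner added in his letter, that they might expect to see "
           "their father at home on the following day")
print(find_introductions(example, ["Mr. Gardiner", "Lydia"]))
```

A pattern of this kind is deliberately narrow: it prefers missing an introduction to reporting a false one, since every hit still has to be judged in context.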
In addition to these markers, any term indicating a passive transference of information from one character to another (or from the character to the narrator) that is subsequently transmitted to the reader also alerts the reader to a potential incidence of FID: for example, phrases such as “was told,” “had found out,” or “had always thought.” In these cases of passive transmission, FID becomes doubled—the narrator speaks through a character using another character’s voice—and sometimes even tripled, situations which only intensify the quandaries for the digital coder. Even further, we found that these grammatical indications for FID (these “introductions”) are often removed from FID occurrences by as much as several paragraphs and can even occur after the FID (e.g., “such were the thoughts of the character”). And, as noted before, many instances of FID came with no introduction at all.
FID without an introduction requires careful attention to context and to one’s knowledge of the character, because Austen readily shifts between focalization and FID, often without any grammatical marker of the change. One example of such shifts occurs in the following passage from Persuasion where they are palpable for the reader only in context (here Anne has just read Wentworth’s letter):
The absolute necessity of seeming like herself produced then an immediate struggle; but after a while she could do no more. She began not to understand a word they said, and was obliged to plead indisposition and excuse herself. They could then see that she looked very ill—were shocked and concerned—and would not stir without her for the world. This was dreadful! Would they only have gone away, and left her in the quiet possession of that room, it would have been her cure; but to have them all standing or waiting around her was distracting. (238)
The phrase “would not stir without her for the world” is FID because it expresses the clichés of concern uttered by Mrs. Musgrove and the other women in the room, while “[t]his was dreadful!” seems to be Anne’s FID, though it could be simple focalization. But arguably the phrase “would they only have gone away” is FID, while “to have them all standing or waiting around her was distracting” returns to focalization (Anne is distracted but it does not seem likely that she says to herself something like “to have them all standing or waiting around me is distracting”).13
Another new complexity revealed by our project concerns the personography, the list of characters who speak. Such a personography of Austen’s novels would be very simple if we were limited only to speaking characters and the narrator. But once one turns one’s attention to indirect speech and FID, the personography widens considerably, and the variants need to be recorded. Sometimes living characters cite the dead, as when Darcy writes to Elizabeth about his father’s expressed wishes in his will regarding Wickham. Sometimes a group of people render judgment through the narrator in FID, as when Meryton society seems to speak: “The Bennets were speedily pronounced to be the luckiest family in the world, though only a few weeks before, when Lydia had first run away, they had been generally proved to be marked out for misfortune” (350) (again, hyperbole such as “the luckiest family in the world” marks speech which does not belong to the narrator). Sometimes characters speak in chorus in indirect speech, as when Jane and Elizabeth on several occasions speak the exact same thing at the same time. There are also some occasions in which direct speech of one character is boxed within direct speech of another, as when Darcy quotes Elizabeth’s past speech back to her at a later point in the novel: “‘Your reproof, so well applied, I shall never forget: ‘had you behaved in a more gentleman-like manner.’ Those were your words’” (367).
Other FID complications include moments in which FID is rendered in direct speech (somewhat confusingly for the modern reader), as in the second chapter of Mansfield Park when Edmund Bertram has his first recorded conversation with Fanny:
“William did not like she should come away—he had told her he should miss her very much indeed.” “But William will write to you, I dare say.” “Yes, he had promised he would, but he had told her to write first.” “And when shall you do it?” She hung her head and answered, hesitatingly, “she did not know; she had not any paper.” (16)
Sure of himself, Edmund speaks in the first person, but Fanny’s direct speech is so hesitant that Austen renders her speech in third person, in FID. Fanny switches to direct speech a few lines later, under the pressure of true alarm, as she exclaims, “‘My uncle!’” (16).
Or FID can arise within FID, as in this example from Sense and Sensibility, which closes the second volume, in which Sir John reports to the Dashwood sisters about the reception the Steele sisters have enjoyed while staying with Fanny Dashwood:
Sir John, who called on them more than once, brought home such accounts of the favour they were in, as must be universally striking. Mrs. Dashwood had never been so much pleased with any young women in her life, as she was with them; had given each of them a needle book, made by some emigrant; called Lucy by her christian name; and did not know whether she should ever be able to part with them. (254)
Here Sir John’s FID speech includes the FID report of the Steeles’ reception either from Fanny Dashwood or Lucy; either Fanny (or Lucy reporting her words) must have said something like “I have never been so pleased with any young women in my life, as I am with them” and “I do not know whether I shall ever be able to part with them,” for this particular kind of feminine gush is not within the pattern of Sir John’s speech.14 The reference to “some emigrant,” brusque and dismissive, argues for the FID as issuing from Lucy, because the source of the gift does not matter to Lucy, even though the emigrant is probably an impoverished woman who had escaped revolutionary France; rather, what matters to Lucy is the mark of regard the gift comprises.
Or, direct speech can arise within FID, as when Elizabeth muses on Darcy’s crimes just before his first proposal. Here the narrator renders Elizabeth’s thought in FID while incorporating within that FID the direct speech of Colonel Fitzwilliam:
He had ruined for a while every hope of happiness for the most affectionate, generous heart in the world; and no one could say how lasting an evil he might have inflicted.
“There were some very strong objections against the lady,” were Colonel Fitzwilliam’s words, and these strong objections probably were, her having one uncle who was a country attorney, and another who was in business in London. (186)
Obviously, the statement about “strong objections” directly quotes Colonel Fitzwilliam, as it is even contained in the appropriate quotation marks and rendered in the first person as direct speech. Colonel Fitzwilliam is not present in this scene, however; instead, Elizabeth merely remembers his words verbatim. Simultaneously, we experience the scene from within Elizabeth’s mind for it is clearly Elizabeth whose state of confusion is rendered and who remembers those words; we find ourselves in a clear case of free indirect discourse. From a narratological standpoint, then, whose words are these? Although originally spoken by Colonel Fitzwilliam, can a different character gain “ownership” of a speech utterance enough to divorce both the original speaker and the implied narrator from those words, as Elizabeth seems to do here? Again, the very act of trying to make judgments fit for the distinctions required by coding has not only exposed such problematic considerations as these but has also led us to think about Austen’s narrative practices in a different light.
Beyond these complications lie others, as our speaker-based coding of the novels reveals many complexities in regard to the transmission of speech (or knowledge) within the novels’ larger contexts. For example, in Austen’s novels we deal repeatedly with letters, through which knowledge is passed not only to the reader, but also from one character to another. In The Technology of the Novel, Tony E. Jackson proposes that letters in Austen connote not only the relative permanence of the written word even within her own writing but, more importantly, the inherent opportunity of rereading, since “a rereading will always, to one degree or another, reveal some change in the reader” (38). In one example, he focuses on Elizabeth’s rereading of her sister Jane’s letters, through which Austen indicates the change that has taken place in Elizabeth’s consciousness (and conscience), and argues that Austen inserts the rereading scene into her narrative “like an engineer who senses that a user of her invented product will simply enjoy that product without any awareness of its actual complexity” (40).
But what happens if the transmission of information in letters becomes even more complex, as happens later in the novel? In chapter ten of the third volume, the reader learns the circumstances surrounding Lydia’s elopement with Wickham through Elizabeth’s perusal of a letter written by her aunt, Mrs. Gardiner. In this letter, Mrs. Gardiner relates to Elizabeth (through a second level of transmission if we consider Elizabeth’s reading as the first transmission to the reader) what she has learned from her husband (third level) as a result of Mr. Gardiner’s conversation with Mr. Darcy (fourth level), who in turn has found out from several others (including a “Mrs. Younge” who seems aptly named as the kind of person who harbors young fugitives) what had happened to Lydia and Wickham (fifth level). The task of conclusively coding for the (original) speaker forces us to consider all five levels of information transmission and to mark this speech/writing act as occurring through such channels as Mrs.Younge-told-Darcy-told-Mr.Gardiner-told-Mrs.Gardiner-told-Elizabeth-told-reader. Complexities such as these also remind us to consider the narratological implications of a passage that is otherwise easily “read over” in terms of its implied speakers.
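A chain of transmission of this kind can be modelled as an ordered list of relays from teller to hearer. The following is a minimal sketch under our own assumptions (the `Relay` record and the helper names are hypothetical, not part of the project’s encoding); it simply shows that the five levels described above correspond to five teller-hearer links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    teller: str
    hearer: str


def build_chain(names):
    """Turn an ordered list of names, original source first, into relays."""
    return [Relay(a, b) for a, b in zip(names, names[1:])]


def label(chain):
    """Render relays in the 'A-told-B-told-C' style used in the text."""
    if not chain:
        return ""
    return "-told-".join([chain[0].teller] + [r.hearer for r in chain])


names = ["Mrs. Younge", "Darcy", "Mr. Gardiner", "Mrs. Gardiner",
         "Elizabeth", "reader"]
chain = build_chain(names)
print(len(chain))    # number of levels of transmission
print(label(chain))
```

Storing the links separately, rather than only the final label, would let a coder attach a different mode of speech (direct, indirect, or FID) to each step of the relay.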
Another interesting consideration in the context of knowledge and language transmission concerns clear falsehoods. For instance, in Northanger Abbey, John Thorpe tries to convince Catherine to join him on a ride and to abandon her previous commitment to take a walk with Henry and Eleanor Tilney. Thorpe claims to have seen them enter a carriage earlier and to have overheard Henry Tilney exclaim to a passerby “‘that they were going as far as Wick Rocks’” (86). As Catherine (and the reader) finds out soon thereafter, this is a blatant and, as far as can be said of a fictional character, an intentional lie. But, at that specific moment of Thorpe’s utterance, both Catherine and the reader are led to believe that Henry actually had said those words. For the purpose of tracing the origin of the words by character, then, should these be considered Henry’s words, or are we to employ retrospective knowledge and attribute them (correctly) to Thorpe? Our solution was to code them as indirect speech but not as FID, as Thorpe is quoting an imaginary Henry Tilney, not the real one; this solution may show that sophistry sometimes entered our process!
The Austen Said project required us to make attributions of speech instances to specific characters as correctly as possible. We were thus forced to track language continually while reading: who speaks at what time? who repeats someone else’s words? is an awareness spoken out loud within the novel or is it just a thought of the character? is the narrator merely focalizing a character, or are these the words which that character would actually have phrased in this way? These kinds of questions are certainly not considerations which the casual reader of Austen’s novels has to make; in fact, the reader does not notice these complexities, generally, just as we do not pay much attention to occasions of indirect speech or FID in our own lives—they seem natural, as water seems to a fish. And many of the intricacies and dilemmas we have discovered do not matter much even to the closer examination of the narratives—it probably does not make a difference to most scholars, for instance, which “one of the girls,” either Louisa or Henrietta, whispers to Captain Wentworth the information that Mrs. Musgrove is “‘thinking of poor Richard’” (67) in chapter 8 of Persuasion.15 Nonetheless, the Austen Said project, and the different mode of reading we were forced into for its realization, clearly reveal new knowledge and prompt new critical questions in terms of Austen’s narrative craft.
1Austen Said: Patterns of Diction in Jane Austen’s Major Novels (austen.unl.edu) is a team project that has flourished under the sponsorship of the Center for Digital Research in the Humanities [CDRH] at the University of Nebraska-Lincoln, in part through a Digital Humanities Fellowship award to Laura White from 2012 to 2014. The team includes Laura White (John E. Weaver Professor of English); Brian Pytlik Zillig (Professor of Libraries, Digital Initiatives Librarian at the CDRH, and creator of TokenX); Laura Weakly (Metadata Encoding Specialist, CDRH); and Karin Dalziel (Digital Design and Development Specialist, CDRH). Stephen Ramsay (Susan J. Rosowski Associate Professor of English, Fellow of the CDRH, and major developer of the MONK [Metadata Offer New Knowledge] Project) provided important early algorithmic analysis of the Pride and Prejudice data, and we have also consulted with Matthew Jockers (Associate Professor of English and Fellow, CDRH) about algorithmic comparisons between the Austen data and larger data sets of eighteenth- and nineteenth-century novels. David Moberly, then a master’s student in the English Department, was the project’s research assistant in spring 2011; Carmen Smith, a Ph.D. student in English, began in that position in spring 2012 and was, with Laura White, the main coder of all the original texts.
2The texts were coded according to the Text Encoding Initiative [TEI] guidelines and run through TokenX, a program that provides an easy-to-use interface for text analysis (especially through frequency tables) and visualization.
3Austen’s texts have been studied through frequency tables before. Burrows’s 1987 study tracks the frequency of very common words, such as “of” and “the,” to find surprising correlations between characters, and even the development of character through narrative time. His results are most striking with Fanny Price, perhaps unsurprisingly, as Fanny is the only heroine whom we see speaking in childhood.
4Users may wish to generate word clouds from the frequency tables of unique utterances to create a rendering of a given character, though they must keep in mind that the tables track only words unique to a single speaker, not words shared with any other character or the narrator.
5For a bibliography of scholarly discussions on free indirect discourse and the larger problem of ascribing mental states and utterances to characters, see the “Background—Further Reading” section of Austen Said.
6Citations throughout this essay are to the Chapman editions of the novels. The texts used in Austen Said were created by comparing two previously digitized, open-source editions—one from the University of Adelaide, the other from Project Gutenberg. When differences occurred that were not obvious Optical Character Recognition errors, a third source, noted in the metadata for each text, was consulted.
7As the International Society for the Study of Narrative defines FID,
Free indirect speech [or] free indirect discourse involves both a character’s speech and the narrator’s comments or presentation. Famously utilized by James Joyce, free indirect discourse is a more comprehensive method of representation—one which many times makes indistinguishable the thoughts of the narrator and the thoughts of a character. Thus, the method typically privileges the past tense, yet cannot be discerned through merely grammatical indicators. (“Free Indirect Discourse”)
We would amend the definition provided by the august society by pointing out that FID was even more famously utilized by Jane Austen.
8Personal communication, New Brunswick, New Jersey, March 2006.
9The “Novel Visualizations” section of the site renders a color-coded view of these different kinds of diction in the novel.
10It is possible that the zeugma which links “pine forests” with “vices,” as equally inhabiting the South of France, could be accounted as a shade of FID if one takes the pairing as one made by Catherine’s artless mind—she thinks of forests and vices—but the zeugma itself reflects the narrator’s ironic skill, which Catherine lacks.
11We encourage readers to look at “Novel Visualizations” for a visual record of the scaling, where more intense colors represent more intense FID. This section of the site allows the user to see patterns of indirect speech and FID, including the intensity of scale of FID, at a glance. You can scroll through the pages of a given novel, noting where indirect speech is most predominant and where it ebbs. You can also see exactly how we coded individual passages, and if you disagree with our interpretation of a given passage, we would be glad to hear your case for a different one.
12These introductions, where they occur, have been coded throughout and can be seen in the “Novel Visualizations” part of the site.
13Such complexities allow a potentially hostile view which would argue that coding for instances of FID in a given text necessarily detracts from the objectivity of the project as a whole and depends entirely on the subjectivity of the editor(s). It is certainly true that subjectivity entered into our determinations, but it was a subjectivity trained by Austen’s own narrative clues about what a given character would indeed say beyond what is recorded in the novel as his or her direct speech.
14I thank Susan Allen Ford for the suggestion that the FID here may be of Lucy’s speech. Knowing Lucy, the reader may even suspect that, if this is a rendering of Lucy’s speech, Lucy may be lying or stretching the truth in her report of Fanny Dashwood’s speech. It would not be beyond her. On the other hand, there is substantial evidence that Fanny Dashwood does indeed favor the Steele sisters (partly to make her sisters-in-law feel bad), and so the FID could come from Fanny herself, in which case “some emigrant” becomes the thoughtless characterization of Sir John.
15As it happens, we coded this speech as Louisa’s, for Louisa is in the ascendency with Captain Wentworth at that point and is in general the more forward of the two sisters.