Text analysis is described as “the use of computers as an aide in the interpretation of electronic texts” (Sinclair and Rockwell 242), and is often used as an aid to understanding document collections that are too extensive for close reading (Huijnen et al. 72). However, a study by Gibbs and Owens, using a sampling of historians, found that the respondents had little interest in using digital tools as a method for historical analysis.[1] Indeed, Huijnen et al. suggest that “historians have only just begun to explore what it means doing history from the perspective of both humanities and computer sciences” (72). There may be various reasons why historians have not opted to use text analysis tools on a larger scale, and these are discussed elsewhere by Robertson. Nevertheless, Huijnen et al. believe that text analysis tools are useful for historical research as they often “trigger historians, to draw their attention to potentially interesting cases to explore” (83). In order to investigate this further, this blog post looks at Voyant Tools to examine text analysis as a complementary tool for qualitative historical research.

Voyant is a free, online text analysis program which provides good support documentation and is compatible with a wide range of document formats, including plain text, HTML, XML, PDF, RTF, and MS Word (Sinclair and Rockwell 259). The tool allows users to interact with the text through a word cloud of the most frequent words and through Keyword-in-Context (KWIC) displays, where a word can be seen in the context of its sentence. It also generates graphs of word frequency within a single text and across multiple texts. While the interface was designed to provide “a low technical bar of entry” for humanists, the tools also provide “more advanced operations” (Sinclair and Rockwell 259). For example, alternative visualisations of word frequencies and trends are available through other tools in the Voyant environment, such as Bubblelines and Knots.



Voyant offers three options for uploading text as shown above. If the tool is being used to compare online documents through the use of URLs, users need to be aware that any supplementary or introductory information on a page containing a text transcript will also be analysed and will affect the results. Thus, the preparation of files to be analysed is the first step to a successful experience with this tool. Once the text is prepared and uploaded to the ‘Add Text’ box, a series of windows shows a breakdown of the text, and the gear icon in any window can be clicked to apply a filter for ‘stopwords’.
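For readers curious about what a stopword filter does to a frequency list, the idea can be sketched in a few lines of Python. This is only a minimal illustration and not Voyant's own implementation: the stopword list, the function name `top_words`, and the sample sentence are all invented for the example, and Voyant's real stopword lists are far longer.

```python
import re
from collections import Counter

# Illustrative stopword list only (an assumption for this sketch);
# it is not the list that Voyant applies.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "we", "our"}

def top_words(text, stopwords=STOPWORDS, n=5):
    """Return the n most frequent words in text, ignoring stopwords."""
    # Lower-case the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Invented placeholder sentence, not taken from any real transcript.
sample = "The balance of the budget and the balance of power is the aim of the nation."
print(top_words(sample, n=3))
```

Without the filter, ‘the’ and ‘of’ would head the list; with it, the content words come to the surface, which is roughly what one sees in a word cloud once stopwords are applied.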



One of the advantages of this tool is its ability to export results from the main environment, through “live tool widgets” which can then be embedded in blogs and websites (Sinclair and Rockwell 259). This adds support to an argument, as other researchers may corroborate the findings for themselves.




While historians tend to read primary texts closely, they may not always see word patterns which may add further meaning. By way of example, I uploaded US President Eisenhower’s 1961 farewell address to the nation to Voyant. The speech itself is considered to contain “multiple messages and nuances” (Ledbetter 2-3); however, its prominence lies in the warning to the American people, encapsulated in the sentence: “In the councils of government, we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military-industrial complex.” Whilst I have read this speech many times, I did not notice the word frequency in one particular paragraph until I used Voyant. To demonstrate this, I easily exported a Voyant skin to embed below.



The Corpus Summary reveals that ‘balance’ is the most frequently occurring word, with nine appearances in the entire text. Clicking on the word ‘balance’ in either the Cirrus Word Cloud or the Corpus Summary will open a window on the right showing a word frequency graph. Subsequently, clicking on the highest peak in the graph will highlight the word in the Corpus Reader window to show where the word is mentioned most in the text. In this case, the word ‘balance’ is mentioned seven times in one paragraph. The tool is not designed to explain why this happens, but it does prompt further investigation, though not in the context of the “military-industrial complex”. Rather, I now become interested in this paragraph in terms of Eisenhower’s political philosophy, and consider whether the speech might also be aimed at members of the Republican Party who merely pay lip service to republicanism. Of course, this warrants further research, but for me it has generated a line of inquiry which I had not considered hitherto.[2] Thus, in this instance, I agree with Huijnen et al. that text analysis tools have the potential to “trigger historians” towards alternative avenues of investigation.
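The kind of pattern described here, a word clustering in one paragraph, can also be checked outside Voyant. The following is a rough sketch under my own assumptions: it treats blank lines as paragraph breaks, the function names are hypothetical, and the sample text is invented rather than quoted from the speech.

```python
import re

def paragraph_counts(text, word):
    """Count whole-word, case-insensitive matches of word in each paragraph."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [len(pattern.findall(p)) for p in paragraphs]

def peak_paragraph(text, word):
    """Return (paragraph index, count) for the paragraph with the most matches."""
    counts = paragraph_counts(text, word)
    peak = max(range(len(counts)), key=counts.__getitem__)
    return peak, counts[peak]

# Invented placeholder text, not a quotation from the address.
sample = (
    "We seek peace.\n\n"
    "Balance between cost and need; balance between today and tomorrow.\n\n"
    "Good judgment seeks balance."
)
print(paragraph_counts(sample, "balance"))
print(peak_paragraph(sample, "balance"))
```

As with Voyant, a count of this kind only points to where a word clusters; the interpretation of why it clusters there remains the historian's task. Note that `peak_paragraph` assumes the text contains at least one paragraph.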

In using the tool to compare multiple documents, the preparation of files is most important, as Voyant does not handle text vs. date, or text vs. geography. In order to compare documents geographically, a user needs to divide a corpus into geographical areas; and in comparing texts over time, the files need to be prepared in chronological order. This is reflected in the digital history studies by Baker at the British Library, Anderson at Rice University, and the Emory Library Project, though their results seem to justify the time spent in the preparation of files. However, there is a scarcity of literature which speculates on the impact of Voyant on the historical community; thus, it is hard to assess whether potential results would justify the time and human resources needed to compartmentalise larger document collections for the purpose of text analysis through Voyant.
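To illustrate why the order of files matters, here is a small sketch of a trend across documents. It is an assumption-laden example rather than a description of how Voyant works: the documents are keyed by a year that the user supplies, the function names are hypothetical, and the three short texts are invented placeholders.

```python
import re

def relative_frequency(text, word):
    """Share of tokens in text that equal word (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)

def trend(documents, word):
    """Return (label, relative frequency) pairs in sorted label order.

    documents maps a sortable label (here, a year) to a text. The order
    comes from the labels, so it has to be supplied by the user.
    """
    return [(label, relative_frequency(documents[label], word))
            for label in sorted(documents)]

# Invented placeholder texts, deliberately entered out of order.
documents = {
    1980: "soviet troops and soviet policy",
    1978: "energy policy and the economy",
    1979: "the soviet union and energy",
}
for year, freq in trend(documents, "soviet"):
    print(year, round(freq, 2))
```

Relative rather than raw frequency is used so that a long document does not appear to dominate simply because it contains more words.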

To demonstrate a simple example of uploading texts in chronological order, I copied the URLs from the Jimmy Carter Presidential Library and Museum for the State of the Union speeches given by US President Jimmy Carter from 1978 to 1981 and entered them in order on separate lines. In the Cirrus Word Cloud, I clicked on the word ‘Soviet’, which produced the following graph; I then exported it.



As one can see, there is a very dramatic peak in the frequency of the word ‘Soviet’ in 1980. This coincides with what I already knew, as I remember from my childhood the controversy surrounding the staging of the Olympic Games in Moscow that year. Nonetheless, for younger students, Voyant reveals that the Soviet Union was a hot topic for President Carter that year. Conveniently, clicking on the highest peak in this graph will bring a reader back to the data in the Voyant environment, where the reader can examine for themselves each sentence in which the word occurs, and so find out that the USSR invaded Afghanistan, an event that infuriated President Carter.[3] Thus, although the graph does not explain the reason for the peak in word frequency, Voyant nonetheless enables users to return to the data and investigate further for themselves, which is also beneficial as a teaching aid.
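Returning to each occurrence of a word in its context is essentially a Keyword-in-Context listing. A minimal sketch of the idea follows; it is my own simplified illustration and not Voyant's code, with a hypothetical function name, a window measured in words rather than sentences, and an invented sample text.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence.

    width is the number of words shown on either side of the keyword.
    """
    # Keep hyphenated and apostrophised words together as single tokens.
    tokens = re.findall(r"\w+(?:[-']\w+)*", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Invented placeholder text, not a quotation from any speech.
sample = "The Soviet invasion changed policy. Relations with the Soviet Union worsened."
for left, word, right in kwic(sample, "soviet", width=2):
    print(f"{left:>12} | {word} | {right}")
```

Reading down the middle column with the context on either side is what allows a student to move from a peak in a graph back to the passages that produced it.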

In conclusion, Voyant has the potential to reveal new areas of inquiry through word frequency patterns and is designed to be user-friendly for humanists. However, the preparation of files is important to achieve adequate results, as the tool does not handle text vs. date, or text vs. geography, and it takes time to put a large corpus of text in order. The tool provides for the exportation of results, and allows others to return to the original data to corroborate findings, which is a significant advantage in terms of methodological transparency. Overall, I found the tool easy to use, with some interesting results, and would certainly use it again.



[1] From a sampling of 213 historians, mostly from Western Europe and North America, a study by Gibbs and Owens suggests that “finding references and information is a much higher priority than using tools to analyze primary sources”. Moreover, while respondents applauded the growing availability of primary sources online, there was “little comment about a need for, or interest in, any specific tools to help make use of these archives in novel ways.” Thus, Gibbs and Owens surmise that “the uses of digital tools among our respondents are of the most general kind: Google searches and the use of digitized primary and secondary sources” (italics in original).

[2] Sinclair and Rockwell suggest that “using computers to perform formal operations on texts does not require humanists to approach texts from a positivistic perspective: we can ask formal questions of texts in service of speculative or hermeneutic objectives” (255-256). This type of approach is expressed as “algorithmic criticism” by Stephen Ramsay who suggests “‘one would not ask how the ends of interpretation were or were not justified by means of the algorithms imposed, but rather, how successful the algorithms were in provoking thought and allowing insight’” (qtd. in Sinclair and Rockwell 256).

[3] Indeed, Jack Matlock, Director of Soviet Affairs in the State Department at the time, later wrote: “President Jimmy Carter reacted to the Soviet invasion of Afghanistan with a fury that at times failed to take into account the ultimate effect of his actions. He prohibited or severely limited most commercial ties with the Soviet Union. He appealed to athletes throughout the world to boycott the summer Olympic Games scheduled for Moscow in 1980.”

