A Statistical Analysis of the narrators of ‘Ulysses’ or ‘why ‘Ulysses’ isn’t wisdom literature’

The second time I read Ulysses, in advance of an undergraduate seminar, it was around the ninetieth anniversary of the original text’s publication. The newspapers were printing archive material relating to the novel, extended supplements about its importance from the usual quarters, as well as reviews of recently published monographs from both young and established scholars. Unfortunately, the critical trend of the time was to read Ulysses as wisdom literature. Critics urged prospective readers of the novel to wrest Joyce from the scholars and bring him ‘back to the people’. This school of thought treated Leopold Bloom as a model of the way in which the contemporary urban subject should be living: aloof, polite, well-intentioned but not dogmatic on political issues; moderately informed, but more often wrong; a reader, but not self-serious; an everyman. Ulysses’ structural indebtedness to cornerstones of The Canon such as William Shakespeare’s Hamlet and Homer’s The Odyssey frequently undergirds this line of argument, demonstrative in itself of how easily high literary art and everyday life may be set next to one another. This generally requires critics to treat the characters of Bloom and Stephen Dedalus as two opposites in need of the other. Each has a little to impart on life, love and literature, whether it be to reflect a little deeper on themselves or their marriage, move past their respective losses or to find in each other their lost son/father.

This interpretation of the novel reads it along a linear trajectory, as Stephen and Bloom come together to form Blephen and Stoom. Through computation it may be possible to examine the writing style of later chapters, and determine whether or not they bear formal witness to this change in character. We must first, however, consider the difficulty of locating where Joyce’s narrators actually are. Part of what makes Joyce’s writing style so distinctive is his use of free indirect discourse, a mode of writing in which the reality of the text is inflected by the consciousness(es) of the beholder(s). As such, putting a category on each episode of Ulysses as though it were narrated by one person or a combination of persons might seem reductive; it very much is. But in fusing computation and literature, certain assumptions have to be made.

In carrying out this analysis, I made use of R’s ‘Stylo’ package, which contains tools for breaking a number of texts into equal sizes, removing words which are not common to most samples, calculating the relative frequencies of these words, transforming these observations into new combinations of variables called ‘components’ with greater explanatory potential, and clustering them together. These words appear below:

These might seem like boring terms. As literary critics we tend to look past them to more evocative ones like ‘serpentine’ or ‘columbanus’, but unfortunately, in computational terms, it is the relative frequencies of these ‘particles’ or ‘function words’ that provide the most secure means of modelling a writer’s particular idiom. These samples were then plotted on a correlation matrix, which can be taken as an index of similarity, based on where they cluster:

The six different narrators of Ulysses appearing in the index above are:

‘Anon’, who narrates the episode ‘Cyclops’

‘Blephen’, a composite delineation for episodes in which both characters feature, such as ‘Circe’, ‘Eumaeus’, ‘Ithaca’ and ‘Oxen of the Sun’

Bloom, who narrates ‘Hades’, ‘Calypso’, ‘Lestrygonians’ and ‘The Lotus Eaters’

Gerty, who narrates at least half of ‘Nausicaa’ (this is a controversial point within the literature; it might be Bloom who is narrating for her)

Molly, who narrates the book’s final chapter ‘Penelope’,

and finally Stephen, who narrates the first three episodes ‘Telemachus’, ‘Nestor’ and ‘Proteus’, as well as the novel A Portrait of the Artist as a Young Man, which has been thrown in here for comparison.
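To make the preprocessing described earlier a little more concrete, here is a minimal Python sketch of the first few steps: cutting a text into equal-sized samples, discarding words that are not common to the samples, computing relative frequencies, and correlating two samples. This is not Stylo's implementation, which runs in R and does considerably more. Every function name and parameter below (`split_into_samples`, `culled_vocabulary`, `min_share` and so on) is hypothetical and chosen only for illustration.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_samples(tokens, size):
    """Cut a token list into consecutive samples of exactly `size` tokens,
    discarding any shorter remainder at the end."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def culled_vocabulary(samples, min_share=1.0):
    """Keep only the words that occur in at least `min_share` of the samples."""
    presence = Counter()
    for sample in samples:
        presence.update(set(sample))
    needed = min_share * len(samples)
    return sorted(word for word, n in presence.items() if n >= needed)


def relative_frequencies(sample, vocabulary):
    """Frequency of each vocabulary word as a share of the sample's tokens."""
    counts = Counter(sample)
    return [counts[word] / len(sample) for word in vocabulary]


def pearson(xs, ys):
    """Pearson correlation between two equally long frequency vectors."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)
```

Run over real samples, the frequency vectors from `relative_frequencies` are what a correlation matrix or a principal components analysis would then be computed from; the sketch stops short of the components and clustering steps.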

Here’s the same plot as above with the labels more clearly indicated

The first thing we could note is the gender divide. Molly and Gerty both spread over to the right, with Molly as an outlier. Both are more proximate to the A Portrait samples than any other, which are all taken from the earlier parts of the novel, suggesting that Joyce writes women and young children using the same words at much the same rates. As the Gerty samples move through the episode, they move closer and closer to the Bloom cluster, visually confirming that the episode starts in Gerty’s voice before he takes over, and that Bloom doesn’t think much of women’s intelligence in the main either.

Overall we can say that there doesn’t look to be a fusing of perspectives here as such. Rather than the Blephen episodes meeting halfway between Stephen and Bloom, Stephen and Bloom already seem quite comfortably clustered at the novel’s outset. Based on the divide between Stephen’s episodes of Ulysses and A Portrait, we might say that the way in which Stephen narrates A Portrait is very different from the way in which he narrates Ulysses. This is justified, I think, by how sensitive the analysis is to changes in narrator, demonstrated by the Gerty/Bloom example already discussed, as well as the fact that the earlier part of ‘Aeolus’, in which Bloom is present, clusters with his samples, whereas the second part, after Stephen has entered, clusters with the Stephen samples.

Below is the plot with the Portrait samples removed:

Words Stephen’s narration is most likely to use in comparison to Bloom
Words Bloom’s narration is more likely to use in comparison to Stephen

 

There are a number of ways one could use these results to interrogate the notion of Ulysses as wisdom literature. We could begin by asking after the gendered aspects of the adjective ‘wise’, and ask why so many of these books which teach us how one might best live are written by men (and how tone-deaf this argument can sound because to read Ulysses one might almost think married women weren’t let out of the house) or we could ask what interests an Irish model of bourgeois respectability might serve, along the lines of an Irish ‘keep calm and carry on’ poster.

Ulysses as a guide to life risks rendering it a novel of parts coming together, the middle-class intellectual and the middle-class working stiff holding hands across whatever barricade is supposed to be dividing them. Not that I would go to the other extreme and frame it as one of dissolution. Ulysses’ shape is one I would be loath to put a vector to in fact; to say that Stephen and Bloom’s relationship moves from state a) to state b) would be too easy by half.

What makes Ulysses an interesting novel to me is its self-referentiality, the dialogue it establishes between the novel and its supposed referent of ‘real Dublin’, which is made most clear in ‘Circe’, but also in the book’s other failed attempts to understand itself, as in the cases of the characters referenced as being in particular places at particular times who may or may not be Bloom, the McIntosh mystery or the puzzle of crossing Dublin without passing a pub. In this context, I think ‘Eumaeus’ appearing as a stylistic outlier is significant.

It is in this episode that we get information about a sequence of coincidences, and resonant differences, between Bloom and Stephen’s lives. The depth of these coincidences (which I won’t provide a summary of here, because I think they’re among the most poignant parts of the novel) gestures towards something a bit more cosmically ordered than the rest of the novel, even as they take place within the circumscribed rituals of Irish urban middle-class life in the early twentieth century. ‘Eumaeus’ is written in a chill tone which most closely resembles that of a scientific paper, eliding the indirect discourse which ostensibly defines the rest of the text, and it is in the fact that these connections are raised here rather than anywhere else that the true interest in their relationship, such as it is, is to be found.

These connections, which remain unrealised by the two, should, rather than bring us to some Forsterian notion of connection, raise questions of alienation and of their unity in separation. They present problems both epistemological and political, about how our reality is structured, the means through which it is circumscribed and how it is defined more by how little of it we are aware of than by how much. Rather than teaching us ‘how to live’, Ulysses shows us how we do not live, how we probably won’t live and how it could so easily have been otherwise. It is no more an explanation for life than it is an explanation of itself, or Homer, or Ireland.

Literary Style and the dialectic

The notion of literary style is a fraught matter for critics. This is not just since the cultural and textualist ‘turn’ of the sixties and seventies, when post-structuralist methodologies became commonplace in university departments. Rather, the origin of style brings us to the origin of the individual and it is for this reason that Fredric Jameson believes ‘style’ to be a bourgeois concept. In an account which accords with Hans-Georg Gadamer’s, which locates the word’s origin in the context of jurisprudence, Jameson argues that style owes its existence to the classical notion of rhetoric, as interpreted in nineteenth-century pedagogy, the means by which an orator might speak in a form which is appropriately ‘high’. In both of these accounts, style’s interconnectedness with the rise of the bourgeoisie or liberal state-capitalist formations of the age of Enlightenment is emphasised.

Here, we see a socio-historical account of style, one which might have taken Barthes’ theory as its foundation: that it is impossible to have a theory of pure style, as it is fundamentally an historical phenomenon. Jameson is similarly sceptical, but writes also that any literary criticism worthy of the name is obligated to consider ‘sentences themselves’. How these two methods could be productively fused is something of a fissure in literary studies, between those who would treat literary texts in formal terms, the stylistic reductionists, and others, who would read them according to a sociological or Marxist schema. We might refer to this latter category as culturalists for the sake of ease. Of course, dialectical methods of reading are so ingrained into how we are trained to think about texts as scholars, whether we happen to be constructing a dialogue between a text and its context, or interrogating our own biases, that it can be difficult to conceive of what a purely formalist literary criticism might look like. Despite conventional wisdom holding that there were plenty around Cambridge in the thirties who were invested solely in words on the page, one cannot help but find indications of their broader and more wide-ranging interests in their actual writings. Likewise, culturalist critics might well concede that stylistic components, such as particular words or lengths of sentences, play a role in forming the style of a literary text, but there is a difficulty in deciding at which point a sufficient number of these discrete linguistic signals aggregate to achieve a structural significance or scale. It is for its treatment of style as an abstract system which cannot be rationalised down to its concrete manifestations that Jameson charges Anglo-American literary criticism with being undialectical.

In parsing this particular issue, we might turn to Adorno’s writings in Dialectic of Enlightenment, in which he theorises the distance between the individual stylistic marker and the entire work, in the context of a socio-economic and cultural totality. Adorno’s analysis is mostly concerned with the cultural changes which have been wrought by the existence of the culture industry within late-stage capitalism, the ‘iron system’ in which

the maintenance of forms and the preservation of individuals coincide only by chance.

By Adorno’s account, the technologies of commercialised society have so irreparably transformed all social and cultural institutions that art now serves a solely industrial function. There can be no such thing as amusement under late-stage capitalism; we have leisure only so that we can be more productive. These changes have come about, of course, due to the higher-order industries on which the culture industry depends, as well as the actions of individual managers within these industries, ‘the people at the top’ whose behaviours reproduce these higher-order systemic changes. The subject no longer has thoughts but rather is thought herself by the system; she registers signals in the form of physical, psychic automatisms, but continues to act as though her own autonomy exists, as though it were beyond the reach of the external network of circumstances, economic, historical, social, which in fact radically circumscribe the remit of her behaviours.

This loss of freedom in society finds its corollary in the degree to which the culture of industrial society has been homogenised: ‘Under monopoly all mass culture is identical…Every detail is so firmly stamped with sameness that nothing can appear which is not marked at birth, or does not meet with approval at first sight’. This determinism is one of the defining features of Adorno’s thought; even that which violates the tenets of the culture industry will merely replicate this same homogeneity overall. If, for example, Orson Welles were to violate the terms of the industry,

he is forgiven because his departures from the norm are regarded as a calculated mutation which serve all the more strongly to confirm the validity of the system.

These innovators are co-opted once again by the same system, and Adorno witheringly compares them to state-capitalist land-reformers. So repetitive are most films produced by the Hollywood studio system of Adorno’s time, he claims the attentive film-goer will know the ending of the film within the first few minutes, but, as before, if the attentive film-goer is wrong-footed by a surprise twist, this just confirms the banality of the enterprise.

Many have argued that Adorno’s undialectical anglophone readers have, in their eagerness to claim popular culture as an object worthy of scholarly attention, over-emphasised and caricatured his curmudgeonly tendencies. A charitable reading might present Adorno as being concerned predominantly with the superstructure, but there is, I think, a little too much of the grumpy old man to his claim that a perfection of formal technique, be it in the context of Hollywood film or jazz, may be claimed as just another symptom of the culture industry’s failure to create truly great art, because these perfections of technique are buttressed by deliberate ‘blunders’. I think Adorno is sufficiently correct for his work to be analytically useful, but it rather ironically lacks the ability to tolerate contradiction, and such a view runs the risk of lapsing into non-dialectical territory. Adorno is, after all, presumably referring to actual films he’s seen, actual jazz renditions of classical compositions, and treating these within his analyses as socially/historically embedded would do greater justice to his schema. Examples of how apparently individual agents incline towards producing the interests of capital without abandoning Adorno’s analytical pessimism are plentiful, but I’ll single out Susan Faludi’s The Terror Dream, or this podcast here.

A dialectical treatment of the history of literature would be less invested in the individual stylistic innovations perpetuated by writers, and would heed ‘the sheer quantity of words with which a given historical period is saturated’ to a greater extent. In a commercial society, for instance, in which the subject is bombarded constantly with advertisements, newspapers, articles, tweets, the author of literature is obliged to administer to the reader a sequence of shocks in order to gain their attention, and it is this which serves to colour our literary culture and why modern poetry maintains an interest in density of language rather than transparency. This might go some distance to accounting for the disappearance of organised novelistic form, but such claims would benefit from an awareness of popular trends of consumption, those which undermine theories constructed by scholars operating in a relative vacuum, in order to avoid falling into Adorno’s conservatism, and in maintaining one’s pursuit of the dialectic (however defined).

Quantifying Modernism and the avant-garde

Introduction and Methodology

This post will document a statistical analysis which was carried out on a corpus of 500 novels. 250 of these texts are generally categorised as ‘realist’ and will be used as a benchmark against which we might define modernist literary style, a mode of writing which arose in the early twentieth century (though it should be noted that this chronology is increasingly subject to revision due to the work of new modernist scholars).

The first novel in the naturalistic corpus, chronologically speaking, is Jane Austen’s novel Lady Susan, which was written in the year 1794. The final one is Thomas Hardy’s novel Jude the Obscure, which was published in 1895. This corpus contains the complete prose works, a phrase here encompassing novels, novellas and short story collections, of fifteen writers: Jane Austen, Emily, Anne and Charlotte Brontë, Stephen Crane, Honoré de Balzac, Charles Dickens, Fyodor Dostoevsky, George Eliot, Gustave Flaubert, Elizabeth Gaskell, Thomas Hardy, William Makepeace Thackeray, Leo Tolstoy and Émile Zola.

The corpus of 250 modernist novels begins in the year 1869, with Henry James’ first bloc of short stories, and continues all the way to Samuel Beckett’s 1988 novella ‘Stirrings Still’, so there is some overlap between these two corpora’s starting and end points. This modernist corpus otherwise consists of the complete works of nineteen writers: Djuna Barnes, Samuel Beckett, Jorge Luis Borges, Elizabeth Bowen, Joseph Conrad, William Faulkner, F. Scott Fitzgerald, Ford Madox Ford, Ernest Hemingway, Henry James, James Joyce, Franz Kafka, D.H. Lawrence, Katherine Mansfield, Flann O’Brien, Marcel Proust, Gertrude Stein, Edith Wharton and Virginia Woolf.

This disproportion between the two corpora, with fifteen realists versus nineteen modernists, may seem disconcerting at first, but what is required in order for the statistical analyses to function is for the number of observations to be equal, rather than the number of novelists. Unfortunately, realist authors wrote more novels than modernist authors, and this compromised our ability to retain the same number of authors on each end of the generic spectrum.

One other aspect to consider is the international dimension. The realist corpus includes ten novelists who wrote in English, but there are also two Russian and three French realists, two of whom, Zola and the aforementioned Balzac, were far more prolific than any other writer in either corpus. Zola and Balzac composed 86 and 34 novels, short story collections or novellas respectively. This has the consequence that well over half of the realist corpus is in translation from another language in comparison to just under 10% of the modernist corpus. I intend to address this when I am at a later stage in my research. There has been some work published on the issues surrounding the quantification of literature in translation and across language, but I do not yet possess a sufficient breadth of knowledge in this field to comment intelligently on the matter. I do think it is important to have French and Russian writers included in the realist corpus on the basis that many of them, be they Tolstoy, Flaubert or Balzac, exerted a significant influence on their modernist successors.

Whether or not these are ‘the best’ or most accurate translations is somewhat beside the point. From the reading I have done around the issue of literary translation, their being subject to change over time is in the nature of how text is received and re-constituted in different eras for different communities of readers (this discussion between Will Self and Kafka’s translators is particularly illuminating in this context; please do not be put off by Self, he gives the translators so much space to discuss the process that you really should watch it). The germane point here is that the translations being analysed in this instance could not be considered to be the most contemporary. There might be an argument for retaining these older translations on the basis that they are more likely to be the versions of the text which would have been circulating in the early twentieth century and therefore the translations modernist authors would have been more likely to have read, but making this claim would require a greater burden of proof, such as what languages each author read novels in and what their reading habits were more generally.

So, to turn to the analysis. My research is directed towards the quantitative analysis of grammar, the rationale being that we could, by examining varying quantities of particular categories of words, such as verbs, adjectives or prepositions, develop an understanding of how literary fiction changes from the beginning of the nineteenth century until the end of the twentieth, and, more specifically, how literary modernism departs from, or, perhaps remains contiguous with, this previous generation of novel writing. This was carried out using a POS tagger from the Natural Language Toolkit in Python.
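As a rough illustration of what counting grammatical categories involves, the Python sketch below turns a list of (word, tag) pairs into each tag's percentage share of the text. NLTK's `pos_tag` function returns pairs of this shape using Penn Treebank tags, though whether that particular function was the one used here is an assumption; the sentence and its tags are typed in by hand so that the example runs without any tagger models installed, and `tag_shares` is a hypothetical helper name.

```python
from collections import Counter


def tag_shares(tagged_tokens):
    """Return each part-of-speech tag's share of all tokens, as a percentage."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.items()}


# Hand-tagged illustration with Penn Treebank tags: PRP = personal pronoun,
# VBD = past-tense verb, DT = determiner, NN = noun, RB = adverb.
tagged = [
    ("She", "PRP"),
    ("opened", "VBD"),
    ("the", "DT"),
    ("door", "NN"),
    ("slowly", "RB"),
]
shares = tag_shares(tagged)
```

Computing `shares` once per novel gives one observation per text for each category, which is the kind of figure that can then be averaged within each corpus and compared across them.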

Results

From realism to modernism:

  • Average sentence length decreases by 4 words, from an average of 22 words to 18 words per sentence.
  • Personal pronouns (I, you, he, she, it, we, they, me, him, her, us, and them) increase by 1 percentage point, from 5% to 6%. Interrogative pronouns (who and where), by contrast, decrease by 0.01 percentage points, from 0.03% to 0.02%.
  • Verbs in the past tense increase by 1 percentage point, from 6% to 7%.
  • Adverbs increase by 0.5 percentage points, from 4.5% to 5%.
  • Prepositions (after, in, to, on, and with) decrease by 0.4 percentage points, from 10.9% to 10.5%.
  • Wh Determiners (words beginning with wh, such as ‘where’ or ‘who’ acting to modify the noun phrase) decrease by 0.2 percentage points, from 0.6% to 0.4%.
  • Particles (parts of speech with a grammatical function but no meaning of their own, such as ‘up’ in the phrase ‘I tidied up the room’) increase by 0.1 percentage points, from 0.4% to 0.5%.
  • Non third-person singular present verbs (present-tense verbs in any form other than the third-person singular) decrease by 0.1 percentage points, from 1.6% to 1.5%.
  • Existentials (words such as ‘there’ which indicate that something exists) increase by 0.04 percentage points, from 0.17% to 0.21%.
  • Superlative adjectives (adjectives such as ‘best’, ‘biggest’, ‘worst’) decrease by 0.01 percentage points, from 0.14% to 0.13%.

It will not have escaped your attention that a lot of these percentages are quite small. The extent to which any given text is made up of these hyper-specific categories is pretty minimal in the first place, which is why many of these quantities seem so laughably tiny. Rest assured that they are statistically significant. This does not mean that they are important, which would require a greater burden of proof, more analyses and more exploration, but it does mean that they are noteworthy considering the quantities involved.
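For readers wondering how a difference this small can be tested at all, here is a hedged Python sketch of one common approach, a two-sided permutation test on the difference between two group means. The post does not say which significance test was actually used, so this is a stand-in for illustration rather than a reconstruction; the function name `permutation_p_value` and its parameters are hypothetical, and the two lists of percentages are made up.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, rounds=2000, seed=0):
    """Two-sided permutation test on the difference between group means.

    Repeatedly shuffles the pooled values, re-splits them into groups of the
    original sizes, and counts how often the shuffled difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (rounds + 1)


# Made-up per-novel percentages for one grammatical category.
realist = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8]
modernist = [6.0, 6.1, 5.9, 6.2, 6.0, 5.8]
p_value = permutation_p_value(realist, modernist)
```

A small p-value here says only that a gap of this size would be unusual if the labels ‘realist’ and ‘modernist’ were assigned at random; as the paragraph above stresses, it says nothing about whether the gap matters.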

One boxplot which might be of interest is the one below, which shows the ‘spread’ of the data for average sentence length between realism and modernism.

What we see on the left is the variation of the sentence length data (the term ‘variation’ here meaning the general ‘dispersedness’ of the data) for realism, which goes from 10 to roughly 35 words per sentence with an outlier or two on either end, whereas if we consider modernism, we have everything from zero (Samuel Beckett’s novel How It Is, which has no full stops in it) up to forty-five, with far more outliers on the higher end. Higher outliers are data points with values more than 1.5 times the interquartile range above the third quartile; lower outliers, of which there are three, are more than 1.5 times the interquartile range below the first quartile. For one’s own general knowledge, the modernist outliers for sentence length are:

  • William Faulkner’s Absalom, Absalom! (46.4) and Intruder in the Dust (42.3)
  • Marcel Proust’s Swann’s Way (42.9), In a Budding Grove (40.2), Time Regained (38), The Prisoner (37.2), The Captive (35.7), The Guermantes Way (34.1) and Sodom and Gomorrah (30.9).
  • Samuel Beckett’s Texts for Nothing and The Unnamable have 40.5 and 32.9 words per sentence respectively
  • Gertrude Stein’s novels The Making of Americans and Everybody’s Autobiography have 33.9 and 33.5 respectively.
  • Henry James’ The Ivory Tower and The Young Lovell score 31.8 and 29 respectively.
  • The three lower outlier values for sentence length all belong to Beckett: the aforementioned How It Is, Worstward Ho (4.9) and Ill Seen Ill Said (7).
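The boxplot rule for outliers described above (anything more than 1.5 times the interquartile range beyond the first or third quartile) can be sketched in a few lines of Python. The function name `tukey_outliers` is hypothetical and the sentence lengths are invented for illustration. Note also that `statistics.quantiles` uses its own default interpolation method, so plotting software may place the quartiles, and therefore the fences, very slightly differently.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Split out the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    low = [v for v in values if v < lower_fence]
    high = [v for v in values if v > upper_fence]
    return low, high


# Invented average sentence lengths (words per sentence) for nine texts.
sentence_lengths = [10, 12, 14, 15, 16, 18, 20, 22, 46]
low, high = tukey_outliers(sentence_lengths)
```

With these invented figures only the text averaging 46 words per sentence falls outside the fences, and nothing falls below them.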

It can be tempting, I think, when we see these sorts of names surface so prominently, in conjunction with a visual confirmation of the existence of an avant-garde, to think that modernism in its most pure form was a kind of relentless maximalism, an uncompromising movement towards longer sentences, more pronouns, and that all other manifestations of it are inadequate or insufficient in some way. This is a kind of boring and masculinist overview of the genre, which takes, I think, too many of the claims made by its most dogmatic adherents at face value, and it’s not a modernism I’m particularly interested in defending or instantiating. There can also, of course, be a regressive or rearguard aspect to modernism, which is perceptible in the following boxplot, which displays the distribution of past tense verbs.

As was pointed out above, modernism displays an increase in past tense verbs overall, but here we see a large number of outlier values moving against the overall trend. These novels are:

  • James Joyce’s Ulysses (4.3%) and Finnegans Wake (2.7%)
  • William Faulkner’s As I Lay Dying (4.2%) and Requiem for a Nun (3.6%)
  • Samuel Beckett’s Malone Dies (3.9%), Fizzles (2.5%), Company (2%), Texts for Nothing (1.8%), The Unnamable (1.7%), Worstward Ho (1.6%), Ill Seen Ill Said (1.4%) and a corpus of his miscellaneous and unpublished short fiction (2.2%).
  • Joseph Conrad and Ford Madox Ford’s collaborative novel The Nature of a Crime (2.6%)
  • Virginia Woolf’s The Waves (2.4%)
  • Gertrude Stein’s Tender Buttons (1.7%)

The higher modernism outlier is Virginia Woolf’s 1937 novel The Years (10%) and the lower realism outlier is Balzac’s 1841 novel Letters of Two Brides (2.7%).

In this way we can see that modernism is not just a unidirectional commitment to a narrow sequence of stylistic changes. Instead, it’s a contradictory movement in which a number of different stylistic markers jostle against and subvert one another. In this particular instance, for example, we can perceive the authors most generally understood to be among the most uncompromising (Joyce, Beckett, Stein, Woolf and Faulkner) resisting the overall trend.

From the two boxplots I’ve generated so far, you might have noticed that modernism tends to generate a greater number of outliers, and I can confirm that this trend of a greater degree of grammatical heterogeneity manifesting itself in modernist novel-writing than in naturalistic novel-writing persists across the other categories of grammar, which you can validate by looking at the complete analysis here.

This struck me as an important development, so I quantified the extent of each data point’s outlier-ness, and then grouped them according to author. These values were then divided by the number of outlier data points, because some of these novelists only have a small number of novels in the corpus versus others. Austen’s complete works would be totally outnumbered by Balzac’s, for instance. The results appear below:

Please do note the values on the y-axis; Jane Austen is barely above zero because the only outlier text she wrote is Mansfield Park, which marks itself out for its disproportionate use of adjectives. I thought it better not to exclude her from the plot though, because I didn’t want it to turn into even more of a boys’ club than it might otherwise be. It would be useful, and exciting I think, to conceive of this plot as an indication of early breaches with conventional form, perhaps some nineteenth century anticipations of modernism. Reading Dostoevsky, Zola and Balzac in this manner would all be coterminous with changes taking place in the study of modernism now, but reading Thackeray and Eliot in these terms might be a more surprising development, and I’d be interested to read these texts in light of what we’re seeing here.
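One way the per-author ‘outlier-ness’ figure could be computed is sketched below in Python. The text does not specify how the extent of a data point's outlier-ness was measured, so the choice made here, the distance beyond the boxplot fence expressed in interquartile ranges and then averaged over each author's outlier points, is an assumption, as are the function name `outlier_scores_by_author` and the placeholder data.

```python
from collections import defaultdict
from statistics import quantiles


def outlier_scores_by_author(records, k=1.5):
    """Average distance beyond the boxplot fences for each author.

    `records` is a list of (author, value) pairs for one grammatical
    category. Distances are measured in interquartile ranges and averaged
    over that author's outlier points only.
    """
    values = [value for _, value in records]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return {}
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    excess = defaultdict(list)
    for author, value in records:
        if value > upper_fence:
            excess[author].append((value - upper_fence) / iqr)
        elif value < lower_fence:
            excess[author].append((lower_fence - value) / iqr)
    return {author: sum(xs) / len(xs) for author, xs in excess.items()}


# Placeholder data: author labels and values are invented.
records = [
    ("A", 10), ("A", 12), ("A", 14), ("A", 15), ("A", 16),
    ("A", 18), ("A", 20), ("A", 22), ("B", 46),
]
scores = outlier_scores_by_author(records)
```

Authors with no outlying texts simply do not appear in the result, which is one reason a single outlier such as Mansfield Park is enough to place its author on the plot at all.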

The modernism plot for deviation appears below:

The unlabelled entry between Faulkner and James is Hemingway

From this plot we can see that the most avant-gardist prose writers, considered from the perspective of their grammar, appear to be Beckett, Stein, Woolf, Conrad and Joyce. Of course, this is nowhere near a definitive answer as to what modernist style is, or who its most innovative practitioners were; these measurements are atomistic and are quantifying individual words. But style is not just words in isolation, style is agglomerations of words, spaces between words, the clandestine networks and relations the phrases these words add up to compose in the mind of the reader, and, if these digital methodologies are to have any chance of illustrating this shift (an inadequate term in the first instance, since it is more an accumulation of changes distributed over a broad corpus than a sudden or transformational one that we are here concerned with) it is in these cumulative terms that style must be quantified, in order to avoid drifting into the reductive and schematic scientism that numerical analyses of this kind are frequently accused of perpetuating.

A (Proper) Statistical analysis of the prose works of Samuel Beckett


Content warning: If you want to get to the fun parts, the results of an analysis of Beckett’s use of language, skip to sections VII and VIII. Everything before that is navel-gazing methodology stuff.

If you want to know how I carried out my analysis, and utilise my code for your own purposes, here’s a link to my R code on my blog, with step-by-step instructions, because not enough places on the internet include that.

I: Things Wrong with my Dissertation’s Methodology

For my master’s, I wrote a 20,000 word dissertation, which took as its subject an empirical analysis of the works of Samuel Beckett. I had a corpus of his entire works with the exception of his first novel Dream of Fair to Middling Women, which is a forgivable lapse, because he ended up cannibalising it for his collection of short stories, More Pricks than Kicks.

Quantitative literary analysis is generally carried out in one of two open-source programming languages, Python or R. The former you’re more likely to have heard of, being one of the few languages designed with usability in mind. The latter, R, would be more familiar to specialists, or people who work in the social sciences, as it is more abstruse than Python, doesn’t have many language cousins and has a very unfriendly learning curve. But I am attracted to difficulty, so I am using it for my PhD analysis.

I had about four months to carry out my analysis, so the idea of taking on a programming language in a self-directed learning environment was not feasible, particularly since I wanted to make a good go at the extensive body of secondary literature written on Beckett. I therefore made use of a corpus analysis tool called Voyant. This was a couple of years ago, so this was before its beta release, when it got all tricked out with some qualitative tools and a shiny new interface, which would have been helpful. Ah well. It can be run out of any browser, if you feel like giving it a look.

My analysis was also chronological, in that it looked at changes in Beckett’s use of language over time, with a view to proving the hypothesis that he used a narrower vocabulary as his career continued, in pursuit of his famed aesthetic of nothingness or deprivation. As I wanted to chart developments in his prose over time, I dated the composition of each text, and built a corpus for each year from 1930 to 1987, excluding, of course, years in which he wrote only drama or poetry, which wouldn’t be helpful to quantify in conjunction with one another. Which didn’t stop me doing so for my masters analysis. It was a disaster.

II: Uniqueness

Uniqueness, the measurement used to quantify the general spread of Beckett’s vocabulary, was obtained by the generally accepted formula below:

unique word tokens / total words
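As a rough illustration, the formula above can be sketched in a few lines of Python. This is not the R script mentioned earlier; the `tokenize` helper and its rule (lowercase everything, treat each run of letters as a word) are assumptions made for the example, and a different tokeniser would give slightly different numbers.

```python
import re


def tokenize(text):
    """Lowercase the text and return each run of letters as one word token.

    Assumption for this sketch: apostrophes and hyphens split words.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def uniqueness(text):
    """Number of distinct words divided by the total number of words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "Nor of how. Nor of whom. None of anything."
print(uniqueness(sample))
```

The sample has nine word tokens and six distinct words, so the function returns six divided by nine.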

There is a problem with this measurement, in that it takes no account of a text’s relative length. As a text gets longer, the likelihood that each new word has already been used approaches 1. Therefore, a text gets less unique as it gets bigger. I have the correlations to prove it:

[Figure: screenshot of the correlations between text length and uniqueness]

There have been various solutions proposed to this quandary, which stymies our comparative analyses somewhat. One among them is the use of vectorised measurements, which plot the text’s declining uniqueness against its word count, so we see a more impressionistic graph, such as this one, which should allow us to compare the word counts for James Joyce’s novel, A Portrait of the Artist as a Young Man, and his short story collection, Dubliners.

[Figure: uniqueness plotted against word count for A Portrait of the Artist as a Young Man and Dubliners]

All well and good for two or maybe even five texts, but one can see how, with large-scale corpora, this sort of thing can get very incoherent very quickly. Furthermore, if one were to examine the numbers on the y-axis, one would see that the differences here are tiny. This is another idiosyncrasy of stylostatistical methods; because of the way syntax works, the margins of difference wouldn’t be regarded as significant by most statisticians. These issues relating to the measurement are exacerbated by the fact that ‘particles,’ the atomic structures of literary speech (it, is, the, a, an, and, said, etc.), make up most of a text. In pursuit of greater statistical significance for their papers, digital literary critics remove these particles from their texts, which is another unforgivable thing that we do anyway. I did not, because I was concerned that I was complicit in the neoliberalisation of higher education. I also wrote a 4000 word chapter that outlined why what I was doing was awful.
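The curve of declining uniqueness against word count described in this section can be approximated as follows. This is a hedged Python sketch rather than the method behind the original graph: the function name `uniqueness_curve`, the `step` parameter and the tokenising rule are all assumptions introduced for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return each run of letters as one word token."""
    return re.findall(r"[^\W\d_]+", text.lower())


def uniqueness_curve(text, step=1000):
    """Return (word_count, uniqueness) pairs for growing prefixes of the text.

    A point is recorded every `step` words, plus one at the very end.
    """
    tokens = tokenize(text)
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0 or i == len(tokens):
            curve.append((i, len(seen) / i))
    return curve
```

Plotting the pairs for two texts on the same axes gives the kind of comparison described above, with the caveat already noted that the differences on the y-axis tend to be very small.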

IV: Ambiguity

Ambiguity was arrived at by the following formula:

number of indefinite pronouns/total word count

I derived this measurement from Dr. Ian Lancashire’s study of the works of Agatha Christie, and counted Beckett’s use of a set of indefinite pronouns: ‘everyone,’ ‘everybody,’ ‘everywhere,’ ‘everything,’ ‘someone,’ ‘somebody,’ ‘somewhere,’ ‘something,’ ‘anyone,’ ‘anybody,’ ‘anywhere,’ ‘anything,’ ‘no one,’ ‘nobody,’ ‘nowhere,’ and ‘nothing.’ Those of you who know that there are more indefinite pronouns than just these are correct; I had found an incomplete list of indefinite pronouns, and I assumed that that was all. This is just one of the many things wrong with my study. My theory was that there were correlations to be detected between Beckett’s decreasing vocabulary and his increasing deployment of indefinite pronouns, relative to the total word count. I called the vocabulary measure ‘uniqueness,’ and the indefinite pronouns measure I called ‘ambiguity.’ This is tenuous, I know; indefinite pronouns advance information as they elide the provision of information. It is, like so much else in the quantitative analysis of literature, totally unforgivable, yet we do it anyway.
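The ambiguity measure can be sketched in Python using the sixteen pronouns listed above. This is an illustration under stated assumptions, not the original procedure: the tokeniser is a simple letters-only rule, ‘no one’ is counted as one pronoun occurrence spanning two word tokens, and the denominator is the raw token count.

```python
import re

# Single-word pronouns from the list in the text.
INDEFINITE_PRONOUNS = {
    "everyone", "everybody", "everywhere", "everything",
    "someone", "somebody", "somewhere", "something",
    "anyone", "anybody", "anywhere", "anything",
    "nobody", "nowhere", "nothing",
}


def tokenize(text):
    """Lowercase the text and return each run of letters as one word token."""
    return re.findall(r"[^\W\d_]+", text.lower())


def ambiguity(text):
    """Indefinite pronoun occurrences divided by the total word count."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    single = sum(1 for token in tokens if token in INDEFINITE_PRONOUNS)
    # "no one" spans two tokens, so it is matched as an adjacent pair.
    no_one = sum(
        1 for first, second in zip(tokens, tokens[1:])
        if (first, second) == ("no", "one")
    )
    return (single + no_one) / len(tokens)


print(ambiguity("Somewhere again. Somehow again. Someone again."))
```

In the sample, ‘somewhere’ and ‘someone’ count but ‘somehow’ does not, since it is not on the list, giving two pronouns over six words.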

V: Hapax Richness

I initially wanted to take into account another phenomenon known as the hapax score, which charts occurrences of words that appear only once in a text or corpus. The formula to obtain it would be the following:

number of words that appear once/total word count

I believe that the hapax count would be of significance to a Beckett analysis because of the points at which his normally incompetent narrators have sudden bursts of loquaciousness, like when Molloy says something like ‘digital emunction and the peripatetic piss,’ before lapsing back into his ‘normal’ tone of voice. Once again, because I was often working with a pen and paper, this became impossible, but now that I know how to code, I plan to go over my masters analysis, and do it properly. The hapax score will form a part of this new analysis.
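For comparison with the other two measures, here is a minimal Python sketch of the hapax formula. The function name `hapax_score` and the tokenising rule are assumptions for the example; it counts words that occur exactly once and divides by the total word count, as in the formula above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return each run of letters as one word token."""
    return re.findall(r"[^\W\d_]+", text.lower())


def hapax_score(text):
    """Number of words that appear exactly once, divided by total words."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(tokens)


print(hapax_score("Nor of how. Nor of whom. None of anything."))
```

In this sample ‘how,’ ‘whom,’ ‘none’ and ‘anything’ each appear once, so the score is four divided by nine.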

VI: Code & Software

A much more accurate way of analysing vocabulary, for the purposes of comparative analysis when your texts are of different lengths, therefore, would be to randomly sample it. Obviously not very easy when you’re working with a corpus analysis tool online, but far more straightforward when working through a programming language. A formula for representative sampling was found, and integrated into the code. My script is essentially a series of nested loops and if/else statements, that randomly and sequentially sample a text, calculate the uniqueness, indefiniteness and hapax density ten times, store the results in a variable, and then calculate the mean value for each by dividing the result by ten, the number of times that the first loop runs. I inputted each value into the statistical analysis program SPSS, because it makes pretty graphs with less effort than R requires.
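The sampling loop described above might look something like the following in Python. This is a sketch under assumptions, not a port of the R script: the sampling formula is not given in the text, so the example simply draws a window of consecutive words at a random starting point, and `sample_size`, `runs` and `seed` are hypothetical parameters.

```python
import random
import re
from collections import Counter
from statistics import mean


def tokenize(text):
    """Lowercase the text and return each run of letters as one word token."""
    return re.findall(r"[^\W\d_]+", text.lower())


def uniqueness(tokens):
    """Distinct words divided by total words for a list of tokens."""
    return len(set(tokens)) / len(tokens)


def hapax_density(tokens):
    """Words occurring exactly once divided by total words."""
    counts = Counter(tokens)
    return sum(1 for count in counts.values() if count == 1) / len(tokens)


def sampled_mean(tokens, measure, sample_size=500, runs=10, seed=0):
    """Average `measure` over `runs` random windows of consecutive tokens.

    Every window has the same length, so texts of different sizes are
    compared on samples of equal size.
    """
    if len(tokens) < sample_size:
        raise ValueError("text is shorter than the sample size")
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    scores = []
    for _ in range(runs):
        start = rng.randrange(len(tokens) - sample_size + 1)
        scores.append(measure(tokens[start:start + sample_size]))
    return mean(scores)
```

Calling `sampled_mean(tokenize(text), uniqueness)` and `sampled_mean(tokenize(text), hapax_density)` for each year’s corpus gives length-controlled scores; an ambiguity function that accepts a token list would slot in the same way. The window size has to be no larger than the shortest corpus being compared.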

VII: Results

I used SPSS’ box plot function first to identify any outliers for uniqueness, hapax density and ambiguity. 1981 was the only year which scored particularly high for relative usage of indefinite pronouns.

[Figure: SPSS box plot of uniqueness, hapax density and ambiguity]

It should be said that this measure, too, is correlated to the length of the text, which only stands to reason; as a text gets longer the relative incidence of a particular set of words will decrease. Therefore, as the only texts Beckett wrote this year, ‘The Way’ and ‘Ceiling,’ together add up to about 582 words (the fifth lowest year for prose output in his life), one would expect indefiniteness to be somewhat higher in comparison to other years. However, this doesn’t wholly account for its status as an outlier value. Towards the end of his life Beckett wrote increasingly short prose pieces. Comment C’est (How It Is) was his last novel, and was written almost thirty years before he died. This probably has a lot to do with his concentration on writing and directing his plays, but in his letters he attributed it to a failure to progress beyond the third novel in his so-called trilogy of Molloy, Malone meurt (Malone Dies) and L’innommable (The Unnamable). It is in the year 1950, the year in which L’inno was completed, that Beckett began writing the Textes pour rien (Texts for Nothing), scrappy, disjointed pieces, many of which seem to be taking up from where L’inno left off, as do the Fizzles and the Faux Départs. ‘The Way,’ I think, is an outgrowth of a later phase in Beckett’s prose writing, which dispenses with the peripatetic loquaciousness and the understated lyricism of the trilogy and replaces them with a more brute and staccato syntax, one which is often dependent on the repetition of monosyllables:

No knowledge of where gone from. Nor of how. Nor of whom. None of whence come to. Partly to. Nor of how. Nor of whom. None of anything. Save dimly of having come to. Partly to. With dread of being again. Partly again. Somewhere again. Somehow again. Someone again.

Note also the prevalence of particle words, which will have been stripped out for the analysis, and the ways in which words with a ‘some’ prefix are repeated as a sort of refrain. This essential structure persists in the work, or at least the artefact of the work that the code produces, and hence makes of it the outlier that it is.

[Figure: uniqueness, hapax density and ambiguity plotted together]

From plotting all the values together at once, we can see that uniqueness is partially dependent on hapax density; the words that appear only once in a particular corpus would be important in driving up the score for uniqueness. While there could be said to be a case for the hypothesis that Beckett’s texts get less unique and more ambiguous up until 1944, when he completed his novel Watt, and, if we’re feeling particularly risky, up until 1960, when Comment C’est was completed, it would be wholly disingenuous to advance it beyond this point, when his style becomes far too erratic to categorise definitively. Comment C’est is Beckett’s most uncompromising prose work. It has no punctuation, no capitalisation, and narrates the story of two characters, in a kind of love, who communicate with one another by banging kitchen implements off one another:

as it comes bits and scraps all sorts not so many and to conclude happy end cut thrust DO YOU LOVE ME no or nails armpit and little song to conclude happy end of part two leaving only part three and last the day comes I come to the day Bom comes YOU BOM me Bom ME BOM you Bom we Bom

VIII: Conclusion

I would love to say that the general tone is what my model is being attentive to, which is why it identified Watt and How It Is as nadirs in Beckett’s career, but I think their presence on the chart is more a product of their relative length, as novels, versus the shorter pieces which he moved towards in his later career. Clearly, Beckett’s decision to write shorter texts makes this means of summing up his oeuvre in general insufficient. Whatever changes Beckett made to his aesthetic over time, we might not need such complicated graphs to map them, and I could have just used a word processor to find the one that matters: length. Bom and Pim aside, for whatever reason, after having written L’inno none of Beckett’s creatures presented themselves to him in novelistic form again. The partiality of vision and modal tone which pervades the post-L’inno works demonstrates, I think, far more effectively what it was that Beckett was ‘pitching’ for: a new conceptual aspect to his prose, which re-emphasised its bibliographic aspects, the most fundamental of which was their brevity, or the appearance of an incompleteness, by virtue of being honed to sometimes less than five hundred words.

The quantification of differing categories of words seems like a radical, and the most fun, thing to quantify in the analysis of literary texts, as the words are what we came for, but the problem is similar to one that overtakes one who attempts to read a literary text word by word by word, and unpack its significance as one goes: overdetermination. Words are kaleidoscopic, and the longer you look at them, the more threatening their darkbloom becomes, the more they swallow, excrete, the more alive they are, all round. Which is fine. Letting new things into your life is what it should be about, until their attendant drawbacks become clear, and you start to become ambivalent about all the fat and living things you have in your head. You start to wish you read poems instead, rather than novels, which make you go mad, and worse, start to write them. The point is words breed words, and their connections are too easily traced by computer. There’s something else about knowing their exact correlations to a decimal point. They seem so obvious now.

Seán O’Casey’s ‘Juno and the Paycock,’ Easter 1916 and re-invention

Historian Roy Foster recently gave a lecture in Trinity College entitled “‘An Inheritance From Our Forefathers’? Historians and the Memory of the Irish Revolution.” In his speech, Foster proposed a radically different reading of the events surrounding 1916. For most people, 1916 marks the ostensible beginning of the modern Irish independence movement moving beyond the cultural sphere, events which follow neatly through to the War of Independence, the civil war, the declaration of the Republic and the establishment of partition as a political framework. In this decade of commemoration, already a cliché, a more unified, or retrospectively conscious, perspective may be welcome in buckling existing narratives. It would help to assuage the tedium of an apparently endless sequence of vapid panel discussions on radio that rarely seem to move beyond a Leaving Certificate level of historical analysis, window-dressing Republicanism to keep Sinn Féin out of government, or worst of all, newspaper supplements.

Rather than seeing the Rising as the beginning, Foster proposes viewing it as an end-point or termination of pre-revolutionary trends, a marker of a generational crisis. For Foster, the real revolution was the series of land acts of the late nineteenth century which incentivised English landlords to sell their land to Irish farmers, who, in many cases, turned out to be more draconian in extracting rents from their tenants than their English counterparts. This massive transfer of capital and establishment of a native land-owning class could explain why Ireland was capable of ‘settling’ so (relatively) quickly after its revolution and consequent political convulsions, why its quite radical rising became conservatised with such rapidity. Those who were most involved in the rising and inculcated its participants were members of a radical, educated middle class, and Foster frames them as somewhat immature, angsty young people, rebelling against their parents and fashioning their own values in opposition to forces that they regarded as oppressive.

This notion of Easter 1916 as an exercise in re-invention or the formulation of a novel identity interests me, as I wrote my undergraduate dissertation on a trilogy of novels in which identity and the re-invention thereof form a substantial part of the subject matter, namely Roddy Doyle’s The Last Round-up trilogy. In the three novels, A Star Called Henry, Oh, Play That Thing and The Dead Republic, we see the protagonist, Henry Smart, conceptualise himself as a mythic figure from out the Celtic mists, a working class hero, linchpin of the IRA, self-conscious exile, self-made man, immigrant hero, inveterate capitalist and finally, a family man, of a sort. I should add that while he’s doing all this gallivanting, he’s abandoned his wife, daughter and son.

One also thinks of Johnny’s line in Seán O’Casey’s play Juno and the Paycock, probably the only point in the play when he isn’t complaining about noise (a potential gesture towards his fairly obvious PTSD, a result of his role in fighting for Ireland during the War of Independence), when it is made known that the Boyle family is to inherit a small fortune. Johnny’s immediate contribution is: “We’ll be able to get out o’ this place now, an’ go somewhere we’re not known,” perhaps indicating how closely related re-invention and revolt were in the mind of the revolutionary generation.

This is all covered more thoroughly in Foster’s recent study, Vivid Faces: The Revolutionary Generation in Ireland 1890-1923. At least, I imagine it is. I haven’t actually read it.

Reading Lessons from Martin Heidegger

Trying to derive an aesthetic system or outlook from Martin Heidegger’s writings on art in Poetry, Language, Thought is an errand for fools; Heidegger explicitly rules out the idea that his hermeneutic philosophy, or at least, his philosophy which inclines itself towards hermeneutics, is concerned with aisthesis, or the apprehension of an artwork. Instead, he subsumes it within his wider philosophical task, to get to the nature of Being, note the capital B.

For Heidegger, Western philosophy has insufficiently grappled with ontology. René Descartes made a mistake in trying to determine what is; Heidegger thinks he should have thought a bit more about what is is. What exactly we mean by Being is complicated by the alienating processes of industrialisation, mercantilism and urbanisation, which have left us with an increasingly utilitarian sense of things in the world. Instead of enquiring into the nature of what something is, we define it relative to its use-value. Heidegger writes that art is also part of this wider enquiry into Being, that this is the primary function of ‘poets’ (which I decide to extend as a catch-all term for artists in a more general sense): to do exactly what it is that Heidegger is doing, and reach a more nuanced definition of Being. This might seem like a self-involved or solipsistic manoeuvre, but if you came from a national literary tradition as philosophically inclined as Heidegger’s (Rilke, Goethe) you might well agree with him.

So how would one read a text in a Heideggerian way? Well, Heidegger was always more interested in the posing of further questions than in proposing resolutions. There’s very little in Poetry, Language, Thought that one could hope to derive a positive methodology from, unless saying something like ‘The answer to this has six primary components,’ and providing a long digression on said components is your notion of pragmatism. Interestingly, one of his students, more invested in hermeneutic philosophy as an autonomous branch of philosophical enquiry, Hans-Georg Gadamer, is similarly anti-systematic, perceiving the work of art as something that makes you subject to its meaning-makings. In this schema, the process of interpretation is something that leaves the putative reader behind; meaning overtakes your agency as it establishes itself. Which I think could be productively linked with the writings of Heidegger which attempt to justify National Socialism. Digression for another time.

Rather than describe how the work of art works on us, Heidegger divvies it up into increasingly thin components, the allegory of the form/content binary, within which there is the form-matter, which is distinct in itself, the process of ‘worlding’ that a work of art inaugurates, ‘the earth’ on which the work dwells and many, many other features which contemporary literary critics would probably understand, rightly or wrongly, as relating to a work’s context.

There is a tendency in the wake of Jacques Derrida, particularly when he seemed to be such an attentive reader of those philosophers supposedly foundational to post-structuralism, such as Heidegger, Nietzsche and Kierkegaard, to assume that within these philosophers’ works are the germs of Derrida’s system of thought. Therefore Heidegger’s insistence on the context being made up of these manifold sections, interdependently and intricately linked, may create a sense that this structure is about to be deconstructed, and lapse into its own angst. In fact, Heidegger is very clear that these sections retain their formal integrity; each may be articulated relative to and within the other, as is the case in Derrida’s re-formulation of Ferdinand de Saussure’s differential networks of meaning, but within this mutual articulation, they remain solid. This comes across in a very interesting passage that describes the process of building a bridge:

“It does not just connect banks that are already there. The banks emerge as banks only as the bridge lies across the stream. The bridge designedly causes them to lie across from each other. One side is set off against the other by the bridge…With the banks, the bridge belongs to the stream the one and the other expanse of the landscape around the stream.”

By coming to an understanding of what is outlined in this perhaps wilfully obtuse paragraph, Heidegger hopes that we may come to an understanding of art which will provide a place of dwelling rather than merely a refuge, a place that we can authentically ‘live’ within, rather than merely taking refuge. Hear, hear, I say, probably.

Deleuze and Guattari’s Geology of Literary Style

When I was drafting my PhD proposal, I read a few sources on literary style, in order to come to a working definition of style, or an academic consensus on the matter to rail against. I didn’t want something simplistically formalistic that referred to vehicles, tenors, modes or what have you, but I also didn’t want a post-Derridean account that described style as a limit-case/fault line/discourse rupture, an everything and nothing at once. These kinds of critical stymieings, excessive nuancings to the point of inertia, have gotten a bit wearying after five years of seeing them deployed, so I was hoping to get to some kind of working definition. Emphasis on ‘working,’ considering I would be carrying out pragmatic actual tasks, via computation, which were to be finalised once I had my definition.

It was surprisingly challenging to track one down, and more often than not I was thrown back onto my own reflections on literary style, and what we talk about when we talk about it. Here, I think we stumble across its primary shortcoming as a delineator. People talk about Virginia Woolf’s interior, lyrical style, Jorge Luis Borges’ staid, cold style and Ernest Hemingway’s staccato, pared back style. The difficulty with these simplistic accounts is that an author’s style generally encapsulates what it is that makes them unique in literary discourse in general. This isn’t necessarily surprising; most of what we detect in a writer’s style is what throws us out of our reading habits. When Foster Wallace frenetically re-instates the subject of a clause at its end, a technique he becomes increasingly reliant on as Infinite Jest proceeds, we notice it, and it becomes increasingly to the fore in our sense of his style. But, in the grand scheme of the one-thousand some page novel, the extent to which this technique is made use of is, statistically speaking, insignificant. Sentences like “She tied the tapes,” in Between the Acts, for instance, pass our awareness by because of their pedestrian qualities, much like many other sentences that contain words such as ‘said,’ because of the extent to which any text’s fabric is predominantly composed of such “filler.”

This dearth of attention directed to the ‘particles’ of literary materials is a lot of what digital humanities projects present themselves as a corrective to; by looking at the macroeconomic, we can transcend our human fixation on shiny objects (read: pretty sentences), and gain a fuller understanding of a text’s style, liberated from the shortcomings of our usual reading habits.

Of course, this newfound command over an entire text does not prevent the critic from mounting flawed arguments; many of the digital humanities’ earlier experiments in literary analysis too frequently gave in to Rubik’s cube thinking, attempting to tame indeterminacy by solving a text via enumerative techniques. This is exactly the kind of objective approach I didn’t want to fall into when visualising and narrating data trends.

Franco Moretti’s work in the Stanford Lit Lab proved beneficial in opening me up to more diffuse and multi-perspectival digital methodologies, which visualise a text on a number of different textual levels. Moretti’s contention that the data shows the activation of different stylistic features at different scales, that scale is directly correlated to the differentiation of textual functions, is positively invigorating, as it is as far removed from the Rubik’s cube mentality as it is possible to get; it essentially concedes that what we see when we look at a text depends on the way that we’re looking at it. Yes, Moretti is talking about topic modelling rather than style, but for my purposes I’ll ignore that. I also enjoy that it seems to be a computational analogue to the psychedelic nature of literary criticism – the longer we look at a text, even a shorter one, perhaps even especially a shorter one, the more we see. Diversifying our means of approach therefore provides the critic with a disparate sequence of differentiated visualisations; Enright may be meaningfully analogous to, dunno, Proust from the perspective of the entire text, but on a word to word, sentence to sentence, chapter to chapter, etc. comparison, we may turn up more unexpected results.

I still lacked a conceptual, theoretical system to connect this approach with, until I read the third chapter of Gilles Deleuze and Félix Guattari’s A Thousand Plateaus, ‘10,000 B.C.: The Geology of Morals (Who Does the Earth Think It Is?)’. In this chapter, Deleuze and Guattari make use of the discipline of geology in order to outline a number of theories concerning form, content, ideology and the articulations thereof. The unorthodox appropriation of geology is part of Deleuze and Guattari’s wider usage of theories and concepts outside of traditional philosophy, in order to subvert the staid formula of normative philosophical argumentation, wherein a summary is given of problem 1, why the solution A posited by philosopher z is insufficient and why solution B posited by philosopher y is even more so, and how both (and every other philosophy in the history of the discipline, by extension) have overlooked a solution that I alone have realised. This is all beside the point and I mention it only to indicate how smart I am.

In any case, the earth, and, for my purposes, a literary text is composed of a number of strata, differing layers, which contain, compose and construct otherwise transitory particles, making them subject to more macroeconomic structures of order. In this way, they simplify their contents, as particles move between these strata erratically. One should think of strata as totalising senses of an author’s style, whereas the particles are more subtle, granular features that disappear and re-appear in and outside of particular strata. Form and content are singularly intermingled on the level of the stratum, and are merely a function of primary and secondary articulation.

Strata in turn are composed of epistrata and parastrata, which further undermines any attempt someone, like a mad person, would make to get a stable grasp on exactly what it is Deleuze and Guattari mean when they lay out this seemingly intractable schema. The strata model is a challenge to systematic modes of thought, such as structuralism, so it offers no stability, but for me, this is precisely its appeal. Any interpretation on a particular textual level, such as stratum d, which we could equate to word choice, for instance, samples one among many protean strata, composed of other strata, made relative to a machinic assemblage, itself a stratified metastratum, which becomes involved in its, the strata’s dual articulations along the lines of form and content. Simple.

The key here is that it avoids closure; it is a theoretical construct that is anathema to pragmatists, and on that basis, even if my numbers add up, any conclusions I reach with them will be, by virtue of association, strictly provisional.