A Statistical Analysis of the narrators of ‘Ulysses’ or ‘why ‘Ulysses’ isn’t wisdom literature’

The second time I read Ulysses, in advance of an undergraduate seminar, it was around the ninetieth anniversary of the original text’s publication. The newspapers were printing archive material relating to the novel, extended supplements about its importance from the usual quarters, as well as reviews of recently published monographs from both young and established scholars. Unfortunately, the critical trend of the time was to read Ulysses as wisdom literature. Critics urged prospective readers of the novel to wrest Joyce from the scholars and bring him ‘back to the people’. This school of thought treated Leopold Bloom as a model of the way in which the contemporary urban subject should be living: aloof, polite, well-intentioned but not dogmatic on political issues. He is moderately informed but more often wrong; a reader, but not self-serious; an everyman. Ulysses’ structural indebtedness to cornerstones of The Canon such as William Shakespeare’s Hamlet and Homer’s The Odyssey frequently undergirds this line of argument, demonstrative in itself of how easily high literary art and everyday life may be set next to one another. This generally requires critics to treat the characters of Bloom and Stephen Dedalus as two opposites in need of the other. Each has a little to impart on life, love and literature, whether it be to reflect a little deeper on themselves or their marriage, to move past their respective losses or to find in each other their lost son/father.

This interpretation of the novel reads it along a linear trajectory, as Stephen and Bloom come together to form Blephen and Stoom. Through computation it may be possible to examine the writing style of later chapters, and determine whether or not they bear formal witness to this change in character. We must first, however, consider the difficulty of locating where Joyce’s narrators actually are. Part of what makes Joyce’s writing style so unique is his use of free indirect discourse, a mode of writing in which the reality of the text is inflected by the consciousness(es) of the beholder(s). As such, putting a category on each episode of Ulysses as though it were narrated by one person or a combination of persons might seem reductive; it very much is. But in fusing computation and literature, certain assumptions have to be made.

In carrying out this analysis, I made use of R’s ‘Stylo’ package, which contains tools for breaking a number of texts into equal sizes, removing words which are not common to most samples, calculating the relative frequencies of these words, transforming these observations into new combinations of variables called ‘components’ with greater explanatory potential, and clustering them together. These words appear below:

These might seem like boring terms; as literary critics we tend to look past them to more evocative ones like ‘serpentine’ or ‘columbanus’. Unfortunately, in computational terms it is the relative frequencies of these ‘particles’ or ‘function words’ that provide the most secure means of modelling a writer’s particular idiom. These samples were then plotted on a correlation matrix, which can be taken as an index of similarity, based on where they cluster:

The six different narrators of Ulysses appearing in the index above are:

‘Anon’, who narrates the episode ‘Cyclops’

‘Blephen’, a composite delineation for episodes in which both characters feature, such as ‘Circe’, ‘Eumaeus’, ‘Ithaca’ and ‘Oxen of the Sun’

Bloom, who narrates ‘Hades’, ‘Calypso’, ‘Lestrygonians’ and ‘The Lotus Eaters’

Gerty, who narrates at least half of ‘Nausicaa’ (this is a controversial point within the literature; it might be Bloom who is narrating for her)

Molly, who narrates the book’s final chapter ‘Penelope’,

and finally Stephen, who narrates the first three episodes ‘Telemachus’, ‘Nestor’ and ‘Proteus’, as well as the novel A Portrait of the Artist as a Young Man, which has been thrown in here for comparison.

Here’s the same plot as above with the labels more clearly indicated

The first thing we could note is the gender divide. Molly and Gerty both spread over to the right, with Molly as an outlier. Both are more proximate to the A Portrait samples than any other, which are all taken from the earlier parts of the novel, suggesting that Joyce writes women and young children using the same number of words at the same rate. As the Gerty samples move through the episode, they move closer and closer to the Bloom cluster, visually confirming that the episode starts in Gerty’s voice before he takes over, and that Bloom doesn’t think much of women’s intelligence in the main either.

Overall we can say that there doesn’t look to be a fusing of perspectives here as such. Rather than the Blephen episodes meeting halfway between Stephen and Bloom, Stephen and Bloom already seem quite comfortably clustered at the novel’s outset. Based on the divide between Stephen’s episodes of Ulysses and A Portrait, we might say that the way in which Stephen narrates A Portrait is very different from the way in which he narrates Ulysses. This is justified, I think, by how sensitive the analysis is to changes in narrator, demonstrated by the Gerty/Bloom example already discussed, as well as the fact that the earlier part of ‘Aeolus’, in which Bloom is present, clusters with his samples, whereas the second part, after Stephen has entered, clusters with the Stephen samples.

Below is the plot with the Portrait samples removed:

Words Stephen’s narration is most likely to use in comparison to Bloom
Words Bloom’s narration is more likely to use in comparison to Stephen

 

There are a number of ways one could use these results to interrogate the notion of Ulysses as wisdom literature. We could begin by asking after the gendered aspects of the adjective ‘wise’, and ask why so many of these books which teach us how one might best live are written by men (and how tone-deaf this argument can sound because to read Ulysses one might almost think married women weren’t let out of the house) or we could ask what interests an Irish model of bourgeois respectability might serve, along the lines of an Irish ‘keep calm and carry on’ poster.

Reading Ulysses as a guide to life risks rendering it a novel of parts coming together, the middle-class intellectual and the middle-class working stiff holding hands across whatever barricade is supposed to be dividing them. Not that I would go to the other extreme and frame it as one of dissolution. Ulysses’ shape is one I would be loath to put a vector to in fact; to say that Stephen and Bloom’s relationship moves from a) state to b) state would be too easy by half.

What makes Ulysses an interesting novel to me is its self-referentiality, the dialogue it establishes between the novel and its supposed referent of ‘real Dublin’, which is made most clear in ‘Circe’, but also in the book’s other failed attempts to understand itself, as in the cases of the characters referenced as being in particular places at particular times who may or may not be Bloom, the McIntosh mystery or the puzzle of crossing Dublin without passing a pub. In this context, I think ‘Eumaeus’ appearing as a stylistic outlier is significant.

It is in this episode that we get information about a sequence of coincidences, and resonant differences, between Bloom and Stephen’s lives. The depth of these coincidences (which I won’t provide a summary of here, because I think they’re among the most poignant parts of the novel) gestures towards something a bit more cosmically ordered than the rest of the novel, even as they take place within the circumscribed rituals of Irish urban middle-class life in the early twentieth century. ‘Eumaeus’ is written in a chill tone which most closely resembles that of a scientific paper, eliding the indirect discourse which ostensibly defines the rest of the text, and it is in the fact that these connections are raised here rather than anywhere else that the true interest in their relationship, such as it is, is to be found.

These connections, which remain unrealised by the two, rather than bringing us to some Forsterian notion of connection, should raise instead questions of alienation and of their unity in separation. They present problems both epistemological and political, about how our reality is structured, the means through which it is circumscribed and how it is more defined by how little of it we are aware of rather than how much. Rather than teaching us ‘how to live’, Ulysses shows us how we do not live, how we probably won’t live and how it could so easily have been otherwise. It is no more an explanation for life than it is an explanation of itself, or Homer, or Ireland.

Quantifying Modernism and the avant-garde

Introduction and Methodology

This post will document a statistical analysis which was carried out on a corpus of 500 novels. 250 of these texts are generally categorised as ‘realist’ and will be used as a benchmark against which we might define modernist literary style, a mode of writing which arose in the early twentieth century (though it should be noted that this chronology is increasingly subject to revision due to the work of new modernist scholars).

The first novel in the naturalistic corpus, chronologically speaking, is Jane Austen’s novel Lady Susan, which was written in the year 1794. The final one is Thomas Hardy’s novel Jude the Obscure, which was published in 1895. This corpus contains the complete prose works, a phrase here encompassing novels, novellas and short story collections, of fifteen writers: Jane Austen, Emily, Anne and Charlotte Brontë, Stephen Crane, Honoré de Balzac, Charles Dickens, Fyodor Dostoevsky, George Eliot, Gustave Flaubert, Elizabeth Gaskell, Thomas Hardy, William Makepeace Thackeray, Leo Tolstoy and Émile Zola.

The corpus of 250 modernist novels begins in the year 1869, with Henry James’ first bloc of short stories, and continues all the way to Samuel Beckett’s 1988 novella ‘Stirrings Still’, so there is some overlap between these two corpora’s starting and end points. This modernist corpus otherwise consists of the complete works of nineteen writers: Djuna Barnes, Samuel Beckett, Jorge Luis Borges, Elizabeth Bowen, Joseph Conrad, William Faulkner, F. Scott Fitzgerald, Ford Madox Ford, Ernest Hemingway, Henry James, James Joyce, Franz Kafka, D.H. Lawrence, Katherine Mansfield, Flann O’Brien, Marcel Proust, Gertrude Stein, Edith Wharton and Virginia Woolf.

This disproportion between the two corpora, with fifteen realists versus nineteen modernists, may seem disconcerting at first, but what is required in order for the statistical analyses to function is for the number of observations to be equal, rather than the number of novelists. Unfortunately, realist authors wrote more novels than modernist authors, and this compromised our ability to retain the same number of authors on each end of the generic spectrum.

One other aspect to consider is the international dimension. The realist corpus includes ten novelists who wrote in English, but there are also two Russian and three French realists, two of whom, Zola and the aforementioned Balzac, were far more prolific than any other writer in either corpus. Zola and Balzac composed 86 and 34 novels, short story collections or novellas respectively. This has the consequence that well over half of the realist corpus is in translation from another language in comparison to just under 10% of the modernist corpus. I intend to address this when I am at a later stage in my research. There has been some work published on the issues surrounding the quantification of literature in translation and across language, but I do not yet possess a sufficient breadth of knowledge in this field to comment intelligently on the matter. I do think it is important to have French and Russian writers included in the realist corpus on the basis that many of them, be they Tolstoy, Flaubert or Balzac, exerted a significant influence on their modernist successors.

Whether or not these are ‘the best’ or most accurate translations is somewhat beside the point. From the reading I have done around the issue of literary translation, their being subject to change over time is in the nature of how text is received and re-constituted in different eras for different communities of readers (this discussion between Will Self and Kafka’s translators is particularly illuminating in this context; please do not be put off by Self, as he gives the translators plenty of space to discuss the process, and you really should watch it). The germane point here is that the translations being analysed in this instance could not be considered to be the most contemporary. There might be an argument for retaining these older translations on the basis that they are more likely to be the versions of the text which would have been circulating in the early twentieth century and therefore the translations modernist authors would have been more likely to have read, but making this claim would require a greater burden of proof, such as what languages each author read novels in and what their reading habits were more generally.

So, to turn to the analysis. My research is directed towards the quantitative analysis of grammar, the rationale being that we could, by examining varying quantities of particular categories of words, such as verbs, adjectives or prepositions, develop an understanding of how literary fiction changes from the beginning of the nineteenth century until the end of the twentieth, and, more specifically, how literary modernism departs from, or, perhaps remains contiguous with, this previous generation of novel writing. This was carried out using a POS tagger from the Natural Language Toolkit in Python.
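As a rough illustration of the kind of measurement described above, the sketch below computes each part-of-speech tag’s share of a list of already-tagged tokens. It is not the script used for this analysis: the function name `tag_proportions` is hypothetical, and the tagged sample is written by hand with Penn Treebank tags for the sake of the example. In practice the (word, tag) pairs would come from a tagger such as NLTK’s `nltk.pos_tag`, which needs its models downloaded first.

```python
from collections import Counter


def tag_proportions(tagged_tokens):
    """Return each POS tag's share of all tokens, as a percentage."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100 * n / total for tag, n in counts.items()}


# Hand-tagged sample (an assumption for illustration, not tagger output).
sample = [
    ("She", "PRP"), ("walked", "VBD"), ("slowly", "RB"),
    ("to", "TO"), ("the", "DT"), ("door", "NN"),
    ("and", "CC"), ("he", "PRP"), ("waited", "VBD"), (".", "."),
]

proportions = tag_proportions(sample)
print(proportions["PRP"])  # share of personal pronouns in the sample
print(proportions["VBD"])  # share of past-tense verbs in the sample
```

Running a function like this over every novel and averaging by corpus would give figures comparable in kind to those reported in the results, though the exact numbers depend on the tagger and on how punctuation is counted.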

Results

From realism to modernism:

  • Average sentence length decreases by 4 words, from an average of 22 words to 18 words per sentence.
  • Personal pronouns (I, you, he, she, it, we, they, me, him, her, us, and them) increase by 1 percentage point, from 5% to 6%. Interrogative pronouns (who and where), by contrast, decrease by 0.01 percentage points, from 0.03% to 0.02%.
  • Verbs in the past tense increase by 1 percentage point, from 6% to 7%.
  • Adverbs increase by 0.5 percentage points, from 4.5% to 5%.
  • Prepositions (after, in, to, on, and with) decrease by 0.4 percentage points, from 10.9% to 10.5%.
  • Wh-determiners (words beginning with wh, such as ‘where’ or ‘who’, acting to modify the noun phrase) decrease by 0.2 percentage points, from 0.6% to 0.4%.
  • Particles (parts of speech with a grammatical function but no meaning of their own, such as ‘up’ in the phrase ‘I tidied up the room’) increase by 0.1 percentage points, from 0.4% to 0.5%.
  • Non-third-person singular present verbs (verbs in first or second person) decrease by 0.1 percentage points, from 1.6% to 1.5%.
  • Existentials (words such as ‘there’ which indicate that something exists) increase by 0.04 percentage points, from 0.17% to 0.21%.
  • Superlative adjectives (adjectives such as ‘best’, ‘biggest’, ‘worst’) decrease by 0.01 percentage points, from 0.14% to 0.13%.

It will not have escaped your attention that a lot of these percentages are quite small. The extent to which any given text is made up of these hyper-specific categories is pretty minimal in the first place, which is why many of these quantities seem so laughably tiny. Rest assured that they are statistically significant. This does not mean that they are important, which would require a greater burden of proof, more analyses and more exploration, but that they are noteworthy considering the quantities involved.

One boxplot which might be of interest is the one below, which shows the ‘spread’ of the data for average sentence length between realism and modernism.

What we see on the left is the variation of the sentence length data (the term ‘variation’ here meaning the general ‘dispersedness’ of the data) for realism, which goes from 10 to roughly 35 words per sentence with an outlier or two on either end, whereas if we consider modernism, we have everything from zero (Samuel Beckett’s novel How It Is, which has no full stops in it) up to forty-five, with far more outliers on the higher end. Higher outliers are data points with values more than 1.5 times the interquartile range above the third quartile; lower outliers, of which there are three, are more than 1.5 times the interquartile range below the first quartile. For one’s own general knowledge, the modernist outliers for sentence length are:

  • William Faulkner’s Absalom, Absalom! (46.4) and Intruder in the Dust (42.3)
  • Marcel Proust’s Swann’s Way (42.9), In a Budding Grove (40.2), Time Regained (38), The Prisoner (37.2), The Captive (35.7), The Guermantes Way (34.1) and Sodom and Gomorrah (30.9).
  • Samuel Beckett’s Texts for Nothing and The Unnamable have 40.5 and 32.9 words per sentence respectively
  • Gertrude Stein’s novels The Making of Americans and Everybody’s Autobiography have 33.9 and 33.5 respectively.
  • Henry James’ The Ivory Tower and The Young Lovell score 31.8 and 29 respectively.
  • The three lower outlier values for sentence length all belong to Beckett: the aforementioned How It Is, as well as Worstward Ho (4.9) and Ill Seen Ill Said (7).
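The outlier rule described above can be sketched in a few lines. This is a minimal illustration, assuming Python’s `statistics.quantiles` with its default method, which may place the quartiles slightly differently from the software that drew the boxplot; the function name `tukey_outliers` and the sentence-length values are invented for the example.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Split off values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first, second, third quartile
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    low = [v for v in values if v < lower_fence]
    high = [v for v in values if v > upper_fence]
    return low, high


# Invented average sentence lengths, one per novel.
sentence_lengths = [10, 12, 14, 15, 16, 18, 20, 22, 46]
low, high = tukey_outliers(sentence_lengths)
print(low, high)
```

With these made-up figures the quartiles fall at 13 and 21, so the fences sit at 1 and 33, and only the novel averaging 46 words per sentence is flagged.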

It can be tempting, I think, when we see these sorts of names surface so prominently, in conjunction with a visual confirmation of the existence of an avant-garde, to think that modernism in its most pure form was a kind of relentless maximalism, an uncompromising movement towards longer sentences and more pronouns, and that all other manifestations of it are inadequate or insufficient in some way. This is a kind of boring and masculinist overview of the genre, which takes, I think, too many of the claims made by its most dogmatic adherents at face value, and it’s not a modernism I’m particularly interested in defending or instantiating. There can also, of course, be a regressive or rearguard aspect to modernism, which is perceptible in the following boxplot, which displays the distribution of past tense verbs.

As was pointed out above, modernism displays an increase in past tense verbs overall, but here we see a large number of outlier values moving against the overall trend. These novels are:

  • James Joyce’s Ulysses (4.3%) and Finnegans Wake (2.7%)
  • William Faulkner’s As I Lay Dying (4.2%) and Requiem for a Nun (3.6%)
  • Samuel Beckett’s Malone Dies (3.9%), Fizzles (2.5%), Company (2%), Texts for Nothing (1.8%), The Unnamable (1.7%), Worstward Ho (1.6%), Ill Seen Ill Said (1.4%) and a corpus of his miscellaneous and unpublished short fiction (2.2%).
  • Joseph Conrad and Ford Madox Ford’s collaborative novel The Nature of a Crime (2.6%)
  • Virginia Woolf’s The Waves (2.4%)
  • Gertrude Stein’s Tender Buttons (1.7%)

The higher modernism outlier is Virginia Woolf’s 1937 novel The Years (10%) and the lower realism outlier is Balzac’s 1841 novel Letters of Two Brides (2.7%).

In this way we can see that modernism is not just a unidirectional commitment to a narrow sequence of stylistic changes. Instead, it’s a contradictory movement in which a number of different stylistic markers jostle against and subvert one another. In this particular instance, for example, we can perceive the authors most generally understood to be among the most uncompromising (Joyce, Beckett, Stein, Woolf and Faulkner) resisting the overall trend.

From the two boxplots I’ve generated so far, you might have noticed that modernism tends to generate a greater number of outliers, and I can confirm that this trend of a greater degree of grammatical heterogeneity manifesting itself in modernist novel-writing than in naturalistic novel-writing persists across the other categories of grammar, which you can validate by looking at the complete analysis here.

This struck me as an important development, so I quantified the extent of each data point’s outlier-ness, and then grouped them according to author. These values were then divided by the number of outlier data points, because some of these novelists only have a small number of novels in the corpus versus others. Austen’s complete works would be totally outnumbered by Balzac’s, for instance. The results appear below:

Please do note the values on the y-axis; Jane Austen is barely above zero because the only outlier text she wrote is Mansfield Park, which marks itself out for its disproportionate use of adjectives. I thought it better not to exclude her from the plot though, because I didn’t want it to turn into even more of a boys’ club than it might otherwise be. It would be useful, and exciting I think, to conceive of this plot as an indication of early breaches with conventional form, perhaps some nineteenth century anticipations of modernism. Reading Dostoevsky, Zola and Balzac in this manner would all be coterminous with changes taking place in the study of modernism now, but reading Thackeray and Eliot in these terms might be a more surprising development, and I’d be interested to read these texts in light of what we’re seeing here.

The modernism plot for deviation appears below:

The unlabelled entry between Faulkner and James is Hemingway

From this plot we can see that the most avant-gardist prose writers, considered from the perspective of their grammar, appear to be Beckett, Stein, Woolf, Conrad and Joyce. Of course, this is nowhere near a definitive answer as to what modernist style is, or who its most innovative practitioners were; these measurements are atomistic and are quantifying individual words. But style is not just words in isolation, style is agglomerations of words, spaces between words, the clandestine networks and relations the phrases these words add up to compose in the mind of the reader, and, if these digital methodologies are to have any chance of illustrating this shift (an inadequate term in the first instance, since it is more an accumulation of changes distributed over a broad corpus than a sudden or transformational one that we are here concerned with) it is in these cumulative terms that style must be quantified, in order to avoid drifting into the reductive and schematic scientism that numerical analyses of this kind are frequently accused of perpetuating.

Literary Cluster Analysis

I: Introduction

My PhD research will involve arguing that there has been a resurgence of modernist aesthetics in the novels of a number of contemporary authors. These authors are Anne Enright, Will Self, Eimear McBride and Sara Baume. All these writers have, at various public events and in the course of many interviews, given very different accounts of their specific relation to modernism, and even if the definition of modernism wasn’t totally overdetermined, we could spend the rest of our lives defining the ways in which their writing engages, or does not engage, with the modernist canon. Indeed, if I have my way, this is what I will spend a substantial portion of my life doing.

It is not in the spirit of reaching a methodology of greater objectivity that I propose we analyse these texts through digital methods; having begun my education in statistical and quantitative methodologies in September of last year, I can tell you that these really afford us no *better* a view of any text than just reading it would, but fortunately I intend to do that too.

This cluster dendrogram was generated in R, and owes its existence to Matthew Jockers’ book Text Analysis with R for Students of Literature, from which I developed a substantial portion of the code that creates the output above.

What the code is attentive to is the words that these authors use the most. When analysing literature qualitatively, we tend to have a magpie sensibility, zoning in on words which produce more effects or stand out in contrast to the literary matter which surrounds them. As such, the ways in which a writer would use the words ‘the’, ‘an’, ‘a’, or ‘this’ tend to pass us by, but they may be far more indicative of a writer’s style, at least in the way that a computer would be attentive to it; sentences that are ‘pretty’ are generally statistically insignificant.

II: Methodology

Every corpus that you can see in the above image was scanned into R, and then run through code which counted the number of times every word was used in the text. The resulting figure is called the word’s frequency, and was then reduced down to its relative frequency by dividing the figure by the total number of words and multiplying the result by 100. Every word with a relative frequency above a certain threshold was put into a matrix, and a function was used to cluster the corpora together based on the similarity of the figures they contained, according to a Euclidean metric I don’t fully understand.
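The counting step can be sketched as follows. This is a Python illustration rather than the R code used for the dendrogram; the function names `relative_frequencies` and `frequent_words`, the tokenising pattern, the sample sentence and the threshold of 15 are all assumptions made for the example.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of all words in the text, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())  # crude tokeniser (assumption)
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    return {word: 100 * count / total for word, count in counts.items()}


def frequent_words(freqs, threshold):
    """Keep only the words whose relative frequency exceeds the threshold."""
    return {word: f for word, f in freqs.items() if f > threshold}


# Invented ten-word sample standing in for a whole corpus.
freqs = relative_frequencies("The cat and the dog sat by the door today")
print(freqs["the"])               # 'the' is 3 of 10 words
print(frequent_words(freqs, 15))  # only 'the' clears the threshold
```

Doing this once per corpus and lining up the surviving words as columns gives a matrix with one row per author, which is the kind of object the clustering function compares.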

The final matrix was 21 X 57, and compared these 21 corpora on the basis of their relative usage of the words ‘a’, ‘all’, ‘an’, ‘and’, ‘are’, ‘as’, ‘at’, ‘be’, ‘but’, ‘by’, ‘for’, ‘from’, ‘had’, ‘have’, ‘he’, ‘her’, ‘him’, ‘his’, ‘I’, ‘if’, ‘in’, ‘is’, ‘it’, ‘like’, ‘me’, ‘my’, ‘no’, ‘not’, ‘now’, ‘of’, ‘on’, ‘one’, ‘or’, ‘out’, ‘said’, ‘she’, ‘so’, ‘that’, ‘the’, ‘them’, ‘then’, ‘there’, ‘they’, ‘this’, ‘to’, ‘up’, ‘was’, ‘we’, ‘were’, ‘what’, ‘when’, ‘which’, ‘with’, ‘would’, and ‘you’.
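For what it is worth, the Euclidean metric is just the straight-line distance between two rows of the matrix: subtract the figures word by word, square the differences, add them up and take the square root. The sketch below, in Python rather than R, computes that distance for every pair of rows and picks out the closest pair, which is where a hierarchical clustering would make its first join. The author labels and frequencies are invented.

```python
from itertools import combinations
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two equal-length rows of figures."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Invented relative frequencies for three words ('the', 'and', 'of').
corpora = {
    "Author A": [6.0, 3.0, 2.0],
    "Author B": [6.0, 3.0, 2.5],
    "Author C": [3.0, 7.0, 2.0],
}

# Distance between every pair of corpora.
distances = {
    (p, q): euclidean(corpora[p], corpora[q])
    for p, q in combinations(corpora, 2)
}

# The pair with the smallest distance would be joined first.
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

R’s `dist` function uses this metric by default, and a function such as `hclust` then repeats the joining step until every corpus sits on a single tree.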

Anyway, now we can read the dendrogram.

III: Interpretation

Speaking about the dendrogram in broad terms can be difficult for precisely the reason that I indicated above; quantitative/qualitative methodologies for text analysis are totally opposed to one another, but what is obvious is that Eimear McBride and Gertrude Stein are extreme outliers, and comparable only to each other. This is in one way unsurprising, because of their brutish, repetitive styles, and in other ways very surprising, because McBride is on record as dismissing Stein’s work for being ‘too navel-gaze-y.’

Jorge Luis Borges and Marcel Proust have branched off in their own direction, as has Sara Baume, which I’m not quite sure what to make of. Franz Kafka, Ernest Hemingway and William Faulkner have formed their own nexus. More comprehensible is the Anne Enright, Katherine Mansfield, D.H. Lawrence, Elizabeth Bowen, F. Scott Fitzgerald and Virginia Woolf cluster; one could make admittedly sweeping judgements about how this could be said to be modernism’s extreme centre, in which the radical experimentalism of its more revanchiste wing was fused rather harmoniously with nineteenth-century social realism, which produced a kind of indirect discourse at which I think each of these authors excels.

These revanchistes are well represented in the dendrogram’s right wing, with Flann O’Brien, James Joyce, Samuel Beckett and Djuna Barnes having clustered together, though I am not quite sure what to make of Ford Madox Ford/Joseph Conrad’s showing at all, being unfamiliar with the work.

IV: Conclusion

The basic rule in interpreting dendrograms is that the closer the ‘leaves’ reach the bottom, the more similar they can be said to be. Therefore, Anne Enright and Will Self are the contemporary modernists most closely aligned to their forebears, if indeed forebears they can be said to be. It would be harder, from a quantitative perspective, to align Sara Baume with this trend in a straightforward manner, and McBride only seems to correlate with Stein because of how inalienably strange their respective prose styles are.

The primary point to take away here, if there is one, is that more investigations are required. The analysis is hardly unproblematic. For one, the corpus sizes vary enormously. Borges’ corpus is around 46 thousand words, whereas Proust’s reaches somewhere around 1.2 million. In one way, the results are encouraging: Borges and Barnes, two authors with only one text in their corpus, aren’t prevented from being compared to novelists with serious word counts. But in another way, it is pretty well impossible to derive literary measurements from texts without taking their length into account. The next stage of the analysis will probably involve breaking the corpora up into units of 50 thousand words, so that the results for individual novels can be compared.
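One way that next stage might look is sketched below, in Python rather than R. The function name `chunk_words` is hypothetical, and dropping a short final chunk is a choice made here so that every sample has the same length; whether that is the right trade-off for a corpus as short as Borges’ is an open question.

```python
def chunk_words(words, size, drop_remainder=True):
    """Split a list of words into consecutive chunks of `size` words each."""
    if size <= 0:
        raise ValueError("size must be a positive number of words")
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    # Optionally discard a final chunk that is shorter than the rest.
    if drop_remainder and chunks and len(chunks[-1]) < size:
        chunks.pop()
    return chunks


# Tiny invented example; for the corpora above, size would be 50000.
words = "one two three four five six seven eight nine ten eleven".split()
equal_chunks = chunk_words(words, 5)
all_chunks = chunk_words(words, 5, drop_remainder=False)
print(len(equal_chunks), len(all_chunks))
```

Each chunk could then be counted and clustered as its own sample, so that a 1.2 million word corpus contributes many leaves to the dendrogram rather than one.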

Re-reading Eimear McBride’s ‘A Girl is a Half-Formed Thing’

A book that I’m looking forward to reading, that doesn’t exist yet, is an academic account of how Irish contemporary fiction went, in such a short space of time, from social realism to the precociously sentenced art writing with dissociative narrators that now composes the Irish literary milieu. It’s the sort of thing that was probably brewing for a long time (these trends tend to be), but I first became aware of it when Eimear McBride’s A Girl is a Half-Formed Thing was published in 2013. It caused a bit of a stir in the literary press at the time, for its supposed uncompromising experimentalism, and its fraught, J.K. Rowling-esque publication history. Critics compared it to Marcel Proust or Samuel Beckett, but I don’t think there was a single review that didn’t mention James Joyce.

In the works of Sara Baume, Joanna Walsh or Claire-Louise Bennett, there are certainly comparisons to be made along these lines, but I think McBride is the novelist of the current generation who is suffering most egregiously under these comparisons. This leads to a kind of distortion that McBride has spoken about recently, saying that it’s ‘a way of not being seen’. Claire Lowdon, writing on McBride’s prose style in Areté, has used the Joyce comparisons as a way of demeaning the novel’s experimental qualities, saying that they are ‘redundant’ and ‘artificial’:

Having invoked Joyce, Joyce has to be McBride’s standard. She has taken all the difficulty and none of the brilliance.

Lowdon’s reading is important and thorough, but I have problems with it. The most significant one is that I think it’s nonsensical to say that just because a work is in some way formally indebted to Joyce it has to be 1) as good, 2) as innovative and 3) as good and as innovative in exactly the same ways. I think it’s a very strange point to make that we should benchmark a writer relative to their influences, particularly when this is a comparison furthered more by the laziness of critics than something that McBride has taken upon herself. It’s also inadequate to assume McBride and Joyce’s modernisms are coterminous; I happen to think that they’re rather distinct in a number of significant ways.

Firstly, it’s clear that A Girl is more formally aligned with the Wake than with Ulysses, but taken relative to the former, A Girl manifests far less attention to the materiality of language. In A Girl, there are fewer puns, fewer references and fewer leitmotifs. It’s also possible to make sense of A Girl without reference to other works. But it’s a mistake to regard this as McBride’s failure to live up to her twentieth century modernist aesthetics. An example from the novel’s opening that Lowdon cites reads as follows:

For you. You’ll soon. You’ll give her name. In the stitches of her skin she’ll wear your say. Mammy me? Yes you. Bounce the bed I’d say. I’d say that’s what you did. Then lay you down. They cut you round. Wait and hour and day.

‘Wait and hour and day’ carries with it a vague association with the phrase ‘a year and a day’, but it doesn’t strictly make sense in that context; there’s no clear reason for the semantic distortion. But there’s also no requirement that there be one, nor that it add up to some enormous mythic framework in the same way that the Wake does. I think that once we approach the novel from this position, one which takes account of McBride’s actual concerns, we’ll be able to come to a more sophisticated understanding that doesn’t amount to downgrading her because of her perceived inadequacy in relation to Joyce.

By her own admission McBride retains an interest in nineteenth century novels with less self-consciousness about their language or processes of meaning-making. She has cited the work of the Russian novelist Fyodor Dostoevsky as significant, particularly as an example of proto-modernism, or modernism in a nascent stage of its development, wherein human intersubjectivity was beginning to make itself known within the novel while the tenets of realistic fiction were still trying to accommodate it. Although The Lesser Bohemians is not the novel under discussion, it’s important to note the way in which it demonstrates this interplay. Within the context of what has been referred to by the author as a ‘modernist monologue’ there is a very sensationalistic narrative in which a character lays out their life story in a very direct and straightforward manner, in the same way that you might find extended and directly rendered narratives nested within nineteenth century novels. McBride has said that this is a very deliberate formal mechanic which is pertinent to the text’s thematic concerns, as it is a novel about relating to another person in spite of one’s traumatic past:

In the end you tell a person and you have to use the words that they’ll understand.

What makes McBride’s modernism distinct then, is the centrality it gives to the conveying of narrative information, deploying it as a means of bringing the reader closer to

physical experience, to write about the female experience…the reader can partake in the experience.

McBride has said that the language of A Girl was written in a way that would create a physical experience for the reader, an immediacy on the page that is reminiscent of theatre. She’s expressed frustration at the content of many of her reviews, which have emphasised the quality of the language at the expense of the novel’s content, which she regards as very significant. This stands in contrast to the tradition of the Wake or other modernist works famed for their unintelligibility, such as Gertrude Stein’s The Making of Americans: Being a History of a Family’s Progress, a novel that she has spoken about dismissively for being ‘too navel-gaze-y.’

This stated interest in what the book is ‘about’, and a reader-centric ethic, is, I think, at least a partial reversal of expectations within the modernist tradition. McBride’s modernism is therefore conceptualised, not as a constructed textual estrangement from reality, but as an attempt to bring it closer, to a dwelling-place of authentic being. Not that it’s likely to close off such comparisons in the future.

Can a recurrent neural network write good prose?

At this stage in my PhD research into literary style I am looking to machine learning and neural networks, and moving away from stylostatistical methodologies, partially out of fatigue. Statistical analyses are intensely process-based and always open, it seems to me, to fairly egregious ‘nudging’ in the name of reaching favourable outcomes. This brings a kind of bathos to some statistical analyses, as they account, to a greater extent than I’d like, for methodology and process, with the result that the novelty these approaches might have brought us is neglected. I have nothing against this emphasis on process necessarily, but I do also have a thing for outcomes, as well as the mysticism and relativity machine learning can bring, alienating us as it does from the process of the script’s decision making.

I first heard of the sci-fi writer from a colleague of mine in my department. It’s Robin Sloan’s plug-in for the text editor Atom, which allows you to ‘autocomplete’ texts based on your input. After sixteen hours of installing, uninstalling, moving directories around and looking up Stack Overflow, I got it to work. I typed in some Joyce and got stuff about Chinese spaceships as output, which was great, but science fiction isn’t exactly my area, and I wanted to train the network on a corpus of modernist fiction. Fortunately, I had the complete works of Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott Fitzgerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner & D.H. Lawrence to hand.

My understanding of this recurrent neural network, such as it is, runs as follows. The script reads the entire corpus of over 100 novels, and calculates the distance that separates every word from every other word. The network then hazards a guess as to what word follows the word or words that you present it with, then validates this against what actually follows. It then does so over and over and over, getting ‘better’ at predicting each time. The size of the corpus is significant in determining the length of time this will take, and mine required something around twelve days. I had to cut it off after twenty-four hours because I was afraid my laptop wouldn’t be able to handle it. At this point it had carried out the process 135000 times, just below 10% of the full process. Once I get access to a computer with better hardware I can look into getting better results.
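The guess-then-check loop described above can be sketched in miniature. What follows is not a recurrent network and not Sloan's code: it is a toy, memoryless model that predicts the next character from the current one only, written in Python purely for illustration. The function name, the learning rate and the number of steps are all assumptions of mine.

```python
import math
import random

def train_next_char_model(text, steps=2000, learning_rate=0.5, seed=0):
    """Toy predictor: guess the next character, compare with the actual one, adjust."""
    rng = random.Random(seed)
    chars = sorted(set(text))
    index = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    # weights[i][j] scores how strongly character i is followed by character j.
    weights = [[0.0] * n for _ in range(n)]
    pairs = [(index[a], index[b]) for a, b in zip(text, text[1:])]
    losses = []
    for _ in range(steps):
        current, actual = rng.choice(pairs)
        scores = weights[current]
        # Softmax turns the scores into a probability for each candidate character.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # The loss is large when the actual next character was given low probability.
        losses.append(-math.log(probs[actual]))
        # Gradient step: raise the score of the actual character, lower the rest.
        for j in range(n):
            target = 1.0 if j == actual else 0.0
            scores[j] -= learning_rate * (probs[j] - target)
    return chars, weights, losses
```

Called on a short repetitive string, the recorded losses should fall as the steps accumulate, which is all that getting ‘better’ means here. A real recurrent network differs in carrying a hidden state forward, so that its guesses can depend on more than the single preceding character.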

How this will feed into my thesis remains nebulous; I might move in a sociological direction and take survey data on how closely readers reckon the final result approximates literary prose. But at this point I’m interested in what impact it might conceivably have on my own writing. I am currently trying to sustain progress on my first novel alongside my research, so, in a self-interested enough way, I pose the question: can neural networks be used in the creation of good prose?

There have been many books written on the place of cliometric methodologies in literary history. I’m thinking here of William S. Burroughs’ cut-ups, Mallarmé’s infinite book of sonnets, and the brief flirtation the literary world had with hypertext in the 90s, but beyond the avant-garde, I don’t think I could think of an example of an author who has foregrounded their use of numerical methods of composition. A poet friend of mine has dabbled in this sort of thing but finds it expedient to not emphasise the aleatory aspect of what she’s doing, as publishers tend to give a frosty reception when their writers suggest that their work is automated to some extent.

And I can see where they’re coming from. No matter how good they get at it, I’m unlikely to get to a point where I’ll read automatically generated literary art. Speaking for myself, when I’m reading, it is not just about the words. I’m reading Enright or Woolf or Pynchon because I’m as interested in them as I am in what they produce. How synthetic would it be to set Faulkner and McCarthy in conversation with one another if their congruencies were wholly manufactured by outside interpretation or an anonymous algorithmic process as opposed to the discursive tissue of the literary sphere, if a work didn’t arise from material and actual conditions? I know I’m making a lot of value-based assessments here that wouldn’t have a place in academic discourse, and on that basis what I’m saying is indefensible, but the probabilistic infinitude of it bothers me too. When I think about all the novelists I have yet to read I immediately get panicky about my own death, and the limitless possibilities of neural networks to churn out tomes and tomes of literary data in seconds just seems to me to exacerbate the problem.

However, speaking outside of my reader-identity, as a writer, I find it invigorating. My biggest problem as a writer isn’t writing nice sentences; given enough time I’m more than capable of that. The difficulty is finding things to wrap them around. Mood, tone, image, aren’t daunting, but a text’s momentum, the plot, I suppose, eludes me completely. It’s not something that bothers me; I consider plot to be a necessary evil, and resent novels that suspend information in a deliberate, keep-you-on-the-hook sort of way, but the ‘what next’ of composition is still a knotty issue.

The generation of text could be a useful way of getting an intelligent prompt that stylistically ‘borrows’ from a broad base of literary data, smashing words and images together in a generative manner to get the associative faculties going. I’m not suggesting that these scripts would be successful were they autonomous, I think we’re a few years off one of these algorithms writing a good novel, but I hope to demonstrate that my circa 350 generated words would be successful in facilitating the process of composition:

be as the whoo, put out and going to Ingleway effect themselves old shadows as she was like a farmers of his lake, for all or grips — that else bigs they perfectly clothes and the table and chest and under her destynets called a fingers of hanged staircase and cropping in her hand from him, “never married them my said?” know’s prode another hold of the utals of the bright silence and now he was much renderuched, his eyes. It was her natural dependent clothes, cattle that they came in loads of the remarks he was there inside him. There were she was solid drugs.

“I’m sons to see, then?’ she have no such description. The legs that somewhere to chair followed, the year disappeared curl at an entire of him frwented her in courage had approached. It was a long rose of visit. The moment, the audience on the people still the gulsion rowed because it was a travalious. But nothing in the rash.

“No, Jane. What does then they all get out him, but? Or perfect?”

“The advices?”

Of came the great as prayer. He said the aspect who, she lay on the white big remarking through the father — of the grandfather did he had seen her engoors, came garden, the irony opposition on his colling of the roof. Next parapes he had coming broken as though they fould

has a sort. Quite angry to captraita in the fact terror, and a sound and then raised the powerful knocking door crawling for a greatly keep, and is so many adventored and men. He went on. He had been her she had happened his hands on a little hand of a letter and a road that he had possibly became childish limp, her keep mind over her face went in himself voice. He came to the table, to a rashes right repairing that he fulfe, but it was soldier, to different and stuff was. The knees as it was a reason and that prone, the soul? And with grikening game. In such an inquisilled-road and commanded for a magbecross that has been deskled, tight gratulations in front standing again, very unrediction and automatiled spench and six in command, a

I don’t think I’d be alone in thinking that there’s some merit in parts of this writing. I wonder if there’s an extent to which Finnegans Wake has ‘tainted’ the corpus somewhat, because stylistically, I think that’s the closest analogue to what could be said to be going on here. Interestingly, it seems to be formulating its own puns: words like ‘unrediction,’ ‘automatiled spench’ (a tantalising meta-textual reference I think) and ‘destynets’ would all be reminiscent of what you could expect to find in any given section of the Wake, but they don’t turn up in the corpus proper, at least according to a ctrl + f search. What this suggests to me is that the algorithm is plotting relationships on the level of the character, as well as phrasal units. However, I don’t recall the sci-fi model turning up paragraphs that were quite so disjointed and surreal; they didn’t make loads of sense, but they were recognisable as grammatically coherent chunks of text. This could, though, be the result of working with a partially trained model.
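The ctrl + f check for coinages could be automated along the following lines. This is a minimal Python sketch of my own devising rather than anything I actually ran: the function names are hypothetical, and it assumes the corpus and the generated passage are each available as a single plain-text string.

```python
import re

def tokenise(text):
    """Lower-case the text and pull out runs of letters, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def words_absent_from_corpus(generated_text, corpus_text):
    """Return the generated words that never occur as whole words in the corpus."""
    corpus_vocabulary = set(tokenise(corpus_text))
    absent = []
    for word in tokenise(generated_text):
        if word not in corpus_vocabulary and word not in absent:
            absent.append(word)
    return absent
```

One difference from a ctrl + f search is worth flagging: this matches whole words only, whereas ctrl + f would also find a string buried inside a longer word, so the two checks can disagree.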

So, how might they feed our creative process? Here’s my attempt at making nice sentences out of the above.

— I have never been married, she said. — There’s no good to be gotten out of that sort of thing at all.

He’d use his hands to do chin-ups, pull himself up over the second staircase that hung over the landing, and he’d hang then, wriggling across the awning it created over the first set of stairs, grunting out eight to ten numbers each time he passed, his feet just missing the carpeted surface of the real stairs, the proper stairs.

Every time she walked between them she would wonder which of the two that she preferred. Not the one that she preferred, but the one that were more her, which one of these two am I, which one of these two is actually me? It was the feeling of moving between the two that she could remember, not his hands. They were just an afterthought, something cropped in in retrospect.

She can’t remember her sons either.

Her life had been a slow rise, to come to what it was. A house full of men, chairs and staircases, and she wished for it now to coil into itself, like the corners of stale newspapers.

The first thing you’ll notice about this is that it is a lot shorter. I started off by translating the above, in as much as possible, into ‘plain words’ while remaining faithful to the n-grams I liked, like ‘bright silence’, ‘old shadows’ and ‘great as prayer’. In order to create images that play off one another, and to account for the dialogue, sentences that seemed to be doing similar things began to cluster together, so paragraphs organically started to shrink. Ultimately, once the ‘purpose’ of what I was doing started to come out, a critique of bourgeois values, memory loss, the nice phrasal units started to become spurious, and the eight or so paragraphs collapsed into the three and a half above. This is also one of my biggest writing issues: I’ll type three full pages and after the editing process they’ll come to no more than 1.5 paragraphs, maybe?

The thematic sense of dislocation and fragmentation could be a product of the source material, but most things I write are about substance-abusing depressives with broken brains cos I’m a twenty-five year old petit-bourgeois male. There’s also a fairly pallid Enright vibe to what I’ve done with the above, I think the staircases line could come straight out of The Portable Virgin.

Maybe a better-trained model could provide better prompts, but overall, if you want better results out of this for any kind of creative praxis, it’s probably better to be a good writer.

A Deleuzian Theory of Literary Style

I’m always surprised when I read one of the thinkers generally, and perhaps lazily, lumped into the general category of post-structuralist, when I find how great a disservice the term does to their work. To read Derrida, Foucault or Deleuze is not to find a triad of philosophers who struggle to produce a coherent system via addled half-thoughts in order to deconstruct, stymie or relativise everything. In fact, I’m not sure there’s another philosopher I’ve read who displays greater attention to detail in their work than Derrida, and Deleuze, far from being a deconstructionist, presents us with painstaking and intricate schemata and models of thought. The rhizome, to take the most well-known concept associated with Deleuze and his collaborator, Félix Guattari, doesn’t provide us with a free-for-all, but an intricately worked-out model to enable further thought. Difference and Repetition is likewise painstaking, and so involved is Deleuze’s model of difference that applying it in great depth to my theory of literary style might be something to do if one wished to be a mad person, particularly since, at an early stage in the work, he attempts to map his concepts to particular authors, such as Borges, Joyce, Beckett and Proust. But I’ll do my best.

My notion of literary style has been influenced by the fact of my dealing with the matter via computation, i.e. multi-variate analysis and machine learning. All the reading I’m doing on the subject is leading me towards a theory of literary style founded on redundancy. When I say redundancy, I don’t mean that what distinguishes literary language from ‘normal’ language is its superfluity, an excess of that which it communicates. For the Russian formalists, this was key in defining literary language, its surfeit of meaning. I don’t like this distinction much, as it assumes that we can neatly cleave necessary communication from unnecessary communication, as if there were a clear demarcation between the words we use for their usage (utilitarian) and the words we use for their beauty (aesthetic). The lines between the two are generally blurred, and both can reinforce the function of the other. The shortcomings of this category become yet more evident when we take into account authors who might have a plain style, works which depend on a certain reticence to speak. Of course, a certain degree of recursion sets in here, as we could argue that it is in the showcased plainness of these writers that the superfluity of the work manifests itself. Which presents us with the inevitable conclusion that the definition is flawed because it’s a tautology; it’s excessive because it’s literary, it’s literary because it’s excessive.

My own idea of redundancy comes from a number of articles in the computational journal Literary and Linguistic Computing, the entire corpus of which, from the mid-nineties until today, I am slowly making my way through. It provides an interesting narrative of the ways in which computational criticism has evolved in these years. At first, literary critics would have been sure that the words that traditional literary criticism tends to emphasise, the big ones, the sparkly ones, the nice ones, were most indicative of a writer’s style. What practitioners of algorithmic criticism have come to realise, however, is that it is the ‘particles’ of literary matter that are far more indicative of a writer’s style: the distribution of words such as ‘the’, ‘a’, ‘an’, ‘and’, ‘said,’ which are sometimes left out of corpus stylistics altogether, dismissed as ‘stopwords,’ bandied about too often in textual materials of all kinds to be of any real use. It’s a bit too easy, with the barest dash of an awareness of how coding works, to start slipping into generalisations along the lines of neuroscience, so I won’t go too mad, but I will say that this is an example of the ways in which humans tend to identify patterns, albeit maybe not necessarily the determining, or most significant patterns, in any given situation.
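To make the point about ‘particles’ concrete, here is a minimal Python sketch of a function-word profile: the share of a text taken up by each of the small words named above. It is an illustration only, not the method of any particular article; the function name and the deliberately crude tokeniser (which splits on anything that isn’t a letter, so contractions break apart) are my own assumptions.

```python
import re
from collections import Counter

# The five words named in the paragraph above; a real study would use a longer list.
FUNCTION_WORDS = ("the", "a", "an", "and", "said")

def function_word_profile(text, function_words=FUNCTION_WORDS):
    """Return each function word's relative frequency (occurrences / total words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: (counts[word] / total if total else 0.0) for word in function_words}
```

Two authors could then be compared by setting their profiles side by side, the working assumption being that these rates are steadier across an author’s texts than the rates of the sparkly words.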

We’re magpies when we read, for better or worse. When David Foster Wallace re-instates the subject of a clause at its end, a technique he becomes increasingly reliant on as Infinite Jest proceeds, we notice it, and it comes increasingly to the fore in our sense of his style. But in the grand scheme of the thousand-odd-page novel, the extent to which this technique is made use of is, statistically speaking, insignificant. Sentences like ‘She tied the tapes,’ in Between the Acts, for instance, pass our awareness by because of their pedestrian qualities, much like many other sentences that contain words such as ‘said,’ because of the extent to which any text’s fabric is predominantly composed of such filler.

In Difference and Repetition, Deleuze is concerned with reversing a trend within Western philosophy, to mis-read the nature of difference, which he traces back to Plato and Kant, and the idealist/transcendentalist tendencies within their thought. They believed in singular, ideal forms, against which the notion of the Image is pitched, which can only be inferior, a simulacrum, as it is a derivative copy. Despite his model of the dialectic, Hegel is no better when it comes to comprehending difference; Deleuze sees the notion of synthesis as profoundly damaging to difference, as the third-way synthesis has a tendency to understate it. Deleuze dismisses the process of the dialectic as ‘insipid monocentrality’. Deleuze’s issue seems to be that our notions of identity only allow difference into the picture as a rupture, or an exception which vindicates an overall sense of homogeneity. Difference should be emphasised to a greater extent, and become a principle of our understanding:

Such would be the nature of a Copernican revolution which opens up the possibility of difference having its own concept, rather than being maintained under the domination of a concept in general already understood as identical.

Recognising this would be the advent of difference-in-itself.

This is all fairly consistent with Deleuze’s sense of Being as being (!) in a constant state of becoming, an experiential-led model of ontology which doesn’t aim for essence, but praxis. It would be fairly unproblematic to map this onto literary style; literary stylistics should likewise depend on difference, rather than similarity which only allows difference into the picture as a rupture; difference should be our primary criterion when examining the ways in which style becomes itself.

Another tendency of the philosophical tradition as Deleuze understands it is a belief in the goodness of thought, and its inclination towards moral, useful ends, as embodied in the works of Descartes. Deleuze reminds us of myopia and stupidity, by arguing that thought is at its most vital when at a moment of encounter or crisis, when ‘something in the world forces us to think.’ These encounters remind us that thought is impotent and require us to violently grapple with the force of these encounters. This is not only an attempt to reverse the traditional moral image of thought, but to move towards an understanding of thought as self-engendering, an act of creation, not just of what is thought, but of thought itself.

It would be to take the least radical aspect of this conclusion to fuse it with the notion of textual deformance, developed by Jerome McGann, which is of particular magnitude within the digital humanities, considering that we often process our text via code, or visualise it, and build arguments from these simulacra. But, on a level of reading which is, technologically speaking, less sophisticated, it reflects the way in which we generate a stylistic ideal as we read, a sense of a writer’s style, whether this be based on the analogue, magpie method (or something more systematic, I don’t want to discount syllable-counts, metric analyses or close readings of any kind) or quantitative methodologies.

By bringing ourselves to these points of crisis, we will open up avenues at which fields of thought, composed themselves of differential elements, differential relations and singularities, will shift, and bring about a qualitative difference in the environment. We might think of this field in terms of a literary text, a sequence of actualised singularities, appearing aleatory outside of their anchoring context as within a novel. Readers might experience these as breakthrough moments or epiphanies when reading a text, realising that Infinite Jest apes the plot of William Shakespeare’s Hamlet, for example, as it begins to cast everything in a new light. In this way, texts are made and unmade according to the conditions which determine them. I, for one, find this to be so much more helpful in articulating what a text is than the blurb for post-structuralism (something like ‘endlessly deferred free-play of meaning’). Instead, we have a radical, consistently disarticulating and re-articulating literary artwork in a perpetual, affirming state of becoming, actualised by the reader at a number of sensitive points which at any stage might be worried into bringing about a qualitative shift in the work’s processes of meaning making.

Modernist Stylistic Variables

The question that this blog post sets itself is: what differences and similarities can be detected in modernist and contemporary authors on the basis of three stylistic variables (hapax, unique and ambiguity), and how are these stylistic variables related to one another?

I: The Data

The data to be analysed in this project were derived from an analysis of twenty-one corpora of avant-garde literary prose through use of the open-source programming language R. The complete works of the authors James Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott Fitzgerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner & D.H. Lawrence were used.

Seventeen of these writers were active between the years 1895 and 1968, a period of time associated with a genre of writing referred to as ‘modernist’ within the field of literary criticism. The remaining four are still alive, and have novels published as early as 1991, and as late as 2016. These novelists are known for their identification as latter-day modernists, and perceive their novels as re-engaging with the modernist aesthetic in a significant way.

I.II Uniqueness

The unique variable is a generally accepted measurement used within digital literary criticism to quantify the ‘richness’ of a particular text’s vocabulary. The formula for uniqueness is obtained by dividing the number of distinct word types in a text by the total number of words. For example, if a novel contained 20000 word types, but 100000 total words, the formula for obtaining this text’s uniqueness would be as follows:

uniqueness = 20000 / 100000 = 0.2
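As a hedged illustration, the same calculation can be sketched in a few lines of Python. The study itself was carried out in R, so this is a stand-in rather than the code that produced the data, and the tokeniser (lower-casing, then taking runs of letters) is an assumption of mine that will affect the exact figures.

```python
import re

def uniqueness(text):
    """Distinct word types divided by total words (the type-token ratio)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For the eight-word string "the cat and the dog and the bird", which contains five distinct types, this gives 5 / 8.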

I.III Ambiguity

Ambiguity is a measure used to calculate the approximate obscurity of a text, or the extent to which it is composed of indefinite pronouns. The indefinite pronouns quantified in this study are as follows: ‘another’, ‘anybody’, ‘anyone’, ‘anything’, ‘each’, ‘either’, ‘enough’, ‘everybody’, ‘everyone’, ‘everything’, ‘little’, ‘much’, ‘neither’, ‘nobody’, ‘no one’, ‘nothing’, ‘one’, ‘other’, ‘somebody’, ‘someone’, ‘something’, ‘both’, ‘few’, ‘everywhere’, ‘somewhere’, ‘nowhere’, ‘anywhere’, ‘many’, ‘others’, ‘all’, ‘any’, ‘more’, ‘most’, ‘none’, ‘some’, ‘such’. The formula for ambiguity is:

number of indefinite pronouns / number of total words
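A minimal Python sketch of this measure follows, again as a stand-in for the R used in the study. The word list is the one given above; the tokeniser and the handling of the two-word item ‘no one’ (counted once as a phrase, without also counting the ‘one’ inside it) are assumptions of mine, since the formula does not specify them.

```python
import re

# Single-word items from the list above; 'no one' is handled separately below.
INDEFINITE_PRONOUNS = {
    "another", "anybody", "anyone", "anything", "each", "either", "enough",
    "everybody", "everyone", "everything", "little", "much", "neither",
    "nobody", "nothing", "one", "other", "somebody", "someone", "something",
    "both", "few", "everywhere", "somewhere", "nowhere", "anywhere", "many",
    "others", "all", "any", "more", "most", "none", "some", "such",
}

def ambiguity(text):
    """Indefinite pronouns divided by total words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = 0
    i = 0
    while i < len(tokens):
        # Treat 'no one' as a single hit and skip past both of its words.
        if tokens[i] == "no" and i + 1 < len(tokens) and tokens[i + 1] == "one":
            hits += 1
            i += 2
            continue
        if tokens[i] in INDEFINITE_PRONOUNS:
            hits += 1
        i += 1
    return hits / len(tokens)
```

Note that the denominator still counts ‘no’ and ‘one’ as two words, which is one of several small decisions that would need to be fixed before comparing figures across studies.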

I.IV Hapax

Finally, the hapax variable calculates the density of hapax legomena, words which appear only once in a particular author’s oeuvre. The formula for this variable is:

number of hapax legomena / number of total words
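The hapax formula can be sketched in the same way. This is a Python illustration rather than the R used for the study, and it assumes the author’s whole oeuvre has been joined into one string and tokenised as runs of lower-cased letters.

```python
import re
from collections import Counter

def hapax_density(text):
    """Words occurring exactly once, divided by total words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(tokens)
```

In the string "the cat and the dog and the bird", three of the eight words (‘cat’, ‘dog’, ‘bird’) occur only once, giving 3 / 8.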

a bar chart giving an overview of the data

II: Data Overview

Even before analysing the data in great depth, the fact that these variables are interrelated stands to reason. Hapax and unique are best understood as indications of a text’s heterogeneity: if a text is hapax-rich, the score for uniqueness will be similarly elevated. Ambiguity, as it is a set of pre-defined words, can be considered a measure of a text’s homogeneity, and if the occurrences of these commonplace words are increasing, hapax and uniqueness will be negatively affected. The aim of this study will be to first determine how these measures vary according to the time frame in which the different texts were written, i.e. across modern and contemporary corpora, which correlations between stylistic variables exist, and which of the three is most subject to the fluctuations of another.

more overviews for each variable

IV.I: The Three Groups Hypothesis

A number of things are clear from these representations of the data. The first finding is that the authors fall into approximately three distinct groups. The first is the base-level of early twentieth-century modernist authors, who are all relatively undifferentiated. These are Ernest Hemingway, Virginia Woolf, William Faulkner, Elizabeth Bowen, Marcel Proust, F. Scott Fitzgerald, D.H. Lawrence, Joseph Conrad and Ford Madox Ford. They are all below the mean for the hapax and unique variables.

boxplot of outliers for the unique hapax variable

The second group reaches into more extreme values for unique and hapax. These are Djuna Barnes, Jorge Luis Borges, Franz Kafka, Flann O’Brien, James Joyce, Eimear McBride and Sara Baume. Three of these authors are even outliers for the hapax variable, which can be seen in the box plot.

Joyce’s position as an extreme outlier in this context is probably due to his novel Finnegans Wake (1939), which was written in an amalgam of English, French, Irish, Italian and Norwegian. It’s no surprise then, that Joyce’s value for hapax is so high. The following quotation may be sufficient to give an indication of how eccentric the language of the novel is:

La la la lach! Hillary rillarry gibbous grist to our millery! A pushpull, qq: quiescence, pp: with extravent intervulve coupling. The savest lauf in the world. Paradoxmutose caring, but here in a present booth of Ballaclay, Barthalamou, where their dutchuncler mynhosts and serves them dram well right for a boors’ interior (homereek van hohmryk) that salve that selver is to screen its auntey and has ringround as worldwise eve her sins (pip, pip, pip)

Though Borges’ and Barnes’ prose may not be as far removed from modern English as Finnegans Wake, both of these authors are known for their highly idiosyncratic use of language; Borges for his use of obscure terms derived from archaic sources, and Barnes for reversing normative grammatical and syntactic structures in unique ways.

The third and final group may be thought of as an intermediary between these two extremes, and these are Katherine Mansfield, Samuel Beckett, Will Self and Anne Enright. These authors share characteristics of both groups, in that their values for ambiguity remain stable, while their uniqueness and hapax counts are far more pronounced than those of the first group, though not to the extent that they reach the values of the second group.

boxplot displaying stein as an extreme outlier for ambiguity

Gertrude Stein is the only author whose stylistic profile doesn’t quite fit into any of the three groups. She is perhaps best thought of as most closely analogous to the first group of early twentieth-century modernists, but her extreme value for ambiguity should be sufficient to distinguish her in this regard.

The value for ambiguity remains fairly stable throughout the dataset: the standard deviation is 0.03, and if Stein’s values are removed, it narrows to 0.01.
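The effect of a single outlier on the standard deviation can be checked with a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the values in the usage note are invented placeholders and not the study’s figures, and it uses the sample standard deviation, which is an assumption on my part.

```python
import statistics

def spread_with_and_without(values_by_author, outlier):
    """Return (standard deviation of all values, standard deviation without the outlier)."""
    everyone = list(values_by_author.values())
    trimmed = [v for author, v in values_by_author.items() if author != outlier]
    # statistics.stdev is the sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(everyone), statistics.stdev(trimmed)
```

Passing in a dictionary such as {"A": 0.05, "B": 0.06, "C": 0.04, "D": 0.05, "Outlier": 0.15} (made-up numbers) and naming "Outlier" returns a pair in which the second value is much the smaller, which is the pattern described above.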

Two disclaimers need to be made about this general account from the descriptive statistics and graphs. The first is that there is a fundamental issue with making such a schematic account of these texts. The grouping approach that this project has taken thus far is insufficiently nuanced, as it could probably be argued that McBride could just as easily fit into the third group as the second. The second, which follows from this, is that the stylistic variables do not adequately distinguish modern and contemporary corpora from one another.

IV.II Word Count

word count for the most prolific authors

It should not escape our attention that the authors who score lowest for each variable, the first group of early twentieth-century authors, are also the most prolific. The correlation between word count and the stylistic variables was therefore calculated.

Pearson correlation for word count and stylistic variables

Both the Pearson correlation and Spearman’s rho suggest that word count is highly negatively correlated with hapax and unique (as word count increases, hapax and unique decrease and vice versa), but not with ambiguity.

Spearman’s rho for word count and stylistic variables

The fact that the Spearman’s rho scores significantly higher than the Pearson suggests that the relationship between the two is non-linear. This can be seen in the scatter plot.
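Why a gap between the two coefficients points to non-linearity can be shown with a small Python sketch. The correlations in this study were produced in SPSS, so what follows is a from-scratch illustration with hypothetical function names, not a reproduction of that output. Pearson measures how well a straight line fits the values; Spearman’s rho is simply Pearson applied to the ranks of the values, so it only asks whether one variable consistently falls as the other rises.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: strength of the straight-line relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

def ranks(values):
    """Rank from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(xs), ranks(ys))
```

For a curve that always falls but not in a straight line, such as y = 1 / x, Spearman’s rho comes out at the extreme value of -1 while Pearson is weaker, which mirrors the pattern reported for word count against hapax and unique.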

scatter plot showing the relationship between word count and uniqueness

In the case of both variables, the correlation is obviously negative, but the data points fall in a non-linear way, suggesting that the Spearman’s rho is the better measure for calculating the relationship. In both cases it would seem that Joyce is the outlier, and most likely to be the author responsible for distorting the correlation.

scatter plot displaying the relationship between word count and hapax density
Pearson correlations for word count and each stylistic variable

SPSS flags the correlation between hapax and unique as being significant, and this is clearly the most noteworthy relationship between the three stylistic variables. The Spearman’s rho exceeded the Pearson correlation by a marginal amount, and it was therefore decided that the relationship was non-linear, which is confirmed by the scatter plot below:

Spearman’s rho correlation for word count and stylistic variables

The stylistic variables of unique and hapax are therefore highly correlated.

VI: Conclusion

As was said already, the notion that stylistic variables are correlated stands to reason. However, it was not until the correlation tests were carried out that the extent to which uniqueness and hapax are determined by one another was made clear.

The biggest issue with this study is one that is still present within digital comparative analyses in literature generally: our apparent incapacity to compare texts of differing lengths. Attempts have been made elsewhere to account for the huge difference that a text’s length clearly makes to measures of its vocabulary, such as vectorised analyses that take measurements in 1000-word windows, but none have yet been wholly successful in accounting for this difference. This study is therefore one among many which presents its results with some caveats, considering the extent to which corpora of similar lengths clustered together. The only author that violated this trend was Joyce, who, despite a lengthy corpus of 265500 words, has the highest values for hapax and uniqueness, which marks his corpus out as idiosyncratic. Joyce is therefore the only one of the twenty-one authors whose writing style we can say is meaningfully distinguished from the others on the basis of the stylistic variables, because he so egregiously reverses the trend.
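One version of the windowed approach mentioned above can be sketched as follows. This is a Python illustration under assumptions of my own, not a method used in this study: the windows do not overlap, any leftover words at the end of the text are dropped, and the tokeniser is the same crude runs-of-letters one.

```python
import re

def windowed_uniqueness(text, window=1000):
    """Mean type-token ratio over consecutive, non-overlapping windows of equal size."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = []
    # Step through the text one full window at a time; a short final remainder is ignored.
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        scores.append(len(set(chunk)) / window)
    if not scores:
        return None  # the text is shorter than a single window
    return sum(scores) / len(scores)
```

Because every window has the same length, a long corpus is no longer penalised simply for being long, though the measure now describes local rather than overall vocabulary richness, which is part of why the problem is not considered solved.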

But we hardly needed an analysis of this kind to say Joyce writes differently from most authors, did we?

Will Self’s ‘Umbrella’ and post-modern modernity

As has been repeated in any number of the literary outlets which give Will Self column inches, Self has thumbed his nose at the British literary establishment, readers and writers alike, by returning to the ground zero of avant-garde prose writing in his trilogy of Umbrella, Shark and the forthcoming Phone. I held off reading Umbrella for some time, for the same reason that one generally doesn’t read a novel written by one of the authors that one might rate highly, sensing in advance that it will be in some way a disappointment, particularly when said author has set themselves the task of re-invigorating a dormant genre in which one is steeped, on a semi-professional basis.

But I did listen to, and read, an awful lot of interviews in which Self spoke on why he’s returning to modernism as a wellspring for his own fiction. In one of these interviews, which unfortunately, I can’t seem to find, Self says that one of the things he was trying to avoid, was writing a post-modern version of modernity. At the time I heard it, I had no idea what that might mean, or what a post-modern modernity might look like. After having read Umbrella, whether Self intended it or not, I have a far better understanding of the phrase, because I think that a post-modern modernity is exactly what Self has stumbled upon in Umbrella.

The plot moves between roughly three time frames, centred around four individuals, the primary one being Zack Busner. A fixture in many of Self’s works, Busner generally functions as a composite of the author and the late neurologist Oliver Sacks. In Umbrella, Busner is a psychiatrist based in London, treating Audrey Death for her encephalitis lethargica, which has left her in a catatonic state for decades. In some parts of the novel, Busner is doing so in 1970, and in other parts, he looks back on the affair in 2010. While this is happening, the narrative will jump back to Audrey’s early adulthood in the opening decades of the century, working in a munitions factory, getting involved in radical socialist circles. Her brothers, Stanley and Albert, are also focalisers of the narrative at points, albeit in very different ways. Indirect discourse and interior monologue are probably the two best known characteristics of modernist prose, and these two take the lion’s share of the novel’s foray into experimentation, allowing for the characters’ voices to blend suggestively with the narrator’s, making it difficult to tell where Audrey, Busner, Albert and Stanley are speaking amidst the barrage of music-hall pieces, street rhymes and song lyrics. Side Note: Azealia Banks and The Kinks feature. Unfortunately, Self generally does so through use of italics. Here’s a typical example:

The boyfriend hadn’t minded gotta split, man and Busner was split…a forked thing digging its way inside her robe. She fiddled with bone buttons at her velvety throat. His skin and hairs snagged on the mirrors, his fingers did their best with her nipples. She looked down on me from below … one his calves lay cold on the floorboards. There was the faint applause of pigeons from outside the window —

Italics are used here to allow us access to Busner’s mind, his memory, and for Lear references. There’s nothing bad in here (or in the novel overall, Self’s sentences are staggering for how rhythmically attuned they are, particularly when he dallies with academic verbiage and sub-clauses to the extent that he does), the problem is you sort of know where these turns are coming from the typography. There was a ‘Remastered’ version of Ulysses published about six years ago, produced by Robert Gogan, in which the interior monologue appeared in italics. The three or four people in the world who care about such things were outraged at the simplification, seeing the text as having been purged of its ambiguity. I think this periodic italicisation is to Umbrella’s detriment overall; it substitutes a reading that might have demanded even more of you for a more surreal-looking typeface.

My own notion of Umbrella’s modernism would therefore be rather distinct from the identification made between Umbrella and this rather inflexible and monolithic modernism made in some literary journalism, because I don’t see it as modernist in the same way that the ‘men of 1914’ are modernists. Although they might have one thing in common.


Self’s modernism is a selling point serving a rather specific function in today’s literary marketplace. Self’s modernism builds upon his persona as a surly performer on television news-panel shows and newspaper columns, going out of his way to discourage people from reading his books by his performative hauteur and dismissive attitude regarding everything. Returning to a praxis of literary art some six decades out of date is the logical conclusion of being Will Self. For Self, being a latter-day modernist is to reject the commodification of the literary artwork, and insist upon the right of the author to write something wholly non-commercial. Umbrella therefore carries with it a critique of commodity culture, and the proliferation of screens, which Self also decries regularly, believing it to signal an end to the novel. However, the canard of modernism’s opposition to commodity culture has been overhyped after postmodern novelists made such a point of engaging with the novel as a commodity, and one should remember that modernism was deeply involved in the marketplace of its time; Ezra Pound began using zeitgeist-y words like ‘modern’ and ‘futurity’ to draw Marinetti’s audiences, who were substantially larger than his own when he first came to London. Performative modernism, cultivated for the purchasing attentions of a well-groomed and discerning élite, is one of the things that Self gets right regarding his channeling of the genre.

Umbrella also seems to draw on modernism’s sometimes overlooked heritage, as it is at least somewhat to blame for the volume of secondary literature written subsequent to its boom and bust. From even a vague knowledge of these texts we might produce some foundational aspects of modernism; that it is taken to entail a shift in consciousness and human subjectivity, that exposure to slaughter and death on an industrial scale led to an ambivalence regarding technology and a sundering of rigid social hierarchies, an increasing mediation of our reality through mass media, growth of radical political movements such as feminism and socialism, etc. etc. etc. Our responses to these texts are thereby pre-determined; we know what we can expect from a canonical modernist text.

Which is why the modernism of Umbrella seems post-modern. It’s hard not to read Audrey’s re-animation in the 1970s, or Busner’s recollection of the time in 2010, as a meta-commentary on Umbrella’s resuscitation of the genre. The fact that Audrey worked in a munitions factory, as a radical socialist and feminist, that one of her brothers, Stanley, went to fight in the war, while her other brother, Albert, Pynchon-like, became an arms manufacturer selling weapons which fuelled the conflict, that in her comatose state she rehearses the actions of her time at the lathe, seems to have been dictated by our relationship to modernism in our contemporary setting. In the novel’s closing stages, Audrey’s status as a symbol of technology’s encroachment into our subjectivity is made overt:

The final words Audrey Death had spoken before relapsing into a merciful swoon were a string of nonsensical fractions — eighteen over four-point-two, ninety-four over fourteen-point-seven, sixty-six-point-three over thirty-three…that, even as he accepted the futility of the exercise, Busner had tried to fit into some conceptual framework. Were they, perhaps, the numerical analogue of her brain-chemistry’s intro-conversions between the discrete and the continuous, the quantifiable and the relativistic?

The irony here is that the paragraph in which Self is telling you exactly what the novel is about, features a character attempting to make sense of a random string of numbers. This is far from what the book is, a novel which has been compulsively over-determined in any number of columns, interviews and lectures which, taken collectively, probably come to a length equal to the text. While the modernists can be considered guilty of pushing particular interpretations — they often wrote about their own work, in the way that authors often do, by pretending to write objectively on other authors, The Waste Land came with annotations (parodic ones, but annotations nonetheless) — it feels as though Self’s foray into it is too overtly packaged as such. It’s probably my own fault for consuming it as I did, a book has to be sold after all, and no one made me read those six Guardian interviews. I should wrap up by saying that this novel is very good, and that you should read it, and, in true modernist style, ‘the rest is noise’.

The Political Economy of the New Modernists


A few weeks ago I saw the inaugural event of the Dublin Book Festival, which was a panel discussion between the novelists Anne Enright, Lisa McInerney and the poet Pat Boran. They were speaking on the publication of a book entitled Beyond the Centre, a collection of 26 essays reflecting on the 25th anniversary of the Irish Writers Centre, from the perspective of various figures from within Dublin’s literary scene. It was a great panel, and Seán Rocks did one of the best jobs as a moderator that I can recall seeing. Enright was caustic and witty, going off on how The Irish Times will commission hundreds of articles by female writers about being a woman watching the US election, but none about policy, how she doesn’t think men have a gender, and her recollections of the younger writers of her generation being shunted into the backs of vans at the start of their careers while the Johns Banville and McGahern were driven around in limos.

As someone writing a doctorate which involves an analysis of Enright’s fiction, I was hoping that the things she said would stray into areas pertinent to my work. I knew she was unlikely to talk about quantitative analysis, and the sorts of things that my dissertation will actually be pivoting around, but if at all possible I hope to cram some stuff about the socio-economic milieu that the new modernists come out of into my dissertation, as a refutation of the infuriating yet pervasive canard of industrialisation + world war = first-wave modernism.

Enright obliged, and I got a substantial amount of notes on how the currently established generation of authors got a leg up early in their careers from a cultural exchange in the nineties arranged by the then Irish and French presidents, Mary Robinson and François Mitterrand. Enright has written in the past on what it was like to live in the Ireland of the 80s, with the intensifying contradictions between the Republic of McQuaid, with its laws against suicide, contraception, homosexuality, and the newly globalised, open to foreign investment Ireland, beginning to become apparent in our public discourse.

As Diarmaid Ferriter writes in his book, Ambiguous Republic: Ireland in the 1970s, these signs of ‘increased modernisation, secularisation, Europeanisation and consumerism have to be placed in the context of a republic that…had ultimately created a conservative, authoritarian governing culture, that…created a very wide definition of dissent’. There is in this quotation a nuanced and useful reading of these two different Irelands in tandem with one another, rather than as divergent. Too often in cultural studies of Ireland, I’m made aware of the phenomenon of the ‘time warp,’ and the ways in which parts of the Irish political landscape seem to be rooted in truisms not from the last century, but the one before that. Ferriter’s take is more subtle than this, thankfully.


The time warp is a conceptual tool that tries to account for the ways in which Ireland as a state can simultaneously manage to be the beneficiary of an economic boom powered by the development of information technologies on the West coast of the United States while being complicit in the captivity and enslavement of women, to give just one example. As we well know, the capitalist nation state, both historically and in our present moment, is not a static enough concept to abhor contradictions of this kind. It might even be said to thrive on them. It is for this reason that the concept of the time warp is a bit useless, in that it instantiates a notion that we are always moving forward in some way; despite the appearance that some of these ‘kinks’ might give off, they’ll be ironed out in good time. (There’s a well-meaning senator with a report on the matter brewing in some back office on Kildare Street for nigh on half the term of the currently sitting government, and a seventieth of the Dáil might even show up on the day it’s to be discussed, just sit tight.) In order for particular ideologies to function, pockets of our society in which the most vulnerable reside must have their existences subject to relegation or dismissal as time warps, as if artefacts of the nineteenth century have the habit of peskily colonising the twenty-first. This gesture allows us to dispense with aspects of our national identities which might otherwise bring us to a point of contradiction. To take one example, Ireland can simultaneously believe itself to be a nation that is charitable, and LGBT-friendly, while placing many of those fleeing persecution (sometimes for their sexual orientation) in detention centres for an indefinite span of time.

Enright, among other things I’m sure, considers herself a product of this particularly Irish cultural discord, writing rather brilliantly in her work, Making Babies: Stumbling into Motherhood, about a particularly divisive time in Irish public life, the eighties, and its role in her attempted suicide, which I will now quote from at length:

I fell out of the world, temporarily, on Easter Monday 1986…Maybe I had Seasonal Affective Disorder, maybe it is genetic, maybe it was me being in my twenties, maybe it was Ireland being in the 1980s.

The older I get the more political I am about depression, or less essentialist — it is not because of who you are, but where you are placed. Ireland broke apart in the eighties, and I sometimes think that the crack happened in my own head. The constitutional row about abortion was a moral civil war that was fought out in people’s homes — including my own — with unfathomable bitterness. The country was screaming at itself about contraception, abortion, and divorce. It was a hideously misogynistic time. Not the best environment for a young woman establishing a sexual identity, you might say, especially one with adolescent morbidity and tendencies towards ecstatic suffusions of light, one who was over-achieving, but somehow in all the wrong ways, one who was both maverick and clever. I mean, what do we need here, a diagram?

…I…wrote some books. They were fragmented books, because this is what I knew best, but also, I fancied, because I lived in an incoherent country. They were slightly surreal, because Ireland was unreal. They dealt with ideas of purity, because the chastity of Irish women was one of the founding myths of the Nation State (well that was my excuse). But they were also full of corpses. Beautiful ones, speaking ones, sexual ones, bitter ones; corpses who did not forgive, or rot. Who was the corpse? It was myself, of course, but also Christ, the dead body on a stick. And it is the past that lies down but will not shut up, the elephant in the national living-room.


To read these paragraphs, and the other paragraphs in the same chapter (do pick it up, it is so, so good) is to become aware of how irrelevant women’s health and their autonomy was to the Irish establishment of the time. It’s no surprise then, that the Irish literary establishment was mostly suspicious regarding the raft of new wordists who came to a kind of prominence in the late eighties and early nineties, the vanguard of whom was probably Roddy Doyle, though Enright also named Patrick McCabe as a trailblazer. This generation’s early novels weren’t reviewed, and when they were, they were eviscerated. This apparent lack of a domestic audience, or the unwillingness of the tastemakers to cultivate one, required that Irish authors sell themselves abroad, and only then, by commodius vicus of recirculation, return to the domestic market. This route generally led to euphemistic conversations about formal qualities such as ‘lyricism’ and other such words acting as stand-ins for question marks over one’s authenticity.

This is why the cultural exchange’s timing was so opportune, and made, by necessity, Irish authors far more permeable to international influences. They all gained hugely from it, ‘they’ meaning, I assume, Enright, Joseph O’Connor and Deirdre Madden.

Donal Donovan and Antoin C. Murphy’s study, The Fall of the Celtic Tiger: Ireland and the Euro Debt Crisis requires us to take a leap forward of just under two decades and outline the ways in which Ireland’s position changed from a peripheral, insufficiently industrialised state, ‘the poorest of the rich,’ to a contemporary globalised market economy within the framework of the European Union. No Irish citizen who remembers the eighties will be unaware of the effect that this union has had on our general standards of living. I think. I wasn’t alive at the time. But I am interested in the effect that this change from peripheral backwater to post-modern globalised economy has on our self-perception. It is perhaps inevitable that we encounter the time warp once again, albeit in the context of Ireland’s leap into means:

while the ‘catch-up’ paradigm explains part of the story, the speed and extent of Ireland’s transformation was primarily driven by high-tech multinationals, the vanguard of a major worldwide revolution in information technology…in the post-industrial high-tech world, these concepts had started to become anachronistic.


So too do many governing metaphors of the literary landscape become de-legitimised. The matter of literary influence in particular becomes increasingly knotty in a global marketplace. Brian Dillon writes in the London Review of Books that if there is a modernist resurgence in Irish literature today, it is less a return than a demonstration of the extent to which authors today can draw from any number of traditions, even experimental ones. As such, it is less important to talk about the new modernists because they’re Irish than about what this literary self-identification signifies. Not all of this is voluntary, of course; just being a female novelist in Ireland has a profound political resonance, as anyone familiar with the career of Edna O’Brien will know.

The Irish Free State made clear its suspicion regarding modernism and modern art in general, by introducing film censorship in 1923. The first Irish review of Ulysses was also blocked by the printer of The Dublin Magazine, forcing its author, Con Leventhal, to set up a one-off journal, Klaxon. The Catholic Truth Society took an active role in Ireland’s cultural life over the next few decades by stymieing the dissemination of anything perceived as indecent, modern, or Protestant. Those of the literary world reacted to this with outrage, as these bans generally affected avant-garde works rather than pornographic ones, but their objections never translated into popular political support. David Dickson, in Dublin: The Making of a Capital City, points out that this emphasis on censorship can ignore the extent to which musical and theatrical forms often thrived, but for the most part, Dublin was a place to leave in favour of other urban capitals, where one was more likely to obtain a patron, public or private.

This policy didn’t make for good neighbours, of course. As Eavan Boland wrote, ‘No two establishments in this community regard one another with more suspicion than those of the Arts and the State.’ This was due to the fact that the Free State’s scepticism regarding modernism extended to the arts in general. The Arts Council existed, in name only, up until its role was formalised in the late seventies. Up until then, it provided cheques to artists on a hand-to-mouth basis, had no women on its board and had no particular remit or code of behaviour. Public funding for the arts was also about 30% less than in the United Kingdom.

Related to this, (I know I’m moving around a lot, but it’ll come good in the end), Garret FitzGerald’s analysis of Ireland joining the EU was as follows:

Our independence was won for us just in time to enable most of Ireland to enter to European Community as one of Europe’s ancient nations, rejoining once again the Europe from which for so many centuries she was cut off by the imposition of British rule. We shall negotiate our entry as a sovereign state…the voice of Ireland will be heard in Europe in the decades ahead. But for the sacrifices of those who won our freedom, none of this could have been. We have the right to believe that they will feel as they view this prospect that their sacrifices were not all in vain.

Despite the gloss that FitzGerald puts on Ireland’s joining the union as continuous with Irish independence movements, Ferriter argues that Ireland joined primarily because England was joining. The dominant understanding of Ireland’s membership is one of economic, social and cultural gain; lucrative agricultural grants, social justice legislation, worker protections, consumer and environmental regulation, all have their origins in EU initiatives. In a cultural sense however, it can be seen as inducing another form of peripherality, relative to the wider continent, rather than to England. Ireland is, after all, a relatively small state in a union driven by larger nations. Joe Lee has argued that joining the union has had the effect of encouraging our leaders to continue to apportion blame for their failures to external factors, rather than scrutinising and reforming our own industries and regulatory frameworks. The playwright Brian Friel viewed the Irish state around this time as a ‘tenth-rate image of America’ and indeed, there seemed to be little to distinguish the Ireland open to multi-national capital and foreign direct investment, a consumer-driven economy in the post-modern sense, from any other Western city.

Works from Enright’s oeuvre such as The Portable Virgin, The Wig my Father Wore and The Forgotten Waltz, all fit rather nicely within this interpretation, and inventively engage with the conversation between traditional mainstays of Irish identity and the post-modern market economy which had grown up around them, which made the old certainties complicit, as much as it ‘unsettled’ them.

I’ll talk about the ending of the short short story ‘The Portable Virgin’ because it seems to encapsulate a lot of what I’m talking about:

I am sitting on Dollymount Strand going through Mary’s handbag, using her little mirror, applying her ‘Wine Rose and Gentlelight Colourize Powder Shadow Trio’, her Plumsilk lipstick, her Venetian Brocade blusher and her Tearproof (thank God) mascara.

My revenge looks back at me, out of the mirror. The new fake me looks twice as real as the old. Underneath my clothes my breasts have become blind, my iliac crests mottle and bruise. Strung out between my legs is a triangle of air that pulls away from sex, while my hands clutch. It used to be the other way around.

I root through the bag, looking for a past. At the bottom, discoloured by Wine Rose and Gentlelight, I find a small, portable Virgin. She is made of transparent plastic, except for her cloak, which is coloured blue. ‘A present from Lourdes’ is written on the globe at her feet, underneath her heel and the serpent. Mary is full of surprises. Her little blue crown is a screw-off top, and her body is filled with holy water, which I drink.

The narrator is having an affair, the ins and outs of which we can never be totally certain; each player’s identity remains fluid throughout the story. Dollymount Strand is a significant enough place to consider sumjex and objex, but when one’s extra-marital activities have been ironically genuflecting before a Judi Dench costume drama, also about infidelity and inappropriately stately furniture, the stakes feel as though they have been heightened. The various accoutrements of contemporary female identity ‘Gentlelight Colourize (note the American zee) Powder Shadow’ are to the fore, and while the tacky symbolic representation of old Ireland has been discoloured by the errant make-up, it’s still there. At least until it’s sent surging out to sea at the end. Enright, being a sophisticated as well as an intellectual novelist, doesn’t foreground this sort of thing, that is to say, it doesn’t place demands on the reader as such, it never gets in the way of the fun.

A Girl is a Half-Formed Thing, with its profound sense of formal dislocation, and an origin point within the economically depressed, culturally stifled Ireland of the 1980s, is another important node of discussion here; McBride has encouraged such analyses by making reference to it as a sort of a refracted autobiography. But while tracing over the wrecked and bloodied sockets of a fragmented subjectivity, it also aims to revivify the cornerstones of the institutionalised modernisms as practiced by James Joyce and Samuel Beckett. No part of the novel makes this point clearer than the novel’s beginning, because it is its beginning, and uncompromising off the bat:

For you. You’ll soon. You’ll give her name. In the stitches of her skin she’ll wear your say. Mammy me? Yes you. Bounce the bed, I’d say. I’d say that’s what you did. Then lay you down. They cut you round. Wait and hour and day.

Not as much to ‘play’ with as Enright might give us, shorter sentences, shorter words, less things, but more baggage, meaning this, of course, in the best possible way. What we have is a swift and deep immersion into the materiality of language, all the rhymes, assonances, repetition and rhythm of which it’s capable, which, in an increasingly bland literary marketplace, is revolutionary. After having read The Lesser Bohemians, and Claire Lowdon’s review of the two of them, I’m slightly loath to praise it without clarifiers, but I do think there is a lot that is good in its incorporation of the elements familiar to the Irish misery memoir within a high modernist register. Because misery is for life, not just for the realists.

I hope it will be clear from all this that contemporary modernists draw on a history of formal experimentation, regarded with suspicion by the Irish state with a view to challenging the received wisdom of its theocratic tendencies, marginalisation and violent oppression of women.

A (Proper) Statistical analysis of the prose works of Samuel Beckett


Content warning: If you want to get to the fun parts, the results of an analysis of Beckett’s use of language, skip to sections VII and VIII. Everything before that is navel-gazing methodology stuff.

If you want to know how I carried out my analysis, and utilise my code for your own purposes, here’s a link to my R code on my blog, with step-by-step instructions, because not enough places on the internet include that.

I: Things Wrong with my Dissertation’s Methodology

For my masters, I wrote a 20000 word dissertation, which took as its subject, an empirical analysis of the works of Samuel Beckett. I had a corpus of his entire works with the exception of his first novel Dream of Fair to Middling Women, which is a forgivable lapse, because he ended up cannibalising it for his collection of short stories, More Pricks than Kicks.

Quantitative literary analysis is generally carried out in one of two ways, through either one of the open-source programming languages Python or R. The former you’re more likely to have heard of, being one of the few languages designed with usability in mind. The latter, R, would be more familiar to specialists, or people who work in the social sciences, as it is more obtuse than Python, doesn’t have many language cousins and has a very unfriendly learning curve. But I am attracted to difficulty, so I am using it for my PhD analysis.

I had about four months to carry out my analysis, so the idea of taking on a programming language in a self-directed learning environment was not feasible, particularly since I wanted to make a good go at the extensive body of secondary literature written on Beckett. I therefore made use of a corpus analysis tool called Voyant. This was a couple of years ago, so this was before its beta release, when it got all tricked out with some qualitative tools and a shiny new interface, which would have been helpful. Ah well. It can be run out of any browser, if you feel like giving it a look.

My analysis was also chronological, in that it looked at changes in Beckett’s use of language over time, with a view to proving the hypothesis that he used a less wide vocabulary as his career continued, in pursuit of his famed aesthetic of nothingness or deprivation. As I wanted to chart developments in his prose over time, I dated the composition of each text, and built a corpus for each year, from 1930–1987, excluding of course, years in which he just wrote drama or poetry, which wouldn’t be helpful to quantify in conjunction with one another. Which didn’t stop me doing so for my masters analysis. It was a disaster.

II: Uniqueness

Uniqueness, the measurement used to quantify the general spread of Beckett’s vocabulary, was obtained by the generally accepted formula below:

unique word tokens / total words
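As a rough illustration of the formula above, here is a minimal sketch in Python rather than the R I mention elsewhere. The tokenizer is a deliberately crude assumption of mine (lowercase runs of letters and apostrophes), and the function names are hypothetical; a different tokenizer will give slightly different numbers.

```python
import re


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z'’]+", text.lower())


def uniqueness(tokens):
    # Distinct word forms divided by total words.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
print(uniqueness(tokenize(sample)))
```

The sample sentence has eleven words, eight of them distinct, so the score is 8 divided by 11.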

There is a problem with this measurement, in that it takes no account of a text’s relative length. As a text gets longer, the likelihood that any given word has already been used approaches 1, so new words arrive more and more slowly. Therefore, a text gets less unique as it gets bigger. I have the correlations to prove it:

There have been various solutions proposed to this quandary, which stymies our comparative analyses, somewhat. One among them is the use of vectorised measurements, which plot the text’s declining uniqueness against its word count, so we see a more impressionistic graph, such as this one, which should allow us to compare the word counts for James Joyce’s novel, A Portrait of the Artist as a Young Man and his short story collection, Dubliners.


All well and good for two or maybe even five texts, but one can see how, with large scale corpora, this sort of thing can get very incoherent very quickly. Furthermore, if one was to examine the numbers on the y-axis, one can see that the differences here are tiny. This is another idiosyncrasy of stylostatistical methods; because of the way syntax works, the margins of difference wouldn’t be regarded as significant by most statisticians. These issues relating to the measurement are exacerbated by the fact that ‘particles,’ the atomic structures of literary speech, (it, is, the, a, an, and, said, etc.) make up most of a text. In pursuit of greater statistical significance for their papers, digital literary critics remove these particles from their texts, which is another unforgivable thing that we do anyway. I did not, because I was concerned that I was complicit in the neoliberalisation of higher education. I also wrote a 4000 word chapter that outlined why what I was doing was awful.
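The declining-uniqueness curve described above can be sketched in a few lines of Python. This is my own minimal version, not the code that produced the graph: the function name and the default step of 1000 tokens are assumptions, and it expects a list of already-tokenized words.

```python
def uniqueness_curve(tokens, step=1000):
    # Running uniqueness (distinct words so far / words so far),
    # recorded every `step` tokens.
    seen = set()
    curve = []
    for position, token in enumerate(tokens, start=1):
        seen.add(token)
        if position % step == 0:
            curve.append((position, len(seen) / position))
    return curve


# Tiny made-up token list, measured every two tokens.
print(uniqueness_curve(["a", "b", "a", "c", "a", "d"], step=2))
```

Each pair is a word count and the uniqueness at that point, which is what gets plotted on the x and y axes respectively.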

IV: Ambiguity

Ambiguity was calculated with the following formula:

number of indefinite pronouns/total word count

I derived this measurement from Dr. Ian Lancashire’s study of the works of Agatha Christie, and counted Beckett’s use of a set of indefinite pronouns: ‘everyone,’ ‘everybody,’ ‘everywhere,’ ‘everything,’ ‘someone,’ ‘somebody,’ ‘somewhere,’ ‘something,’ ‘anyone,’ ‘anybody,’ ‘anywhere,’ ‘anything,’ ‘no one,’ ‘nobody,’ ‘nowhere,’ and ‘nothing.’ Those of you who know that there are more indefinite pronouns than just these are correct; I had found an incomplete list of indefinite pronouns, and I assumed that that was all. This is just one of the many things wrong with my study. My theory was that there would be a correlation to be detected between Beckett’s decreasing vocabulary and his increasing deployment of indefinite pronouns, relative to the total word count. I called the vocabulary measure ‘uniqueness,’ and the indefinite pronouns measure I called ‘ambiguity.’ This is tenuous, I know; indefinite pronouns advance information even as they elide the provision of information. It is, like so much else in the quantitative analysis of literature, totally unforgivable, yet we do it anyway.
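For the curious, the ambiguity ratio can be sketched as follows. The word list is the (admittedly incomplete) one given above; the regular expressions, the `ambiguity` function name and the decision to count ‘no one’ as a single pronoun but two words are my own assumptions, not a record of how the original counts were made.

```python
import re

# The sixteen indefinite pronouns listed in the text. "no one" is matched
# as two words separated by whitespace.
INDEFINITE_PATTERN = re.compile(
    r"\b(?:everyone|everybody|everywhere|everything|"
    r"someone|somebody|somewhere|something|"
    r"anyone|anybody|anywhere|anything|"
    r"no\s+one|nobody|nowhere|nothing)\b"
)

def tokenize(text):
    # Assumed tokeniser: lowercase letters with optional internal apostrophes.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def ambiguity(text):
    # Occurrences of the listed pronouns divided by the total word count.
    total = len(tokenize(text))
    if total == 0:
        return 0.0
    hits = len(INDEFINITE_PATTERN.findall(text.lower()))
    return hits / total

score = ambiguity("Somewhere again. Somehow again. Someone again.")
```

Note that ‘somehow’ is not on the list, so in this example only ‘somewhere’ and ‘someone’ are counted.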

V: Hapax Richness

I initially wanted to take into account another phenomenon known as the hapax score, which charts occurrences of words that appear only once in a text or corpus. The formula to obtain it would be the following:

number of words that appear once/total word count

I believe that the hapax count would be of significance to a Beckett analysis because of the points at which his normally incompetent narrators have sudden bursts of loquaciousness, like when Molloy says something like ‘digital emunction and the peripatetic piss,’ before lapsing back into his ‘normal’ tone of voice. Once again, because I was often working with a pen and paper, this became impossible, but now that I know how to code, I plan to go over my master’s analysis and do it properly. The hapax score will form a part of this new analysis.
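The hapax formula is simple enough to sketch in code. As before, this is an illustration under assumptions (the tokeniser and the `hapax_richness` name are mine), not the script used for the analysis.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokeniser: lowercase letters with optional internal apostrophes.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def hapax_richness(text):
    # Number of words that occur exactly once, divided by total word count.
    words = tokenize(text)
    if not words:
        return 0.0
    counts = Counter(words)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(words)

score = hapax_richness("digital emunction and the peripatetic piss and the")
```

In this toy example ‘and’ and ‘the’ each appear twice, so only the four remaining words count as hapaxes.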

VI: Code & Software

A much more accurate way of analysing vocabulary, for the purposes of comparative analysis when your texts are of different lengths, would therefore be to randomly sample them. Obviously this is not very easy when you’re working with a corpus analysis tool online, but it is far more straightforward when working through a programming language. A formula for representative sampling was found, and integrated into the code. My script is essentially a series of nested loops and if/else statements that randomly and sequentially sample a text, calculate the uniqueness, indefiniteness and hapax density ten times, store the results in a variable, and then calculate the mean value for each by dividing the result by ten, the number of times that the first loop runs. I inputted each value into the statistical analysis program SPSS, because it makes pretty graphs with less effort than R requires.
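The general shape of such a script might look like the sketch below. It is a reconstruction under stated assumptions, not the script itself: I take ‘randomly and sequentially’ to mean a contiguous run of words starting at a random position, and the `sample_size` default of 500 is a placeholder, since the sampling formula is not given here. All function and parameter names are hypothetical.

```python
import random
import re
from collections import Counter

INDEFINITE = {
    "everyone", "everybody", "everywhere", "everything",
    "someone", "somebody", "somewhere", "something",
    "anyone", "anybody", "anywhere", "anything",
    "nobody", "nowhere", "nothing",
}

def tokenize(text):
    # Assumed tokeniser: lowercase letters with optional internal apostrophes.
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def measures(words):
    # Return (uniqueness, ambiguity, hapax density) for a list of words.
    total = len(words)
    counts = Counter(words)
    unique = len(counts) / total
    hapax = sum(1 for c in counts.values() if c == 1) / total
    indefinite = sum(counts[w] for w in INDEFINITE)
    # "no one" is two tokens, so count adjacent pairs separately.
    indefinite += sum(1 for a, b in zip(words, words[1:])
                      if a == "no" and b == "one")
    return unique, indefinite / total, hapax

def sampled_means(text, sample_size=500, runs=10, seed=None):
    # Average the three measures over `runs` random contiguous samples.
    words = tokenize(text)
    size = min(sample_size, len(words))
    if size == 0:
        raise ValueError("text contains no words")
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(runs):
        start = rng.randint(0, len(words) - size)
        window = words[start:start + size]
        for i, value in enumerate(measures(window)):
            totals[i] += value
    return tuple(t / runs for t in totals)

means = sampled_means("nothing comes of nothing", sample_size=10, seed=1)
```

Because every sample is the same length, the three means can be compared across texts without the length effect described earlier. A text shorter than `sample_size` is simply measured whole, which is its own kind of compromise.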

VII: Results

I used SPSS’ box plot function first to identify any outliers for uniqueness, hapax density and ambiguity. 1981 was the only year which scored particularly high for relative usage of indefinite pronouns.

screen-shot-2016-11-03-at-12-27-38

It should be said that this measure, too, is correlated to the length of the text, which only stands to reason; as a text gets longer the relative incidence of a particular set of words will decrease. Therefore, as the only texts Beckett wrote this year, ‘The Way’ and ‘Ceiling,’ together add up to about 582 words (the fifth lowest year for prose output in his life), one would expect indefiniteness to be somewhat higher in comparison to other years. However, this doesn’t wholly account for its status as an outlier value. Towards the end of his life Beckett wrote increasingly short prose pieces. Comment C’est (How It Is) was his last novel, and was written almost thirty years before he died. This probably has a lot to do with his concentration on writing and directing his plays, but in his letters he attributed it to a failure to progress beyond the third novel in his so-called trilogy of Molloy, Malone meurt (Malone Dies) and L’Innommable (The Unnamable). It is in 1950, the year in which L’inno was completed, that Beckett began writing the Textes pour rien (Texts for Nothing), scrappy, disjointed pieces, many of which seem to take up from where L’inno left off, as do the Fizzles and the Faux Départs. ‘The Way,’ I think, is an outgrowth of a later phase in Beckett’s prose writing, which dispenses with the peripatetic loquaciousness and the understated lyricism of the trilogy and replaces them with a more brute and staccato syntax, one which is often dependent on the repetition of monosyllables:

No knowledge of where gone from. Nor of how. Nor of whom. None of whence come to. Partly to. Nor of how. Nor of whom. None of anything. Save dimly of having come to. Partly to. With dread of being again. Partly again. Somewhere again. Somehow again. Someone again.

Note also the prevalence of particle words, which will have been stripped out for the analysis, and the ways in which words with a ‘some’ prefix are repeated as a sort of refrain. This essential structure persists in the work, or at least in the artefact of the work that the code produces, and hence makes of it the outlier that it is.

screen-shot-2016-11-03-at-12-55-13

From plotting all the values together at once, we can see that uniqueness is partially dependent on hapax density; the words that appear only once in a particular corpus would be important in driving up the score for uniqueness. While there could be said to be a case for the hypothesis that Beckett’s texts get less unique and more ambiguous up until 1944, when he completed his novel Watt, and, if we’re feeling particularly risky, up until 1960, when Comment C’est was completed, it would be wholly disingenuous to advance it beyond this point, when his style becomes far too erratic to categorise definitively. Comment C’est is Beckett’s most uncompromising prose work. It has no punctuation, no capitalisation, and narrates the story of two characters, in a kind of love, who communicate with one another by banging kitchen implements off one another:

as it comes bits and scraps all sorts not so many and to conclude happy end cut thrust DO YOU LOVE ME no or nails armpit and little song to conclude happy end of part two leaving only part three and last the day comes I come to the day Bom comes YOU BOM me Bom ME BOM you Bom we Bom

VIII: Conclusion

I would love to say that the general tone is what my model is being attentive to, which is why it identified Watt and How It Is as nadirs in Beckett’s career, but I think their presence on the chart is more a product of their relative length, as novels, versus the shorter pieces which he moved towards in his later career. Clearly, Beckett’s decision to write shorter texts makes this means of summing up his oeuvre in general insufficient. Whatever changes Beckett made to his aesthetic over time, we might not need such complicated graphs to map them, and I could have just used a word processor to find the one that matters: length. Bom and Pim aside, for whatever reason, after having written L’inno none of Beckett’s creatures presented themselves to him in novelistic form again. The partiality of vision and modal tone which pervades the post-L’inno works demonstrates, I think, far more effectively what it was that Beckett was ‘pitching’ for: a new conceptual aspect to his prose, which re-emphasised its bibliographic aspects, the most fundamental of which was their brevity, or the appearance of an incompleteness, by virtue of being honed to sometimes less than five hundred words.

The quantification of differing categories of words seems like the radical, and the most fun, thing to do in the analysis of literary texts, as the words are what we came for, but the problem is similar to one that overtakes anyone who attempts to read a literary text word by word by word, and unpack its significance as they go: overdetermination. Words are kaleidoscopic, and the longer you look at them, the more threatening their darkbloom becomes, the more they swallow, excrete, the more alive they are, all round. Which is fine. Letting new things into your life is what it should be about, until their attendant drawbacks become clear, and you start to become ambivalent about all the fat and living things you have in your head. You start to wish you read poems rather than novels, which make you go mad, and worse, start to write them. The point is words breed words, and their connections are too easily traced by computer. There’s something else about knowing their exact correlations to a decimal point. They seem so obvious now.