Modernist Stylistic Variables

The question that this blog post sets itself is: What differences and similarities can be detected in modernist and contemporary authors on the basis of three stylistic variables (hapax, unique and ambiguity), and how are these stylistic variables related to one another?

I: The Data

The data analysed in this project were derived from twenty-one corpora of avant-garde literary prose, processed with the open-source programming language R. The corpora comprise the complete works of James Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott Fitzgerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner and D.H. Lawrence.

Seventeen of these writers were active between the years 1895 and 1968, a period of time associated with a genre of writing referred to as ‘modernist’ within the field of literary criticism. The remaining four are still living, and have published novels as early as 1991 and as late as 2016. These novelists are known for identifying as latter-day modernists, and see their novels as re-engaging with the modernist aesthetic in a significant way.

I.II Uniqueness

The unique variable is a generally accepted measurement used within digital literary criticism to quantify the ‘richness’ of a particular text’s vocabulary. Uniqueness is obtained by dividing the number of distinct word types in a text by the total number of words. For example, if a novel contained 20000 word types and 100000 total words, its uniqueness would be calculated as follows:

Uniqueness = 20000 / 100000 = 0.2
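The calculation above can be sketched in a few lines of code. The original analysis was carried out in R; the Python below is only an illustrative equivalent, and the `tokenize` and `uniqueness` helpers, as well as the very simple tokenisation rule (lowercase letters and apostrophes only), are assumptions rather than the procedure actually used in the study.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a token is any run of letters and apostrophes.
    """
    return re.findall(r"[a-z']+", text.lower())

def uniqueness(tokens):
    """Number of distinct word types divided by the total number of words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# A toy sentence standing in for a corpus.
sample = "the cat sat on the mat and the dog sat too"
tokens = tokenize(sample)
print(len(set(tokens)), "types /", len(tokens), "tokens =", uniqueness(tokens))
```

In the toy sentence there are 8 distinct types among 11 tokens, so the score is 8/11. Real corpora would need a more careful tokeniser, particularly for a text like Finnegans Wake.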

I.III Ambiguity

Ambiguity is a measure used to calculate the approximate obscurity of a text, or the extent to which it is composed of indefinite pronouns. The indefinite pronouns quantified in this study are as follows: ‘another’, ‘anybody’, ‘anyone’, ‘anything’, ‘each’, ‘either’, ‘enough’, ‘everybody’, ‘everyone’, ‘everything’, ‘little’, ‘much’, ‘neither’, ‘nobody’, ‘no one’, ‘nothing’, ‘one’, ‘other’, ‘somebody’, ‘someone’, ‘something’, ‘both’, ‘few’, ‘everywhere’, ‘somewhere’, ‘nowhere’, ‘anywhere’, ‘many’, ‘others’, ‘all’, ‘any’, ‘more’, ‘most’, ‘none’, ‘some’, ‘such’. The formula for ambiguity is:

number of indefinite pronouns / number of total words
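As a rough sketch of how this ratio might be computed (in Python rather than the R used for the study; the function name and the tokenisation rule are assumptions), the word list above can be stored as a set and each token checked against it. How the two-word item ‘no one’ was handled in the original analysis is not stated. Here it is assumed that, because ‘one’ is itself on the list, every ‘no one’ is already counted once through its ‘one’ token, so no separate two-word pass is made.

```python
import re

# Single-word items from the list above. 'no one' is covered by 'one'
# (an assumption about how the two-word item should be counted).
INDEFINITE_PRONOUNS = {
    "another", "anybody", "anyone", "anything", "each", "either", "enough",
    "everybody", "everyone", "everything", "little", "much", "neither",
    "nobody", "nothing", "one", "other", "somebody", "someone", "something",
    "both", "few", "everywhere", "somewhere", "nowhere", "anywhere", "many",
    "others", "all", "any", "more", "most", "none", "some", "such",
}

def ambiguity(text):
    """Number of indefinite pronouns divided by the total number of words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in INDEFINITE_PRONOUNS)
    return hits / len(tokens)

sample = "Nothing much happened, and no one said anything to anyone."
print(ambiguity(sample))
```

Of the ten tokens in the sample sentence, five (‘nothing’, ‘much’, ‘one’, ‘anything’, ‘anyone’) are on the list, giving a score of 0.5.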

I.IV Hapax

Finally, the hapax variable calculates the density of hapax legomena, words which appear only once in a particular author’s oeuvre. The formula for this variable is:

number of hapax legomena / number of total words
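A minimal sketch of this measure, again in Python rather than R and with an assumed tokenisation rule, counts how often each word occurs across the whole corpus and keeps those that occur exactly once. The `hapax_density` name is illustrative only.

```python
import re
from collections import Counter

def hapax_density(text):
    """Number of words occurring exactly once divided by the total number of words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = [word for word, count in counts.items() if count == 1]
    return len(hapaxes) / len(tokens)

# In practice 'text' would be an author's whole oeuvre joined together.
sample = "the cat sat on the mat and the dog sat too"
print(hapax_density(sample))
```

In the toy sentence six of the eleven tokens (‘cat’, ‘on’, ‘mat’, ‘and’, ‘dog’, ‘too’) occur only once, so the density is 6/11. Because every hapax is also a distinct word type, a high hapax density necessarily pushes uniqueness up as well.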

a bar chart giving an overview of the data

II: Data Overview

Even before analysing the data in great depth, it stands to reason that these variables are interrelated. Hapax and unique are best understood as indications of a text’s heterogeneity: if a text is hapax-rich, its score for uniqueness will be similarly elevated. Ambiguity, as it is based on a set of pre-defined words, can be considered a measure of a text’s homogeneity, and as the occurrences of these commonplace words increase, hapax and uniqueness will be negatively affected. The aim of this study is to determine, first, how these measures vary according to the time frame in which the texts were written, i.e. across modern and contemporary corpora; second, which correlations between the stylistic variables exist; and third, which of the three is most subject to the fluctuations of another.

more overviews for each variable

IV.I: The Three Groups Hypothesis

A number of things are clear from these representations of the data. The first finding is that the authors fall into roughly three groups. The first is the base-level of early twentieth-century modernist authors, who are all relatively undifferentiated. These are Ernest Hemingway, Virginia Woolf, William Faulkner, Elizabeth Bowen, Marcel Proust, F. Scott Fitzgerald, D.H. Lawrence, Joseph Conrad and Ford Madox Ford. They are all below the mean for the hapax and unique variables.

boxplot of outliers for the unique hapax variable

The second group reaches into more extreme values for unique and hapax. These are Djuna Barnes, Jorge Luis Borges, Franz Kafka, Flann O’Brien, James Joyce, Eimear McBride and Sara Baume. Three of these authors are even outliers for the hapax variable, which can be seen in the box plot.

Joyce’s position as an extreme outlier in this context is probably due to his novel Finnegans Wake (1939), which was written in an amalgam of English, French, Irish, Italian and Norwegian. It’s no surprise then, that Joyce’s value for hapax is so high. The following quotation may be sufficient to give an indication of how eccentric the language of the novel is:

La la la lach! Hillary rillarry gibbous grist to our millery! A pushpull, qq: quiescence, pp: with extravent intervulve coupling. The savest lauf in the world. Paradoxmutose caring, but here in a present booth of Ballaclay, Barthalamou, where their dutchuncler mynhosts and serves them dram well right for a boors’ interior (homereek van hohmryk) that salve that selver is to screen its auntey and has ringround as worldwise eve her sins (pip, pip, pip)

Though Borges’ and Barnes’ prose may not be as far removed from modern English as Finnegans Wake, both of these authors are known for their highly idiosyncratic use of language; Borges for his use of obscure terms derived from archaic sources, and Barnes for reversing normative grammatical and syntactic structures in unique ways.

The third and final group may be thought of as an intermediary between these two extremes, and consists of Katherine Mansfield, Samuel Beckett, Will Self and Anne Enright. These authors share characteristics of both groups: their values for ambiguity remain stable, while their uniqueness and hapax counts are far more pronounced than those of the first group, without reaching the values of the second.

boxplot displaying stein as an extreme outlier for ambiguity

Gertrude Stein is the only author whose stylistic profile doesn’t quite fit into any of the three groups. She is perhaps best thought of as most closely analogous to the first group of early twentieth-century modernists, but her extreme value for ambiguity should be sufficient to distinguish her in this regard.

The value for ambiguity remains fairly stable throughout the dataset: the standard deviation is 0.03, and if Stein’s values are removed, it narrows to 0.01.
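The effect of a single extreme value on the standard deviation can be illustrated with a short sketch. The scores below are invented purely for demonstration and are not the values from this study, and it is an assumption that the sample (rather than population) standard deviation is the appropriate one.

```python
from statistics import stdev

# Hypothetical ambiguity scores, invented for illustration only;
# they are NOT the figures from this study.
scores = {
    "Author A": 0.030,
    "Author B": 0.035,
    "Author C": 0.028,
    "Author D": 0.040,
    "Author E": 0.033,
    "Outlier": 0.120,
}

# Sample standard deviation with and without the extreme value.
with_outlier = stdev(list(scores.values()))
without_outlier = stdev(
    [value for name, value in scores.items() if name != "Outlier"]
)

print(round(with_outlier, 3), round(without_outlier, 3))
```

Dropping the one extreme score shrinks the spread considerably, which is the same pattern reported above when Stein is removed.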

Two disclaimers need to be made about this general account from the descriptive statistics and graphs. The first is that there is a fundamental issue with making such a schematic account of these texts. The grouping approach that this project has taken thus far is insufficiently nuanced as it could probably be argued that McBride could just as easily fit into the third group as the second. Therefore, the stylistic variables do not adequately distinguish modern and contemporary corpora from one another.

IV.II Word Count

word count for the most prolific authors

It should not escape our attention that the authors who score lowest for each variable, the first group of early twentieth-century authors, are also the most prolific. The correlation between word count and the stylistic variables was therefore calculated.

Pearson correlation for word count and stylistic variables

Both the Pearson correlation and Spearman’s rho suggest that word count is highly negatively correlated with hapax and unique (as word count increases, hapax and unique decrease and vice versa), but not with ambiguity.

Spearman’s rho for word count and stylistic variables

The fact that the Spearman’s rho scores significantly higher than the Pearson suggests that the relationship between the two is monotonic but non-linear. This can be seen in the scatter plot.

scatter plot showing the relationship between word count and uniqueness

In the case of both variables, the correlation is obviously negative, but the data points fall in a non-linear way, suggesting that the Spearman’s rho is the better measure for calculating the relationship. In both cases it would seem that Joyce is the outlier, and most likely to be the author responsible for distorting the correlation.
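To illustrate why Spearman’s rho can exceed the Pearson coefficient when a relationship is monotonic but curved, here is a small sketch with invented data (not the study’s figures, which were computed in SPSS). Pearson measures linear association on the raw values, whereas Spearman’s rho is simply the Pearson coefficient computed on the ranks. The helper names are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranked data."""
    return pearson(ranks(xs), ranks(ys))

# Invented data: a score that falls steeply, then flattens, as length grows.
word_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [1 / w for w in word_counts]

print(round(pearson(word_counts, scores), 2), round(spearman(word_counts, scores), 2))
```

Because the invented scores always fall as the word count rises, the ranks are perfectly reversed and Spearman’s rho reaches -1, while the curvature keeps the Pearson coefficient noticeably weaker.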

scatter plot displaying the relationship between word count and hapax density
Pearson correlations for word count and each stylistic variable

SPSS flags the correlation between hapax and unique as being significant, and this is clearly the most noteworthy relationship between the three stylistic variables. The Spearman’s rho exceeded the Pearson correlation by a marginal amount, and it was therefore decided that the relationship was non-linear, which is confirmed by the scatter plot below:

Spearman’s rho correlation for word count and stylistic variables

The stylistic variables of unique and hapax are therefore highly correlated.

VI: Conclusion

As was said already, the notion that stylistic variables are correlated stands to reason. However, it was not until the correlation tests were carried out that the extent to which uniqueness and hapax are determined by one another was made clear.

The biggest issue with this study is one that is still present within digital comparative analyses in literature generally: our apparent incapacity to compare texts of differing lengths. Attempts have been made elsewhere to account for the huge difference that a text’s length clearly makes to measures of its vocabulary, such as vectorised analyses that take measurements in 1000-word windows, but none have yet been wholly successful in accounting for this difference. This study is therefore one among many which must present its results with some caveats, considering the extent to which corpora of similar lengths clustered together. The only author that violated this trend was Joyce, who, despite a lengthy corpus of 265500 words, has the highest values for hapax and uniqueness, which marks his corpus out as idiosyncratic. Joyce is therefore the only one of the twenty-one authors whose writing style can be meaningfully distinguished from the others on the basis of the stylistic variables, because he so egregiously reverses the trend.

But we hardly needed an analysis of this kind to say Joyce writes differently from most authors, did we?

Will Self’s ‘Umbrella’ and post-modern modernity

As has been repeated in any number of the literary outlets which give Will Self column inches, Self has thumbed his nose at the British literary establishment, readers and writers alike, by returning to the ground zero of avant-garde prose writing in his trilogy of Umbrella, Shark and the forthcoming Phone. I held off reading Umbrella for some time, for the same reason that one generally doesn’t read a novel written by one of the authors that one might rate highly, sensing in advance that it will be in some way a disappointment, particularly when said author has set themselves the task of re-invigorating a dormant genre in which one is steeped, on a semi-professional basis.

But I did listen to, and read, an awful lot of interviews in which Self spoke on why he’s returning to modernism as a wellspring for his own fiction. In one of these interviews, which, unfortunately, I can’t seem to find, Self says that one of the things he was trying to avoid was writing a post-modern version of modernity. At the time I heard it, I had no idea what that might mean, or what a post-modern modernity might look like. After having read Umbrella, whether Self intended it or not, I have a far better understanding of the phrase, because I think that a post-modern modernity is exactly what Self has stumbled upon in Umbrella.

The plot moves between roughly three time frames, centred around four individuals, the primary one being Zack Busner. A fixture in many of Self’s works, Busner generally functions as a composite of the author and the late neurologist Oliver Sacks. In Umbrella, Busner is a psychiatrist based in London, treating Audrey Death for her encephalitis lethargica, which has left her in a catatonic state for decades. In some parts of the novel, Busner is doing so in 1970, and in other parts, he looks back on the affair in 2010. While this is happening, the narrative will jump back to Audrey’s early adulthood in the opening decades of the century, working in a munitions factory, getting involved in radical socialist circles. Her brothers, Stanley and Albert, are also focalisers of the narrative at points, albeit in very different ways. Indirect discourse and interior monologue are probably the two best known characteristics of modernist prose, and these two take the lion’s share of the novel’s foray into experimentation, allowing for the characters’ voices to blend suggestively with the narrator’s, making it difficult to tell where Audrey, Busner, Albert and Stanley are speaking amidst the barrage of music-hall pieces, street rhymes and song lyrics. Side Note: Azealia Banks and The Kinks feature. Unfortunately, Self generally marks these shifts through use of italics. Here’s a typical example:

The boyfriend hadn’t minded gotta split, man and Busner was split…a forked thing digging its way inside her robe. She fiddled with bone buttons at her velvety throat. His skin and hairs snagged on the mirrors, his fingers did their best with her nipples. She looked down on me from below … one his calves lay cold on the floorboards. There was the faint applause of pigeons from outside the window —

Italics are used here to allow us access to Busner’s mind, his memory, and for Lear references. There’s nothing bad in here (or in the novel overall; Self’s sentences are staggering for how rhythmically attuned they are, particularly when he dallies with academic verbiage and sub-clauses to the extent that he does). The problem is that the typography tells you when these turns are coming. There was a ‘Remastered’ version of Ulysses published about six years ago, produced by Robert Gogan, in which the interior monologue appeared in italics. The three or four people in the world who care about such things were outraged at the simplification, seeing the text as having been purged of its ambiguity. I think this periodic italicisation is to Umbrella’s detriment overall; it substitutes a reading that might have demanded even more of you for a more surreal-looking typeface.

My own notion of Umbrella’s modernism would therefore be rather distinct from the identification made between Umbrella and this rather inflexible and monolithic modernism made in some literary journalism, because I don’t see it as modernist in the same way that the ‘men of 1914’ are modernists. Although they might have one thing in common.


Self’s modernism is a selling point serving a rather specific function in today’s literary marketplace. Self’s modernism builds upon his persona as a surly performer on television news-panel shows and in newspaper columns, going out of his way to discourage people from reading his books by his performative hauteur and dismissive attitude regarding everything. Returning to a praxis of literary art some six decades out of date is the logical conclusion of being Will Self. For Self, being a latter-day modernist is to reject the commodification of the literary artwork, and insist upon the right of the author to write something wholly non-commercial. Umbrella therefore carries with it a critique of commodity culture, and the proliferation of screens, which Self also decries regularly, believing it to signal an end to the novel. However, the canard of modernism’s opposition to commodity culture has been overhyped after postmodern novelists made such a point of engaging with the novel as a commodity, and one should remember that modernism was deeply involved in the marketplace of its time; Ezra Pound began using zeitgeist-y words like ‘modern’ and ‘futurity’ to draw Marinetti’s audiences, who were substantially larger than his own when he first came to London. Performative modernism, cultivated for the purchasing attentions of a well-groomed and discerning élite, is one of the things that Self gets right regarding his channeling of the genre.

Umbrella also seems to draw on modernism’s sometimes overlooked heritage, as it is at least somewhat to blame for the volume of secondary literature written subsequent to its boom and bust. From even a vague knowledge of these texts we might produce some foundational aspects of modernism; that it is taken to entail a shift in consciousness and human subjectivity, that exposure to slaughter and death on an industrial scale led to an ambivalence regarding technology and a sundering of rigid social hierarchies, an increasing mediation of our reality through mass media, growth of radical political movements such as feminism and socialism, etc. etc. etc. Our responses to these texts are thereby pre-determined; we know what we can expect from a canonical modernist text.

Which is why the modernism of Umbrella seems post-modern. It’s hard not to read Audrey’s re-animation in the 1970s, or Busner’s recollection of the time in 2010, as a meta-commentary on Umbrella’s resuscitation of the genre. The fact that Audrey worked in a munitions factory, as a radical socialist and feminist, that one of her brothers, Stanley, went to fight in the war, while her other brother, Albert, Pynchon-like, became an arms manufacturer selling weapons which fuelled the conflict, that in her comatose state she rehearses the actions of her time at the lathe, seems to have been dictated by our relationship to modernism in our contemporary setting. In the novel’s closing stages, Audrey’s status as a symbol of technology’s encroachment into our subjectivity is made overt:

The final words Audrey Death had spoken before relapsing into a merciful swoon were a string of nonsensical fractions — eighteen over four-point-two, ninety-four over fourteen-point-seven, sixty-six-point-three over thirty-three…that, even as he accepted the futility of the exercise, Busner had tried to fit into some conceptual framework. Were they, perhaps, the numerical analogue of her brain-chemistry’s intro-conversions between the discrete and the continuous, the quantifiable and the relativistic?

The irony here is that the paragraph in which Self is telling you exactly what the novel is about features a character attempting to make sense of a random string of numbers. This is far from what the book is, a novel which has been compulsively over-determined in any number of columns, interviews and lectures which, taken collectively, probably come to a length equal to the text. The modernists can be considered guilty of pushing particular interpretations too: they often wrote about their own work, in the way that authors often do, by pretending to write objectively on other authors, and The Waste Land came with annotations (parodic ones, but annotations nonetheless). Even so, it feels as though Self’s foray into it is too overtly packaged as such. It’s probably my own fault for consuming it as I did; a book has to be sold after all, and no one made me read those six Guardian interviews. I should wrap up by saying that this novel is very good, and that you should read it, and, in true modernist style, ‘the rest is noise’.

Will Self reads Jorge Luis Borges’ ‘On Exactitude in Science’

Novelist Will Self reading author Jorge Luis Borges’ short, short, tiny small short story, ‘On Exactitude in Science’

http://www.theguardian.com/books/audio/2013/jan/04/will-self-jorge-luis-borges

Franz Kafka’s ‘Metamorphosis’ and vermin

Folk who know nothing else about the Czech novelist Franz Kafka know that he wrote a short story in which the protagonist, Gregor Samsa, is turned into a cockroach. The irony is that this is more a function of illustrations of the novella than it is derived from the text. In the below talk on translating Kafka from the London Review of Books (featuring Will Self, translators Anthea Bell, Joyce Crick, Karen Seago and Amanda Hopkinson), one topic of conversation is the fact that this boils down to a mistranslation of the first sentence of The Metamorphosis from the original German.

The word Ungeziefer is more accurately translated as ‘vermin,’ ‘pest’ or ‘insect,’ than ‘cockroach.’ Although trying to get a firm grasp on what kind of insect Gregor Samsa has become, not to denigrate Vladimir Nabokov’s efforts, is irrelevant. Kafka specifically directed his publisher to not provide any illustrations of Gregor post-metamorphosis: “The insect itself is not to be drawn. It is not even to be seen from a distance.” Retaining indeterminacy = the order of the day.

Some anatomical features are reported (lots of adhesive legs, a sensitive head, a hard back suggestive of a thorax or carapace) which indicate that he has become insectoid, but exactly what kind is never said directly. The chambermaid addressing Samsa, almost fondly, as a ‘dung-beetle’ shouldn’t be trusted; it is more suggestive of her idiosyncrasies than of Samsa’s new form.

‘Vermin’ would seem to be a far more resonant translation in this case, as I would argue that it conveys another layer of meaning, beyond the surface monstrosity of Samsa’s condition. ‘Vermin’ amounts also to a subtle condemnation of his environment. In the same way that the word ‘weed’ refers to a plant growing where it is unwelcome, a ‘vermin’ is an animal in an environment where one judges it to be intruding. ‘Vermin,’ could just as easily refer to rats, foxes or even dogs.

The word ‘vermin’ in the first sentence therefore anticipates the Samsa family’s attitude to Gregor, as they become increasingly unwilling to share their household with him, no matter how certain they are that the insect is their son, though how they come to believe this is never outlined. The Samsas take action to euthanize him, ironically, after Gregor is tempted out of his room by the sounds of his sister playing the violin, appealing to his inner, very human, self. In some ways this is surprising; as the story continues, our sense of Samsa’s interiority recedes. Although on the other hand, he has been kept in a room for a few months and has been reduced to overhearing his family’s conversations in the other room. One shouldn’t necessarily blame him for withdrawing into himself, away from the reader’s vantage point.

In another sense, Gregor’s loss of personality traits or human characteristics has also begun long before the narrative proper begins. On awaking to this new state of affairs, he seems utterly unperturbed, regarding it as a mere inconvenience, and is far more troubled by the time that the train he intends to catch to work leaves the station, how much time he must allow himself if he is to catch it etc., than by finding himself no longer in his own body. The tone of commonsense pragmatism with which he attempts to placate his freaking-out family members (he has lost the ability to speak) is one of the most horrifying aspects of the text, and points to how alienated Gregor is from himself:

“Do you want to let me set out, do you? You see Chief Clerk, you see, I’m not stubborn, I like my work; the travel is arduous but I couldn’t live without it.”

Perhaps the most obvious reason for this is the menial nature of Gregor’s occupation as a travelling salesman; based on how quickly his supervisor turns up at his home to reprimand him, one might conceive this text as a fabulist critique of the dehumanising nature of modern work.

One should also remember that the word ‘vermin,’ and words like it, were used in the Nuremberg rallies, and the belief that the Jewish people were unwelcome within the Lebensraum, in the same way that Samsa is in his own home, was to have disastrous consequences in the decades following Kafka’s death.