
Apr 02

Sentiment Analysis – Further Down the ‘R’abbit Hole

“‘Curiouser and curiouser!’ cried Alice (she was so much surprised, that for the moment she quite forgot how to speak good English).” – Lewis Carroll, Alice’s Adventures in Wonderland

It seems rather strange to think that, just under eight months ago, I had not written any computer code (I’m not including little bits of BASIC from the ’80s), and yet lines of code or the blinking cursor of Terminal no longer instil a sense of rising panic. Although programming has a very steep learning curve, it is relatively easy to gain a basic understanding, and, with this, the confidence to experiment.

R has rapidly become my favourite programming language, so I was interested to follow a link from Scott Weingart’s blog post ‘Not Enough Perspectives Pt. 1’ to Matthew Jockers’ new R package ‘Syuzhet’. As this is an area I hope to research as part of my PhD, I decided to give it a try, using the ‘Introduction to the Syuzhet Package’ (Jockers, 2015) as a guide. I used a short text from Jane Austen’s juvenilia – ‘LETTER the FOURTH From a YOUNG LADY rather impertinent to her friend’. I removed the speech marks from the text, as they cause problems when the text is pasted into a quoted string in the code.
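As a minimal sketch of how the speech marks could be stripped in R itself rather than by hand: the sample sentence and the names `raw_text` and `clean_text` below are invented for illustration and are not taken from the Austen text or from the Syuzhet vignette.

```r
# Hypothetical sample containing both curly and straight double quotation marks
raw_text <- "She said, \u201CI am quite well.\u201D He replied, \"Indeed?\""

# Remove straight (") and curly (\u201C, \u201D) double quotation marks
clean_text <- gsub("[\"\u201C\u201D]", "", raw_text)

clean_text
```

Reading the text from a file (for example with get_text_as_string(), mentioned in the code below) is another way to avoid quoting problems, since the text never has to sit inside a string literal.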

The code:

# Experiment based on http://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html
# ‘Introduction to the Syuzhet Package’ Jockers 20-2-2015

# Having installed the syuzhet package from CRAN, access it using library:
library(syuzhet)

# Input a text, for longer texts use get_text_as_string()
# This text is from Austen’s Juvenilia from Project Gutenberg
example_text <- "We dined yesterday with Mr Evelyn where we were introduced to a very
agreable looking Girl his Cousin…[I haven’t included the whole text I used – it can be viewed here]
This was an answer I did not expect – I was quite silenced, and never felt so awkward in my Life – -."

# Use get_sentences to create a character vector of sentences
s_v <- get_sentences(example_text)

# Check that all is well!
class(s_v)
str(s_v)
head(s_v)

# Use get_sentiment to assess the sentiment of each sentence. This function
# takes the character vector and one of four possible extraction methods
sentiment_vector <- get_sentiment(s_v, method = "bing")
sentiment_vector

# The different methods give slightly different results – same text different method
afinn_s_v <- get_sentiment(s_v, method = "afinn")
afinn_s_v

# An estimate of the “overall emotional valence” of the passage or text
sum(sentiment_vector)

# To calculate “the central tendency, the mean emotional valence”
mean(sentiment_vector)

# A summary of the emotions in the text
summary(sentiment_vector)

# To visualise this using a line plot
plot(sentiment_vector, type = "l", main = "Plot Trajectory 'LETTER the FOURTH From a YOUNG LADY'",
xlab = "Narrative Time", ylab = "Emotional Valence")
abline(h = 0, col = "red")

# To extract the sentence with the most negative emotional valence
negative <- s_v[which.min(sentiment_vector)]
negative

# and to extract the most positive sentence
positive <- s_v[which.max(sentiment_vector)]
positive

# Use get_nrc_sentiment to categorize each sentence by eight emotions and two
# sentiments; it returns a data frame
nrc_data <- get_nrc_sentiment(s_v)

# To subset the ‘sad’ sentences
sad_items <- which(nrc_data$sadness > 0)
s_v[sad_items]

# To view the emotions as a barplot
barplot(sort(colSums(prop.table(nrc_data[, 1:8]))), horiz = TRUE, cex.names = 0.7,
las = 1, main = "Emotions in 'Letter the Fourth'", xlab = "Percentage",
col = 1:8)

The Results:

The sentiment vector – this assigns a value to each sentence in the text.

[1] 0 2 1 -1 -1 0 0 0 0 0 0 2 0 0 -1 0 1 -2 0 0 1 1 4 0 0 -2 -1
[28] -3 0 -2 1 0 0 2 1 -2 1 -1 0 -1 0 -1 0 -1
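The printed vector can also be summarised without re-running the analysis. The sketch below simply re-enters the 44 values shown above by hand (so it does not need the syuzhet package) and uses base R only; the names `polarity_counts`, `total_valence`, `most_negative_index` and `most_positive_index` are my own, chosen for illustration.

```r
# Values copied from the printed output above (bing method, 44 sentences)
sentiment_vector <- c(
  0, 2, 1, -1, -1, 0, 0, 0, 0, 0, 0, 2, 0, 0, -1, 0, 1, -2, 0, 0, 1, 1, 4, 0, 0, -2, -1,
  -3, 0, -2, 1, 0, 0, 2, 1, -2, 1, -1, 0, -1, 0, -1, 0, -1
)

# Count sentences scored negative (-1), neutral (0) and positive (1)
polarity_counts <- table(sign(sentiment_vector))
polarity_counts

# Overall valence, and the positions of the extreme sentences
total_valence <- sum(sentiment_vector)
most_negative_index <- which.min(sentiment_vector)
most_positive_index <- which.max(sentiment_vector)
```

On these values, 20 of the 44 sentences score zero, 13 score below zero and 11 above, and the scores sum to -2, so the passage as a whole sits very slightly on the negative side under this method.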

Visualising the sentiment vector as a line graph shows the fluctuations within the text:

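[Figure: line plot of the sentiment vector, titled ‘Plot Trajectory 'LETTER the FOURTH From a YOUNG LADY'’, with Narrative Time on the x-axis and Emotional Valence on the y-axis]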

Visualising the emotions within the text:

[Figure: horizontal bar plot of the eight emotions, titled ‘Emotions in 'Letter the Fourth'’]

My Thoughts

I have only started to explore this package and have applied it to a very short passage (44 sentences). While this shows what Syuzhet can do in a general way, it does not demonstrate its full capabilities. In addition, as I haven’t fully read up on the package and the thinking behind it, my analysis may well be plagued with errors.

However, these are my thoughts so far. Running a brief trial, using three of the methods available, highlights some of the difficulties of sentiment analysis. While all three identified the same sentence as the most ‘negative’:

[1] “I dare say not Ma’am, and have no doubt but that any\nsufferings you may have experienced could arise only from the cruelties\nof Relations or the Errors of Freinds.”

each of the methods identified a different sentence as being the most ‘positive’:

bing – [1] “Perfect Felicity is not the property of Mortals, and no one has a right\nto expect uninterrupted Happiness.”

afinn – [1] “I was extremely pleased with her\nappearance, for added to the charms of an engaging face, her manner and\nvoice had something peculiarly interesting in them.”

nrc – [1] “I recovered myself however in a few moments and\nlooking at her with all the affection I could, My dear Miss Grenville\nsaid I, you appear extremely young – and may probably stand in need of\nsome one’s advice whose regard for you, joined to superior Age, perhaps\nsuperior Judgement might authorise her to give it.”
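A comparison like this can be scripted rather than run method by method. The following is only a sketch under stated assumptions: `extreme_sentences` is a hypothetical helper of my own (not part of syuzhet), the toy sentences are invented, and the only package call used is get_sentiment() with the method names shown earlier.

```r
library(syuzhet)

# Hypothetical helper: for each method, find the lowest- and highest-scoring sentence
extreme_sentences <- function(sentences, methods = c("bing", "afinn", "nrc")) {
  scores <- lapply(methods, function(m) get_sentiment(sentences, method = m))
  data.frame(
    method = methods,
    most_negative = sentences[vapply(scores, which.min, integer(1))],
    most_positive = sentences[vapply(scores, which.max, integer(1))],
    stringsAsFactors = FALSE
  )
}

# Invented toy sentences, used only to show the shape of the result
toy_sentences <- c(
  "I love this wonderful day.",
  "This is a terrible, awful mess.",
  "The table is brown."
)

comparison <- extreme_sentences(toy_sentences)
comparison
```

With the real text, s_v would be passed in place of toy_sentences. Note that which.min and which.max return only the first sentence when several share the extreme score, so ties between sentences are hidden by this approach.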

This is something Jockers discusses further in his blog post ‘My Sentiments (Exactly?)’, highlighting that sentiment analysis is difficult for humans, as well as machines:

“This human coding business is nuanced. Some sentences are tricky. But it’s not the sarcasm or the irony or the metaphor that is tricky. The really hard sentences are the ones that are equal parts positive and negative sentiment.” – Matthew Jockers

However, he also points out:

“One thing I learned was that tricky sentences, such as the one above, are usually surrounded by other sentences that are less tricky.” – Matthew Jockers

It seems that a combination of close and ‘distant’ reading, what Mueller calls ‘scaled reading’, is likely to be of most use if analysis at the sentence level is desired. Having only a relatively recent and limited experience of programming in R, I have found using the Syuzhet package very straightforward and am looking forward to using it again very soon.

 

UPDATE: 3rd April 2015

There is a great deal of academic discussion surrounding the methods discussed here. As I read further I will add another post exploring the core points and including a reading list.
