
Sentiment Analysis & the English Language 12 November 2010

Posted by Oliver Mason in Sentiment Analysis.

Currently I am working on Sentiment Analysis, so I will probably post a series of smaller posts on issues that I come across. Today I was looking at YouGov’s website and the public’s opinions of Nick Clegg, the Liberal Democrat leader currently serving as deputy prime minister. Here is a snapshot of the opinions at the time, and you can see that most of them are classed as negative:

Most? I would say all… but apparently whatever system YouGov use for sentiment analysis cannot cope with idioms. And ‘shooting yourself in the foot’ is not exactly a tricky one to identify, I should think.

But this raises a more complex issue: there are many ways to express opinions, attitudes, judgements, etc. in language. This is a much larger problem than counting the number of ‘positive’ and ‘negative’ words in a text. To begin with, words in isolation rarely have a clear meaning; whether something counts as positive or negative is often a subjective judgement; and then there’s irony and sarcasm.

Yes, Clegg did really well when he supported the Tories on tuition fees…
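To see how little plain word counting captures, here is a minimal sketch in Erlang with a tiny made-up lexicon (purely an illustration, of course, not whatever system YouGov actually run); it happily scores the sarcastic sentence above as positive:

    %% Naive sentiment scoring by counting lexicon words. The lexicon is
    %% invented for this example and far too small to be of any real use.
    -module(naive_sentiment).
    -export([score/1]).

    lexicon() ->
        #{"well" => 1, "support" => 1,
          "shooting" => -1, "violence" => -1, "unreasonable" => -1}.

    score(Text) ->
        Words = string:tokens(string:to_lower(Text), " ,;.!?"),
        lists:sum([maps:get(Word, lexicon(), 0) || Word <- Words]).

    %% naive_sentiment:score("Yes, Clegg did really well when he supported
    %% the Tories on tuition fees") returns 1, i.e. 'positive'; the sarcasm
    %% is completely invisible to the counting.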

Continuing on this theme, here’s another issue: the assumption that the scope of a sentiment is the whole text. Here’s an opinion (classed as positive) from the same site, about the student protests:

I completely support the right to protest; however, violence is unreasonable.

This is somewhat positive, supporting the students, but the second clause adds a further judgement condemning the violent incidents that happened at the demonstration. This seems to suggest that the proper carrier of attitude should be the clause, rather than the sentence, let alone the whole text. Not everything is just black and white.

Replacing a stack with concurrency 23 April 2008

Posted by Oliver Mason in erlang, NLP, programming.

For some language processing task I needed a reasonably powerful parser (a program to identify the syntactic structure of a sentence). So I dug out my copy of Winograd (1983) (“Language as a Cognitive Process”) and set about implementing an Augmented Transition Network parser in Erlang.

Now, the first thing you learn about natural language is that it is full of ambiguities, and so there will always be several alternatives available, several possible paths through the network which defines the grammar. The traditional solution is to dump all the alternatives on a stack, and look at them when the current path has been finished with. You can either go depth-first, where you complete the current path before you get the next one off the stack, or breadth-first, where you advance all paths by one step at a time, kind of pseudo-parallel.
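To make the comparison concrete, here is a rough sketch of the stack-based bookkeeping; the Expand function (which advances a single configuration by one step through the network) and the tuple shapes are placeholders for this illustration, not Winograd’s notation:

    %% Sketch of the traditional stack of alternatives. Expand is assumed
    %% to take a configuration (network, node, sentence position, ...) and
    %% return {done, Parse}, {next, Config}, {alternatives, Configs} or fail.
    parse_all(_Expand, [], Results) ->
        Results;
    parse_all(Expand, [Config | Rest], Results) ->
        case Expand(Config) of
            {done, Parse} ->
                parse_all(Expand, Rest, [Parse | Results]);
            {next, Config2} ->
                parse_all(Expand, [Config2 | Rest], Results);
            {alternatives, Alts} ->
                %% Depth-first: new alternatives go on top of the stack;
                %% breadth-first would append them instead (Rest ++ Alts).
                parse_all(Expand, Alts ++ Rest, Results);
            fail ->
                parse_all(Expand, Rest, Results)
        end.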

Having to deal with a stack is tedious, as you need to keep track of the current configuration: which network you are in, which node, what position in the sentence, etc. But then it occurred to me that there’s an easier way to do it (at least it’s easier in Erlang!): every time you come to a point where you have multiple alternatives, you spawn a new process for each alternative and pursue all of them in parallel.
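Here is a sketch of what a single worker could look like under that scheme; Step (the same kind of single-step function as above) and the message shapes are again just illustrative, and Parent is the bookkeeping loop described in the next paragraph:

    %% Sketch of a worker pursuing one path through the network. Step is
    %% assumed to advance a configuration by a single step and return
    %% {done, Parse}, {next, Config}, {alternatives, Configs} or fail.
    follow(Parent, Step, Config) ->
        case Step(Config) of
            {done, Parse} ->
                %% A complete structure: report it and finish.
                Parent ! {parse, Parse};
            {next, Config2} ->
                %% Only one way to continue: keep following this path.
                follow(Parent, Step, Config2);
            {alternatives, Alts} ->
                %% A choice point: one new process per alternative, then
                %% tell the parent which processes it now has to watch.
                Pids = [spawn(fun() -> follow(Parent, Step, Alt) end) || Alt <- Alts],
                Parent ! {forked, Pids};
            fail ->
                %% Dead end: just terminate; the parent notices the exit.
                ok
        end.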

The only overhead you need is a loop which keeps track of all the processes currently running. This loop receives the results of successful paths, and gets notified of unsuccessful ones (where the process terminates without having found a valid structure). No need for a stack, and hopefully very efficient processing on multi-core machines as a free side-effect.
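Continuing the sketch, that loop might simply monitor every worker and count how many are still alive, along these lines:

    %% Sketch of the bookkeeping loop: it monitors every worker, collects
    %% successful parses, and returns once no worker is left running.
    parse(Step, Config) ->
        Self = self(),
        spawn_monitor(fun() -> follow(Self, Step, Config) end),
        collect(1, []).

    collect(0, Parses) ->
        Parses;
    collect(Active, Parses) ->
        receive
            {parse, Parse} ->
                collect(Active, [Parse | Parses]);
            {forked, Pids} ->
                %% Newly spawned workers: watch those as well.
                [erlang:monitor(process, Pid) || Pid <- Pids],
                collect(Active + length(Pids), Parses);
            {'DOWN', _Ref, process, _Pid, _Reason} ->
                %% One worker is gone, whether it succeeded or not.
                collect(Active - 1, Parses)
        end.

In this sketch a worker that crashes just shows up as another ‘DOWN’ message, so a dead end and a genuine error are handled the same way; there is indeed no stack anywhere, only messages and a counter.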

I’m still amazed at how easy it was to implement. I wouldn’t have fancied doing that in Java or even C. For my test sentences I had about 8 to 10 processes running in parallel most of the time, but that really depends on the size of the grammar and the length of the sentence. What I liked about this was that it seemed the natural way to do it in Erlang, where working with processes is just so easy.

And also, another nail in the coffin for the claim that you can’t use Erlang for handling texts easily!