Founder and Chief Scientist Jason Baldridge – “The Professor” here at People Pattern – recently took a trip to the Big Apple to give a tutorial at the Sentiment Analysis Symposium. The tutorial covered many aspects of practical sentiment analysis and related topics like geolocation and authorship attribution. In preparing the material for the tutorial, Baldridge drew from his research at the University of Texas at Austin and also from his experience developing enterprise software systems at People Pattern and Converseon.
Automated sentiment analysis has been around for well over a decade, and many vendors provide solutions for it. Baldridge’s tutorial covered basic methods, such as using word lists to predict whether a given text is positive or negative — these methods famously fail in the context of words like “not”, for example when one says “this is not a great book”. Much work has focused instead on learning from data, where humans have annotated many texts (often movie reviews or tweets) as to whether they express a positive, negative or neutral sentiment; these can be used as input to machine learning methods that learn to discriminate between these different classes. However, to simply classify a text in its entirety is not enough: one must look to aspect-based sentiment analysis to get a more meaningful outcome. As an example, “this car sure looks great but its handling is so terrible that I cannot consider buying it” — here the speaker expresses a positive sentiment toward the styling aspect of the vehicle but a negative one toward the way it drives. For many applications, such as sentiment analysis for product reviews, this finer detail is essential.
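To make the word-list failure concrete, here is a minimal sketch in Python. The lexicons, the function names, and the negation rule are all illustrative assumptions, not material from the tutorial: the first scorer simply counts positive and negative words, and the second flips the polarity of the next sentiment word after a negator, which is a crude heuristic rather than a robust fix.

```python
import re

# Tiny, hypothetical lexicons for illustration only.
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"terrible", "bad", "awful", "hate"}
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Count positive words minus negative words, ignoring context."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def negation_aware_score(text):
    """Same count, but a negator flips the next sentiment word it precedes."""
    score = 0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False  # the negator has been "used up"
    return score


sentence = "this is not a great book"
print(lexicon_score(sentence))         # 1: the plain word list calls it positive
print(negation_aware_score(sentence))  # -1: the negation flips "great"
```

Even the negation-aware version is easy to fool (sarcasm, long-distance negation, "not only great but..."), which is one reason much of the field moved to learning from annotated data.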
Baldridge also covered evaluation for sentiment analysis. Evaluation can be done at many different levels, from simple overall accuracy, to detailed precision and recall breakdowns per label, to good aggregate measures of overall sentiment toward a brand or product. One of the main themes of this portion of the tutorial was that humans themselves don’t always agree with each other on sentiment analysis tasks. In fact, they often disagree, especially on whether a given text is positive vs. neutral or negative vs. neutral. You can also get very different accuracy figures by using different methodologies and measuring different aspects of the task.
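As a rough illustration of how overall accuracy and per-label breakdowns can tell different stories, the sketch below computes both for a three-way labeling task. The function names and the six example labels are made up for this post; they are not data or code from the tutorial.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items where the predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_label_metrics(gold, predicted):
    """Return {label: (precision, recall)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[g] += 1  # missed a true g
    metrics = {}
    for label in sorted(set(gold) | set(predicted)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Invented toy labels: both errors confuse "pos" with "neu".
gold = ["pos", "pos", "neu", "neg", "neu", "neg"]
pred = ["pos", "neu", "neu", "neg", "pos", "neg"]

print(accuracy(gold, pred))
for label, (p, r) in per_label_metrics(gold, pred).items():
    print(label, p, r)
```

In this toy case the negative class is handled perfectly while positive and neutral are confused with each other, a pattern the single accuracy number hides.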
The tutorial then moved beyond sentiment analysis into other kinds of tasks, such as predicting demographic and psychographic values for social profiles. People use language not just to communicate ideas, but also to selectively reveal their internal mental and psychological life. By expressing opinions about ideas, people, and products, a person conveys secondary information about their personality, their values, and their background.
For example, consider the problem of predicting whether a person named Leslie is a man or a woman — there are plenty of men and women with that name, but we can also take into account any text written by a person named “Leslie”. It turns out that there are some rather obvious differences between men and women in terms of their word choice — for example, men say “truck” and “beer” at higher rates than women, and women use “baby” and “purse” more often. But there are subtler patterns as well, as discussed in James Pennebaker’s book The Secret Life of Pronouns: men use “the” slightly more frequently and women use “because” and auxiliary verbs (“will”, “might”, etc.) slightly more frequently. And it goes further: social psychologists have looked into other topics such as how language reveals subtle correlations with deception and depression.
Similarly, it is possible to infer locations that a particular text refers to. There are obvious cases, such as “I’m in Austin, Texas!”, but we can also make inferences about a person being in Austin if they are talking about topics like breakfast tacos, bbq, live music, and SXSW. Baldridge discussed his research at the University of Texas at Austin, which treats this problem as a text classification task, predicting locations for Wikipedia pages and Twitter accounts using only linguistic cues such as these.
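To give a feel for what "location prediction as text classification" can look like, here is a minimal sketch using a multinomial Naive Bayes classifier with add-one smoothing. This is a generic textbook technique chosen for illustration, not the method from Baldridge's research, and the class name, the four training sentences, and the two location labels are all invented toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesLocator:
    """Hypothetical toy classifier: picks the most probable location label."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # location -> word -> count
        self.doc_counts = Counter()              # location -> number of texts
        self.vocab = set()

    def train(self, examples):
        for text, location in examples:
            self.doc_counts[location] += 1
            for tok in tokenize(text):
                self.word_counts[location][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for location in self.doc_counts:
            counts = self.word_counts[location]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(self.doc_counts[location] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = location, score
        return best


# Invented training texts; a real system would learn from far more data.
TOY_EXAMPLES = [
    ("breakfast tacos and live music before sxsw", "Austin"),
    ("great bbq and live music downtown", "Austin"),
    ("took the subway to a broadway show", "New York"),
    ("bagels and the subway in manhattan", "New York"),
]

locator = NaiveBayesLocator()
locator.train(TOY_EXAMPLES)
print(locator.predict("tacos and bbq tonight"))  # Austin
print(locator.predict("subway to manhattan"))    # New York
```

Neither test sentence names a city; the prediction comes entirely from topical cues, which is the point of the linguistic-cues approach described above.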
Practical sentiment analysis explores ways that we understand others by gleaning information from general discussions, expressions, and other subconscious habits. In Baldridge’s tutorial, you will have a chance to explore concepts of sentiment analysis and opinion mining from unstructured text. In other words, how do we infer attributes of other people based on what they say in informal settings?
Given the great volume of text created and readily accessible online, tremendous value can be derived from these many levels of analysis, especially for marketers, political campaigns, and the like. Science-based marketing and data-driven strategies are the future of successful brand/audience communication. Take a look at what The Professor has to say: Baldridge’s tutorial will take you below the surface of opinion mining to explore the details and uses of practical sentiment analysis.