Sentiment analysis (or opinion mining) is defined as the task of finding the opinions of authors about specific entities. The decision-making process of people is affected by the opinions formed by thought leaders and ordinary people. When a person wants to buy a product online he or she will typically start by searching for reviews and opinions written by other people on the various offerings. Sentiment analysis is one of the hottest research areas in computer science. Over 7,000 articles have been written on the topic. Hundreds of startups are developing sentiment analysis solutions and major statistical packages such as SAS and SPSS include dedicated sentiment analysis modules. There is a huge explosion today of 'sentiments' available from social media including Twitter, Facebook, message boards, blogs, and user forums. These snippets of text are a gold mine for companies and individuals that want to monitor their reputation and get timely feedback about their products and actions. Sentiment analysis offers these organizations the ability to monitor the different social media sites in real time and act accordingly. Marketing managers, PR firms, campaign managers, politicians, and even equity investors and online shoppers are the direct beneficiaries of sentiment analysis technology.
It is common to classify sentences into two principal classes with regard to subjectivity: objective sentences that contain factual information and subjective sentences that contain explicit opinions, beliefs, and views about specific entities. Here, I mostly focus on analyzing subjective sentences. However, I refer to the usage of objective sentences when describing a sentiment application for stock picking.
As an example, here is a review about a hotel in Manhattan.
"The king suite was spacious, clean, and well appointed. The reception staff, bellmen, and housekeeping were very helpful. Requests for extras from the maid were always provided. The heating and air conditioning functioned well; this was good as the weather was variable. The sofa bed was the best I've ever experienced. The king size bed was very comfortable. The building and rooms are very well soundproofed. The neighborhood is the best for shopping, restaurants, and access to subway. Only "complaint" has to do with high-speed Internet access. It's only available on floors 812."
Overall the review is very positive about the hotel. It refers to many different aspects of the hotel including: heating, air conditioning, staff courtesy, bed, neighborhood, and Internet access. Sentiment analysis systems must be able to provide a sentiment score for the whole review as well as analyze the sentiment of each individual aspect of the hotel.
I present the main research problems related to sentiment analysis and some of the techniques used to solve them, then review some of the major application areas where sentiment analysis is being used today. I conclude with some of the open research problems in this field. Due to limited space, I am not able to cover the whole range of problems and techniques, but I refer the reader to some of the extensive reviews written on this topic.20,21,27
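In this review, I will focus on five specific problems within the field of sentiment analysis: document-level sentiment analysis, sentence-level sentiment analysis, aspect-based sentiment analysis, comparative sentiment analysis, and sentiment lexicon acquisition.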
Before explaining each of these problems in detail, let's review a general architecture of a generic sentiment analysis system. The architecture is shown in Figure 1.
The input to the system is a corpus of documents in any format (PDF, HTML, XML, Word, among others). The documents in this corpus are converted to text and are pre-processed using a variety of linguistic tools such as stemming, tokenization, part of speech tagging, entity extraction, and relation extraction. The system may also utilize a set of lexicons and linguistic resources. The main component of the system is the document analysis module, which utilizes the linguistic resources to annotate the pre-processed documents with sentiment annotations. The annotations may be attached to whole documents (for document-based sentiment), to individual sentences (for sentence-based sentiment) or to specific aspects of entities (for aspect-based sentiment). These annotations are the output of the system and they may be presented to the user using a variety of visualization tools.
Document-level sentiment analysis is the simplest form of sentiment analysis; it assumes that the document contains an opinion on one main object expressed by the author of the document. Numerous papers have been written on this topic. There are two main approaches to document-level sentiment analysis: supervised learning and unsupervised learning.
The supervised approach assumes that there is a finite set of classes into which the document should be classified and that training data is available for each class. The simplest case is when there are two classes: positive and negative. Simple extensions can also add a neutral class or have some discrete numeric scale into which the document should be placed (like the five-star system used by Amazon). Given the training data, the system learns a classification model by using one of the common classification algorithms such as SVM, Naïve Bayes, Logistic Regression, or KNN. This model is then used to tag new documents into their various sentiment classes. When a numeric value (in some finite range) is to be assigned to the document, regression can be used to predict that value (for example, in the Amazon five-star ranking system). Research28 has shown that good accuracy is achieved even when each document is represented as a simple bag of words. More advanced representations utilize TF-IDF, POS (Part of Speech) information, sentiment lexicons, and parse structures.
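As a minimal sketch of the supervised approach, the following Python code trains a bag-of-words Naïve Bayes classifier with add-one smoothing on two classes. The tiny training set, the tokenizer, and all function names are assumptions made for illustration; they are not taken from the cited research, and a real system would use far more training data.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w]

def train_naive_bayes(labeled_docs):
    """Return (documents per class, word counts per class, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for w in tokenize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Pick the class with the highest log-probability for the bag of words."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, invented for this sketch.
TRAIN = [
    ("The room was spacious and clean", "positive"),
    ("Staff were very helpful and friendly", "positive"),
    ("The bed was comfortable", "positive"),
    ("The room was dirty and noisy", "negative"),
    ("Staff were rude and unhelpful", "negative"),
    ("The bed was terrible", "negative"),
]
model = train_naive_bayes(TRAIN)
print(classify("clean and comfortable room", model))
print(classify("dirty room and rude staff", model))
```

Adding a neutral class only requires adding training documents with a third label; predicting a star rating as a number would instead call for a regression model.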
Unsupervised approaches to document-level sentiment analysis are based on determining the semantic orientation (SO) of specific phrases within the document. If the average SO of these phrases is above some predefined threshold the document is classified as positive and otherwise it is deemed negative. There are two main approaches to the selection of the phrases: a set of predefined POS patterns can be used to select these phrases36 or a lexicon of sentiment words and phrases can be used.34 A classic method to determine the SO of a given word or phrase is to calculate the difference between the PMI (Pointwise Mutual Information) of the phrase with two sentiment words.36 PMI(P,W) measures the statistical dependence between the phrase P and the word W based on their co-occurrence in a given corpus or over the Web (by utilizing Web search queries). The two words used in Turney36 are 'excellent' and 'poor.' The SO measures whether P is closer in meaning to the positive word ('excellent') or the negative word ('poor').
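The SO computation can be sketched as follows. Because SO(P) = PMI(P, 'excellent') − PMI(P, 'poor'), the corpus size and the count of P cancel, leaving a ratio of co-occurrence counts. In this sketch the toy corpus, the document-level notion of co-occurrence, the substring matching, and the small smoothing constant `eps` are all assumptions chosen for illustration rather than details taken from the cited work.

```python
import math

def count_docs(corpus, *terms):
    """Number of documents containing every term (plain substring match)."""
    return sum(all(t in doc for t in terms) for doc in corpus)

def semantic_orientation(phrase, corpus, pos="excellent", neg="poor", eps=0.01):
    # SO(P) = PMI(P, pos) - PMI(P, neg)
    #       = log2( count(P, pos) * count(neg) / (count(P, neg) * count(pos)) )
    # eps is a smoothing constant that avoids division by zero.
    near_pos = count_docs(corpus, phrase, pos) + eps
    near_neg = count_docs(corpus, phrase, neg) + eps
    n_pos = count_docs(corpus, pos) + eps
    n_neg = count_docs(corpus, neg) + eps
    return math.log2((near_pos * n_neg) / (near_neg * n_pos))

def classify_by_average_so(phrases, corpus, threshold=0.0):
    """Label a document positive if the average SO of its phrases exceeds the threshold."""
    average = sum(semantic_orientation(p, corpus) for p in phrases) / len(phrases)
    return "positive" if average > threshold else "negative"

# Hypothetical toy corpus standing in for a large corpus or Web search counts.
CORPUS = [
    "excellent hotel with very helpful staff",
    "very helpful concierge and excellent breakfast",
    "poor service and a noisy room",
    "noisy room and poor soundproofing",
    "excellent location but a noisy room",
]
print(semantic_orientation("very helpful", CORPUS) > 0)   # co-occurs with 'excellent'
print(semantic_orientation("noisy room", CORPUS) < 0)     # co-occurs mostly with 'poor'
```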
A few researchers1,37 have used machine translation to perform document-level sentiment analysis in languages such as Chinese and Spanish that lack the vast linguistic resources available in English. Their method works by translating the documents to English and then performing sentiment analysis on these documents using a sentiment analyzer in English.
A single document may contain multiple opinions even about the same entities. When we want to have a more fine-grained view of the different opinions expressed in the document about the entities we must move to the sentence level.
We assume here that we know the identity of the entity discussed in the sentence. We further assume there is a single opinion in each sentence. This assumption can be relaxed by splitting the sentence into phrases where each phrase contains just one opinion. Before analyzing the polarity of the sentences we must determine if the sentences are subjective or objective. Only subjective sentences will then be further analyzed. (Some approaches also analyze objective sentences, which are more difficult.) Most methods use supervised approaches to classify the sentences into the two classes.40 A bootstrapping approach was suggested in Hai32 in order to reduce the amount of manual labor needed when preparing a large training corpus. A unique approach based on the minimum cuts was proposed in Pang and Lee.26 The main premise of their approach is that neighboring sentences should have the same subjectivity classification.
After we have zeroed in on the subjective sentences we can classify these sentences into positive or negative classes. As mentioned earlier, most approaches to sentence-level sentiment analysis are either based on supervised learning17 or on unsupervised learning.40 The latter approach is similar in nature to that of Turney,36 except that it uses a modified log-likelihood ratio instead of PMI and the number of seed words that are used to find the SO of the words in the sentence is much larger.
Recent research24 has shown that it is advisable to handle different types of sentences by different strategies. Sentences that need unique strategies include conditional sentences, question sentences and sarcastic sentences. Sarcasm is extremely difficult to detect and it exists mainly in political contexts. One solution for identifying sarcastic sentences is described in Tsur et al.35
The two previous approaches work well when either the whole document or each individual sentence refers to a single entity. However, in many cases people talk about entities that have many aspects (attributes) and they have a different opinion about each of the aspects. This often happens in reviews about products or in discussion forums dedicated to specific product categories (such as cars, cameras, smartphones, and even pharmaceutical drugs). As an example here is a review of Kindle Fire taken from the Amazon website:
"As a long-time Kindle fan I was eager to get my hands on a Fire. There are some great aspects; the device is quick and for the most part dead-simple to use. The screen is fantastic with good brightness and excellent color, and a very wide viewing angle. But there are some downsides too; the small bezel size makes holding it without inadvertent page-turns difficult, the lack of buttons makes controls harder, the accessible storage memory is limited to just 5GB."
Classifying this review as either positive or negative toward the Kindle would totally miss the valuable information encapsulated in it. The author provides feedback about many aspects of the Kindle (like speed, ease of use, screen quality, bezel size, buttons, and storage memory size). Some of these aspects are reviewed positively while some of the others get a negative sentiment.
Aspect-based sentiment analysis (also called feature-based sentiment analysis) is the research problem that focuses on the recognition of all sentiment expressions within a given document and the aspects to which they refer.
The classic approach to identifying all aspects in a corpus of product reviews, which is used by many commercial companies, is to extract all noun phrases (NPs) and then keep just the NPs whose frequency is above some experimentally determined threshold.12 One refinement is to reduce the noise in the found NPs.30 The main idea is to measure for each candidate NP the PMI with phrases that are tightly related to the product category (like phones, printers, or cameras). Only those NPs that have a PMI above a learned threshold are retained. For the printer category, for example, such phrases would be "printer comes with" or "printer has."
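The frequency-threshold step can be sketched in a few lines of Python. The sketch assumes the noun phrases have already been extracted from each review by some chunker (the input lists below are hypothetical), and it measures frequency as the fraction of reviews that mention a phrase; the PMI-based noise filter is not shown.

```python
from collections import Counter

def frequent_aspects(review_nps, min_support=0.5):
    """Keep noun phrases that occur in at least `min_support` of the reviews."""
    doc_freq = Counter()
    for nps in review_nps:
        # Count each phrase once per review, ignoring case.
        doc_freq.update({phrase.lower() for phrase in nps})
    n_reviews = len(review_nps)
    return sorted(p for p, c in doc_freq.items() if c / n_reviews >= min_support)

# Hypothetical noun phrases, one list per review.
REVIEW_NPS = [
    ["screen", "battery", "my brother"],
    ["Screen", "storage"],
    ["battery", "screen"],
    ["storage", "weekend"],
]
print(frequent_aspects(REVIEW_NPS))  # ['battery', 'screen', 'storage']
```

The threshold `min_support` plays the role of the experimentally determined cutoff; phrases such as "my brother" or "weekend" fall below it and are dropped.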
Another approach to aspect identification is to use a phrase dependency parser that utilizes known sentiment expressions to find additional aspects (even infrequent ones).39
We can also view the problem of aspect identification as an information extraction problem and then use a tagged corpus to train a sequence classifier such as a Conditional Random Field (CRF)18 to find the aspects.14
I have just discussed identification of explicit aspects, that is, aspects that are mentioned explicitly in the sentences. However, there are many aspects that are not mentioned explicitly in the sentences and can be inferred from the sentiment expressions that mention them implicitly. These aspects are called implicit aspects. Examples of such aspects are weight, which can be inferred from the fragment "this phone is too heavy," or size, which can be inferred from "the camera is quite compact." One way to extract such implicit aspects is suggested in Liu10 where a two-phase co-occurrence association rule mining approach is used to match implicit aspects (sentiment expressions) with explicit aspects.
With these two sets (the aspects and the sentiment expressions) we can use a simple algorithm2 that determines the polarity of each sentiment expression based on a sentiment lexicon, sentiment shifters (such as negation words), and special handling of adversative conjunctions, such as 'but.' The final polarity of each aspect is determined by a weighted average of the polarities of all sentiment expressions inversely weighted by the distance between the aspect and the sentiment expression.
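A rough Python sketch of this aggregation step follows. The lexicon, the negation list, and the treatment of 'but' (sentiment words on the other side of a 'but' are simply ignored) are simplifying assumptions made for illustration, not the cited algorithm itself; distance is measured in tokens.

```python
# Hypothetical mini-lexicon and negation list, invented for this sketch.
LEXICON = {"great": 1, "fantastic": 1, "excellent": 1, "good": 1,
           "difficult": -1, "limited": -1, "poor": -1}
NEGATIONS = {"not", "never", "no"}

def clause_bounds(tokens, idx):
    """Return [start, end) of the clause around idx, splitting on 'but'."""
    start = idx
    while start > 0 and tokens[start - 1] != "but":
        start -= 1
    end = idx
    while end < len(tokens) and tokens[end] != "but":
        end += 1
    return start, end

def aspect_polarity(tokens, aspect):
    """Distance-weighted average polarity of sentiment words near the aspect."""
    idx = tokens.index(aspect)
    start, end = clause_bounds(tokens, idx)
    weighted_sum = total_weight = 0.0
    for j in range(start, end):
        polarity = LEXICON.get(tokens[j])
        if polarity is None or j == idx:
            continue
        # A negation word directly before the sentiment word flips its polarity.
        if j > start and tokens[j - 1] in NEGATIONS:
            polarity = -polarity
        weight = 1.0 / abs(j - idx)  # closer sentiment words count more
        weighted_sum += weight * polarity
        total_weight += weight
    return weighted_sum / total_weight if total_weight else 0.0

sentence = "the screen is fantastic but the storage is limited".split()
print(aspect_polarity(sentence, "screen"))   # 1.0
print(aspect_polarity(sentence, "storage"))  # -1.0
```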
In many cases users do not provide a direct opinion about one product but instead provide comparative opinions, such as in these sentences taken from the user forums of Edmunds.com: "300 C Touring looks so much better than the Magnum," "I drove the Honda Civic, it does not handle better than the TSX, not even close." The goal of the sentiment analysis system in this case is to identify the sentences that contain comparative opinions, and to extract the preferred entity(-ies) in each opinion.
One of the pioneering papers on comparative sentiment analysis is Jindal and Liu.15 This paper found that using a relatively small number of words we can cover 98% of all comparative opinions. These words are:
Since these words lead to a very high recall, but low precision, a naïve Bayes classifier was used to filter out sentences that do not contain comparative opinions. The classifier used sequential patterns as features. The sequential patterns were discovered by the class sequential rule (CSR) mining algorithm. A simple algorithm to identify the preferred entities based on the type of comparative used and the presence of negation is described in Ding et al.3
As we have seen in the previous discussion, the sentiment lexicon is the most crucial resource for most sentiment analysis algorithms. Here, I briefly mention a few approaches for the acquisition of the lexicon. There are three options for acquiring the sentiment lexicon: manual approaches in which people code the lexicon by hand, dictionary-based approaches in which a set of seed words is expanded by utilizing resources like WordNet,8 and corpus-based approaches in which a set of seed words is expanded by using a large corpus of documents from a single domain.
Clearly, the manual approach is in general not feasible as each domain requires its own lexicon and such a laborious effort is prohibitive. I will focus on the other two approaches. The dictionary-based approach starts with a small set of seed sentiment words suitable for the domain at hand. This set of words is then expanded by using WordNet's synonyms and antonyms. One of the elegant algorithms is proposed in Kamp et al.16 The method defines the distance d(t1, t2) between terms t1 and t2 as the length of the shortest path between t1 and t2 in WordNet. The orientation of t is defined as SO(t) = (d(t, bad) − d(t, good))/d(good, bad). |SO(t)| is the strength of the sentiment of t; SO(t) > 0 entails t is positive, and t is negative otherwise. The main disadvantage of any dictionary-based algorithm is that the acquired lexicon is domain independent and hence does not capture the specific peculiarities of any specific domain. More advanced dictionary-based approaches are reported in Dragut et al.4 and Peng and Park.29
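The path-based orientation measure can be illustrated with a short Python sketch. The small synonym graph below is a hypothetical stand-in for WordNet (it is not real WordNet data), and distances are found by breadth-first search.

```python
from collections import deque

# Hypothetical toy synonym graph standing in for WordNet.
SYNONYMS = {
    "good": {"fine", "decent"},
    "fine": {"good", "superb"},
    "superb": {"fine"},
    "decent": {"good", "mediocre"},
    "mediocre": {"decent", "bad"},
    "bad": {"mediocre", "awful"},
    "awful": {"bad"},
}

def distance(a, b, graph=SYNONYMS):
    """Length of the shortest path between a and b (breadth-first search)."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a!r} and {b!r}")

def orientation(term):
    # SO(t) = (d(t, bad) - d(t, good)) / d(good, bad)
    return (distance(term, "bad") - distance(term, "good")) / distance("good", "bad")

print(orientation("superb"))  # 1.0  (far from 'bad', close to 'good')
print(orientation("awful"))   # -1.0 (close to 'bad', far from 'good')
```

In this toy graph d(good, bad) = 3, so 'superb' scores (5 − 2)/3 = 1.0 and 'awful' scores (1 − 4)/3 = −1.0.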
If we want to create a domain-specific sentiment lexicon we have to use one of the many corpus-based algorithms. A classic work11 in this area introduced the concept of sentiment consistency, which enables one to identify additional adjectives whose polarity is consistent with that of a set of seed adjectives. A set of linguistic connectors (AND, OR, NEITHER-NOR, EITHER-OR) was used to find adjectives that are connected to adjectives with known polarity. Consider the sentence "the phone is both powerful and light." If we know that 'powerful' is a positive word, we can assume, by utilizing the connector AND, that the word 'light' is positive as well. In order to eliminate noise, the algorithm created a graph of adjectives by using connections induced by the corpus; after a clustering step, positive and negative clusters are formed.
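The core intuition of sentiment consistency can be sketched as a simple propagation over adjective pairs joined by AND. This is a deliberately reduced illustration: it assumes the pairs have already been extracted from a corpus (the pairs below are hypothetical), handles only the AND connector, and omits the graph clustering step that the original work uses to eliminate noise.

```python
def propagate_polarity(seeds, and_pairs):
    """Spread seed polarities across adjective pairs joined by 'and'."""
    polarity = dict(seeds)
    changed = True
    while changed:
        changed = False
        for a, b in and_pairs:
            # Adjectives joined by 'and' are assumed to share a polarity.
            for known, unknown in ((a, b), (b, a)):
                if known in polarity and unknown not in polarity:
                    polarity[unknown] = polarity[known]
                    changed = True
    return polarity

# Hypothetical adjective pairs, e.g. from "the phone is both powerful and light".
AND_PAIRS = [("powerful", "light"), ("light", "elegant"), ("slow", "bulky")]
SEEDS = {"powerful": 1, "slow": -1}
print(propagate_polarity(SEEDS, AND_PAIRS))
```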
An approach called double propagation for simultaneous acquisition of a domain-specific sentiment lexicon and a set of aspects was introduced in Qiu et al.31 This approach used the minipar19 parser to parse the sentences in the corpus and find associated aspects and sentiment expressions. The algorithm starts with a seed set of sentiment expressions and uses a set of predefined dependency rules and the minipar parser to find aspects that are connected to the sentiment expressions. It then uses the found aspects to find more sentiment expressions that in turn find more aspects. This mutual bootstrapping process stops when no more aspects or sentiment expressions can be added. For example, in "Kindle Fire has an amazing display," the adjective 'amazing' modifies the noun 'display,' so given that 'amazing' is a sentiment expression and we have the rule "a noun which is modified by a sentiment expression is an aspect," we can extract 'display' as an aspect. Conversely, if we know 'display' is an aspect, then using a similar rule we can infer that 'amazing' is a sentiment expression. The algorithm uses several additional constraints to reduce the effect of noise.
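The mutual bootstrapping loop can be sketched as follows. The sketch assumes the parser output has already been reduced to (modifier, noun) pairs, which are hypothetical here, and it applies only the single rule quoted above in both directions; the other dependency rules and the noise-reducing constraints of the actual algorithm are left out.

```python
def double_propagation(seed_sentiments, modifier_pairs):
    """Grow sentiment and aspect sets from (modifier, noun) dependency pairs."""
    sentiments, aspects = set(seed_sentiments), set()
    changed = True
    while changed:  # stop when a full pass adds nothing new
        changed = False
        for modifier, noun in modifier_pairs:
            # Rule 1: a noun modified by a sentiment expression is an aspect.
            if modifier in sentiments and noun not in aspects:
                aspects.add(noun)
                changed = True
            # Rule 2: a modifier of a known aspect is a sentiment expression.
            if noun in aspects and modifier not in sentiments:
                sentiments.add(modifier)
                changed = True
    return sentiments, aspects

# Hypothetical dependency pairs, e.g. from "Kindle Fire has an amazing display".
PAIRS = [
    ("amazing", "display"),
    ("crisp", "display"),
    ("crisp", "speakers"),
    ("tinny", "speakers"),
    ("red", "case"),
]
sentiments, aspects = double_propagation({"amazing"}, PAIRS)
print(sorted(sentiments))  # ['amazing', 'crisp', 'tinny']
print(sorted(aspects))     # ['display', 'speakers']
```

Without further constraints, rule 2 would also treat neutral modifiers of a known aspect (such as "new display") as sentiment expressions, which is why the additional noise-reduction constraints matter in practice.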
Migrating a sentiment lexicon from one domain to another domain was studied in Du et al.5 An algorithm for acquiring a slightly different type of lexicon called a connotation lexicon is reported in Feng et al.9 A connotation lexicon contains words that express sentiment either explicitly or implicitly. For instance, award and promotion have positive connotations and cancer and war have negative connotations.
The most common application of sentiment analysis is in the area of reviews of consumer products and services. There are many websites that provide automated summaries of reviews about products and about their specific aspects. A notable example of that is "Google Product Search."
Twitter and Facebook are a focal point of many sentiment analysis applications. The most common application is monitoring the reputation of a specific brand on Twitter and/or Facebook. One application that performs real-time analysis of tweets that contain a given term is tweetfeel (http://www.tweetfeel.com).
Sentiment analysis can provide substantial value to candidates running for various positions. It enables campaign managers to track how voters feel about different issues and how they relate to the speeches and actions of the candidates. An analysis of tweets related to the 2010 campaign can be found at http://www.nytimes.com/interactive/us/politics/2010-twitter-candidates.html.
Another important domain for sentiment analysis is the financial markets. There are numerous news items, articles, blogs, and tweets about each public company. A sentiment analysis system can use these various sources to find articles that discuss the companies and aggregate the sentiment about them as a single score that can be used by an automated trading system. One such system is The Stock Sonar (http://www.thestocksonar.com).7 This system (developed by Digital Trowel) shows graphically the daily positive and negative sentiment about each stock alongside the graph of the price of the stock. An example of such a graph is shown in Figure 2. The sentiment for CHK is extremely negative and indeed the stock went down considerably between April 21, 2012 and May 22, 2012. The graph is interactive, so a click on any point will reveal the events and sentiment expressions behind the various increases in positive or negative sentiment, as shown in Figure 3.
StockTwits (http://www.stocktwits.com) is a site that shows all tweets that contain at least one stock ticker in them (A '$' sign must be before the ticker of the stock to signal it is a ticker). The following are three tweets about Google (Ticker: GOOG) from Sunday, July 29, 2012.
Detecting sentiment on the first tweet will be done by utilizing comparative sentiment analysis techniques. We will conclude that the writer is positive on PriceLine (PCLN) and Apple (AAPL) and negative on Google. Analyzing the second tweet will reveal a negative sentiment on Google (shorting opportunity). Since Google closed on Friday, July 27, 2012 at $634.96, the author predicts a down movement of 1.57% to $625. Clearly, we need to be able to get historical prices of stocks to do proper analysis of the tweets. The third and last tweet is the most difficult to analyze since it requires background knowledge not available inside the tweet. We need to know that Kinect is a product of Microsoft (MSFT) and hence the author has a positive opinion on MSFT and a negative opinion on Google (by utilizing the sentiment shifter "NOT"). These examples show some of the challenges facing sentiment analysis systems when trying to analyze short messages that include reference to additional objects (products and stock prices in this case). The systems must utilize background knowledge in order to determine the relationship between the sentiment targets and the other objects.
An application that utilizes comparative sentiment analysis to assess the market structure of sedan cars and drugs for diabetes is described in Netzer et al.25 In Figure 4 we can see a visual map that shows the various connections between drugs and symptoms. Two types of connections are extracted by the sentiment analysis system: Drug Causes Symptom (negative, shown in red) and Drug Remedies Symptom (positive, shown in blue).
There are many open research issues in sentiment analysis, including:
The following resources contain sentiment lexicons that can be used within sentiment analysis systems:
This article reviewed some of the main research problems within the field of sentiment analysis and discussed several algorithms that aim to solve each of these problems. I have also described some of the major applications of sentiment analysis and provided a few major open challenges. Many of the commercial sentiment analysis systems still use simplistic techniques in order to avoid these open challenges and hence their performance leaves a lot to be desired. Providing satisfactory solutions to these challenges will make the area of sentiment analysis far more widespread.
I thank Lyle Ungar, Bing Liu, Benjamin Rosenfeld, and Roy Bar-Haim for helpful comments on drafts of this article.
3. Ding, X., Liu, B. and Zhang, L. Entity discovery and assignment for opinion mining applications. In Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2009).
5. Du, W., Tan, S., Cheng, X. and Yun, X. Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon. In Proceedings of ACM International Conference on Web Search and Data Mining (2010).
6. Esuli, A. and Sebastiani, F. Determining term subjectivity and term orientation for opinion mining. In Proceedings of Conf. of the European Chapter of the Association for Computational Linguistics (2006).
9. Feng, S., Bose, R. and Choi, Y. Learning general connotation of words using graph-based algorithms. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (Edinburgh, Scotland, UK, 2011), 1092–1103.
14. Jakob, N. and Gurevych, I. Extracting opinion targets in a single- and cross-domain setting with conditional random fields. In Proceedings of Conference on Empirical Methods in Natural Language Processing (2010).
17. Kim, S.-M. and Hovy, E. Crystal: Analyzing predictive opinions on the Web. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (2007).
18. Lafferty, J., McCallum, A. and Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann, San Francisco, CA, 2001, 282–289.
19. Lin, D. Minipar; http://webdocs.cs.ualberta.ca/lindek/minipar.htm. 2007.
23. Mohammad, S.M. and Turney, P.D. Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (2010).
24. Narayanan, R., Liu, B. and Choudhary, A. Sentiment analysis of conditional sentences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (Singapore, 2009). Association for Computational Linguistics, 180–189.
26. Pang, B. and Lee, L. A Sentimental Education: Sentiment Analysis using Subjectivity Summarization based on minimum cuts. In Proceedings of the Association for Computational Linguistics (2004), 271–278.
28. Pang, B., Lee, L. and Vaithyanathan, S. Thumbs up? Sentiment Classification using machine learning techniques. In Proceedings of EMNLP-02, 7th Conference on Empirical Methods in Natural Language Processing (Philadelphia, PA, 2002). Association for Computational Linguistics, Morristown, NJ, 79–86.
29. Peng, W. and Park, D.H. Generate adjective sentiment dictionary for social media sentiment analysis using constrained nonnegative matrix factorization. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (2011).
35. Tsur, O., Davidov, D. and Rappoport, A. A great catchy name: semi-supervised recognition of sarcastic sentences in online product reviews. In Fourth International AAAI Conference on Weblogs and Social Media (2010).
37. Wan, X. Using bilingual knowledge and ensemble techniques for unsupervised Chinese sentiment analysis. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (Honolulu, Hawaii, 2008). Association for Computational Linguistics, 553–561.
38. Wilson, T., Wiebe, J. and Hoffmann, P. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing (2005), 347–354.
40. Yu, H. and Hatzivassiloglou, V. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2003).
©2013 ACM 0001-0782/13/04