Opinion Mining on Figure Comments in Chinese

Juan Li (Signal and Information Processing)

Directed by Quan Zhang



Opinion mining is a relatively new topic in natural language processing and has become an active research area in recent years. Its goal is to automatically extract evaluative information (called opinions) from subjective text. Opinion mining can have considerable impact on e-commerce, public opinion surveys, and other areas of social life, which makes it a valuable field of research.

This dissertation carries out research at two levels: the word and the sentence. The aim is to automatically extract opinion information from sentences that comment on figures (people). At the word level, we use dictionaries and statistical methods to recognize words and determine their orientation. At the sentence level, we adopt two methods to mine opinions: a template-based method and a method based on sentence category analysis.

The main work of this dissertation is as follows:

1. A system for determining the orientation of Chinese words, based on a polarity dictionary, a synonym dictionary, and bi-grams. The method uses the polarity dictionary to determine the orientation of single-orientation words, and combines the synonym dictionary with bi-grams to determine the orientation of multiple-orientation words. The system achieves a precision of more than 81%.

2. A sentence-level opinion mining system using a template-based method. The method extracts opinion templates from the training corpus and then uses these templates to extract opinion elements. It achieves a precision of 75.3%.

3. A sentence-level opinion mining system based on sentence category analysis. We summarize orientation rules for sentence categories; these rules first identify the semantic chunks that contain opinion elements, and templates are then used to locate the opinion elements within those chunks. The method achieves a precision of 86.57%.

4. Resources for opinion mining suited to the evaluation of figures, including a polarity dictionary, a synonym dictionary, and people-feature collections, among others. We established the polarity dictionary (6572 items) by merging several existing dictionaries (7167 items) with the HowNet polar word collection (6846 items) and then selecting the entries that apply to people. We also compiled the synonym dictionary and the people-feature collections.
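
As an illustration of the word-level method in item 1, the following is a minimal Python sketch, not the system built in this dissertation. It assumes that a multiple-orientation word is resolved by choosing, among its synonyms with known orientation, the one that co-occurs most often with the preceding word in a bi-gram table. The dictionaries, the bi-gram counts, the function name word_orientation, and the choice of the preceding word as context are all hypothetical and invented for this example.

```python
from collections import Counter

# Toy resources, invented for illustration only (not the dissertation's data).
# Single-orientation words: +1 = positive, -1 = negative.
POLARITY = {"优秀": 1, "自豪": 1, "傲慢": -1, "虚伪": -1}

# Multiple-orientation words mapped to synonyms whose orientation is known.
SYNONYMS = {"骄傲": ["自豪", "傲慢"]}

# Toy bi-gram counts over (previous word, word) pairs.
BIGRAMS = Counter({
    ("感到", "自豪"): 12,
    ("感到", "傲慢"): 1,
    ("太", "傲慢"): 9,
    ("太", "自豪"): 2,
})


def word_orientation(word, prev_word=None):
    """Return +1 (positive), -1 (negative), or 0 (undetermined) for `word`."""
    # Single-orientation words: a direct polarity dictionary lookup suffices.
    if word in POLARITY:
        return POLARITY[word]

    # Multiple-orientation words: consider synonyms with a known orientation.
    candidates = [s for s in SYNONYMS.get(word, []) if s in POLARITY]
    if not candidates:
        return 0

    # Assumption: the synonym seen most often after the preceding word
    # indicates which sense (and thus which orientation) is intended.
    best = max(candidates, key=lambda s: BIGRAMS[(prev_word, s)])
    if BIGRAMS[(prev_word, best)] == 0:
        return 0  # no bi-gram evidence for any candidate
    return POLARITY[best]
```

With these toy counts, word_orientation("骄傲", "感到") selects 自豪 and yields a positive orientation, while word_orientation("骄傲", "太") selects 傲慢 and yields a negative one. A word with no dictionary entry and no bi-gram evidence is left undetermined.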

The results of this dissertation can be applied to online public opinion tracking systems by providing the overall (macroscopic) orientation of comments on given figures. They also lay a foundation for other opinion extraction applications. Further research on article-level opinion mining can build on this work.



Keywords: Opinion Mining; Opinion Extraction; Orientation Analysis; Sentence Category Analysis; Template-Based Method