Recognition and Tagging of Chinese Place Names and Time Expressions

Nuo Li (Signal and Information Processing)

Directed by Quan Zhang


 ABSTRACT

 

Place names and time expressions are two key kinds of information that describe the background of a concrete event. Recognizing them accurately helps improve word segmentation, out-of-vocabulary word recognition, and named entity recognition. This work is also a foundation for information retrieval, content extraction, and question answering systems, which makes it important. However, recognizing and tagging place names and time expressions is difficult because they take numerous different forms.

This dissertation focuses on designing and implementing a system that recognizes and tags place names and time expressions, with particular attention to mining their context information. First, we recognize place names and time expressions with statistical methods and rules. Second, we recognize them again with a maximum entropy model. In designing the feature functions of the maximum entropy model, we also use semantic information to improve the results. Finally, we study the tagging task for place names and time expressions.

The main research work of this dissertation is as follows:

1. We implement a Chinese place name recognition system. We analyze a large number of Chinese place names to extract their features. We then obtain initial recognition results using statistical data and an N-gram method; the recall of this initial recognition reaches 97%. Afterwards, we use the well-established maximum entropy model to combine different context features. The F value of the system on a real corpus reaches 88% (closed) and 84% (open).

2. We introduce HNC concept features into the feature functions of the maximum entropy model. The experimental results show that semantic features contribute a 1% improvement. We also vary the window length of the maximum entropy model and analyze the results.

3. We implement a time expression recognition and tagging system. As with place names, we first analyze the features of time expressions. Based on the international time expression tagging standard, we improve the rules for the Chinese time expression tagging part. The F value of time expression recognition with the maximum entropy model reaches 81% (closed). For the correctly recognized results, we implement the tagging system; the F value of tagging reaches 86%. Finally, we study the relationship between time expressions and the time of an event.

4. Building on the recognition of place names and time expressions, we study the tagging of place names. We design and build an area name information database, which includes population, area, longitude and latitude, postal code, administrative affiliation, and so on. Finally, we use the database to guide the tagging of place names.
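
As an illustration of items 1 and 2, the following minimal sketch shows how context features inside a window of adjustable length might be collected for a place name candidate before being passed to a maximum entropy classifier. It is not the dissertation's implementation: the function name `window_features`, the padding symbol `PAD`, the feature naming scheme, and the example sentence are all assumptions made for illustration, and HNC concept features are not modeled here.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def window_features(tokens: List[str], start: int, end: int,
                    window: int = 2) -> Dict[str, int]:
    """Binary context features for the candidate tokens[start:end].

    `window` is the number of tokens inspected on each side of the
    candidate; varying it corresponds to changing the window length.
    """
    features: Dict[str, int] = {}
    for offset in range(1, window + 1):
        left = start - offset        # position to the left of the candidate
        right = end - 1 + offset     # position to the right of the candidate
        left_tok = tokens[left] if left >= 0 else PAD
        right_tok = tokens[right] if right < len(tokens) else PAD
        features[f"L{offset}={left_tok}"] = 1
        features[f"R{offset}={right_tok}"] = 1
    # Features taken from the candidate itself (first and last token).
    features["first=" + tokens[start]] = 1
    features["last=" + tokens[end - 1]] = 1
    return features


# Illustrative segmented sentence; the candidate is tokens[3:5].
tokens = ["他", "昨天", "到达", "北京", "市", "。"]
feats = window_features(tokens, 3, 5, window=1)
```

With `window=1` the sketch looks at one token on each side of the candidate; a larger value adds more distant context at the cost of sparser features.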

To summarize, we analyze the features of place names and time expressions and adopt a two-step method to recognize them. On the basis of recognition, we also analyze the tagging task. The results of this dissertation can be used for the recognition and extraction of place names and time expressions, or as a module in word segmentation, text retrieval, machine translation, or other language information processing systems.
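
The F values quoted in the list of contributions combine precision and recall into one score. The abstract does not say which variant is used; the sketch below assumes the common balanced form, F = 2PR / (P + R). The function name `f_value` and the counts in the example are hypothetical and are not taken from the dissertation's experiments.

```python
def f_value(num_correct: int, num_predicted: int, num_gold: int) -> float:
    """Balanced F value (assumed F1) from raw counts.

    num_correct:   entities recognized correctly
    num_predicted: entities output by the system
    num_gold:      entities in the reference annotation
    """
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct out of 10 predicted and 10 reference entities.
example = f_value(8, 10, 10)
```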

 

Keywords: two-step recognition of place names; maximum entropy model; feature function; HNC theory; time expression tagging