Chinese Person and Organization Entity Names Recognition

 Based on Conceptual Relationship Knowledge

Jia Ning (Signal and Information Processing)
Directed by Zhang Quan


Abstract

Entity name recognition is a basic problem in Natural Language Processing. It is widely used in Information Extraction, Information Retrieval, Q&A and Machine Translation. Because entity names are numerous and vary widely in structure, their automatic recognition is a valuable research field.

This dissertation focuses on the recognition of person and organization names and presents a method based on conceptual relationship knowledge. Person and organization names are tags in language space; the corresponding tag in conceptual language space is the pp tag. Sentence category knowledge and domain sentence category knowledge contain the relationships between semantic chunks and the anticipation of semantic chunk concepts. Using these two kinds of knowledge, the semantic chunks that contain pp are extracted. After the structure of each such semantic chunk is analyzed, the position of pp within the chunk is located. Finally, a recognition algorithm extracts the person and organization names from the semantic chunk.

The main contributions of this dissertation are as follows:

1.      Presented a method for person and organization name recognition based on sentence category analysis and domain sentence category knowledge. The method has three steps. First, the semantic chunks that contain pp are extracted using semantic chunk relationship rules. Second, the structure of the semantic chunks extracted in step 1 is analyzed, and the parts that contain pp are extracted. Third, person and organization names are recognized in the parts obtained in step 2. The experiment shows that the method achieves a precision of more than 99% for extracting semantic chunks that contain the pp concept.

2.      Studied semantic relationship knowledge in sentence category space and in the HNC knowledge database. Designed semantic chunk relationship rules at the conceptual layer and the lexical layer, and established a database of semantic chunk relationship rules aimed at the pp concept. The experiment shows that these rules are effective for extracting the semantic chunks that contain the pp concept from a sentence.

3.      Established the mapping between domain sentence category space and sentence category space. The correspondence between the two spaces includes obvious correspondence and non-obvious correspondence. For obvious correspondence, the mapping is established by classifying the semantic chunks of the domain sentence category and of the sentence category. Through this mapping, the anticipation for a semantic chunk of a domain sentence category can be applied to the corresponding semantic chunk of a sentence category.

4.      Presented the principle of the object-content structure in GBK without sentence ecdysis, and designed a method for object-content decomposition in GBK without sentence degeneration into a chunk. Object-content decomposition raises two pivotal problems. The first is whether the structure of a GBK is an object-content structure. The second is determining which part is the object and which part is the content. This dissertation resolves both problems and designs rules for every GBK in the basic sentence categories.

5.      Studied ellipsis caused by semantic chunk sharing between sentences, especially ellipsis of the pp concept. This dissertation presents a method for ellipsis resolution that uses the relationships between sentences and the analysis of semantic chunk structure. The experiment shows that the method accurately resolves ellipsis caused by full semantic chunk sharing and effectively resolves ellipsis caused by partial semantic chunk sharing.
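
The three-step method in contribution 1 can be pictured with a minimal sketch. This is a toy stand-in under stated assumptions, not the dissertation's rule system: the category code `CAT_A`, the rule table `PP_CHUNK_RULES`, the cue lists `TITLE_WORDS` and `ORG_SUFFIXES`, and all function names are hypothetical, and the chunk labels are used only as placeholders. Real HNC rules operate on sentence category and concept knowledge rather than on surface cues.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SemanticChunk:
    role: str          # placeholder chunk label, e.g. "GBK1" or "EK"
    tokens: List[str]  # segmented words of the chunk


# Hypothetical rule table: sentence category code -> chunk roles expected to hold pp.
PP_CHUNK_RULES = {"CAT_A": {"GBK1"}}

# Hypothetical lexical cues used only by this toy example.
TITLE_WORDS = {"教授", "先生"}    # a person name is assumed to precede a title
ORG_SUFFIXES = ("大学", "公司")   # an organization name is assumed to end this way


def has_pp_cue(tokens: List[str]) -> bool:
    return any(t in TITLE_WORDS or t.endswith(ORG_SUFFIXES) for t in tokens)


def extract_pp_chunks(category: str, chunks: List[SemanticChunk]) -> List[SemanticChunk]:
    """Step 1: keep the chunks that the rule table marks as pp-bearing."""
    roles = PP_CHUNK_RULES.get(category, set())
    return [c for c in chunks if c.role in roles]


def split_pp_parts(chunk: SemanticChunk) -> List[List[str]]:
    """Step 2: toy structure analysis, splitting at the particle 的
    and keeping only the parts that show a pp cue."""
    parts, current = [], []
    for tok in chunk.tokens:
        if tok == "的":
            parts.append(current)
            current = []
        else:
            current.append(tok)
    parts.append(current)
    return [p for p in parts if has_pp_cue(p)]


def recognize_names(part: List[str]) -> List[Tuple[str, str]]:
    """Step 3: read person and organization names off one part."""
    found = []
    for i, tok in enumerate(part):
        if tok.endswith(ORG_SUFFIXES):
            found.append(("ORG", tok))
        elif tok in TITLE_WORDS and i > 0:
            found.append(("PER", part[i - 1]))
    return found


def recognize(category: str, chunks: List[SemanticChunk]) -> List[Tuple[str, str]]:
    results = []
    for chunk in extract_pp_chunks(category, chunks):
        for part in split_pp_parts(chunk):
            results.extend(recognize_names(part))
    return results


sentence = [
    SemanticChunk("GBK1", ["北京大学", "的", "张三", "教授"]),
    SemanticChunk("EK", ["发表"]),
    SemanticChunk("GBK2", ["论文"]),
]
print(recognize("CAT_A", sentence))  # [('ORG', '北京大学'), ('PER', '张三')]
```

The point of the sketch is the ordering: chunk-level rules narrow the search before any name-level recognition runs, so the final step only inspects the parts of a sentence that are already expected to carry a pp concept.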

In summary, working within the framework of HNC theory, this dissertation presents a method for person and organization name recognition based on sentence category analysis, domain sentence category knowledge and the analysis of semantic chunk structure. Furthermore, this dissertation addresses several problems in HNC theory, such as object-content decomposition in GBK, semantic relationship knowledge at the conceptual layer and the lexical layer, and the resolution of pp concept ellipsis. These studies strengthen the practicability of HNC theory and provide a new approach to its application.

Key words:  HNC; Person Name Recognition; Organization Name Recognition; Conceptual Relationship Knowledge