Research on Anaphora Resolution 
and Ellipsis Recovery within Chinese Text

by WANG Hou-feng 
Directed by HUANG Zeng-yang


Abstract

Natural language provides rich mechanisms for varying the expression of the same topic. However, it also confronts linguists and computational researchers with a great number of nontrivial problems in Natural Language Processing (NLP). Two of them are anaphora resolution and ellipsis recovery.

The process of identifying the antecedent of an anaphor is called anaphora resolution. The process of deciding where ellipsis occurs and retrieving the omitted component is called ellipsis recovery. Both tasks require finding a constituent in the context that is compatible with the anaphor or the elided position.

Anaphora and ellipsis occur throughout text, and their high frequency makes these two kinds of resolution key issues in discourse processing, which have attracted the attention of an increasing number of researchers. A series of evaluations, known as the Message Understanding Conferences (MUC), was developed to measure the state of the art of Information Extraction (IE) systems, which attempt to extract predefined sorts of information from natural language text. Anaphora resolution is an important part of IE, and therefore an important component of MUC, because without it an IE system could generate multiple unrelated extractions for the same entity. The result may be a misrepresentation.

In order to resolve Chinese anaphora and ellipsis, HNC-based tactics are presented in this report. The main topics are as follows:

A brief introduction to the basic concepts of HNC (Hierarchical Network of Concepts), with emphasis on Sentence Category (SC) and semantic chunk.

Semantic constraint relations between a pronoun and its antecedent are described in rule format. Some of the rules serve as filters that eliminate unsuitable antecedent candidates, and others are used to choose the most salient one among the remaining candidates.

In the report, demonstrative anaphora is discussed in two forms: the single demonstrative word, such as "this" ("这" in Chinese characters, "zhe" in Pinyin), that serves as an anaphor, and the demonstrative expression (or definite description, a noun phrase that contains a demonstrative word). The construction patterns of demonstrative expressions and the analysis models of these patterns are described in order to recognize demonstrative expressions separated by a semantic chunk or a sentence in Chinese text. The resolution steps for demonstrative anaphora are as follows: partitioning demonstrative anaphora into three classes, one of which need not be resolved; giving simple decision principles for referring forward and referring back; describing the semantic type of the antecedent; and presenting resolution rules for demonstrative anaphora and processing tactics for quantifiers. Finally, a general method for generating coreference links is given.

A criterion for deciding where ellipsis occurs in a sentence, and a method, similar to anaphora resolution, for retrieving the constituent omitted from the sentence.
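
The pronoun rules listed among the topics above are described as filters followed by a salience-based choice. The Python sketch below illustrates that two-stage idea only. The `Candidate` and `Pronoun` fields, the two agreement filters, and the salience weights are hypothetical stand-ins chosen for illustration; they are not the HNC semantic constraints of the report.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    text: str
    animate: bool     # hypothetical semantic feature
    plural: bool      # hypothetical number feature
    distance: int     # sentences between candidate and pronoun (0 = same sentence)
    is_subject: bool  # hypothetical grammatical-role feature


@dataclass
class Pronoun:
    text: str
    animate: bool
    plural: bool


# Stage 1: filter rules. Each returns True if the candidate stays acceptable.
# These two agreement checks are assumptions, not rules from the report.
FILTERS: List[Callable[[Pronoun, Candidate], bool]] = [
    lambda p, c: p.animate == c.animate,
    lambda p, c: p.plural == c.plural,
]


def salience(c: Candidate) -> float:
    """Stage 2 score. Assumed weights: nearer candidates and subjects rank higher."""
    score = 1.0 / (1 + c.distance)
    if c.is_subject:
        score += 0.5
    return score


def resolve(pronoun: Pronoun, candidates: List[Candidate]) -> Optional[Candidate]:
    """Drop candidates rejected by any filter, then return the most salient survivor."""
    remaining = [c for c in candidates if all(f(pronoun, c) for f in FILTERS)]
    if not remaining:
        return None
    return max(remaining, key=salience)


if __name__ == "__main__":
    pronoun = Pronoun("他", animate=True, plural=False)
    candidates = [
        Candidate("书", animate=False, plural=False, distance=0, is_subject=False),
        Candidate("老师", animate=True, plural=False, distance=1, is_subject=True),
        Candidate("学生们", animate=True, plural=True, distance=0, is_subject=True),
        Candidate("小王", animate=True, plural=False, distance=2, is_subject=False),
    ]
    chosen = resolve(pronoun, candidates)
    print(chosen.text if chosen else "no antecedent found")
```

Keeping the filters in a plain list separates the hard constraints from the ranking step, so a rule can be added or removed without touching the salience function.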

 

Keywords: Pronominal Anaphora Resolution, Demonstrative Anaphora Resolution, Ellipsis Recovery, Antecedent, Sentence Category, Semantic Chunk, Sentence Ecdysis.