An MT-oriented Study of Sentence Category and 
Sentence Format Transfer from Chinese to English

ZHANG Ke-liang (Signal and Information Processing)
Directed by HUANG Zeng-yang


Abstract



Machine translation (MT) has made remarkable progress worldwide over more than 50 years of sustained effort. Yet among all the available MT systems, which draw on various linguistic theories and translation technologies, none has broken through the 70% barrier in translation accuracy. For Chinese-English MT, the situation is even worse. Owing to the absence of a linguistic theory suited to the analysis and understanding of the Chinese language, and to inadequacies in the models of natural language representation and processing, the Chinese-English MT systems now available still have a long way to go before they can satisfactorily meet users' needs.
The Hierarchical Network of Concepts (HNC) theory, which is in essence intended to explore the human cognitive mechanism of language acquisition, is well suited to the task of computer understanding of natural languages. The present research, an MT-oriented study of sentence category (SC) and sentence format (SF) transfer from Chinese to English, is significant in two respects. First, by taking English as a starting point and test bed, we can check to what extent the HNC theory is applicable to languages other than Chinese, and thus further develop this innovative theory and speed its international dissemination. Second, by exploring the rules and mechanisms of transfer from source language to target language in MT, we can advance research on HNC-based MT engines and thus lay foundations for the development of HNC MT systems.
Two main methods are adopted in this study. One is comparison and contrast: Chinese and English are compared in terms of the quantity, form and distribution of their respective SCs and SFs. The other is induction and generalization: the general rules of SC and SF transfer from Chinese to English are inferred by tagging and analyzing a bilingual corpus of aligned Chinese-English sentence pairs.
The present study aims to explore the general rules underlying the SC and SF transfer of sentences from Chinese to English under the guidance of the HNC theory, especially its ideas on SC, SF and MT. The thesis mainly covers the following aspects:
(1) Introducing the HNC viewpoints on MT, analyzing the framework of a possible MT system based on the HNC theory, and proposing general strategies and guidelines for HNC-based MT systems to follow;
(2) Defining the categories of SC transfer and a formal way to describe them, i.e. TransFrame;
(3) Defining some novel SFs that are characteristic of the English language, making a comparative study of Chinese and English in terms of SF, and discussing in detail the general rules underlying the SF transfer between the two languages;
(4) Investigating the general rules underlying the SC and SF transfer of such significant sentence categories as yes-no judgment sentences, bearing sentences, chunk-extended action sentences, concise state sentences, effect sentences, existential sentences, comparative sentences, and so on;
(5) Collecting sufficient bidirectional, bilingual Chinese-English raw materials, building corpora of aligned Chinese-English sentence pairs, and tagging and analyzing these corpora.


Keywords: machine translation (MT); Hierarchical Network of Concepts (HNC) theory; sentence category (SC); sentence format (SF); Chinese-English transfer