Analysis and Processing on the Composing of GBK without Sentence Ecdysis

 

Liang Xiong 


Abstract

    This dissertation focuses on the composition of general-object semantic chunks without sentence ecdysis in the Chinese language. The research arises from the implementation of the Sentence Category Analysis (SCA) system, which lacks comprehensive research on the General Object Semantic Chunk (GBK), as it is more difficult to describe the composition of sentences than that of semantic chunks. Research on the composition of GBK is the weakest part of the SCA platform because composing GBK involves many difficulties.

    There are two main kinds of GBK: GBK with sentence ecdysis and GBK without sentence ecdysis. The latter constitutes the foundation of the GBK system. This dissertation analyzes the composition of general-object semantic chunks without sentence ecdysis. It puts forward three basic combination methods for GBK without sentence ecdysis: Coordinate Combination, Modificatory Combination and Noun Conglomeration Combination. It also studies the composition of named entities and the boundary of GBK, thereby extending the research on GBK.

    As to methodology, this dissertation mainly employs induction and statistical methods in the linguistic description, and deduction and theoretical explanation in the processing rules.

    Building on previous research on the HNC theory, this dissertation studies problems related to the analysis of GBK. Its main contributions and innovations are as follows:

1. The dissertation carries out the first comprehensive research on the composition of GBK without sentence ecdysis. It puts forward three basic combination methods for GBK without sentence ecdysis, and discusses in detail the characteristics and composition rules of each combination.

2. Through the study of the three basic combination methods, the dissertation finds that the priority and order of combination are governed by the degree of concept differentiation. Within a single closed GBK without sentence ecdysis, the degree of concept differentiation forms a gradually decreasing sequence. The degree of concept differentiation can therefore be used to recognize the GBK.

3. The dissertation carries out an overall study of the GBK boundary and puts forward an initial solution. The boundary falls into three categories: GBK-GBK, GBK-EK and GBK-fK. The GBK boundary processing combines the degree of concept differentiation with knowledge of sentence category.

4. The dissertation puts forward the main structure of named entities, which consists of named words, domain words and common words. This structure also obeys the rule of decreasing degree of concept differentiation, so the degree of concept differentiation can be used as a clue for Named Entity Recognition (NER).

    In summary, this dissertation studies the composition of GBK without sentence ecdysis, covering the three basic combination relations, named entity processing and GBK boundary processing, and puts forward corresponding processing strategies. The results help fill this gap in SCA and contribute to improving the overall performance of the SCA platform.

Key words: General Object Semantic Chunk (GBK); HNC theory; degree of concept differentiation; without sentence ecdysis