Analysis and Processing of the Composition of GBK without Sentence Ecdysis
Liang Xiong
Abstract
This dissertation focuses on the composition of general object semantic chunks without sentence ecdysis in Chinese. The research arises from implementing the Sentence Category Analysis (SCA) system, which lacks comprehensive research on the General Object Semantic Chunk (GBK), as it is more difficult to describe the composition of sentences than that of semantic chunks. The composition of GBK is the weakest part of the SCA platform because composing GBK involves many difficulties.
There are two main kinds of GBK: GBK with sentence ecdysis and GBK without sentence ecdysis. The latter constitutes the foundation of the GBK system. This dissertation analyses the composition of general object semantic chunks without sentence ecdysis. It puts forward three basic combination methods for GBK without sentence ecdysis: Coordinate Combination, Modificatory Combination and Noun Conglomeration Combination. It also studies the composition of named entities and the boundaries of GBK, which extends the existing research on GBK.
As to methodology, this dissertation mainly employs induction and statistical methods for the linguistic description, and deduction and theoretical explanation for the processing rules.
Building on previous research on HNC theory, this dissertation studies problems related to the analysis of GBK. Its main contributions and innovations are as follows:
1. The dissertation carries out the first comprehensive research on the composition of GBK without sentence ecdysis. It puts forward three basic combination methods for GBK without sentence ecdysis and discusses the characteristics and rules of each combination in detail.
2. Through research on the three basic combination methods, the dissertation finds that both the priority and the order of combination are governed by the degree of concept differentiation. Within a single closed GBK without sentence ecdysis, the degrees of concept differentiation form a gradually decreasing sequence. The degree of concept differentiation can therefore be used to recognize GBK.
3. The dissertation carries out an overall study of GBK boundaries and puts forward an initial solution. The boundaries fall into three categories: GBK-GBK, GBK-EK and GBK-fK. GBK boundary processing combines the degree of concept differentiation with knowledge of sentence categories.
4. The dissertation puts forward the main structure of named entities, which consist of named words, domain words and common words. This structure also obeys the rule of decreasing degree of concept differentiation, so the degree of concept differentiation can be used as a clue for Named Entity Recognition (NER).
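The decreasing-sequence property described in contributions 2 and 4 can be illustrated with a minimal sketch. The abstract does not say how the degree of concept differentiation is quantified, so the numeric scores, the token names and the function `split_on_differentiation` below are all hypothetical. The sketch also assumes that equal scores continue the current chunk. It shows only the general idea of using a rise in the score as a candidate chunk boundary, not the dissertation's actual processing rules.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_differentiation(
    tokens: Sequence[Tuple[str, float]],
) -> List[List[str]]:
    """Group tokens into candidate chunks whose scores never increase.

    Each token is a (word, score) pair, where score is a hypothetical
    numeric stand-in for the degree of concept differentiation.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    previous_score: Optional[float] = None
    for word, score in tokens:
        # A rise in score breaks the decreasing sequence,
        # so the current chunk is closed and a new one starts.
        if previous_score is not None and score > previous_score:
            chunks.append(current)
            current = []
        current.append(word)
        previous_score = score
    if current:
        chunks.append(current)
    return chunks


# Hypothetical tokens and scores, invented purely for illustration.
example = [("w1", 0.9), ("w2", 0.6), ("w3", 0.3), ("w4", 0.8), ("w5", 0.5)]
print(split_on_differentiation(example))
# → [['w1', 'w2', 'w3'], ['w4', 'w5']]
```

In this toy input the score rises between `w3` and `w4`, so a candidate boundary is placed there. A real system would presumably combine such a cue with sentence category knowledge, as contribution 3 indicates.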
In summary, this dissertation studies the composition of GBK without sentence ecdysis, covering the three basic combination relations, named entity processing and GBK boundary processing. It also puts forward corresponding processing strategies. The results fill a gap in SCA and contribute to improving the overall performance of the SCA platform.
Key words: General Object Semantic Chunk (GBK); HNC theory; degree of concept differentiation; without sentence ecdysis