Research on Literature Retrieval Based on Concepts Similarity

Tutor: YangXiQuan
School: Northeast Normal University
Course: Applied Computer Technology
Keywords: Semantic Web, Ontology, Concepts Similarity, Literature Retrieval
CLC: TP391.3
Type: Master's thesis
Year: 2009
With the development of information technology, the Web has become a major source of information. However, finding information that matches personal preferences faces two problems: (1) unstructured Web pages, disorderly hyperlinks and the sheer volume of content make users easily lose their way while navigating; (2) information resources lack a unified semantic description, which makes it difficult to find relevant resources for users. How to find valuable information has therefore become a research focus of information retrieval technology. In recent years, the development of ontology technology has provided support for this problem. Ontology-based information retrieval works at the level of knowledge and semantics, which makes up for the deficiencies of traditional keyword-based retrieval and yields better precision and recall. Ontology is the core of the Semantic Web: it is a method of conceptualizing and modeling domain knowledge that can describe semantic information in a form computers can process. The Semantic Web seeks to give every resource on the Web a unique identifier and to establish various types of machine-processable semantic links between resources, turning the disorderly Web into an orderly knowledge base that computers can understand. The Semantic Web uses a multi-level representation framework in which the ontology layer marks the transition from document description to knowledge reasoning, so ontology construction is the key to the Semantic Web. An ontology describes the concepts of a domain (or a wider range) and the relations between them, giving those concepts and relations clear definitions within a shared scope so that computers and people can exchange information on the basis of consensus. Because the concept is the smallest unit of information, research on the semantic similarity of concepts is very important in information retrieval. It also has a wide range of applications in recommendation and filtering, data mining and other fields.
Concept similarity computation is today a key technology in the field of information technology, and the widespread application of ontologies in information retrieval and artificial intelligence provides a new approach to it. This work focuses on ontology-based concept similarity computation. First, we introduce the theory and technology of ontologies, including their formal definition, modeling elements, description languages, construction and classification, which provide a theoretical basis for ontology-based concept similarity computation. Second, we propose a comprehensive ontology-based concept similarity model that considers four factors: property similarity, semantic distance, hierarchy depth and regulatory factors. Finally, we build a Paper Ontology with the ontology modeling tool Protégé, parse it with Jena, compute paper similarity with the proposed method, expand the OWL documents accordingly, and compare the retrieval results of the expanded and non-expanded OWL documents. The resulting retrieval matches not only the syntactic form of the query but also its semantics. The experimental results show that this method of computing concept similarity improves search quality, effectively ensuring the relevance of search results and improving retrieval precision and recall.
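The four-factor model is described here only at a high level, so the following Python sketch shows one plausible way such factors could be combined. Everything concrete in it is an assumption for illustration, not the thesis's formula: the toy `Paper` hierarchy and its property sets, a distance term α/(α + d) where d is the path length through the lowest common ancestor, a Wu-Palmer style depth term 2·depth(lca)/(depth(a) + depth(b)), Jaccard overlap for property similarity, and treating the "regulatory factors" as the tunable weights `w` and parameter `alpha`. In the thesis the ontology is built in Protégé and parsed with Jena; the sketch hard-codes a small hierarchy instead.

```python
"""Hypothetical sketch of a combined ontology-based concept similarity.

The toy ontology, the individual formulas and the weights are illustrative
assumptions; they are not taken from the thesis.
"""
from typing import Dict, List, Optional, Set, Tuple

# Toy concept hierarchy: child -> parent (None marks the root). Hypothetical.
PARENT: Dict[str, Optional[str]] = {
    "Paper": None,
    "ConferencePaper": "Paper",
    "JournalPaper": "Paper",
    "Thesis": "Paper",
    "MasterThesis": "Thesis",
    "PhDThesis": "Thesis",
}

# Toy property sets attached to each concept. Hypothetical.
PROPERTIES: Dict[str, Set[str]] = {
    "Paper": {"title", "author"},
    "ConferencePaper": {"title", "author", "conference"},
    "JournalPaper": {"title", "author", "journal"},
    "Thesis": {"title", "author", "school"},
    "MasterThesis": {"title", "author", "school", "tutor"},
    "PhDThesis": {"title", "author", "school", "tutor", "committee"},
}


def ancestors(c: str) -> List[str]:
    """Return the path [c, parent(c), ..., root]."""
    path = [c]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def depth(c: str) -> int:
    """Hierarchy depth, with the root at depth 1."""
    return len(ancestors(c))


def lca(a: str, b: str) -> str:
    """Lowest common ancestor: the first ancestor of a that is also one of b."""
    anc_b = set(ancestors(b))
    for x in ancestors(a):
        if x in anc_b:
            return x
    raise ValueError("concepts are not in the same hierarchy")


def semantic_distance(a: str, b: str) -> int:
    """Number of edges on the path a -> lca -> b."""
    top = depth(lca(a, b))
    return (depth(a) - top) + (depth(b) - top)


def property_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two concepts' property sets."""
    pa, pb = PROPERTIES[a], PROPERTIES[b]
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


def distance_similarity(a: str, b: str, alpha: float = 1.0) -> float:
    """Shorter paths give higher similarity; alpha tunes the decay."""
    return alpha / (alpha + semantic_distance(a, b))


def depth_similarity(a: str, b: str) -> float:
    """Concepts that meet deeper in the hierarchy are more similar."""
    return 2 * depth(lca(a, b)) / (depth(a) + depth(b))


def concept_similarity(a: str, b: str,
                       w: Tuple[float, float, float] = (0.4, 0.3, 0.3),
                       alpha: float = 1.0) -> float:
    """Weighted sum of the three factors; weights should sum to 1."""
    w_prop, w_dist, w_depth = w
    return (w_prop * property_similarity(a, b)
            + w_dist * distance_similarity(a, b, alpha)
            + w_depth * depth_similarity(a, b))


if __name__ == "__main__":
    print(concept_similarity("MasterThesis", "PhDThesis"))
    print(concept_similarity("MasterThesis", "ConferencePaper"))
```

With the default weights, the sibling concepts `MasterThesis` and `PhDThesis` score about 0.62, while `MasterThesis` and `ConferencePaper`, which meet only at the root, score about 0.355. Changing `w` or `alpha` shifts how much each factor contributes, which is one way the adjustable "regulatory" part of such a model could be realized.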
