Research and Application on Text Similarity Detection Based on Fingerprint Retrieval

Tutor: ZhangZuPing
School: Central South University
Course: Information and Communication Engineering
Keywords: text similarity detection, fingerprint retrieval, b-bit minwise hash, fine-grained e
CLC: TP391.1
Type: Master's thesis
Year: 2013

The openness of the Internet and the ease of copying text make it convenient to share academic resources, but they also create opportunities for plagiarism and other academic misconduct. To protect intellectual property and promote sound academic practice, research on text similarity detection has become necessary. Taking similarity detection for fund project documents as its research background, and aiming to find similar documents in massive collections quickly and accurately, this thesis focuses on the key techniques of a similarity detection system based on fingerprint retrieval: fast fingerprint retrieval algorithms, and models and methods for fingerprint extraction. The specific research work is as follows:
(1) In similarity retrieval over massive text collections, distance computation on high-dimensional vectors is time-consuming, and using too few fingerprints lowers accuracy. The thesis therefore proposes a parallel retrieval algorithm based on fingerprint groups: fingerprints are indexed in several groups, so that retrieval is performed on low-bit fingerprints and fewer distance computations are needed. In addition, a CPU+GPU parallel platform executes the retrieval process, which shortens retrieval time and improves retrieval precision at low similarity thresholds.
(2) Because the documents have structured content, chapter tag words vary, and users pay different levels of attention to different parts of a document, the thesis studies fine-grained classification, fuzzy word matching, and Chinese word segmentation, and achieves fine-grained classification at the chapter, paragraph, sentence, and other levels. To meet the accuracy requirements of fund projects, the string-matching-based forward maximum matching and reverse maximum matching algorithms are combined to extract feature fingerprints accurately. These fingerprints ensure the quality of subsequent detection and present similarity evidence visually and clearly.
(3) The thesis discusses the functional framework and main workflow of the text similarity detection system, analyzes techniques such as document clustering, similarity estimation, detailed document similarity comparison, and result presentation, and integrates and implements the fingerprint-group parallel retrieval algorithm and the fine-grained text extraction technique.
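
The abstract does not spell out how the fingerprint-group index in item (1) is built, so the following is only a minimal sketch of the general idea under stated assumptions, not the thesis's algorithm. It assumes fixed-width 64-bit integer fingerprints split into 4 equal groups, Hamming distance as the distance measure, and a sequential in-memory index; the CPU+GPU parallel execution is not modeled. The names GroupedFingerprintIndex, split_into_groups, and hamming_distance are hypothetical.

```python
from collections import defaultdict

FINGERPRINT_BITS = 64  # assumed fingerprint width (not stated in the abstract)
NUM_GROUPS = 4         # assumed number of groups (not stated in the abstract)


def split_into_groups(fp, bits=FINGERPRINT_BITS, groups=NUM_GROUPS):
    """Split a fingerprint into equal-width, low-bit group keys."""
    width = bits // groups
    mask = (1 << width) - 1
    return [(fp >> (i * width)) & mask for i in range(groups)]


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


class GroupedFingerprintIndex:
    """Hypothetical index: one hash table per fingerprint group."""

    def __init__(self, bits=FINGERPRINT_BITS, groups=NUM_GROUPS):
        self.bits = bits
        self.groups = groups
        self.tables = [defaultdict(set) for _ in range(groups)]
        self.fingerprints = {}

    def add(self, doc_id, fp):
        """Store the full fingerprint and index each of its group keys."""
        self.fingerprints[doc_id] = fp
        keys = split_into_groups(fp, self.bits, self.groups)
        for table, key in zip(self.tables, keys):
            table[key].add(doc_id)

    def query(self, fp, max_distance):
        """Return (doc_id, distance) pairs within max_distance of fp."""
        # Step 1: cheap exact lookups on the low-bit group keys.
        candidates = set()
        keys = split_into_groups(fp, self.bits, self.groups)
        for table, key in zip(self.tables, keys):
            candidates |= table.get(key, set())
        # Step 2: full distance computation only for the candidates.
        hits = []
        for doc_id in candidates:
            dist = hamming_distance(fp, self.fingerprints[doc_id])
            if dist <= max_distance:
                hits.append((doc_id, dist))
        return sorted(hits)
```

In this sketch the full distance is computed only for documents that share at least one group key with the query. By the pigeonhole principle, any stored fingerprint that differs from the query in fewer bits than there are groups must match it exactly in at least one group, so no such document is missed; larger distance thresholds would need more groups or a different grouping scheme.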