
Hadoop-Based Community Discovery Algorithm Research

Tutor: WangCuiRong
School: Northeastern University
Course: Computer System Architecture
Keywords: Hadoop, community discovery, PageRank
CLC: TP391.3
Type: Master's thesis
Year: 2011

This thesis studies community discovery based on Web data mining. As the Internet has grown in recent years, the Web has become a vast worldwide hub of information covering news, finance, commerce, culture, education and more. Closely related pages on the Internet cluster around topics, and these topics form communities. Community discovery is the task of identifying and extracting such latent, well-defined communities from the Web, so it has important theoretical and practical value for Web data mining. We examine community discovery from three angles: theory, algorithms and implementation. We first describe the basic theory of community discovery and the Hadoop platform, then analyze existing community discovery algorithms, including PageRank, HITS, bipartite-graph-based algorithms and maximum-flow-based algorithms. After weighing the advantages and disadvantages of PageRank, in particular its topic drift and time-related problems, we propose a new algorithm, TTPageRank, to address these problems and to realize community discovery based on link weights. We set up a Hadoop distributed platform, implement both PageRank and TTPageRank with MapReduce, and demonstrate the community discovery process. We use a crawler to collect the web pages for the experiments on the Hadoop distributed platform. The experimental results show how the two algorithms differ in performance and accuracy.
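To illustrate how PageRank maps onto the MapReduce model mentioned in the abstract, the following is a minimal sketch in plain Python that simulates the map, shuffle and reduce steps in memory. It is not the thesis's implementation and does not cover TTPageRank, whose definition the abstract does not give. The function names (`map_phase`, `reduce_phase`, `pagerank`), the damping factor of 0.85 and the fixed iteration count are all assumptions made for illustration; a real Hadoop job would run each iteration as a separate MapReduce pass over files in HDFS.

```python
from collections import defaultdict

# Assumed conventional damping factor; the abstract does not state a value.
DAMPING = 0.85


def map_phase(page, rank, out_links):
    """Map step: pass the link list through and split the rank among out-links."""
    # Re-emit the graph structure so the reducer can carry it to the next pass.
    yield page, ("links", out_links)
    if out_links:
        share = rank / len(out_links)
        for target in out_links:
            yield target, ("rank", share)


def reduce_phase(page, values, num_pages):
    """Reduce step: sum the incoming contributions and apply damping."""
    out_links = []
    total = 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            total += payload
    new_rank = (1.0 - DAMPING) / num_pages + DAMPING * total
    return page, new_rank, out_links


def pagerank(graph, iterations=20):
    """Run a fixed number of simulated MapReduce passes over an adjacency dict.

    Simplification: pages without out-links lose their rank mass, and every
    link target is assumed to appear as a key in `graph`.
    """
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        # Shuffle: group every emitted value by its key (the target page).
        grouped = defaultdict(list)
        for page, out_links in graph.items():
            for key, value in map_phase(page, ranks[page], out_links):
                grouped[key].append(value)
        ranks = {
            page: reduce_phase(page, grouped[page], n)[1] for page in graph
        }
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph: a links to b and c, b to c, c to a.
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(toy_graph).items()):
        print(page, round(rank, 4))
```

In this toy graph, page c receives links from both a and b, so it ends up ranked above b. A link-weighted variant of the kind the abstract describes would presumably replace the equal `rank / len(out_links)` split with per-link weights, but the abstract does not say how those weights are defined.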