Research on Parallel Clustering Algorithm Based on Map-Reduce

Tutor: WangKaiDong
School: Xi'an University of Electronic Science and Technology
Course: Computer System Architecture
Keywords: Hadoop, Map-Reduce, k-means, clustering, distributed computing
CLC: TP311.13
Type: Master's thesis
Year: 2012

With the rapid development of the information age, data has become diverse, massive, heterogeneous, and dynamically changing. Website operators often face the awkward situation of being "rich in data but lacking in knowledge". People urgently need powerful data analysis tools that can find useful knowledge in complex, massive data sets and discover the relationships and rules within them, in order to support decision making and research and to yield valuable information. Clustering, a method of unsupervised learning, is a common technique for statistical data analysis used in many fields, including data mining, machine learning, pattern recognition, and image analysis.

Map-Reduce is a currently popular distributed computing framework proposed by Google. It separates the logic of a problem from the complex underlying implementation details and is aimed mainly at mass data processing. Compared with traditional models of parallel computing, Map-Reduce takes care of task scheduling, partitioning of the input data, handling of machine failures, and so on, and therefore greatly simplifies program design.

This thesis studies two clustering algorithms in depth, k-means clustering and canopy-k-means clustering, and designs parallel versions of both based on Map-Reduce. Both algorithms were implemented on a Hadoop cluster composed of 4 machines. The experimental results show that canopy-k-means based on Map-Reduce achieves higher accuracy and better convergence than k-means based on Map-Reduce, and that both algorithms have good speedup and scalability.
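
The abstract does not give the thesis's actual Map-Reduce design, so the following is only a minimal sketch of how one k-means iteration is commonly split into a map step and a reduce step. It runs in memory in plain Python rather than on Hadoop, and the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`) are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from math import dist


def map_point(point, centroids):
    """Map step (assumed design): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step (assumed design): average the points assigned to one centroid."""
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for point, n in values:
        for d in range(dims):
            sums[d] += point[d]
        count += n
    return tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    """Simulate one Map-Reduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # A centroid that attracts no points keeps its old position.
    return [reduce_cluster(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))]
```

For example, `kmeans_iteration([(0, 0), (0, 2), (10, 10), (10, 12)], [(0, 0), (10, 10)])` moves each centroid to the mean of its two nearest points. In a real Hadoop job the grouping by key would be done by the framework's shuffle phase, and a driver would repeat the round until the centroids stop moving.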