Internet Medicine Information Monitoring System Based on Focused Crawler

Tutor: HaoPing
School: Zhejiang University of Technology
Course: Control Theory and Control Engineering
Keywords: focused crawler, medicine information monitoring, page search algorithm, correlation
CLC: TP393.09
Type: Master's thesis
Year: 2011

With the rapid development of the World Wide Web in recent years, the network has become an important way to access and transmit information, and the amount of information on the Internet has grown exponentially. The Internet has greatly improved people's lives. However, it has a wide range of sources, a broad reach, a low cost of publishing information, and is difficult to monitor. Because of this, many sellers of fake goods who have been strongly combated in the physical market by law enforcement agencies have moved their sales platforms to the network, and a large number of counterfeit goods now appear online with impunity. To combat the rampant sale of counterfeit drugs, it is necessary to monitor drug trade information on the Internet. The key problem in monitoring this information is topic search, and a focused crawler can be used for topic search. A focused crawler targets a particular field or a specific topic in order to achieve high recall and precision. However, most search algorithms are designed for broad-topic search, and their search strategies perform poorly on specific small topics. The main work of this thesis includes:

1. Different page search algorithms were proposed for forum websites and general websites, according to the different characteristics of their network structures.
2. To address the poor performance of existing search strategies on specific small topics, a combined strategy for searching specific topics on the Internet was proposed, based on an analysis of focused crawler search algorithms. The combined strategy consists of page searching and relevance analysis. Page searching adopts an improved Fish-Search algorithm. Relevance analysis adopts a two-step algorithm. The first step uses the Vector Space Model (VSM) algorithm to roughly identify pages related to the broad topic. The second step uses an improved Naive Bayes classification algorithm and the k-Nearest Neighbors (kNN) algorithm to select pages relevant to the specific small topic from the first step's results.
3. Based on this research, an information monitoring system for medicine information on the Internet was developed. Tests on pages from several websites and forums show that the combined search strategy improves the harvest ratio and the small-topic search efficiency of the focused crawler system.
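The abstract does not describe the improvements the thesis makes to Fish-Search. For reference, below is a simplified sketch of the classic Fish-Search heuristic, not the thesis's improved version. Pages judged relevant give their children the full depth budget and place them at the front of the crawl queue. Irrelevant pages pass on one unit less of depth, and a branch whose depth reaches zero is no longer expanded. Several parts are assumptions made for illustration: the in-memory `graph` dict stands in for real HTTP fetching and link extraction, relevance is a hypothetical binary keyword test, and the `depth`, `width` and `max_pages` defaults are illustrative values.

```python
from collections import deque


def fish_search(graph, seeds, query_terms, depth=3, width=5, max_pages=100):
    """Simplified classic Fish-Search over an in-memory web graph.

    graph: dict mapping url -> (page_text, list_of_outlinks); a hypothetical
    stand-in for fetching pages over HTTP.
    Returns the relevant URLs in the order they were visited.
    """
    terms = {t.lower() for t in query_terms}
    queue = deque((url, depth) for url in seeds)
    visited, relevant = set(), []
    while queue and len(visited) < max_pages:
        url, d = queue.popleft()
        if url in visited or url not in graph:
            continue
        visited.add(url)
        text, links = graph[url]
        # Hypothetical binary relevance test: any query term occurs in the page.
        is_relevant = bool(terms & set(text.lower().split()))
        if is_relevant:
            relevant.append(url)
            child_depth = depth   # relevant page: children get the full depth budget
        else:
            child_depth = d - 1   # irrelevant page: children lose one unit of depth
        if child_depth <= 0:
            continue              # this branch "dies" and is not expanded further
        children = [u for u in links if u not in visited][:width]
        if is_relevant:
            # Children of relevant pages are explored first (front of the queue).
            for u in reversed(children):
                queue.appendleft((u, child_depth))
        else:
            queue.extend((u, child_depth) for u in children)
    return relevant
```

In a toy graph, a relevant page can sit two hops behind irrelevant pages. With `depth=2` that page is pruned, and with `depth=3` it is reached. This shows the trade-off the depth parameter controls between crawl cost and coverage of a small topic.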
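The first step of the relevance analysis uses the Vector Space Model to filter pages coarsely against the broad topic. Below is a minimal sketch of that kind of filter. It makes several assumptions the abstract does not state: raw term-frequency weights rather than TF-IDF, lowercase whitespace tokenization (real Chinese pages would need word segmentation first), and a hypothetical similarity `threshold` of 0.2.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())  # Counter returns 0 for missing terms
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def coarse_filter(pages, topic_text, threshold=0.2):
    """Keep pages whose VSM similarity to the broad-topic description meets the threshold."""
    topic = tf_vector(topic_text)
    return [p for p in pages if cosine(tf_vector(p), topic) >= threshold]
```

The coarse filter deliberately favors recall. It only discards pages that share almost no vocabulary with the broad topic, and leaves the finer small-topic decision to a classifier in the second step.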
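The second step narrows the coarse results to the specific small topic with a Naive Bayes classifier. The thesis's improvement to Naive Bayes is not described in the abstract. The sketch below shows only the standard multinomial Naive Bayes with Laplace (add-alpha) smoothing on which such an improvement would build. The class labels and the whitespace tokenization are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Standard multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)         # number of documents per class
        self.word_counts = defaultdict(Counter)     # per-class word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.class_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / n_docs)
            for w in doc.lower().split():
                if w not in self.vocab:
                    continue  # words never seen in training are ignored
                numerator = self.word_counts[label][w] + self.alpha
                denominator = self.total_words[label] + self.alpha * vocab_size
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage example with hypothetical training snippets: after fitting on a few short texts labeled `"counterfeit"` (for example, "buy fake pills online cheap") and `"normal"` (for example, "hospital opens new clinic"), `predict("fake pills cheap")` returns `"counterfeit"`. The classifier works in log space so that long pages do not underflow.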
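The k-Nearest Neighbors algorithm is also used in the second step. Below is a minimal sketch of kNN text classification. It assumes cosine similarity over raw term-frequency vectors, an unweighted majority vote among the `k` most similar training pages, and a default `k=3`. None of these choices are specified in the abstract.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(train_docs, train_labels, doc, k=3):
    """Label `doc` by majority vote among its k most similar training documents."""
    query = tf_vector(doc)
    scored = [(cosine(query, tf_vector(d)), y) for d, y in zip(train_docs, train_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first (stable sort)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Unlike Naive Bayes, kNN needs no training phase. However, it compares each new page against every training page, so its cost grows with the size of the labeled set.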
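The evaluation reports the harvest ratio of the crawler. For focused crawlers, this metric is commonly defined as the fraction of crawled pages that are relevant to the topic. The abstract does not give the exact formula used. Assuming that common definition, it can be computed as follows:

```python
def harvest_ratio(crawled_urls, relevant_urls):
    """Fraction of crawled pages judged relevant (0.0 if nothing was crawled)."""
    crawled = list(crawled_urls)
    if not crawled:
        return 0.0
    relevant = set(relevant_urls)
    hits = sum(1 for url in crawled if url in relevant)
    return hits / len(crawled)
```

A higher harvest ratio means the crawler spends fewer fetches on off-topic pages. This is the effect a combined search and relevance strategy aims for.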