SIG-KBS/KBSE Web 1996 1999 ( ) 2004 2006 Web 1
Web: figures cited from Yahoo! (2005/8) and Netcraft

WWW Conference Tracks
  Search
    Link-based search techniques
  Data Mining
    Classifying and clustering docs
    Mining content, link structure, and usage data
    Evolution of the Web
  Hypermedia and Multimedia
  Semantic Web
  E*Applications
    E-Commerce, E-Science, E-Learning
  XML and Web Services
  Web Engineering
  Security, Privacy, and Ethics
  Pervasive Web and Mobility
  Performance, Reliability and Scalability
  Browsers and User Interfaces
    Web info. vis.
    Browsing and navigation
2
3
4
[ACM Hypertext 2001]
HITS [Kleinberg, 1997]
Trawling [Kumar et al, 1999]
Graph Structure in the Web [Broder et al, 2000]
[Figure: PC vendor pages: SONY, TOSHIBA, NEC]
5
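Since HITS [Kleinberg, 1997] is cited here, a minimal sketch of its hub/authority iteration may help. This is a generic textbook version, not the implementation used in this work; the function name `hits`, the edge-list input, and the fixed iteration count are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Return (authority, hub) score dicts for a list of (source, target) links.

    Hypothetical helper for illustration; not taken from the slides.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the node.
        auth = {node: 0.0 for node in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        # Hub score: sum of the authority scores of pages the node links to.
        hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        # Normalise both vectors to unit length so the scores stay bounded.
        auth_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {node: v / auth_norm for node, v in auth.items()}
        hub = {node: v / hub_norm for node, v in hub.items()}
    return auth, hub


# Made-up link lists pointing at the vendor pages named on the slide.
links = [
    ("list-1", "SONY"), ("list-1", "NEC"),
    ("list-2", "SONY"), ("list-2", "NEC"), ("list-2", "TOSHIBA"),
    ("list-3", "SONY"),
]
authority, hub = hits(links)
top_authority = max(authority, key=authority.get)
```

Pages that are linked from many good hubs end up with high authority scores, and pages that link to many good authorities end up with high hub scores.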
Relationship graph
For each page, find authorities in its neighborhood, and make edges from the page to those authorities.
[Figure: nodes Adobe, NEC, Microsoft, Oracle, IBM, Toshiba, SONY, Canon, Olympus, Konika]
6
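The relationship-graph step can be sketched as follows. The slide does not say how authorities in the neighborhood are found, so this sketch substitutes a simple co-citation count: the neighborhood of a page is taken to be the pages that link to it, and the authorities are the seed pages most often linked alongside it. The function name `relationship_graph`, the `top_k` parameter, and the sample link lists are all assumptions.

```python
from collections import Counter


def relationship_graph(outlinks, seeds, top_k=2):
    """Map each seed page to its top_k most co-cited seed pages.

    outlinks: dict mapping a page to the set of pages it links to.
    seeds: set of candidate authority pages.
    Co-citation is a stand-in (assumption) for the authority-finding step.
    """
    # Invert the link map so we can look up who links to a page.
    inlinks = {}
    for source, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, set()).add(source)

    graph = {}
    for page in seeds:
        cocited = Counter()
        for referrer in inlinks.get(page, ()):
            for other in outlinks[referrer]:
                if other != page and other in seeds:
                    cocited[other] += 1
        # Rank by co-citation count, breaking ties by name for determinism.
        ranked = sorted(cocited.items(), key=lambda item: (-item[1], item[0]))
        graph[page] = [name for name, _ in ranked[:top_k]]
    return graph


# Made-up link lists using company names from the figure.
sample_outlinks = {
    "pc-list-1": {"SONY", "Toshiba", "NEC"},
    "pc-list-2": {"SONY", "Toshiba"},
    "camera-list": {"Canon", "Olympus"},
}
sample_seeds = {"SONY", "Toshiba", "NEC", "Canon", "Olympus"}
sample_graph = relationship_graph(sample_outlinks, sample_seeds)
```

Each entry of `sample_graph` lists the edges leaving one page, for example from SONY to Toshiba and NEC.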
Relationship Graph
[Figure: nodes Adobe, NEC, Microsoft, Oracle, IBM, Toshiba, SONY, Canon, Olympus, Konika]

Web Community Chart
Build a graph of clusters.
Weights of edges are the number of relationships between clusters.
[Figure: clusters labeled Software vendors, PC vendors, and Digital Camera Vendors, joined by edges with weights 3 and 2]
7
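Building the chart from an already clustered relationship graph can be sketched as below. How the clusters themselves are formed is not shown on the slide, so the cluster assignment is passed in as given. The function name `community_chart`, the choice to treat edges as undirected when counting, and the sample edges and memberships are assumptions, not data read off the figure.

```python
from collections import Counter


def community_chart(edges, cluster_of):
    """Count relationship-graph edges between each pair of clusters.

    edges: iterable of (page, page) relationships.
    cluster_of: dict mapping a page to its cluster label.
    Edges inside one cluster are ignored; direction is ignored (assumption).
    """
    weights = Counter()
    for source, target in edges:
        a, b = cluster_of[source], cluster_of[target]
        if a != b:
            weights[tuple(sorted((a, b)))] += 1
    return dict(weights)


# Hypothetical memberships and relationships for illustration only.
sample_clusters = {
    "Adobe": "Software", "Microsoft": "Software", "Oracle": "Software",
    "IBM": "PC", "NEC": "PC", "Toshiba": "PC", "SONY": "PC",
    "Canon": "Camera", "Olympus": "Camera", "Konika": "Camera",
}
sample_edges = [
    ("IBM", "Microsoft"), ("NEC", "Microsoft"), ("IBM", "Oracle"),
    ("SONY", "Canon"), ("Toshiba", "Canon"),
    ("Adobe", "Microsoft"),  # intra-cluster edge, not counted
]
sample_chart = community_chart(sample_edges, sample_clusters)
```

With these made-up inputs the chart has weight 3 between the PC and Software clusters and weight 2 between the Camera and PC clusters.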
Web archive: 1999 to 2004, 7 snapshots
SCC, WCC

Year      #Pages   #seeds   #comms
1999/8    17M      671K     83K
2000/8    17M      741K     94K
2001/10   40M      1431K    158K
2002/2    45M      1583K    171K
2003/2    66M      4646K    554K
2003/07   97M      7870K    874K
2004/05   96M      8192K    849K

Itanium2 (8 CPU, 128GB Memory)
DB (URL ID, ID <OutLinks><InLinks>)
DB (URL Title, URL <AnchorTexts>)
[Sizes shown on the slide: 2.6G, 4.5G, 30GB; their labels are not recoverable]
8
Yahoo! [ 2003]
URL (2002)
5
4079 (33930 URLs, 8.13)
4965 (63757 URLs, 12.84)
[Figure: Yahoo! with URL1 URL2 URL3 URL4, compared against URL2 URL3 URL1 URL4]
9
[ ] DEWS2001 [, ] 2000 2001 Web Community Browser [ ] DEWS2002, WISS2002, FIT2002 [ ] DEWS2003, TOD22 [ ] TOD20, DEXA2004 [RICOH] NTCIR3 Web 10
[ACM Hypertext 03]
Internet, Crawling, Web Archive
Extract all communities and their relevance
Compare over time: T1, T2, T3

A Large-Scale Study of the Evolution of Web Pages [Fetterly et al 2003]: about 150M pages, crawled weekly for 11 weeks
What's New on the Web? [Ntoulas et al 2004]: 150 sites, crawled weekly for 1 year
Recall (Internet Archive) [Patterson 2003]: Internet Archive data from 1996 to 2003, 11 billion pages
11
1999 to 2004, 7 snapshots

Year      #Pages   #seeds   #comms
1999/8    17M      671K     83K
2000/8    17M      741K     94K
2001/10   40M      1431K    158K
2002/2    45M      1583K    171K
2003/2    66M      4646K    554K
2003/07   97M      7870K    874K
2004/05   96M      8192K    849K

UFJ
12
13
Types of Changes
Changes are detected by comparing neighboring charts.
[Figure: charts at tk-1 and tk along the time axis
 tk-1: ccc.jp ddd.jp eee.jp fff.jp ggg.jp hhh.jp iii.jp jjj.jp kkk.jp lll.jp mmm.jp nnn.jp
 tk:   aaa.jp bbb.jp eee.jp fff.jp ggg.jp hhh.jp iii.jp jjj.jp kkk.jp lll.jp mmm.jp nnn.jp]
Emerge, Dissolve, Split, Merge

Types of Changes
The structure of communities changes dynamically.
How is the size distribution kept unchanged?
14
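Comparing neighboring charts to detect the four change types can be sketched as follows. The exact matching rule is not given on the slide, so this sketch assumes the simplest one: two communities correspond if they share at least one URL. The function name `classify_changes` and the grouping of the sample URLs into communities are hypothetical.

```python
def classify_changes(prev, curr):
    """Classify community changes between two neighboring charts.

    prev, curr: dicts mapping a community id to its set of URLs at
    tk-1 and tk. Communities correspond if they share a URL (assumption).
    """
    def overlapping(urls, chart):
        return [cid for cid, members in chart.items() if urls & members]

    changes = {"emerge": [], "dissolve": [], "split": [], "merge": []}
    for cid, urls in prev.items():
        successors = overlapping(urls, curr)
        if not successors:
            changes["dissolve"].append(cid)   # no URL survives in any community
        elif len(successors) > 1:
            changes["split"].append(cid)      # URLs spread over several communities
    for cid, urls in curr.items():
        predecessors = overlapping(urls, prev)
        if not predecessors:
            changes["emerge"].append(cid)     # all URLs are new to the chart
        elif len(predecessors) > 1:
            changes["merge"].append(cid)      # URLs came from several communities
    return changes


# Hypothetical grouping of the URLs shown on the slide.
chart_prev = {
    "p1": {"ccc.jp", "ddd.jp"},
    "p2": {"eee.jp", "fff.jp", "ggg.jp", "hhh.jp"},
    "p3": {"iii.jp", "jjj.jp"},
    "p4": {"kkk.jp", "lll.jp"},
    "p5": {"mmm.jp", "nnn.jp"},
}
chart_curr = {
    "c1": {"aaa.jp", "bbb.jp"},
    "c2": {"eee.jp", "fff.jp"},
    "c3": {"ggg.jp", "hhh.jp"},
    "c4": {"iii.jp", "jjj.jp", "kkk.jp", "lll.jp"},
    "c5": {"mmm.jp", "nnn.jp"},
}
sample_changes = classify_changes(chart_prev, chart_curr)
```

With this grouping, `p1` dissolves, `c1` emerges, `p2` splits into `c2` and `c3`, and `p3` and `p4` merge into `c4`; `p5` carries over unchanged and is not reported.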
Emerged and Dissolved Communities
Both size distributions follow the power law.
Both exponents are greater than the exponent of the size distribution of all communities.

Split and Merged Communities
The numbers of split and merged URLs also follow the power law, and show clear symmetry.
15
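Grown and Shrunken Communities
Growth rates show clear y-axis symmetry.

Temporal and spatial analysis of the Web [ACM Hypertext 05]
Temporal + spatial analysis: transition of communities (example: i-mode search sites)
Centered on venture companies
New entry by major companies
Ventures withdraw
Shift to being centered on major companies
16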
[. DEWS2006] [Toyoda. WWW2006] 17