THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS — IEICE Technical Report LOIS2016-85 (2017-03)

Application of distributed representations of words to tourist spots recommendation system

Ryota KAICHI    Yasuhiko HIGAKI
Graduate School of Engineering, Chiba University
1-33 Yayoi-cho, Inage-ku, Chiba-shi, 263-8522 Japan
E-mail: afpa3246@chiba-u.jp, higaki.yasuhiko@faculty.chiba-u.jp

Abstract  The purpose of this study is to examine the optimum conditions for applying distributed representations of words to a recommendation system for tourist spots, and to verify the effectiveness of a system built on such representations. The system consists of four processes: word vector creation, data input and vectorization, similarity calculation, and tourist spot extraction and presentation. To raise the accuracy of the system, we conducted preliminary experiments on the tourist spot databases and the corpora. In the evaluation experiment, the proposed method produced better results than the existing method, confirming its effectiveness.

Keywords  Tourist Spot Recommendation System, Word Vector, Word2vec, Natural Language Processing

1. …
1.1. … Web … [1] … TF-IDF …
1.2. … [2][3] … [4] …

2. …
2.1. …

This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere. Copyright 2017 by IEICE
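The abstract above describes a four-stage pipeline: word vector creation, data input and vectorization, similarity calculation, and tourist spot extraction and presentation. A minimal sketch of the last stages, using hypothetical toy embeddings and spot names in place of the trained word2vec vectors and the real spot database:

```python
import math

# Hypothetical toy embeddings standing in for trained word2vec vectors;
# the report trains 600-dimensional vectors on large corpora.
WORD_VECS = {
    "onsen":    [0.9, 0.1, 0.0],
    "spa":      [0.8, 0.2, 0.1],
    "aquarium": [0.1, 0.9, 0.2],
    "museum":   [0.0, 0.3, 0.9],
}

# Hypothetical spot database: each spot represented by one vector.
SPOTS = {
    "Spot A (hot spring)": WORD_VECS["spa"],
    "Spot B (aquarium)":   WORD_VECS["aquarium"],
    "Spot C (museum)":     WORD_VECS["museum"],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recommend(query_word, top_k=2):
    """Vectorize the input word, score every spot by cosine
    similarity, and present the top-k spots."""
    q = WORD_VECS[query_word]
    return sorted(SPOTS, key=lambda s: cosine(q, SPOTS[s]), reverse=True)[:top_k]

print(recommend("onsen"))  # → ['Spot A (hot spring)', 'Spot B (aquarium)']
```

All names and vector values here are illustrative; only the pipeline shape follows the abstract.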
… [5] … distributional semantic models [6] … skip-gram [7] …

2.2. word2vec
… word2vec(*1) [4] … skip-gram model … context window size c … log-bilinear model [8] … (2) … w, w′ … word2vec … Wikipedia(*2) … (1) … [2] [4] …

3. …
… (Sec. 2.2) … word2vec …

3.1. …
… Web … word2vec … g … v … (3) …

(*1) https://code.google.com/p/word2vec
(*2) https://ja.wikipedia.org/wiki/
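The skip-gram model [7] that Sec. 2.2 builds on maximizes the average log-probability of the words within a window of size c around each position, with the context probability given by a softmax over output vectors (the log-bilinear form analyzed in [8]). The report's own equations did not survive, but as stated in the cited papers they are:

```latex
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}}
  \log p\!\left(w_{t+j} \mid w_t\right),
\qquad
p\!\left(w_O \mid w_I\right)
  = \frac{\exp\!\left({v'_{w_O}}^{\!\top} v_{w_I}\right)}
         {\sum_{w=1}^{W} \exp\!\left({v'_w}^{\!\top} v_{w_I}\right)}
```

Here v and v′ are the input and output vector representations of a word, and W is the vocabulary size, following the notation of [7].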
Table 1: Google Places API place types
amusement_park, establishment, aquarium, spa, stadium, place_of_worship, museum, park, zoo

3.2. …
… n … (4) … (5) … (6) …

3.3. …
… (7) …

3.4. …

4. …
4.1. …
… Web …

4.1.1. …
a. Wikipedia
b. Google Places API(*3)
… Wikipedia: 914,843 … Google Places API: 170,853 …
(1) … (2) … (3) …

4.1.2. …
… rurubu.com(*4) … 17,119 … F …

(*3) https://developers.google.com/places/
(*4) http://www.rurubu.com/domestic/
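The tables that follow report F values for extracting tourist spot names from each source. Assuming the usual definition (harmonic mean of precision and recall), a small sketch with hypothetical counts — the variable names and the example numbers are illustrative, not taken from the report:

```python
def f_measure(n_extracted, n_correct, n_relevant):
    """F-measure: harmonic mean of precision (correct / extracted)
    and recall (correct / relevant)."""
    precision = n_correct / n_extracted
    recall = n_correct / n_relevant
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 3,000 true spot names among 60,000 extracted
# candidates, against a ground-truth list of 5,000 spots.
print(round(f_measure(60_000, 3_000, 5_000), 4))  # → 0.0923
```

With very large candidate sets, precision dominates the harmonic mean, which is consistent with the small F values (on the order of 0.01–0.16) reported in Tables 2–4.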
Table 2: … F …
  (Wikipedia)     914,843   4,984   0.0107
  Wikipedia …      60,512   3,138   0.0808

Table 3: … F …
  (Wikipedia)     914,843   4,984   0.0107
  (Wikipedia) …   119,482   3,954   0.0579

Table 4: …
   3,693    6,703   29.8
      43    9,306    0.249
     369      484   41.2
  14,807   17,957   44.6
     732   12,375    3.20
       2      112    0.965
  (Google)      170,853   4,983   0.0530
  (Google) …     23,465   3,272   0.1612

… F …

4.1.3. …
… (8) … (9) … (10) …
… 2 × 2 … F … Yahoo!(*5) … Wikipedia … Google Places … F … Yahoo! … (11) … (12) …
… (3) … as of November 2015: 1,790,753 … 96,790,696 … (13) …
… 0–54.05 … (1)(2) … 0.1 … F … Wikipedia (1) … (2)(3) … 0.9 …

(*5) http://chiebukuro.yahoo.co.jp
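The system's second stage vectorizes the user's input before similarity calculation. The report's exact composition formula is not recoverable here, so the sketch below makes two labeled assumptions: multi-word input is composed by averaging its in-vocabulary word vectors (a common choice), and the 0.1 value that appears in the preliminary-experiment text is treated as a similarity floor for filtering candidates:

```python
def average_vector(words, word_vecs):
    """Compose an input phrase into one vector by averaging its
    in-vocabulary word vectors. (Assumption: the report's actual
    composition method may differ.)"""
    vecs = [word_vecs[w] for w in words if w in word_vecs]
    if not vecs:
        raise ValueError("no known words in input")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def passes_floor(similarity, floor=0.1):
    """Discard candidates below a similarity floor; 0.1 echoes the
    value reported in the preliminary experiments (assumed role)."""
    return similarity >= floor

# Unknown words ("oov") are simply skipped before averaging.
vecs = {"hot": [1.0, 0.0], "spring": [0.6, 0.8]}
print(average_vector(["hot", "spring", "oov"], vecs))  # → [0.8, 0.4]
```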
Table 5: Corpora
  (1)  6.6GB   161   34   3.6GB
  (2)  4.5GB    42    9   959MB
  (3)   15GB   110   74   2.5GB
  (4)   11GB   178  338   4.9GB

(A) … (B) … (C) … (D) … (E) … (F) … USJ …

Table 6: … (100 …)
  (1)  71    9.70
  (2)  75    8.56
  (3)  72    9.30
  (4)  66   10.56

4.2. … word2vec [4] …
4.2.1. …
… four corpora: (1) Wikipedia … (2) Yahoo! … (3) Yahoo! … (4) (1)+(2) …
… MeCab(*6) … word2vec … size=600, window size=5, hs=0 …
… Table 5 … Table 6 …

4.2.2. …
… rurubu.com(*7) … 10 … 10 … [9] … 10 …

4.2.3. …
… Table 6 … (2) … (2) …

5. …
5.1. …
… (Sec. 4.2) … 18 … 1 … 17 …

5.2. …
… [1] … 2 … 3 …

(*6) http://taku910.github.io/mecab/
(*7) http://www.rurubu.com/ranking/domSight.aspx
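The training settings quoted in Sec. 4.2.1 (vector size 600, window 5, hs=0, i.e. hierarchical softmax off) would correspond to an invocation of the original word2vec tool (the Google Code URL cited in Sec. 2.2) along these lines. This is a sketch: the file names are placeholders, and the negative-sampling count is the tool's default, not a value stated in the report.

```
# Skip-gram (-cbow 0) training with the reported hyperparameters.
# corpus.txt / vectors.bin are placeholder file names; -negative 5
# is the tool's default, not taken from the report.
./word2vec -train corpus.txt -output vectors.bin \
    -cbow 0 -size 600 -window 5 -hs 0 -negative 5 -binary 1
```

Setting -hs 0 matches the hs=0 noted in the text, leaving negative sampling as the training objective.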
Table 8: …
  1   0.289      1   0.286
  2   0.280      2   0.279
  3   0.262      3   0.273
  4   0.260      4   0.251
  5   0.252      5   0.231

5.3. …
… 6 … 18 … 11 of 18 … 6 … 0.1 …

6. …
… Table 8 … 8 … 8 …

7. …
… Wikipedia … (1) … (2)(3) … 0.9 … Yahoo! …

References
[1] …: …, …, 2011, pp. 1566-1579, 2011-06-30 (in Japanese).
[2] …: AWA … (artist2vec …), SIG-DOCMAS 2015, 2015 (in Japanese).
[3] Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, and Narayan Bhamidipati: E-commerce in Your Inbox: Product Recommendations at Scale, Proc. KDD '15, 2015.
[4] …: word2vec …, IEICE Technical Report, DE, 114(204), pp. 41-46, 2014-09-03 (in Japanese).
[5] Peter Turney and Patrick Pantel: From Frequency to Meaning: Vector Space Models of Semantics, Journal of Artificial Intelligence Research, 37, pp. 141-188, 2010.
[6] Zellig Harris: Distributional Structure, Word, 10(2-3), pp. 146-162, 1954.
[7] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space, Proc. Workshop at ICLR, 2013.
[8] Andriy Mnih and Yee Whye Teh: A Fast and Simple Algorithm for Training Neural Probabilistic Language Models, Proc. 29th International Conference on Machine Learning, pp. 1751-1758, 2012.
[9] …: …, 26 (in Japanese).