Chinese Journal of Psychology, 2009, Vol. 51, No. 4, 415-435

The Construction and Validation of Chinese Semantic Space by Using Latent Semantic Analysis

Ming-Lei Chen (1), Hsueh-Cheng Wang (2), and Hwa-Wei Ko (1)
(1) Institute of Learning and Instruction, National Central University
(2) Department of Computer Science, University of Massachusetts at Boston

Manuscript 08A17; received November 19, 2008; revised September 25, 2009; accepted October 30, 2009.
Correspondence concerning this article should be addressed to Hwa-Wei Ko, Institute of Learning and Instruction, National Central University (hwawei@ncu.edu.tw).
[The Chinese abstract, which cites Landauer (2006) and summarizes three findings, was lost in extraction.]
This work was supported by National Science Council grant NSC095-2917-I-008-001. The authors thank Professors Walter Kintsch and Eileen Kintsch.
Landauer's Latent Semantic Analysis (LSA; Landauer, 2002; Landauer, Foltz, & Laham, 1998) builds on the indexing technique of Deerwester, Dumais, Furnas, Landauer, and Harshman (1990), which applies Singular Value Decomposition (SVD) to a term-to-document co-occurrence matrix. Landauer, Foltz, and Laham (1998) constructed an English semantic space from the Grolier Encyclopedia: a co-occurrence matrix of 60,768 terms by 30,473 documents was submitted to SVD and reduced to 300 dimensions (dimension reduction). In the resulting space, the semantic relatedness of any two terms, sentences, or texts is estimated by the cosine value between their vectors (Landauer et al., 1998). LSA has been applied, for example, to the analysis of text coherence (Foltz, Kintsch, & Landauer, 1998), and English LSA spaces are publicly available at http://lsa.colorado.edu/ (Dennis, 2006). Researchers can also build LSA spaces from their own corpora (Quesada, 2006). Previous Chinese applications of LSA, however, relied on comparatively small corpora (a 2005 study reports figures of 1,557 and 2,921, and a 2002 study figures of 100 and 1,600; the surrounding Chinese text was lost in extraction). The present study therefore constructed and validated a large-scale Chinese semantic space.
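The pipeline just described (term-to-document co-occurrence matrix, SVD, dimension reduction, cosine comparison) can be sketched on a toy matrix. The words and counts below are illustrative stand-ins, not data from the study:

```python
import numpy as np

# Toy term-by-document co-occurrence matrix (terms are rows, documents are
# columns). The Grolier space was 60,768 x 30,473; here it is 5 x 4.
X = np.array([
    [2., 0., 1., 0.],   # hypothetical term "dog"
    [1., 0., 2., 0.],   # hypothetical term "cat"
    [0., 3., 0., 1.],   # hypothetical term "stock"
    [0., 2., 0., 2.],   # hypothetical term "market"
    [1., 1., 1., 1.],   # hypothetical term "run"
])

# Full SVD, then keep only the k largest singular values (dimension reduction).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]        # term vectors in the reduced space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms that occur in similar documents end up close in the reduced space.
sim_dog_cat = cosine(term_vecs[0], term_vecs[1])
sim_dog_stock = cosine(term_vecs[0], term_vecs[2])
print(sim_dog_cat > sim_dog_stock)  # "dog" is closer to "cat" than to "stock"
```

In a real space, k would be on the order of 300, following the findings cited above.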
The English spaces on the LSA web site were built from the TASA corpus (Touchstone Applied Science Associates, Inc.), which spans reading materials across grade levels (Dennis, 2006). For Chinese, the present study used the Academia Sinica Balanced Corpus (ASBC, version 3.0, approximately five million words). Building a semantic space involves three steps: (1) selecting a corpus, (2) preprocessing and parsing the texts, and (3) computing the SVD (Quesada, 2006). The corpus yielded 9,277 documents, whose type and token counts were obtained by parsing with the General Text Parser (GTP; Giles, Wo, & Berry, 2003), the software commonly used to prepare matrices for SVD in English LSA. GTP identifies terms by word boundaries (spaces); because written Chinese does not mark word boundaries, the word-segmented version of the corpus was required. In English LSA, function words such as "of," "an," and "the" are removed with a stop list (Quesada, 2006); an analogous stop list was applied here. The corpus contained 55,303 word types; after two removal criteria were applied [details lost in extraction], 49,021 types remained. The 9,277 documents averaged 964 words (M = 964, SD = 2,335). Following Landauer and Dumais (1997), the long documents were divided into shorter passages, yielding 40,463 documents (M = 219 words, SD = 66). The resulting 49,021 x 40,463 term-to-document matrix was then prepared with GTP and submitted to SVD.
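The preprocessing steps above (segmented input, stop-list removal, building counts, weighting before SVD) can be sketched as follows. The documents, stop list, and the log-entropy weighting shown are illustrative assumptions; log-entropy is a weighting scheme GTP supports, but the study's exact weighting options are not stated in the recovered text:

```python
import math
from collections import Counter

# Toy "segmented" documents; the study used the word-segmented ASBC corpus,
# so these English tokens are stand-ins for segmented Chinese words.
docs = [
    ["the", "dog", "chases", "the", "cat"],
    ["the", "cat", "sleeps"],
    ["stock", "market", "rises"],
]
stop_list = {"the", "of", "an"}          # function words, as in English LSA

# Drop stop-list words, then build the vocabulary from what remains.
cleaned = [[w for w in d if w not in stop_list] for d in docs]
vocab = sorted({w for d in cleaned for w in d})

# Raw term-by-document counts (terms are rows, documents are columns).
counters = [Counter(d) for d in cleaned]
counts = [[c[w] for c in counters] for w in vocab]

def log_entropy_weight(row):
    # Common LSA weighting: global weight (1 - normalized entropy of the
    # term's distribution over documents) times a local log weight.
    total = sum(row)
    n_docs = len(row)
    entropy = -sum((c / total) * math.log(c / total) for c in row if c) / math.log(n_docs)
    return [(1 - entropy) * math.log(1 + c) for c in row]

weighted = [log_entropy_weight(r) for r in counts]
print(vocab)
```

A term concentrated in one document keeps its full local weight, while a term spread evenly over all documents is down-weighted toward zero.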
Computing the SVD of a matrix this large is the main practical obstacle. The study used SVDLIBC, available from Tedlab (http://tedlab.mit.edu/~dr/SVDLIBC/), which accepts both sparse-matrix and dense-matrix input. Because a term-to-document matrix is overwhelmingly zeros, the sparse-matrix format is what makes the computation feasible: the SVD of the 49,021 x 40,463 matrix was completed on a PIII 2.8 GHz machine with 1 GB of RAM. For the number of retained dimensions, Landauer and Dumais (1997) compared settings such as 100, 200, and 300 and found that roughly 300 dimensions gave the best performance for English; accordingly, Chinese spaces of 100, 200, and 300 dimensions were produced (see also the authors' 2009 Chinese-language report cited in the references).
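The role SVDLIBC plays can be illustrated with SciPy's sparse truncated SVD as a stand-in (an assumption for illustration; the study itself used SVDLIBC). Only non-zero cells are stored, which is why a 49,021 x 40,463 decomposition fits in modest memory:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

# A mostly-zero term-by-document matrix stored sparsely: only the non-zero
# cells (row index, column index, value) are kept.
rows = [0, 0, 1, 2, 3, 3]
cols = [0, 2, 2, 1, 1, 3]
vals = [2., 1., 2., 3., 2., 2.]
X = csc_matrix((vals, (rows, cols)), shape=(4, 4))

# Truncated SVD computed directly on the sparse matrix, keeping k dimensions.
k = 2
U, s, Vt = svds(X, k=k)

# Sanity check: the k singular values match the largest k from a dense SVD.
dense_s = np.linalg.svd(X.toarray(), compute_uv=False)
print(np.allclose(sorted(s), sorted(dense_s)[-k:]))
```

For a real corpus the dense array would never be formed; it is built here only to verify the truncated result.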
The Chinese semantic space is publicly available at http://www.lsa.url.tw. The web site provides two main functions, pairwise comparison and nearest neighbors (Figure 1 shows the interface). The spaces on the site were derived from the ASBC corpus as described above. [The figure captions and the accompanying table were in Chinese and are not recoverable.]
The pairwise comparison function accepts two texts (Text1 and Text2) and computes the cosine between their vectors in four modes: term-to-term, term-to-document, document-to-term, and document-to-document. In term mode an input is treated as a single term; in document mode the input is first passed through document segmentation and represented as a document vector. Multiple data pairs can be submitted at once ("Get cosine"), and the results can be downloaded ("Download cosine result"). Two worked examples (the example texts were Chinese and are not recoverable):

Example 1 (Text1 vs. Text2)     Cosine
term-to-term                    0.74
term-to-document                0.36
document-to-term                0.36
document-to-document            0.43

Example 2 (Text1 vs. Text2)     Cosine
term-to-term                    0.53
term-to-document                0.26
document-to-term                0.37
document-to-document            0.23

The nearest neighbors function is illustrated in the next section.
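A minimal sketch of the four comparison modes, under the common LSA convention that a "document" vector is the sum of its terms' reduced-space vectors (an assumption for illustration; the site's exact folding-in and weighting are not specified in the recovered text). The term vectors below are hypothetical:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reduced-space term vectors (rows of U*Sigma after the SVD).
term_vecs = {
    "dog":  np.array([0.9, 0.1]),
    "cat":  np.array([0.8, 0.2]),
    "runs": np.array([0.5, 0.5]),
}

def as_document(words):
    # A segmented text is represented as the sum of its terms' vectors.
    return sum(term_vecs[w] for w in words)

# term-to-term: both inputs treated as single terms.
tt = cosine(term_vecs["dog"], term_vecs["cat"])
# term-to-document: first input a term, second a bag of terms.
td = cosine(term_vecs["dog"], as_document(["cat", "runs"]))
# document-to-document: both inputs treated as bags of terms.
dd = cosine(as_document(["dog", "runs"]), as_document(["cat", "runs"]))
print(round(tt, 2))  # → 0.99
```

Document-to-term is the mirror image of term-to-document with the inputs swapped, which is why the two modes can yield different values when the inputs differ.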
The nearest neighbors function returns, for a submitted term or text, the terms in the space with the highest cosines. Two example queries are shown below; for each, the ten nearest terms are listed with their cosines (the Chinese terms themselves are not recoverable):

Query 1: Term / Cosine          Query 2: Term / Cosine
1.00                            0.74
0.80                            0.51
0.73                            0.43
0.71                            0.405
0.68                            0.402
0.66                            0.403
0.65                            0.391
0.64                            0.382
0.59                            0.355
0.59                            0.346

A threshold between 0 and 1 (default 0.3) can be set so that only neighbors at or above the threshold are returned. Submitted inputs may run from 1 to 6,000 characters, and by default the ten terms with the highest cosines are reported.
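The retrieval logic (rank by cosine, apply the user-set threshold, keep the top ten) can be sketched as follows; the vectors and words are hypothetical stand-ins for entries in the real ~40,000-term space:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reduced-space term vectors.
space = {
    "dog":   np.array([0.9, 0.1, 0.0]),
    "cat":   np.array([0.8, 0.3, 0.1]),
    "stock": np.array([0.1, 0.1, 0.9]),
    "puppy": np.array([0.9, 0.2, 0.1]),
}

def nearest_neighbors(word, threshold=0.3, top_n=10):
    # Rank every other term by cosine with the target, keep those at or
    # above the threshold (the web tool's default threshold is 0.3).
    target = space[word]
    hits = [(w, cosine(target, v)) for w, v in space.items() if w != word]
    hits = [(w, c) for w, c in hits if c >= threshold]
    return sorted(hits, key=lambda wc: -wc[1])[:top_n]

print(nearest_neighbors("dog")[0][0])  # prints "puppy"
```

With these vectors, "stock" falls below the 0.3 threshold and is excluded from the neighbor list for "dog".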
Beyond the similarity tools, the site provides Get CharFre (character frequency), Segmentation (word segmentation), and Get WordFre (word frequency); items absent from the space are flagged as out-of-vocabulary (OOV). Frequencies can be compared against published Chinese frequency counts (Chinese-language references dated 1981 and 2004; see the reference list).

Validation. Two studies evaluated the space. Study 1 asked whether word pairs that are associated in a published Chinese word-association norm (1996) receive higher cosines than unassociated pairs; as an illustration from the recovered text, one pair yielded a cosine of .55 and another a cosine of .16. In total, 600 terms drawn from the 1996 norms were submitted to the 300-dimension space; 198 stimulus words were checked against the frequency counts, and their distribution over frequency ranks is given in Table 3 (next page). [Parts of the Chinese method section were lost in extraction.]
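The frequency and OOV tools can be sketched in a few lines. The tokens and the lexicon below are hypothetical; the real tools operate on segmented Chinese text against the space's ~40,000-word vocabulary:

```python
from collections import Counter

# Toy segmented text; the site's Get WordFre / Get CharFre report word and
# character frequencies for submitted text and flag out-of-vocabulary items.
words = ["latent", "semantic", "analysis", "semantic", "space"]
word_freq = Counter(words)               # Get WordFre
char_freq = Counter("".join(words))      # Get CharFre

vocabulary = {"latent", "semantic", "space"}          # hypothetical lexicon
oov = sorted(w for w in set(words) if w not in vocabulary)  # OOV words

print(word_freq["semantic"], oov)  # → 2 ['analysis']
```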
Table 3. Frequency-rank distribution of the stimulus words (repeated items are not shown; the item lists were in Chinese and are not recoverable)

Panel A (N = 198)
Range        Number   Percentage (%)
1-500        137      69.19
501-1000     21       10.61
1501-2000    7        3.54
2501-3000    2        1.01
3001-3500    8        4.04
OOV          23       11.62

Panel B (N = 137)
Range        Number   Percentage (%)
1-3000       75       54.74
3001-6000    4        2.92
6001-9000    3        2.19
9001-12000   3        2.19
12001-15000  3        2.19
30001-33000  1        0.73
33001-36000  1        0.73
OOV          47       34.31

The two panels contrast relative frequency in a biased versus a balanced corpus. The stimuli came from the 1996 word-association task [further item counts in the Chinese text are not recoverable]. A second analysis submitted the 600 terms as follows.
Table 4. Descriptive statistics for the 1996 norms and the present study (row labels lost in extraction)

             1996 norms        Present study
             M       SD        M       SD       N
Row 1        81.00   9.13      7.15    6.26     289
Row 2        52.29   13.42     27.46   10.48    311

Table 5 listed a further distribution whose counts were 74, 56, 9, 5, 2, 4, 2, 2, 2, followed by a series of 1s (the category labels were in Chinese and are not recoverable).

The 600 terms from the 1996 norms were submitted both as terms and as documents. Term-to-term cosines were computed for 415 pairs, and term-to-documents cosines for the same pairs. An ANOVA on the cosines showed a significant difference between conditions, F(1, 413) = 15.81, p < .001, eta-squared = .037. Within one subset the effect on cosines was significant, F(1, 206) = 29.57, p < .001, eta-squared = .129, whereas within the other it was not, F(1, 207) = .013, p > .05. Overall, F(1, 434) = 7.95, p < .001, eta-squared = .036. Term-to-documents cosines thus patterned differently from term-to-term cosines.
Table 6. Term-to-documents cosines by condition (condition labels lost in extraction)

             M       SD        M       SD       N
Row 1        0.137   0.145     0.081   0.113    207
Row 2        0.098   0.112     0.097   0.113    208

Study 2 compared human judgments with cosines from the space. The materials were 16 passages comprising 204 sentences (see the Appendix), and human judges rated the sentences for semantic relatedness; the procedure involved three steps. [The remaining methodological details, including participant numbers and the rating scale, were in Chinese and are not recoverable.]
Results. Human ratings differed reliably across conditions, F(2, 406) = 547.88, p < .001, eta-squared = .731, and F(1, 203) = 35.07, p < .001, eta-squared = .159. For the cosines, one contrast was not significant, t = .404, p > .05, whereas the other two were, t = 7.98 and 8.51, ps < .05. Comparisons involving the ASBC-based spaces yielded F(1, 202) = 6.49, 9.93, and 13.33, all ps < .01. Overall, the human ratings and the cosines derived from the Chinese LSA space showed converging patterns, consistent with earlier Chinese-language reports (2005; 2002; 1996).

Table 7. Descriptive statistics for human ratings and LSA cosines (row labels lost in extraction)

             Human rating                        LSA cosine
             Min.    Max.    M       SD          Min.     Max.    M       SD
Row 1        0.821   4.775   3.252   0.916       0.001    0.738   0.143   0.166
Row 2        0.500   4.950   2.596   0.994       -0.169   0.934   0.149   0.160
Row 3        0.025   3.123   0.558   0.582       -0.076   0.659   0.040   0.100
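The logic of validating the space against human judgments, as in Table 7, can be illustrated with a Pearson correlation between ratings and cosines. The numbers below are hypothetical, chosen only to show the computation, not the study's data:

```python
import numpy as np

# Hypothetical human similarity ratings for five word pairs, and the cosines
# a semantic space might assign to the same pairs.
human = np.array([4.8, 3.3, 2.6, 1.1, 0.5])
cosines = np.array([0.74, 0.40, 0.35, 0.10, 0.02])

# Pearson correlation between the two measures; a high positive r indicates
# that the space orders the pairs the way human judges do.
r = np.corrcoef(human, cosines)[0, 1]
print(r)
```

A validation like Study 2 compares such agreement across conditions rather than reporting a single coefficient, but the underlying question is the same: do space-derived cosines track human judgments of relatedness?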
[The concluding discussion, which addressed the LSA web site's multiple-data function, was in Chinese and was lost in extraction.]

References

(1996). [Chinese-language reference; author and title not recoverable], 38, 67-168.
(2009). [Chinese-language reference; author and title not recoverable], 51, 21-36.
(2005). [Chinese-language NSC project report; author and title not recoverable] (NSC93-2520-S-003-011).
(2004). [Chinese-language reference; author and title not recoverable], 46, 49-55.
(2002). [Chinese-language reference; author and title not recoverable].
(1981). [Chinese-language reference; author and title not recoverable], 23, 137-153.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41, 391-407.
Dennis, S. (2006). How to use the LSA web site. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 57-70). Mahwah, NJ: Lawrence Erlbaum Associates.
Foltz, P. W., Kintsch, W., & Landauer, T. K. (1998). The measurement of textual coherence with latent semantic analysis. Discourse Processes, 25, 285-307.
Giles, J. T., Wo, L., & Berry, M. W. (2003). GTP (General Text Parser) software for text mining. In H. Bozdogan (Ed.), Statistical data mining and knowledge discovery (pp. 455-471). Boca Raton, FL: CRC Press.
Landauer, T. K. (2002). On the computational basis of learning and cognition: Arguments from LSA. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 41, pp. 43-84). New York: Academic Press.
Landauer, T. K. (2006). LSA as a theory of meaning. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 3-34). Mahwah, NJ: Lawrence Erlbaum Associates.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25, 259-284.
Quesada, J. (2006). Creating your own LSA spaces. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 71-85). Mahwah, NJ: Lawrence Erlbaum Associates.
Appendix

The appendix presented the 16 experimental passages used in Study 2, with sentences numbered within each passage, 204 sentences in total (Passage 1: 12 sentences; Passage 2: 13; Passage 3: 12; Passage 4: 13; Passage 5: 10; Passage 6: 14; Passage 7: 9; Passage 8: 14; Passage 9: 19; Passage 10: 15; Passage 11: 16; Passage 12: 15; Passage 13: 9; Passage 14: 10; Passage 15: 10; Passage 16: 13). The Chinese sentence texts were lost in extraction.
The Construction and Validation of Chinese Semantic Space by Using Latent Semantic Analysis

Ming-Lei Chen (1), Hsueh-Cheng Wang (2), and Hwa-Wei Ko (1)
(1) Institute of Learning and Instruction, National Central University
(2) Department of Computer Science, University of Massachusetts at Boston

Using corpora to model word knowledge is a recent approach in psycholinguistic research. The present study used Latent Semantic Analysis (LSA) to create a Chinese semantic space. The semantic relationship between words in the Chinese semantic space was estimated by the cosine (normalized dot product) between their vectors; the semantic relationship between two sentences or two documents can be estimated in the same way. Comparisons with human data indicate that the Chinese semantic space is a valid representation of Chinese readers' world knowledge.

Keywords: latent semantic analysis, semantic relation, semantic space