Vol. 54 No. 2    Journal of Xiamen University (Natural Science)    Mar. 2015
doi:10.6043/j.issn.0438-0479.2015.02.018

Word Similarity Computing Based on Inverse Concept Frequencies

SUN Jing, ZHANG Dongzhan
(School of Information Science and Engineering, Xiamen University, Xiamen 361005, Fujian, China)

Abstract: Word similarity computation has important applications in service selection, natural language processing, and literature retrieval. Current research on word similarity is generally based on HowNet. By analyzing the structure of HowNet, we propose that the weights of the four basic structures of a concept should be generated dynamically while the similarity of two words is computed. Borrowing the idea of inverse document frequency (IDF) weighting, we propose a frequency-based method for computing primitive weights, the inverse concept frequency (ICF). Through concept structure analysis, we compute the ICF of each primitive as a first basic primitive, other basic primitive, relation primitive, and mark primitive, and take the maximum primitive ICF in a structure as the ICF of that structure. Word similarity is then computed with the dynamically obtained ICF values as the weights of the four basic structures. Experimental results show that, for the first 160 words, the average accuracy rises from 30.74% to 72.28% and the recall rises from 15.87% to 49.64%.

Keywords: HowNet; word similarity; inverse concept frequency; primitive weight
CLC number: TP391.1    Document code: A    Article ID: 0438-0479(2015)02-0257-06

Received: 2014-04-29    Accepted: 2014-08-25
Funding: National Natural Science Foundation of China (61303004); Natural Science Foundation of Fujian Province (2013J05099)
Corresponding author: zdz@xmu.edu.cn
Citation: Sun Jing, Zhang Dongzhan. Word similarity computing based on inverse concept frequencies[J]. Journal of Xiamen University: Natural Science, 2015, 54(2): 257-262. (in Chinese)
http://jxmu.xmu.edu.cn
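Several refinements of the method of [5] have been proposed [6-9]; [6], for example, improves the computation of primitive similarity. This paper considers the contribution of each primitive to the concept it describes and computes primitive weights from that contribution instead of fixing them in advance.

1 Preliminaries

1.1 HowNet

HowNet [10] is a common-sense knowledge base that describes the concepts represented by Chinese and English words and reveals the relations between concepts and between the attributes of concepts. A word is described by one or more concepts, and a concept is described by a set of primitives, so word similarity is ultimately computed from primitive similarity. The following definitions are used throughout.

Definition 1. $w$ denotes a word and $W$ the set of words, $w \in W$.

Definition 2. $\mathrm{Sim}(w_1, w_2)$ denotes the similarity of two words $w_1$ and $w_2$. It is a real number between 0 and 1, equal to 1 for identical words and 0 for entirely dissimilar ones [5].

Definition 3. $\mathrm{Dis}(w_1, w_2)$ denotes the semantic distance between two words $w_1$ and $w_2$. According to [5], $\mathrm{Sim}(w_1, w_2)$ decreases as $\mathrm{Dis}(w_1, w_2)$ increases.

Definition 4. $c$ denotes a concept and $C$ the set of concepts, $c \in C$.

Definition 5. $B(c_i) = w$, $c_i \in C$, denotes that concept $c_i$ belongs to word $w$.

Definition 6. $F(w)$ is the concept set of word $w$. A word may have several concepts, so $F(w) = \{c_1, c_2, \ldots, c_i, \ldots, c_n \mid c_i \in C,\ B(c_i) = w,\ 1 \le i \le n\}$ and $F(w) \subseteq C$.

1.2 Similarity computation

1.2.1 Word similarity

For two words $w_1$ and $w_2$, [5] defines their similarity as the maximum similarity over all pairs of their concepts:

$$\mathrm{Sim}(w_1, w_2) = \max_{c_i \in F(w_1),\ c_j \in F(w_2)} \mathrm{Sim}(c_i, c_j). \tag{1}$$

1.2.2 Primitive similarity

Primitives are organized in a hierarchy. To compute the similarity of two primitives, [5] takes their path distance $d$ in the primitive hierarchy and defines

$$\mathrm{Sim}(p_1, p_2) = \frac{\alpha}{d + \alpha}, \tag{2}$$

where $p_1$ and $p_2$ are two primitives, $d$ is the path length between $p_1$ and $p_2$ in the primitive hierarchy, and $\alpha$ is an adjustable parameter equal to the distance at which the similarity is 0.5.

1.2.3 Concept similarity

Definition 7. $y$ denotes a primitive and $Y$ the set of primitives, $y \in Y$.

In HowNet a concept $c_i$ is described by a set of description expressions of three kinds:
independent primitive descriptions, written as a basic primitive or as (concrete word);
relation primitive descriptions, written as relation primitive = basic primitive, relation primitive = (concrete word), or (relation primitive = concrete word);
mark primitive descriptions, written as a relation symbol followed by a basic primitive or by (concrete word).

Definition 8. $P(c_i) = \{y_1, y_2, \ldots, y_i, \ldots, y_n \mid y_i \in Y,\ 1 \le i \le n\}$ is the primitive set of $c_i$.

Definition 9. $P_1(c_i)$ is the primitive set of the first independent primitive description of $c_i$, that is, its first basic primitive. $P_1(c_i)$ contains at most one primitive, and $P_1(c_i) \subseteq P(c_i)$.

Definition 10. $P_2(c_i)$ is the primitive set of the other independent primitive descriptions of $c_i$, that is, all independent primitive descriptions except the first; $P_2(c_i) \subseteq P(c_i)$.

Definition 11. $P_3(c_i)$ is the primitive set of the relation primitive descriptions of $c_i$; $P_3(c_i) \subseteq P(c_i)$.

Definition 12. $P_4(c_i)$ is the primitive set of the mark primitive descriptions of $c_i$; $P_4(c_i) \subseteq P(c_i)$.

Hence $P(c_i) = P_1(c_i) \cup P_2(c_i) \cup P_3(c_i) \cup P_4(c_i)$.

To compute the similarity of two concepts $c_i$ and $c_j$, [5] treats the four structures separately:

1) First basic primitive. The similarity of this part is $\mathrm{Sim}_1(y_1, y_2)$ with $y_1 \in P_1(c_i)$, $y_2 \in P_1(c_j)$, computed with Eq. (2).
2) Other basic primitives. The similarity of this part is $\mathrm{Sim}_2(y_1, y_2)$. The primitives of $P_2(c_i)$ and $P_2(c_j)$ are paired in every possible way and the similarity of each pair is computed; the pair with the largest similarity is grouped and removed from $P_2(c_i)$ and $P_2(c_j)$; this is repeated until one of the two sets is empty. $\mathrm{Sim}_2$ is then obtained from the similarities of the grouped pairs.
3) Relation primitives. The similarity of this part is $\mathrm{Sim}_3(y_1, y_2)$. Descriptions with the same relation primitive are placed in one group and the similarity of each group is computed.
4) Mark primitives. The similarity of this part is $\mathrm{Sim}_4(y_1, y_2)$. Descriptions with the same relation symbol are placed in one group; since each group holds two sets, their similarity is computed as in 2).

Because the primary parts of a description should constrain the secondary parts, [5] combines the four similarities into the overall similarity of two concepts as

$$\mathrm{Sim}(c_1, c_2) = \sum_{i=1}^{4} \beta_i \prod_{j=1}^{i} \mathrm{Sim}_j(y_1, y_2),$$

where $\beta_i$ ($1 \le i \le 4$) are adjustable parameters with $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$ and $\beta_1 \ge \beta_2 \ge \beta_3 \ge \beta_4$. The ordering reflects the decreasing influence of $\mathrm{Sim}_1$ through $\mathrm{Sim}_4$ on the overall similarity.

2 Word similarity computing based on inverse concept frequency (ICF)

Definition 13 (primitive contribution). $I(c_i, y_j)$, $y_j \in P(c_i)$, denotes the contribution of primitive $y_j$ to concept $c_i$: taking the total contribution to $c_i$ as 1, it is the weight of $y_j$ within $P(c_i)$.

Definition 14 (structure contribution). $I(c_i, P_j(c_i))$, $1 \le j \le 4$, denotes the contribution of structure $j$ to $c_i$, where $j = 1$ is the first basic primitive, $j = 2$ the other basic primitives, $j = 3$ the relation primitives, and $j = 4$ the mark primitives:

$$I(c_i, P_j(c_i)) = \max\{\, I(c_i, y_k) \mid y_k \in P_j(c_i) \,\}.$$

By this definition, when the similarities of the four structures $P_j(c_i)$ ($1 \le j \le 4$) are combined, the weight $\beta_j$ of $\mathrm{Sim}_j(y_1, y_2)$ should be derived from $I(c_i, P_j(c_i))$: the smaller $|\beta_j - I(c_i, P_j(c_i))|$, the better. The values $I(c_i, P_j(c_i))$ differ from one concept to another, whereas existing methods use fixed weights and therefore cannot keep $|\beta_j - I(c_i, P_j(c_i))|$ small for every concept. This is the problem addressed here.

2.1 ICF

Borrowing the TF-IDF weighting scheme [11], we use ICF to compute primitive weights. The ICF weight rests on one assumption: the contribution of a primitive is related to the number of concepts in which it occurs, and the fewer concepts contain it, the better it distinguishes the concepts that do.

Definition 15 (ICF). The ICF of a primitive $p_i$ in structure $j$ is the logarithm of the total number of concepts divided by the number of concepts whose structure $P_j$ contains $p_i$:

$$\mathrm{ICF}(p_i, j) = \log \frac{|C|}{\left|\{\, c \in C : p_i \in P_j(c) \,\}\right|}, \quad 1 \le j \le 4, \tag{3}$$

where $p_i$ is a primitive, $j$ is the structure type, and $C$ is the concept set.

Definition 16. $\mathrm{ICF\_C}(c_i, y_j)$, $y_j \in P(c_i)$, denotes the ICF weight of primitive $y_j$ in concept $c_i$. It is the ICF of the primitive in the structure where it occurs: $\mathrm{ICF\_C}(c_i, y_j) = \mathrm{ICF}(y_j, k)$, where $k$ is such that $y_j \in P_k(c_i)$. Under the assumption above, when $|C|$ is large enough the ratio of $\mathrm{ICF\_C}(c_i, y_j)$ to $I(c_i, y_j)$ tends to a constant $\lambda$:

$$\lim_{|C| \to \infty} \mathrm{ICF\_C}(c_i, y_j) = \lambda\, I(c_i, y_j), \quad c_i \in C,\ \lambda \text{ a constant}.$$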
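The ICF of Eq. (3) can be illustrated with a minimal Python sketch. The toy concept set, the primitive names, and the dictionary layout below are hypothetical and are not HowNet data; the natural logarithm is also an assumption, since the base of the logarithm is not specified. Structures are indexed 0 to 3 in the code, corresponding to $j = 1, \ldots, 4$ in the text.

```python
import math
from collections import Counter

# Hypothetical toy concept set: concept -> four structures [P1, P2, P3, P4],
# each a set of primitives. Illustrative names only, not HowNet entries.
CONCEPTS = {
    "c1": [{"human"}, {"occupation", "military"}, set(), set()],
    "c2": [{"human"}, {"occupation", "education"}, set(), set()],
    "c3": [{"institution"}, {"military"}, set(), set()],
    "c4": [{"affairs"}, {"military"}, set(), set()],
}


def concept_counts(concepts):
    """For each structure j, count how many concepts contain each primitive."""
    counts = [Counter() for _ in range(4)]
    for structures in concepts.values():
        for j, primitives in enumerate(structures):
            # primitives is a set, so a concept is counted once per primitive
            counts[j].update(primitives)
    return counts


def icf(primitive, j, concepts, counts):
    """Eq. (3): log(|C| / number of concepts whose structure j holds primitive)."""
    n = counts[j][primitive]
    if n == 0:
        raise ValueError(f"{primitive!r} never occurs in structure {j}")
    return math.log(len(concepts) / n)  # natural log: an assumption


counts = concept_counts(CONCEPTS)
# "education" occurs in one concept, "military" in three, so it weighs more.
rare = icf("education", 1, CONCEPTS, counts)
common = icf("military", 1, CONCEPTS, counts)
```

The counting pass is done once for the whole concept set, after which each ICF lookup is a dictionary access; this mirrors the role of a preprocessing step but is only one possible design.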
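Definition 17. $\mathrm{ICF\_C}(c_i, P_j(c_i)) = \max\{\, \mathrm{ICF\_C}(c_i, y_k) \mid y_k \in P_j(c_i) \,\}$ is the ICF weight of structure $j$ of concept $c_i$.

2.2 Concept similarity

[5] combines the structure similarities with fixed weights. When the structure weights are computed from ICF, the influence of each of the four structures on the overall similarity is already reflected in its ICF, so the overall similarity of two concepts is computed with Eq. (4):

$$\mathrm{Sim}(c_j, c_k) = \sum_{i=1}^{4} \beta_i\, \mathrm{Sim}_i(y_j, y_k), \tag{4}$$

where $\beta_i$ ($1 \le i \le 4$) are the weights computed from ICF and $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$.

2.3 ICF-based word similarity algorithm

Given a concept $c \in C$ and concepts $c_i \in C$ ($i = 1, 2, \ldots, |C|$), Eq. (4) is used to compute the similarity $\mathrm{Word\_Simi}(c, c_i)$ of $c$ and $c_i$. Let $c$ have $n$ ($0 < n \le 4$) structures $p_{11}, p_{12}, \ldots, p_{1n}$ and $c_i$ have $m$ ($0 < m \le 4$) structures $p_{21}, p_{22}, \ldots, p_{2m}$.

Algorithm: ICF-based word similarity computation.
1) Preprocessing. For each primitive $y_i$, count the number of times $y_i$ occurs as a first basic primitive, other basic primitive, relation primitive, and mark primitive.
2) For each concept $c_i$ ($i \ge 1$) and each structure $P_j$ of $c_i$, compute the ICF of $P_j$ by Definition 17, denoted $\mathrm{ICF}_{P_j}$.
3) For each concept $c_i$, compute its similarity to $c$:
$$\mathrm{Word\_Simi}(c, c_i) = \sum_{k=1}^{4} \mathrm{ICF}_{p_{1k}}\, \mathrm{Sim}(p_{1k}, p_{2k}), \quad p_{1k} \in P_k(c),\ p_{2k} \in P_k(c_i).$$
4) From the results of steps 1) to 3), compute the similarities for the word set $W$.
5) Output the results.

The following argument shows that, when $C$ is large enough, the ICF-based computation is better than a computation with fixed weights. Assume that the structure contributions $I(c_i, P_j(c_i))$ ($1 \le j \le 4$) differ between concepts of $C$, that is, there is an $\alpha$ ($0 < \alpha \le 1$) and two concepts $c_i$, $c_j$ with $|I(c_i, P_k(c_i)) - I(c_j, P_k(c_j))| \ge \alpha$ ($1 \le k \le 4$). The four weights $\beta_j$ ($1 \le j \le 4$) should make $|\beta_j - I(c_i, P_j(c_i))|$ as small as possible. If $\beta_j$ is fixed to four constants $\alpha_j$ ($1 \le j \le 4$), then by the triangle inequality there is a concept $c_i$ with $|\alpha_j - I(c_i, P_j(c_i))| \ge \alpha/2$. With ICF, the structure weights satisfy

$$\lim_{|C| \to \infty} \mathrm{ICF\_C}(c_i, P_j(c_i)) = \lambda\, I(c_i, P_j(c_i)), \quad c_i \in C,\ \lambda \text{ a constant},$$

and since $\sum_{1 \le j \le 4} I(c_i, P_j(c_i)) = 1$, the normalization factor of $\mathrm{ICF\_C}(c_i, P_j(c_i))$ ($1 \le j \le 4$) is $1/\lambda$. The ICF-based weights therefore satisfy

$$\lim_{|C| \to \infty} \beta_j = I(c_i, P_j(c_i)), \quad c_i \in C.$$

Hence there is a constant $N$ such that $|\beta_j - I(c_i, P_j(c_i))| < \alpha/2$ whenever $|C| \ge N$; that is, for a sufficiently large $C$ the ICF-based computation is better than the computation with fixed weights.

Example 1. Let $C$ be a concept described as affairs, military, and $C_i$ a concept described as institution, military. With the method of Section 2.3, the ICF weights of $C$ are [0.2296, 0.7, 0.01, 0.05], the similarities of $C$ and $C_i$ on the first basic primitive, other basic primitives, relation primitives, and mark primitives are [0.2316, 1, 1, 1], and $\mathrm{Sim}(C, C_i) = 0.8236$. With the method of [5], combined with the improved primitive similarity computation of [6], $\mathrm{Sim}(C, C_i) = 0.2316$.

2.4 Words with several concepts

In HowNet one word may correspond to several concepts [10]. The E_C field of an entry lists usage examples of the word. For instance, HowNet holds two entries for the same verb (G_C=V), each with its own examples in E_C: one is defined as DEF=buy and the other as DEF=weave.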
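The difference between the weighted sum of Eq. (4) and the product form of [5] can be checked on the numbers of Example 1 with a short Python sketch. Two things here are assumptions rather than values from the paper. First, the printed weights [0.2296, 0.7, 0.01, 0.05] do not sum to 1, so the second weight is taken as 0.7104 to restore $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$; because the last three similarities are all 1, only the total of the last three weights affects the result. Second, the fixed weights passed to the product form are illustrative; any weights that sum to 1 give the same value for these similarities.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


def concept_similarity_icf(betas, sims):
    """Eq. (4): plain weighted sum of the four structure similarities."""
    return sum(b * s for b, s in zip(betas, sims))


def concept_similarity_product(betas, sims):
    """Form used in [5]: sum over i of beta_i * product of Sim_1..Sim_i."""
    total, running = 0.0, 1.0
    for b, s in zip(betas, sims):
        running *= s          # product of the similarities seen so far
        total += b * running
    return total


sims = [0.2316, 1.0, 1.0, 1.0]             # structure similarities from Example 1
icf_betas = [0.2296, 0.7104, 0.01, 0.05]   # second value assumed, see lead-in
fixed_betas = [0.5, 0.2, 0.17, 0.13]       # illustrative fixed weights

print(round(concept_similarity_icf(icf_betas, sims), 4))        # 0.8236
print(round(concept_similarity_product(fixed_betas, sims), 4))  # 0.2316
```

In the product form a low first-structure similarity multiplies every later term, so it caps the result at 0.2316 here, while the weighted sum lets the three fully matching structures contribute according to their weights.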
3 Experimental results and analysis

3.1 Experimental results

The ICF-based primitive weight computation was implemented with Visual Studio 2008 under Windows 7, and word similarity was computed with Eq. (1). Fig. 1 compares three methods: method 1 is the method of [6], method 2 is the method of [5], and method 3 is the method proposed in this paper.

3.2 Analysis of the results

The ICF weight rests on the assumption that the contribution of a primitive is related to the number of concepts in which it occurs. ICF is only an estimate of that contribution: a primitive that occurs very often receives a low weight, as in the case discussed here of 364 occurrences as a first basic primitive and 263 occurrences as an other basic primitive.

Fig. 1 Accuracy rate and recall rate

4 Conclusion
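This paper uses the ICF of primitives as primitive weights, so that the weights of the four structures are generated dynamically for each concept, and computes word similarity with Eq. (1) of [5]. The experiments show that computing primitive weights with ICF improves the results. Future work will further study the ICF-based weight computation.

References:
[1] [Author and title not recoverable][D]. 2001.
[2] Rodríguez M A, Egenhofer M J. Determining semantic similarity among entity classes from different ontologies[J]. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(2): 442-456.
[3] Budanitsky A, Hirst G. Semantic distance in WordNet: an experimental, application-oriented evaluation of five measures[C]//Workshop on WordNet and Other Lexical Resources. Pennsylvania: The Pennsylvania State University, 2001: 1-6.
[4] Lee W N, Shah N, Sundlass K, et al. Comparison of ontology-based semantic-similarity measures[C]//AMIA Annual Symposium Proceedings. Bethesda, MD, USA: American Medical Informatics Association, 2008: 384-388.
[5] Liu Qun, Li Sujian. Word similarity computing based on HowNet[J]. Computational Linguistics and Chinese Language Processing, 2002, 7(2): 59-76. (in Chinese)
[6] Jiang Min, Xiao Shibin, Wang Hongwei, et al. An improved word similarity computing method based on HowNet[J]. Journal of Chinese Information Processing, 2008, 22(5): 84-89. (in Chinese)
[7] Liu Qinglei, Gu Xiaofeng. Study on HowNet-based word similarity algorithm[J]. Journal of Chinese Information Processing, 2010, 24(6): 31-36. (in Chinese)
[8] Wang Xiaolin, Wang Yi. Improved word similarity algorithm based on HowNet[J]. Journal of Computer Applications, 2011, 31(11): 3075-3077. (in Chinese)
[9] Zhu Zhengyu, Sun Junhua. Improved vocabulary semantic similarity calculation based on HowNet[J]. Journal of Computer Applications, 2013, (8): 2276-2279, 2288. (in Chinese)
[10] Dong Zhendong, Dong Qiang. HowNet[DB/OL]. [1999-06-01]. http://www.keenage.com/.
[11] [Authors and title not recoverable][J]. Journal of Computer Research and Development, 2009, (10): 1693-1703. (in Chinese)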
Word Similarity Computing Based on Inverse Concept Frequencies

SUN Jing, ZHANG Dong-zhan
(School of Information Science and Engineering, Xiamen University, Xiamen 361005, China)

Abstract: Word similarity computation plays an important role in service selection, natural language processing, and literature retrieval. Current research on word similarity is generally based on HowNet. By analyzing the structure of HowNet, we present the idea that the weights of the four basic structures of a concept should be generated dynamically while computing the similarity between two words, together with a method of calculating the weight of a primitive based on its frequency. Through concept structure analysis, we compute the ICF of each basic primitive in the first basic primitive, other basic primitives, relation primitive, and mark primitive, and take the maximum ICF as the ICF of the basic structure. We then compute word similarity using the dynamically obtained ICF as the weights of the four basic structures. Experimental results show that the accuracy of word similarity calculation is effectively improved: the average accuracy for the first 160 words rises from 30.74% to 72.28%, and the recall rises from 15.87% to 49.64%.

Keywords: HowNet; word similarity; inverse concept frequency; primitive weight