1000-9825/2005/16(09)1523 2005 Journal of Software Vol.16, No.9

Semantic Analysis and Structured Language Models

LI Ming-Qin 1+, LI Juan-Zi 2, WANG Zuo-Ying 1, LU Da-Jin 1

1 (Department of Electronic Engineering, Tsinghua University, Beijing 100084, China)
2 (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
+ Corresponding author: Phn: +86-0-6278704, E-mail: lmq@thspeetsnghuaeducn, http://www.tsinghua.edu.cn

Received 2004-05-14; Accepted 2004-09-07

Li MQ, Li JZ, Wang ZY, Lu DJ. Semantic analysis and structured language models. Journal of Software, 2005,16(9):1523-1533. DOI: 10.1360/jos161523

Abstract: An integrated semantic analysis system is presented, and structured language models based on it are proposed. The system automatically tags each word with its semantic class and analyzes the semantic dependency structure between words, with precisions of 90.85% and 75.84% respectively. To describe sentence structure and long-distance dependency, two kinds of structured language models are examined and analyzed. Finally, the two language models are evaluated on the task of Chinese speech recognition. Experiments show that the best semantic structured language model, the headword trigram model, achieves a 0.8% absolute error reduction and an 8% relative error reduction over the trigram model.

Key words: semantic analysis; dependency analysis; language model; speech recognition

Supported by the National High-Tech Research and Development Plan of China under Grant No.200AA407 (863)
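Beyond the N-gram model [1], earlier work on language modeling includes class-based models [2], trigger models [3], multi-span models [4], skipping models [5], and models built on structural analysis of the sentence [6,7], [8], [9]. The semantic dependency parser used here is described in [10].

A sentence of $N$ words is written $W=\{w_1,w_2,\ldots,w_N\}$. Its semantic dependency structure is represented by a relation list $RL=\{R_1,R_2,\ldots,R_N\}$, in which $R_i$ records the headword $H_i$ of word $w_i$ and the semantic relation between them, for example Experiencer or Content. The kernel word of the sentence has no headword.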
English: These years, Doctor Yang pays a lot of attention to the popularization and application of his invention.

(a) The sentence tagged with semantic classes
(b) The semantic dependency tree, with the relation labels experiencer, time, degree, target, content, restrictive, 'de' dependency and coordination
Fig.1 Semantic dependency structure of a Chinese sentence

(a) The sentence tagged with semantic classes (as in Fig.1)

Modifier                     Headword H                   Semantic relation
Index  Word                  Index  Word
1      Yang                  2      Doctor                Restrictive
2      Doctor                5      pay attention to      Experiencer
3      these years           5      pay attention to      Time
4      a lot                 5      pay attention to      Degree
5      pay attention to      -      -                     Kernel word
6      his                   8      production            Restrictive
7      invention             8      production            Restrictive
8      production            10     popularization        Content
9      of                    8      production            'De' dependency
10     popularization        5      pay attention to      Target
11     application           10     popularization        Coordination

(b) Semantic dependency relation list
Fig.2 Semantic dependency relation list of a Chinese sentence

In the models that follow, $W=\{w_1,w_2,\ldots,w_N\}$ is the word sequence, $S=\{s_1,s_2,\ldots,s_N\}$ is the sequence of semantic classes, and $RL=\{R_1,R_2,\ldots,R_N\}$ is the relation list. The joint probability of the words, their semantic classes and the dependency structure is decomposed word by word into a product of three factors: one for the semantic class $s_i$, one for the word $w_i$, and one for the dependency structure.
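The relation list of Fig.2 maps naturally onto a small data structure. The Python sketch below is illustrative only: the class name `Relation`, its field names, and the helpers `kernel_word`, `dependents` and `is_tree` are assumptions introduced here, the English glosses stand in for the Chinese words of the figure, and the word indices are read as 1 to 11.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relation:
    index: int            # 1-based position of the modifier word
    word: str             # English gloss of the modifier word
    head: Optional[int]   # 1-based position of the headword; None for the kernel word
    relation: str         # semantic relation between modifier and headword


# Relation list of the example sentence, as read from Fig.2 (glosses only).
RELATIONS: List[Relation] = [
    Relation(1, "Yang", 2, "Restrictive"),
    Relation(2, "Doctor", 5, "Experiencer"),
    Relation(3, "these years", 5, "Time"),
    Relation(4, "a lot", 5, "Degree"),
    Relation(5, "pay attention to", None, "Kernel word"),
    Relation(6, "his", 8, "Restrictive"),
    Relation(7, "invention", 8, "Restrictive"),
    Relation(8, "production", 10, "Content"),
    Relation(9, "of", 8, "'De' dependency"),
    Relation(10, "popularization", 5, "Target"),
    Relation(11, "application", 10, "Coordination"),
]


def kernel_word(rl: List[Relation]) -> Relation:
    """Return the single word that has no headword."""
    kernels = [r for r in rl if r.head is None]
    if len(kernels) != 1:
        raise ValueError("a relation list must contain exactly one kernel word")
    return kernels[0]


def dependents(rl: List[Relation], head_index: int) -> List[Relation]:
    """Return the words whose headword sits at position head_index."""
    return [r for r in rl if r.head == head_index]


def is_tree(rl: List[Relation]) -> bool:
    """Check that every head chain ends at the kernel word without a cycle."""
    heads = {r.index: r.head for r in rl}
    for start in heads:
        node, steps = start, 0
        while heads[node] is not None:
            node = heads[node]
            steps += 1
            if node not in heads or steps > len(heads):
                return False
    return True
```

Storing one headword index per word keeps the list the same length as the sentence, which is why a single missing headword is enough to identify the kernel word.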
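Two semantic analysis models are considered. In the first, the semantic class $s_i$ is predicted from the preceding semantic classes (Eq.(2)) and the word $w_i$ from its semantic class and word context (Eq.(3)). In the second, the corresponding factors $P_c(w_i\mid\cdot)$ and $P_c(s_i\mid\cdot)$ are used (Eqs.(4) to (6)): the word is predicted from the preceding words, and the semantic class from the word and the preceding semantic class.

The dependency tree is built by a sequence of parse actions $q_1,\ldots,q_i,\ldots,q_\tau$, so the probability of the tree is the probability of its action sequence (Eq.(7)). Each action joins two constituents and is labelled with a semantic relation and the side on which the headword lies, for example (Restriction, right).

A constituent is described by three features: $r$, its headword $hw$, and the semantic class $hs$ of the headword. An action is scored by interpolating two log-probabilities, one conditioned on the headwords and headword semantic classes of the two constituents and one conditioned on their $r$ features:

\[ \mathrm{Score}(q_1,\ldots,q_i,\ldots,q_\tau)=\sum_{i=1}^{\tau}\Big[\lambda\log P(q_i\mid hw_l,hs_l,hw_g,hs_g)+(1-\lambda)\log P(q_i\mid r_l,r_g)\Big] \tag{8} \]

where $\lambda$ is the interpolation weight, $\lambda=0.3$.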
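Fig.3 One of the binary semantic parse trees for the semantic dependency tree in Fig.1. The internal nodes carry the labels (Experiencer, right), (Time, right), (Degree, right), (Target, left), (Restrictive, right), ('De' dependency, left) and (Coordination, left); the leaves are [Yang] [Doctor] [these years] [a lot] [pay attention to] [his] [invention] [production] ['de'] [popularization] [application].

The probability of a word sequence is obtained by summing the joint probability over its semantic analyses:

\[ P(W)=\sum_{T,S}P(T,S,W) \tag{9} \]

With the parser of [10], the sum is restricted to the set $Pr$ of analyses that survive pruning:

\[ P(W)\approx\sum_{(T,S)\in Pr}P(T,S,W) \tag{10} \]

Alternatively, only the best analysis is kept:

\[ (T^*,S^*)=\arg\max_{T,S}P(T,S,W),\qquad P(W)\approx P(T^*,S^*,W) \]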
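Fig.4 Word trigram model and headword trigram model on a ten-word sentence: (a) traditional word trigram model; (b) headword trigram model, drawn over a dependency structure with the relations experiencer, time, degree, restrictive and 'de' dependency.

In the traditional trigram model, each word is predicted from the two words that precede it. In the headword trigram model, which follows the structured language model of Chelba [6], the word $w_i$ is predicted from the two most recent headwords $h_{-2}$, $h_{-1}$ exposed by the partial analysis $T_{i-1}$ of the word prefix $W_{i-1}$:

\[ P(w_i\mid W_{i-1},T_{i-1})=P(w_i\mid h_{-2},h_{-1}) \tag{12} \]

Since several partial analyses survive pruning, the word probability is a weighted sum over the set $Pr$:

\[ P(w_i\mid W_{i-1})=\sum_{T_{i-1}\in Pr}P(w_i\mid W_{i-1},T_{i-1})\,\rho(W_{i-1},T_{i-1}) \tag{13} \]

where $\rho(W_{i-1},T_{i-1})$ is the normalized weight of a partial analysis among those kept in $Pr$ (Eqs.(14) and (15)); the normalization used here differs from that of Chelba [6]. The probability of a partial analysis is computed from the same three factors as in the semantic analysis model, with the tree probability given by its action sequence $q_1,\ldots,q_i,\ldots,q_\tau$ (Eq.(16)).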
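To make the contrast in Fig.4 concrete, the Python sketch below compares the two conditioning contexts on the example sentence of Fig.2. It rests on an assumption that the surviving text does not spell out: a word of the prefix is treated as an exposed headword when its own headword lies outside the prefix, and the kernel word, having no headword, always counts as exposed. The names `WORDS`, `HEADS`, `exposed_headwords` and the two context functions are hypothetical, and English glosses stand in for the Chinese words.

```python
from typing import List, Optional, Tuple

# Example sentence of Fig.2 as English glosses, in sentence order.
WORDS: List[str] = [
    "Yang", "Doctor", "these years", "a lot", "pay attention to",
    "his", "invention", "production", "of", "popularization", "application",
]
# 1-based headword position for each word; None marks the kernel word.
HEADS: List[Optional[int]] = [2, 5, 5, 5, None, 8, 8, 10, 8, 5, 10]


def word_trigram_context(i: int) -> Tuple[str, ...]:
    """Two words immediately before position i (1-based)."""
    return tuple(WORDS[: i - 1][-2:])


def exposed_headwords(i: int) -> List[str]:
    """Prefix words whose headword is not inside the prefix w_1..w_{i-1}.

    Assumption of this sketch, not a definition taken from the paper.
    """
    exposed = []
    for pos in range(1, i):
        head = HEADS[pos - 1]
        if head is None or head >= i:
            exposed.append(WORDS[pos - 1])
    return exposed


def headword_trigram_context(i: int) -> Tuple[str, ...]:
    """Two most recent exposed headwords before position i (1-based)."""
    return tuple(exposed_headwords(i)[-2:])


if __name__ == "__main__":
    i = 10  # predicting "popularization"
    print("word trigram context:    ", word_trigram_context(i))
    print("headword trigram context:", headword_trigram_context(i))
```

Under this assumption, the word trigram predicts "popularization" from the adjacent pair (production, of), while the headword context reaches back to the kernel word, which is the kind of long-distance dependency the structured model is meant to capture.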
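The semantic analysis system is trained and tested on the semantic dependency net (SDN) corpus [12]. The language models are compared with the models of [13] and the parser of [10].

Three correct rates are reported for the semantic analysis system (Table 1): one for semantic class tagging, the rate RCR, and one for the dependency structure. Two reference systems are included. Sem class tagger performs semantic class tagging only. Dep parser is the parser of [10], which takes the correct semantic classes as input, so its semantic class rate is 100%.

Table 1 Results of semantic analysis system

                     Semantic class (%)   RCR (%)   Dependency structure (%)
Sem class tagger     94
Dep parser           100                  67.25     76.87
W-Parser             90.85                66.50     75.84
W-Parser             90.20                66.32     75.66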
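The structured language model is interpolated with the trigram model. At the sentence level,

\[ P(W)=\lambda P_{\mathrm{struct}}(W)+(1-\lambda)P_{\mathrm{tri}}(W) \tag{18} \]

\[ P_{\mathrm{tri}}(W)=P(w_1)\,P(w_2\mid w_1)\prod_{i=3}^{N}P(w_i\mid w_{i-2},w_{i-1}) \tag{19} \]

and at the word level,

\[ P(w_i\mid W_{i-1})=\lambda P_{\mathrm{struct}}(w_i\mid W_{i-1})+(1-\lambda)P(w_i\mid w_{i-2},w_{i-1}) \tag{20} \]

Perplexity over a test text of $K$ words is

\[ PPL=\exp\Big(-\frac{1}{K}\sum_{i=1}^{K}\log P_M(w_i\mid w_1,\ldots,w_{i-1})\Big) \tag{21} \]

where $P_M(w_i\mid w_1,\ldots,w_{i-1})$ is the probability that model $M$ assigns to $w_i$ given the preceding words.

Table 2 lists the perplexity of the headword trigram model for different values of $\lambda$; the lowest perplexity is obtained at $\lambda=0.4$.

Table 2 Perplexity of headword trigram model under different interpolation weights

λ     0        0.2      0.4      0.6      0.8      1
PPL   276.31   250.34   248.20   253.32   267      3444

Following [6], the distribution of headword depth is also examined:

\[ P(d)=\frac{1}{N}\sum_{i=1}^{N}\sum_{T\in Pr}\delta(D,d)\,\rho \]

where $D$ is the depth of the headword in a partial analysis, $\rho$ is the weight of that analysis, and $\delta(x,y)=1$ if $x=y$ and $\delta(x,y)=0$ otherwise.

In the speech recognition experiments, the acoustic features are MFCC based and the acoustic model is the DDBHMM (duration distribution based hidden Markov model) [14]. The baseline recognizer uses the trigram model, and the structured language models are applied to its n-best output, with n up to 100.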
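The interpolation weight and the perplexity measure can be illustrated numerically. The Python sketch below is a toy, not the authors' implementation: the per-word probabilities are made-up numbers, and the function names `interpolate` and `perplexity` are hypothetical. It combines two models at the word level, with weight λ on the structured model, and computes perplexity as the exponential of the negative mean log-probability.

```python
import math
from typing import List, Sequence


def interpolate(p_struct: Sequence[float], p_trigram: Sequence[float], lam: float) -> List[float]:
    """Word-level interpolation: lam * structured + (1 - lam) * trigram."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(p_struct) != len(p_trigram):
        raise ValueError("both models must score the same words")
    return [lam * ps + (1.0 - lam) * pt for ps, pt in zip(p_struct, p_trigram)]


def perplexity(word_probs: Sequence[float]) -> float:
    """exp of the negative average log-probability over the K scored words."""
    if not word_probs:
        raise ValueError("at least one word probability is required")
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


# Made-up per-word probabilities for a four-word test text (illustration only).
P_STRUCT = [0.02, 0.10, 0.01, 0.30]
P_TRIGRAM = [0.05, 0.02, 0.04, 0.05]

if __name__ == "__main__":
    for lam in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        ppl = perplexity(interpolate(P_STRUCT, P_TRIGRAM, lam))
        print(f"lambda={lam:.1f}  PPL={ppl:.2f}")
```

Because each model is confident on different words in this toy data, the mixture scores lower perplexity than either model alone, which is the qualitative pattern an interior optimum of λ reflects.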
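Fig.5 Depth distribution of headword trigram model (x-axis: Depth, 0 to 10; y-axis: Probability)

Table 3 gives the character error rates of the 1-best and n-best paths of the trigram baseline.

Table 3 Chinese character error rates of 1-best and n-best paths of baseline (%)

1-best   5-best   20-best   100-best
10.20    7.66     6.24      5.08

Three structured language models, FDM, BDM and HM, are evaluated with each of the two parsers.

Table 4 Results of all semantic structure language models (%)

            FDM                          BDM                          HM
            CER    Relative reduction    CER    Relative reduction    CER    Relative reduction
W-Parser    9.56   6.22                  9.40   7.80                  9.39   7.94
W-Parser    9.87   3.19                  9.89   2.99                  9.37   8.14

For FDM and BDM, the two parsers give clearly different results; with BDM the relative error reductions are 7.80% and 2.99%. The best result is obtained by HM, which reduces the character error rate of the baseline by 0.8% absolute and 8.14% relative.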
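The relative error reductions in Table 4 follow from the character error rates by simple arithmetic. As a check, the Python sketch below assumes that the baseline 1-best CER reads 10.20% and that the best HM result reads 9.37%; both readings, and the helper names `absolute_reduction` and `relative_reduction`, are adopted here for illustration.

```python
def absolute_reduction(baseline_cer: float, model_cer: float) -> float:
    """Difference in character error rate, in percentage points."""
    return baseline_cer - model_cer


def relative_reduction(baseline_cer: float, model_cer: float) -> float:
    """Error reduction as a percentage of the baseline error rate."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return 100.0 * (baseline_cer - model_cer) / baseline_cer


BASELINE_CER = 10.20  # assumed reading of the 1-best baseline CER (%)
HM_CER = 9.37         # assumed reading of the best HM CER (%)

if __name__ == "__main__":
    print(f"absolute reduction: {absolute_reduction(BASELINE_CER, HM_CER):.2f} points")
    print(f"relative reduction: {relative_reduction(BASELINE_CER, HM_CER):.2f} %")
```

Reporting both figures matters because a fraction of a percentage point is small in absolute terms but a noticeable share of the errors that remain at a baseline near 10%.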
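Related work on Chinese dependency parsing includes the parser of [15]. The structured language model of Chelba [6,7] is built on syntactic parses of the UPenn Treebank, whereas the models here are built on semantic dependency analysis. The two approaches also differ in how the weights of the partial analyses are normalized; Fig.6 compares the resulting depth distributions.

Fig.6 Depth distribution comparison between headword trigram models with and without our normalization method (x-axis: Depth, 0 to 10; y-axis: Probability; curves: our normalization method, Chelba's normalization method)

Building on the parser of [10], the semantic analysis system tags semantic classes and analyzes the semantic dependency structure with precisions of 90.85% and 75.84% respectively.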
Compared with the trigram model, the headword trigram model reduces the character error rate by 0.8% absolute and 8% relative.

References:
[1] Jelinek F. Self-organized language modeling for speech recognition. In: Waibel A, Lee KF, eds. Readings in Speech Recognition. San Mateo: Morgan Kaufmann Publishers, 1990. 450-506.
[2] Brown PF, DellaPietra VJ, deSouza PV, Lai JC, Mercer RL. Class-Based n-gram models of natural language. Computational Linguistics, 1992,18(4):467-479.
[3] Lau R, Rosenfeld R, Roukos S. Trigger-based language models: A maximum entropy approach. In: Sullivan BJ, ed. Proc. of the Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Vol II. 1993. 45-48.
[4] Bellegarda JR. A multi-span language modeling framework for large vocabulary speech recognition. IEEE Trans. on Speech Audio Processing, 1998,6(5):456-467.
[5] Gao JF, Suzuki H, Wen Y. Exploring headword dependency and predictive clustering for language modeling. In: Hajic J, Matsumoto Y, eds. Proc. of the Empirical Methods in Natural Language Processing (EMNLP). 2002. 248-256.
[6] Chelba C. Exploiting syntactic structure for natural language modeling [Ph.D. Thesis]. Johns Hopkins University, 2000.
[7] Xu P, Chelba C, Jelinek F. A study on richer syntactic dependencies for structured language modeling. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). ACL, 2002. 191-199.
[8] Roark B. Probabilistic top-down parsing and language modeling. Computational Linguistics, 2001,27(2):249-276.
[9] Gao JF, Suzuki H. Unsupervised learning of dependency structure for language modeling. In: Proc. of the 41st Annual Meeting of the Association for Computational Linguistics (ACL). ACL, 2003. 7 2. http://research.microsoft.com/~jfgao/paper/dlm-acl03.pdf
[10] Li MQ, Li JZ, Wang ZY, Lu DJ. A statistical model for parsing semantic dependency relations in a Chinese sentence. Chinese Journal of Computers, 2004,27(12):1679-1687 (in Chinese with English abstract).
[11] Mei JJ, Zhu YM, Gao YQ, Yin HX. Tongyici Cilin (Dictionary of Synonymous Words). Shanghai: Shanghai Cishu Publisher, 1983 (in Chinese).
[12] Li MQ, Li JZ, Dong ZD, Wang ZY, Lu DJ. Building a large Chinese corpus annotated with semantic dependency. In: Ma Q, Xia F, eds. Proc. of the 2nd SIGHAN Workshop on Chinese Language Processing. 2003. 84-91.
[13] Zhang JP. A study of language model and understanding algorithm for large vocabulary spontaneous speech recognition [Ph.D. Thesis]. Beijing: Department of Electronic Engineering, Tsinghua University, 1999 (in Chinese with English abstract).
[14] Wang ZY, Xiao X. Duration distribution based HMM speech recognition models. Chinese Journal of Electronics, 2004,32(1):46-49 (in Chinese with English abstract).
[15] Zhou M. A block-based dependency parser for unrestricted Chinese text. In: Proc. of the 2nd Chinese Language Processing Workshop. 2000. 78-84. http://research.microsoft.com/china/papers/robust_dependency_parser_chinese_text.pdf