Journal of East China Normal University (Natural Science)
No. 4, Jul. 2014

Article ID: 1000-5641(2014)04-0062-07

Study on the extraction of Chinese microblog subjective sentences based on lexicon and corpus

ZHU Hai-huan, YU Qing-song
(Computer Center, East China Normal University, Shanghai 200062, China)

Abstract: This paper proposes a method for extracting subjective sentences from Chinese microblogs that combines a lexicon with a corpus. A sentence is classified as subjective or objective according to whether it contains sentiment expressions. First, a highly credible sentiment lexicon is built from the words in existing sentiment dictionaries whose emotional orientation is fixed; sentiment expressions extracted with this lexicon are accurate. Second, an N-POSW model is proposed for corpus-based learning. With the 2-POSW model, the sentiment expressions that the lexicon misses can still be extracted, which secures the overall recall. Experimental results show that the F value of the proposed method is 7 percentage points higher than that of the traditional method based on a large-scale sentiment lexicon.

Key words: sentiment lexicon; highly credible lexicon; N-POSW model; subjective sentence
CLC number: TP39    Document code: A
DOI: 10.3969/j.issn.1000-5641.2014.04.008

Received: 2013-07
Authors: ZHU Hai-huan, E-mail: zhhh1988@gmail.com; YU Qing-song, E-mail: qsyu@cc.ecnu.edu.cn.
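0 Introduction

Kim et al. [1] studied the automatic detection of opinion-bearing words and sentences. Wiebe et al. [2-5] identified collocations for recognizing opinions, disambiguated potentially subjective expressions, and recognized strong and weak opinion clauses. Pang et al. [6] used subjectivity summarization based on minimum cuts. Long et al. [7] classified the sentiment of Tweets. For Chinese, Ye et al. [8] proposed the 2-POS model and used the CHI statistic together with 2-POS patterns. Zhang [9] extracted Chinese opinion sentences with an SVM. Yang et al. [10] classified subjective and objective sentences in Chinese microblogs, also drawing on 2-POS.

The contribution of this paper is as follows: building on N-POS, it proposes the N-POSW model and uses the 2-POSW model to extract subjective sentences.

The paper is organized as follows: Section 1 presents the main idea; Section 2 describes the lexicon-based extraction; Section 3 describes the highly credible lexicon, the N-POSW model, and their combination; Section 4 reports the experiments; Section 5 concludes.

1 Main idea

Examples 1 to 3 are microblog sentences under the hashtag #ipad#.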
Fig. 1 Main idea of extracting Chinese microblog subjective sentences

2 Lexicon-based extraction of subjective sentences

The sentiment words of HowNet [11] and NTUSD [12], 11 212 entries in total, form the large-scale sentiment lexicon SentiDic. Table 1 gives the lexicon-based algorithm.

Tab. 1 An algorithm for extraction of subjective sentences based on the lexicon
Input: Sentence
Output: Sentence is subjective, or Sentence is objective
if Sentence contains a word in SentiDic then
    Sentence is subjective
else
    Sentence is objective
end if
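3 Corpus-based extraction of subjective sentences

3.1 Highly credible sentiment lexicon

From SentiDic, the words whose emotional orientation is fixed are selected to form the highly credible sentiment lexicon R_SentiDic, which contains 2 143 entries. Table 2 lists 10 words from R_SentiDic.

Tab. 2 Ten words in R_SentiDic

3.2 N-POSW model

The N-POSW model is an improvement on the N-POS model [8]. The N-POS model describes a sentence by the parts of speech of N consecutive words; when N = 2 it is the 2-POS model. The N-POSW model keeps the part-of-speech sequence and additionally records the N consecutive words themselves.

Definition 1: $S$ is a sentence.

Definition 2: $w_i$ is the $i$-th word of $S$, so that $S = w_1 w_2 w_3 \ldots w_i \ldots w_n$, where $n$ is the number of words in $S$.

Definition 3: $\mathrm{SqPOS} = (c_1, c_2, \ldots, c_n)$ is the part-of-speech sequence of $S$, where $c_i$ is the part of speech of $w_i$.

Definition 4: $\text{N-Words} = (w_1 w_2 \ldots w_N,\; w_2 w_3 \ldots w_{N+1},\; \ldots,\; w_{n-N+1} w_{n-N+2} \ldots w_n)$ is the sequence of all runs of $N$ consecutive words in $S$.

Definition 5: $\text{N-POSW} = (\mathrm{SqPOS}, \text{N-Words})$ is the N-POSW model of $S$.

For a sentence of four words, SqPOS contains four tags and 2-Words contains three word pairs.

3.3 2-POSW model

Setting N = 2 in the N-POSW model gives the 2-POSW model.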
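As an illustration of how SqPOS and N-Words can be derived from a segmented, POS-tagged sentence, the following is a minimal sketch. The names `NPOSW` and `build_nposw` are hypothetical, and the example sentence uses English placeholder words with ICTCLAS-style tags, since it is not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class NPOSW(NamedTuple):
    """N-POSW description of one sentence: (SqPOS, N-Words)."""
    sq_pos: Tuple[str, ...]               # c_1 ... c_n, one POS tag per word
    n_words: Tuple[Tuple[str, ...], ...]  # the n - N + 1 runs of N adjacent words


def build_nposw(tagged: List[Tuple[str, str]], n: int = 2) -> NPOSW:
    """Build the N-POSW pair from a list of (word, POS tag) tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = [word for word, _ in tagged]
    sq_pos = tuple(tag for _, tag in tagged)
    # A sentence shorter than n words yields an empty N-Words sequence.
    n_words = tuple(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return NPOSW(sq_pos, n_words)


# Hypothetical four-word sentence (placeholder words, illustrative tags).
example = [("this", "r"), ("screen", "n"), ("looks", "v"), ("great", "a")]
model = build_nposw(example, n=2)
print(model.sq_pos)   # four tags
print(model.n_words)  # three word pairs
```

With N = 2 the four-word sentence gives four tags and three word pairs, matching the sizes of SqPOS and 2-Words described in the text.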
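Definition 6:

$$T(\mathrm{text}) = \begin{cases} 1, & \mathrm{subj}(\mathrm{text}) > \alpha \cdot \mathrm{obj}(\mathrm{text}) \ \text{and}\ \mathrm{subj}(\mathrm{text}) > \beta, \\ 0, & \text{otherwise,} \end{cases}$$

where $\mathrm{subj}(\mathrm{text})$ is the number of occurrences of text in the subjective sentences of the training corpus, $\mathrm{obj}(\mathrm{text})$ is the number of occurrences of text in the objective sentences, and $\alpha$, $\beta$ are thresholds.

Definition 7:

$$Q(w_i w_{i+1}) = \begin{cases} 1, & \exists\, w \in \{w_i, w_{i+1}\},\ c \in (a, v, n), \\ 0, & \text{otherwise,} \end{cases}$$

where $w_i w_{i+1}$ is a pair of adjacent words in $S$, $c$ is the part of speech of $w$, and $a$, $v$, $n$ denote adjective, verb, and noun.

Definition 8:

$$P(S) = \sum_{i=1}^{n-1} Q(w_i w_{i+1})\, T(w_i w_{i+1}),$$

where $w_i$ is the $i$-th word of $S$ and $n$ is the number of words in $S$. $P(S)$ is the subjectivity score of $S$: $S$ is subjective if $P(S) > 0$ and objective otherwise. The thresholds $\alpha$ and $\beta$ therefore control how readily $S$ is judged subjective.

Table 3 gives the algorithm based on 2-POSW. Word segmentation and part-of-speech tagging use ICTCLAS [13].

Tab. 3 An algorithm for extraction of subjective sentences based on 2-POSW
Input: S
Output: S is subjective, or S is objective
Segment S into words and tag their parts of speech
Build the 2-POSW model of S
Compute P(S)
if P(S) > 0 then
    S is subjective
else
    S is objective
end if

3.4 Combining the lexicon and 2-POSW

R_SentiDic extracts sentiment expressions with high precision, but a sentence that contains no word from R_SentiDic may still be subjective. Such sentences are passed to the 2-POSW model, which extracts the remaining sentiment expressions and so secures recall. Table 4 gives the combined algorithm. Unlike the algorithm of Section 2, it does not use the large-scale lexicon SentiDic.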
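The quantities T, Q and P(S), together with the lexicon-first check of Section 3.4, can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: subj and obj are taken to be raw counts of adjacent word pairs in labelled training sentences, POS tags are matched exactly against a, v, n, and all function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Token = Tuple[str, str]          # (word, POS tag)
CONTENT_TAGS = {"a", "v", "n"}   # adjective, verb, noun


def adjacent_pairs(sentence: List[Token]) -> List[Tuple[Token, Token]]:
    """All pairs (w_i, w_{i+1}) of one sentence, i = 1 .. n-1."""
    return list(zip(sentence, sentence[1:]))


def count_pairs(corpus: Iterable[List[Token]]) -> Counter:
    """Assumed form of subj()/obj(): occurrence counts of adjacent word pairs."""
    counts: Counter = Counter()
    for sentence in corpus:
        for left, right in adjacent_pairs(sentence):
            counts[(left[0], right[0])] += 1
    return counts


def t_value(pair: Tuple[str, str], subj: Counter, obj: Counter,
            alpha: float, beta: float) -> int:
    """T = 1 if subj > alpha * obj and subj > beta, else 0."""
    return int(subj[pair] > alpha * obj[pair] and subj[pair] > beta)


def q_value(left: Token, right: Token) -> int:
    """Q = 1 if at least one word of the pair is an adjective, verb or noun."""
    return int(left[1] in CONTENT_TAGS or right[1] in CONTENT_TAGS)


def p_score(sentence: List[Token], subj: Counter, obj: Counter,
            alpha: float, beta: float) -> int:
    """P(S): sum of Q * T over all adjacent word pairs of the sentence."""
    return sum(q_value(l, r) * t_value((l[0], r[0]), subj, obj, alpha, beta)
               for l, r in adjacent_pairs(sentence))


def is_subjective(sentence: List[Token], lexicon: Set[str], subj: Counter,
                  obj: Counter, alpha: float = 3, beta: float = 9) -> bool:
    """Lexicon check first; fall back to the 2-POSW score if it finds nothing."""
    if any(word in lexicon for word, _ in sentence):
        return True
    return p_score(sentence, subj, obj, alpha, beta) > 0


# Toy training data (placeholder words, illustrative tags).
subj_counts = count_pairs([[("really", "d"), ("love", "v"), ("it", "r")]] * 4)
obj_counts = count_pairs([[("price", "n"), ("is", "v"), ("listed", "v")]] * 4)
print(is_subjective([("really", "d"), ("love", "v")], set(),
                    subj_counts, obj_counts, alpha=3, beta=3))
```

Raising alpha or beta makes fewer word pairs pass the T test, so fewer sentences are judged subjective by the fallback step.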
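Tab. 4 An algorithm for extraction of Chinese microblog subjective sentences based on lexicon and corpus
Input: S
Output: S is subjective, or S is objective
if S contains a word in R_SentiDic then
    S is subjective
else
    if the 2-POSW algorithm judges S to be subjective then
        S is subjective
    else
        S is objective
    end if
end if

4 Experiments

4.1 Data set

The experiments use the Chinese microblog sentiment analysis evaluation data of NLP&CC [14], 7 040 in total, stored in XML format. From these data, 1 000 and 1 000 were selected for the procedure of Section 3.4. A sample of the data format:

<weibo id="89">
  <sentence id="1"># #</sentence>
  <sentence id="2"></sentence>
  <sentence id="3"></sentence>
  <sentence id="4"></sentence>
  <hashtag id="1"></hashtag>
</weibo>

4.2 Evaluation measures

Precision, recall, and the F value are used for evaluation:

$$\text{Precision} = \frac{\text{number of sentences correctly identified as subjective}}{\text{number of sentences identified as subjective}},$$

$$\text{Recall} = \frac{\text{number of sentences correctly identified as subjective}}{\text{number of subjective sentences in the data}},$$

$$F = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$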
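The three measures can be computed as in the following minimal sketch. The function names and the toy labels are hypothetical; the only figures taken from the paper are the pairs 57.5/79.2 and 69.6/78.1 reported in Section 4.3, used here to check the F formula.

```python
from typing import Sequence, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(gold: Sequence[bool],
                       predicted: Sequence[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F for the 'subjective' class (True = subjective)."""
    correct = sum(1 for g, p in zip(gold, predicted) if g and p)
    n_predicted = sum(1 for p in predicted if p)
    n_gold = sum(1 for g in gold if g)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy labels, purely illustrative.
gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
print(precision_recall_f(gold, predicted))

# Consistency check against two reported result rows (values in percent).
print(round(f_measure(57.5, 79.2), 1))
print(round(f_measure(69.6, 78.1), 1))
```

Because F is symmetric in its two arguments, the check on the reported figures holds regardless of which of the two columns is precision and which is recall.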
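4.3 Results

Table 5 gives the results of the traditional algorithm based on the large-scale lexicon, and Table 6 gives the results of the algorithm based on lexicon and corpus. At its best setting, the F value of the proposed algorithm is 7 percentage points higher than that of the traditional algorithm (73.6% against 66.6%).

Tab. 5 Experimental results of the traditional algorithm based on the large-scale lexicon
Precision/%    Recall/%    F/%     Time/s
57.5           79.2        66.6    3.2

Tab. 6 Experimental results of the algorithm based on lexicon and corpus
α, β           Precision/%    Recall/%    F/%     Time/s
α=3, β=6       64.8           80.6        71.9    61.3
α=3, β=9       69.6           78.1        73.6    60.5
α=4, β=8       69.7           72.9        71.3    59.1
α=4, β=12      74.0           67.1        70.4    61.1

In Table 6, precision rises and recall falls as α and β increase, and the best F value is obtained at α=3, β=9. The running time of the combined algorithm is about 60 s, compared with 3.2 s for the lexicon-only algorithm.

5 Conclusion

This paper proposed the N-POSW model and used the 2-POSW model, together with a highly credible sentiment lexicon, to extract subjective sentences from Chinese microblogs. N-POSW improves on the N-POS model of [8]: N-POS considers only the parts of speech of N consecutive words, whereas N-POSW also considers the N consecutive words themselves. In the experiments, the F value reaches 73.6%.

References
[1] KIM S M, HOVY E. Automatic detection of opinion bearing words and sentences[C]//Companion Volume to the Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP). Berlin: Springer, 2005: 61-66.
[2] WIEBE J, WILSON T, BELL M. Identifying collocations for recognizing opinions[C]//Proceedings of the ACL'01 Workshop on Collocation: Computational Extraction, Analysis, and Exploitation. Toulouse, FR: ACL, 2001: 24-31.
[3] WIEBE J, WILSON T. Learning to disambiguate potentially subjective expressions[C]//Proceedings of the 6th Conference on Natural Language Learning-Volume 20. Stroudsburg, PA: Association for Computational Linguistics, 2002: 1-7.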
[4] WILSON T, WIEBE J, HWA R. Just how mad are you? Finding strong and weak opinion clauses[C]//Proceedings of the National Conference on Artificial Intelligence. Menlo Park, CA: MIT Press, 1999, 2004: 761-769.
[5] WILSON T, WIEBE J, HWA R. Recognizing strong and weak opinion clauses[J]. Computational Intelligence, 2006, 22(2): 73-99.
[6] PANG B, LEE L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts[C]//Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics. [S.l.]: Association for Computational Linguistics, 2004: 271-278.
[7] LONG J, MO Y. Target-dependent Twitter sentiment classification[C]//Proceeding of the 49th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: ACL, 2011: 151-160.
[8] 叶强, 张紫琼, 罗振雄. Automatic discrimination of Chinese subjectivity for sentiment analysis of Internet reviews[J]. 信息系统学报, 2007, 1(1): 7-91.
[9] 张博. Extraction of Chinese opinion sentences based on SVM[D]. 北京邮电大学, 2011.
[10] 杨武, 宋静静, 唐继强. A classification method for subjective and objective sentences in Chinese microblog sentiment analysis[J]. 重庆理工大学学报: 自然科学, 2013, 27(1): 51-56.
[11] 董振东, 董强. Introduction to HowNet[DB/OL]. [2013-7-20]. http://www.keenage.com.
[12] 台湾大学. NTUSD simplified Chinese sentiment polarity lexicon[DB/OL]. [2013-7-20]. http://www.datatang.com/data/11837.
[13] ICTCLAS. ICTCLAS Chinese word segmentation system[DB/OL]. [2014-06-10]. http://www.ictclas.org.
[14] 中文信息技术专业委员会. Chinese microblog sentiment analysis evaluation[EB/OL]. [2013-7-20]. http://tcci.ccf.org.cn/conference/2012/pages/page04_eva.html.

(Responsible editor: LI Yi)