National Taiwan University, Department of Library and Information Science, 40th Anniversary Academic Conference: Paper Format

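Automatic Time Alignment for a Taiwanese Read Speech Corpus and its Application to Constructing Audiobooks with Text-Speech Synchronization

Abstract: This paper applies forced alignment to a Taiwanese read speech corpus of 140 texts (about 11 hours of speech), reaching an accuracy of about 83%, and uses the aligned corpus to build YouTube audiobooks with synchronized text and speech.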

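The corpus is taken from [3]. Its statistics include 140 texts, WAV audio totaling 1.21 GB at a sample rate of 16 kHz, 681.37 minutes of speech, and the further counts 7, 2, 133, 15,923 and 139,271. The text is converted to ForPA, and syllable boundaries are labeled both manually in Transcriber and automatically with HTK; comparing the two gives figures of 0.16 and 0.06, and 95% for HTK.

Before alignment the text must be normalized. Besides Han characters, it contains romanized Taiwanese (TL) words, for example:
1. tsoh-sit
2. e-thôo-lâng
3. kah but
4. kah tīnn-khó-x
Romanized words such as tsoh-sit, e-thôo-lâng, kah, but and tīnn-khó-x appear inside sentences written in Han characters, and their tone-marked vowels, such as ô, â and ī, exist only in Unicode, not in ASCII or Big5. Normalization therefore takes three steps: 1. separate the Han characters from the romanized syllables; 2. convert tone marks to tone numbers; 3. convert TL to ForPA.

1. Separating Han characters from romanization. For a string that mixes Han characters with the syllable sing5, the goal is a list with one element per Han character or romanized syllable:

    [' ', ' ', ' ', ' ', ' ', ' ', ' ', 'sing5', ' ', ' ', ' ', ' ', ' ', ' ']

This is done with a regular expression and Python's re.split. Python's re module supports Unicode, and Han characters are matched by the range from \u4e00 to \ufa2d:

    >>> jj = ' sing5 '
    >>> re.split(r'([\u4e00-\ufa2d]|[a-zA-Z]+\d*)', jj)
    ['', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', ' ', 'sing5', ' ', ' ', '', ' ', '', ' ', '', ' ', ' ']

Because the pattern is a capturing group, re.split keeps each match, with empty strings between adjacent matches.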
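The first step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name split_han_latin is hypothetical, and unlike the raw re.split output it discards the empty strings between matches. The last line also shows why a tone-marked syllable such as sîng needs special treatment.

```python
import re

# One Han character (U+4E00 to U+FA2D, the range given above) or a run of
# ASCII letters followed by optional tone digits, e.g. "sing5".
TOKEN = re.compile(r'([\u4e00-\ufa2d]|[a-zA-Z]+\d*)')


def split_han_latin(text):
    """Split mixed Han/TL text into Han characters and romanized syllables.

    Because the pattern is a capturing group, re.split returns the matched
    tokens interleaved with the text between them (mostly empty strings);
    pieces that are empty or whitespace-only are dropped.
    """
    return [piece for piece in TOKEN.split(text) if piece.strip()]


print(split_han_latin('我sing5你'))  # ['我', 'sing5', '你']
# A precomposed tone-marked vowel is not in [a-zA-Z], so the syllable breaks up:
print(split_han_latin('sîng'))      # ['s', 'î', 'ng']
```

Separators that are not whitespace, such as the hyphen in tsoh-sit, are kept as pieces of their own.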

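The pattern fails for a tone-marked syllable such as sîng: î is not in [a-zA-Z], so the syllable is broken into 's', 'î' and 'ng':

    >>> ii = ' sîng '
    >>> re.split(r'([\u4e00-\ufa2d]|[a-zA-Z]+\d*)', ii)
    ['', ' ', '', ' ', ' ', 's', 'î', 'ng', ' ', ' ', '', ' ', '', ' ', '', ' ', ' ']

We therefore wrote a Python module, LGO.py [2], whose function hunhl keeps sîng intact:

    >>> ii = ' sîng '
    >>> hunhl(ii)
    [' ', ' ', ' ', ' ', ' ', ' ', ' ', 'sîng', ' ', ' ', ' ', ' ', ' ', ' ']

2. Converting tone marks to tone numbers. A word such as e-thôo-lâng becomes e1-thoo5-lang5: each tone-marked vowel is replaced by its plain vowel followed by the tone number, for example

    ô â a̍ ō î  ->  o a a o i  with tones 5 5 8 7 7

The function tobsr performs this conversion:

    >>> tobsr('siâ')
    'sia5'
    >>> tobsr('dâ')
    'da5'

3. Converting TL to ForPA. This takes two steps: (1) split each syllable into its initial, final and tone (in Python, hunsu); (2) look up the ForPA form of each part in a table (a Python dict). For example, TL 'thoo5' becomes ForPA 'to5':

    >>> s, u, d = splitsud('thoo5')  # initial, final, tone
    >>> ''.join([S2S.get(s, s), U2U.get(u, u), D2D.get(d, d)])
    'to5'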
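Both ideas can be sketched in Python. The sketch below is a hypothetical re-implementation, not the code of LGO.py, hunhl or tobsr: it widens the Latin part of the pattern to accented letters and combining marks, and it maps five TL tone marks (acute 2, grave 3, circumflex 5, macron 7, vertical line above 8) to tone numbers. Other tone marks are not covered, and the rule for unmarked syllables (tone 4 after a final p, t, k or h, otherwise tone 1) is the usual TL convention, assumed here.

```python
import re
import unicodedata

# A Han character, or a syllable made of ASCII letters, precomposed accented
# letters (U+00C0-U+024F) and combining marks (U+0300-U+036F), plus tone digits.
TOKEN = re.compile(
    r'[\u4e00-\ufa2d]|(?:[a-zA-Z\u00c0-\u024f][\u0300-\u036f]*)+\d*')

# TL tone marks as combining characters (after NFD) and their tone numbers.
TONE_MARKS = {
    '\u0301': '2',  # acute
    '\u0300': '3',  # grave
    '\u0302': '5',  # circumflex
    '\u0304': '7',  # macron
    '\u030d': '8',  # vertical line above
}


def split_tokens(text):
    """Return Han characters and whole (possibly tone-marked) syllables."""
    return TOKEN.findall(text)


def tone_number(syllable):
    """Convert one tone-marked TL syllable, e.g. 'siâ' -> 'sia5'."""
    tone = None
    letters = []
    for ch in unicodedata.normalize('NFD', syllable):
        if tone is None and ch in TONE_MARKS:
            tone = TONE_MARKS[ch]  # the first tone mark found sets the tone
        else:
            letters.append(ch)
    base = unicodedata.normalize('NFC', ''.join(letters))
    if tone is None:
        # Assumed TL default: unmarked checked syllables are tone 4, others tone 1.
        tone = '4' if base.endswith(('p', 't', 'k', 'h')) else '1'
    return base + tone


def to_numbered(word):
    """Convert a hyphenated word, e.g. 'e-thôo-lâng' -> 'e1-thoo5-lang5'."""
    return '-'.join(tone_number(part) for part in word.split('-'))


print(split_tokens('我sîng你'))     # ['我', 'sîng', '你']
print(to_numbered('e-thôo-lâng'))  # e1-thoo5-lang5
```

Decomposing with NFD turns a precomposed vowel such as â into a plain a followed by a combining circumflex, so one small table of combining marks covers every vowel.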

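splitsud and the lookup tables S2S, U2U and D2D are a Python function and Python dicts.

(2.1) A first attempt splits the syllable at its first vowel with re.split(r'([aeiou][a-zA-Z]*)(\d*)', syllable); for syllable = 'song4' this gives ['s', 'ong', '4', '']. ForPA and TL, however, also have syllables whose nucleus is the syllabic nasal m or ng and which contain no vowel letter:

    ForPA   TL
    m       m
    ng      ng
    hm      hm
    bng     png
    png     phng
    mng     mng
    dng     tng
    tng     thng
    nng     nng
    gng     kng
    kng     khng
    hng     hng
    zng     chng
    cng     chhng
    sng     sng

The regular expression cannot split these syllables, because their final is ng or m rather than a vowel.

(2.2) splitsud therefore handles them separately:

    >>> splitsud('song4')
    ['s', 'ong', '4']
    >>> splitsud('siong2')
    ['s', 'iong', '2']
    >>> splitsud('sng5')
    ['s', 'ng', '5']
    >>> splitsud('sng')
    ['s', 'ng', '']
    >>> splitsud('ng')
    ['', 'ng', '']
    >>> splitsud('oai')
    ['', 'oai', '']
    >>> splitsud('oe2')
    ['', 'oe', '2']
    >>> splitsud('dfgsfd3')
    ['', 'dfgsfd', '3']
    >>> splitsud(' ')
    ['', ' ', '']

The ForPA form of each part is then looked up in Python dicts.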
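One way to obtain this behavior is sketched below. It is a hypothetical re-implementation written to reproduce the examples above, not the authors' splitsud: the tone is the trailing digits, the initial is everything before the first vowel, and when there is no vowel but the syllable ends in the syllabic nasal ng or m, that nasal is the final; anything else (such as 'dfgsfd3' or a Han character) becomes the final unchanged.

```python
import re

VOWELS = set('aeiou')


def splitsud(syllable):
    """Split a TL syllable into [initial, final, tone].

    Hypothetical re-implementation: the tone is the trailing digits; the
    initial is everything before the first vowel. With no vowel, a trailing
    syllabic nasal 'ng' or 'm' is the final; otherwise the whole body is.
    """
    body, tone = re.fullmatch(r'(.*?)(\d*)', syllable, re.DOTALL).groups()
    for i, ch in enumerate(body):
        if ch in VOWELS:
            return [body[:i], body[i:], tone]
    for nasal in ('ng', 'm'):
        if body.endswith(nasal):
            return [body[:-len(nasal)], nasal, tone]
    return ['', body, tone]


for syl in ['song4', 'siong2', 'sng5', 'hm', 'dfgsfd3']:
    print(repr(syl), splitsud(syl))
```

Checking for a vowel first means that ordinary syllables ending in ng or m, such as song4 or sim, still split at their vowel; the nasal rule only applies when no vowel is present.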

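In Python the tables are built from parallel lists of TL and ForPA symbols:

    S2S = {k: v for k, v in zip(tlsvor, ForPaSVOR)}  # initials
    U2U = {k: v for k, v in zip(tluvor, ForPaUVOR)}  # finals
    D2D = {k: v for k, v in zip(tldiau, ForPaDiau)}  # tones
    # each dict maps a TL key to its ForPA value

A part with no entry is passed through unchanged, since dict.get(key, key) returns the key itself as the default. Applied to the split syllable ['th', 'oo', '5'], the lookup [S2S.get(S, S), U2U.get(U, U), D2D.get(D, D)] gives ['t', 'o', '5'], i.e. 'to5'.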
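A self-contained sketch of the lookup step follows. The real lists tlsvor, ForPaSVOR, tluvor, ForPaUVOR, tldiau and ForPaDiau are not given in the text, so the short tables below are hypothetical excerpts for illustration only: the initial correspondences follow the m/ng table above, and oo -> o, o -> er follow the rule in (2.4). Tone digits are assumed to carry over unchanged, so D2D is left empty.

```python
# Hypothetical excerpts of the parallel symbol lists (not the corpus tables).
tl_initials = ['p', 'ph', 't', 'th', 'k', 'kh', 'ch', 'chh']
forpa_initials = ['b', 'p', 'd', 't', 'g', 'k', 'z', 'c']
tl_finals = ['oo', 'o']
forpa_finals = ['o', 'er']

# dict(zip(keys, values)) builds the same mapping as the dict comprehension.
S2S = dict(zip(tl_initials, forpa_initials))
U2U = dict(zip(tl_finals, forpa_finals))
D2D = {}  # assumption: tone digits are identical in TL and ForPA


def map_parts(parts):
    """Map a split TL syllable [initial, final, tone] to a ForPA string.

    dict.get(x, x) returns x itself when there is no entry, so symbols that
    are spelled the same in both systems pass through unchanged.
    """
    s, u, d = parts
    return ''.join([S2S.get(s, s), U2U.get(u, u), D2D.get(d, d)])


print(map_parts(['th', 'oo', '5']))  # to5
print(map_parts(['s', 'o', '5']))    # ser5
```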

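(2.3) Some initials and finals have two TL spellings, and both must map to the same ForPA form:

    TL     TL variant   ForPA
    ts     ch           z
    tsh    chh          c
    oe     ue           ue
    oa     ua           ua
    oai    uai          uai

In Python the extra spellings are added with

    S2S.update({'ts': 'z', 'tsh': 'c'})
    U2U.update({'oe': 'ue', 'oa': 'ua', 'oai': 'uai'})

(2.4) The vowels o and oo need special care. TL oo corresponds to ForPA o, and TL o to ForPA er:

    TL     ForPA
    soo    so      (oo -> o)
    so     ser     (o -> er)
    mo     mo      (o -> o)
    no     no      (o -> o)
    ngo    ngo     (o -> o)

After the nasal initials m, n and ng, TL o stays o; applying the general o -> er rule would wrongly give mer, ner and nger. The exception is handled before the table lookup:

    S, U, D = splitsud(syllable)
    if S in {'m', 'n', 'ng'} and U == 'o':  # mo, no, ngo keep the final o
        rr = [S, U, D]
    else:
        rr = [S2S.get(S, S), U2U.get(U, U), D2D.get(D, D)]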
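Putting (2.2) to (2.4) together gives a converter along the following lines. It is a sketch under stated assumptions: the lookup tables are short hypothetical excerpts rather than the corpus tables, tone digits pass through unchanged, and splitsud is a re-implementation that reproduces the examples given in (2.2), not the authors' function.

```python
import re

VOWELS = set('aeiou')
NASAL_INITIALS = {'m', 'n', 'ng'}

# Hypothetical excerpts of the lookup tables (assumption, not the corpus data).
S2S = {'p': 'b', 'ph': 'p', 't': 'd', 'th': 't', 'k': 'g', 'kh': 'k',
       'ch': 'z', 'chh': 'c'}
U2U = {'oo': 'o', 'o': 'er'}
D2D = {}  # assumption: tone digits carry over unchanged

# (2.3) Alternative spellings map to the same ForPA symbols.
S2S.update({'ts': 'z', 'tsh': 'c'})
U2U.update({'oe': 'ue', 'oa': 'ua', 'oai': 'uai'})


def splitsud(syllable):
    """Split a TL syllable into [initial, final, tone] (re-implementation)."""
    body, tone = re.fullmatch(r'(.*?)(\d*)', syllable, re.DOTALL).groups()
    for i, ch in enumerate(body):
        if ch in VOWELS:
            return [body[:i], body[i:], tone]
    for nasal in ('ng', 'm'):  # syllabic nasal without a vowel
        if body.endswith(nasal):
            return [body[:-len(nasal)], nasal, tone]
    return ['', body, tone]


def tl_to_forpa(syllable):
    """Convert one numbered TL syllable to ForPA."""
    s, u, d = splitsud(syllable)
    if s in NASAL_INITIALS and u == 'o':
        # (2.4) After m, n and ng the final o stays o instead of becoming er.
        return s + u + d
    return S2S.get(s, s) + U2U.get(u, u) + D2D.get(d, d)


for syl in ['soo5', 'so5', 'mo', 'ngo5', 'tsng', 'oe2']:
    print(syl, '->', tl_to_forpa(syl))
```

Testing the nasal exception before the lookup keeps the general tables free of special cases: U2U can map o to er unconditionally.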

We implemented CguAlign, a Python program that combines Transcriber and HTK, and tested it on texts 1~10 of [3] (Combine001).

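The source material consists of text (txt) and audio (mp3) files. HTK requires WAV input, so each mp3 file is converted to WAV with a sample rate of 16000 Hz using Audacity [6]. The txt and wav files are then loaded into Transcriber [5], which saves a trs file. CguAlign takes the trs and wav files as input and outputs trs, lrc and sbv files. It consists of two parts: 1. HTK alignment and 2. output conversion.

1. HTK alignment [1]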
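Audacity handles the conversion itself; before handing files to HTK, a quick check with Python's standard wave module can confirm the 16000 Hz sample rate. The sketch below is illustrative only: the function names are hypothetical, and only the sample rate comes from the text, so channel count and sample width are reported but not checked. silent_wav builds an in-memory test file so the check can be tried without real recordings.

```python
import io
import wave


def wav_info(source):
    """Read basic parameters from a WAV file path or binary file object."""
    with wave.open(source, 'rb') as w:
        rate = w.getframerate()
        return {
            'rate': rate,
            'channels': w.getnchannels(),
            'sample_width': w.getsampwidth(),  # bytes per sample
            'seconds': w.getnframes() / rate,
        }


def ready_for_htk(source, expected_rate=16000):
    """True if the file has the 16 kHz sample rate used in this pipeline."""
    return wav_info(source)['rate'] == expected_rate


def silent_wav(seconds, rate=16000):
    """Build an in-memory mono 16-bit silent WAV, handy for testing."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b'\x00\x00' * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


print(wav_info(silent_wav(1.5)))
print(ready_for_htk(silent_wav(0.1, rate=44100)))  # False
```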

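(1) Convert the trs file to lab files, labeling pauses as sil.
(2) Process the lab files.
(3) Prepare the seven HTK configuration files (HtkTool): hled.led, hled00.led, hcopy.conf, hinit.conf, hrest.conf, herest.conf and hvite.conf.
(4) Build the master label file (mlf) with HTK's HLEd: from the lab files listed in a scp file, HLEd produces a model list (lst) and an mlf; with the dictionary (dic), the mlf is expanded into biphone (Biphone) labels.
(5) Extract features and train the models. HTK's HCopy converts each wav file into an mfc feature file, reading the source/target pairs from a scp file; the analysis window size and window shift are set in hcopy.conf:

    os.system('hcopy -A -C hcopy.conf -S spwav2mfc.scp')

    HTK's HCompV then initializes the HMMs from the mfc files:

    os.system('hcompv -A -C HCompV.conf -S spmfc.scp -m -I splab_p.mlf -M hmms_p/ -o ' + m + ' myhmmpro')

    HCompV reads the mfc files listed in spmfc.scp, the labels in splab_p.mlf (built from the lab files) and the prototype HMM myhmmpro, which defines the numbers of states and mixtures. HTK's HERest then re-estimates the HMMs from the mfc files. Each phone is modeled by an HMM with 5 states (N = 5).
(6) HTK's HVite performs the forced alignment.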
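The HCopy script file lists one source/target pair per line. The sketch below writes such a listing and builds the two commands above as argument lists, which could be passed to subprocess.run instead of concatenating a string for os.system. The helper names and the wav/ and mfc/ directory layout are hypothetical; the tool options are exactly those in the commands above, with the tool names written in HTK's usual capitalization (HCopy, HCompV).

```python
import os


def hcopy_script(wav_paths, mfc_dir):
    """Text of an HCopy -S script: 'source.wav target.mfc' on each line."""
    lines = []
    for wav in wav_paths:
        stem = os.path.splitext(os.path.basename(wav))[0]
        lines.append(f'{wav} {mfc_dir}/{stem}.mfc')
    return '\n'.join(lines) + '\n'


def hcopy_command(script='spwav2mfc.scp'):
    """Argument list for the HCopy call in the text."""
    return ['HCopy', '-A', '-C', 'hcopy.conf', '-S', script]


def hcompv_command(model_name):
    """Argument list for the HCompV call; model_name plays the role of m."""
    return ['HCompV', '-A', '-C', 'HCompV.conf', '-S', 'spmfc.scp', '-m',
            '-I', 'splab_p.mlf', '-M', 'hmms_p/', '-o', model_name, 'myhmmpro']


print(hcopy_script(['wav/SN0.wav', 'wav/SN1.wav'], 'mfc'), end='')
# wav/SN0.wav mfc/SN0.mfc
# wav/SN1.wav mfc/SN1.mfc
# With HTK installed: subprocess.run(hcopy_command(), check=True)
```

An argument list also avoids the quoting problems that arise when a model name or path is spliced into a shell string.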

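For example, the lab entry for SN0.mfc is de_it_pinn_sil_qu_hi, stored in splab_p.mlf. HVite takes the mfc files and the mlf as input and outputs lab files containing the aligned time boundaries. The HTK tools are thus run in the order HLEd, HCopy, HCompV, HERest, HVite.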
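A sketch of this step follows. HVite is called in alignment mode (-a) with the labels from an MLF (-I) and writes the aligned labels to another MLF (-i); HTK label times are integers in units of 100 ns. The output file name aligned.mlf, the HMM definition, dictionary and model-list arguments, and the helper names are hypothetical, and the sample label text is made up for illustration rather than taken from the corpus.

```python
def hvite_align_command(hmmdefs, dic, hmmlist,
                        in_mlf='splab_p.mlf', out_mlf='aligned.mlf',
                        scp='spmfc.scp'):
    """Argument list for forced alignment with HVite."""
    return ['HVite', '-A', '-C', 'hvite.conf', '-a',
            '-H', hmmdefs, '-I', in_mlf, '-i', out_mlf,
            '-S', scp, dic, hmmlist]


def parse_lab(text, skip=('sil',)):
    """Read 'start end label [score]' lines into (start_s, end_s, label).

    HTK times are in 100 ns units, so dividing by 1e7 gives seconds. Lines
    that are not label lines (MLF header, file names, the '.' terminator)
    are ignored, as are labels listed in skip.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        if label not in skip:
            segments.append((start / 1e7, end / 1e7, label))
    return segments


sample = '''#!MLF!#
"*/SN0.rec"
0 3000000 sil -310.2
3000000 5500000 de -250.7
5500000 8000000 it -198.4
.
'''
print(parse_lab(sample))  # [(0.3, 0.55, 'de'), (0.55, 0.8, 'it')]
```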

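CguAlign.py runs these HTK steps.

2. Output conversion (lrc, sbv)

Conversion functions turn the aligned lab file (output.lab.lab) into lrc and sbv files. A demonstration is available at http://dl.dropbox.com/u/33089565/ryex007_3.html. The aligned result can also be inspected in Transcriber (Figure 5, 1280 x 800).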
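Both output formats are simple text formats: an lrc line starts with a [mm:ss.xx] timestamp, and an sbv caption (the SubViewer format accepted by YouTube) has an 'H:MM:SS.mmm,H:MM:SS.mmm' time line followed by the caption text and a blank line. The following sketch, with hypothetical function names, writes both from (start, end, text) segments given in seconds; it is an illustration, not the CguAlign code.

```python
def lrc_time(seconds):
    """Format seconds as an lrc timestamp [mm:ss.xx]."""
    cs = round(seconds * 100)  # whole centiseconds
    minutes, cs = divmod(cs, 6000)
    secs, cs = divmod(cs, 100)
    return f'[{minutes:02d}:{secs:02d}.{cs:02d}]'


def sbv_time(seconds):
    """Format seconds as an sbv timestamp H:MM:SS.mmm."""
    ms = round(seconds * 1000)  # whole milliseconds
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f'{hours}:{minutes:02d}:{secs:02d}.{ms:03d}'


def to_lrc(segments):
    """One '[mm:ss.xx]text' line per (start, end, text) segment."""
    return ''.join(f'{lrc_time(start)}{text}\n' for start, _end, text in segments)


def to_sbv(segments):
    """A 'start,end' time line and the caption text, blank line between captions."""
    blocks = [f'{sbv_time(start)},{sbv_time(end)}\n{text}'
              for start, end, text in segments]
    return '\n\n'.join(blocks) + '\n'


segments = [(0.3, 0.55, 'de'), (65.5, 67.25, 'it')]
print(to_lrc(segments), end='')
# [00:00.30]de
# [01:05.50]it
print(to_sbv(segments), end='')
```

Rounding to whole centiseconds or milliseconds before splitting into fields avoids floating-point artifacts such as 0.3 * 100 evaluating to slightly more than 30.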

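Evaluation. The syllable boundaries produced by HTK are compared with those labeled manually in Transcriber, using the mean deviation μ of HTK from Transcriber and the counts N, K and N − K.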
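Purely as an illustration, and assuming definitions that the text above does not spell out, the sketch below takes μ to be the mean absolute difference between paired HTK and Transcriber boundary times, N the number of boundary pairs, and K the number of pairs whose difference exceeds a chosen tolerance, so that (N − K)/N serves as an accuracy. These definitions, the tolerance parameter, the function name and the toy numbers are all assumptions.

```python
def boundary_stats(auto, manual, tolerance):
    """Compare automatic (HTK) and manual (Transcriber) boundary times.

    Assumed definitions: mu is the mean absolute difference in seconds,
    N the number of boundary pairs, K the number whose difference exceeds
    tolerance, and accuracy = (N - K) / N.
    """
    if len(auto) != len(manual) or not auto:
        raise ValueError('need two non-empty lists of equal length')
    diffs = [abs(a - m) for a, m in zip(auto, manual)]
    n = len(diffs)
    k = sum(1 for d in diffs if d > tolerance)
    return {'mu': sum(diffs) / n, 'N': n, 'K': k, 'accuracy': (n - k) / n}


# Toy numbers, not results from the corpus.
stats = boundary_stats([0.30, 1.00, 2.50, 4.00], [0.25, 1.10, 2.50, 3.50], 0.2)
print(stats['N'], stats['K'], stats['accuracy'])  # 4 1 0.75
```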

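For HTK, the two measured values were 0.06032 and 0.164033 over 759 cases, i.e. about 0.06 and 0.16. CguAlign reaches accuracies of 78.67% and 82.93% in the two settings tested, with further figures of 5.176 and 5.962. Table 2 lists the works processed with CguAlign.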

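Application: a YouTube audiobook with text-speech synchronization (CguTASync), https://dl.dropbox.com/u/36364100/wj.html. Combine001 (texts 1~10) was uploaded to YouTube [4] with the aligned text attached as closed captions (CC). CguTASync (CguTextAudioSynchronization) plays the YouTube audio with the synchronized text in Firefox.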

CguTASync highlights the current text karaoke-style (kalaok) in step with YouTube playback (fft).
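The text does not show how CguTASync is implemented (it runs in Firefox). The core of karaoke-style highlighting is a lookup from the current playback time to the caption being spoken; the Python sketch below, with hypothetical names and toy segment times, shows that lookup as a binary search. A player would repeat it as the playback time advances; how the time is obtained from YouTube is not covered here.

```python
import bisect


def active_caption(segments, t):
    """Return the text of the segment playing at time t (seconds), or None.

    segments is a list of (start, end, text) tuples sorted by start time.
    bisect_right finds the last segment starting at or before t; its text is
    returned only if t is still before that segment's end.
    """
    starts = [start for start, _end, _text in segments]
    i = bisect.bisect_right(starts, t) - 1
    if i < 0 or t >= segments[i][1]:
        return None
    return segments[i][2]


segments = [(0.3, 0.55, 'de'), (0.55, 0.8, 'it'), (1.0, 1.4, 'pinn')]
for t in (0.1, 0.4, 0.9, 1.2):
    print(t, active_caption(segments, t))
```

Returning None in gaps such as 0.8 to 1.0 s lets the display clear the highlight during pauses.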

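Conclusion. The 140 texts were converted to UTF-8 and labeled in Transcriber, and the manual labels were compared with those produced by HTK, giving figures of 0.16 and 0.06. HTK aligned the 140 texts (about 11 hours of speech) with about 83% accuracy, and the aligned corpus was used to build YouTube audiobooks with synchronized text.

References
[1] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4.1), Cambridge University Engineering Department, Tech. Rep., Mar. 2009.
[2] (2011). Python module LGO.py.
[3] http://140.111.34.54/mandr/minna/first.html
[4] YouTube, http://www.youtube.com/
[5] Transcriber, http://trans.sourceforge.net/en/presentation.php
[6] Audacity, http://audacity.sourceforge.net/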