Automatic Time Alignment for a Taiwanese Read Speech Corpus and its Application to Constructing Audiobooks with Text-Speech Synchronization

140 11 83% YouTube (forced alignment)
[3] 140 7 2 Wav 1.21G Sample rate 16kHz 681.37 133 15923 139271 ForPA Transcriber HTK (Syllable) Transcriber 0.16 0.06 HTK HTK 95%
1. tsoh-sit
2. e-thôo-lâng
3. kah but
4. kah tīnn-khó-x
tsoh-site-thôo-lâng buttīnn-khó-x kah kah kah ô â ī Unicode Ascii/Big5 :
1. Split the text into Han characters and romanized syllables.
2. Convert tone diacritics to tone numbers.
3. Convert TL to ForPA.

1. ' sing5 '
[' ', ' ', ' ', ' ', ' ', ' ', ' ', 'sing5', ' ', ' ', ' ', ' ', ' ', ' ']
(regular expression) Python re.split('([\u4e00-\ufa2d]|[a-zA-Z]+\d*)', ) re Python regular expression Python unicode \u4e00 unicode \ufa2d Python
>>> jj=' sing5 '
>>> re.split('([\u4e00-\ufa2d]|[a-zA-Z]+\d*)',jj)
['', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', ' ', 'sing5', ' ', ' ', '', ' ', '', ' ', '', ' ', ' ']
( )
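The split described above can be sketched as a small tokenizer that returns each Han character as its own token and each romanized syllable (with an optional trailing tone digit) as one token. This is only an illustration, not the LGO.py code: the function name `tokenize`, the use of `re.findall` instead of `re.split`, and the sample input are assumptions. The Han range `\u4e00` to `\ufa2d` is the one named in the text. Unlike an ASCII-only letter class, the pattern below also accepts letters that carry tone diacritics.

```python
import re
import unicodedata

# Han range as given in the text (\u4e00 .. \ufa2d).
# Second alternative: a run of non-Han letters and combining marks,
# optionally followed by tone digits (e.g. "sing5" or "sîng").
TOKEN = re.compile(
    r'[\u4e00-\ufa2d]'
    r'|(?:(?![\u4e00-\ufa2d])[^\W\d_]|[\u0300-\u036f])+\d*'
)


def tokenize(text):
    """Return Han characters one by one and romanized syllables whole.

    Hypothetical helper: hyphens, spaces and punctuation are dropped.
    """
    # Compose base letter + combining mark where a precomposed form exists.
    text = unicodedata.normalize('NFC', text)
    return TOKEN.findall(text)


if __name__ == '__main__':
    # Hypothetical sample: two Han characters, a syllable, one Han character.
    print(tokenize('\u53f0\u8a9esing5\u597d'))
    print(tokenize('e-thôo-lâng'))
```

Because the second alternative matches any non-Han letter, a syllable such as `sîng` stays in one piece instead of being broken at the accented vowel.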
>>> ii=' sîng '
>>> re.split('([\u4e00-\ufa2d]|[a-zA-Z]+\d*)',ii)
['', ' ', '', ' ', ' ', 's', 'î', 'ng', ' ', ' ', '', ' ', '', ' ', '', ' ', ' ']
sîng Python (LGO.py) hunhl [2]
>>> ii=' sîng '
>>> hunhl(ii)
[' ', ' ', ' ', ' ', ' ', ' ', ' ', 'sîng', ' ', ' ', ' ', ' ', ' ', ' ']
sîng

2. e-thôo-lâng e1-thoo5-lang5
= 'ô â a ō î'
= 'o a a o i'
= '5 5 8 7 7'
>>> tobsr('siâ')
'sia5'
>>> tobsr('dâ')
'da5'

3. ForPA TL ForPA (1) (Python hunsu) (2) ( Python Dict ) TL 'thoo5' ForPA
>>> s,u,d = splitsud('thoo5') #S,U,D
>>> ''.join([S2S.get(s,s), U2U.get(u,u), D2D.get(d,d)])
'to5'
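The conversion from tone diacritics to tone digits (e-thôo-lâng to e1-thoo5-lang5, or `tobsr('siâ')` giving `'sia5'`) can be sketched with Unicode decomposition. This is not the `tobsr` implementation from LGO.py, which is not shown in the text. The function name `tone_mark_to_number` is hypothetical. The mark-to-digit table follows the usual Tâi-lô convention, and assigning tone 4 to unmarked syllables ending in p, t, k or h (tone 1 otherwise) is an assumption.

```python
import unicodedata

# Combining marks used for Tâi-lô tones (assumed standard convention).
TONE_MARKS = {
    '\u0301': '2',  # acute
    '\u0300': '3',  # grave
    '\u0302': '5',  # circumflex
    '\u0304': '7',  # macron
    '\u030d': '8',  # vertical line above
}


def tone_mark_to_number(syllable):
    """Strip the tone diacritic and append the tone digit (hypothetical helper)."""
    tone = ''
    base = []
    # NFD separates each accented letter into base letter + combining mark.
    for ch in unicodedata.normalize('NFD', syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)
    letters = ''.join(base)
    if not tone:
        # Assumption: unmarked syllables are tone 4 with a stop final, else tone 1.
        tone = '4' if letters[-1:] in ('p', 't', 'k', 'h') else '1'
    return letters + tone


if __name__ == '__main__':
    word = 'e-thôo-lâng'
    print('-'.join(tone_mark_to_number(s) for s in word.split('-')))
```

Decomposing first means the same code handles both precomposed letters such as `â` and letters typed as a base character plus a combining mark.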
splitsud S2S, U2U, D2D Python (Python dict)

(2.1)
re.split('([aeiou][a-zA-Z]*)(\d*)',syllable)
syllable = song4 => ['s', 'ong', '4', '']

ForPA TL
ForPA              TL
m ng               m ng
hm bng png mng     hm png phng mng
dng tng nng        tng thng nng
gng kng hng        kng khng hng
zng cng sng        chng chhng sng
()(ng m)

(2.2)
>>> splitsud('song4')
['s', 'ong', '4']
>>> splitsud('siong2')
['s', 'iong', '2']
>>> splitsud('sng5')
['s', 'ng', '5']
>>> splitsud('sng')
['s', 'ng', '']
>>> splitsud('ng')
['', 'ng', '']
>>> splitsud('oai')
['', 'oai', '']
>>> splitsud('oe2')
['', 'oe', '2']
>>> splitsud('dfgsfd3')
['', 'dfgsfd', '3']
>>> splitsud(' ')
['', ' ', '']
ForPA ForPA Python dict
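The behaviour of `splitsud` shown above (initial, final, tone digit, with special handling of the syllabic nasals `m` and `ng`) can be reproduced with three patterns tried in order. This is a sketch that matches the listed outputs, not the original source code; the pattern names and the fallback rule for unparseable input are assumptions.

```python
import re

# Ordinary syllables: consonant initial, vowel-led final, optional tone digits.
_VOWEL = re.compile(r'([^aeiou\d]*)([aeiou][a-zA-Z]*)(\d*)')
# Syllabic nasals: optional initial, then "ng" or "m" acting as the final.
_NASAL = re.compile(r'([a-zA-Z]*?)(ng|m)(\d*)')
# Fallback: everything before trailing digits is treated as the final.
_OTHER = re.compile(r'(.*?)(\d*)', re.DOTALL)


def splitsud(syllable):
    """Split a syllable into [initial, final, tone] (sketch of the described helper)."""
    for pattern in (_VOWEL, _NASAL):
        m = pattern.fullmatch(syllable)
        if m:
            return list(m.groups())
    m = _OTHER.fullmatch(syllable)
    return ['', m.group(1), m.group(2)]


if __name__ == '__main__':
    for s in ['song4', 'siong2', 'sng5', 'ng', 'oe2', 'dfgsfd3']:
        print(s, splitsud(s))
```

Trying the vowel pattern first keeps `song4` from being read as a syllabic nasal, while `sng5` falls through to the nasal pattern because it contains no vowel.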
Python
S2S = {k:v for k,v in zip(tlsvor, ForPaSVOR)}
U2U = {k:v for k,v in zip(tluvor, ForPaUVOR)}
D2D = {k:v for k,v in zip(tldiau, ForPaDiau)}
# dict{...} will map from key to value
>>> [S2S.get(S,S), U2U.get(U,U), D2D.get(D,D)]
Python ['th', 'oo', '5'] Python Dict ['t', 'o', '5'], i.e. 'to5'
(2.3)(2.4)

(2.3)
TL     TL     ForPA
ts     ch     z
tsh    chh    c
oe     ue     ue
oa     ua     ua
oai    uai    uai

Python
S2S.update({'ts':'z','tsh':'c'})
U2U.update({'oe':'ue','oa':'ua','oai':'uai'})

(2.4) ( ) ForPA
soo    oo    so     o
so     o     ser    er
mo     o     mo     o
no     o     no     o
ngo    o     ngo    o
o o er mo/no/ngo mer/ner/nger

Python
S,U,D = splitsud(syllable)
# After the nasal initials m, n, ng the final 'o' is kept unchanged
if S in {'m', 'n', 'ng'} and U == 'o':
    rr = [S,U,D]
else:
    rr = [S2S.get(S,S), U2U.get(U,U), D2D.get(D,D)]
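Putting the mapping rules together, a TL syllable can be converted to ForPA by splitting it and looking up each part, with the nasal-initial exception applied first. The sketch below is self-contained and uses only partial tables. The initial mappings (p to b, ph to p, and so on) are inferred from the syllabic-nasal table rows and from the 'thoo5' to 'to5' example, so they are assumptions. The full `tlsvor`/`ForPaSVOR` lists are not reproduced, tone digits are assumed to pass through unchanged, and `tl_to_forpa` and `split_syllable` are hypothetical names.

```python
import re

# Partial tables (assumption: inferred from the rows visible in the text).
S2S = {'p': 'b', 'ph': 'p', 't': 'd', 'th': 't', 'k': 'g', 'kh': 'k',
       'ts': 'z', 'ch': 'z', 'tsh': 'c', 'chh': 'c'}
U2U = {'oo': 'o', 'o': 'er', 'oe': 'ue', 'oa': 'ua', 'oai': 'uai'}
D2D = {}  # assumption: tone digits are identical in TL and ForPA

NASAL_INITIALS = {'m', 'n', 'ng'}


def split_syllable(syllable):
    """Simplified splitter: handles vowel-led finals only."""
    m = re.fullmatch(r'([^aeiou\d]*)([aeiou][a-z]*)(\d*)', syllable)
    return list(m.groups()) if m else ['', syllable, '']


def tl_to_forpa(syllable):
    """Convert one TL syllable to ForPA using the partial tables above."""
    s, u, d = split_syllable(syllable)
    if s in NASAL_INITIALS and u == 'o':
        # mo/no/ngo keep their final instead of becoming mer/ner/nger.
        parts = [s, u, d]
    else:
        parts = [S2S.get(s, s), U2U.get(u, u), D2D.get(d, d)]
    return ''.join(parts)


if __name__ == '__main__':
    for syl in ['thoo5', 'soo', 'so', 'mo', 'tshoe2']:
        print(syl, '->', tl_to_forpa(syl))
```

Using `dict.get(key, key)` means any initial or final missing from a table passes through unchanged, which is why the partial tables still give sensible output.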
Transcriber HTK Python CguAlign 1~10 (Combine001)[3]
txt mp3 HTK HTK wav Audacity[6] mp3 wav Sample rate 16000Hz txt wav Transcriber[5] trs CguAlign CguAlign CguAlign trs wav Output trs lrc sbv CguAlign 1. HTK 2. CguAlign

1. HTK[1]
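The workflow takes a Transcriber trs file as input, and the first processing step turns it into HTK label (lab) data. A minimal sketch of that conversion follows. It assumes the usual Transcriber XML layout (`Turn` elements containing `Sync` time marks, with the transcript as the text following each mark) and the HTK label convention of times in units of 100 ns. Labelling empty segments as `sil` is an assumption, and `trs_to_segments` and `to_lab_line` are hypothetical names, not CguAlign functions.

```python
import xml.etree.ElementTree as ET


def trs_to_segments(trs_xml):
    """Return (start, end, text) tuples with times in 100 ns units."""
    root = ET.fromstring(trs_xml)
    segments = []
    for turn in root.iter('Turn'):
        syncs = [el for el in turn if el.tag == 'Sync']
        turn_end = float(turn.get('endTime'))
        for i, sync in enumerate(syncs):
            start = float(sync.get('time'))
            # A segment ends at the next Sync mark, or at the end of the turn.
            end = float(syncs[i + 1].get('time')) if i + 1 < len(syncs) else turn_end
            # Assumption: a segment without text is silence.
            text = (sync.tail or '').strip() or 'sil'
            segments.append((round(start * 1e7), round(end * 1e7), text))
    return segments


def to_lab_line(segment):
    """Format one segment as an HTK-style label line."""
    return '%d %d %s' % segment


if __name__ == '__main__':
    sample = (
        '<Trans><Episode><Section type="report" startTime="0" endTime="4.0">'
        '<Turn startTime="0" endTime="4.0"><Sync time="0"/>\n'
        '<Sync time="0.5"/>\ntsoh-sit\n<Sync time="2.5"/>\n'
        '</Turn></Section></Episode></Trans>'
    )
    for seg in trs_to_segments(sample):
        print(to_lab_line(seg))
```

Real trs files can also contain event or speaker tags inside a turn. This sketch ignores them, so a production converter would need to collect the text that follows those tags as well.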
(1) trs lab trs sil
(2) lab lab
(3) HtkTool 7 hled.led hled00.led hcopy.conf hinit.conf hrest.conf herest.conf hvite.conf
(4) mlf HTK hled scp hled lst mlf mlf scp mlf dic hled mlf (Biphone) dic
(5) lab mfc mfc HTK HTK hcopy hcopy mfc scp hcopy wav mfc hcopy windows windows shift hcopy.conf hcopy
os.system('hcopy -A -C hcopy.conf -S spwav2mfc.scp')
mfc HMM HTK HCompV
os.system('hcompv -A -C HCompV.conf -S spmfc.scp -m -I splab_p.mlf -M hmms_p/ -o '+m+' myhmmpro')
mfc HCompV HMM splab_p.mlf (lab) spmfc.scp myhmmpro Mixture State HTK HERest mfc HERest HMM Phone 5 N=5
(6) HTK HVite HVite (Forced alignment)
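The feature-extraction and initialization steps above call HTK tools through `os.system` with a script (scp) file. The sketch below shows one way to generate the wav-to-mfc script lines and to build the same two command lines as argument lists. The command options are copied from the calls quoted in the text. The helper names, the `mfc` output directory and the `output_name` parameter (standing in for the variable `m`) are assumptions, and nothing is executed here.

```python
import posixpath


def make_hcopy_script(wav_paths, mfc_dir='mfc'):
    """One 'source target' line per wav file, as read by hcopy via -S."""
    lines = []
    for wav in wav_paths:
        stem = posixpath.splitext(posixpath.basename(wav))[0]
        lines.append('%s %s' % (wav, posixpath.join(mfc_dir, stem + '.mfc')))
    return lines


def hcopy_command(config='hcopy.conf', script='spwav2mfc.scp'):
    """Argument list equivalent to the hcopy call quoted in the text."""
    return ['hcopy', '-A', '-C', config, '-S', script]


def hcompv_command(output_name, proto='myhmmpro'):
    """Argument list equivalent to the hcompv call quoted in the text."""
    return ['hcompv', '-A', '-C', 'HCompV.conf', '-S', 'spmfc.scp', '-m',
            '-I', 'splab_p.mlf', '-M', 'hmms_p/', '-o', output_name, proto]


if __name__ == '__main__':
    print('\n'.join(make_hcopy_script(['wav/SN0.wav', 'wav/SN1.wav'])))
    print(' '.join(hcopy_command()))
    print(' '.join(hcompv_command('SN0')))
```

With HTK installed, each list could be passed to `subprocess.run(cmd, check=True)`, which reports a failing tool instead of silently continuing as `os.system` does.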
SN0.mfc de_it_pinn_sil_qu_hi lab splab_p.mlf HVite mfc mlf HVite Input Output lab hled hcopy HCompV HERest HVite
CguAlign.py HTK

2. function (lrc sbv)
function lrc sbv lrc sbv output.lab.lab lab lrc sbv http://dl.dropbox.com/u/33089565/ryex007_3.html Transcriber Transcriber 1280 x 800 5
Transcriber HTK HTK μ = HTK N K N-K
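The comparison between manually marked (Transcriber) boundaries and HTK boundaries can be illustrated with a small calculation. The exact formula for μ is not recoverable from the text, so the following is only one plausible reading: the mean and standard deviation of the absolute time differences, plus the share (N-K)/N of boundaries whose difference stays within a tolerance. The function names, the tolerance parameter and the sample numbers are all assumptions.

```python
from statistics import mean, pstdev


def boundary_errors(manual, automatic):
    """Absolute time difference (seconds) for each pair of boundaries."""
    if len(manual) != len(automatic):
        raise ValueError('boundary lists must have the same length')
    return [abs(a - m) for m, a in zip(manual, automatic)]


def summarize(errors, tolerance):
    """Mean, population standard deviation, and share within tolerance."""
    n = len(errors)
    k = sum(1 for e in errors if e > tolerance)  # boundaries outside tolerance
    return {'mean': mean(errors), 'std': pstdev(errors), 'within': (n - k) / n}


if __name__ == '__main__':
    # Hypothetical boundary times in seconds.
    manual = [0.0, 1.0, 2.0, 3.0]
    automatic = [0.1, 1.0, 1.8, 3.5]
    print(summarize(boundary_errors(manual, automatic), tolerance=0.3))
```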
HTK 0.06032 0.164033 759 HTK 0.06 0.16 0.16 0.06 HTK CguAlign CguAlign ( ) 78.67% ( ) 82.93% CguAlign 5.176 5.962

2. Works CguAlign
YouTube (CguTASync) https://dl.dropbox.com/u/36364100/wj.html YouTube[4] YouTube Combine001 (1~10) CC Combine001 YouTube CguTASync (CguTextAudioSynchronization) Firefox
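Text-audio synchronization of this kind comes down to finding, for the current playback time, the aligned segment that should be highlighted. The CguTASync page's own implementation is not shown in the text. The following is a minimal sketch of the lookup using binary search over segment start times, with a hypothetical function name and sample data.

```python
from bisect import bisect_right


def current_segment(start_times, playback_time):
    """Index of the segment active at playback_time, or None before the first.

    start_times must be sorted in ascending order (seconds).
    """
    i = bisect_right(start_times, playback_time) - 1
    return i if i >= 0 else None


if __name__ == '__main__':
    starts = [0.0, 0.5, 2.5]            # hypothetical aligned start times
    texts = ['sil', 'tsoh-sit', 'sil']  # hypothetical segment texts
    for t in (0.2, 0.7, 3.0):
        print(t, texts[current_segment(starts, t)])
```

Binary search keeps the lookup fast even when it is repeated many times per second during playback.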
karaoke YouTube fft CguTASync
UTF8 Transcriber 140 HTK Transcriber 0.16 0.06 HTK 140 11 83% YouTube

[1] S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK version 3.4.1), Cambridge University Engineering Department, Tech. Rep., March 2009.
[2] (2011). Python (LGO.py)
[3] http://140.111.34.54/mandr/minna/first.html
[4] YouTube http://www.youtube.com/
[5] Transcriber http://trans.sourceforge.net/en/presentation.php
[6] Audacity http://audacity.sourceforge.net/