27 2013 1 * + XML + XML 1. 1 interpretative linguistic 2002 139-140 2005 69 HSK 1. 2 * 11BYY054 12JZD018 128
2000 1996 1 2 3 4 5 6 raw corpus 2008 2002 141 1996 2011 2010a 129
27 2013 1 + 2008 2010b 2010c + 2008 + 3. 1 5 3. 2 3. 2. 1 3. 2. 2 1996 2002 130
2011 3. 2. 3 2010 HSK XML 131
27 2013 1 3. 2. 4 HSK 2011 HSK XML XML 2007 XML 3. 3 3. 3. 1 1 2 3 4 5 6 7 3. 3. 2 3. 3. 2. 1 132 1 3 2001 2 4 5 6 7 8 20
9 3 3 5 6 7 8 20 9 9 1994 174 1999 15 1 1-4 5 8 6 7 9 3. 3. 2. 2 HSK 4 133
27 2013 1 3. 4 3. 4. 1 3. 4. 1. 1 99% 2002 145 1% 2008 2010 2008 2010b 3. 4. 1. 2 10 134
1 10 2 10 10 L CQ CJ-sy CJy CJdw 11 CJxw 12 CC 11 12 3. 4. 2 HSK 10 L L CQ CQ CJ-sy - CJy CJdw CLEC + cc3 1 - cc3 1-2002 68-70 2007 135
27 2013 1 XML 13 /v /r < syn > /p /r /q /n /v < /syn = > /w XML 1 XML 2007 XML XML 2002 2 HSK CJX 14 CJX 15 CJX 16 1 CJX 14-16 1 XML 14 < order > < /order > 15 < order > < /order > 16 < order > 1 < /order > 3 XML W3C World Wide Web Consortium 1996 SGML 2004 SGML Standard Generalized Markup Language 1986 SGML 1995 SGML declaration 1998 SGML XML 136 XML 1
XML XML XML XML XML 2 HSK CJX 17 CJX 18 CJX 17 18 XML 17 < order > < /order 7 > 18 < order > < /order 4 > 17 7 18 4 XML XML XML 3. 5 3. 5. 1 1 2 3 4 5 3. 5. 2 137
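The surviving fragments above show a value placed inside the closing tag (for example `< /order 7 >`). The following is a minimal sketch, not the author's procedure, of why a standard XML parser rejects that form and how the same information could be carried in an attribute on the opening tag. The sentence text, the wrapper element `s`, and the attribute name `value` are assumptions made for illustration; the original example sentences did not survive extraction.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical content: the value sits in the closing tag, as in the fragments above.
closing_tag_style = "<s><order>word</order 7></s>"

# Assumed alternative: the same value as an attribute ("value" is a made-up name).
attribute_style = '<s><order value="7">word</order></s>'


def order_values(fragment: str):
    """Collect (attribute value, text) pairs from every <order> element."""
    root = ET.fromstring(fragment)
    return [(el.get("value"), el.text) for el in root.iter("order")]


if __name__ == "__main__":
    print(is_well_formed(closing_tag_style))
    print(is_well_formed(attribute_style))
    print(order_values(attribute_style))
```

In XML an end tag may contain only the element name and optional whitespace, so any extra information has to move into an attribute or a child element if the corpus is to be processed with ordinary XML tools.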
27 2013 1 XML 2007 19 /r < word > /n /vg < /word = /v > /nz /m /q /y /w 20 /r /d /v /n /w /d < syn > /v < /syn = > /n / u /n /v /vg /r /q /n /w 19 < syn > < syn > /r < /syn = 1 > < syn > /v /nz < /syn = 2 > < syn > /m /q < /syn = > /y /w < /syn = > 20 < syn > < syn > /r < /syn = > < syn > /d < /syn = > < syn > /v < /syn = > < syn > /n < /syn = > /w < /syn = > < syn > < syn > /d /v /v /n < /syn = > /u < syn > / n < /syn = > < syn > /v < /syn = > < syn > /vg < /syn = > < syn > /r /q < /syn = > < syn > /n < /syn = > /w < /syn = > + XML + + 1996 2000 1 1998 SGML 4 2002 138
2007 3 2001 1 2002 1996 3 1996 1994 2010 HSK 6 2002 XML 2 2010 2005 2008 HSK 2011 2 2004 SGML HTML XML 1 2008 2 2002 2008 2010a HSK 2 2010b 3 2010c 2008 1999 2011

Re-considering the Modes of Annotation of All-purpose Chinese Interlanguage Corpus

Zhang Baolin

Abstract: This paper focuses on the annotation modes of corpora. Based on an analysis and summary of the current state of annotation in Chinese interlanguage corpora, it redefines and explains the "error annotation + base annotation" mode. It suggests that (1) with respect to content, semantic annotation and pragmatic annotation should be employed, and annotation of discourse and style should be enhanced; and (2) with respect to annotation method, multiple annotation of one and the same error should be limited. It also expounds the advantages, feasibility, and problems of applying XML (Extensible Markup Language) to the "error annotation + base annotation" mode.

Keywords: all-purpose Chinese interlanguage corpus; annotation modes of corpus; XML

1958 40 100083 2013 Linguistics Institute of China LINC 2013 7 15 8 9 2013 30 1 2 I 3 I 4 I 80 800 10 100% 11 20 50% 1 2 2013 4 20 5 20 linc_2011@yahoo.com 2013 7 14 7 15 140