Specification for the Basic Processing of the Contemporary Chinese Corpus


Chinese Library Classification: TP391

The Basic Processing of Contemporary Chinese Corpus at Peking University: SPECIFICATION

YU Shi-wen, DUAN Hui-ming, ZHU Xue-feng, Bing SWEN
(Institute of Computational Linguistics, Peking University, Beijing 100871)

Abstract: The Institute of Computational Linguistics at Peking University has completed the basic processing of a contemporary Chinese corpus of 27 million Chinese characters. Beyond word segmentation and part-of-speech tagging, the processing covers the tagging of proper nouns (person names, place names, organization names, and so on), morpheme subcategories, and special usages of verbs and adjectives. The success of this large-scale language engineering project is attributed to the SPECIFICATION, which was drawn up in advance and refined while in use. This publication introduces the SPECIFICATION and invites comments from experts and colleagues to help improve it.

Keywords: contemporary Chinese; corpus; word segmentation; part-of-speech tagging; specification
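The abstract says each word in the corpus receives a segmentation boundary and a part-of-speech tag, and that multi-word proper nouns such as organization names get their own tags. The sketch below shows how a consumer might read such annotations. It assumes a layout the abstract does not spell out: whitespace-separated `word/tag` tokens, with a multi-word proper noun wrapped as `[w1/t1 w2/t2 ...]TAG`. The sample line and the tags in it (`n`, `vn`, `nt`, `v`, `w`) are illustrative only. The names `Token`, `Compound`, `parse_token`, and `parse_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Token:
    word: str
    tag: str


@dataclass
class Compound:
    # Hypothetical container for a bracketed multi-word proper noun.
    tag: str
    parts: List[Token]


def parse_token(item: str) -> Token:
    # Assumed format "word/tag"; split on the LAST slash so words may contain "/".
    word, sep, tag = item.rpartition("/")
    if not sep or not word or not tag:
        raise ValueError(f"malformed token: {item!r}")
    return Token(word, tag)


def parse_line(line: str) -> List[Union[Token, Compound]]:
    """Parse one segmented, tagged line (assumed layout) into tokens and compounds."""
    result: List[Union[Token, Compound]] = []
    buffer: Optional[List[Token]] = None  # parts of a currently open "[...]" compound
    for item in line.split():
        if item.startswith("[") and buffer is None:
            buffer = []
            item = item[1:]  # drop the opening bracket
        if buffer is not None and "]" in item:
            # Closing item looks like "word/tag]OUTERTAG".
            inner, _, outer_tag = item.rpartition("]")
            buffer.append(parse_token(inner))
            result.append(Compound(outer_tag, buffer))
            buffer = None
        elif buffer is not None:
            buffer.append(parse_token(item))
        else:
            result.append(parse_token(item))
    if buffer is not None:
        raise ValueError("unclosed bracketed compound")
    return result
```

For example, `parse_line("[中央/n 人民/n 广播/vn 电台/n]nt 记者/n 报道/v 。/w")` returns a `Compound` tagged `nt` that holds four tokens, followed by three plain `Token`s. Nested constituents stay grouped under their outer tag, so a proper noun can be counted either as a single unit or word by word.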
