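GLOBAL EDUCATION, Vol. 42, No. 5, 2013 (No. 310)

Abstract: Taking the Shanghai junior secondary school English achievement test as an example, this study explores the feasibility, appropriateness, and superiority of using Item Response Theory to improve test scoring. We compared four methods of scoring ability and found that the reliabilities of the resulting scores were highly consistent, with every reliability index above 0.9. Kappa coefficients also showed that graduation and selection decisions based on the different scores were consistent. Together, these results demonstrate that applying Item Response Theory is feasible. The distributions of the scores and of their measurement errors, and a comparison with the results of two studies conducted abroad, confirm the appropriateness and superiority of Item Response Theory in routine testing practice.

Key words: achievement test; Item Response Theory; scoring; reliability; validity; measurement error

CHEN Fang, Lecturer, Department of English, East China Normal University (Shanghai 200241); BI Xiaonan, Assistant Researcher, Shanghai Municipal Educational Examinations Authority (Shanghai 200235)

2009 … National Council on Measurement in Education … Classical Test Theory (CTT) … 1968 … Lord and Novick … Item Response Theory (IRT) [1]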
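The abstract reports that all four scoring methods yield reliability indices above 0.9. The formula the authors used cannot be recovered from this extraction. As a hedged illustration, the sketch below computes one index often applied to IRT-based scores: one minus the ratio of the average squared conditional standard error of measurement to the observed score variance. The function name `reliability_from_errors` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def reliability_from_errors(scores, csems):
    """Reliability = 1 - mean(CSEM^2) / Var(scores).

    scores: one observed (or reported) score per examinee.
    csems:  the conditional standard error of measurement for each examinee.
    This formulation is an illustrative assumption, not the paper's formula.
    """
    if len(scores) != len(csems) or len(scores) < 2:
        raise ValueError("need matched scores and CSEMs for at least two examinees")
    error_variance = fmean(se ** 2 for se in csems)  # average error variance
    observed_variance = pvariance(scores)  # population variance of the scores
    if observed_variance == 0:
        raise ValueError("reliability is undefined when all scores are equal")
    return 1.0 - error_variance / observed_variance


# Hypothetical data: four examinees, each with a CSEM of 2.5.
print(reliability_from_errors([10, 20, 30, 40], [2.5, 2.5, 2.5, 2.5]))
```

The closer the average error variance is to zero relative to the spread of scores, the closer the index is to 1.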
2 2010 97604 2010 200 976 5 195 0 18 … inter-rater … intra-rater
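The surviving fragment mentions inter-rater and intra-rater consistency for human-scored items, but the statistics the authors reported cannot be recovered. As a minimal sketch, two common indices are shown below: the Pearson correlation and the exact-agreement rate. The same functions apply when one rater scores the same responses twice (intra-rater). The function names and scores are hypothetical.

```python
import math


def pearson_r(rater_a, rater_b):
    """Pearson correlation between two sets of scores for the same responses."""
    n = len(rater_a)
    if n != len(rater_b) or n < 2:
        raise ValueError("need two equal-length score lists with at least two entries")
    mean_a = sum(rater_a) / n
    mean_b = sum(rater_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(rater_a, rater_b))
    var_a = sum((x - mean_a) ** 2 for x in rater_a)
    var_b = sum((y - mean_b) ** 2 for y in rater_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / math.sqrt(var_a * var_b)


def exact_agreement(rater_a, rater_b):
    """Share of responses that received identical scores in both ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical scores from two raters (or from one rater on two occasions).
first = [12, 15, 9, 17, 11]
second = [13, 15, 8, 17, 12]
print(pearson_r(first, second), exact_agreement(first, second))
```

Correlation captures whether two ratings rank responses alike, while exact agreement is stricter. Reporting both is a design choice, not a claim about the paper.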
[3-4] 1 … Wang [5] … conditional standard error of measurement (CSEM) … θ … θ 1 … 32 quadrature points … MULTILOG … LEGS … POLYCSEM … equipercentile … POLYCSEM … 2010 2011 2010 97604 2011 94128 56 14 28 7 … Reported Score 1
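The fragment refers to the conditional standard error of measurement (CSEM), 32 quadrature points, and the POLYCSEM program. The sketch below is not POLYCSEM. It is a minimal illustration, in the spirit of Kolen, Zeng and Hanson [12] and Wang et al. [14], of how an IRT-based CSEM can be computed at one θ value:

- the Lord-Wingersky recursion, generalized to polytomous items, builds the conditional distribution of summed scores;
- an optional conversion table maps summed scores to reported scores;
- the CSEM is the standard deviation of that distribution.

Averaging squared CSEMs over weighted quadrature points then gives an average error variance. Item category probabilities are assumed to come from a fitted IRT model, and all names are hypothetical.

```python
import math


def summed_score_distribution(item_probs):
    """Lord-Wingersky recursion generalized to polytomous items.

    item_probs[i][k] is P(item i is scored k | theta) for k = 0..m_i, all at one
    theta value. Returns dist with dist[x] = P(summed score = x | theta),
    assuming local independence of items given theta.
    """
    dist = [1.0]  # before any item, the summed score is 0 with probability 1
    for probs in item_probs:
        new = [0.0] * (len(dist) + len(probs) - 1)
        for x, p_x in enumerate(dist):
            for k, p_k in enumerate(probs):
                new[x + k] += p_x * p_k  # add item score k to running total x
        dist = new
    return dist


def conditional_sem(item_probs, conversion=None):
    """CSEM at one theta: SD of the (optionally converted) score distribution.

    conversion[x] is a hypothetical raw-to-reported score table; None keeps raw scores.
    """
    dist = summed_score_distribution(item_probs)
    scores = [conversion[x] if conversion is not None else x for x in range(len(dist))]
    mean = sum(s * p for s, p in zip(scores, dist))
    variance = sum((s - mean) ** 2 * p for s, p in zip(scores, dist))
    return math.sqrt(variance)


def average_error_variance(csems, weights):
    """Average CSEM^2 over quadrature points, weighted by the ability distribution."""
    total = sum(weights)
    return sum(w * se ** 2 for w, se in zip(weights, csems)) / total


# Two hypothetical dichotomous items, each answered correctly with probability 0.5.
probs_at_theta = [[0.5, 0.5], [0.5, 0.5]]
print(summed_score_distribution(probs_at_theta))  # [0.25, 0.5, 0.25]
print(conditional_sem(probs_at_theta))
```

In practice this computation would be repeated at each quadrature point, with that point's category probabilities, before averaging.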
1 0 6 0 1 2 … testlet … 1 7 8 0 7 0 18 7 0 6 … Kolen and Lee [7] … Samejima [8] … 7 6 2 1 7 6 1 2 … θ … θ scale … IRT Score … 0 7 … simple sum … weighted sum … 1 1 2 2 2 3 6 1 1 2 2 … Kolen and Lee [9] (2011) … 1
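The fragment cites Samejima's graded response model [8] and a θ-scale IRT score. The estimation method and settings the authors used with MULTILOG cannot be recovered. As a hedged sketch, the code below computes:

- graded-response category probabilities, without a scaling constant D;
- an expected a posteriori (EAP) θ estimate under a standard normal prior, evaluated on 32 evenly spaced points between -4 and 4.

EAP is one common choice and is assumed here. The grid, prior, and function names are illustrative.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Samejima graded response model: P(X = k | theta) for k = 0..m.

    thresholds must be increasing (b_1 < ... < b_m); no scaling constant D is used.
    """
    # Cumulative probabilities P*(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def eap_theta(responses, items, n_points=32, lo=-4.0, hi=4.0):
    """EAP ability estimate with a standard normal prior on an evenly spaced grid.

    responses[i] is the observed category of item i; items[i] is (a, thresholds).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + q * step for q in range(n_points)]
    posterior = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        for x, (a, thresholds) in zip(responses, items):
            weight *= grm_category_probs(theta, a, thresholds)[x]  # likelihood term
        posterior.append(weight)
    total = sum(posterior)  # normalizing constant cancels the prior's scale
    return sum(t * w for t, w in zip(grid, posterior)) / total


item = (1.0, [-1.0, 1.0])  # hypothetical 3-category item: a = 1, thresholds -1 and 1
print(grm_category_probs(0.0, *item))
print(eap_theta([2], [item]))
```

Unlike a simple or weighted sum of item scores, the EAP estimate uses the whole response pattern together with each item's parameters.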
2
0.990 108 27-2.48 8.66
0.989 127 33-2.33 7.88
0.987 121 32-2.20 7.31
0.972 128 19-1.85 6.72
72 102 … κ = 0.944, p = 0.000 … 0.966 0.965 0.969 … 2010 2 1 U 1 2011 2010
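Read together with the abstract, the κ = 0.944 fragment concerns agreement between graduation and selection decisions made from different scores. A minimal sketch of Cohen's kappa for two pass/fail classifications of the same examinees follows. The cut score, the scores, and the helper names are hypothetical and do not reproduce the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two classifications of the same examinees."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n  # raw agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the two marginal distributions of labels.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def classify(scores, cut):
    """Pass (True) if the score reaches the hypothetical cut score."""
    return [s >= cut for s in scores]


# Hypothetical scores for the same four examinees under two scoring methods.
method_a = classify([50, 60, 70, 80], 65)
method_b = classify([55, 58, 75, 85], 65)
print(cohens_kappa(method_a, method_b))
```

Kappa corrects raw agreement for the agreement expected by chance, given how often each method passes examinees.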
0.14 θ 2 2010 3 0 3
1 6 1 18 10 7 7 7 11
Kolen [12] … Wang [13] … Kolen and Wang [14] … Wang

References:
[1] Lord F. M., Novick M. R. Statistical Theories of Mental Test Scores[M]. Reading, Mass.: Addison-Wesley, 1968.
[2] [M]. 2011.
[3][5][13][14] Wang T., Kolen M. J., Harris D. J. Psychometric Properties of Scale Scores and Performance Levels for Performance Assessments Using Polytomous IRT[J]. Journal of Educational Measurement, 2000, 37(2): 141-162.
[4][7][9] Kolen M. J., Lee W. Psychometric Properties of Raw and Scale Scores on Mixed-format Tests[J]. Educational Measurement: Issues and Practice, 2011, 30(2): 15-24.
[6] Birnbaum A. Some Latent Trait Models and Their Use in Inferring an Examinee's Ability[A]. In: Lord F. M., Novick M. R. (eds.). Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley, 1968.
[8] Samejima F. Estimation of Latent Ability Using a Response Pattern of Graded Scores[J]. Psychometrika Monograph Supplement, 1969.
[10] Thissen D., Wainer H. Test Scoring[M]. Lawrence Erlbaum, 2001.
[11] Finn R. H. Effects of Some Variations in Rating Scale Characteristics on the Means and Reliabilities of Ratings[J]. Educational and Psychological Measurement, 1972, 32(2): 255-265.
[12] Kolen M. J., Zeng L., Hanson B. A. Conditional Standard Errors of Measurement for Scale Scores Using IRT[J]. Journal of Educational Measurement, 1996, 33(2): 129-140.
Construction of a Content Framework for the Analysis of Mathematics Classroom Teacher-Student Conversation
LIU Lanying
(Basic Education Development Center, Shanghai Normal University, Shanghai 200234, China)
Abstract: Mathematics classroom instruction is a process of teacher-student interaction and mutual development. Taking the teacher's perspective, the study combines sociocultural theory with a review of the literature to construct a content framework for analyzing teacher-student conversation in mathematics classrooms. The framework consists of three dimensions (mathematics classroom language, mathematical meaning making, and mathematics classroom culture) and ten elements. The conclusion has practical significance for diagnosing problems in mathematics classroom teacher-student conversation and for carrying out targeted teacher training programs.
Key words: mathematics classroom; teacher-student conversation; analysis content framework
(Proofreader: WANG Rong)

Improving the Scoring Procedures of an English Achievement Test with Item Response Theory
CHEN Fang & BI Xiaonan
(English Department, East China Normal University, Shanghai 200241, China; Shanghai Municipal Educational Examinations Authority, Shanghai 200235, China)
Abstract: This study aims to demonstrate the feasibility, appropriateness, and superiority of using Item Response Theory to improve the scoring procedures of an achievement test. The reliability coefficients of the test scores produced by four different scoring methods are very close to each other, and all exceed 0.90. Kappa coefficients confirm that decisions based on the different test scores are consistent, which provides validity evidence. A comparison with two relevant US studies on the distributions of test scores and measurement error further supports the appropriateness and superiority of Item Response Theory for test scoring.
Key words: achievement test; Item Response Theory; scoring; reliability; validity; measurement error
(Proofreader: WANG Rong)