2012, Vol. 44, No. 3, 400-412. Acta Psychologica Sinica. DOI: 10.3724/SP.J.1041.2012.00400

[...]* ([...], [...] 330022)

Abstract [...] b-str [...] a-str [...] Monte Carlo [...]
Key words [...]; [...]; [...]; [...]; b-str [...]
Classification number B841

1 [...]

Computerized adaptive testing (CAT) [...] (Qi, Dai, & Ding, 2002) [...] Science (Quellmalz & Pellegrino, 2009) [...] (high stake) [...] CAT [...] (Barrada, Olea, Ponsoda, & Abad, 2010) [...] CAT [...] (Cheng, Ding, Yan, & Zhu, 2011) [...] maximum Fisher information (MFI) [...] (Chang & Ying, 1996, 1999) [...] CAT [...] the a-stratified method (a-str) (Chang & Ying, 1996, 1999) [...] the a-stratified with b blocking method (b-str) (Chang, Qian, & Ying, 2001) [...] CAT [...] MFI [...] a [...] Chang and Ying (1999) [...] a-str [...] a (Cheng, Chang, Douglas, & Guo, 2009) [...] a [...] CAT [...]

Received: 2010-10-21
* [...] (30860084, 60263005, 31160203, 31100756), [...] (CCA110109), [...] (09JJCXLX012, 10YJCXLX049), [...] (GJJ11385), [...] (2009JKS2009).
Corresponding author: [...], E-mail: ding06026@163.com
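[...] a-str [...] a [...] a-str [...] a (Cheng et al., 2009) [...] a-str [...] a [...] MFI. Chang, Qian, and Ying (2001) [...] b-str [...] a-str [...] b [...] θ [...] b [...] b-str [...] (Meijer & Nering, 1999) [...] ETS [...] (Dodd, De Ayala, & Koch, 1995) [...] CAT (Li, Zhang, & Zhu, 2010) [...] (Chen, Ding, Lin, & Zhou, 2006; Dai, Chen, Ding, & Deng, 2006; Liu, Ding, & Lin, 2008; Luo, Ouyang, Qi, Dai, & Ding, 2008).

Samejima (1969) [...] graded response model (GRM) [...] partial credit model (PCM) [...] CAT [...] health-related quality of life (HRQOL) [...] GRM (Choi & Swartz, 2009) [...] GRM [...] patient-reported outcomes (PROs) [...] CAT. Choi and Swartz (2009) [...] MFI, MLWI, MPWI, MEI, MEPV, MEPWI [...] MFI [...] CAT [...] 0-1 [...] GRM [...] MFI [...] (van der Linden, 1998). Penfield (2006) [...] MPWI, MEI [...] Penfield [...] CAT [...] (PCM) [...] CAT [...] MPWI [...] MEI [...] 0-1 [...] CAT [...]

[...] CAT [...] reduction of dimensionality [...] (Chen et al., 2006; Dai et al., 2006; [...], 2008) [...] (2008) [...] (2008) [...] Penfield (2006) [...] CAT [...] (sequential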
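estimate) [...] (2006) [...] b-str [...] GRM [...] (2006) [...] a [...] (2006) [...] CAT [...] item response theory (IRT) [...] $\sum_{i=1}^{m} I_i(\hat\theta)$ [...] $\hat\theta$ [...] m [...] GRM [...] CAT [...] (Choi & Swartz, 2009) [...] a-str [...] b-str [...] (MFI) [...] 9 [...]

2 [...] GRM [...] 9 [...] MFI [...] (2006) (2006) [...] MFI [...] a-str [...] b-str [...] Md_b, Avg_b, Avg1_b [...] CAT [...]

2.1 [...] (MFI) [...] (RAN) [...]

2.2 [...] GRM [...]

[...] f [...] $b_t$ [...] t [...]

$$\mathrm{Md\_b}=\operatorname{median}\{b_t,\ t=1,2,\dots,f\},\qquad \mathrm{Avg\_b}=\frac{1}{f}\sum_{t=1}^{f}b_t,\qquad \mathrm{Avg1\_b}=\frac{1}{f-2}\sum_{t=2}^{f-1}b_t$$

[...] b [...] $b_t$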
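([...] GRM [...] 2 [...] f-1 [...]) (2006) (2006) [...] (RDSI) [...] Md_b (md), Avg_b (Avg), Avg1_b (Avg1) [...] (2006) [...] Avg1 [...] Avg [...] Avg [...] $b_t$ [...]

2.3 [...] Chang et al. (2001) [...] b-str [...]

[...] b-str [...] GRM-CAT [...] b-str [...] CAT [...] b-str [...] b [...]

2.3.1 [...]

[...] GRM [...] CAT [...] $I(\hat\theta)$ [...] (ε), $\varepsilon = 1/\sqrt{I(\hat\theta)}$, [...] $\hat\theta$ [...] $I(\hat\theta)=\sum_{i=1}^{m} I_i(\hat\theta)$ [...] B [...] $h(b_i)$ [...] B [...]

$$S_h=\{\, i:\ |h(b_i)-\hat\theta|<\varepsilon,\ i\in B \,\} \qquad (1)$$

$S_h$ [...] $h(b)$ [...] $S_h$ [...] θ [...] ε [...] $h(b)$ [...]; ε [...] $h(b)$ [...] ε [...] ($I(\theta)$ [...]) [...] $h(b)$ [...] θ [...] b-str [...] θ [...] a [...] B [...] θ [...] (1) [...] B [...] ($h(b)$) [...] (1) [...]

[...] $h(b)=\mathrm{Md\_b}$ [...] md-ε: $S_{Md}=\{\, i:\ |\mathrm{Md\_b}_i-\hat\theta|<\varepsilon \,\}$;
[...] $h(b)=\mathrm{Avg1\_b}$ [...] B [...] Avg1-ε: $S_{Avg1}=\{\, i:\ |\mathrm{Avg1\_b}_i-\hat\theta|<\varepsilon \,\}$;
[...] $h(b)=b_t$ [...] $b_t$ [...] B [...] $b_t$-ε: $S_b=\{\,(i,t):\ |b_{it}-\hat\theta|<\varepsilon \,\}$.

md-ε, Avg1-ε, $b_t$-ε [...] (IESI) [...]

2.3.2 [...]

[...] f [...] $b_t$ [...] $1\le t\le f$ [...] (2)
[...] f [...] $b_t$ [...] (3)
[...] CAT [...] m(k) [...] a [...] (4)

k = 1, 2, ..., K [...] a-str [...] K = 4, m(1) = 2,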
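m(2) = 3/2, m(3) = 1/2, m(4) = 0 [...]

[...] m(k) [...] a [...] $I(\hat\theta)$ [...] (5)
[...] m(k) [...] f [...] a [...] $\hat\theta$ [...] $b_t$ [...] I [...] (6)

[...] CAT [...] $I(\hat\theta)$ [...] m(k) [...] k = 1, m(1) = 2 [...] θ [...] (5) [...] MFI [...] CAT [...] k = 4, m(k) = 0 [...] (6) [...] b [...] M/30 (M [...] CAT [...]), m(k) = 2; [...] 5M/30, m(k) = 3/2; [...] 14M/30, m(k) = 1/2; [...] 14M/30, m(k) = 0. [...] m(k) [...] ([...], 2006) [...] 14M/30, m(k) = 0 [...] MFI [...] (6) [...] (DC) [...] MFI [...]

3 Monte Carlo [...]

3.1 [...]

N(p, q) [...] p [...] q [...]; U(p, q) [...] [p, q]. Four item pools were simulated ([...], 2006). Each pool consists of 1000 polytomous items, and the maximum score of each item was randomly selected from {3, 4, 5, 6}. The item parameters (a, b) [...]:

b ~ N(0, 1), ln a ~ N(0, 1);
b ~ U(-3, 3), ln a ~ N(0, 1);
b ~ N(0, 1), a ~ U(0.4, 2.5);
b ~ U(-3, 3), a ~ U(0.4, 2.5);

[...] a ∈ [0.4, 2.5]. [...] 1000 [...] N(0, 1) ([...] 1) [...] 19 [...] 1000 ([...] 2). The CAT stops when the accumulated information reaches the pre-determined value M (M = 16) or the test length reaches 30.

3.2 [...]

$$\hat\theta=\frac{\sum_{k=1}^{q} X_k\,L(X_k)\,A(X_k)}{\sum_{k=1}^{q} L(X_k)\,A(X_k)}$$

[...] q [...] (Qi, Dai, & Ding, 2002) [...] m [...] q [...] [x] [...] x [...] $X_k$ [...] (0, 1, ..., q-1) [...] $A(X_k)$ [...] $e^{-X_k^2/2}$ [...]

$$L(X_k)=\prod_{\alpha=1}^{m}\prod_{t=0}^{f_\alpha} P_{\alpha t}(X_k)^{u_{\alpha t}}$$

[...] $P_{\alpha t}$ [...] α [...] t [...]; $u_{\alpha t}=1$ [...] α [...] t [...], $u_{\alpha t}=0$ [...]

3.3 CAT [...]

[...] CAT [...] 0 [...] 1000 [...] 9 [...] 9 [...] (2002) [...] 1000 [...] 9 [...]

4 [...]

4.1 [...]

$$\mathrm{ABS}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat\theta_i-\theta_i\right|$$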
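$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat\theta_i-\theta_i\right)^2},\qquad \mathrm{Bias}=\frac{1}{N}\sum_{i=1}^{N}\left(\hat\theta_i-\theta_i\right),\qquad N_f=\frac{1}{N}\sum_{i=1}^{N} n_i$$

$$\chi^2=\sum_{j=1}^{L}\frac{(r_j-\bar r)^2}{\bar r},\qquad \bar r=\frac{1}{L}\sum_{j=1}^{L} r_j$$

Here N is the number of examinees, L is the number of items in the pool, $\theta_i$ and $\hat\theta_i$ are the true and estimated ability of examinee i, $n_i$ is the number of items administered to examinee i, and $r_j$ is the exposure rate of item j. ABS [...] Bias [...] Bias [...]; ABS [...] RMSE [...] Nf [...] χ² (Chang et al., 1996; 1999) [...] CAT [...]

4.2 [...]

[...] CAT [...] 9 [...] 2.2 [...] 2.3 [...] RDSI: md, Avg1, b_t; IESI: md-ε, Avg1-ε, b_t-ε [...] SxS = RDSI [...] IESI [...] SxS [...]

4.2.1 [...] 9 [...] ABS, RMSE, Bias

Tables 1 and 2 [...] 9 [...] CAT [...] ABS, RMSE, Bias [...] Table 1 [...] RDSI [...]; RDSI [...] IESI [...]; DC [...] SxS [...] MFI [...]; RDSI [...] b ~ N(0, 1) [...] b ~ U(-3, 3); DC [...] MFI [...] Table 2 [...] (0) [...]

Table 1 ABS and RMSE of the 9 item selection strategies, abilities drawn from N(0, 1)

            b~N(0,1),        b~U(-3,3),       b~N(0,1),        b~U(-3,3),
            ln a~N(0,1)      ln a~N(0,1)      a~U(0.4,2.5)     a~U(0.4,2.5)
Strategy    ABS     RMSE     ABS     RMSE     ABS     RMSE     ABS     RMSE
md          0.164   0.206    0.173   0.217    0.166   0.209    0.181   0.228
md-ε        0.162   0.204    0.180   0.225    0.166   0.210    0.173   0.222
Avg1        0.163   0.210    0.182   0.229    0.167   0.211    0.175   0.219
Avg1-ε      0.162   0.206    0.178   0.222    0.168   0.212    0.179   0.226
b_t         0.165   0.207    0.170   0.216    0.164   0.208    0.180   0.227
b_t-ε       0.173   0.217    0.175   0.222    0.171   0.216    0.180   0.226
DC          0.174   0.227    0.177   0.224    0.174   0.222    0.175   0.222
MFI         0.183   0.230    0.179   0.232    0.187   0.234    0.189   0.240
RAN         0.171   0.219    0.178   0.225    0.174   0.223    0.181   0.225

Table 2 Bias of the 9 item selection strategies, abilities drawn from N(0, 1)

            b~N(0,1),      b~U(-3,3),     b~N(0,1),      b~U(-3,3),
Strategy    ln a~N(0,1)    ln a~N(0,1)    a~U(0.4,2.5)   a~U(0.4,2.5)
md          0.002          0.006          0.002          0.009
md-ε        0.000          0.007          0.001          0.002
Avg1        0.005          0.006          0.011          0.004
Avg1-ε      0.008          0.004          0.000          0.009
b_t         0.005          0.003          0.006          0.006
b_t-ε       0.005          0.002          0.003          0.004
DC          0.011          0.008          0.008          0.015
MFI         0.006          0.001          0.011          0.001
RAN         0.004          0.008          0.018          0.002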
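Figures 1 and 2 [...] 9 [...] CAT [...] ABS, RMSE [...] Figures 1 and 2 [...] b ~ N(0, 1) [...] 9 [...] 19 [...]; b ~ U(-3, 3) [...] SxS [...] DC [...] MFI [...] DC [...] 19 [...] ABS, RMSE [...] MFI [...]

4.2.2 [...] 9 [...] Nf

[...] 9 [...] CAT [...] Table 3 [...] (Nf) [...]

Figure 1 [...] 19 [...] ABS
Figure 2 [...] 19 [...] RMSE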
Table 3 Nf of the 9 item selection strategies, abilities drawn from N(0, 1)

            b~N(0,1),      b~U(-3,3),     b~N(0,1),      b~U(-3,3),
Strategy    ln a~N(0,1)    ln a~N(0,1)    a~U(0.4,2.5)   a~U(0.4,2.5)
md          18.637         18.720         12.687         12.432
md-ε        19.534         18.991         13.008         12.782
Avg1        20.346         19.054         13.190         12.995
Avg1-ε      20.072         19.644         13.160         13.216
b_t         21.357         18.478         11.794         12.237
b_t-ε       20.003         18.563         13.327         12.664
DC           9.608          9.443          8.743          8.433
MFI          7.034          6.735          6.403          6.397
RAN         22.782         23.103         16.214         17.271

[...] N(0, 1) [...] RDSI [...] md [...] Nf [...] md [...] RDSI [...] Nf [...] RDSI [...] IESI [...] b_t-ε [...] Nf [...] DC [...] SxS [...] Nf [...] MFI [...] RDSI [...] IESI [...] a ~ U(0.4, 2.5) [...] Nf [...] ln a ~ N(0, 1) [...] DC [...] SxS [...] MFI [...] SxS [...] Nf [...] RAN [...]

[...] 9 [...] CAT [...] DC [...] SxS [...] MFI [...] SxS [...] θ = -1.8, θ = -0.9, θ = 0, θ = 0.9 [...] SxS [...] θ = 1.8 [...] b_t-ε [...] SxS [...] DC [...] MFI [...] SxS [...] SxS [...]
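[...] -1.8 [...] 1.8 [...] RAN [...] RDSI [...] IESI [...] a ~ U(0.4, 2.5) [...] Nf [...] ln a ~ N(0, 1) [...] DC [...]

4.2.3 [...] 9 [...] χ²

Table 4 [...] 9 [...] CAT [...] χ² [...] Table 4 [...] DC [...] md [...] Avg1 [...] MFI [...]; RDSI [...] b ~ U(-3, 3), a ~ U(0.4, 2.5) [...]; RDSI [...] IESI [...] RDSI [...] IESI [...] b_t-ε [...] b_t-ε [...]; IESI [...] b_t-ε [...] RAN [...] χ² [...]; [...] Avg1-ε [...] b_t-ε [...]

Figure 5 [...] 9 [...] CAT [...] 0% [...] 100% [...] 10% [...] Figure 5 [...] b ~ N(0, 1), a ~ U(0.4, 2.5) [...] MFI [...] 50% [...] 40% [...] 8 [...] 50% [...] 50%-60% [...] 50% [...]; [...] MFI [...] 30%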
[...] 20% [...] 30% [...] 30%-40% [...] 30% [...]

Table 4 χ² of the 9 item selection strategies, abilities drawn from N(0, 1)

            b~N(0,1),      b~U(-3,3),     b~N(0,1),      b~U(-3,3),
Strategy    ln a~N(0,1)    ln a~N(0,1)    a~U(0.4,2.5)   a~U(0.4,2.5)
md           27.743         12.018         17.701          6.967
md-ε         11.822          1.906          7.006          1.425
Avg1         29.284          8.290         16.598          5.592
Avg1-ε       18.573          1.467         10.910          1.296
b_t          72.468         55.418         35.535         30.905
b_t-ε         1.754          2.953          1.403          1.970
DC           55.697         75.072         35.734         43.076
MFI         167.999        203.041        133.976        116.374
RAN           1.037          1.027          0.989          0.956

Figure 5 [...]

5 [...]

[...] Monte Carlo [...] (1) IESI [...] RDSI [...] ABS, RMSE [...] Nf [...] χ² [...]; SxS [...] (2.7 [...] 2.4 [...]) [...] ABS, RMSE [...] b ~ U(-3, 3) [...] ABS, RMSE [...]; b ~ N(0, 1) [...] ABS, RMSE [...] (2.4 [...] 2.7 [...]) [...]; b ~ U(-3, 3) [...] Nf [...] N(0, 1) [...] md [...] b_t-ε [...] (2) DC [...] DC [...] χ²
[...]; DC [...] (Nf) [...] MFI [...] ABS, RMSE [...] χ² [...] MFI [...] χ² [...] MFI [...]; DC [...] ABS, RMSE [...] MFI [...] MFI [...]

(1) ABS, RMSE [...] b [...] SxS [...] b [...]; b [...] md, md-ε, Avg1, Avg1-ε [...] b [...]; (2) a [...] SxS [...] 6 [...] a [...] ln a [...]; (RDSI) [...] IESI [...] χ² [...] MFI [...]; [...] DC [...] f [...] $\sum_{t=1}^{f} b_t$ [...] MFI [...]; DC [...] DC [...] MFI [...] χ² [...] MFI [...]

[...] b-str [...] 0-1 [...] b-str [...] b-str [...]; a-str [...] a-str [...] a-str [...] a-str [...] a-str [...] CAT [...] CAT [...] Barrada et al. (2010) [...] CAT [...] (overlap rates) [...] 0-1 [...] (2011) [...] 0-1 [...] CAT [...] CAT [...]

References

Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2010). A method for the comparison of item selection rules in computerized adaptive testing. Applied Psychological Measurement, 34(6), 438-452.
Chang, H. H., Qian, J. H., & Ying, Z. L. (2001). A-stratified multistage CAT with b-blocking. Applied Psychological Measurement, 25, 333-341.
Chang, H. H., & Ying, Z. L. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chang, H. H., & Ying, Z. L. (1999). A-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211-222.
Chen, P., Ding, S. L., Lin, H. J., & Zhou, J. (2006). Item selection strategies of computerized adaptive testing based on graded response model. Acta Psychologica Sinica, 38(3), 461-467.
Cheng, X. Y., Ding, S. L., Yan, S. H., & Zhu, L. Y. (2011). New item selection criteria of computerized adaptive testing with exposure-control factor. Acta Psychologica Sinica, 43(2), 203-212.
Cheng, Y., Chang, H. H., Douglas, J., & Guo, F. M. (2009). Constraint-weighted a-stratification for computerized adaptive testing with nonstatistical constraints. Educational and Psychological Measurement, 69(1), 35-49.
Choi, S. W., & Swartz, R. J. (2009). Comparison of CAT item selection criteria for polytomous items. Applied Psychological Measurement, 33(6), 419-440.
Dai, H. Q., Chen, D. Z., Ding, S. L., & Deng, T. P. (2006). The comparison among item selection strategies of CAT with multiple-choice items. Acta Psychologica Sinica, 38(5), 778-783.
Dodd, B. G., De Ayala, R. J., & Koch, W. R. (1995). Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19(1), 5-22.
Li, M. Y., Zhang, M. Q., & Zhu, J. X. (2010). Test security methods in computerized adaptive testing. Advances in Psychological Science, 18(8), 1339-1348.
Liu, Z., Ding, S. L., & Lin, H. J. (2008). Item selection strategies for computerized adaptive testing with the generalized partial credit model. Acta Psychologica Sinica, 40(5), 618-625.
Luo, Z. S., Ouyang, X. L., Qi, S. Q., Dai, H. Q., & Ding, S. L. (2008). IRT information function of polytomously scored items under the graded response model. Acta Psychologica Sinica, 40(11), 1212-1220.
Meijer, R. R., & Nering, M. L. (1999). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194.
Penfield, R. D. (2006). Applying Bayesian item selection approaches to adaptive tests using polytomous items. Applied Measurement in Education, 19(1), 1-20.
Qi, S. Q., Dai, H. Q., & Ding, S. L. (2002). Principles of modern educational and psychological measurement (pp. 141-142, 252-256). Beijing, China: Higher Education Press.
Quellmalz, E. S., & Pellegrino, J. W. (2009). Perspective: Technology and testing. Science, 323(2), 75-79.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monographs, 34(17).
van der Linden, W. J. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201-216.

Dynamic and Comprehensive Item Selection Strategies for Computerized Adaptive Testing Based on Graded Response Model

LUO Fen; DING Shu-Liang; WANG Xiao-Qing
(School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, China)

Abstract

Item selection strategy (ISS) is a core component in Computerized Adaptive Testing (CAT).
Polytomous items can provide more information about examinees than dichotomous items, and adopting polytomously scored items in tests is a research direction of CAT. The most widely used ISS is the maximum Fisher information (MFI) criterion, which raises concerns about the cost-efficiency of pool utilization and poses security risks for CAT programs. Chang and Ying (1999) and Chang, Qian, and Ying (2001) proposed two alternative item selection procedures based on dichotomous models, the a-stratified method (a-str) and the a-stratified with b blocking method (b-str), with the goal of remedying the item overexposure and underexposure produced by MFI. However, a-str and b-str are static, because the items are stratified according to the information available at the beginning of the test. Based on the graded response model (GRM), a technique for reducing the dimensionality of the difficulty (or step) parameters was recently employed to construct some ISSs. The limitation of this dimension reduction technique is that it loses a lot of information. Thus, in order to improve on MFI, two new item selection methods are proposed based on GRM: (1) the technique for reducing the dimensionality of the difficulty (or step) parameters is modified by integrating interval estimation; (2) dynamic a-str and dynamic b-str methods are implemented during the testing process. On the one hand, these new ISSs can avoid and remedy the limitations of MFI and make good use of the advantages of the Fisher information function (FIF); FIF compresses all item parameters and ability parameters, so it is by nature a comprehensive tool for all parameters. On the other hand, the new ISSs employ the property that FIF could represent
the inverse of the variance of the ability estimate: let ε be the square root of the reciprocal of the Fisher information, and let d be the absolute deviation between the estimated ability and a function of the parameters of an item, which may be chosen and could be changed during the course of the CAT; the inequality d < ε then has the form of an interval estimate, and its utility can be imagined as a more flexible shadow item pool. A simulation study based on GRM was conducted. Four item pools of different structures were simulated, and 1000 examinees were generated with abilities randomly drawn from the standard normal distribution N(0, 1). Each pool consists of 1000 polytomous items, and the maximum score of each item was randomly selected from the set {3, 4, 5, 6}. In this paper, we assume that the prior distribution of ability is standard normal, and the Bayesian expected a posteriori (EAP) method is employed to estimate the ability parameter. The CAT stopped when the accumulated information reached the pre-determined value M (M = 16) or the test reached the pre-assigned length of 30. The results of the simulation study show that the new item selection methods required shorter test lengths and lower average exposure rates than the other methods, while maintaining the accuracy of ability estimation. More specifically, the new ISSs that applied the idea of interval estimation were better than the corresponding ISSs in terms of the chi-square value, and the same effect appeared when comparing the dynamic a-str and dynamic b-str ISS with MFI. Some important results were also found by comparing the different structures of the item pools. The accuracy of ability estimation and the item exposure rate were related to the distribution of the difficulty parameters b: the accuracy of ability estimation obtained when b was sampled from N(0, 1) was better than when b was sampled from the uniform distribution, whereas the conclusion for the item exposure rate is the opposite.
Also, the test length was related to the distribution of the discrimination parameter a: the test length required when a was sampled from the uniform distribution was shorter than when the logarithm of a was sampled from N(0, 1). In short, in terms of controlling and balancing item exposure, the new ISSs may gain an advantage over the corresponding earlier ISSs.

Key words Graded Response Model (GRM); Computerized Adaptive Testing (CAT); Item selection; Interval estimation; b-str based on polytomously scored model
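The interval-estimation idea summarized in the abstract (keep an item as a candidate when d < ε) can be illustrated with a short sketch. It is a hedged reading of the description, not the authors' implementation: the function names are hypothetical, NumPy is assumed, and the median and the interior mean of the step parameters are used as example choices of the "function of the parameters of an item". How the final item is picked from the candidates, and what happens when no item qualifies, is not covered here.

```python
import math

import numpy as np


def representative_difficulty(b, kind="md"):
    """Collapse the step parameters b_1..b_f of one item into one value."""
    b = np.sort(np.asarray(b, dtype=float))
    if kind == "md":  # median of the step parameters
        return float(np.median(b))
    if kind == "avg":  # mean of all step parameters
        return float(b.mean())
    if kind == "avg1":  # mean of the interior steps t = 2..f-1
        if b.size < 3:
            raise ValueError("avg1 needs at least three step parameters")
        return float(b[1:-1].mean())
    raise ValueError(f"unknown kind: {kind}")


def interval_candidates(h_values, theta_hat, test_information):
    """Indices of items with d = |h - theta_hat| < eps = 1/sqrt(information)."""
    if test_information <= 0:
        raise ValueError("test information must be positive")
    eps = 1.0 / math.sqrt(test_information)
    d = np.abs(np.asarray(h_values, dtype=float) - theta_hat)
    return np.flatnonzero(d < eps)
```

Because ε shrinks as information accumulates, the candidate set is wide at the start of a test and narrows around the ability estimate later on, which is the sense in which it behaves like a flexible shadow pool.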