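A Multiple Imputation Analysis of Item Nonresponse Data *

劉正山 ** 莊文忠 ***

I. Introduction
II. The Meaning and Treatment of Missing Data
III. Imputation Methods for Missing Data
IV. Data Sources, Tools, and Research Design
V. Results and Analysis
VI. Conclusion and Discussion

* 2012 TEDS2012 2012 11 3-4 NSC 100-2410-H-128-001-MY2
** ***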
臺灣選舉與民主化調查 (TEDS) 方法論之回顧與前瞻

I. Introduction

(missing data) (unit nonresponse) (item nonresponse) 1 Filion (1976, 482) (imputation) (multiple imputation) 不會

1 Lohr (2010, 329) (missing data) (nonresponse)
Chapter 8: A Multiple Imputation Analysis of Item Nonresponse Data

Liu, 2010 (TEDS) 2004 2008 2012 ( )

II. The Meaning and Treatment of Missing Data

(incomplete data) Little and Rubin (2002) 1997 (Berinsky 2008; Dillman et al. 2002;
1997 2004) (unit nonresponse) (item nonresponse) 2003, 2 2011, 147 2012, 3 Lohr (2010, 330) (1) (2) (3) (4) Lohr
(nonresponse bias) Hair et al. (2010, 46-50) (1) (ignorable) (not ignorable) 2 (2) (3) (Missing Completely at Random, MCAR) (Missing at Random, MAR) (not missing at random, NMAR) (4) (pattern) (amount) 2003, 2.11

2 (ignorable) (1) (2) (3) (not ignorable) (1) (known) (2) (unknown) (Hair et al. 2010, 44)
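The three missingness mechanisms named above (MCAR, MAR, NMAR) can be illustrated with a small simulation. What follows is a minimal sketch rather than the chapter's own analysis: the data-generating model, the missingness probabilities, and the names `simulate` and `observed_mean` are all assumptions made for illustration.

```python
import random
import statistics


def simulate(n=20000, seed=1):
    """Return (x, y, u) triples: x is always observed, y may go missing, u decides missingness."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)        # hypothetical fully observed covariate
        y = x + rng.gauss(0.0, 1.0)    # hypothetical item subject to nonresponse
        rows.append((x, y, rng.random()))
    return rows


def observed_mean(rows, prob_missing):
    """Mean of y over the cases that remain after applying a missingness rule."""
    kept = [y for x, y, u in rows if u >= prob_missing(x, y)]
    return statistics.fmean(kept)


rows = simulate()
full_mean = statistics.fmean([y for _, y, _ in rows])

# MCAR: the chance of nonresponse is the same for everyone.
mcar_mean = observed_mean(rows, lambda x, y: 0.3)
# MAR: the chance of nonresponse depends only on the observed covariate x.
mar_mean = observed_mean(rows, lambda x, y: 0.6 if x > 0 else 0.1)
# NMAR: the chance of nonresponse depends on the unobserved value y itself.
nmar_mean = observed_mean(rows, lambda x, y: 0.6 if y > 0 else 0.1)

print(f"full {full_mean:.3f}  MCAR {mcar_mean:.3f}  MAR {mar_mean:.3f}  NMAR {nmar_mean:.3f}")
```

Under the MCAR rule the complete-case mean stays close to the full-data mean. Under the MAR and NMAR rules assumed here it is pulled downward, because high values are more likely to be missing. In the MAR case the bias can in principle be corrected by conditioning on x; in the NMAR case it cannot without further assumptions.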
(Filion 1976, 484) (Chen and Shao 1999; Mason, Lesser, and Traugott 2002; 1997) 1995 (seductive and dangerous) (Hair et al. 2010, 50) Hair et al. (2010, 42)
III. Imputation Methods for Missing Data

Rubin (1987) Little and Rubin (2002) (single imputation) (multiple imputation) 2012, 8 2011, 150 (auxiliary variables) 2008 1997 2011 Lohr (2010, 351) 3

3 (deductive imputation) 1 2
1. Complete case analysis

2003, 2.12 (Hair et al. 2010, 51) (list-wise deletion) (Hair et al. 2010; Little and Rubin 2002; Peugh and Enders 2004; 1997 2003 2012) (MCAR) (pair-wise deletion) (all available data analysis) (eigenvalues) (Hair et al. 2010; 1997 2003) 2 (Lohr 2010, 347)
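List-wise and pair-wise deletion differ in which cases each statistic is allowed to use. The sketch below shows the difference on a toy dataset; the records, their values, and the helper names are hypothetical and are not drawn from TEDS.

```python
from itertools import combinations

# Hypothetical toy records; None marks an item nonresponse.
records = [
    {"vote": 1, "partyid": 2, "age": 34},
    {"vote": None, "partyid": 1, "age": 51},
    {"vote": 0, "partyid": None, "age": 47},
    {"vote": 1, "partyid": None, "age": 29},
    {"vote": 0, "partyid": 1, "age": 62},
]
variables = ["vote", "partyid", "age"]


def listwise(rows, names):
    """Keep only the cases with no missing value on any listed variable."""
    return [r for r in rows if all(r[v] is not None for v in names)]


def pairwise_counts(rows, names):
    """For each pair of variables, count the cases observed on both."""
    return {
        (a, b): sum(1 for r in rows if r[a] is not None and r[b] is not None)
        for a, b in combinations(names, 2)
    }


complete = listwise(records, variables)
pair_n = pairwise_counts(records, variables)

print(len(complete), "complete cases out of", len(records))
for pair, n in pair_n.items():
    print(pair, "uses", n, "cases")
```

List-wise deletion keeps one common set of cases for every statistic, at the cost of discarding partially observed respondents. Pair-wise deletion uses more of the data, but each statistic then rests on a different subsample.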
(NMAR)

2. Mean imputation

(Lohr 2010; Hair et al. 2010) (Lohr 2010, 348) 2003, 2.14 (mode imputation)

3. Hot-deck / cold-deck imputation
(imputation cell) 1997 2003 (exhaustive) (exclusive) (homogeneous) X Y X Z X (Marker et al. 2002, 329-330) (nearest-neighbor hot-deck imputation) (Lohr 2010, 349) Fix and Hodges (1951) (nearest neighbor rule) n $(x_1, \theta_1), (x_2, \theta_2), \ldots, (x_n, \theta_n)$ $x_i$ i $\theta_i$ i $(x, \theta)$ x n x 2011, 151-152 4

4 (distance) (dissimilarity) Kaufman and Rousseeuw (1990) d(i, j) i j 0 1 d f i j f = 1, 2, ..., p d (1) f (binary) (nominal) (2) f (interval) (3) f (ordinal) (ratio)
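Nearest-neighbor hot-deck imputation can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the chapter: the dissimilarity is a simple average of per-variable distances scaled to the range 0 to 1 (mismatch for nominal variables, range-scaled absolute difference for interval variables), in the spirit of the mixed-type measure attributed above to Kaufman and Rousseeuw (1990). Ordinal and ratio variables are left out, and all names and data values are hypothetical.

```python
def dissimilarity(a, b, kinds, ranges):
    """Average per-variable dissimilarity in [0, 1] over variables observed in both records."""
    total, used = 0.0, 0
    for name, kind in kinds.items():
        if a[name] is None or b[name] is None:
            continue  # skip variables that are missing for either record
        if kind == "nominal":
            d = 0.0 if a[name] == b[name] else 1.0
        else:  # "interval": absolute difference scaled by the observed range
            d = abs(a[name] - b[name]) / ranges[name]
        total += d
        used += 1
    return total / used if used else 1.0


def nearest_neighbor_impute(rows, target, kinds):
    """Fill each missing `target` with the value of the closest donor (first donor wins ties)."""
    ranges = {}
    for name, kind in kinds.items():
        if kind == "interval":
            values = [r[name] for r in rows if r[name] is not None]
            ranges[name] = (max(values) - min(values)) or 1.0
    donors = [r for r in rows if r[target] is not None]
    completed = []
    for r in rows:
        filled = dict(r)
        if r[target] is None:
            donor = min(donors, key=lambda d: dissimilarity(r, d, kinds, ranges))
            filled[target] = donor[target]
        completed.append(filled)
    return completed


# Hypothetical respondents; None marks item nonresponse on "vote".
respondents = [
    {"eth": 1, "age": 30, "vote": 1},
    {"eth": 1, "age": 60, "vote": 0},
    {"eth": 2, "age": 35, "vote": 0},
    {"eth": 1, "age": 33, "vote": None},
]
kinds = {"eth": "nominal", "age": "interval"}
completed = nearest_neighbor_impute(respondents, "vote", kinds)
print(completed[3])
```

In this toy example the fourth respondent is closest to the first one (same `eth`, ages three years apart), so the first respondent acts as the donor.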
Marker (2002, 230) (1) (2) (3) MSE 5 Hair et al. (2010, 53)

4. Case substitution

(Hair et al. 2010, 53) 2011, 151-152

5 R StatMatch (http://cran.r-project.org/web/packages/statmatch/statmatch.pdf) VIM (http://cran.r-project.org/web/packages/vim/index.html)
5. Model-based methods

Hair et al. (2010, 50-51) (MAR) (Maximum likelihood method) EM E M E M (consistent) (efficient) 1997 1 0

6. Regression imputation

(Hair et al. 2010; 1997 2003) m Z Z $(X_1, X_2, \ldots, X_k)$
m logistic (discriminant function analysis) 2003 (1) (2) (3) ( ) (4) (5) ( 11 ) (6) (Hair et al. 2010; 1997 2003)

7. Multiple imputation

Rubin 1978 Little Rubin m (m > 2)
(expectation maximization, EM) 6 (Markov Chain Monte Carlo, MCMC) 7 EM E (expectation step) M (maximization step) 2012, 10 m m

6 Dempster, Laird, and Rubin (1977) ( ) E E M 2012, 10

7 MCMC Tanner and Wong (1987) (data augmentation) (stationary distribution) m 2011, 151 MCMC (Bayesian inference) (prior probability) (posterior probability) (Bayesian approach) (prior distribution) (prior probability) θ 2011, 151 MCMC MCMC (conditional posterior distribution) 5% 2012, 10-11
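The alternation between an expectation step and a maximization step can be sketched for the simplest case: two jointly normal variables, where x is complete and some y values are missing. This is a plain EM illustration under those assumptions, not the EMis or bootstrapped EM routine in Amelia II; the function name, the simulated data, and the missingness rule are hypothetical.

```python
import random


def em_bivariate(xs, ys, n_iter=200):
    """EM estimates (mean of y, cov(x, y), var(y)) when some y are None and x is complete."""
    n = len(xs)
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    # Starting values taken from the complete cases.
    mu_y = sum(y for _, y in obs) / len(obs)
    s_yy = sum((y - mu_y) ** 2 for _, y in obs) / len(obs)
    s_xy = 0.0
    for _ in range(n_iter):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        # E step: replace missing sufficient statistics by their conditional expectations.
        for x, y in zip(xs, ys):
            if y is None:
                ey = mu_y + beta * (x - mu_x)
                eyy = ey * ey + resid_var
            else:
                ey, eyy = y, y * y
            sum_y += ey
            sum_yy += eyy
            sum_xy += x * ey
        # M step: re-estimate the parameters from the expected sufficient statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return mu_y, s_xy, s_yy


# Hypothetical data: y is more likely to be missing when x is large (a MAR pattern).
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [2.0 + 0.8 * x + rng.gauss(0.0, 0.5) for x in xs]
ys_obs = [None if (x > 0 and rng.random() < 0.6) else y for x, y in zip(xs, ys)]

mu_y_em, s_xy_em, s_yy_em = em_bivariate(xs, ys_obs)
print(f"EM estimate of the mean of y: {mu_y_em:.3f}")
```

In this simple missing-data pattern the maximum likelihood estimate also has a closed form (the complete-case regression prediction evaluated at the full-sample mean of x), which makes it a convenient check on the iteration. EM yields point estimates only; multiple imputation additionally requires draws that reflect the uncertainty of the imputed values.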
m 10 m 3 10 (Rubin 1987; 2011, 151) Hair et al. (2010, 54) (composite estimate) Lohr (2010, 351) Lohr 1997 2011, 170 8.1
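Once m completed datasets have been analysed separately, the results are combined into a single estimate. A common way to do this is Rubin's (1987) combining rules, sketched below. The function name `pool_rubin` and the five estimates and variances are hypothetical values chosen for illustration; they are not TEDS results, and m = 5 is arbitrary.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their sampling variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, math.sqrt(total)


# Hypothetical vote-share estimates from m = 5 completed datasets.
estimates = [0.46, 0.47, 0.45, 0.48, 0.44]
variances = [0.0004] * 5

q_bar, se = pool_rubin(estimates, variances)
print(f"pooled estimate {q_bar:.3f}, standard error {se:.4f}")
```

The pooled standard error is larger than the square root of the average within-imputation variance alone, because the between-imputation term carries the extra uncertainty due to the missing values.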
Table 8.1 Comparison of imputation techniques for missing data

Imputation method    Advantages    Disadvantages    Best used when
/ SPSS EM
Hair et al. (2010, 55)

IV. Data Sources, Tools, and Research Design

(TEDS) TEDS 2001 2012 TEDS 8 TEDS 2004 (TEDS2004P, N = 1,823) 2008 (TEDS2008P, N = 1,905) 2012 (TEDS2012, N = 1,826) 2004 3 20 TEDS2004P 2004 7 9 2008 3 22 TEDS2008P 2008 6 8 2012 1 14 TEDS2012 2012 1 3 (raking) 9... 10 R

8 TEDS 2004 (NSC93-2420-H-004-005-SSS) 2008 (NSC96-2420-H-002-025) 2012 (NSC100-2420-H-002-030) TEDS http://www.tedsnet.org

9 TEDS2004 91.23% (74.38%) TEDS2008 88.37% (80.28%) TEDS2012 89.36% (74.38%)

10 SPSS19.0
Gary King Amelia II Amelia II EMis (Expectation Maximization with importance re-sampling) (Markov chain Monte Carlo, MCMC) chain equations (Honaker, King, and Blackwell 2009, 2011) (graphical user interface, GUI) Amelia II 8.1

Figure 8.1 Amelia II user interface (TEDS2012 as an example)
TEDS 10 R 2004 2008 2012 10 11 12 8.1 8.3 TEDS2004P TEDS2008P TEDS2012 2004 67% TEDS2008P 33% TEDS2012 27% 2004 17% TEDS2008P 14% TEDS2012 12%

11 Gary King R Zelig Amelia (Imai, King, and Lau 2004) R

12 2012 2004 TEDS 2008 2012
Table 8.1 Variables used in the multiple imputation of vote choice (TEDS2004P)

Variable      Full name   Variable code   Missing
Vote          2004        VH1B                314
partyid                   VH7A               1218
likekmt                   VP2A                147
likedpp                   VP2B                144
likepfp                   VP2C                177
ChengScale                VK11A               108
RuiScale                  VK11B               132
LienScale                 VK11C               160
SoonScale                 VK11D               158
retroecon                 VD01                204
prosecon                  VD02                373
incscale                  VE01                164
demscore                  VF01                204
age1                      VS01                  0
Eth                       VS02                 43

2004 N = 1,656
Table 8.2 Variables used in the multiple imputation of vote choice (TEDS2008P)

Variable      Full name   Variable code   Missing
Vote          2008        H1a                 240
partyid                   N1b                 550
likekmt                   N2                  134
likedpp                   N2a                 146
HsiehScale                J6a                 145
SuScale                   J6b                 156
MaScale                   J6c                 147
SiewScale                 J6d                 168
tvnews                    A3                  207
retroecon                 E1                   60
prosecon                  E2                  319
incscale                  C1                  145
demscore                  F2                  152
age1                      S1                    0
Eth                       S2                   15

2008 1. N = 1,680 2. tvnews 0
Table 8.3 Variables used in the multiple imputation of vote choice (TEDS2012)

Variable      Full name   Variable code   Missing
Vote          2012        H01a                201
partyid                   Q01b                439
likekmt                   Q02                 106
likedpp                   Q02a                122
likepfp                   Q02b                209
TsaiScale                 J02a                120
SuScale                   J02b                168
MaScale                   J02c                107
WuScale                   J02d                132
SoonScale                 J02e                157
LinScale                  J02f                303
tvnews                    A03                 201
retroecon                 E01                  46
prosecon                  E02                 260
incscale                  C01                 105
demscore                  F05                  78
age1                      S01                   0
Eth                       S02                  16

2012 1. N = 1,629 2. tvnews 0

V. Results and Analysis

8.4
TEDS2004P 314 48% 49% TEDS2008P (53%) (47%) TEDS2012 201 47% 43% 10%

Table 8.4 Distribution of vote preferences among vote-choice nonrespondents after multiple imputation (%)

Data file TEDS2004P TEDS2008P TEDS2012 47.92 (0.01) 48.72 (0.01) 47.33 (0.03) 52.67 (0.03) 43.53 (0.05) 46.92 (0.05) - 9.55 (0.02) (N) 314 240 201

2004 2008 2012 8.5 8.5 (1) (2) (3) 8.2
8.5 TEDS2004P 2008 2012 TEDS2012 (0.63) 10 0.01 8.5 8.2 2012 0.75 8.94% TEDS2008P 7.33% TEDS2004P 2004 2008

Table 8.5 Comparison of estimated vote shares before and after multiple imputation (%)

Year 2004 2008 2012; Party, Before imputation, After imputation, Official count (repeated for each year)
45.41 45.61 (0.004) 54.59 51.60 (0.004) 49.89 62.57 53.63 (0.004) 50.11 37.43 39.61 (0.004) 58.45 58.82 56.94 (0.006) 41.55 38.52 39.55 (0.004) - - - - - 2.66 3.51 (0.003) 51.60 45.63

2004 2008 2012 1. 2. 10 100% 2.76
2004 2008 2012

Figure 8.2 Comparison of estimated vote shares before and after multiple imputation

2012

VI. Conclusion and Discussion

(1) (2) TEDS
TEDS (Liu 2010) TEDS 100% (MAR)
(MCAR) (NMAR) MAR R Stata MICE Amelia II MICE TEDS2012 TEDS
References

I. Chinese-language sources

1995 2008 2005 2008 (III) 2008 NSC 96-2420-H-002-025 ------ 2012 2009 2012 (3/3) 2012 NSC 100-2420-H-002-030 2004 2000 11(2): 111-131 2003 LISREL 2008 10: 19-39 1997 3: 75-106 2008 2005 2008 (IV) 2008 NSC 96-2420-H-004-017 2004 2002 2004 (III) NSC 92-2420-H-031-004 2012 2009 2012 (3/3) 2012 NSC 100-2420-H-002-030 2012 59(1): 1-32 2011 47: 143-178 2004 2002 2004 (IV) NSC 93-2420-H-004-005-SSS

II. English-language sources

Berinsky, Adam J. 2008. Survey Non-response. In The Sage Handbook of Public Opinion Research, eds. Wolfgang Donsbach and Michael W. Traugott. Los Angeles: Sage.

Chen, Yinzhong, and Jun Shao. 1999. Inference with Survey Data Imputed by Hot Deck When Imputed Values are Nonidentifiable. Statistica Sinica 9: 361-384.

Dempster, A. P., N. M. Laird, and Donald B. Rubin. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society 39(1): 1-38.

Dillman, Don A., John L. Eltinge, Robert M. Groves, and Roderick J. A. Little. 2002. Survey Nonresponse in Design, Data Collection, and Analysis. In Survey Nonresponse, eds. Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little. New York: John Wiley & Sons.

Filion, F. L. 1976. Estimating Bias Due to Nonresponse in Mail Surveys. The Public Opinion Quarterly 39(4): 482-492.

Fix, Evelyn, and J. L. Hodges. 1951. Discriminatory Analysis-Nonparametric Discrimination: Consistency Properties. Project 21-49-004, Report No. 4, US Air Force School of Aviation Medicine, Randolph Field.

Hair, Joseph F., Jr., William C. Black, Barry J. Babin, and Rolph E. Anderson. 2010. Multivariate Data Analysis: A Global Perspective, 7th ed. New Jersey: Prentice Hall.

Honaker, James, Gary King, and Matthew Blackwell. 2009. Amelia Software Web Site. http://gking.harvard.edu/amelia (accessed April 24, 2009).

Honaker, James, Gary King, and Matthew Blackwell. 2011. Amelia II: A Program for Missing Data. Journal of Statistical Software 45(7): 1-47.

Imai, Kosuke, Gary King, and Olivia Lau. 2004. Zelig: Everyone's Statistical Software. http://gking.harvard.edu/zelig

Kaufman, Leonard, and Peter J. Rousseeuw. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons.

Little, Roderick J. A., and Donald B. Rubin. 2002. Statistical Analysis with Missing Data, 2nd ed. New York: John Wiley & Sons.

Liu, Frank C. S. 2010. Reconstruct Partisan Support Distribution with Multiply Imputed Survey Data: A Case Study of Taiwan's 2008 Presidential Election. Survey Research 24: 135-162.

Lohr, Sharon L. 2010. Sampling: Design and Analysis, 2nd ed. MA: Duxbury Press.

Marker, David A., David R. Judkins, and Marianne Winglee. 2002. Large-Scale Imputation for Complex Surveys. In Survey Nonresponse, eds. Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little. New York: John Wiley & Sons.

Mason, Robert, Virginia Lesser, and Michael W. Traugott. 2002. Effects of Item Nonresponse on Nonresponse Error and Inference. In Survey Nonresponse, eds. Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little. New York: John Wiley & Sons.

Peugh, James L., and Craig K. Enders. 2004. Missing Data in Educational Research: A Review of Reporting Practices and Suggestions for Improvement. Review of Educational Research 74(4): 525-556.

Rubin, Donald B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons.

Rubin, Donald B., and Elaine Zanutto. 2002. Using Matched Substitutes to Adjust for Nonignorable Nonresponse through Multiple Imputations. In Survey Nonresponse, eds. Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J. A. Little. New York: John Wiley & Sons.

Tanner, Martin A., and Wing Hung Wong. 1987. The Calculation of Posterior Distributions by Data Augmentation (with Discussion). Journal of the American Statistical Association 82: 528-550.