Ch 19 Practice Session (1)
Agenda
- When to use nonparametric statistics
- Wilcoxon Rank Sum Test
- Sign Test
- Wilcoxon Signed Rank Sum Test
- Kruskal-Wallis Test
- Friedman Test
- Spearman Rank Correlation Coefficient
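1. When to use nonparametric statistics
Choose the test from the data type:
- Interval: check for normality. If normal (Yes): t, z, ANOVA. If not normal (No): nonparametric tests.
- Nominal: chi-square goodness of fit, contingency table.
- Ordinal: nonparametric tests.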
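1. When to use nonparametric statistics
Design | Parametric test | Nonparametric test | Data | Sample size for the normal approximation
Independent samples | t test | Wilcoxon Rank Sum Test | Ordinal or interval (non-normal) | N > 20 (or each sample > 10)
Matched pairs | Paired t test | Sign Test | Ordinal or interval (non-normal) | N > 20 (or Nd > 10 pairs)
Matched pairs | Paired t test | Wilcoxon Signed Rank Sum Test | Interval (non-normal) | N > 30 (or Nd > 15 pairs)
Independent samples, two or more groups | One-way ANOVA | Kruskal-Wallis Test | Ordinal or interval (non-normal) |
Randomized block design | | Friedman Test | Ordinal or interval (non-normal) |
Correlation | | Spearman Rank Correlation Coefficient | Ordinal or interval (non-normal) |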
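1. Testing procedure
S1: Check whether the data are normally distributed (chi-square goodness of fit).
S2: If the data are not normal, use a nonparametric test (ordinal data go straight to a nonparametric test).
S3: State the hypotheses.
S4: Find the critical point.
S5: Compute the test statistic T (or Z).
S6: Conclusion.
H0: The population locations are the same (analogous to H0: μ1 = μ2).
H1: (i) The locations differ (analogous to H1: μ1 ≠ μ2).
    (ii) Population 1 is located to the right of population 2 (analogous to H1: μ1 > μ2).
    (iii) Population 1 is located to the left of population 2 (analogous to H1: μ1 < μ2).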
2. Wilcoxon Rank Sum Test (Mann-Whitney Test)
The problem characteristics of this test are:
- The problem objective is to compare two populations.
- The data are either ordinal or interval (but not normal).
- The samples are independent.
It is the nonparametric counterpart of the independent-samples t test.
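1. Wilcoxon Rank Sum Test

Small samples (each sample size ≤ 10)
S1: State the hypotheses.
    H0: The two population locations are the same.
    H1: The location of population 1 is different from the location of population 2.
S2: Find the critical values T_L and T_U in the table, where P(T ≤ T_L) = P(T ≥ T_U) = α.
S3: Compute the test statistic T.
S4: Conclusion: reject H0 if T falls at or beyond T_L or T_U.

Large samples (each sample size > 10)
S1: State the hypotheses.
    H0: The two population locations are the same.
    H1: The location of population 1 is different from the location of population 2.
S2: Find the critical value in the z table: z_{α/2} (two-tail) or z_α (one-tail).
S3: Compute the test statistic Z.
S4: Conclusion.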
Example 1
Based on the two samples shown below, can we infer at the 5% significance level that the location of population 1 is to the left of the location of population 2? (Analogous to H1: μ1 < μ2, a left-tail test.)
Sample 1: 22, 23, 20
Sample 2: 18, 27, 26
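Solution 1
S1: H0: The two population locations are the same.
    H1: The location of population 1 is to the left of the location of population 2.
S2: From the table, T_L = 6 and T_U = 15.
S3: Compute T. Rank all six observations from smallest to largest, then sum the ranks of sample 1 to get T1.
    Sample 1: 22 23 20    Ranks: 3 4 2
    Sample 2: 18 27 26    Ranks: 1 6 5
    T1 = 3 + 4 + 2 = 9
    T2 = 1 + 6 + 5 = 12
S4: Conclusion: T_L = 6 < T1 = 9 < T_U = 15, so do not reject H0. The two samples show no difference in location.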
1. Critical values of the Wilcoxon Rank Sum Test
α = .025 for a one-tail test, or α = .05 for a two-tail test

n2 \ n1       3           4           5        ...       10
           TL   TU     TL   TU     TL   TU            TL   TU
  3         6   15
  4         6   18     11   25     17   33     ...    61   89
  5         6   21     12   28     18   37     ...    64   96
 ...
 10         9   33     16   44     24   56     ...    79  131

Using the table: for two given samples of sizes n1 and n2, P(T ≤ T_L) = P(T ≥ T_U) = α.
A similar table exists for α = .05 (one-tail test) and α = .10 (two-tail test).
Example 2 (n ≤ 10)
Use the Wilcoxon rank sum test on the following data to determine whether the two population locations differ. (Use a 10% significance level.)
Sample 1: 15 7 22 20 32 18 26 17 23 30 (n = 10)
Sample 2: 8 27 17 25 20 16 21 17 10 18 (n = 10)
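Solution 2
H0: The two population locations are the same.
H1: The location of population 1 is different from the location of population 2.
Rank the pooled observations from smallest to largest, averaging the ranks of tied values:
    Value   Rank
    7       1
    8       2
    10      3
    15      4
    16      5
    17      (6+7+8)/3 = 7
    18      (9+10)/2 = 9.5
    (the remaining values are ranked the same way)
T1 = 118
Critical values: T_L = 83, T_U = 127.
Since 83 < 118 < 127, there is not enough evidence to infer that the location of population 1 is different from the location of population 2.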
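The ranking step in Examples 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the helper names `average_ranks` and `rank_sums` are made up for this sketch, and tied values are assumed to share the average of the ranks they occupy, as in Solution 2.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def rank_sums(sample1, sample2):
    """Return (T1, T2), the rank sums of the two samples in the pooled ranking."""
    ranks = average_ranks(list(sample1) + list(sample2))
    t1 = sum(ranks[:len(sample1)])
    t2 = sum(ranks[len(sample1):])
    return t1, t2


example1 = rank_sums([22, 23, 20], [18, 27, 26])
example2 = rank_sums([15, 7, 22, 20, 32, 18, 26, 17, 23, 30],
                     [8, 27, 17, 25, 20, 16, 21, 17, 10, 18])
print(example1)  # (9.0, 12.0)
print(example2)  # (118.0, 92.0)
```

The critical values T_L and T_U still come from the printed table; the sketch only automates the ranking and summing.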
Example 3 (n > 10)
Given the following statistics, calculate the value of the test statistic to determine whether the population locations differ. In addition, calculate the p-value. (α = 0.05)
T1 = 250, n1 = 15
T2 = 215, n2 = 15
Solution 3
S1: H0: The two population locations are the same.
    H1: The location of population 1 is different from the location of population 2.
S2: z_{α/2} = 1.96, so reject H0 if z > 1.96 or z < −1.96.
S3:
$$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{15(15 + 15 + 1)}{2} = 232.5$$
$$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(15)(15)(15 + 15 + 1)}{12}} = 24.11$$
$$z = \frac{T - E(T)}{\sigma_T} = \frac{250 - 232.5}{24.11} = 0.73$$
p-value = 2P(Z > 0.73) = 2(0.5 − 0.2673) = 0.4654
S4: z = 0.73 lies between −1.96 and 1.96, so do not reject H0.
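The arithmetic in Solution 3 can be checked with a short Python sketch. The function name `rank_sum_z` is hypothetical, and the p-value uses the standard library's `statistics.NormalDist` instead of a printed z table, so it comes out slightly different from the table-based 0.4654 (z is not rounded to 0.73 first).

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(t, n1, n2):
    """Normal approximation for the rank sum T of the sample of size n1."""
    expected = n1 * (n1 + n2 + 1) / 2            # E(T)
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # sigma_T
    z = (t - expected) / sigma
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z)))
    return expected, sigma, z, p_two_tail


expected, sigma, z, p_value = rank_sum_z(250, 15, 15)
print(f"E(T) = {expected}, sigma_T = {sigma:.2f}, z = {z:.2f}, p = {p_value:.4f}")
```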
2. Sign Test
- The objective is to compare two populations.
- The data are either ordinal or interval (but not normal).
- The samples are matched pairs (the nonparametric counterpart of the paired t test).
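2. The Sign Test
S1: State the hypotheses.
    H0: The two population locations are the same (H0: p = .5).
    H1: The two population locations are different (H1: p ≠ .5).
S2: Critical point: z_α (one-tail) or z_{α/2} (two-tail).
S3: Compute the test statistic
$$z = \frac{x - 0.5n}{0.5\sqrt{n}}, \quad n \ge 10,$$
    where x is the number of positive differences and n is the number of pairs, excluding pairs whose difference is 0.
S4: Conclusion.

The binomial variable can be approximated by a normal variable if np and n(1 − p) are both at least 5. With p = .5 under H0, the Z statistic becomes
$$z = \frac{x - np}{\sqrt{np(1 - p)}} = \frac{x - .5n}{\sqrt{n(.5)(.5)}} = \frac{x - .5n}{.5\sqrt{n}}, \quad n \ge 10.$$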
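Example 4
Two judges each scored 12 beauty-pageant contestants on a subjective 0 to 10 scale. The scores are shown below (α = 5%).
Judge I:  5 6 10 7 0 9 7 10 9 6 9 9
Judge II: 4 1 7 5 8 5 5 6 8 10 5 4
Use the sign test to determine whether the two judges' scores for the 12 contestants differ significantly.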
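Solution 4
S1: H0: The two judges' scores do not differ (p = 0.5).
    H1: The two judges' scores differ (p ≠ 0.5).
S2: Rejection region: z > 1.96 or z < −1.96.
S3: Compute Z.
    Judge I:   5 6 10 7 0 9 7 10 9 6 9 9
    Judge II:  4 1 7 5 8 5 5 6 8 10 5 4
    Sign:      + + + + - + + + + - + +
    There are 10 positive and 2 negative differences, so x = 10 and n = 12 (no zero differences).
$$z = \frac{x - 0.5n}{0.5\sqrt{n}} = \frac{10 - 12(0.5)}{\sqrt{12(0.5)(0.5)}} = 2.3094 > 1.96$$
    Here x is the number of positive differences and n is the number of pairs excluding zero differences (n ≥ 10).
S4: Reject H0; the two judges' scores differ.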
Example 5
Suppose that in a matched pairs experiment we find 28 positive differences, 7 zero differences, and 41 negative differences. Can we infer at the 10% significance level that the location of population 1 is to the left of the location of population 2?
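Solution 5
H0: The two population locations are the same.
H1: The location of population 1 is to the left of the location of population 2 (left-tail test).
Positive differences: 28; negative differences: 41; zero differences: 7 (excluded), so n = 28 + 41 = 69.
$$z = \frac{x - 0.5n}{0.5\sqrt{n}} = \frac{28 - 0.5(69)}{0.5\sqrt{69}} = -1.57$$
For a left-tail test at α = 0.10 the rejection region is z < −z_{0.10} = −1.28. Since −1.57 < −1.28, reject H0.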
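The two sign-test examples above can be reproduced with a small Python sketch. The helper names `count_signs` and `sign_test_z` are illustrative, not from any library, and the left-tail p-value is computed with `statistics.NormalDist` on the assumption that the normal approximation applies (n ≥ 10).

```python
from math import sqrt
from statistics import NormalDist


def count_signs(first, second):
    """Count positive and negative paired differences (first - second); zeros are dropped."""
    diffs = [a - b for a, b in zip(first, second)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    return n_pos, n_neg


def sign_test_z(n_pos, n_neg):
    """z = (x - 0.5n) / (0.5 sqrt(n)), with x = n_pos and n = number of nonzero differences."""
    n = n_pos + n_neg
    return (n_pos - 0.5 * n) / (0.5 * sqrt(n))


judge1 = [5, 6, 10, 7, 0, 9, 7, 10, 9, 6, 9, 9]
judge2 = [4, 1, 7, 5, 8, 5, 5, 6, 8, 10, 5, 4]
z4 = sign_test_z(*count_signs(judge1, judge2))  # judges example (two-tail)
z5 = sign_test_z(28, 41)                        # 28 positive, 41 negative, 7 zeros excluded
p5_left = NormalDist().cdf(z5)                  # left-tail p-value for the second example
print(f"z4 = {z4:.4f}, z5 = {z5:.2f}, left-tail p = {p5_left:.3f}")
```

Comparing `p5_left` with α is equivalent to comparing z with the critical value from the z table.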
3. Wilcoxon Signed Rank Sum Test
This test is used when
- the problem objective is to compare two populations,
- the data are interval but not normal,
- the samples are matched pairs.
The test statistic and sampling distribution
- T is based on the rank sums of the absolute values of the positive and negative differences.
- When n ≤ 30, reject H0 if T > T_U or T < T_L (T_L and T_U are tabulated values that depend on n, the number of nonzero differences).
- When n > 30, T is approximately normally distributed. Use a Z test with
$$E(T) = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
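1. Wilcoxon Signed Rank Sum Test

Small samples (n ≤ 30)
S1: State the hypotheses.
    H0: The two population locations are the same.
    H1: The location of population 1 is different from the location of population 2.
S2: Find the critical values T_L and T_U in the table, where P(T ≤ T_L) = P(T ≥ T_U) = α.
S3: Compute the test statistic T.
S4: Conclusion.

Large samples (n > 30)
S1: State the hypotheses.
    H0: The two population locations are the same.
    H1: The location of population 1 is different from the location of population 2.
S2: Find the critical value in the z table: z_{α/2} (two-tail) or z_α (one-tail).
S3: Compute the test statistic Z, using
$$E(T) = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$
S4: Conclusion.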
Example 5 (n ≤ 30)
Perform the Wilcoxon signed rank sum test to determine whether the location of population 1 differs from the location of population 2 given the data shown here. (Use α = .05)

Pair       1     2     3     4     5     6
Sample 1   18.2  14.1  24.5  11.9  9.5   12.1
Sample 2   18.2  14.1  23.6  12.1  9.5   11.3

Pair       7     8     9     10    11    12
Sample 1   10.9  16.7  19.6  8.4   21.7  23.4
Sample 2   9.7   17.6  19.4  8.1   21.9  21.6
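Solution 5
H0: The two population locations are the same.
H1: The location of population 1 is different from the location of population 2.
Rejection region: T < T_L = 14 or T > T_U = 64.
Rank the absolute values of the nonzero differences, averaging the ranks of ties: the three smallest share rank (1+2+3)/3 = 2, and the two tied for ranks 6 and 7 share (6+7)/2 = 6.5.
T+ = 34.5
Since 14 < 34.5 < 64, do not reject H0.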
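As a cross-check of T+ = 34.5, the signed-rank computation can be sketched in Python. The function name `signed_rank_sums` is hypothetical. Differences are rounded before comparison (an assumption of this sketch) because binary floating point would otherwise treat values such as 24.5 − 23.6 and 17.6 − 16.7 as unequal and break the ties.

```python
def signed_rank_sums(sample1, sample2, decimals=6):
    """Return (T+, T-, n) for matched pairs; zero differences are dropped."""
    # Round so that differences equal on paper are also equal as floats.
    diffs = [round(a - b, decimals) for a, b in zip(sample1, sample2)]
    nonzero = [d for d in diffs if d != 0]
    abs_sorted = sorted(abs(d) for d in nonzero)
    # Average rank for each distinct absolute difference (ties share a rank).
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1
        count = abs_sorted.count(value)
        rank_of[value] = first + (count - 1) / 2
    t_plus = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return t_plus, t_minus, len(nonzero)


sample1 = [18.2, 14.1, 24.5, 11.9, 9.5, 12.1, 10.9, 16.7, 19.6, 8.4, 21.7, 23.4]
sample2 = [18.2, 14.1, 23.6, 12.1, 9.5, 11.3, 9.7, 17.6, 19.4, 8.1, 21.9, 21.6]
t_plus, t_minus, n_nonzero = signed_rank_sums(sample1, sample2)
print(t_plus, t_minus, n_nonzero)  # 34.5 10.5 9
```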
Example 6 (n > 30)
A matched pairs experiment produced the following statistics. Conduct a Wilcoxon signed rank sum test to determine whether the location of population 1 is to the right of the location of population 2. (Use α = .01)
T+ = 3457, T− = 2429, n = 108
Solution 6
H0: The two population locations are the same.
H1: The location of population 1 is to the right of the location of population 2 (right-tail test).
T+ = 3457, T− = 2429, n = 108
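The statistics of Example 6 can be plugged into the large-sample formulas E(T) = n(n+1)/4 and σ_T = [n(n+1)(2n+1)/24]^(1/2). The Python sketch below does so under two assumptions that the slides do not state explicitly: the test statistic is taken to be T = T+, and the one-tail critical value comes from `statistics.NormalDist` rather than a printed table. The name `signed_rank_z` is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_z(t, n):
    """Normal approximation for the signed rank sum T with n nonzero differences."""
    expected = n * (n + 1) / 4                     # E(T)
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # sigma_T
    return expected, sigma, (t - expected) / sigma


expected, sigma, z = signed_rank_z(3457, 108)      # T = T+ (assumed)
critical = NormalDist().inv_cdf(1 - 0.01)          # right-tail critical value at alpha = .01
print(f"E(T) = {expected}, sigma_T = {sigma:.2f}, z = {z:.2f}, critical = {critical:.2f}")
print("reject H0" if z > critical else "do not reject H0")
```

With these inputs z is about 1.58, below the critical value of about 2.33, so under the stated assumptions H0 would not be rejected.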
4. Kruskal-Wallis Test
The problem characteristics for this test are:
- The problem objective is to compare two or more populations (used when there are more than two samples).
- The data are either ordinal or interval but not normal.
- The samples are independent.
The hypotheses are
H0: The locations of all k populations are the same.
H1: At least two population locations differ.
(The test only asks whether the k groups are alike; it does not test which group is larger.)
4. Kruskal-Wallis Test
S1: State the hypotheses.
    H0: The locations of the 3 populations are the same.
    H1: At least two locations are different.
S2: Critical point: reject H0 if
$$H > \chi^2_{\alpha,\,k-1}$$
S3: Compute H.
    Rank the data from 1 (smallest) to n (largest).
    Calculate the rank sums T_1, T_2, ..., T_k for all k samples.
$$H = \left[\frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j}\right] - 3(n+1)$$
S4: Conclusion.
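The H statistic above can be sketched in Python as follows. The function name `kruskal_wallis_h` and the three groups are hypothetical, made up purely for illustration and unrelated to any example in these slides. The critical value 5.99 is the chi-square value for α = 0.05 with k − 1 = 2 degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Return (H, rank_sums) for a list of independent samples."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)

    def avg_rank(value):
        # Tied values share the average of the ranks they occupy.
        first = pooled.index(value) + 1
        return first + (pooled.count(value) - 1) / 2

    rank_sums = [sum(avg_rank(x) for x in g) for g in groups]
    h = (12 / (n * (n + 1))
         * sum(t ** 2 / len(g) for t, g in zip(rank_sums, groups))
         - 3 * (n + 1))
    return h, rank_sums


# Hypothetical data: three independent samples of four observations each.
groups = [[27, 31, 35, 40], [22, 25, 29, 33], [18, 20, 24, 26]]
h, sums = kruskal_wallis_h(groups)
critical = 5.99  # chi-square, alpha = 0.05, df = k - 1 = 2
print(f"rank sums = {sums}, H = {h:.2f}")
print("reject H0" if h > critical else "do not reject H0")
```

For this made-up data the rank sums are 39, 26, and 13, giving H = 6.5, which exceeds 5.99.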
Example 7
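Solution 7
Start from the smallest value, 12, and assign ranks, averaging the ranks of ties:
    12: (1+2)/2 = 1.5
    17: 3
    18: (4+5)/2 = 4.5
$$H = \left[\frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j}\right] - 3(n+1) = 3.03$$
H = 3.03 is below the critical value 5.99, so do not reject H0.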
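Advantages and disadvantages of nonparametric methods
Nonparametric statistics are used when the population distribution is unknown and the sample size is not very large.
Advantages:
- They make few assumptions about the population; no particular distribution has to be assumed.
- They are better suited to inference from small samples or from skewed populations.
- They can analyze ordinal data.
Disadvantages:
- The power of the test (1 − β) is lower.
- Some more complex models, such as multifactor designs with interactions, cannot be tested.
- The procedures are not uniform across tests (and can be hard to compute).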