Hypothesis Testing - review

The null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).

Type I error: rejecting $H_0$ when $H_0$ is true.
Type II error: failing to reject $H_0$ when $H_1$ is true ($H_0$ is false).

A test is a decision rule that states when to reject $H_0$ based on a statistic. A statistic is a quantity that can be computed from the data.

The size of a test is the largest Type I error probability for the test. When the size of a test is less than or equal to $\alpha$, we say that the test is of level $\alpha$.

We cannot require both Type I error probabilities and Type II error probabilities to be arbitrarily small.

Choosing $H_0$.

Example 1.
1. Suppose that the weight of a can of Coke (in ounces) is $N(31.2, 0.4^2)$ distributed under the conventional production process.
2. Suppose that some changes have been made in the production process and the weight distribution is now $N(\mu, 0.4^2)$.
3. Check the weights of the next 16 cans of Coke. Sample mean $\bar{X}$: 31.02.
4. We want to know whether $\mu < 31.2$.
5. We want to control the probability that we deduce $\mu < 31.2$ from the sample when in fact $\mu \ge 31.2$.
Which statement should be $H_0$: $\mu < 31.2$ or $\mu \ge 31.2$?

One sample z test.
Data: $(X_1, \ldots, X_n)$, a random sample from $N(\mu, \sigma^2)$; $\sigma$ is known.
Three testing problems. $H_1$: $\mu \ne \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$.
Test statistic: $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma$.
At level $\alpha$, the one sample z test rejects
$H_0: \mu = \mu_0$ if $|Z| > z_{\alpha/2}$;
$H_0: \mu \le \mu_0$ if $Z > z_{\alpha}$;
$H_0: \mu \ge \mu_0$ if $Z < -z_{\alpha}$.
Here $z_{\alpha}$ denotes the upper $\alpha$ quantile of $N(0, 1)$, that is, $P(N(0, 1) > z_{\alpha}) = \alpha$.

Example 2. Suppose that Assumptions 1-3 hold in Example 1. Can we conclude that $\mu < 31.2$ at level 0.05?
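The question in Example 2 can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `one_sample_z_test` and its parameter names are hypothetical, and it uses only the Python standard library (`statistics.NormalDist`, Python 3.8 or later). It computes $Z = \sqrt{n}(\bar{X} - \mu_0)/\sigma$ and compares it with $-z_{0.05}$ for $H_0: \mu \ge 31.2$ versus $H_1: \mu < 31.2$. It also returns the matching normal tail probability.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n, alternative):
    """Return (Z, tail probability) for the one sample z test, sigma known.

    alternative is "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0)
    or "less" (H1: mu < mu0). The names here are illustrative only.
    """
    z = sqrt(n) * (xbar - mu0) / sigma
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "two-sided":
        tail = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        tail = 1 - std_normal.cdf(z)
    elif alternative == "less":
        tail = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, tail


# Example 2: n = 16, sample mean 31.02, sigma = 0.4, H1: mu < 31.2.
alpha = 0.05
z, p = one_sample_z_test(xbar=31.02, mu0=31.2, sigma=0.4, n=16, alternative="less")
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha quantile of N(0, 1)
reject = z < -z_alpha                      # rejection rule for H0: mu >= mu0

print(f"Z = {z:.4f}, -z_alpha = {-z_alpha:.4f}, reject H0: {reject}")
print(f"tail probability P(N(0,1) < Z) = {p:.4f}")
```

Under these assumptions the observed $Z$ is about $-1.8$, which is below $-z_{0.05} \approx -1.645$, so the sketch reports that $H_0: \mu \ge 31.2$ is rejected at level 0.05.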
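When the rejection rule for a test at every level $\alpha$ can be rewritten as
xxx $< \alpha$,
then xxx is the p-value of the test.

If p-value $< \alpha$, then the test can reject $H_0$ at level $\alpha$.
If p-value $> \alpha$, then the test cannot reject $H_0$ at level $\alpha$.

Suppose that a test rejects $H_0$ at level $\alpha$
whenever $|Z| > z_{\alpha/2}$: p-value $= 2P(N(0, 1) > |\text{observed } Z|)$;
whenever $Z > z_{\alpha}$: p-value $= P(N(0, 1) > \text{observed } Z)$;
whenever $Z < -z_{\alpha}$: p-value $= P(N(0, 1) < \text{observed } Z)$.

Interpreting the p-value.
p-value < 0.1: some evidence against $H_0$.
p-value < 0.05: strong evidence against $H_0$.
p-value < 0.01: very strong evidence against $H_0$.
p-value < 0.001: extremely strong evidence against $H_0$.

Coke problem.
Under the normal production process, the weight of each can of Coke (in ounces) is $N(31.2, 0.4^2)$ distributed.
Suppose that the weight of each newly produced can is $N(\mu, 0.4^2)$ distributed.
Sample: the weights of 16 newly produced cans. Sample mean $\bar{X}$: 31.38.
We want to know whether there is strong evidence that the current process fills the cans with more Coke ($\mu > 31.2$).

Sol. Use the one sample z test for $H_0: \mu \le 31.2$ versus $H_1: \mu > 31.2$, and use the p-value to judge whether there is strong evidence for $H_1: \mu > 31.2$. The one sample z test rejects $H_0: \mu \le 31.2$ at level $\alpha$ when
$Z \overset{\text{def}}{=} \sqrt{16}(\bar{X} - 31.2)/0.4 > z_{\alpha}$,
and the observed $Z$ statistic is $\sqrt{16}(31.38 - 31.2)/0.4 = 1.8$. The p-value is therefore
$P(N(0, 1) > 1.8) = 0.5 - 0.4641 = 0.0359$.
Since $0.0359 < 0.05$, there is strong evidence that the current process fills the cans with more Coke.

One sample t test.
Data: $(X_1, \ldots, X_n)$, a random sample from $N(\mu, \sigma^2)$; $\sigma$ is unknown.
Three testing problems. $H_1$: $\mu \ne \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$.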
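Test statistic: $T = \sqrt{n}(\bar{X} - \mu_0)/S$, where $S$ is the sample standard deviation.
At level $\alpha$, the one sample t test rejects
$H_0: \mu = \mu_0$ if $|T| > t_{\alpha/2, n-1}$;
$H_0: \mu \le \mu_0$ if $T > t_{\alpha, n-1}$;
$H_0: \mu \ge \mu_0$ if $T < -t_{\alpha, n-1}$.

Coke problem.
Under the normal production process, the weight of each can of Coke (in ounces) is $N(31.2, \sigma^2)$ distributed, with $\sigma$ unknown.
Suppose that the weight of each newly produced can is $N(\mu, \sigma^2)$ distributed.
Sample: the weights of 16 newly produced cans. The sample mean $\bar{X}$ is 31.38 and the sample standard deviation $S$ is 0.4.
We want to know whether we can conclude at significance level 0.05 that the current process fills the cans with more Coke ($\mu > 31.2$).

Sol. Use the one sample t test for $H_0: \mu \le 31.2$ versus $H_1: \mu > 31.2$. The one sample t test rejects $H_0: \mu \le 31.2$ at level $\alpha$ when
$T = \sqrt{16}(\bar{X} - 31.2)/S > t_{\alpha, 16-1} = t_{\alpha, 15}$.
Here $\alpha = 0.05$ and $t_{\alpha, 15} = t_{0.05, 15} = 1.753$, and the observed $T$ statistic is
$\sqrt{16}(31.38 - 31.2)/0.4 = 1.8 > t_{0.05, 15}$.
Therefore we can conclude at significance level 0.05 that the current process fills the cans with more Coke.

Approximate z test for a population proportion.
Data: $(X_1, \ldots, X_n)$, a random sample from $\mathrm{Bin}(1, p)$.
Three testing problems. $H_1$: $p \ne p_0$, $p > p_0$, or $p < p_0$.
Test statistic: $Z_n = \sqrt{n}(\bar{X} - p_0)/\sqrt{p_0(1 - p_0)}$.
At level $\alpha$, the test rejects
$H_0: p = p_0$ if $|Z_n| > z_{\alpha/2}$;
$H_0: p \le p_0$ if $Z_n > z_{\alpha}$;
$H_0: p \ge p_0$ if $Z_n < -z_{\alpha}$.

Is the jury selection fair? (Problem taken from 看漫畫學統計.)
Jury members are selected from the eligible citizens.
Half of the eligible citizens are African American.
Of the 80 jury members selected, 4 are African American.
We want to know whether there is strong evidence that the jury selection is unfair (the probability of selecting an African American member is $p < 0.5$).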
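The numbers that the jury selection problem calls for can be checked directly. The sketch below is an illustration rather than part of the original notes: the variable names are hypothetical, and only the Python standard library is used. It evaluates the approximate z statistic $Z_n$ defined above for $p_0 = 0.5$ with its lower normal tail, and, as a cross-check, the exact binomial tail probability $P(\mathrm{Bin}(80, 0.5) \le 4)$.

```python
from math import comb, erfc, sqrt

n = 80      # number of jury members selected
x = 4       # observed number of African American members
p0 = 0.5    # boundary value of p under H0: p >= 0.5

# Approximate z statistic: Z_n = sqrt(n) (Xbar - p0) / sqrt(p0 (1 - p0)).
z_n = sqrt(n) * (x / n - p0) / sqrt(p0 * (1 - p0))

# Lower tail P(N(0,1) < z_n). erfc keeps precision far out in the tail,
# where 1 + erf(...) would lose almost all significant digits.
p_approx = 0.5 * erfc(-z_n / sqrt(2))

# Exact lower tail P(Bin(n, p0) <= x), summed term by term.
p_exact = sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x + 1))

print(f"Z_n = {z_n:.6f}")
print(f"approximate p-value = {p_approx:.3e}")
print(f"exact binomial tail = {p_exact:.6e}")
```

Both tail probabilities come out far below 0.001, so the two calculations lead to the same conclusion about $H_0: p \ge 0.5$.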
Sol 1. Let $\bar{X}$ be the proportion of African American members among the 80 selected jury members, so that $80\bar{X}$ is their number. Use the approximate z test for
$H_0: p \ge 0.5$ versus $H_1: p < 0.5$
and use the p-value to judge whether there is strong evidence that the jury selection is unfair. The approximate z test rejects $H_0: p \ge 0.5$ at significance level $\alpha$ when
$Z_n = \sqrt{80}(\bar{X} - 0.5)/\sqrt{0.5(1 - 0.5)} < -z_{\alpha}$.
The observed value is
$Z_n = \sqrt{80}(4/80 - 0.5)/\sqrt{0.5(1 - 0.5)} = -8.049845$,
so
p-value $= P(N(0, 1) < -8.049845) < P(N(0, 1) < -3.09) = 0.001$.
Since the p-value is below 0.001, there is very strong evidence that the jury selection is unfair.

Sol 2. Use an exact test for $H_0: p \ge 0.5$ versus $H_1: p < 0.5$ based on $\bar{X}$, and compute the p-value to see whether there is strong evidence against $H_0$. The exact test rejects $H_0$ at level $\alpha$ whenever $80\bar{X} \le C_{\alpha}$, where $C_{\alpha}$ is the largest integer such that
$P(\mathrm{Bin}(80, 0.5) \le C_{\alpha}) \le \alpha$.
Let
$p_v = P(\mathrm{Bin}(80, 0.5) \le \text{observed } 80\bar{X}) = P(\mathrm{Bin}(80, 0.5) \le 4)$.
Then the test can reject $H_0$ at level $\alpha$ if $\alpha > p_v$, and cannot reject $H_0$ at level $\alpha$ if $\alpha < p_v$. Since the p-value is
$p_v = P(\mathrm{Bin}(80, 0.5) \le 4) = 1.378894 \times 10^{-18} < 0.001$,
there is very strong evidence that the selection is unfair ($p < 0.5$).

Testing for $\sigma$.
Suppose that $(X_1, \ldots, X_n)$ is a random sample from $N(\mu, \sigma^2)$. Let $S$ be the sample standard deviation. Consider the testing problem
$H_0: \sigma \le \sigma_0$ vs. $H_1: \sigma > \sigma_0$.
Consider the test that rejects $H_0$ at level $a$ whenever
$(n - 1)S^2/\sigma_0^2 > k_{a, n-1}$,
where $k_{a, n-1}$ is the $(1 - a)$ quantile of $\chi^2(n - 1)$. Then the test is of size $a$.

Example 3. Suppose that we have a random sample of size 61 from $N(\mu, \sigma^2)$ and the sample standard deviation is 11. Can we conclude that $\sigma > 10$ at level 0.05?

Sol. Use the test that rejects $H_0: \sigma \le 10$ at level $a$ whenever
$(n - 1)S^2/10^2 > k_{a, n-1}$,
where $k_{a, n-1}$ is the $(1 - a)$ quantile of $\chi^2(n - 1)$. The observed test statistic is
$(n - 1)S^2/10^2 = (61 - 1) \cdot 11^2/10^2 = 72.6$
and $k_{a, n-1} = k_{0.05, 60} = 79.082 > 72.6$, so we cannot conclude that $\sigma > 10$ at level 0.05.
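The conclusion of Example 3 can be cross-checked with a p-value instead of the tabulated quantile, since the statistic exceeds $k_{a, n-1}$ exactly when $P(\chi^2(n-1) > \text{observed statistic}) < a$. The following is a minimal sketch, not part of the original notes. The helper `chi2_sf_even_df` is a hypothetical name, and it relies on the identity $P(\chi^2(2m) > x) = e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$, which holds only for even degrees of freedom (here $n - 1 = 60$).

```python
from math import exp


def chi2_sf_even_df(x, df):
    """P(chi-square(df) > x) for positive even df.

    Uses the finite Poisson-type sum exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    This sketch does not handle odd df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half_x = x / 2
    term = 1.0   # (x/2)^k / k! at k = 0
    total = 0.0
    for k in range(df // 2):
        total += term
        term *= half_x / (k + 1)  # move from k to k + 1
    return exp(-half_x) * total


# Example 3: n = 61, S = 11, H0: sigma <= 10, level a = 0.05.
n, s, sigma0, a = 61, 11.0, 10.0, 0.05
statistic = (n - 1) * s**2 / sigma0**2       # (n - 1) S^2 / sigma0^2
p_value = chi2_sf_even_df(statistic, n - 1)  # P(chi2(60) > statistic)
reject = p_value < a

print(f"statistic = {statistic:.1f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

If SciPy is available, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` would be the more usual way to obtain the tail probability and the quantile $k_{a, n-1}$ for any degrees of freedom.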