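Two-sample problems: comparing two population means (continuous data) or two population proportions (binary data).
Continuous data, two population means. Q: Is there a difference between men and women in the mean of a continuous variable?
Binary data, two population proportions. Q: Is the "all pass" rate different for men and women? Each subject's outcome is binary, Y ∈ {all pass, not all pass}, coded S (success, all pass) or F (failure). Is the proportion of S different for men and women?
Two populations, two independent samples: e.g., men vs. women.
Dependent (not independent) samples:
1. Before-after: the same subjects are measured before and after, giving two sets of data, e.g., to see whether a measurement decreases. Q: Is the mean difference 0?
2. Match-pair: similar subjects are matched into pairs, e.g., to compare two treatments. Q: Is the mean difference 0?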
Goals:
PART I. Two independent samples
A: Comparing population means (continuous data)
Case I. Normal populations + known variances: z-test.
Case II. Large sample sizes: z-test.
Case III. Normal populations + unknown equal variances: t-test.
B: Comparing population proportions (binary data)
Large sample sizes: z-test.
PART II. Two dependent samples
Testing the population mean difference: the difference D forms a single population.
Normal + unknown variance: t-test (continuous data).
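Recall statistical inference for a single population: inference about the population mean $\mu$.
1. Point estimate: $\bar{x}$.
2. Sampling distribution of $\bar{x}$: mean $\mu$, variance $\sigma^2/n$, standard error $SE(\bar{x}) = \sigma/\sqrt{n}$. Distribution: normal, approximately normal, or t-distribution. A standardized statistic:
$$Z \text{ (or } t) = \frac{\bar{x} - \mu}{SE(\bar{x})} \sim N(0,1) \text{ or } t_{(n-1)}.$$
3. Construct a confidence interval for $\mu$ based on this statistic:
$$\bar{x} \pm z_{\alpha/2}\, SE(\bar{x}) \quad \text{or} \quad \bar{x} \pm t_{n-1,\alpha/2}\, SE(\bar{x}).$$
Why? How?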
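4. Testing $H_0: \mu = \mu_0$.
Test statistic, based on the standardized statistic:
$$Z \text{ (or } t) = \frac{\bar{x} - \mu_0}{SE(\bar{x})} = \frac{\text{point estimate} - \mu_0}{SE(\text{point estimate})}.$$
Rejection region: based on the sampling distribution. Under $H_0$, $\dfrac{\bar{x} - \mu_0}{SE(\bar{x})} \sim N(0,1)$ or $t_{(n-1)}$.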
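Now, for comparing two populations: inference about the difference of population means, $\mu_1 - \mu_2$.
1. Point estimate: $\bar{x}_1 - \bar{x}_2$.
2. Sampling distribution of $\bar{x}_1 - \bar{x}_2$: mean $\mu_1 - \mu_2$; standard error $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. Distribution: normal, approximately normal, or t-distribution. A standardized statistic:
$$Z \text{ (or } t) = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)} \sim N(0,1) \text{ or } t_{(\nu)}.$$
If the population variances are equal, a pooled estimate of the common variance is used.
3. Construct a confidence interval for $\mu_1 - \mu_2$ based on this statistic:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\ (\text{or } t_{\nu,\alpha/2})\ SE(\bar{x}_1 - \bar{x}_2).$$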
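4. Testing $H_0: \mu_1 - \mu_2 = D_0$.
Test statistic, based on the standardized statistic:
$$\frac{(\bar{x}_1 - \bar{x}_2) - D_0}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{\text{point estimate} - D_0}{SE(\text{point estimate})}$$
Rejection region: based on the sampling distribution. Under $H_0$, $\dfrac{\text{point estimate} - D_0}{SE(\text{point estimate})} \sim\ ?$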
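Steps 1 to 4 can be sketched in Python for the z-based cases. This is a minimal sketch, assuming the standard deviations are either known population values (Case I) or sample s.d.s from large samples (Case II); the function name `diff_means_z` and the input numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_z(xbar1, xbar2, sd1, sd2, n1, n2, d0=0.0, alpha=0.05):
    """z-based inference for mu1 - mu2 (hypothetical helper).

    sd1, sd2: population s.d.s (Case I) or sample s.d.s with n1, n2 >= 30 (Case II).
    Returns the point estimate, its SE, a 100(1 - alpha)% CI, and z for H0: mu1 - mu2 = d0.
    """
    est = xbar1 - xbar2                           # point estimate of mu1 - mu2
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)          # SE(xbar1 - xbar2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    ci = (est - z_crit * se, est + z_crit * se)   # estimate +/- z_{alpha/2} * SE
    z = (est - d0) / se                           # (point estimate - D0) / SE
    return est, se, ci, z


# Hypothetical inputs, for illustration only
est, se, ci, z = diff_means_z(52.0, 50.0, 6.0, 8.0, 36, 64)
print(est, round(se, 3), [round(c, 3) for c in ci], round(z, 3))
```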
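PART I: Two independent samples
A. Comparing means
Population 1 ~ $(\mu_1, \sigma_1^2)$, population 2 ~ $(\mu_2, \sigma_2^2)$; data: $(n_1, \bar{x}_1, s_1)$ and $(n_2, \bar{x}_2, s_2)$.
Target: $\mu_D = \mu_1 - \mu_2$. Estimate: $\hat{\mu}_D = \bar{x}_1 - \bar{x}_2 \sim\ ?$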
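For independent samples:
1. Mean: $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$.
2. Variance: $Var(\bar{x}_1 - \bar{x}_2) = Var(\bar{x}_1) + Var(\bar{x}_2) = \sigma_1^2/n_1 + \sigma_2^2/n_2$.
3. $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$; if the population variances are unknown, it is estimated by $\sqrt{s_1^2/n_1 + s_2^2/n_2}$.
Distribution of $\hat{\mu}_D = \bar{x}_1 - \bar{x}_2$: ?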
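Sampling distribution of $\hat{\mu}_D$, Case I: normal populations + $\sigma$'s known.
Data: $x_{11}, \dots, x_{1n_1}$ iid $N(\mu_1, \sigma_1^2)$; $x_{21}, \dots, x_{2n_2}$ iid $N(\mu_2, \sigma_2^2)$.
Result: $\bar{x}_1 - \bar{x}_2 \sim N\!\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$, so
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE(\bar{x}_1 - \bar{x}_2)} \sim N(0,1).$$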
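Case II: $n_1, n_2 \ge 30$, $\sigma$'s known or unknown.
Data: $x_{11}, \dots, x_{1n_1}$ iid $(\mu_1, \sigma_1^2)$; $x_{21}, \dots, x_{2n_2}$ iid $(\mu_2, \sigma_2^2)$.
Result (CLT): $\bar{x}_1 - \bar{x}_2 \overset{d}{\approx} N\!\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$, and with the $\sigma$'s replaced by the $s$'s,
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \overset{d}{\approx} N(0,1).$$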
Example (p357~358): What is the sampling distribution of the difference in the mean hourly wage rates of plumbers (P sample) and electricians (E sample)? Assume the population means and standard deviations are known from a census.
P: $\mu_P = 30$, $\sigma_P = 4.5$, variance $(4.5)^2 = 20.25$.
E: $\mu_E = 29$, $\sigma_E = 4.5$, variance $(4.5)^2 = 20.25$.
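Two random samples are taken independently.
P: $n_P = 40 > 30$, so by the CLT, $SE(\bar{x}_P) = \sqrt{\sigma_P^2/n_P} = \sqrt{20.25/40} = \sqrt{0.506} \approx 0.71$, and $\bar{x}_P \overset{d}{\approx} N(30, 0.71^2)$.
E: $n_E = 35 > 30$, so by the CLT, $SE(\bar{x}_E) = \sqrt{20.25/35} = \sqrt{0.579} \approx 0.76$, and $\bar{x}_E \overset{d}{\approx} N(29, 0.76^2)$.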
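Sampling distribution of $\bar{x}_P - \bar{x}_E$:
mean $\mu_P - \mu_E = 30 - 29 = 1$;
variance $\sigma^2 = \sigma_P^2/n_P + \sigma_E^2/n_E = 0.506 + 0.579 = 1.085 \approx (1.04)^2$;
so $\bar{x}_P - \bar{x}_E \overset{d}{\approx} N(1, 1.04^2)$.
See the table of sample differences from repeated samples.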
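The arithmetic above can be checked with a few lines of Python. This sketch simply recomputes $Var(\bar{x}_P)$, $Var(\bar{x}_E)$, and the mean and standard deviation of $\bar{x}_P - \bar{x}_E$ from the example's values; the variable names are our own.

```python
from math import sqrt

# Population values from the example (census): means 30 and 29, common sd 4.5
mu_p, sigma_p, n_p = 30.0, 4.5, 40   # plumbers
mu_e, sigma_e, n_e = 29.0, 4.5, 35   # electricians

var_p = sigma_p**2 / n_p             # Var(xbar_P) = 20.25 / 40
var_e = sigma_e**2 / n_e             # Var(xbar_E) = 20.25 / 35
mean_diff = mu_p - mu_e              # E(xbar_P - xbar_E)
sd_diff = sqrt(var_p + var_e)        # independent samples: the variances add

print(round(var_p, 3), round(var_e, 3), mean_diff, round(sd_diff, 2))
```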
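[Figure: plumbers population ($\mu_P = 30$, $\sigma_P^2 = 20.25$) with samples of size 40; electricians population ($\mu_E = 29$, $\sigma_E^2 = 20.25$) with samples of size 35. Theoretically, $\bar{x}_P - \bar{x}_E \overset{d}{\approx} N(1, 1.04^2)$.]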
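Table: repeated samples of sizes $n_P = 40$ and $n_E = 35$ drawn from the two populations ($\mu_P = 30$, $\mu_E = 29$, $\sigma^2 = 20.25$), used to validate the theoretical result $\bar{x}_P - \bar{x}_E \overset{d}{\approx} N(1, 1.04^2)$.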
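Bonus: using the sample data in the table, find the percentage of values falling in each interval:
Plumbers: $\mu_P \pm k\,SE(\bar{x}_P) = 30 \pm k(0.71)$;
Electricians: $\mu_E \pm k\,SE(\bar{x}_E) = 29 \pm k(0.76)$;
Difference: $(\mu_P - \mu_E) \pm k\,SE(\bar{x}_P - \bar{x}_E) = 1 \pm k(1.04)$;
for k = 1, 2, 3. Draw histograms and compare the percentages with the empirical rule.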
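The bonus exercise can also be approached by simulation. This is a minimal sketch, assuming (for the simulation only) that both wage populations are normal with the stated means and s.d. 4.5, and using 5,000 repeated sample pairs, a number chosen arbitrarily. It records how often the simulated $\bar{x}_P - \bar{x}_E$ falls within $1 \pm k(1.04)$, to be compared with the empirical rule (about 68%, 95%, and 99.7%).

```python
import random
from math import sqrt

# Simulation assumption (not in the slides): both wage populations are normal
# with the stated means and common sd 4.5; the slides only need n >= 30 and the CLT.
random.seed(1)
N_REPS = 5000                                   # hypothetical number of repeated samples
MU_DIFF = 30 - 29                               # theoretical mean of xbar_P - xbar_E
SD_DIFF = sqrt(4.5**2 / 40 + 4.5**2 / 35)       # theoretical sd, about 1.04

diffs = []
for _ in range(N_REPS):
    xbar_p = sum(random.gauss(30, 4.5) for _ in range(40)) / 40   # plumbers, n = 40
    xbar_e = sum(random.gauss(29, 4.5) for _ in range(35)) / 35   # electricians, n = 35
    diffs.append(xbar_p - xbar_e)

coverage = {}
for k in (1, 2, 3):
    # share of simulated differences within MU_DIFF +/- k * SD_DIFF
    coverage[k] = sum(abs(d - MU_DIFF) <= k * SD_DIFF for d in diffs) / N_REPS
    print(f"k = {k}: {coverage[k]:.3f}")
```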
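Testing population means for Case I and Case II: the z-test.
Step 1. State the hypotheses (no difference vs. difference):
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$
$H_0: \mu_1 - \mu_2 \le 0$ vs. $H_1: \mu_1 - \mu_2 > 0$
$H_0: \mu_1 - \mu_2 \ge 0$ vs. $H_1: \mu_1 - \mu_2 < 0$
Step 2. Choose the significance level $\alpha$ (the type I error rate), e.g., $\alpha = 0.01$, $0.05$, or $0.1$.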
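Step 3. Test statistic:
Case I. Normal populations + $\sigma$'s known: $Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - 0}{SE(\bar{x}_1 - \bar{x}_2)} = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
Case II. Sample sizes $n_1, n_2 \ge 30$, $\sigma$'s unknown: $Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$.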
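Step 4. Decision rule: the rejection region is
if $H_1: \mu_1 - \mu_2 \ne 0$: $|Z| \ge z_{\alpha/2}$;
if $H_1: \mu_1 - \mu_2 > 0$: $Z \ge z_\alpha$;
if $H_1: \mu_1 - \mu_2 < 0$: $Z \le -z_\alpha$.
Step 5. Compute Z from the data and draw a conclusion.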
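Alternatively, compute the observed value $z_0$ from the data and its p-value:
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$: p-value $= P(|Z| > |z_0|)$;
$H_0: \mu_1 - \mu_2 \le 0$ vs. $H_1: \mu_1 - \mu_2 > 0$: p-value $= P(Z > z_0)$;
$H_0: \mu_1 - \mu_2 \ge 0$ vs. $H_1: \mu_1 - \mu_2 < 0$: p-value $= P(Z < z_0)$.
Conclusion: reject $H_0$ if the p-value is less than $\alpha$.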
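The three p-value formulas can be written as one small Python function. This is a sketch that uses the standard library's `statistics.NormalDist` for the N(0, 1) c.d.f.; the function name `z_p_value` and the labels for the alternatives are our own choices.

```python
from statistics import NormalDist


def z_p_value(z0, alternative="two-sided"):
    """p-value of an observed z statistic z0, with N(0, 1) as the null distribution."""
    Z = NormalDist()                      # standard normal N(0, 1)
    if alternative == "two-sided":        # H1: mu1 - mu2 != 0
        return 2 * (1 - Z.cdf(abs(z0)))   # P(|Z| > |z0|)
    if alternative == "greater":          # H1: mu1 - mu2 > 0
        return 1 - Z.cdf(z0)              # P(Z > z0)
    if alternative == "less":             # H1: mu1 - mu2 < 0
        return Z.cdf(z0)                  # P(Z < z0)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(round(z_p_value(1.96), 4), round(z_p_value(-1.645, "less"), 4))
```

For example, `z_p_value(1.96)` is about 0.05, matching the familiar two-sided critical value.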
Example (p358).
Q1: Is it reasonable to conclude that the mean checkout time is reduced by using U-Scan?
Q2: What is the p-value?
$\mu_s$: mean checkout time for the standard method; $\mu_u$: mean checkout time for U-Scan. $H_1: \mu_s > \mu_u$, $\alpha = 0.01$.
Data:
Customer type   Mean (minutes)   s      Sample size
Standard        5.50             0.40   50
U-Scan          5.30             0.30   100
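Step 1. State the hypotheses: $H_0: \mu_s \le \mu_u$ vs. $H_1: \mu_s > \mu_u$, or equivalently $H_0: \mu_s - \mu_u \le 0$ vs. $H_1: \mu_s - \mu_u > 0$.
Step 2. Select the significance level: $\alpha = 0.01$.
Step 3. Determine the test statistic: since $n_s, n_u \ge 30$,
$$Z = \frac{\bar{x}_s - \bar{x}_u}{\sqrt{s_s^2/n_s + s_u^2/n_u}}.$$
Step 4. Formulate a decision rule: reject $H_0$ if $z > z_{0.01} = 2.33$.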
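Step 5. Conclusion: the null hypothesis is rejected at $\alpha = 0.01$, since
$$z = \frac{\bar{x}_s - \bar{x}_u}{\sqrt{s_s^2/n_s + s_u^2/n_u}} = \frac{5.50 - 5.30}{\sqrt{0.40^2/50 + 0.30^2/100}} = 3.13 > z_{0.01} = 2.33.$$
Or, using the p-value from Appendix D: $P(Z > 3.13) < P(Z > 3.09) = 0.5 - 0.4990 = 0.001 < 0.01$.
[Figure: standard normal curve with area 0.4990 between 0 and 3.09; the observed z = 3.13 lies beyond 3.09, so the p-value < 0.001.]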
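The U-Scan calculation can be reproduced as a quick check. This sketch uses only the summary statistics from the data table; `statistics.NormalDist` supplies $z_{0.01}$ and the upper-tail probability in place of Appendix D.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from the U-Scan example (minutes)
xbar_s, s_s, n_s = 5.50, 0.40, 50     # standard method
xbar_u, s_u, n_u = 5.30, 0.30, 100    # U-Scan

se = sqrt(s_s**2 / n_s + s_u**2 / n_u)   # Case II: n >= 30, so s replaces sigma
z = (xbar_s - xbar_u) / se               # H0: mu_s - mu_u <= 0
z_crit = NormalDist().inv_cdf(1 - 0.01)  # z_0.01, about 2.33
p_value = 1 - NormalDist().cdf(z)        # upper-tail p-value P(Z > z)

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "do not reject H0")
```

Without intermediate rounding, z comes out at about 3.12 rather than the slide's 3.13; the small gap is due to rounding. Either way z exceeds 2.33 and the p-value is below 0.001.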
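Case III. Normal populations, small sample sizes, unknown equal variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$): the t-test.
Assumptions:
Populations: 1. normal distributions; 2. unknown equal variances. A pooled estimate $s_p^2$ is used for the common variance $\sigma^2$.
Samples: small sample sizes.
Q: Construct a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ in this case:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{(n_1+n_2-2),\alpha/2}\, SE(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm t_{(n_1+n_2-2),\alpha/2} \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$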
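Test statistic (t-test):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{SE(\bar{x}_1 - \bar{x}_2)} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.$$
Under $H_0$, t follows a t-distribution with df $= (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.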
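Variance estimate $s_p^2$? A pooled estimate: a weighted average of $s_1^2$ and $s_2^2$,
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
Why d.f. $= n_1 + n_2 - 2$? There are $n_1 - 1$ d.f. in $s_1^2$ and $n_2 - 1$ d.f. in $s_2^2$; in total, there are $n_1 + n_2 - 2$ d.f. in $s_p^2$.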
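The pooled estimate is a weighted average with weights $n_1 - 1$ and $n_2 - 1$. A minimal sketch (the function name `pooled_variance` is hypothetical), applied to the sample s.d.s of the lawnmower example that follows:

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of the common variance sigma^2, with weights n1 - 1 and n2 - 1."""
    df = (n1 - 1) + (n2 - 1)                          # n1 + n2 - 2 degrees of freedom
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    return sp_sq, df


# Sample s.d.s from the lawnmower example: s1 = 2.9155 (n1 = 5), s2 = 2.0976 (n2 = 6)
sp_sq, df = pooled_variance(2.9155, 5, 2.0976, 6)
print(round(sp_sq, 2), df)
```

Because the weights are positive, $s_p^2$ always lies between $s_1^2$ and $s_2^2$.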
Example (p367): comparing two procedures.
Q: Is there a difference in the mean time to mount the engines on the frames of the lawnmowers? Test at $\alpha = 0.10$, assuming normal populations and equal variances.
Data:
Procedure   Data (minutes)      Mean   s        Sample size
1           2, 4, 9, 3, 2       4      2.9155   5
2           3, 7, 5, 8, 4, 3    5      2.0976   6
The sample sizes are small!
Step 1. State the hypotheses: $H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \ne \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$ vs. $H_1: \mu_1 - \mu_2 \ne 0$.
Step 2. Select the significance level: $\alpha = 0.10$.
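Step 3. Determine the test statistic: $n_1, n_2 < 30$ and the populations are normal with unknown equal variances, so a t-test is used:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.$$
Step 4. Formulate a decision rule: a two-sided t-test with df $= n_1 + n_2 - 2 = 5 + 6 - 2 = 9$. Reject $H_0$ if $t \ge t_{9,0.05} = 1.833$ or $t \le -t_{9,0.05} = -1.833$.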
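Step 5. Collect data and calculate the t-value.
Sample means: $\bar{x}_1 = 4$, $\bar{x}_2 = 5$.
Sample s.d.s: $s_1 = \sqrt{\sum (x_1 - \bar{x}_1)^2/(n_1 - 1)} = 2.9155$, $s_2 = \sqrt{\sum (x_2 - \bar{x}_2)^2/(n_2 - 1)} = 2.0976$.
Pooled sample variance:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(5-1)(2.9155)^2 + (6-1)(2.0976)^2}{5 + 6 - 2} = 6.22.$$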
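t-value:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{4 - 5}{\sqrt{6.22\left(\frac{1}{5} + \frac{1}{6}\right)}} = -0.662.$$
Conclusion: since $|t| = 0.662 < 1.833$, the null hypothesis of no difference is not rejected at $\alpha = 0.10$. The data do not show a significant difference in the mean mounting times of the two procedures.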
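Alternatively, the p-value can be bracketed with Appendix F: since $-0.662 > -t_{9,0.10} = -1.383$ (equivalently, $|t| < 1.383$), the two-sided p-value is greater than 0.20. Do not reject the null hypothesis at $\alpha = 0.10$.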
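[Figure: t-distribution with 9 d.f.; tail areas of 0.10 beyond $\pm 1.383$; the observed $t = -0.662$ and its mirror image $0.662$ lie between them, so the p-value $> 0.20$.]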
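The whole lawnmower test can be reproduced from the raw data. This sketch uses `statistics.mean` and `statistics.variance` (divisor n − 1); the critical values 1.833 and 1.383 are the t-table values used in this example rather than computed ones, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, variance

x1 = [2, 4, 9, 3, 2]        # procedure 1, minutes
x2 = [3, 7, 5, 8, 4, 3]     # procedure 2, minutes
n1, n2 = len(x1), len(x2)
df = n1 + n2 - 2            # (n1 - 1) + (n2 - 1)

# pooled variance, then the pooled two-sample t statistic
sp_sq = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
t = (mean(x1) - mean(x2)) / sqrt(sp_sq * (1 / n1 + 1 / n2))

T_CRIT_10 = 1.833       # t_{9,0.05}: critical value for a two-sided test at alpha = 0.10
T_ONE_TAIL_10 = 1.383   # t_{9,0.10}: |t| below this means two-sided p-value > 0.20
print(f"t = {t:.3f}, df = {df}")
print("reject H0" if abs(t) >= T_CRIT_10 else "do not reject H0")
print("p-value > 0.20" if abs(t) < T_ONE_TAIL_10 else "p-value <= 0.20")
```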
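B. Statistical inference for two population proportions
Recall inference for one population proportion $\pi$:
1. Point estimate: the sample proportion $p$.
2. Sampling distribution of $p$: mean $\pi$; standard error $SE(p) = \sigma_p = \sqrt{\pi(1-\pi)/n}$; distribution: approximately normal (under what conditions?). A standardized quantity:
$$Z = \frac{p - \pi}{SE(p)} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \overset{d}{\approx} N(0,1),$$
where, in practice, $SE(p)$ is estimated by $\sqrt{p(1-p)/n}$.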
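3. An approximate $100(1-\alpha)\%$ confidence interval for $\pi$: $p \pm z_{\alpha/2}\, SE(p) = p \pm z_{\alpha/2}\sqrt{p(1-p)/n}$.
4. Testing $H_0: \pi = \pi_0$:
$$Z = \frac{p - \pi_0}{\sigma_p} = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} \quad \left(\text{or } \frac{p - \pi_0}{\sqrt{p(1-p)/n}}\right) \overset{d}{\approx} N(0,1) \text{ under } H_0.$$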
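Steps 3 and 4 can be sketched as a small function. The data (20 successes out of 100 trials) and $\pi_0 = 0.3$ are hypothetical, and `one_prop_inference` is a name chosen for illustration; note that the interval uses $p$ in the standard error while the test statistic uses $\pi_0$.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_inference(x, n, pi0, alpha=0.05):
    """Approximate CI for pi and z statistic for H0: pi = pi0 (large n)."""
    p = x / n                                     # sample proportion
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se_hat = sqrt(p * (1 - p) / n)                # SE estimated with p, used for the CI
    ci = (p - z_crit * se_hat, p + z_crit * se_hat)
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)     # SE under H0 uses pi0
    return p, ci, z


# Hypothetical data: 20 successes in 100 trials, H0: pi = 0.3
p, ci, z = one_prop_inference(20, 100, 0.3)
print(p, [round(c, 3) for c in ci], round(z, 3))
```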
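B. Statistical inference for two population proportions: the difference $\pi_1 - \pi_2$.
1. Point estimate: $p_1 - p_2$.
2. Sampling distribution of $p_1 - p_2$: mean $\pi_1 - \pi_2$; standard error
$$SE(p_1 - p_2) = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}};$$
distribution: approximately normal (under what conditions?). A standardized statistic:
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{SE(p_1 - p_2)} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}} \overset{d}{\approx} N(0,1).$$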
3. Construct a confidence interval for $\pi_1 - \pi_2$ based on this statistic.
4. Testing $H_0: \pi_1 - \pi_2 = \pi_{D0}$:
Test statistic, based on the standardized statistic: $\dfrac{\text{point estimate} - \pi_{D0}}{SE(\text{point estimate})}$?
Rejection region: based on the sampling distribution. Under $H_0$, $\dfrac{\text{point estimate} - \pi_{D0}}{SE(\text{point estimate})} \sim\ ?$
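In testing two population proportions, $H_0: \pi_1 = \pi_2$ (or $H_0: \pi_1 - \pi_2 = 0$), a z-test is used when $n_1, n_2 \ge 30$:
$$Z = \frac{(p_1 - p_2) - 0}{SE(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{\frac{\pi(1-\pi)}{n_1} + \frac{\pi(1-\pi)}{n_2}}} \approx \frac{p_1 - p_2}{\sqrt{\frac{p_c(1-p_c)}{n_1} + \frac{p_c(1-p_c)}{n_2}}},$$
where $p_1$: sample proportion in sample 1; $p_2$: sample proportion in sample 2; $p_c$: the pooled estimated proportion under $H_0$.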
Under the null hypothesis, there is a common population proportion $\pi = \pi_1 = \pi_2$, estimated by pooling. The pooled estimated proportion is
$$p_c = \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2},$$
where $x_1$: the number of successes in sample 1; $x_2$: the number of successes in sample 2; $x_1 + x_2$: the total number of successes in the two samples; $n_1 + n_2$: the total size of the two samples.
Example (p363): A new perfume named Heavenly has been developed. The sales department wants to know whether there is a difference between the proportions of younger and older women who would purchase Heavenly if it were marketed.
There are two independent populations: younger women and older women. Each sampled woman is asked to smell Heavenly and indicate whether she likes the perfume or not.
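Problem: age vs. preference.
Step 1. State the hypotheses, $H_0$: no difference vs. $H_1$: a difference. Let $\pi_Y$ = purchase rate of younger women and $\pi_O$ = purchase rate of older women: $H_0: \pi_Y = \pi_O$ vs. $H_1: \pi_Y \ne \pi_O$.
Step 2. Select the significance level: $\alpha = 0.05$.
Step 3. Determine the test statistic: $n_Y = 100$ and $n_O = 200$ are large sample sizes, so a z-test is used:
$$Z = \frac{p_Y - p_O}{\sigma_{p_Y - p_O}} \approx \frac{p_Y - p_O}{\sqrt{\frac{p_c(1-p_c)}{n_Y} + \frac{p_c(1-p_c)}{n_O}}}.$$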
Step 4. Formulate the decision rule: a two-sided z-test at significance level $\alpha = 0.05$. The null hypothesis is rejected if $z \ge z_{0.025} = 1.96$ or $z \le -z_{0.025} = -1.96$.
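Step 5. Collect data, compute the z-value, and make a decision.
Sample proportions: $n_Y = 100$, $x_Y = 20$, $p_Y = 20/100 = 0.2$; $n_O = 200$, $x_O = 100$, $p_O = 100/200 = 0.5$.
Pooled proportion: $p_c = (20 + 100)/(100 + 200) = 0.4$.
The z-value is
$$z = \frac{p_Y - p_O}{\sqrt{\frac{p_c(1-p_c)}{n_Y} + \frac{p_c(1-p_c)}{n_O}}} = \frac{0.2 - 0.5}{\sqrt{\frac{0.4(1-0.4)}{100} + \frac{0.4(1-0.4)}{200}}} = -5.$$
Conclusion: since $z = -5 < -1.96$, the null hypothesis is rejected at $\alpha = 0.05$.
Q: What is the p-value?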
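The perfume calculation can be checked in Python. This sketch uses the counts from Step 5, follows the pooled-proportion formula above, and uses `statistics.NormalDist` for a two-sided p-value; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

x_y, n_y = 20, 100     # younger women who would purchase
x_o, n_o = 100, 200    # older women who would purchase

p_y, p_o = x_y / n_y, x_o / n_o
p_c = (x_y + x_o) / (n_y + n_o)                          # pooled proportion under H0
se = sqrt(p_c * (1 - p_c) / n_y + p_c * (1 - p_c) / n_o)
z = (p_y - p_o) / se
p_value = 2 * NormalDist().cdf(-abs(z))                  # two-sided p-value

print(p_y, p_o, p_c, round(z, 2), p_value)
print("reject H0" if abs(z) >= 1.96 else "do not reject H0")
```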
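Note: an alternative z-test. Similarly, in testing $H_0: \pi_1 - \pi_2 = d$ (with $d \ne 0$, so there is no common proportion to pool), a z-test is used when $n_1, n_2 \ge 30$:
$$Z = \frac{(p_1 - p_2) - d}{SE(p_1 - p_2)} \approx \frac{(p_1 - p_2) - d}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}},$$
where $p_1$: sample proportion in sample 1; $p_2$: sample proportion in sample 2.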
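A sketch of the unpooled version: since $H_0$ no longer implies a common proportion, each sample keeps its own $p_i$ in the standard error. The function name and the null difference $d = -0.2$ are hypothetical; the counts are reused from the perfume example only to have concrete numbers.

```python
from math import sqrt


def two_prop_z_unpooled(x1, n1, x2, n2, d):
    """z statistic for H0: pi1 - pi2 = d (d != 0), without pooling."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # each sample uses its own p
    return ((p1 - p2) - d) / se


# Perfume counts reused with a hypothetical null difference d = -0.2
z = two_prop_z_unpooled(20, 100, 100, 200, d=-0.2)
print(round(z, 3))
```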
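PART II. Two dependent samples
Dependent samples are also called related samples or paired samples. Two kinds of investigation:
1. Before-after: the same subjects are measured before vs. after.
2. Match pair: similar subjects are matched into pairs.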
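Original data set: paired samples. Hypothesis testing: $H_0: \mu_1 - \mu_2 = 0$.
Data processing: for each pair, calculate $D = x_1 - x_2$.
Processed data set: a sample of D's. Hypothesis testing: let $\mu_D = \mu_1 - \mu_2$; then $H_0: \mu_D = 0$ is a single population/sample problem.
Pair   $x_1$      $x_2$      $D = x_1 - x_2$
1      $x_{11}$   $x_{21}$   $D_1$
...    ...        ...        ...
n      $x_{1n}$   $x_{2n}$   $D_n$
Summary statistics: $(\bar{x}_1, s_1)$, $(\bar{x}_2, s_2)$, $(\bar{D}, s_D)$.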
Assumptions: the paired difference $D \sim N(\mu_D, \sigma_D^2)$, with $\sigma_D$ unknown.
The problem reduces to the single-population case of testing $H_0: \mu_D = 0$. When n is not large, a t-test is used:
$$t = \frac{\bar{D}}{s_D/\sqrt{n}},$$
with d.f. $= n - 1$, where $\bar{D}$ and $s_D$ are the sample mean and s.d. of the observed differences $(D_1, \dots, D_n)$.
Example (P37): Nickel Savings and Loan wishes to compare the two companies it uses to appraise the value of residential homes.
Experiment: Nickel Savings selected a sample of 10 residential properties and scheduled both firms to appraise each one. The results are given on the next page. At the 0.05 significance level, can we conclude that there is a difference in the mean appraised values of the homes?
Original data: paired samples (appraised values)
Home   Schadek   Bowyer
1      235       228
2      210       205
3      231       219
4      242       240
5      205       198
6      230       223
7      231       227
8      210       215
9      225       222
10     249       245
Step 1. State the hypotheses, $H_0$: no difference vs. $H_1$: a difference, that is, $H_0: \mu_D = 0$ vs. $H_1: \mu_D \ne 0$.
Step 2. Select the significance level: $\alpha = 0.05$.
Step 3. Select the test statistic. Assumption: the difference D follows a normal distribution. Since the sample size $n = 10 < 30$ and the population variance is unknown, a t-test is used:
$$t = \frac{\bar{D}}{s_D/\sqrt{n}}.$$
Step 4. Formulate the decision rule: a two-sided t-test with df $= n - 1 = 10 - 1 = 9$ and significance level $\alpha = 0.05$. By Appendix F, $H_0$ will be rejected at $\alpha = 0.05$ if $t \ge t_{9,0.025} = 2.262$ or $t \le -t_{9,0.025} = -2.262$.
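New data set
Home   Schadek   Bowyer   D
1      235       228      7
2      210       205      5
3      231       219      12
4      242       240      2
5      205       198      7
6      230       223      7
7      231       227      4
8      210       215      -5
9      225       222      3
10     249       245      4
$\bar{D} = (7 + 5 + 12 + \dots + 4)/10 = 4.6$
$s_D = \sqrt{[(7 - 4.6)^2 + (5 - 4.6)^2 + \dots + (4 - 4.6)^2]/(10 - 1)} = 4.40$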
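Step 5. Collect data, calculate t, and draw a conclusion. Since $\bar{D} = 4.6$, $s_D = 4.40$, and $n = 10$,
$$t = \frac{\bar{D}}{s_D/\sqrt{n}} = \frac{4.6}{4.40/\sqrt{10}} = 3.305 > 2.262.$$
Conclusion: the null hypothesis of no difference is rejected at $\alpha = 0.05$. Note that since $3.250 < 3.305 < 4.781$, $0.001 < \text{p-value} < 0.01$.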
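The paired analysis amounts to a one-sample t statistic on the differences. This sketch uses the appraisal data from the table above and `statistics.mean`/`statistics.stdev`; the critical value 2.262 is taken from the t table, not computed.

```python
from math import sqrt
from statistics import mean, stdev

# Appraised values from the Nickel Savings example (homes 1-10)
schadek = [235, 210, 231, 242, 205, 230, 231, 210, 225, 249]
bowyer = [228, 205, 219, 240, 198, 223, 227, 215, 222, 245]

d = [a - b for a, b in zip(schadek, bowyer)]   # paired differences D = Schadek - Bowyer
n = len(d)
d_bar, s_d = mean(d), stdev(d)                 # sample mean and s.d. of the D's
t = d_bar / (s_d / sqrt(n))                    # one-sample t on the D's, df = n - 1

T_CRIT = 2.262                                 # t_{9,0.025}: two-sided alpha = 0.05
print(d, d_bar, round(s_d, 2), round(t, 3))
print("reject H0" if abs(t) >= T_CRIT else "do not reject H0")
```

Unrounded, $s_D \approx 4.402$ and $t \approx 3.30$, consistent with the slide's 3.305 up to rounding.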
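[EXCEL output for the paired data (home, Schadek, Bowyer): mean, variance, number of observations, Pearson correlation, hypothesized mean difference, degrees of freedom, t statistic, P-values, and critical t values.]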
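Q1: Why can't the previous case of two independent samples be simplified in the same way? Because no corresponding pairs exist.
Q2: Why use dependent samples? To reduce variation.
Example (treatment study): with two independent groups, the observed difference in responses reflects both the treatment effect and the differences between subjects; pairing (using the same or matched subjects) removes much of the subject-to-subject variation.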
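EXCEL (Data Analysis tools) for two-sample tests:
z-Test: Two Sample for Means (difference of two population means, z-test)
t-Test: Two-Sample Assuming Equal Variances
t-Test: Two-Sample Assuming Unequal Variances
t-Test: Paired Two Sample for Means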
Exercise. Clarify and state the conditions required by the method you use in each exercise. Try to use EXCEL for the exercises with large data sets.
Two independent samples
Continuous data: 3, 5, 33, 37, 47 (EXCEL), 48 (EXCEL)
Binary data: 9, 3, …
Two dependent samples: 4, 45