~CABLE 2006 ~ 2006.08
< > by 2006 8/10() 8/11() 1 vs. 2 Z () t 3 Z- t- () 4 () 5 1 2 () () 3
~CABLE 2006 ~ () 2006.08.10
( 1)
(Statistics) vs. (Biostatistics)
(Descriptive Statistics) (Inferential Statistics)
( )
(variable) Ex.
(nominal variable) (ordinal variable) (discrete variable) (interval variable) (ratio variable) (continuous variable)
(ex. ) ( ) 21 21 (frequency)
(ex. ) ( ) 321 31 ( ) ( ) ( )
1 2 101102 (ex. )
0( 0) (ex. ) 0 40 20
Ex.
vs. (1) (continuous variable) ( ) (ex. n ) (discrete variable) ( ) (ex. 10.2 )
vs. (2) ( ) ( )(ex. )
( vs. p.14)
vs. (Population) (ex. ) (parameter)(ex. N) (Sample) (statistics)(ex. n)
(ex. )
0 (random) Ex. ( 0) Ex.
( p.29 ) 0~9 ( ) (ex. 300 3 )
( ) / K(K ) K 1 K R R, R+K, R+2K,, R+(n- 1)K
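The selection rule R, R+K, R+2K, ..., R+(n-1)K can be sketched in Python. This is a minimal illustration, assuming K is the integer part of N/n and R is drawn uniformly from 1 to K; the function name and the example sizes (300 units, 30 draws) are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return 1-based positions R, R+K, ..., R+(n-1)K."""
    k = population_size // sample_size   # sampling interval K (assumed N // n)
    rng = random.Random(seed)
    r = rng.randint(1, k)                # random start R between 1 and K
    return [r + i * k for i in range(sample_size)]

picks = systematic_sample(300, 30, seed=1)
print(picks)
```

Every pick is exactly K positions after the previous one, so only the starting point R is random.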
( ) (ex. )
(ex. )
(1) ex. ( )
(2) (Probability Proportional to size, PPS ) ( ) Ex. 7~12
(ex. )
Ex.
ex. ( ) ( )
(ex. AIDS )
( 2) (Descriptive Statistics)
Check data, clean data (outlier)
12 101102 ) (ex. 0( 0) (ex. ) 0 40 20
1
2 (Mean) ( ) (Median) ( 2)
Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
3 (Maximum) (Minimum) (range) = Maximum - Minimum
ex. A: maximum 95, minimum 5, range 90; B: maximum 60, minimum 40, range 20
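4 (Variance) and (Standard Deviation)
Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample variance: $S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample standard deviation: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$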
5 (Coefficient of variation, C.V.)
$\%CV = \frac{\sigma}{\mu} \times 100\%$ (population) or $\%CV = \frac{S}{\bar{X}} \times 100\%$ (sample)
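The descriptive measures listed above (mean, median, range, variance, standard deviation, coefficient of variation) can be computed with Python's standard library. This is a small sketch; the score list is hypothetical data, not taken from the slides.

```python
import statistics

scores = [60, 40, 55, 45, 50]                 # hypothetical data

mean = statistics.mean(scores)
median = statistics.median(scores)
value_range = max(scores) - min(scores)       # maximum - minimum
sample_var = statistics.variance(scores)      # divides by n - 1
pop_var = statistics.pvariance(scores)        # divides by N
sample_sd = statistics.stdev(scores)          # square root of sample variance
cv_percent = sample_sd / mean * 100           # coefficient of variation in %

print(mean, median, value_range, sample_var, pop_var, round(cv_percent, 2))
```

Note the two variance functions: `variance` uses the n - 1 denominator of the sample formula, `pvariance` the N denominator of the population formula.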
(frequency) (Mode)
x ( )y (histogram ) (stem-and-leaf plot) (box plot) (outlier) (bar chart, bar graph) x x
(stem) (leaf)
outlier: a value lying more than 1.5(IQR) beyond either end of the box, where IQR = Q3 - Q1
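The 1.5(IQR) rule for box-plot outliers can be illustrated as follows. This is a sketch under stated assumptions: quartiles are computed with the "inclusive" method of `statistics.quantiles` (statistical packages differ in their quartile definitions), and the data are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 50]   # hypothetical data
flagged = iqr_outliers(data)
print(flagged)
```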
1 (Normal Distribution) (Gaussian shape, symmetric) 50%
2 mean = median = mode
1 (positively skewed) mean > median > mode
2 (negatively skewed) mean < median < mode
3 (bimodal distribution)
( 3) ()
( ) vs. ( ) ( ) ( )
( ) vs. Z-test t-test Paired t-test ANOVA
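Z-test vs. t-test
Z-test: $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
t-test: $t = \frac{\bar{X}_1 - \bar{X}_2}{s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, with pooled $s = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$
The t distribution depends on its degrees of freedom (n-1); when n exceeds 120, t is approximately Z.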
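The two test statistics can be written out directly. This is a minimal sketch assuming the one-sample Z form with known σ and the pooled-variance two-sample t form; function names and input numbers are hypothetical.

```python
import math
import statistics

def one_sample_z(xbar, mu0, sigma, n):
    """Z = (sample mean - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

def pooled_t(sample1, sample2):
    """Two-sample t with pooled standard deviation; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)
    s = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    return diff / (s * math.sqrt(1 / n1 + 1 / n2)), n1 + n2 - 2

z = one_sample_z(xbar=52, mu0=50, sigma=10, n=100)
t, df = pooled_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(z, round(t, 4), df)
```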
vs. H o 1 2 1 2 1 2 ( ) ( ( ( ( ) ) ) ) H 1 1 2 ( ) ( ) ( H o ) ( H o ) 1 2 1 2 ( 1 2 1 2 )
p-value vs. α-value
α is commonly set at 0.05 (5%); when p < 0.05, Ho is rejected. α may also be set at 0.10 (10%) or 0.01 (1%).
Z distribution (axis Z = -2, -1, 0, 1, 2): the interval Z = -1.96 to 1.96 covers 95% of the distribution.
Paired t-test ( ) A B
( ) ( ) ( ) ( ) Ex.
vs. 1 1 2 95% (95% Confidence Interval 95% C.I.) 95% Ex. Ho: 1 2 1 2 95% C.I. 0 Ho: 1 2 Ho
ANOVA 1 Analysis of Variance N-way ANOVA N (ex. ) One-way ANOVA ( ) vs. ( ) Z-testt-test ANOVA
ANOVA 2
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: at least 2 group means differ
Assumption
ANOVA (F-test)
ANOVA 3 (TSS = WSS + BSS)
WSS (Within Sum of Square): degrees of freedom = n-k
BSS (Between Sum of Square): degrees of freedom = k-1
TSS (Total Sum of Square): degrees of freedom = n-1 = (n-k)+(k-1)
ANOVA 4 F-test
F = MBSS/MWSS
MBSS = BSS/(k-1) (Mean Between Sum of Square)
MWSS = WSS/(n-k) (Mean Within Sum of Square)
p < 0.05
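The decomposition TSS = WSS + BSS and the ratio F = MBSS/MWSS can be checked numerically. This is a sketch for a one-way layout; the three groups are hypothetical data and the function name is illustrative.

```python
def one_way_anova(groups):
    """Return BSS, WSS, TSS and F for a list of groups (one-way ANOVA)."""
    n = sum(len(g) for g in groups)      # total number of observations
    k = len(groups)                      # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    bss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    wss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mbss = bss / (k - 1)                 # df between = k - 1
    mwss = wss / (n - k)                 # df within = n - k
    return {"BSS": bss, "WSS": wss, "TSS": bss + wss, "F": mbss / mwss}

result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

The F value is then compared with the F distribution on (k-1, n-k) degrees of freedom to obtain p.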
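ANOVA ( ): Scheffe's, Bonferroni, LSD (Least Significant Difference method)
Scheffe's: critical value $\sqrt{(k-1)\,F_{k-1,\,n-k,\,1-\alpha}}$
Bonferroni: t-test at the adjusted level $\alpha^* = \alpha / C^{k}_{2}$
LSD: critical value $t_{n-k,\,1-\alpha/2}$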
( 4) ()
(binomial variable) (proportional variable) (ex. ) 1(p+q=1=100%) AB
( ) McNemar s test B (cells) 5 ( RC table) Chi- Square test for trend Fisher s Exact test Chi- Square test Proportional Z-test
Proportional Z-test 1 (one sample proportion $\tilde{p}$ compared with $p_0$)
$H_0: p_1 = p_0$
$Z = \frac{\tilde{p} - p_0}{\sigma_{\tilde{p}}} = \frac{\tilde{p} - p_0}{\sqrt{p_0 q_0 / n}}$
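The one-sample proportion statistic, Z = (observed proportion - p0) / sqrt(p0 q0 / n), is short enough to compute by hand or in code. This is a sketch with hypothetical numbers (60 of 100 against p0 = 0.5); the function name is illustrative.

```python
import math

def proportion_z(p_hat, p0, n):
    """Z statistic for one sample proportion against a hypothesized p0."""
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

z = proportion_z(p_hat=0.6, p0=0.5, n=100)
print(round(z, 4))
```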
Proportional Z-test 2 (two sample proportions $\tilde{p}_1$, $\tilde{p}_2$ used to test $p_1 - p_2$)
$H_0: p_1 = p_2 = p_0$
$Z = \frac{(\tilde{p}_1 - \tilde{p}_2) - \mu_{(p_1 - p_2)}}{\sigma_{(\tilde{p}_1 - \tilde{p}_2)}}$
Chi-Square test 1 ( ) (association) A B ex.
Chi-Square test 2
H0: X1 and X2 are not associated
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Observed O (Expected E):
             Column 1      Column 2       Total
Row 1        10 (26.67)    90 (73.33)     100
Row 2        70 (53.33)    130 (146.67)   200
Total        80            220            300
E11 = (80/300) x 100 = 26.67
E12 = (220/300) x 100 = 73.33
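The expected counts and the chi-square statistic for the 2 x 2 table on this slide (10, 90, 70, 130) can be reproduced as follows. This is a sketch of the plain Pearson statistic without continuity correction; variable names are illustrative.

```python
observed = [[10, 90],
            [70, 130]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / grand total
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

print([[round(e, 2) for e in row] for row in expected], round(chi2, 3))
```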
Fisher's Exact test
Observed O (Expected E):
             Column 1      Column 2     Total
Row 1        30 (30.92)    5 (4.08)     35
Row 2        23 (22.08)    2 (2.92)     25
Total        53            7            60
General layout:
             X2
X1           a      b      a+b
             c      d      c+d
Total        a+c    b+d    n=a+b+c+d
$P(a,b,c,d) = \frac{C^{a+b}_{a}\, C^{c+d}_{c}}{C^{n}_{a+c}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,a!\,b!\,c!\,d!}$
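The probability of a single 2 x 2 table under fixed margins can be evaluated with `math.comb`. This is a sketch that computes only the point probability of one table (here the 30, 5, 23, 2 table from the slide); a full exact test would also add the probabilities of the more extreme tables, which is not shown.

```python
from math import comb, factorial

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def table_probability_factorial(a, b, c, d):
    """Same probability written with the factorial form of the formula."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    den = (factorial(n) * factorial(a) * factorial(b)
           * factorial(c) * factorial(d))
    return num / den

p = table_probability(30, 5, 23, 2)
print(round(p, 4))
```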
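McNemar's test
Paired table:
             X2
X1           a      b      a+b
             c      d      c+d
Total        a+c    b+d    n=a+b+c+d
Only the discordant cells b and c are used (not a and d).
H0: p = 1/2, so E12 = E21 = (b+c)/2
$\chi^2 = \frac{\left(\left|b - \frac{b+c}{2}\right| - \frac{1}{2}\right)^2}{\frac{b+c}{2}} + \frac{\left(\left|c - \frac{b+c}{2}\right| - \frac{1}{2}\right)^2}{\frac{b+c}{2}} = \frac{(|b - c| - 1)^2}{b + c}$
Ex. A vs. B (1/2 by chance)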
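The McNemar statistic depends only on the two discordant counts b and c. This is a sketch of the continuity-corrected form, (|b - c| - 1)^2 / (b + c); the counts used in the example are hypothetical.

```python
def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar chi-square from discordant counts."""
    return (abs(b - c) - 1) ** 2 / (b + c)

stat = mcnemar_chi2(b=10, c=2)   # hypothetical discordant pairs
print(round(stat, 4))
```

The statistic is referred to a chi-square distribution with 1 degree of freedom.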
Chi-Square test for trend Ex. ( ) vs. ( ) ( 2k table)
( 5) ()
(pearson correlation) (simple linear regression)
(pearson correlation) x 1 x 2 (simple linear regression) x y x y (Scatter plot)
$\hat{y} = \beta_0 + \beta_1 x$
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0$: intercept; $\beta_1$: slope; $\varepsilon_i$: error term
1 () x y ( r ) x -1~1 y 0 r
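The Pearson correlation r and the least-squares line can both be obtained from the same sums of squares. This is a minimal sketch using hypothetical x and y values; the function name is illustrative.

```python
import math

def fit_line(xs, ys):
    """Return (intercept b0, slope b1, Pearson r) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b1 = sxy / sxx                    # slope
    b0 = my - b1 * mx                 # intercept
    r = sxy / math.sqrt(sxx * syy)    # correlation, between -1 and 1
    return b0, b1, r

b0, b1, r = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 3), round(b1, 3), round(r, 4))
```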
2
~CABLE 2006 ~ () 2006.08.11
( 1) ()
(multiple linear regression)
xy2
assumption (y) () (mean=0variance= 2 ) 1 >1
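Observed y and predicted ŷ with k predictors:
$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k + \varepsilon$
The error term ε is the part of y that ŷ does not capture.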
(x) x y (dummy variable) (reference group) y k k-1 ( SAS ) xb(cd ) A y (ex. 0 x y )
Reference Ex. ( ) 3 X 11 X 12 1(0 ) Contrast (Effect) SAS default 1 1 1 2 3 4 1 0 0 0 0 1 0 X 13 0 0 0 1
1 ( ) Interpretation of $\beta_1$: the change in ŷ when $x_1$ goes from 0 to 1 with the other x held fixed.
$\hat{y}_{(x_1=1)} = \beta_0 + \beta_1 (1) + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$
$\hat{y}_{(x_1=0)} = \beta_0 + \beta_1 (0) + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$
$\hat{y}_{(x_1=1)} - \hat{y}_{(x_1=0)} = \beta_1$
(Full Model) (Model selection) (Backward selection) (Forward selection) (Stepwise selection)
($r^2$, $R^2$): $-1 \le r \le 1$, $0 \le r^2 \le 1$; $r^2$ is the percentage (%) of the variation in y that is explained by x.
(transformation)
( 2) ()
(logistic regression, logit model) (multicategory logistic regression, polytomous logistic regression, polychotomous logit model) (proportional odds logit model)
assumption (y) ABP(A)+P(B)=1
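$\log\left(\frac{P_{y=1}}{P_{y=0}}\right) = \log\left(\frac{P_{y=1}}{1 - P_{y=1}}\right) = \log(odds) = \mathrm{logit}(P_{y=1}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$
$P_{y=1} = \frac{1}{1 + \exp\left[-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k)\right]}$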
(x) x 1 ( ) e 1 (dummy variable) (reference group) y k k-1 xb(cd ) A ( ) e 1 (ex. 0 x y 1 )
1 $e^{\beta_1}$ = OR (odds ratio)
$\log\left(\frac{P_{y=1}}{P_{y=0}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$
$\log(odds_{x_1=1}) - \log(odds_{x_1=0}) = (\beta_0 + \beta_1 (1) + \beta_2 x_2 + \cdots + \beta_k x_k) - (\beta_0 + \beta_1 (0) + \beta_2 x_2 + \cdots + \beta_k x_k) = \beta_1$
$\log\left(\frac{odds_{x_1=1}}{odds_{x_1=0}}\right) = \log(\text{odds ratio}) = \beta_1$, so $OR = e^{\beta_1}$
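The relation OR = exp(β1) can be verified numerically from the logistic model itself. This is a sketch with a single binary predictor; the coefficient values are hypothetical and chosen so that the odds ratio is 2.

```python
import math

def prob_y1(beta0, beta1, x1):
    """P(y = 1) under a logistic model with one predictor."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x1)))

def odds(p):
    return p / (1.0 - p)

beta0, beta1 = -1.0, math.log(2.0)    # hypothetical coefficients

odds_ratio = odds(prob_y1(beta0, beta1, 1)) / odds(prob_y1(beta0, beta1, 0))
print(round(odds_ratio, 6), round(math.exp(beta1), 6))
```

The intercept cancels in the ratio, which is why the odds ratio depends only on β1.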
(Full Model) (Model selection) (Backward selection) (Forward selection) (Stepwise selection)
Concordant pairs( ) (%) C ROC curve Hosmer-Lemeshow chi-square test df=2 chi-square distribution (estimated probability) (percentile) 10( ) (discrepancy) Pseudo R 2
(y) baseline-category logit model proportional odds model, cumulative logit model
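baseline-category logit model (y = 0, 1, 2; y = 0 is the baseline category)
$\log\left(\frac{P_{y=1}}{P_{y=0}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k = \alpha_a + \beta_a x$
$\log\left(\frac{P_{y=2}}{P_{y=0}}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k = \alpha_b + \beta_b x$
(each equation has its own set of coefficients)
$\log\left(\frac{P_{y=2}}{P_{y=1}}\right) = \log\left(\frac{P_{y=2}}{P_{y=0}}\right) - \log\left(\frac{P_{y=1}}{P_{y=0}}\right) = (\alpha_b - \alpha_a) + (\beta_b - \beta_a) x$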
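proportional odds logit model (y = 0, 1, 2)
$\mathrm{logit}[P(Y > j)] = \log\left(\frac{P(Y > j)}{1 - P(Y > j)}\right)$
$\log\left(\frac{P_{y=1\&2}}{P_{y=0}}\right) = \beta_{01} + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k = \alpha_a + \beta x$
$\log\left(\frac{P_{y=2}}{P_{y=0\&1}}\right) = \beta_{02} + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k = \alpha_b + \beta x$
(the intercepts differ; the slopes are shared)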
P(Y>j) P(Y>1) P(Y>2) P(Y>3)
1 2 3 4
2002/3
(random error)
(systemic error)
(test-retest reliability) (alternate form reliability)
Weakness: biological, psychological and social changes in the respondent, or the respondent trying to do better the second time.
Improvement: carry out the retest after a long enough period to reduce memory artifacts, but promptly enough to reduce the probability of systematic changes.
Apply: administer the measurement instrument at two times to multiple persons and compute the correlation between the two measures. This assumes there is no change in the underlying trait between time 1 and time 2.
(quantitative data) Pearson Correlation (categorical data) Spearman Rank Correlation (phi coefficient) SPSS Analysis Correlate Bivariate Pearson Spearman
(split half reliability) Cronbach's coefficient alpha; Kuder-Richardson reliability
Weakness: internal consistency designs will underestimate reliability if the items within the set are not close replications of each other. Reliability may be overestimated if the whole interview is affected by irrelevant global response patterns or socially undesirable response biases.
Improvement: the instrument is well designed.
Apply: include several items pertaining to a single underlying psychological trait or symptom dimension. Items that relate to the same underlying concept are considered to be replications of each other.
SPSS Analysis Scale Reliability Analysis Model : split half Model : alpha Split half Cronbach s
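Cronbach's α
$\alpha = \frac{k \cdot \overline{cov} / \overline{var}}{1 + (k - 1) \cdot \overline{cov} / \overline{var}}$
Standardized: $\alpha = \frac{k \bar{r}}{1 + (k - 1) \bar{r}}$
k is the number of items; cov is the average covariance between items; var is the average variance of the items; r is the average correlation between items.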
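Cronbach's α can be computed from the average inter-item covariance and variance, or in standardized form from the average inter-item correlation. This is a sketch under that assumption; k (the number of items), the function names and the input values are all hypothetical.

```python
def alpha_from_cov(k, avg_cov, avg_var):
    """Cronbach's alpha from average item covariance and variance."""
    ratio = avg_cov / avg_var
    return k * ratio / (1 + (k - 1) * ratio)

def alpha_standardized(k, avg_r):
    """Standardized alpha from the average inter-item correlation."""
    return k * avg_r / (1 + (k - 1) * avg_r)

a1 = alpha_from_cov(k=4, avg_cov=1.0, avg_var=2.0)
a2 = alpha_standardized(k=4, avg_r=0.5)
print(a1, a2)
```

With the same average correlation, adding more items raises α, which is why longer scales tend to show higher internal consistency.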
(Intra-rater reliability) (Inter-rater reliability)
Weakness: insofar as the respondent's idiosyncratic responses contribute to unreliability, estimates based on a single interview may underestimate the actual.
Improvement: good training; use of video tape; supervision during data collection.
Apply: are different observers consistent? This can be established outside of your study, in a pilot study. Percent agreement can be examined (especially with category ratings).
(quantitative data) Intraclass Correlation Reliability (ICR) (categorical data) Agreement, Kappa, Random Error SPSS Analysis Scale Reliability Analysis Statistics Intraclass Correlation Coefficient (ICC) Analysis Descriptive Statistics Crosstabs Statistics Kappa
Intraclass Correlation Reliability (ICR)
Columns: TYPE of RELIABILITY / RATERS FIXED or RANDOM / VERSION of INTRACLASS CORRELATION
Part A: Reliability of single rater
  Nested (n subjects rated by k different raters) / Random / ICR(1,1) = (BMS - WMS) / (BMS + (k - 1)WMS)
  Subject by rater crossed design / Random / ICR(2,1) = (TMS - EMS) / (TMS + (k - 1)EMS + k(JMS - EMS)/n) (kappa)
  Subject by rater crossed design / Fixed / ICR(3,1) = (TMS - EMS) / (TMS + (k - 1)EMS) (phi or r)
Part B: Reliability of the average of k ratings
  Nested (n subjects rated by k different raters) / Random / ICR(1,k) = (BMS - WMS) / BMS
  Subject by rater crossed design / Random / ICR(2,k) = (TMS - EMS) / (TMS + (JMS - EMS)/n)
  Subject by rater crossed design / Fixed / ICR(3,k) = (TMS - EMS) / TMS (α)
ICR - F Table
Source of Variation    Sum of Sq.    DF    Mean Square    F        Prob.
Between People         2253.5000     9     250.3889
Within People          422.6667      20    21.1333
  Between Measures     13.0667       2     6.5333         .2871    .7538
  Residual             409.6000      18    22.7556
Total                  2676.1667     29    92.2816
ICR(1,1)=0.77 (77%)
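The single-rater ICR formulas can be evaluated from the mean squares in the F table. This is a sketch under stated assumptions: BMS and TMS are both taken to be the Between People mean square, WMS the Within People mean square, JMS the Between Measures mean square, EMS the Residual mean square, with k = 3 raters and n = 10 subjects read off the degrees of freedom. That mapping is an interpretation, not something the slide states. Under it, the three single-rater forms come out between about 0.77 and 0.78.

```python
def icr_single(bms, wms, jms, ems, k, n):
    """Single-rater ICR(1,1), ICR(2,1), ICR(3,1) from ANOVA mean squares."""
    icr11 = (bms - wms) / (bms + (k - 1) * wms)
    icr21 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icr31 = (bms - ems) / (bms + (k - 1) * ems)
    return icr11, icr21, icr31

# Mean squares taken from the F table (mapping of rows is an assumption)
icr11, icr21, icr31 = icr_single(bms=250.3889, wms=21.1333,
                                 jms=6.5333, ems=22.7556, k=3, n=10)
print(round(icr11, 3), round(icr21, 3), round(icr31, 3))
```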
             rater1
rater2       a      b      a+b
             c      d      c+d
total        a+c    b+d    n
Agreement = (a+d)/n
Kappa
$\kappa = \frac{P_o - P_e}{1 - P_e}$
$P_o = \frac{a + d}{n}$
$P_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{n^2}$
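Agreement and kappa for a 2 x 2 rater table can be computed from the four cell counts a, b, c, d. This is a sketch of Cohen's kappa for two raters and two categories; the counts are hypothetical.

```python
def agreement_and_kappa(a, b, c, d):
    """Return (observed agreement, chance agreement, kappa) for a 2x2 table."""
    n = a + b + c + d
    p_o = (a + d) / n                                        # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # expected by chance
    return p_o, p_e, (p_o - p_e) / (1 - p_e)

p_o, p_e, kappa = agreement_and_kappa(a=40, b=10, c=5, d=45)
print(p_o, p_e, round(kappa, 3))
```

Kappa is lower than raw agreement because it discounts the agreement that two raters would reach by chance alone.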
Weighted kappa: a weighted κ can be used when some notion of the seriousness of the raters' disagreements is available and the disagreements can be weighted accordingly.
Generalized kappa: a generalization of κ to the case where there are more than two diagnostic (nominal) classes and more than two raters (R raters, N subjects, C classes).
Random error: the RE coefficient is recommended for cases in which chance is assumed to operate in a purely random way.
(content validity) (face validity)
(1) (content validity matrix) 1 2 3.. A B C (Principal Component)
(2) (2)
(external) (criterion) (predictive validity)
(predictive validity) (concurrent validity) (postdictive validity) (discriminant validity)
(correlation analysis) (sensitivity) (specificity) Receiver Operating Characteristic (ROC)
                   Criterion
Test               Present    Absent    Total
Positive           a          b         a+b
Negative           c          d         c+d
Total              a+c        b+d       N
Sensitivity = a/(a+c)
Specificity = d/(b+d)
Positive predictive value = a/(a+b)
Negative predictive value = d/(c+d)
Prevalence = (a+c)/n
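The five indices defined from the 2 x 2 table can be computed together. This is a sketch in which a, b, c, d follow the table layout (test result in rows, criterion in columns); the counts are hypothetical.

```python
def diagnostic_indices(a, b, c, d):
    """Indices from a 2x2 table: a=TP, b=FP, c=FN, d=TN."""
    n = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),          # positive predictive value
        "npv": d / (c + d),          # negative predictive value
        "prevalence": (a + c) / n,
    }

idx = diagnostic_indices(a=80, b=10, c=20, d=90)
print({name: round(value, 3) for name, value in idx.items()})
```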
ROC SPSS Analysis Graphs ROC curve Coordinate points of the ROC curve Test variable State variable Value of state variable (positive)
ROC analysis
Cut off     Sensitivity    1 - Specificity
2.0000      1.000          1.000
3.5000      1.000          .640
4.5000      .941           .560
5.5000      .941           .400
6.5000      .824           .240
7.5000      .824           .160
8.5000      .824           .080
9.5000      .412           .000
10.5000     .176           .000
11.5000     .118           .000
13.0000     .059           .000
5.0000      .000           .000
(construct validity)
1 (Exploratory factor analysis, EFA)
2 (Confirmatory factor analysis, CFA)
(internal validity) (external validity)
1 vs Paradigm Kuhn Positivism Interpretive Critical 1. 2. 3. 4. v.s 5. ( ) 6. 7. 8. From Neuman 1994
2
vs. ( ) Z-test t-test Fisher s Exact test ANOVA a ANOVA b c
< > n [6-18 ] < > n (%)
<t-test> 95% t p <ANOVA> 95% F p O A B AB
< > n (%) X 2 p O A B AB < > 1.00 (p=) 0.86 (p=) 0.43 (p=) 1.00 (p=) 0.89 (p=) 1.00 (p=)
< > (se) p ( O ) A B AB ( ) < > OR 95% C.I. p ( O ) A B AB ( )
< > Y=1 / Y=0 Y=2 / Y=0 Y=3 / Y=0 OR 95% C.I. OR 95% C.I. OR 95% C.I. ( O ) A B AB ( ) < > 1(Y=1 & 2 / Y=0) OR 95% C.I. p 2(Y=2 / Y=0 & 1) ( O ) A B AB ( )
Factor Analysis FA 1. Exploratory Factor Analysis EFAEFA EFA (1) (2)
2. Confirmatory Factor Analysis CFA (1) CFA CFA (2) CFA 3 1 factor loading 1 reference indicator variable 1 0 =