plsun@mail.neu.edu.cn
1. 2. 3. 4. 5.
1. R. V. Hogg, Mathematical Statistics (1979)
2. G. R. Iversen, ed, Statistics: The Conceptual Approach (2000)
3. J. A. Rice, Mathematical Statistics and Data Analysis (2003)
4. ( - 1998)

1. J. L. Folks, Ideas of Statistics (1987)
2. R. J. Brook, ed, The Fascination of Statistics (1986)
3. W. J. Conover, Practical Nonparametric Statistics (2006)
4. J. M. Lattin, ed, Analyzing Multivariate Data (2003)
1 1.1 1.2 1.3 1.4 1.5 EXCEL
( Statistics ) 1. 2. 3. ( )
. 1. (population) 0 1
2. (sample) ( )
..
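1.1.1 Let the population $X$ have distribution $F$. If $X_1, X_2, \ldots, X_n$ are independent and each has distribution $F$, then $X_1, X_2, \ldots, X_n$ is called a sample of size $n$ from $F$ (or from $X$), and the observed values $x_1, x_2, \ldots, x_n$ are called sample values of $X$.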
F F F F n
. 1. 2. Bayes 3.
4. 5.
1.
2. $X \sim N(\mu, \sigma^2)$, $\mu$, $\sigma^2$ ( )
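Remark (1). Arrange the $n$ sample values $x_1, x_2, \ldots, x_n$ in increasing order as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, and take an interval $(a, b)$ with $a < x_{(1)}$ and $x_{(n)} < b$.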
(a b) x (1) x (n) n
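(2). The empirical distribution function $F_n(x)$ of $F(x)$:
$$F_n(x) = \begin{cases} 0, & x < x_{(1)}, \\ k/n, & x_{(k)} \le x < x_{(k+1)}, \\ 1, & x \ge x_{(n)}, \end{cases}$$
that is, $F_n(x) = \#\{x_1, \ldots, x_n \le x\} / n$.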
[Figure: the step function $F_n(x)$, with jumps of height $1/n$ at $x_{(1)}, x_{(2)}, x_{(3)}, \ldots$, drawn together with $F(x)$.]
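The definition of $F_n(x)$ in Remark (2) can be sketched in a few lines of Python. This is only an illustration: the function name `empirical_cdf` and the five sample values are hypothetical and do not come from the slides.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    ordered = sorted(sample)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(ordered)

    def F_n(x):
        # bisect_right gives the number of sample values that are <= x
        return bisect_right(ordered, x) / n

    return F_n


# Hypothetical sample values, chosen only for illustration
F_n = empirical_cdf([3.1, 1.2, 2.5, 1.2, 4.0])
print(F_n(1.0), F_n(1.2), F_n(3.0), F_n(4.0))
```

Because the sample is sorted once, each evaluation of `F_n` costs a binary search; a tied value such as 1.2 produces a jump of $2/n$.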
3. n n
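Given the $n$ sample values $x_1, x_2, \ldots, x_n$, the sample mean $\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$ serves as an estimate of $\mu$.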
m m 0 m 0
( m ) m m 0 ( )
SARS
( )
F F n n F F
( ) Remark fi fi
x y
. 21
1. (Categorical Variable)
1.1.1 (1993 )
2. ( metric variable ) ( )
1.1.2 1995 37 30 27 56 40 30 26 31 24 23 25 29 33 29 22 33 29 46 25 34 19 23 44 29 30 25 23 60 25 27 37 24 22 27 31 24 26 23 19 60
(1). (Lineplot) [Figure: the data marked along an axis running from 20 to 60.] 22 ~ 34
(2). (Boxplot) Five-number summary (minimum, lower quartile, median, upper quartile, maximum): 19, 24, 27, 32, 60.
(3). (Stemplot)
6  | 0
5+ | 6
5  |
4+ | 6
4  | 0 4
3+ | 7
3  | 0 0 0 1 1 3 3 4
2+ | 5 5 5 5 6 6 7 7 7 9 9 9 9
2  | 2 2 3 3 3 3 4 4 4
1+ | 9
(4). (Histogram)
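As a rough check of the boxplot numbers, the five-number summary can be computed with Python's standard library. The data below are an assumption: they are the second coordinates of the 37 pairs listed in Example 1.1.3, which match the stem plot leaf for leaf. The helper name `five_number_summary` is hypothetical, and the quartile rule is Python's default ("exclusive"); other rules can give slightly different quartiles.

```python
import statistics

# Assumed data: second coordinates of the 37 pairs in Example 1.1.3
ages = [30, 27, 56, 40, 30, 26, 31, 24, 23, 25,
        29, 33, 29, 22, 33, 29, 46, 25, 34, 19,
        23, 23, 44, 29, 30, 25, 23, 60, 25, 27,
        37, 24, 22, 27, 31, 24, 26]


def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, med, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
    return (min(data), q1, med, q3, max(data))


summary = five_number_summary(ages)
print(summary)
```

Under these assumptions the result agrees with the values 19, 24, 27, 32, 60 quoted for the boxplot.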
3. (1). (Scatterplot)
1.1.3 37 ( ) (37 30) (30 27) (65 56) (45 40) (32 30) (28 26) (45 31) (29 24) (26 23) (28 25) (42 29) (36 33) (32 29) (24 22) (32 33) (21 29) (37 46) (28 25) (33 34) (17 19) (21 23) (24 23) (49 44) (28 29) (30 30) (24 25) (22 23) (68 60) (25 25) (32 27) (42 37) (24 24) (24 22) (28 27) (36 31) (23 24) (30 26) EXCEL.xls
[Figure: scatterplot of the 37 pairs drawn in EXCEL; horizontal axis from 0 to 80, vertical axis from 0 to 70.]
1.1.4 24 5 2 4 1 3 6 5 8 3 7 3 9 10 20 16 15 9 6 8 5 10 7 8 6 10 15 13 20 16 25 22 14 15 19 17 20 5 3 4 2 4 1 3 3 4 3 3 2
[Figure: scatterplot for Example 1.1.4; horizontal axis from 0 to 30, vertical axis from 0 to 25.]
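A scatterplot is usually read together with a correlation coefficient (the quantity EXCEL's CORREL returns). The sketch below computes the Pearson correlation for the 37 pairs of Example 1.1.3; the helper name `pearson_r` is hypothetical, and the slides themselves do not quote a value for this coefficient.

```python
from math import sqrt

# The 37 pairs listed in Example 1.1.3
pairs = [(37, 30), (30, 27), (65, 56), (45, 40), (32, 30), (28, 26),
         (45, 31), (29, 24), (26, 23), (28, 25), (42, 29), (36, 33),
         (32, 29), (24, 22), (32, 33), (21, 29), (37, 46), (28, 25),
         (33, 34), (17, 19), (21, 23), (24, 23), (49, 44), (28, 29),
         (30, 30), (24, 25), (22, 23), (68, 60), (25, 25), (32, 27),
         (42, 37), (24, 24), (24, 22), (28, 27), (36, 31), (23, 24),
         (30, 26)]


def pearson_r(points):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


r = pearson_r(pairs)
print(round(r, 3))
```

A value close to 1 corresponds to the points lying near an increasing straight line in the scatterplot.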
(2).
1.1.5 [Figure: time-series plot over the years 1890 to 1940, vertical axis from 70.0 to 81.0.]
[Figure: time-series plot over the years 1890 to 1940 with the vertical axis starting at 0 (0.0 to 90.0).]
4. (table) 1992 24 98 419 27 34 310 9 6 61
5.
. (1) (2) (Statistical Package)
1. SPSS: Statistical Package for the Social Sciences ( ); Statistical Product and Service Solutions ( )
2. SAS Statistical Analysis System ( ) SAS
3. EXCEL: AVERAGE, MEDIAN, VAR, CORREL; TINV, CHIINV; ( p- ) NORMSDIST, CHIDIST, TDIST, FDIST; LINEST
1.1.6 1.1.7 F n (x)
1.2 1.. A 2. 3.
. 1. P(A)
2. P(B A) 3. Bayes
. 1. X 2. ( a X b ) A
3. F(x) P(X x ) (x k x ) p k (0,1) ( t x ) f(t)
4. 5.
2. 4.. 1. 3.
5. f (x q ) h(q x ) q h(q) X q f (x q ) X q h(q) f (x q ) q X X q X h(q x )
6.
. 1. E(X)
2. D(X) 3. Chebyshev
4. Cov(X Y) 5. 6.
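7. Conditional expectation of $Y$ given $X = x_i$ (discrete case): $E(Y \mid X = x_i) = \sum_{j \ge 1} y_j \, P(Y = y_j \mid X = x_i)$.
Remark: $E(Y \mid X)$ is a random variable, a function of $X$, taking the value $E(Y \mid X = x_i)$ with probability $P(X = x_i)$.
Conditional expectation of $Y$ given $X = x$ (continuous case): $E(Y \mid X = x) = \int_{-\infty}^{+\infty} y \, f(y \mid x)\,dy$.
Remark: $E(Y \mid X)$ is a random variable, a function of $X$, taking the value $E(Y \mid X = x)$, where $X$ has density $f(x)$.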
. 1. Bernoulli, Kolmogorov
2. De Moivre-Laplace, Lindeberg-Levy
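1.3. (statistic) 1. 1.3.1 Let $X_1, \ldots, X_n$ be a sample from the population $X$ and let $g(\cdot)$ be a function that contains no unknown parameters. Then $g(X_1, \ldots, X_n)$ is called a statistic. If $x_1, \ldots, x_n$ are the observed values of $X_1, \ldots, X_n$, then $g(x_1, \ldots, x_n)$ is the observed value of $g(X_1, \ldots, X_n)$.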
X 1 X 2 X n X ~ N (m s 2 0 ) m s 02 1. 1 n n k= 1 X k 1 n n k = 1 X k - m X k s 0 X k
Remark T = T ( X 1 X n ) T = t ( X 1 X n ) q T
2. ( Sufficient Statistic ) 1920 Fisher Eddington X 1 X n X ~ N (m s 2 ) s Eddington p 1 j1 2 nn ( - 1) = X - X Fisher n j 2 n k n k = 1 n - 1 G ( ) = 2 ( Xk - X) = 1 2 G ( ) 2 k 2
1.3.1 N M n X 1 X n T = X 1 + + X n p = M /N T = t 1 /C nt T t/n p
( )
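3. Let $f(x, \theta)$ be the joint density (or probability function) of the sample $(X_1, \ldots, X_n)$. A statistic $T(X)$ is sufficient for $\theta$ if and only if $f(x, \theta) = K(T(x), \theta)\, h(x)$, where $h(x)$ does not depend on $\theta$.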
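1.3.2 $X \sim B(1, p)$: $P\{X = k\} = p^k (1-p)^{1-k}$, $k = 0, 1$. The joint probability function is $f(x, \theta) = p^{\sum_k x_k} (1-p)^{\,n - \sum_k x_k}$, so $\sum_{k=1}^{n} X_k$ is sufficient for $p$.
1.3.3 $X \sim U(0, \theta)$: $X_{(n)}$ is sufficient for $\theta$.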
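1.3.4 Let $X_1, \ldots, X_n$ be a sample from $X \sim N(\mu, \sigma^2)$ with $\mu$, $\sigma^2$ unknown. Then
$$f(x, \theta) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{k=1}^{n}(x_k - \bar{x})^2 + n(\bar{x} - \mu)^2\right]\right\},$$
so both $\left(\bar{X}, \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar{X})^2\right)$ and $\left(\sum_{k=1}^{n} X_k, \sum_{k=1}^{n} X_k^2\right)$ are sufficient for $(\mu, \sigma^2)$.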
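4. (Complete Statistic) A statistic $T$ is called complete if, for every function $g(\cdot)$, the condition $E_\theta\, g(T) = 0$ for all $\theta$ implies $P_\theta\{g(T) = 0\} = 1$ for all $\theta$.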
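1.3.5 Let $X_1, \ldots, X_n$ be a sample from $B(1, p)$. Then $T = \sum_{k=1}^{n} X_k$ is complete for $p$. Indeed $T \sim B(n, p)$, $0 < p < 1$. If
$$E[h(T)] = \sum_{t=0}^{n} h(t)\, C_n^t\, p^t (1-p)^{n-t} = 0, \quad 0 < p < 1,$$
then dividing by $(1-p)^n$ gives
$$\sum_{t=0}^{n} h(t)\, C_n^t \left(\frac{p}{1-p}\right)^{t} = 0, \quad 0 < p < 1.$$
A polynomial in $p/(1-p)$ that vanishes identically has all coefficients equal to 0, so $h(t) = 0$.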
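Suppose the density (probability function) $f(x, \theta)$ of $X$ has the form
$$f(x, \theta) = C(\theta)\, h(x) \exp\left\{\sum_{i=1}^{k} b_i(\theta)\, T_i(x)\right\}.$$
(1) (2) For a sample from $X$: $\left(\sum_{i} T_1(X_i), \ldots, \sum_{i} T_k(X_i)\right)$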
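1.3.6 1.3.7 $X \sim R(\lambda)$, $\lambda$: $\bar{X}$, $\sum_{k=1}^{n} X_k$. $X \sim N(\mu, \sigma^2)$, $(\mu, \sigma^2)$: $\left(\bar{X}, \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar{X})^2\right)$.
1.3.8 $X \sim U(0, \theta)$, $\theta$: $X_{(n)}$.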
. 1. 2. ( ) 3.
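1.1 (Sample mean) $\bar{X} = \frac{1}{n}\sum_{k=1}^{n} X_k$ ( ). The data sets {2, 4, 6, 8, 10} and {4, 5, 6, 7, 8} both have sample mean 6.
1.2 (Median) For odd $n$: the median of {2, 1, 6, 4, 3} is 3. For even $n$: the median of {2, 1, 6, 4, 3, 7} is (3 + 4)/2 = 3.5.
1.3 (Mode) The mode of {1, 1, 3, 3, 4, 2, 3, 8} is 3.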
Remark (1). $M(X)$: $M(X) = \inf\{x : P(X \le x) \ge \tfrac{1}{2}\}$; $\mathrm{Mode}(X)$ (2). (3).
2. (Sample variance) $S^2 = \frac{1}{n-1}\sum_{k=1}^{n}(X_k - \bar{X})^2$; $S$ is the (Standard deviation). The data sets {2, 4, 6, 8, 10} and {4, 5, 6, 7, 8} both have mean 6, but $S_1^2 = 10$ and $S_2^2 = 2.5$.
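The small numerical examples for the mean, median, mode and sample variance can be reproduced with Python's `statistics` module. This is a sketch for checking the arithmetic; the variable names are hypothetical. Note that `statistics.variance` uses the divisor $n - 1$, matching the definition of $S^2$.

```python
import statistics

a = [2, 4, 6, 8, 10]
b = [4, 5, 6, 7, 8]

mean_a, mean_b = statistics.mean(a), statistics.mean(b)
var_a, var_b = statistics.variance(a), statistics.variance(b)  # divisor n - 1

median_odd = statistics.median([2, 1, 6, 4, 3])       # odd sample size
median_even = statistics.median([2, 1, 6, 4, 3, 7])   # even sample size
mode = statistics.mode([1, 1, 3, 3, 4, 2, 3, 8])

print(mean_a, mean_b, var_a, var_b, median_odd, median_even, mode)
```

The two data sets share the same mean but differ in spread, which is what the sample variance is meant to capture.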
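3. (Order Statistic) Let $X_1, \ldots, X_n$ be a sample with observed values $x_1, \ldots, x_n$, arranged in increasing order as $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$. The corresponding $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are the order statistics. Remark 1. 2. 3. (Rank)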
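Example: a sample of size 5, $X_1, X_2, X_3, X_4, X_5$, with observed values 1, 3, 0, 3, 2. In increasing order the values are 0, 1, 2, 3, 3, which come from $X_3, X_1, X_5, X_2, X_4$ and give $X_{(1)}, X_{(2)}, X_{(3)}, X_{(4)}, X_{(5)}$. The ranks are $R_1 = 2$, $R_2 = 4$, $R_3 = 1$, $R_4 = 5$, $R_5 = 3$.
$X_{(1)}$, $X_{(n)}$; (Range) $X_{(n)} - X_{(1)}$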
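The ranking in the five-point example can be sketched as follows. The function name `ranks_with_order_ties` is hypothetical, and the rule for ties (the earlier observation gets the smaller rank) is an assumption read off from the example, where $X_2 = X_4 = 3$ receive ranks 4 and 5.

```python
def ranks_with_order_ties(values):
    """Rank each observation; tied values are ranked by sample position."""
    # sorted() is stable, so equal values keep their original order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


sample = [1, 3, 0, 3, 2]                  # observed X_1, ..., X_5
ordered = sorted(sample)                  # x_(1) <= ... <= x_(5)
ranks = ranks_with_order_ties(sample)     # R_1, ..., R_5
sample_range = ordered[-1] - ordered[0]   # x_(n) - x_(1)

print(ordered, ranks, sample_range)
```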
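Let $X_1, \ldots, X_n$ be a sample from a population with density $f(x)$ and distribution function $F(x)$, and let $Y_k = X_{(k)}$.
(1). The joint density of $(Y_1, \ldots, Y_n)$ is $f(y_1, \ldots, y_n) = n!\, f(y_1) \cdots f(y_n)$, $y_1 < y_2 < \cdots < y_n$.
(2). The density of the $k$-th order statistic $Y_k = X_{(k)}$ is
$$f_k(y) = \frac{n!}{(k-1)!\,(n-k)!}\, F(y)^{k-1}\, [1 - F(y)]^{n-k}\, f(y).$$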
1.3.9 The density of $X_{(1)}$ is $f_1(y) = n f(y)\,[1 - F(y)]^{n-1}$; the density of $X_{(n)}$ is $f_n(y) = n f(y)\,[F(y)]^{n-1}$. 2 ( )
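(3). For $k < l$, the joint density of $(X_{(k)}, X_{(l)})$ is
$$f_{k,l}(y_k, y_l) = \frac{n!}{(k-1)!\,(l-k-1)!\,(n-l)!}\, F(y_k)^{k-1}\, [F(y_l) - F(y_k)]^{l-k-1}\, [1 - F(y_l)]^{n-l}\, f(y_k) f(y_l), \quad y_k < y_l.$$
1.3.10 The density of the range $X_{(n)} - X_{(1)}$ is
$$f(y) = n(n-1) \int_{-\infty}^{+\infty} f(x) f(x+y)\, [F(x+y) - F(x)]^{n-2}\, dx, \quad y > 0.$$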
Remark (1). (2). (3). 2 3
3. 1.75 1
1.3.11 3000 1000 21 10 1000
Salary    Number of people
15,000    1
8,000     2
4,000     3
2,000     5
1,000     10
Total     21 people, 63,000
Mean 3000, median 2000, mode 1000, range 14,000.
1.3.12 Speed (km/h): below 110: 12; 110 ~ 150: 32; above 150: 5.
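( ) $N(0, 1)$
1. Let $X_1, \ldots, X_n$ be independent $N(0, 1)$ random variables and $K^2 = X_1^2 + X_2^2 + \cdots + X_n^2$. Then $K^2$ follows the chi-square distribution with $n$ degrees of freedom, written $K^2 \sim \chi^2(n)$.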
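2. The density of $\chi^2(n)$ is
$$k_n(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}, \quad x > 0,$$
with mean $n$ and variance $2n$.
3. If $X$ and $Y$ are independent with $X \sim \chi^2(n_1)$ and $Y \sim \chi^2(n_2)$, then $X + Y \sim \chi^2(n_1 + n_2)$.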
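4. Let $X \sim \chi^2(n)$ with density $k_n(x)$. For $0 < \alpha < 1$, the number $c$ with $P\{X > c\} = \alpha$ is called the upper $\alpha$ quantile of $\chi^2(n)$ and is written $\chi^2_\alpha(n)$.
[Figure: the density $k_n(x)$ with right-tail area $\alpha$ beyond $\chi^2_\alpha(n)$.]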
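The definition of $\chi^2(n)$ as a sum of squared standard normal variables can be checked by simulation. The sketch below is an illustration only: the function name `simulate_chi_square`, the seed, the choice $n = 5$ and the number of draws are all assumptions made for the example.

```python
import random
import statistics


def simulate_chi_square(n, draws, seed=0):
    """Draw K^2 = X_1^2 + ... + X_n^2 with X_i i.i.d. N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
            for _ in range(draws)]


k2 = simulate_chi_square(n=5, draws=20000)
sim_mean = statistics.mean(k2)
sim_var = statistics.variance(k2)
print(sim_mean, sim_var)
```

With $n = 5$ the simulated mean and variance should land near $n = 5$ and $2n = 10$, up to sampling noise.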
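( ) $t$
1. Let $X$ and $Y$ be independent with $X \sim N(0, 1)$ and $Y \sim \chi^2(n)$, and let $T = \dfrac{X}{\sqrt{Y/n}}$. Then $T$ follows the $t$ distribution with $n$ degrees of freedom, written $T \sim t(n)$.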
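2. The density of $t(n)$ is
$$t_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}.$$
The mean is 0 ($n \ge 2$); $t(1)$ is the Cauchy distribution; the variance is $n/(n-2)$ ($n \ge 3$).
3. As $n \to \infty$, $t(n)$ converges to $N(0, 1)$.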
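4. Let $X \sim t(n)$ with density $t_n(x)$. For $0 < \alpha < 1$, the number $c$ with $P\{|X| > c\} = \alpha$ is the two-sided $\alpha$ quantile of $t(n)$, $c = t_{\alpha/2}(n)$; the one-sided quantile is written $t_\alpha(n)$.
[Figure: the density $t_n(x)$ with area $\alpha/2$ in each tail, beyond $-t_{\alpha/2}(n)$ and $t_{\alpha/2}(n)$.]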
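For $N(0, 1)$ with density $\varphi(x)$, the two-sided $\alpha$ quantile is $u_{\alpha/2}$.
[Figure: the density $\varphi(x)$ with area $\alpha/2$ in each tail, beyond $-u_{\alpha/2}$ and $u_{\alpha/2}$.]
For $\alpha = 0.05$, $u_{0.025} = 1.96$.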
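The value $u_{0.025} = 1.96$ can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch; the helper name `upper_quantile` is hypothetical and uses the upper-tail convention $P\{X > u_\alpha\} = \alpha$.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)


def upper_quantile(alpha):
    """Return u_alpha with P{X > u_alpha} = alpha for X ~ N(0, 1)."""
    return std_normal.inv_cdf(1 - alpha)


alpha = 0.05
u = upper_quantile(alpha / 2)                 # u_{0.025}
two_sided_prob = 2 * (1 - std_normal.cdf(u))  # P{|X| > u}
print(round(u, 4), round(two_sided_prob, 4))
```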
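( ) $F$
1. Let $X$ and $Y$ be independent with $X \sim \chi^2(m)$ and $Y \sim \chi^2(n)$, and let $F = \dfrac{X/m}{Y/n}$. Then $F$ follows the $F$ distribution with $(m, n)$ degrees of freedom, written $F \sim F(m, n)$.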
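2. The density of $F(m, n)$ is
$$f_{m,n}(x) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \cdot \frac{m^{m/2}\, n^{n/2}\, x^{m/2 - 1}}{(n + mx)^{(m+n)/2}}, \quad x > 0,$$
with mean $n/(n-2)$ ($n \ge 3$).
3. If $T \sim t(n)$, then $T^2 \sim F(1, n)$.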
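4. Quantiles of $F$: with density $f_{m,n}(x)$ and upper $\alpha$ quantile $F_\alpha(m, n)$, $F_{1-\alpha}(m, n) = 1 / F_\alpha(n, m)$.
[Figure: the density $f_{m,n}(x)$ with right-tail area $\alpha$ beyond $F_\alpha(m, n)$.]
1.3.13 Show that if $F \sim F(m, n)$, then $1/F \sim F(n, m)$, and deduce the quantile relation for $F$.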
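Gamma distribution $\Gamma(\alpha, \lambda)$: the density is
$$f(x) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\lambda x}, \quad x > 0,$$
with $\alpha > 0$, $\lambda > 0$, where $\Gamma(\alpha)$ is the Gamma function $\Gamma(\alpha) = \int_0^{+\infty} x^{\alpha - 1} e^{-x}\, dx$, $\alpha > 0$.
1. The exponential distribution with parameter $\lambda$ is $\Gamma(1, \lambda)$.
2. The chi-square distribution with $n$ degrees of freedom, $\chi^2(n)$, is $\Gamma\left(\frac{n}{2}, \frac{1}{2}\right)$.
3. The Gamma distribution $\Gamma(\alpha, \lambda)$ is additive in $\alpha$; if $X \sim \Gamma(\alpha, \lambda)$, then $cX \sim \Gamma(\alpha, \lambda/c)$ for $c > 0$.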
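( ) 1.3.1 Let $X_1, X_2, \ldots, X_n$ be a sample from $X \sim N(\mu, \sigma^2)$, with sample mean $\bar{X}$ and sample variance $S^2$. Then
(1). $\dfrac{\sqrt{n}(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$;
(2). $\dfrac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$;
(3). $\bar{X}$ and $S^2$ are independent;
(4). $\dfrac{\sqrt{n}(\bar{X} - \mu)}{S} \sim t(n-1)$.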
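1. If $X$ follows the $n$-dimensional normal distribution $N(\mu, \Sigma)$, its density is
$$f(x) = \frac{1}{(2\pi)^{n/2} (\det \Sigma)^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right\}.$$
2. $X$ follows the $n$-dimensional normal distribution $N(\mu, \Sigma)$ if and only if, for every $n$-dimensional vector $l$, $l^T X \sim N(l^T \mu,\ l^T \Sigma\, l)$.
3. If $X \sim N(\mu, \Sigma)$ and $A$ is an $m \times n$ matrix ($m \le n$), then $AX \sim N(A\mu,\ A \Sigma A^T)$.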
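Proof of 1.3.1. Take an orthogonal matrix $C$ whose first row is $\left(\frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right)$; the remaining rows (marked $*$) are any rows that complete it to an orthogonal matrix. Since $X \sim N(\mu 1_n, \sigma^2 I_n)$, we have $Y = CX \sim N(\sqrt{n}\,\mu\, 1^{(1)}, \sigma^2 I_n)$, where $1^{(1)} = (1, 0, \ldots, 0)^T$. Hence
(1) $Y_1 = \sqrt{n}\,\bar{X} \sim N(\sqrt{n}\,\mu, \sigma^2)$;
(2) $Y_2, \ldots, Y_n$ are i.i.d. $N(0, \sigma^2)$ and independent of $Y_1$.
Because $X_1^2 + \cdots + X_n^2 = Y_1^2 + \cdots + Y_n^2$, it follows that $(n-1)S^2 = Y_2^2 + \cdots + Y_n^2$, which gives conclusions (2) and (3) of the theorem.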
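1.3.2 Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be independent samples from $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, with
$$\bar{X} = \frac{1}{n_1}\sum_{i=1}^{n_1} X_i, \quad S_1^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n_1}(X_i - \bar{X})^2, \quad \bar{Y} = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_j, \quad S_2^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{n_2}(Y_j - \bar{Y})^2.$$
(1) $\dfrac{S_1^2 / S_2^2}{\sigma_1^2 / \sigma_2^2} \sim F(n_1 - 1, n_2 - 1)$.
(2) If $\sigma_1^2 = \sigma_2^2$, let $S_W^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$. Then
$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_W \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2).$$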
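1.3.3 (Cochran) Let $X_1, \ldots, X_n$ be a sample from $X \sim N(0, 1)$, write $X = (X_1, \ldots, X_n)^T$, and let $A_i$ ($1 \le i \le r$) be symmetric matrices of rank $n_i$ with $A_1 + \cdots + A_r = I_n$. Then the quadratic forms $X^T A_i X$ ($i = 1, \ldots, r$) are mutually independent with $X^T A_i X \sim \chi^2(n_i)$ if and only if $n_1 + n_2 + \cdots + n_r = n$.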
2001 ( 2003 ) 1. ( ) (10%) 83.36 225.82 (20%) 0.79 5.26 8.61 39.55 45-54 13.2 48.56
2. ( ) (10%) 16.96 30.27 (20%) 1.03 1.00 3.99 6.80 35-44 4.43 5.64 45-54 4.65 7.64 3. ( / ) 12.1% 12.5%
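1.3.14 (1) $X \sim U(0, \theta)$: $X_{(n)}$. (2) $X \sim U(\theta, \theta + 1)$: $E[X_{(1)} + X_{(n)}]$.
1.3.15 For a random variable $X$, show that $h_1(c) = E(X - c)^2$ is minimized at $c = EX$ and that $h_2(c) = E|X - c|$ is minimized at $c = M(X)$.
1.3.16 Gamma 1.2.2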
1.4
1. (Random Sample)
2. ( Convenience Sample )
3. ( Simple Random Sample ) Remark ( )
(1) ( Stratified Sampling ) (2) ( Cluster Sampling ) (3) ( Multi-stage Sampling ) (4) ( Systematic Sampling )
( Sampling Error ) 1.
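A reported value of 60% with a sampling error of 3 percentage points at the 95% level means $(60 \pm 3)\%$, that is, the interval from 57% to 63%, for $\mu$.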
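The link between the 95% level, the 60% estimate and the 3-point error can be illustrated with the usual normal approximation for a proportion, $\hat{p} \pm u_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The slides do not state a sample size; the value $n = 1000$ below is a hypothetical choice that happens to give an error of about 3 points, and the function name `margin_of_error` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, level=0.95):
    """Half-width of the normal-approximation interval for a proportion."""
    u = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return u * sqrt(p_hat * (1 - p_hat) / n)


p_hat = 0.60
n = 1000  # hypothetical sample size, not given in the slides
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)
print(round(moe, 4), tuple(round(v, 3) for v in interval))
```

The error shrinks like $1/\sqrt{n}$, so halving it requires roughly four times as many respondents.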
2. ( ) 2.1 ( Nonresponse Error )
2.2 ( Response Error ) 1.4.1 Rugg A B
1.4.2 1984
Hawthorne
(1) ( ) (2) (3)
R.A.Fisher
1936 62% 38% 1000 200 57% 43%
Gallup 3000 5,000 1936 1100 900
1. Warner 2. Simons
. 1.5 EXCEL 1. AVERAGE( ) MEDIAN ( ) MODE ( )
2. VAR ( ) STDEV ( ) MAX ( ) - MIN ( ) Remark DEVSQ
3. RANK ( 1) RANK ( 0) Remark RANK
p-values for the normal distribution use the EXCEL function NORMSDIST: if $z_0 < 0$, p-value = NORMSDIST($z_0$); if $z_0 > 0$, p-value = 1 - NORMSDIST($z_0$).
1.5.1 Let $X \sim N(0, 1)$. Then $P(X \le -0.8765)$ = NORMSDIST(-0.8765) = 0.19038 and $P(X \ge 5.1234)$ = 1 - NORMSDIST(5.1234) = $1.50304 \times 10^{-7}$.
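Outside EXCEL, the same standard normal probabilities can be obtained from the complementary error function. The sketch below is an illustration; the names `normsdist` and `upper_tail` are hypothetical stand-ins, not EXCEL or library functions. The upper tail is computed directly with `erfc` rather than as one minus the distribution function, which keeps precision when the probability is very small.

```python
from math import erfc, sqrt


def normsdist(z):
    """Standard normal distribution function, P(X <= z)."""
    return 0.5 * erfc(-z / sqrt(2))


def upper_tail(z):
    """Standard normal upper-tail probability, P(X >= z)."""
    return 0.5 * erfc(z / sqrt(2))


p_lower = normsdist(-0.8765)
p_upper = upper_tail(5.1234)
print(round(p_lower, 5), p_upper)
```

The first value reproduces 0.19038 from Example 1.5.1; the second is of order $1.5 \times 10^{-7}$, and its trailing digits can differ slightly between implementations.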
p-values for the chi-square distribution use the EXCEL function CHIDIST: p-value = CHIDIST($K_0^2$, $n$).
1.5.2 Let $X \sim \chi^2(30)$. Then $P(X \ge 51.234)$ = CHIDIST(51.234, 30) = 0.0092.
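p-values for the $t$ distribution use the EXCEL function TDIST: if $t_0 > 0$, p-value = TDIST($t_0$, $n$, 1); if $t_0 < 0$, p-value = TDIST(ABS($t_0$), $n$, 1); two-sided: TDIST($t_0$, $n$, 2).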
1.5.3 Let $X \sim t(20)$. Then $P(X \ge 3.456) = P(X \le -3.456)$ = TDIST(3.456, 20, 1) = 0.00125 and $P(|X| \ge 3.456)$ = TDIST(3.456, 20, 2) = 0.00250.
p-values for the $F$ distribution use the EXCEL function FDIST: p-value = FDIST($F_0$, $m$, $n$).
1.5.4 Let $X \sim F(21, 45)$. Then $P(X \ge 4.5678)$ = FDIST(4.5678, 21, 45) = $9.61377 \times 10^{-6}$.
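Binomial probabilities use the EXCEL function BINOMDIST($k$, $n$, $p$, 1 or 0): the last argument 1 gives the cumulative probability $P(X \le k)$, and 0 gives the point probability $P(X = k)$.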
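1.5.5 With $n = 500$ and $p = 0.5$: the probability of exactly 220 is BINOMDIST(220, 500, 0.5, 0) = 0.000973081; the probability of at most 220 (equivalently, at least 280) is BINOMDIST(220, 500, 0.5, 1) = 0.00413.
1.5.6 20 300 65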
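The two BINOMDIST values in Example 1.5.5 can be cross-checked with exact integer binomial coefficients from Python's `math.comb`. This is a sketch; the names `binom_pmf` and `binom_cdf` are hypothetical helpers, not EXCEL or library functions.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), like BINOMDIST(k, n, p, 0)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), like BINOMDIST(k, n, p, 1)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


point = binom_pmf(220, 500, 0.5)
cumulative = binom_cdf(220, 500, 0.5)
print(point, cumulative)
```

Both results should agree with the values quoted in Example 1.5.5 to the digits shown there.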