( ) ( ) ( NSC 84-2121-M004-009)
( 1) Minitab S-Plus ( 1):
1.1 1.2
2 (1980) ( 1952 ) (Significant difference) (Two sample) (Change-point problem) (1980) ( t-test) (Chi-square test)
( 2 P187 2.1 2.1.1 1980 ( ) 3.677 3.392 0.116 3.391 3.910 P P-value
2.1.2 Literary style Stylometry Augustus de Morgan 1851 1930 G. Udny Yule George Zipf (Pattern) Frederick Mosteller David Wallace (1984) (Bayesian techniques) (The Federalist Papers) Alexander Hamilton James Madison John Jay 77 65 12 Hamilton Madison Mosteller Wallace 55 Hamilton 90 1 Madison Shakespeare Bradley Efron Ronald Thisted 1976 1985 C. B. Williams (1975) (Bacon) (Prose) (Verse)
Mendenhall B. J. R. Bailey (1990) (Function words or context-free words) (Binomial distribution), (Articles) (Chi-square test) A. F. Bissell (1995) Weighted Cumulative Sums Weighted Variance Estimation 26 a the
2.2 : (Populations) (Two sample) ; Change-point problem
2.2.1 (Time Series Analysis) (Discriminant Analysis)
(Linear Discriminant function)
2.2.2 1930 (Quality Control) ( ) (Fixed Sample Size) David V. Hinkley Elizabeth A. Hinkley (1970) (Maximum likelihood estimate, MLE) (Asymptotic distribution)
(Likelihood Ratio Tests) A. N. Pettitt (1980) (Conditional test) (Simulation) MLE K. J. Worsley (1983) (Cumulative Sum test, CUSUM test) E. S. Page (1954) MLE H. Chernoff S. Zacks (1964) 1 (Bayes test) (Critical value) (Power function) David V. Hinkley (1971) MLE S. Zacks (1991) B. E. Brodsky B. S. Darkhovsky (1993)
3.1 ( ) ( ) ( ) 3.2 3.3 3.4
DBase IV
3.2 ( ) ( 3) ( ) 1853 6321 29.3% 1.6% ( ) 19.3% ( 1745 9039 )

Table 3.2-1
                  1~80    81~120
0                   43        28
> 0                 37        12
  1~200             21        11
  201~400            7         0
  401~600            5         1
  601~800            1         0
  801~1,000          1         0
  1,001~1,200        0         0
  1,201~1,400        0         0
  1,401~1,600        0         0
  1,601~1,800        1         0
  1,801-             1         0
Total               80        40
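The comparison of the two groups in Table 3.2-1 can be checked with a pooled two-proportion z test. The sketch below assumes that the proportions being compared are the shares with a positive count (37 of 80 and 12 of 40); the function name `two_proportion_z` is illustrative and not taken from the report. Normal tail probabilities are obtained from `math.erfc`, so no statistics package is needed.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z test.

    Returns (z, one-sided P for p1 > p2, two-sided P).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))    # P(Z > z)
    two_sided = math.erfc(abs(z) / math.sqrt(2))      # P(|Z| > |z|)
    return z, upper_tail, two_sided

# Assumed reading of Table 3.2-1: entries with a positive count in each group.
z, p_one, p_two = two_proportion_z(37, 80, 12, 40)
print(f"z = {z:.3f}, one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

Under this reading the two P-values agree with the reported 0.0439 and 0.0878 to four decimals.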
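Among the entries of Table 3.2-1 with a positive count, 56.8% (21 of 37) of 1~80 and 91.7% (11 of 12) of 81~120 are at most 200.

\[ z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \qquad \hat p = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}, \qquad n_1 = 80,\ n_2 = 40 \]

H0: p1 = p2 vs. H1: p1 > p2, P-value 0.0439
H0: p1 = p2 vs. H1: p1 ≠ p2, P-value 0.0878

( 3-2.1) 406 100 200 0.029 (Two-sided test) 135.9 22.9 57.10 0

Figure 3.2-1. Poem (vertical axis, 0 to 2000) against Index (horizontal axis).

3.3 (Function Words)
3.3.1 3.3.2 3.3.3 3.3.4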
3.3.1 3.3.2 (1980) 3.3.2 ( ) 3.3.1 100 720 3.3-2 8.03 8.73 8.25

Table 3.3-2 (counts, with rates per 1,000 in parentheses)
            ( )        ( )        ( )        ( )        ( )       Total
1~80       4024       2504      14293      10216       3782      501284
          (8.03)     (5.00)    (28.51)    (20.38)     (7.54)
81~120     2066       1501       6956       5513       2382      236740
          (8.73)     (6.34)    (29.38)    (23.29)    (10.06)
1~120      6090       4005      21249      15729       6164      738024
          (8.25)     (5.43)    (28.79)    (21.31)     (8.35)
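The rates in Table 3.3-2 and a pooled z statistic for each function word can be recomputed from the counts. This is a sketch under two assumptions: the second count for 1~80 is taken as 2504, because the printed 2405 agrees neither with the column total 4005 nor with the rate 5.00; and the sign convention (81~120 minus 1~80) is chosen here so that the statistics come out positive. The five words are referred to by position only.

```python
import math

# Counts read from Table 3.3-2, one entry per function word (by position).
first_80 = [4024, 2504, 14293, 10216, 3782]   # 2504 is assumed; the source prints 2405
last_40 = [2066, 1501, 6956, 5513, 2382]
n_first, n_last = 501284, 236740              # last column of the table

def rate_per_thousand(count, total):
    return 1000 * count / total

def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: equal rates, signed as (group 2 rate) - (group 1 rate)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x2 / n2 - x1 / n1) / se

rates_first = [rate_per_thousand(c, n_first) for c in first_80]
rates_last = [rate_per_thousand(c, n_last) for c in last_40]
z_values = [pooled_z(a, n_first, b, n_last) for a, b in zip(first_80, last_40)]

for i, (r1, r2, z) in enumerate(zip(rates_first, rates_last, z_values), start=1):
    print(f"word {i}: {r1:6.2f} vs {r2:6.2f} per 1,000, z = {z:5.2f}")
```

Every word shows a higher rate in 81~120 than in 1~80, which is why the sign convention above yields positive statistics throughout.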
z: 3.10 7.31 2.08 8.08 11.10 ( 0.02 0.001 ) ( ) 3.3.4

3.3.2
(1) 3.3-3 0 8.29 3.3-2 3.3-3 3 80% 3 65% z 5.41 0 ( 5.1 2.3 ) ( ) t

Table 3.3-3
          1~80    81~120
0           79        21
1            0        10
2            1         7
3            0         2
Total       80        40

Figure 3.3-2. C31 (vertical axis, 0 to 3) against Index (horizontal axis).

t : 0.038 = 62.77 24.92 = 33.90 1.875 P-Value = 0.000 41.60 P-Value = 0.000

(2) 10.75 5.1 5.93 0
t : 8.61 = 0.351 10.75 = 5.93 9.78 P-Value = 0.87 5.10 P-Value = 0.000

(3) 10 30 2 12
t : 23.67 = 2.06 10.99 = 4.28 19.60 P-Value = 0.042 7.32 P-Value = 0.000
4 10.48 9.85
5.68 4.91 1~60

3.3.3 3.3-4

Table 3.3-4
          1~80    81~120
1            3        29
2           23         0
3           14         7
4           15         0
5            6         1
6            7         0
7            1         3
8           11         0
Total       80        40
15
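The two columns of Table 3.3-4 can be compared with a Pearson chi-square test of homogeneity, in which each expected count is the group size times the pooled category proportion. The sketch below is a plain-Python illustration with a hypothetical function name; it uses the counts exactly as they appear in Table 3.3-4, so the statistic it prints need not coincide with the value reported in the text if the original computation used slightly different counts.

```python
def chi_square_homogeneity(row1, row2):
    """Pearson chi-square statistic for two multinomial samples.

    Expected counts are n_i * P_j with P_j = (Y_1j + Y_2j) / (n_1 + n_2).
    Returns (statistic, degrees of freedom).
    """
    n1, n2 = sum(row1), sum(row2)
    total = n1 + n2
    stat = 0.0
    used = 0
    for y1, y2 in zip(row1, row2):
        p_hat = (y1 + y2) / total
        if p_hat == 0:
            continue  # a category empty in both groups contributes nothing
        used += 1
        for observed, n in ((y1, n1), (y2, n2)):
            expected = n * p_hat
            stat += (observed - expected) ** 2 / expected
    return stat, used - 1

# Counts from Table 3.3-4 (categories 1 to 8).
y1 = [3, 23, 14, 15, 6, 7, 1, 11]   # 1~80
y2 = [29, 0, 7, 0, 1, 0, 3, 0]      # 81~120
stat, df = chi_square_homogeneity(y1, y2)
print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

Several expected counts in the 81~120 column are below 5, so the chi-square approximation is rough here; the size of the statistic relative to its 7 degrees of freedom is nonetheless unambiguous.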
                 1     2     3     4     5     6     7     8   Total
1~80   (Y1)      3    23    14    15     6     7     1    10      80
81~120 (Y2)     29     0     7     0     1     0     3     0      40
Total           32    23    21    15     7     7     4    10     120

\[ \chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{8} \frac{\left(Y_{ij} - n_i \hat P_j\right)^2}{n_i \hat P_j}, \qquad \hat P_j = \frac{Y_{1j} + Y_{2j}}{120},\ j = 1, 2, \dots, 8, \qquad n_1 = 80,\ n_2 = 40 \]

χ² = 78.30, P-value ≈ 0
40 36 4.29 P-value 0.038

1~60 61~80
                 1     2     3     4     5     6     7     8   Total
1~60             1    18    11    12     6     7     0     3      60
61~80            2     5     3     3     0     0     1     7      20
1~80             3    23    14    15     6     7     1    10      80

χ² = 21.34, P-value 0.0033
1~60 61~80 30 10 0.5 1 0 1~120 3.3-3
1~40 41~80 3.3-3

Figure 3.3-3. C1 (vertical axis, 0.0 to 1.0) against Index (horizontal axis).

3.3.4 :

Put into        True Group
Group             1       2
1                75       4
2                 5      36
Total            80      40

Squared Distance between Group = 7.16587
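The classification summary above yields an overall correct-classification rate, and the reported distance feeds the two-sample F statistic used in the text. The sketch below assumes p = 13 discriminating variables (the value the text gives) and treats 7.16587 as D, squaring it inside the statistic; that choice is made only because it reproduces the reported value 94.62, and whether the printed "squared distance" was meant to be squared again is not something the text settles. Function names are illustrative.

```python
def correct_rate(confusion):
    """confusion[i][j] = number put into group i+1 whose true group is j+1."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

def two_sample_f(d, n1, n2, p):
    """F = (n1+n2-p-1) / (p (n1+n2-2)) * n1 n2 / (n1+n2) * D^2.

    Under H0 this is referred to F with (p, n1+n2-p-1) degrees of freedom.
    """
    scale = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2))
    return scale * (n1 * n2 / (n1 + n2)) * d ** 2

rate = correct_rate([[75, 4], [5, 36]])
f_stat = two_sample_f(7.16587, n1=80, n2=40, p=13)   # p = 13 as stated in the text
df = (13, 80 + 40 - 13 - 1)
print(f"correct rate = {rate:.1%}, F = {f_stat:.2f} on {df} degrees of freedom")
```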
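92.5% (111 of 120 correctly classified) F:

\[ F = \frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)} \cdot \frac{n_1 n_2}{n_1 + n_2}\, D^2 = 94.62 > F_{p,\; n_1 + n_2 - p - 1}(0.01) \]

n_1 = 80, n_2 = 40, p = 13, D = 7.16587, P-value ≈ 0

3.4 K. J. Worsley (1983) CUSUM Likelihood Ratio Likelihood Ratio 0 CUSUM

\[ X_i \sim B(n_i, p_i) \ \text{independently}, \qquad 0 < p_i < 1, \qquad i = 1, 2, \dots, 120 \]

where n_i is the number of trials and p_i the success probability of observation i.

\[ H_0: p_i = p,\ i = 1, 2, \dots, 120 \qquad \text{vs.} \qquad H_1: p_i = \begin{cases} p, & i = 1, \dots, k \\ p', & i = k+1, \dots, 120 \end{cases} \]

With m_i the observed value of X_i,

\[ M = \sum_{i=1}^{120} m_i, \qquad N = \sum_{i=1}^{120} n_i, \qquad M_K = \sum_{i=1}^{K} m_i, \qquad N_K = \sum_{i=1}^{K} n_i, \qquad K = 1, 2, \dots, 120. \]

The CUSUM statistic is built from these partial sums.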
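The partial sums M_K and N_K are enough to compute a standardized CUSUM sequence for a change in a binomial probability. The sketch below assumes the standardization Q_K = (M_K - r_K M) / (σ √N) with r_K = N_K / N and σ² = P_0 (1 - P_0), P_0 = M / N; with that choice Q_K² / (r_K (1 - r_K)) equals the Pearson chi-square of the 2×2 table that splits the sequence after position K. The data are hypothetical and only illustrate the mechanics; the function name is not from the report.

```python
import math

def binomial_cusum(m, n):
    """Return [(K, Q_K, Q_K^2 / S_K^2)] for K = 1 .. len(m) - 1.

    m[i] successes out of n[i] trials; S_K^2 = r_K (1 - r_K).
    Assumes 0 < sum(m) < sum(n), so that sigma is positive.
    """
    M, N = sum(m), sum(n)
    p0 = M / N
    sigma = math.sqrt(p0 * (1 - p0))
    results = []
    m_k = n_k = 0
    for k in range(1, len(m)):
        m_k += m[k - 1]
        n_k += n[k - 1]
        r_k = n_k / N
        q_k = (m_k - r_k * M) / (sigma * math.sqrt(N))
        results.append((k, q_k, q_k ** 2 / (r_k * (1 - r_k))))
    return results

# Hypothetical data: the success rate jumps from 0.1 to 0.5 after the third observation.
m = [1, 1, 1, 5, 5, 5]
n = [10] * 6
stats = binomial_cusum(m, n)
k_cusum = max(stats, key=lambda t: abs(t[1]))[0]   # K maximizing |Q_K|
k_chi2 = max(stats, key=lambda t: t[2])[0]         # K maximizing Q_K^2 / S_K^2
print(k_cusum, k_chi2)
```

Both criteria locate the change after the third observation in this toy sequence. On real data the two maximizers can differ, since dividing by S_K² gives more weight to split points near the ends of the sequence.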
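\[ Q_K = \frac{M_K - r_K M}{\sigma \sqrt{N}}, \qquad P_0 = \frac{M}{N}, \qquad \sigma^2 = P_0 (1 - P_0), \qquad r_K = \frac{N_K}{N}, \qquad S_K^2 = r_K (1 - r_K) \]

Q_K^2 / S_K^2 is the Pearson χ² statistic of the 2×2 table obtained by splitting the sequence at K ( )

4.1 : Utility Function 17 17 7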
7/17 17 12 7 (1980)
( ) ( ) ( ) 4.2 (Species Problem) (Capture-Recapture Model) (population) (1994)
( ) :

(1) :
1. , (1980), " ", .
2. (1977), " ", .
3. (1983), " ", .
4. (1984), " ", .
5. (1994), " ", .
6. (David Steelman, 1994), " -- ", .

(2) :
1. Bailey, B. J. R. (1990) "A Model for Function Word Counts", Applied Statistics, 39, pp.107-114.
2. Bissell, A. F. (1995) "Weighted Cumulative Sums for Text Analysis Using Word Counts", Journal of the Royal Statistical Society, Series A, 158, pp.525-545.
3. Brodsky, B. E. and Darkhovsky, B. S. (1993) "Nonparametric Methods in Change-point Problems", Kluwer Academic Publishers.
4. Chernoff, H. and Zacks, S. (1964) "Estimating the Current Mean of a Normal Distribution which is Subjected to Changes in Time", Annals of Mathematical Statistics, 35, pp.999-1028.
5. Efron, B. and Thisted, R. (1976) "Estimating the Number of Unseen Species: How Many Words Did Shakespeare Know?", Biometrika, 63, pp.435-447.
6. Hinkley, D. V. and Hinkley, E. A. (1970) "Inference about the Change-point in a Sequence of Binomial Variables", Biometrika, 57, pp.447-488.
7. Hinkley, D. V. (1971) "Inference about the Change-point from Cumulative Sum Tests", Biometrika, 58, pp.509-523.
8. Holmes, D. (1995) "Who was the Author?", Journal of Royal Statistics News, Vol. 23, No. 2, pp.1-2.
9. Horvath, L. (1989) "The Limit Distributions of Likelihood Ratio and Cumulative Sum Tests for a Change in a Binomial Probability", Journal of Multivariate Analysis, 31, pp.148-159.
10. Karlgren, B. (1952) "New Excursions in Chinese Grammar", Bulletin of the Museum of Far Eastern Antiquities (Stockholm), No. 24, pp.51-80.
11. Mosteller, F. and Wallace, D. L. (1984) "Applied Bayesian and Classical Inference: The Case of The Federalist Papers", Springer-Verlag.
12. Pettitt, A. N. (1979) "A Nonparametric Approach to the Change-point Problem", Applied Statistics, 28, pp.126-135.
13. Pettitt, A. N. (1980) "A Simple Cumulative Sum Type Statistic for the Change-point Problem with Zero-one Observations", Biometrika, 67, pp.79-84.
14. Smith, A. F. M. (1975) "A Bayesian Approach to Inference about a Change-point in a Sequence of Random Variables", Biometrika, 62, pp.407-416.
15. Thisted, R. and Efron, B. (1987) "Did Shakespeare Write a Newly-discovered Poem?", Biometrika, 74, pp.445-455.
16. Sichel, H. S. (1986) "Word Frequency Distributions and Type-token Characteristics", The Mathematical Scientist, 11, pp.45-72.
17. Williams, C. B. (1975) "Mendenhall's Studies of Word-length Distribution in the Works of Shakespeare and Bacon", Biometrika, 62, pp.207-212.
18. Worsley, K. J. (1983) "The Power of Likelihood Ratio and Cumulative Sum Tests for a Change in a Binomial Probability", Biometrika, 70, pp.455-464.
19. Yue, C. J. (1994) "Bayesian Sequential Tests for Comparing the Species Richness of Two Populations", Ph.D. thesis, Univ. of Wisconsin-Madison.
20. Zacks, S. (1991) "Detection and Change-point Problems", Handbook of Sequential Analysis, Marcel Dekker.