ABSTRACT

Data mining is a rapidly developing research field that draws on mathematics, statistics, databases, pattern recognition, artificial intelligence, optimization, and other disciplines. As society develops, people are no longer satisfied with routine data processing; they want to extract deep, previously undiscovered, and valuable information from their data. Data mining emerged to meet this need and has matured gradually.

This thesis proposes data mining methods based on gray system theory and applies them, together with knowledge of financial analysis, to the financial data of companies listed on the Chinese stock market. The methods are implemented in software, and worked examples are given together with the results they produce.

Chapter 1 gives an overview of data mining: its basic concepts, its methods, and its research branches to date.

Chapter 2 introduces the basic concepts and methods of financial statement analysis before the mining methods are presented, and explains the formulae and meanings of the financial ratios used in financial ratio analysis. In view of the advantages of data mining, it also discusses some other methods that can be applied to mine financial data.

Chapter 3 introduces the relevant concepts of gray system theory and then proposes two methods, Growth Rate Mining and Development Situation Mining, for evaluating aspects of a company's growth and development. Gray association and gray clustering are then carried out on financial data to obtain knowledge about the association relations between financial variables and the clusters they form.

Chapter 4 describes how the mining system is implemented in software.

Chapter 5 concludes the thesis and discusses the outlook for data mining.
Key Words: data mining, gray system theory, financial analysis
1 MIS SQL 2 2.1 2 2.2
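[26] ID3

3

The association rule problem was introduced by R. Agrawal et al. [3] in the 1990s.

3.1

Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of $m$ items, and let $D$ be a set of transactions. Each transaction $T$ is a set of items with $T \subseteq I$ and carries a unique identifier TID. For a set of items $X \subseteq I$, a transaction $T$ contains $X$ if $X \subseteq T$.

An association rule is an implication $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in $D$ with confidence $c\%$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule has support $s\%$ in $D$ if $s\%$ of the transactions in $D$ contain $X \cup Y$.

Mining association rules can be decomposed into two steps:

1. Find all frequent itemsets in $D$, that is, all itemsets whose support is at least the minimum support.
2. For each frequent itemset $A$ and each non-empty subset $B \subset A$, if $\mathrm{Support}(A) / \mathrm{Support}(B) \ge \mathit{minconf}$, output the rule $B \Rightarrow (A - B)$.

3.2 Apriori

The Apriori algorithm was proposed by R. Agrawal et al. [3]. Apriori first finds the set $L_1$ of frequent 1-itemsets. $L_1$ is used to generate the candidate 2-itemsets $C_2$, and a scan of $D$ that counts the candidates in $C_2$ yields the frequent 2-itemsets $L_2$. $L_2$ in turn generates $C_3$, and so on, until no further frequent itemsets are found. The algorithm is given in Figure 1.1.

Candidates are generated by the function Apriori-gen in two steps.

Step 1 (join): the candidate $k$-itemsets are generated from the frequent $(k-1)$-itemsets:

    insert into C_k
    select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.item_1 = q.item_1, p.item_2 = q.item_2, ...,
          p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1};

Step 2 (prune): every itemset $c \in C_k$ that has a $(k-1)$-subset not in $L_{k-1}$ is deleted from $C_k$:

    for all itemsets c ∈ C_k do
        for all (k-1)-subsets s of c do
            if (s ∉ L_{k-1}) then
                delete c from C_k;

Apriori Algorithm

    L_1 = {large 1-itemsets};
    for (k = 2; L_{k-1} ≠ ∅; k++) do begin
        C_k = apriori-gen(L_{k-1});        // New candidates
        for all transactions t ∈ D do begin
            C_t = subset(C_k, t);          // Candidates contained in t
            for all candidates c ∈ C_t do
                c.count++;
        end;
        L_k = {c ∈ C_k | c.count ≥ minsup};
    end
    Answer = ∪_k L_k;

Figure 1.1 Apriori algorithm

3.3 Apriori

Park et al. proposed DHP, a hash-based variant of Apriori. [34] [21]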
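The join, prune, and counting steps above, together with the rule-generation step of Section 3.1, can be sketched in Python as follows. This is a minimal illustration rather than the thesis's implementation: the function names `apriori_gen`, `apriori`, and `rules` are hypothetical, `minsup` is taken as an absolute transaction count instead of a percentage, and candidates are counted by a naive scan rather than with the `subset` function of the original algorithm.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for p in prev:
        for q in prev:
            # Join step: equal on the first k-2 items, p's last item < q's last item.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune step: keep c only if every (k-1)-subset is frequent.
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates

def apriori(transactions, minsup):
    """Return {itemset (sorted tuple): count} for itemsets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {c: n for c, n in counts.items() if n >= minsup}   # L_1
    frequent = dict(level)
    k = 2
    while level:                                               # L_{k-1} is non-empty
        candidates = apriori_gen(level.keys(), k)              # C_k
        counts = {c: 0 for c in candidates}
        for t in transactions:                                 # one scan of D
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= minsup}  # L_k
        frequent.update(level)
        k += 1
    return frequent

def rules(frequent, minconf):
    """Return (B, A - B, confidence) for each frequent A and non-empty B within A."""
    out = []
    for a, count_a in frequent.items():
        for r in range(1, len(a)):
            for b in combinations(a, r):
                conf = count_a / frequent[b]   # Support(A) / Support(B)
                if conf >= minconf:
                    rest = tuple(x for x in a if x not in b)
                    out.append((b, rest, conf))
    return out

# Illustrative data (not from the thesis): four transactions over items 1..5.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(transactions, minsup=2)
strong_rules = rules(frequent, minconf=1.0)
```

With these four transactions and `minsup=2`, the only frequent 3-itemset is `(2, 3, 5)`, and `(2, 3) => (5,)` is one of the rules with confidence 1.0.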
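Further association rule algorithms include those of [22]; CD, CaD, and DD by Agrawal [28]; PDM by Park [29]; PMAR [27] [16]; Cumulate by Srikant [12]; ML_T2L1 by Han Jia-wei [35]; ML_AR [11]; and AR_SET [32]. See also [9], [13], [18], [19], [23], [30], [15,17,24], [10,25], and [14,33].

4

[37,38]

ID3 [4] was proposed by J. R. Quinlan. Let $E = F_1 \times F_2 \times \cdots \times F_n$ be an $n$-dimensional space, where $F_j$ is the set of values of the $j$-th attribute. Each element $e = \langle \nu_1, \nu_2, \ldots, \nu_n \rangle$ of $E$ is an example, with $\nu_j \in F_j$ for $j = 1, 2, \ldots, n$. Suppose the examples in $E$ fall into $k$ classes, with $P_i$ examples in class $i$, $i = 1, 2, \ldots, k$.

Let $A$ be an attribute with $v$ values $a_1, \ldots, a_v$. $A$ partitions $E$ into $v$ subsets $\{E_1, E_2, \ldots, E_v\}$. Let $P_{ij}$ be the number of examples of class $j$ in $E_i$, for $j = 1, 2, \ldots, k$ and $i = 1, 2, \ldots, v$. The entropy of $E_i$ is

$$I(E_i) = -\sum_{j=1}^{k} \frac{P_{ij}}{|E_i|} \log \frac{P_{ij}}{|E_i|}$$

and the expected entropy of the tree rooted at $A$ is

$$E(A) = \sum_{i=1}^{v} \frac{|E_i|}{|E|} \, I(E_i).$$

ID3 selects the attribute $A^*$ that minimizes $E(A)$.

[39] [44] [41] [42]. Related algorithms include C4.5 [5], SLIQ [6], SPRINT [7], PUBLIC [8], and MID3 [43]. [40]

5

Partitioning algorithms divide $n$ objects into $k$ clusters [46]; CLARANS is one such algorithm. Hierarchical algorithms build a hierarchical decomposition of the data set.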
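Returning to the ID3 criterion of Section 4, the entropy $I(E_i)$ and the expected entropy $E(A)$ can be computed with a few lines of Python. This is a sketch under stated assumptions: the logarithm is taken to base 2 (the text does not fix the base), examples are tuples of attribute values, and the names `entropy`, `expected_entropy`, and `best_attribute`, as well as the toy data, are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """I(E_i) = -sum_j (P_ij / |E_i|) * log2(P_ij / |E_i|); base 2 is an assumption."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def expected_entropy(rows, labels, attr):
    """E(A) = sum_i (|E_i| / |E|) * I(E_i), where attribute index attr induces the E_i."""
    parts = defaultdict(list)
    for row, label in zip(rows, labels):
        parts[row[attr]].append(label)      # group class labels by the value of A
    total = len(labels)
    return sum(len(p) / total * entropy(p) for p in parts.values())

def best_attribute(rows, labels):
    """Return the index of the attribute A* that minimizes E(A)."""
    return min(range(len(rows[0])), key=lambda a: expected_entropy(rows, labels, a))

# Illustrative data (not from the thesis): attribute 0 separates the classes perfectly.
rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
chosen = best_attribute(rows, labels)
```

Here attribute 0 gives an expected entropy of 0 and attribute 1 gives 1, so attribute 0 would be chosen as the root.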
[47] [48] [45] R-tree [49]. BIRCH summarizes the data with clustering features (CF) that are organized in a CF-tree.

    Algorithm DBSCAN(D, Eps, MinPts)
        // All objects in D are unclassified.
        For all objects o in D do
            If o is unclassified
                call function expand_cluster to construct a cluster
                wrt. Eps and MinPts containing o.

    Function expand_cluster(o, D, Eps, MinPts):
        retrieve the Eps-neighborhood N_Eps(o) of o;
        If |N_Eps(o)| < MinPts            // i.e. o is not a core object
            mark o as noise and RETURN;
        Else                              // i.e. o is a core object
            select a new cluster-id and mark all objects in N_Eps(o)
            with this current cluster-id;
            push all objects from N_Eps(o) \ {o} onto the stack seeds;
            While not seeds.empty() do
                currentobject := seeds.top();
                seeds.pop();              // remove it before new objects are pushed
                retrieve the Eps-neighborhood N_Eps(currentobject) of currentobject;
                If |N_Eps(currentobject)| >= MinPts
                    select all objects in N_Eps(currentobject) that are not yet
                    classified or are marked as noise;
                    push the unclassified objects onto seeds and mark all of
                    these objects with the current cluster-id;
            RETURN

Figure 1.2 Single scan algorithm
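The DBSCAN pseudocode above translates fairly directly into Python. The sketch below makes several assumptions that are not in the text: points are coordinate tuples compared by Euclidean distance, the Eps-neighborhood is found by a brute-force scan instead of a spatial index, noise is labeled -1, and the helper names `region_query`, `expand_cluster`, and `dbscan` are chosen for illustration.

```python
import math

NOISE = -1
UNCLASSIFIED = None

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def expand_cluster(points, labels, i, cluster_id, eps, min_pts):
    """Try to grow a cluster from point i; return True if i is a core object."""
    neighbors = region_query(points, i, eps)
    if len(neighbors) < min_pts:          # i is not a core object
        labels[i] = NOISE
        return False
    for j in neighbors:                   # i is a core object
        labels[j] = cluster_id
    seeds = [j for j in neighbors if j != i]
    while seeds:
        current = seeds.pop()             # take the top of the stack
        result = region_query(points, current, eps)
        if len(result) >= min_pts:        # current is itself a core object
            for j in result:
                if labels[j] is UNCLASSIFIED:
                    seeds.append(j)       # unclassified objects are expanded later
                    labels[j] = cluster_id
                elif labels[j] == NOISE:
                    labels[j] = cluster_id  # former noise becomes a border object
    return True

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is UNCLASSIFIED:
            if expand_cluster(points, labels, i, cluster_id, eps, min_pts):
                cluster_id += 1
    return labels

# Illustrative data (not from the thesis): two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The brute-force `region_query` makes this sketch quadratic in the number of points; it is meant to show the control flow of the algorithm, not to be efficient.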
8 50 DBSCAN p Eps MinPts DBSCAN 1.2 DBSCAN I/O [55] FDBSCAN 6 1996 1999
9 1 Financial Statement Analysis 2 2.1
10 2.1.1 2.1.2 2.2 2.2.1 2.2.2 2.2.3 2.2.4 100 3
Degree papers are held in the Xiamen University Electronic Theses and Dissertations Database. Full texts are available in the following ways:
1. If your library is a CALIS member library, please log on to http://etd.calis.edu.cn/ and submit a request online, or consult the interlibrary loan department of your library.
2. Users of non-CALIS member libraries should e-mail etd@xmu.edu.cn for delivery details.