91 9, 11-34
(91 3 14, 91 5 8, 91 7 3)
(Data Mining) KDD (Data Mining) (Database Segmentation) (Link Analysis) (Deviation Detection) (neural network, fuzzy theory, genetic algorithms, rough set) Data Mining, Internet
Data Mining (Hypothesis) (Verify) (Data Mining) (Trend) (Pattern) (Relationship)
(Knowledge Discovery in Databases, KDD) (Data Archaeology) (Data Pattern Analysis) (Functional Dependency Analysis)
Data Mining
(1) Database systems, data warehouses, OLAP
(2) Machine learning
(3) Statistical and data analysis methods
(4) Visualization
(5) Mathematical programming
(6) High performance computing
Data Mining
Applications of Data Mining
Customer-focused, Operations-focused, Research-focused
Life-time Value; Market-Basket Analysis; Profiling & Segmentation; Retention; Target Market; Acquisition; Knowledge Portal; Cross-Selling; Campaign Management; E-Commerce; Profitability Analysis; Pricing; Fraud Detection; Risk Assessment; Portfolio Management; Employee Turnover; Cash Management; Production Efficiency; Network Performance; Manufacturing Processes; Combinatorial Chemistry; Genetic Research; Epidemiology

Data Mining (subject-oriented) (integrated) (time-variant) (nonvolatile) System)

(subject-oriented)
1. Organized around major subjects, e.g., customer, supplier, product, sales.
2. Not organized around day-to-day transactions, e.g., point of sale.
3. Focuses on modeling and analysis of data for decision making.
4. Typically provides a concise view around particular subject issues...
(integrated)
1. A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records.
2. Data cleaning techniques are applied to ensure consistency of naming conventions, encoding structures, attribute measures,

(time-variant)
1. Data are stored to provide information from a historical perspective; every key structure in the data warehouse contains an element of time.
2. 60-90 ) ( 1-10

(nonvolatile)
1. A data warehouse is always a physically separate store of data, transformed from the application data in the operational environment.
2. It usually requires two operations for data access: initial loading of data and access of data.
3.
(bottom up) (independent data mart) Data Mart, Data Marts (strategic) (tactic) 50 gigabytes (GB), 1 terabyte (TB)
(top down)
1. Data Mart Centric:
2. Enterprise Data Warehouse with Dependent Data Marts: Dependent Data Marts, Enterprise Data Warehouse (Data Mart)
(Enterprise Data Warehouse) (Full Scale) Enterprise Data Warehouse with Dependent Data Marts, Independent Data Marts, Enterprise Data Warehouse (Danger Zone)
Data Warehouse, Data Mart, Data Warehouse (Heterogeneous) Information, Data, Data Mart
(Customer Relationship Management) (Effectiveness) Deliver the right thing to the right people at the right time.
1. Top-down: starts with overall design and planning (mature). (Waterfall: structured and systematic analysis at each step before proceeding to the next.)
2. Bottom-up: starts with experiments and prototypes (rapid). (Spiral: rapid generation of increasingly functional systems, with short turnaround time.)

A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.

Cube: a lattice of cuboids.
Dimensions: Product, Country, Date
Hierarchical summarization paths:
Industry, Category, Product
Country, Region, City, Office
Year, Quarter, Month, Week, Day
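To make the phrase "a lattice of cuboids" concrete, the following is a minimal sketch in Python. The fact rows, the sales measure, and the function names are assumptions made for illustration and do not come from the article. It only shows that n dimensions give 2^n group-by combinations, from the fully summarized apex cuboid down to the base cuboid.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical fact rows: (product, country, date, sales amount).
FACTS = [
    ("TV", "USA", "2001-Q1", 100),
    ("TV", "Canada", "2001-Q1", 50),
    ("PC", "USA", "2001-Q2", 200),
    ("PC", "USA", "2001-Q1", 70),
]
DIMENSIONS = ("product", "country", "date")


def cuboid(facts, group_by):
    """Sum the measure over every dimension not listed in group_by."""
    index = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in index)
        totals[key] += row[-1]
    return dict(totals)


def lattice(facts):
    """Build all 2^n cuboids, from the apex () to the base (all dimensions)."""
    return {
        group_by: cuboid(facts, group_by)
        for size in range(len(DIMENSIONS) + 1)
        for group_by in combinations(DIMENSIONS, size)
    }


cube = lattice(FACTS)
```

Here `cube[()]` holds the grand total and `cube[("product", "country")]` holds sales summed over all dates. A real warehouse would normally materialize only some of these cuboids.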
1. Star Schema: a single object (fact table)
2. Snowflake Schema: dimensions
3. Fact Constellation

(On-Line Transactional Processing, OLTP) (subject) OLTP, Data Warehouse Systems (join) OLTP
1. Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
2. Drill down (roll down): the reverse of roll-up, moving from a higher-level summary to a lower-level summary or detailed data, or introducing new dimensions.
3. Slice and dice:
4. Pivot (rotate):

1. Relational OLAP (ROLAP):
(1)
(2)
(3)
2. Multidimensional OLAP (MOLAP):
(1) Array-based multidimensional storage engine
(2)
(3) (Cube)
3. Hybrid OLAP (HOLAP): user flexibility, e.g., low level: relational, high level: array.

OLAP (Online Analytical Processing), Data Mining, OLAP
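The four OLAP operations listed above can be sketched on a tiny table. This is an illustrative sketch only. The sales rows, the city-to-country mapping, and every function name are assumptions. The slice and pivot behavior follows the usual textbook meaning, because the article's own definitions of those two items did not survive.

```python
from collections import defaultdict

# Hypothetical detailed data: (city, quarter, sales). The city -> country
# mapping plays the role of one level of a location hierarchy.
SALES = [
    ("Taipei", "Q1", 10),
    ("Taipei", "Q2", 20),
    ("Kaohsiung", "Q1", 5),
    ("Tokyo", "Q1", 30),
]
CITY_TO_COUNTRY = {"Taipei": "Taiwan", "Kaohsiung": "Taiwan", "Tokyo": "Japan"}


def aggregate(rows):
    """Sum sales for each (location, quarter) pair."""
    totals = defaultdict(int)
    for location, quarter, amount in rows:
        totals[(location, quarter)] += amount
    return dict(totals)


def roll_up(rows):
    """Climb the hierarchy: replace each city by its country, then re-aggregate."""
    return aggregate((CITY_TO_COUNTRY[c], q, amt) for c, q, amt in rows)


def drill_down(rows, country):
    """Reverse of roll-up: return the city-level detail behind one country."""
    return aggregate(r for r in rows if CITY_TO_COUNTRY[r[0]] == country)


def slice_quarter(rows, quarter):
    """Slice: fix one dimension (quarter) to a single value."""
    return {loc: amt for (loc, q), amt in aggregate(rows).items() if q == quarter}


def pivot(table):
    """Pivot (rotate): swap the two axes of an aggregated table."""
    return {(q, loc): amt for (loc, q), amt in table.items()}
```

For example, `roll_up(SALES)` merges Taipei and Kaohsiung into one Taiwan row per quarter, and `drill_down(SALES, "Taiwan")` recovers the city-level rows behind it.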
OLAP, Data Mining, OLAP
(subject-oriented) (integrated) (time-variant) (nonvolatile) System) Data Warehousing, Data Mining
Data Mining, Statistics, Data Mining: CART, CHAID
1. Data Mining
2. Data Mining
3. Data Mining
KDD (Data Warehouse) (Data Mining) (Legacy) (DSS) (OLTP) (integrated data) (detailed and summarized data) (Metadata) Data Mining

KDD (Knowledge Discovery in Databases), Data Mining. Fayyad, KDD: "The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data."

(Selection) (Pre-processing) (Transformation) Data Mining (Patterns) Interpretation/Evaluation
KDD, Data Mining, Data warehouse, KDD
(1) (classification)
(2) (estimation)
(3) (prediction)
(4) (affinity grouping)
(5) (clustering)
(class)
(decision tree) (memory-based reasoning) (cross-selling) (clusters) (segmentation) k-means, agglomeration
Data Mining, Domain-specific Tools (mutual funds)
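The text above names k-means as one clustering technique for segmentation. As a rough sketch of how it works, the Python below alternates an assignment step and an update step on one-dimensional data. The function name, the starting centers, and the sample values are assumptions chosen for illustration, not material from the article.

```python
def k_means(points, centers, iterations=10):
    """Plain k-means on 1-D points, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical customer values forming two obvious segments.
final_centers, segments = k_means([1, 2, 3, 10, 11, 12], centers=[0, 5])
```

The result depends on the starting centers, so practical tools usually try several starting points.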
Data Mining: MDT, Coverstory and Spotlight, NichWork visualization system, LBS, Falcon, FAIS, NYNEX, TASA
150 163 15 130 135
Data Mining (patterns) (sp data) (scalability) Model
Two Crows Corp., Data Mining
Customer Profiling, Targeted Marketing, Market-Basket Analysis
Data Mining (Fraud Detection)
1.
2. (Sequential Pattern Detection)
3.
4. (Segmentation)
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16. NBA
17.
18.
19.
20.

Glymour
1.
2. (Acquisition)
3. (Integration and checking)
4. (Data cleaning)
5. (Model and hypothesis development)
6.
7. (Testing and verification)
8. (Interpretation and use)
Data Mining 80% Join
Data Mining (Model) (Relations) Association Model
Data Mining: Classification, Regression, Time Series, Clustering, Association, Sequence
Classification, Regression, Association, Sequence, Clustering
Classification, Classification Model: Logistic Regression, Discriminant Analysis; Data Mining: Neural Nets, Decision Tree
Neural Nets (Net Node) (Node) (Weighted Sum) (Weights) Back-Propagation, Weights, Neural Net
Decision Tree: 40000, 40000, 5
Decision Tree, Neural Net, Regression, Neural Net, Clustering, Decision Trees
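The neural net passage above mentions nodes, a weighted sum, weights, and back-propagation. The Python sketch below shows these pieces for one node only. The sigmoid activation, the squared-error loss, the learning rate, and the function names are assumptions for illustration, since the article's own description did not survive.

```python
import math


def node_output(inputs, weights, bias=0.0):
    """One node: weighted sum of the inputs passed through a sigmoid."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))


def train_step(inputs, weights, target, rate=0.5):
    """One gradient-descent update of the weights for a single example.

    For a single sigmoid node with squared error, this is what
    back-propagation reduces to; deeper nets repeat it layer by layer.
    """
    out = node_output(inputs, weights)
    delta = (out - target) * out * (1.0 - out)  # d(error) / d(weighted sum)
    return [w - rate * delta * x for w, x in zip(weights, inputs)]


# Hypothetical example: two inputs, weights starting at zero, target 1.
old_weights = [0.0, 0.0]
new_weights = train_step([1.0, 2.0], old_weights, target=1.0)
```

After the update the node's output for the same inputs moves closer to the target, which is the sense in which the weights are learned.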
Regression, Time-Series Forecasting
Clustering, Classification
Association: Item A, Item B, X% (85%)
Sequence Discovery, Association, Sequence Discovery: Item (X Y 45% A 12% B 68%)
Data Mining
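The association passage above relates Item A to Item B with a percentage. One common reading is the confidence of the rule "A implies B". Under that assumption, the Python sketch below computes support and confidence from a list of baskets. The basket data and function names are hypothetical.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, items):
    """Fraction of all transactions that contain `items`."""
    return count(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also
    contain the consequent. Returns 0.0 if the antecedent never occurs."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical market baskets.
BASKETS = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]
```

With these baskets, A appears in four transactions and A together with B in three, so `confidence(BASKETS, {"A"}, {"B"})` is 0.75.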
(1) Case-Based Reasoning
(2) Data Visualization
(3) Fuzzy Query and Analysis
(4) Knowledge Discovery
(5) Neural Networks
AI Domain algorithms) Data Mining

Data mining tools

Case-based Reasoning: 1. CBR Express 2. Esteen 3. Kate-CBR 4. The Easy Reasoner
Data Visualization: 1. Alterian 2. AVS/Express 3. Visualization Edition 4. Axum 5. Discovery 6. SPSS Diamond 7. Visual Insight
Fuzzy Query and Analysis: 1. CubiCalc 2. FuziCalc 3. Fuzzy TECH for business 4. Quest
Knowledge Discovery: 1. Aria 2. Answer tree data mining 3. CART 4. DARWIN 5. Enterprise Miner 6. DataEngine
Neural Networks: 1. BackPack 2. BrainMaker 3. Loadstone 4. NeuFrame/NeuroFuzzy 5. Neural network Browser 6. Neural connection 7. Neural network Utility 8. Neuralyst For Excel
John Holland 1975
1.
2.
Data Mining (selection) (reproduction) (crossover) (mutation) (robustness) (domain independence)

Data Mining
1. MLC++ (pd)
2. MOBAL (pd)
3. MOBAL (pd)
4. Emerald (rp)
5. Kepler (rp)
6. Clementine (cp)
7. DataMind DataCruncher (cp)
8. Darwin (cp)
9. Intelligent Miner (cp)
10. INSPECT (cp)
11. NeoVista Solutions (cp)
12. Nuggets (cp), Partek (cp)
13. Polyanalyst (cp)
14. SAS Data Mining (cp)
15. SGI MindSet (cp)
16. Knowledge Explorer (cp)
17. DataEngine (cp)
18. Delta Miner (cp)
19. S-PLUS (cp)
20. MATLAB (cp)
21. Mathematica (cp)
22. XGOBI (pd)
23. Crystal Vision née ExplorN, SphinxVision
24. Graf-FX, IRIS
25. Spotfire
26. Netmap
27. Visible Decisions Inc.
28. Visual Mine

(Data Mining) CRM (Data Warehousing) (Mass customization)
CRM (Business Intelligence) Intelligence) (Sales Intelligence) (Service Intelligence) (hit rate) DM 10% 80% (One-Shot) CRM
530, 67-84, 2000.
Berry, M. J. A. and G. Linoff, Data Mining Techniques: for Marketing, Sales, and Customer Support, New York: John Wiley & Sons, 1997.
Chen, M. S., J. Han and P. S. Yu, Data Mining: An Overview from a Database Perspective, IEEE Trans. Knowledge and Data Engineering, 8:866-883, 1996.
Fayyad, U. M., G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery in Databases, American Association for Artificial Intelligence, pp.37-54, Fall, 1996.
Fayyad, U. M., G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, Cambridge, MA: AAAI/MIT Press, 1996.
Fayyad, U. M., Mining Databases: Towards Algorithms for Knowledge Discovery, IEEE Computer Society Technical Committee on Data Engineering, pp.1-10, 1998.

The Tool of Commercial Intelligence - Data Mining

YU-TING CHENG*, CHIH-HSIUNG SU**
*Department of Statistics, National Chengchi University
**Department of Accounting, Chihlee Institute of Commerce

ABSTRACT

The main purposes of this article are to describe the meaning of data mining and data warehouses, to compare data mining with statistical analysis, and to describe the relationships among data warehouses, KDD, and data mining. We also describe the functions, applications, processes, and tools of data mining. Finally, we describe how to apply data warehouses and data mining in customer relationship management.

Keywords: data mining, data warehouse, commercial intelligence, customer relationship management