6
In 1943, Warren McCulloch and Walter Pitts proposed the MP (McCulloch-Pitts) neuron model. In 1949, Hebb proposed the Hebb learning rule.
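SQL Server 2008 Data Mining

In 1969, Seymour Papert and Marvin Minsky showed that a single-layer perceptron cannot solve the XOR problem. In 1982, John Hopfield proposed the Hopfield neural network (HNN). In 1986, David E. Rumelhart and his colleagues presented the back-propagation algorithm, which lets multi-layer networks overcome the limitation Papert had pointed out.

6.1

6.1.1

The Perceptron, proposed by Rosenblatt in 1957, is modeled on the biological neuron (Figure 6.1). The human brain contains on the order of $10^{11}$ nerve cells (neurons). Each has a cell body (soma), an axon, and dendrites, and connects to other neurons through synapses.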
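Synapses come in two kinds: an excitatory synapse raises the pulse rate of the receiving neuron, while an inhibitory synapse lowers it.

6.1.2

An artificial neuron (Figure 6.2) receives inputs X1, X2, X3 through connections with weights W1, W2, W3; input X1, for example, enters with weight W1. The net input $I_j$ of neuron $j$ is the weighted sum of the outputs $O_i$ of the neurons feeding it, plus a bias $\theta_j$:

$$I_j = \sum_i w_{ij} O_i + \theta_j$$

An activation function then converts the net input $I_j$ into the neuron's output (Y1 in Figure 6.2).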
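A typical activation function is the S-shaped sigmoid (logistic) function

$$Y = \frac{1}{1+e^{-x}}$$

whose output always lies between 0 and 1. Applied to the net input of neuron $j$, it gives the neuron's output:

$$O_j = \frac{1}{1+e^{-I_j}}$$

6.1.3

The neural network in SQL Server 2008 is a back-propagation network made up of an input layer, a hidden layer, and an output layer.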
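A minimal Python sketch of the single-neuron computation described above: the net input $I_j = \sum_i w_{ij} O_i + \theta_j$ followed by the logistic activation. The function names `sigmoid` and `neuron_output` are illustrative, not part of SQL Server; the sample numbers are node 4's starting values from the worked example later in this section.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation 1 / (1 + e^-x); the result lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs: list[float], weights: list[float], bias: float) -> float:
    """Compute O_j = sigmoid(I_j), where I_j = sum_i w_ij * O_i + theta_j."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net_input = sum(w * o for w, o in zip(weights, inputs)) + bias
    return sigmoid(net_input)


if __name__ == "__main__":
    # Inputs X1..X3 and weights W1..W3 of a three-input neuron (Figure 6.2);
    # the numbers are node 4's values from the worked example.
    print(round(neuron_output([1.0, 0.0, 1.0], [0.2, 0.4, -0.5], -0.4), 3))  # 0.332
```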
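Using too many hidden nodes can cause over-fitting, so the topology of the network (for example, the one in Figure 6.3) deserves care. A discrete input attribute with $n$ states becomes $n$ input nodes, or $n+1$ when a node for the Missing state is added. The MAXIMUM_STATES parameter limits how many states of an attribute Analysis Services keeps.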
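As an illustration, the Python sketch below turns a discrete attribute into indicator inputs: one node per kept state plus one node for Missing, giving the $n+1$ nodes mentioned above. Keeping only the most frequent `maximum_states` states and treating the rest as missing follows the documented behavior of MAXIMUM_STATES, but the helpers `kept_states` and `encode` and the sample values are hypothetical; this is not how Analysis Services is implemented internally.

```python
from collections import Counter
from collections.abc import Hashable, Iterable
from typing import Optional


def kept_states(values: Iterable[Optional[Hashable]], maximum_states: int = 100) -> list:
    """Return the most frequent states (at most maximum_states); None means missing."""
    counts = Counter(v for v in values if v is not None)
    # Ties keep the order in which the states were first seen.
    return [state for state, _ in counts.most_common(maximum_states)]


def encode(value: Optional[Hashable], states: list) -> list[float]:
    """One indicator per kept state, plus a final Missing indicator (n + 1 nodes)."""
    nodes = [1.0 if value == s else 0.0 for s in states]
    # Nulls and states dropped by the state limit both count as Missing.
    nodes.append(0.0 if value in states else 1.0)
    return nodes


if __name__ == "__main__":
    # Hypothetical values of a discrete attribute such as CHANNEL.
    channel = ["A", "B", "A", "C", None, "A"]
    states = kept_states(channel, maximum_states=2)
    print(states)
    for v in ("A", "C", None):
        print(v, encode(v, states))
```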
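For a node $j$ in the output layer, the error is

$$\mathit{Err}_j = O_j(1-O_j)(T_j-O_j)$$

where $O_j$ is the actual output of node $j$ and $T_j$ is its target value. The factor $O_j(1-O_j)$ is the derivative of the sigmoid function. For a node $j$ in a hidden layer, the error is propagated back from the nodes $k$ of the next layer:

$$\mathit{Err}_j = O_j(1-O_j)\sum_k \mathit{Err}_k\, w_{jk}$$

The goal of training is to minimize the mean squared error (MSE) between the network's outputs and the target values. Each weight $w_{ij}$ and bias $\theta_j$ is updated as follows:

$$\Delta w_{ij} = l\,\mathit{Err}_j\,O_i, \qquad w_{ij} = w_{ij} + \Delta w_{ij}$$

$$\Delta \theta_j = l\,\mathit{Err}_j, \qquad \theta_j = \theta_j + \Delta \theta_j$$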
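These rules translate almost line for line into code. The Python sketch below (function names are illustrative) computes the output-layer and hidden-layer errors and applies the weight and bias updates, with `lr` standing for the learning rate $l$; the sample numbers come from the worked example that follows.

```python
def output_error(o_j: float, t_j: float) -> float:
    """Err_j = O_j (1 - O_j) (T_j - O_j) for an output-layer node j."""
    return o_j * (1.0 - o_j) * (t_j - o_j)


def hidden_error(o_j: float, downstream: list[tuple[float, float]]) -> float:
    """Err_j = O_j (1 - O_j) * sum_k Err_k * w_jk for a hidden node j.

    `downstream` holds one (Err_k, w_jk) pair per node k that node j feeds.
    """
    return o_j * (1.0 - o_j) * sum(err_k * w_jk for err_k, w_jk in downstream)


def update_weight(w_ij: float, lr: float, err_j: float, o_i: float) -> float:
    """Return w_ij + delta w_ij, where delta w_ij = l * Err_j * O_i."""
    return w_ij + lr * err_j * o_i


def update_bias(theta_j: float, lr: float, err_j: float) -> float:
    """Return theta_j + delta theta_j, where delta theta_j = l * Err_j."""
    return theta_j + lr * err_j


if __name__ == "__main__":
    err_6 = output_error(0.474, 1.0)              # output node 6, target 1
    err_5 = hidden_error(0.525, [(err_6, -0.2)])  # hidden node 5 feeds node 6
    print(round(err_6, 4), round(err_5, 4))       # 0.1311 -0.0065
    print(round(update_weight(-0.2, 0.9, err_6, 0.525), 3))  # new w_56: -0.138
```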
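Here $l$ is the learning rate, a constant between 0 and 1. This way of adjusting the weights is the gradient (steepest) descent method: each weight moves a step, scaled by $l$, in the direction that reduces the error.

Consider the network in Figure 6.3, with input nodes 1, 2, and 3, hidden nodes 4 and 5, and output node 6.

Step01: Initialize the weights and biases:

$$w_{14}=0.2,\; w_{15}=-0.3,\; w_{24}=0.4,\; w_{25}=0.1,\; w_{34}=-0.5,\; w_{35}=0.2,\; w_{46}=-0.3,\; w_{56}=-0.2,\; \theta_4=-0.4,\; \theta_5=0.2,\; \theta_6=0.1$$

Step02: Feed in the training sample (X1, X2, X3, Y) = (1, 0, 1, 1) and compute the outputs of hidden nodes 4 and 5:

$$I_4 = \sum_i w_{i4} O_i + \theta_4 = 1(0.2) + 0(0.4) + 1(-0.5) - 0.4 = -0.7, \qquad O_4 = \frac{1}{1+e^{0.7}} = 0.332$$

$$I_5 = \sum_i w_{i5} O_i + \theta_5 = 1(-0.3) + 0(0.1) + 1(0.2) + 0.2 = 0.1, \qquad O_5 = \frac{1}{1+e^{-0.1}} = 0.525$$

Step03: Compute the net input of output node 6:

$$I_6 = \sum_i w_{i6} O_i + \theta_6 = 0.332(-0.3) + 0.525(-0.2) + 0.1 = -0.105$$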
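Step04: Compute the output of node 6 and compare it with the target value $T_6 = 1$:

$$O_6 = \frac{1}{1+e^{0.105}} = 0.474$$

$$\mathit{Err}_6 = O_6(1-O_6)(T_6-O_6) = 0.474(1-0.474)(1-0.474) = 0.1311$$

Step05: Propagate the error back to hidden nodes 4 and 5:

$$\mathit{Err}_4 = O_4(1-O_4)\,\mathit{Err}_6\,w_{46} = 0.332(1-0.332)(0.1311)(-0.3) = -0.0087$$

$$\mathit{Err}_5 = O_5(1-O_5)\,\mathit{Err}_6\,w_{56} = 0.525(1-0.525)(0.1311)(-0.2) = -0.0065$$

Step06: Update the weights and biases with learning rate $l = 0.9$:

$$
\begin{aligned}
w_{14} &= 0.2 + 0.9(-0.0087)(1) = 0.192 \\
w_{15} &= -0.3 + 0.9(-0.0065)(1) = -0.306 \\
w_{24} &= 0.4 + 0.9(-0.0087)(0) = 0.4 \\
w_{25} &= 0.1 + 0.9(-0.0065)(0) = 0.1 \\
w_{34} &= -0.5 + 0.9(-0.0087)(1) = -0.508 \\
w_{35} &= 0.2 + 0.9(-0.0065)(1) = 0.194 \\
w_{46} &= -0.3 + 0.9(0.1311)(0.332) = -0.261 \\
w_{56} &= -0.2 + 0.9(0.1311)(0.525) = -0.138 \\
\theta_4 &= -0.4 + 0.9(-0.0087) = -0.408 \\
\theta_5 &= 0.2 + 0.9(-0.0065) = 0.194 \\
\theta_6 &= 0.1 + 0.9(0.1311) = 0.218
\end{aligned}
$$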
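To check the arithmetic, the Python sketch below runs Step01 through Step06 on the Figure 6.3 network in a single function. It is a plain illustration of one back-propagation step for this 3-2-1 network, not SQL Server's implementation, and the names `train_step`, `W0`, and `THETA0` are hypothetical. Because it keeps full precision instead of rounding each intermediate value, $\mathit{Err}_6$ comes out near 0.1312 rather than the hand-rounded 0.1311, while the updated weights and biases agree with Step06 to three decimals.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Step01: initial weights w[(i, j)] and biases theta[j] of the Figure 6.3 network.
W0 = {(1, 4): 0.2, (1, 5): -0.3, (2, 4): 0.4, (2, 5): 0.1,
      (3, 4): -0.5, (3, 5): 0.2, (4, 6): -0.3, (5, 6): -0.2}
THETA0 = {4: -0.4, 5: 0.2, 6: 0.1}


def train_step(x, target, w, theta, lr):
    """One forward and backward pass; returns (outputs, errors, new_w, new_theta)."""
    o = {1: x[0], 2: x[1], 3: x[2]}
    # Step02-Step03: forward pass through hidden nodes 4, 5 and output node 6.
    for j in (4, 5):
        o[j] = sigmoid(sum(w[(i, j)] * o[i] for i in (1, 2, 3)) + theta[j])
    o[6] = sigmoid(w[(4, 6)] * o[4] + w[(5, 6)] * o[5] + theta[6])
    # Step04: error of the output node.
    err = {6: o[6] * (1 - o[6]) * (target - o[6])}
    # Step05: hidden-node errors, using the old weights w_46 and w_56.
    for j in (4, 5):
        err[j] = o[j] * (1 - o[j]) * err[6] * w[(j, 6)]
    # Step06: update every weight and bias; the input dicts are not modified.
    new_w = {(i, j): w_ij + lr * err[j] * o[i] for (i, j), w_ij in w.items()}
    new_theta = {j: t + lr * err[j] for j, t in theta.items()}
    return o, err, new_w, new_theta


if __name__ == "__main__":
    o, err, new_w, new_theta = train_step((1, 0, 1), 1, W0, THETA0, lr=0.9)
    print({j: round(o[j], 3) for j in (4, 5, 6)})
    print({j: round(e, 4) for j, e in err.items()})
    print({k: round(v, 3) for k, v in new_w.items()})
    print({j: round(t, 3) for j, t in new_theta.items()})
```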
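To guard against over-fitting, part of the data is set aside as a validation set (holdout set), in line with the CRISP-DM methodology. In SQL Server 2008, the HOLDOUT_PERCENTAGE parameter sets the share of training cases held out; the error measured on them (the holdout error) is used as part of the stopping criterion during training.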
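A minimal Python sketch of the holdout idea, assuming a simple seeded random shuffle: `holdout_percentage` plays the role of HOLDOUT_PERCENTAGE and `seed` loosely that of HOLDOUT_SEED. The helper `holdout_split` is hypothetical. Analysis Services selects its holdout cases internally, and its HOLDOUT_SEED value of 0 means a seed derived from the model name, which this sketch does not imitate.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def holdout_split(cases: Sequence[T], holdout_percentage: float = 30,
                  seed: int = 0) -> tuple[list[T], list[T]]:
    """Shuffle the cases with a fixed seed and hold out holdout_percentage % of them.

    Returns (training_cases, holdout_cases). Here seed=0 is simply a fixed
    seed, unlike HOLDOUT_SEED=0 in Analysis Services.
    """
    if not 0 <= holdout_percentage < 100:
        raise ValueError("holdout_percentage must be in [0, 100)")
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_percentage / 100)
    return shuffled[n_holdout:], shuffled[:n_holdout]


if __name__ == "__main__":
    train, holdout = holdout_split(range(1000), holdout_percentage=30, seed=42)
    print(len(train), len(holdout))  # 700 300
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching Python's global random state.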
6.1.4

Figure 6.4 (SQL Server 2008)
Co-linearity (strong correlation) among the input attributes also needs attention.

6.2

6.2.1
6.2.2

The BROADBAND data (Figure 6.5) has the following columns:

CUST_ID          int
GENDER           char(1)
AGE              int
TENURE           int
CHANNEL          char(1)
AUTOPAY          char(1)
ARPB_3M          float
CALL_PARTY_CNT   float
DAY_MOU          float
AFTERNOON_MOU    float
NIGHT_MOU        float
AVG_CALL_LENGTH  float
BROADBAND        char(1)

MOU stands for minutes of usage. The MOU columns are converted into ratios with a CASE WHEN ... expression, for example:

DAY_MOU_RATIO: CASE WHEN (DAY_MOU+AFTERNOON_MOU+NIGHT_MOU)=0 THEN 1 ELSE DAY_MOU/(DAY_MOU+AFTERNOON_MOU+NIGHT_MOU) END
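The CASE expression translates directly into code. The Python function below (the name `day_mou_ratio` is illustrative) returns 1 when a customer has no usage at all, as the THEN 1 branch does, and otherwise the daytime share of total minutes. Because the MOU columns are float, the SQL division is already floating-point; with int columns, T-SQL would perform integer division instead.

```python
def day_mou_ratio(day_mou: float, afternoon_mou: float, night_mou: float) -> float:
    """Mirror of: CASE WHEN total = 0 THEN 1 ELSE DAY_MOU / total END."""
    total = day_mou + afternoon_mou + night_mou
    if total == 0:
        return 1.0  # THEN 1: the customer has no usage at all
    return day_mou / total


if __name__ == "__main__":
    # Hypothetical customers: (DAY_MOU, AFTERNOON_MOU, NIGHT_MOU)
    for row in [(120.0, 60.0, 20.0), (0.0, 0.0, 0.0)]:
        print(row, day_mou_ratio(*row))
```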
The correlation between AFTERNOON_MOU and MOU is 0.8786, so the three MOU columns are used as ratios rather than as raw minutes (Figure 6.6).

The model is then created in Step01 through Step07, with BROADBAND as the predictable attribute.

6.2.3
In SQL Server 2008 (Figure 6.7), the cases can be filtered with a T-SQL WHERE clause (Figure 6.8).
Figure 6.9 and Figure 6.10 show the results. In Figure 6.10, value 1 denotes BROADBAND=1 and value 2 denotes BROADBAND=0. A Tenure of 38.3~72.0 favors value 1; the other ranges shown include 18.0~30.4 and 1.0~14.1.
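Figure 6.10 and Figure 6.11 report 45.38% for value 1 and 53.48% for value 2. Across all cases, value 1 accounts for 14.79% and value 2 for 85.21%. Lift compares how common a value is within a group with how common it is overall:

$$\text{Lift} = \frac{P(\text{value} \mid \text{group})}{P(\text{value})}$$

Within the group, value 1 accounts for 42.00%, so its lift is 42.00% / 14.79% = 2.840; value 2 accounts for 57.82%, so its lift is 57.82% / 85.21% = 0.679.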
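The lift arithmetic is easy to reproduce. This Python sketch (the helper name `lift` is hypothetical) divides a value's share within the group by its share in the whole data set, using the percentages above as fractions.

```python
def lift(share_in_group: float, share_overall: float) -> float:
    """Lift = P(value | group) / P(value); above 1 means over-represented."""
    if share_overall <= 0:
        raise ValueError("share_overall must be positive")
    return share_in_group / share_overall


if __name__ == "__main__":
    print(round(lift(0.4200, 0.1479), 3))  # value 1 (BROADBAND=1): 2.84
    print(round(lift(0.5782, 0.8521), 3))  # value 2 (BROADBAND=0): 0.679
```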
A lift of 2.84 means that value 1 is 2.84 times as common in this group as in the data as a whole. See also Figure 6.11, Figure 6.12, and Figure 6.13.
Figure 6.13

6.2.4

This example uses the Boston housing data from Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-1, 1978. The HOUSE_PRICE table has the following columns:

HOUSE_ID           Int
CRIME_RATE         Float
BUSINESS_RATIO     Float
RIVER_SIDE         char(1)
NOX                Float
ROOM_NUM           Int
HOUSE_AGE          Float
WIIGHTED_DISTANCE  Float
HIGHWAY_INDEX      Float
TAX_RATE           Float
TEACHER_RATE       Float
BLACK_INDEX        Float    1000(Bk - 0.63)^2
LOW_STATUS_RATIO   Float
HOUSE_PRICE        Float

Figure 6.14
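Figure 6.14 and Figure 6.15 highlight LOW_STATUS_RATIO, WIIGHTED_DISTANCE, and TEACHER_RATE as the key attributes.

6.3

6.3.1

The neural network algorithm in SQL Server 2008 has the following parameters:

HIDDEN_NODE_RATIO: Sets the number of hidden neurons to HIDDEN_NODE_RATIO * SQRT({number of input neurons} * {number of output neurons}). Default: 4.
HOLDOUT_PERCENTAGE: Percentage of the training cases held out to compute the holdout error, which is part of the stopping criterion. Default: 30.
HOLDOUT_SEED: Seed for the random selection of holdout cases; 0 means the seed is derived from the model name. Default: 0.
MAXIMUM_INPUT_ATTRIBUTES: Maximum number of input attributes before feature selection is applied; 0 disables feature selection. Default: 255.
MAXIMUM_OUTPUT_ATTRIBUTES: Maximum number of output attributes before feature selection is applied; 0 disables feature selection. Default: 255.
MAXIMUM_STATES: Maximum number of states per discrete attribute; the remaining, less frequent states are treated as missing. Default: 100.
SAMPLE_SIZE: Number of cases used for training; the algorithm uses the smaller of SAMPLE_SIZE and the total number of cases * (1 - HOLDOUT_PERCENTAGE/100). Default: 10000.

6.3.2 Pyramids

HIDDEN_NODE_RATIO controls the size of the hidden layer through HIDDEN_NODE_RATIO * SQRT({number of input neurons} * {number of output neurons}). The square root is the geometric mean of the input and output layer sizes, so the network narrows from input to output like a pyramid, and HIDDEN_NODE_RATIO scales the hidden layer up or down.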
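For example, for a network built on RIVER_SIDE (values 1 and 0) and TEACHER_RATE (10 discretized buckets plus one Missing state), the neuron counts in the formula are 1 and 14, so SQRT(1*14) = 3.74. With the default HIDDEN_NODE_RATIO of 4, 3.74*4 = 14.96; the fractional part is dropped, giving 14 hidden neurons. With HIDDEN_NODE_RATIO set to 3, 3.74*3 = 11.22, giving 11 hidden neurons.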
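The hidden-layer sizing rule fits in one function. The Python sketch below (the function name `hidden_neurons` is hypothetical) applies HIDDEN_NODE_RATIO * SQRT(inputs * outputs) and drops the fractional part, as the example above does; that Analysis Services always truncates rather than rounds is an assumption taken from that example, not something this sketch verifies.

```python
import math


def hidden_neurons(n_inputs: int, n_outputs: int, hidden_node_ratio: float = 4.0) -> int:
    """HIDDEN_NODE_RATIO * SQRT(inputs * outputs), with the fractional part dropped."""
    if n_inputs < 1 or n_outputs < 1:
        raise ValueError("a network needs at least one input and one output neuron")
    return int(hidden_node_ratio * math.sqrt(n_inputs * n_outputs))


if __name__ == "__main__":
    print(hidden_neurons(14, 1))       # default ratio 4: 14.96 -> 14
    print(hidden_neurons(14, 1, 3.0))  # ratio 3: 11.22 -> 11
```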