zwp@ustc.edu.cn Office: 1006 Phone: 63600565 http://staff.ustc.edu.cn/~zwp/ http://fisher.stat.ustc.edu.cn
Contents
1.1 Introduction
1.2 Similarity and dissimilarity measures
1.3 Clustering methods
    1.3.1 Hierarchical clustering
    1.3.2 K-means
    1.3.3 Spectral clustering
1.4 Choosing the number of clusters
1.5 Cluster validation
1.1 Introduction

Cluster analysis is also known as data segmentation or class discovery: the goal is to divide the observations into groups (clusters) so that objects within a group are similar to each other and dissimilar to objects in other groups.
In two or three dimensions clusters can often be identified by eye from a scatterplot; in higher dimensions we need formal clustering algorithms.
Clustering and classification are related but different: in classification (supervised learning) the class labels of the training data are known in advance, while in clustering (unsupervised learning) no labels are available and the groups must be discovered from the data themselves.
Related unsupervised and dimension-reduction methods: PCA, FA, MDS, manifold learning, SOM.
1.2 Similarity and dissimilarity measures

A similarity coefficient $s(\cdot,\cdot)$ typically satisfies
1. $0 \le s(x, y) \le 1$
2. $s(x, x) = 1$
3. $s(x, y) = s(y, x)$

A dissimilarity (distance) $d(\cdot,\cdot)$ satisfies the following properties.
1. $d(x, y) \ge 0$, with $d(x, y) = 0$ if $x = y$
2. $d(x, x) = 0$
3. $d(x, y) = d(y, x)$
4. $d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality); a dissimilarity satisfying 1-4 is a metric dissimilarity.
5. $d(x, y) \le \max\{d(x, z), d(z, y)\}$; a dissimilarity satisfying 5 is an ultrametric dissimilarity.

For continuous data $x, y \in \mathbb{R}^p$, common choices include the following distances; for binary or categorical data other dissimilarities are used (see below).
Minkowski: $d_m(x, y) = \left[\sum_{k=1}^{p} |x_k - y_k|^m\right]^{1/m} = \|x - y\|_m$

Manhattan (city-block distance, box-car distance): $d(x, y) = \sum_{k=1}^{p} |x_k - y_k| = \|x - y\|_1$

Euclidean: $d(x, y) = \|x - y\|_2$

Maximum (Chebyshev distance): $d(x, y) = \max_k |x_k - y_k| = \|x - y\|_\infty$

Canberra: $d(x, y) = \sum_{k=1}^{p} \frac{|x_k - y_k|}{|x_k| + |y_k|}$
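All of these distances can be computed with the base R function dist(); the toy matrix x below is purely illustrative.

## pairwise distances between the rows of x
x <- matrix(rnorm(20), nrow = 5)        # 5 observations, 4 variables (made up)
dist(x, method = "euclidean")           # ||x - y||_2
dist(x, method = "manhattan")           # sum_k |x_k - y_k|
dist(x, method = "maximum")             # Chebyshev distance
dist(x, method = "canberra")            # Canberra distance
dist(x, method = "minkowski", p = 3)    # Minkowski distance with m = 3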
0-1 (binary) data: for two binary vectors x and y, cross-classify their coordinates:

            y = 1   y = 0
  x = 1       a       b      a + b
  x = 0       c       d      c + d
            a + c   b + d    n = a + b + c + d

Binary (Jaccard) dissimilarity: $d(x, y) = \dfrac{b + c}{a + b + c}$ (0-0 matches are ignored)

Czekanowski dissimilarity: $d(x, y) = \dfrac{b + c}{2a + b + c}$ (0-0 matches are ignored, 1-1 matches are counted double)
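In R, dist(..., method = "binary") gives the Jaccard-type dissimilarity above; the two binary vectors here are illustrative only.

x <- rbind(c(1, 1, 0, 1, 0, 0),
           c(1, 0, 0, 1, 1, 0))
dist(x, method = "binary")   # (b + c) / (a + b + c) = 2/4 = 0.5, 0-0 matches ignored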
For two categorical variables x and y with p and q categories respectively, similarity can be based on the $\chi^2$ statistic of their contingency table:

Coefficient of contingency: $s_{ij} = \left(\dfrac{\chi^2}{\chi^2 + n}\right)^{1/2}$

Cramér's V: $s_{ij} = \left(\dfrac{\chi^2}{n \min\{p - 1, q - 1\}}\right)^{1/2}$

The similarity between two variables $X_i$ and $X_j$ can also be measured by their correlation coefficient or by the cosine of the angle between them:

$$r_{ij} = \frac{\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\left[\sum_{k=1}^{n} (x_{ik} - \bar{x}_i)^2 \sum_{k=1}^{n} (x_{jk} - \bar{x}_j)^2\right]^{1/2}}$$

$$\theta_{ij} = \frac{\sum_{k=1}^{n} x_{ik} x_{jk}}{\left[\sum_{k=1}^{n} x_{ik}^2 \sum_{k=1}^{n} x_{jk}^2\right]^{1/2}}$$
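A minimal R illustration of the two coefficients above; the vectors x_i and x_j are made up.

x_i <- c(2, 4, 6, 8)
x_j <- c(1, 3, 2, 5)
r_ij     <- cor(x_i, x_j)                                    # Pearson correlation r_ij
theta_ij <- sum(x_i * x_j) / sqrt(sum(x_i^2) * sum(x_j^2))   # cosine similarity theta_ij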
1.3 Clustering methods
1.3.1 Hierarchical clustering

Hierarchical clustering needs two ingredients: a dissimilarity between observations and a linkage criterion that extends it to a dissimilarity between clusters.

Agglomerative hierarchical methods: start from the individual observations and successively merge the closest clusters (bottom-up).

Divisive hierarchical methods: start from a single cluster containing all observations and successively split clusters (top-down).
Dendrogram

The result of a hierarchical clustering is displayed as a dendrogram: a tree in which each merge is drawn at a height equal to the dissimilarity between the two clusters being merged. Cutting the tree at a chosen height yields a partition of the data; at height 0 every observation is its own cluster.
Linkage criteria

For clusters I and J:

single linkage: $D(I, J) = \min_{i \in I, j \in J} \{d_{ij}\}$

complete linkage: $D(I, J) = \max_{i \in I, j \in J} \{d_{ij}\}$

average linkage: $D(I, J) = \dfrac{1}{n_I n_J} \sum_{i \in I, j \in J} d_{ij}$
centroid linkage: $D(I, J) = \|\bar{x}_I - \bar{x}_J\|$, the distance between the centroids of clusters I and J.

Ward's method: merge the two clusters whose merger gives the smallest increase in the within-cluster sum of squares. If $W_I$ and $W_J$ are the within-cluster sums of squares of I and J, and $W_M$ is that of the merged cluster $M = I \cup J$, then
$$D(I, J) = W_M - W_I - W_J.$$
Properties of the linkages:

Single linkage merges two clusters as soon as a single pair of points is close, so it tends to produce long, chain-like ("S"-shaped) clusters (chaining).

Complete linkage tends to produce compact clusters of similar diameter, but it is sensitive to outliers and may break up large or elongated natural clusters.

Average linkage is a compromise between single and complete linkage.

Centroid linkage can produce inversions (the merge height is not monotone), which makes the dendrogram harder to interpret.
Agglomerative algorithm:

1. Start with each observation as its own cluster.
2. Compute the dissimilarity matrix $(d_{ij})$.
3. Merge the two closest clusters according to the chosen linkage.
4. Update the dissimilarities between the new cluster and all remaining clusters.
5. Repeat steps 3-4 until all observations belong to a single cluster.
6. Cut the resulting dendrogram at the desired level to obtain the clusters.

Divisive methods proceed in the opposite direction, repeatedly splitting clusters; the function diana in the R package cluster implements divisive analysis (Kaufman and Rousseeuw, 1990). A minimal R sketch is given below.
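A minimal sketch using hclust() for agglomerative clustering and cluster::diana() for divisive clustering; the iris data and the choice of 3 clusters are illustrative only.

library(cluster)

d  <- dist(iris[, 1:4])                 # Euclidean dissimilarity matrix
hc <- hclust(d, method = "average")     # also "single", "complete", "ward.D2"
plot(hc)                                # dendrogram
groups <- cutree(hc, k = 3)             # cut the tree into 3 clusters

dv <- diana(d)                          # divisive analysis (DIANA)
groups_dv <- cutree(as.hclust(dv), k = 3)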
For very large data sets, specialized hierarchical methods have been proposed that avoid working with the full dissimilarity matrix, e.g. BIRCH, CURE, ROCK, and Chameleon.
1.3.2 K-means

Partitioning methods fix the number of clusters k in advance and then assign the observations to k clusters so that observations within a cluster are close to the cluster centre while the centres of different clusters are far apart. The most widely used partitioning method is K-means.
The K-means algorithm goes back to MacQueen (1967). Given the number of clusters k:

1. Choose k initial cluster centres.
2. Assign each observation to the nearest centre, giving k clusters $C_1, \ldots, C_k$ with centres (means) $\bar{x}_i$, $i = 1, \ldots, k$.
3. Compute the error sum of squares
$$\mathrm{ESS} = \sum_{i=1}^{k} \sum_{j \in C_i} \|x_j - \bar{x}_i\|^2.$$
4. Move observations between clusters so as to decrease ESS, and recompute the cluster means $\bar{x}_i$, $i = 1, \ldots, k$.
5. Repeat steps 3-4 until no move decreases ESS (ESS converges).

K-medoids is a robust alternative to K-means: K-means uses cluster means as centres and is therefore sensitive to outliers and restricted to data for which means are meaningful, whereas K-medoids uses actual observations (medoids) as cluster centres. In R, the base function kmeans implements K-means, and pam (partitioning around medoids) in the package cluster implements K-medoids; a short example follows.
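A minimal sketch with kmeans() and cluster::pam(); the data and k = 3 are illustrative.

library(cluster)

x  <- iris[, 1:4]
km <- kmeans(x, centers = 3, nstart = 20)   # K-means; nstart restarts to avoid poor local optima
km$cluster                                  # cluster labels
km$tot.withinss                             # ESS at convergence

pm <- pam(x, k = 3)                         # K-medoids (partitioning around medoids)
pm$medoids                                  # medoids are actual observations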
K-medoids becomes expensive for large data sets because it works with the full dissimilarity matrix. CLARA (Clustering LARge Applications, Kaufman and Rousseeuw 1990) deals with this by drawing several subsamples, applying K-medoids to each subsample, and keeping the best resulting set of medoids; CLARANS (Clustering Large Applications based upon RANdomized Search, Ng and Han 1994) uses a randomized search instead.
K-medoids algorithm:

1. Compute the dissimilarity matrix $D = (d_{ij})$.
2. Choose k observations as the initial medoids.
3. Assign each observation to the nearest medoid, giving k clusters.
4. For each cluster $C_l$ ($l = 1, \ldots, k$), find the observation with the smallest total dissimilarity to the other members,
$$i_0 = \arg\min_{i \in C_l} \sum_{j \in C_l} d_{ij},$$
and make it the new medoid of cluster l.
5. Repeat steps 3-4 until the medoids no longer change.
1.3.3 Spectral clustering

Clustering criteria based on compactness (observations in the same cluster are close to each other): K-means, hierarchical clustering.

Clustering criteria based on connectivity (observations in the same cluster are connected through neighbouring points): spectral clustering.

Spectral clustering can recover clusters of arbitrary shape (e.g. non-convex or ring-shaped clusters) that K-means cannot.
Spectral clustering views the observations as the vertices of a weighted graph and turns clustering into a graph partitioning problem; it often succeeds where K-means fails. Let $V = (x_1, \ldots, x_n)$ be the vertices and W the (weighted) similarity matrix; the goal is to partition the vertices into k groups so that edges within groups have high weight and edges between groups have low weight.
A common choice of weights is the Gaussian kernel
$$W_{ij} = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}},$$
where $\sigma^2$ is a scale parameter. For subsets A and B of the vertices, define
$$W(A, B) = \sum_{i \in A,\, j \in B} W_{ij}.$$
For a partition $A_1, \ldots, A_k$ of the vertices, the cut is
$$\mathrm{cut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} W(A_i, \bar{A}_i),$$
where $\bar{A}$ denotes the complement of A.
Minimizing the cut directly tends to separate single vertices from the rest of the graph; the normalized cut (Ncut, Shi and Malik, 2000) balances the sizes of the groups:
$$\mathrm{Ncut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)}.$$
vol(a) = i A d i = i A n j=1 W ij. Ncut(A 1,..., A k ) NP-hard. Laplacian, Graph Laplacian: L = D W, D = diag(d 1,..., d n ). k(k > 2), h ij = 0. H = (h ij ), : min Ncut(A 1,..., A k ) A V H = D 1/2 T = Relaxation!!! 1, xi Aj, vol(aj ) min A 1,...,A k T race(h LH) st. H DH = I k min T race(t D 1/2 LD 1/2 T ), T R n k. st. T T = I k T opt L sym = D 1/2 LD 1/2 k. Previous Next First Last Back Forward 28
Correspondingly, $H_{\mathrm{opt}} = D^{-1/2} T_{\mathrm{opt}}$ consists of the k eigenvectors of $L_{\mathrm{rw}} = D^{-1} L$ associated with the k smallest eigenvalues.

Spectral clustering algorithm (Shi and Malik, 2000). Input: observations $x_1, \ldots, x_n$ and the number of clusters k.

1. Construct the weight matrix W and the Laplacian $L = D - W$.
2. Solve the generalized eigenproblem $L u = \lambda D u$ and take the eigenvectors $u_1, \ldots, u_k$ of the k smallest eigenvalues, $U = [u_1, \ldots, u_k]$.
3. Let $y_i$ be the i-th row of U, $i = 1, \ldots, n$.
4. Cluster $y_1, \ldots, y_n$ with K-means into clusters $C_1, \ldots, C_k$.
5. Output $A_1, \ldots, A_k$ with $A_i = \{x_j : y_j \in C_i\}$.

A closely related algorithm based on $L_{\mathrm{sym}}$ with row normalization is given by Ng, Jordan, and Weiss (2002).
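A minimal base-R sketch of the Shi-Malik procedure above; the function name spectral_cluster and the default sigma are hypothetical choices, and the two-blob data are made up.

spectral_cluster <- function(x, k, sigma = 1) {
  W   <- exp(-as.matrix(dist(x))^2 / (2 * sigma^2))   # Gaussian kernel weights W_ij
  d   <- rowSums(W)
  L   <- diag(d) - W                                  # unnormalized Laplacian L = D - W
  Lrw <- diag(1 / d) %*% L                            # D^{-1} L; its eigenvectors solve L u = lambda D u
  ev  <- eigen(Lrw)
  idx <- order(Re(ev$values))[1:k]                    # k smallest eigenvalues
  U   <- Re(ev$vectors[, idx, drop = FALSE])
  kmeans(U, centers = k, nstart = 20)$cluster         # K-means on the rows y_i of U
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
labels <- spectral_cluster(x, k = 2, sigma = 1)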
1.4 Choosing the number of clusters

Both partitioning methods and hierarchical methods require the number of clusters k (or, equivalently, where to cut the dendrogram) to be chosen. Several criteria can guide this choice; a widely used one is the silhouette of Kaufman and Rousseeuw (1990).
For observation i in cluster $C_i$:

$a(i)$ = average dissimilarity between i and the other observations of its own cluster $C_i$;

$d(i, C)$ = average dissimilarity between i and the observations of another cluster C ($C \ne C_i$);

$b(i) = \min_{C \ne C_i} d(i, C)$, the average dissimilarity to the nearest neighbouring cluster.

The silhouette of observation i is
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}},$$
and the average silhouette width is
$$\bar{s} = \frac{1}{n} \sum_{i=1}^{n} s(i).$$
By construction $-1 \le s(i) \le 1$. Values of $s(i)$ close to 1 mean $a(i) \ll b(i)$: observation i is much closer to its own cluster than to any other cluster, i.e. it is well classified. Values close to $-1$ suggest that i has been assigned to the wrong cluster, and values near 0 mean that i lies between two clusters.

Kaufman and Rousseeuw suggest, as a rough guide, that an average silhouette above 0.5 indicates a reasonable clustering structure, while a value below about 0.2 indicates a lack of substantial structure. In R, the function silhouette in the package cluster computes s(i); the average silhouette width can be used to compare different values of k, as in the sketch below.
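A sketch of choosing k by the average silhouette width with cluster::silhouette(); the data, the use of pam(), and the candidate range 2-8 are illustrative.

library(cluster)

x <- scale(iris[, 1:4])
d <- dist(x)
avg_sil <- sapply(2:8, function(k) {
  cl <- pam(x, k = k)
  mean(silhouette(cl$clustering, d)[, "sil_width"])   # average silhouette width for this k
})
names(avg_sil) <- 2:8
avg_sil                    # pick the k with the largest average silhouette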
CH index

The CH (Calinski-Harabasz) index is used with K-means-type clusterings. For a clustering with k clusters, the within-cluster variation is
$$W(k) = \sum_{i=1}^{k} \sum_{C(j) = i} \|x_j - \bar{x}_i\|^2,$$
which decreases as k increases, and the between-cluster variation is
$$B(k) = \sum_{i=1}^{k} n_i \|\bar{x}_i - \bar{x}\|^2,$$
which increases as k increases.
Balancing W(k) against B(k) leads to choosing k by maximizing
$$\mathrm{CH}(k) = \frac{B(k)/(k - 1)}{W(k)/(n - k)}, \qquad \hat{k} = \arg\max_{2 \le k \le K_{\max}} \mathrm{CH}(k).$$
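A sketch computing CH(k) directly from the kmeans() output; the data and the candidate range of k are illustrative.

x <- scale(iris[, 1:4])
n <- nrow(x)
ch <- sapply(2:8, function(k) {
  km <- kmeans(x, centers = k, nstart = 20)
  W  <- km$tot.withinss                    # within-cluster variation W(k)
  B  <- km$betweenss                       # between-cluster variation B(k)
  (B / (k - 1)) / (W / (n - k))
})
which.max(ch) + 1                          # k maximizing CH(k) over 2:8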
Gap statistic

W(k) decreases monotonically in k, so it cannot simply be minimized over k. Tibshirani et al. (2001) proposed the Gap statistic: compare $\log W(k)$ with its expectation under a reference (null) distribution with no cluster structure (e.g. uniform over the range of the data),
$$\mathrm{Gap}_n(k) = E_n^*[\log W^*(k)] - \log W(k).$$
The expectation is estimated from B Monte Carlo (bootstrap) samples drawn from the reference distribution:
$$\mathrm{Gap}_n(k) = \frac{1}{B} \sum_{b=1}^{B} \log W_b^*(k) - \log W(k), \qquad
sd_k = \left[\frac{1}{B} \sum_{b=1}^{B} \left(\log W_b^*(k) - \overline{\log W^*(k)}\right)^2\right]^{1/2},$$
where $\overline{\log W^*(k)} = \frac{1}{B} \sum_{b} \log W_b^*(k)$.
Setting $s_k = sd_k \sqrt{1 + 1/B}$, the number of clusters is chosen as
$$\hat{k} = \inf\{k : \mathrm{Gap}(k) \ge \mathrm{Gap}(k + 1) - s_{k+1}\}.$$
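In R the Gap statistic is available as cluster::clusGap(); the data, K.max and B below are illustrative choices.

library(cluster)

x  <- scale(iris[, 1:4])
gp <- clusGap(x, FUNcluster = kmeans, K.max = 8, B = 50, nstart = 20)
print(gp, method = "Tibs2001SEmax")   # k chosen by the Gap(k) >= Gap(k+1) - s_{k+1} rule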
1.5 Cluster validation

External criteria compare the clustering with a reference partition (a gold standard of known class labels): Purity, F-measure, Rand statistic, Entropy.

Internal criteria assess the clustering using only the data themselves: Davies-Bouldin index, Dunn index, expected density $\rho$.

For choosing the number of clusters: elbow criterion, Gap statistic.
Suppose the n observations have a reference partition $C^* = \{C_1^*, \ldots, C_l^*\}$ and the clustering to be evaluated is $C = \{C_1, \ldots, C_k\}$.

F-measure. For cluster $C_j$ and reference class $C_i^*$:

precision: $|C_j \cap C_i^*| / |C_j|$;

recall: $|C_j \cap C_i^*| / |C_i^*|$;

$$F_{ij}(\alpha) = \frac{1 + \alpha}{\dfrac{1}{\text{precision}} + \dfrac{\alpha}{\text{recall}}}$$
(a weighted harmonic mean of precision and recall; usually $\alpha = 1$).

The overall F-measure of the clustering is
$$F = \sum_{i=1}^{l} \frac{|C_i^*|}{n} \max_{j = 1, \ldots, k} F_{ij};$$
larger values indicate better agreement with the reference partition.
Entropy. For each cluster $C_j$,
$$H(C_j) = -\sum_{i} \frac{|C_j \cap C_i^*|}{|C_j|} \log_2 \frac{|C_j \cap C_i^*|}{|C_j|},$$
and the entropy of the clustering is
$$H(C) = \sum_{j} \frac{|C_j|}{n} H(C_j);$$
smaller values indicate purer clusters.

Rand index. Classify all pairs of observations $(x_i, x_j)$:

$n_{11}$ = number of pairs in the same cluster of C and the same class of $C^*$;

$n_{00}$ = number of pairs in different clusters of C and different classes of $C^*$;

$n_{10}$ = number of pairs in the same cluster of C but different classes of $C^*$;

$n_{01}$ = number of pairs in different clusters of C but the same class of $C^*$.

$$R = \frac{n_{11} + n_{00}}{\binom{n}{2}}.$$
R = 0 when the two partitions agree on no pair, and R = 1 when they are identical.
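A minimal sketch computing the Rand index from two label vectors; rand_index() is a hypothetical helper and the label vectors are made up. The chance-corrected version discussed next is available, for example, as adjustedRandIndex() in the mclust package.

rand_index <- function(labels1, labels2) {
  same1 <- outer(labels1, labels1, "==")   # pairs placed together in partition 1
  same2 <- outer(labels2, labels2, "==")   # pairs placed together in partition 2
  agree <- (same1 == same2)                # pairs on which the two partitions agree (n11 + n00)
  sum(agree[upper.tri(agree)]) / choose(length(labels1), 2)
}

ref <- c(1, 1, 1, 2, 2, 3)                 # reference classes (illustrative)
cl  <- c(1, 1, 2, 2, 2, 3)                 # clustering to evaluate (illustrative)
rand_index(ref, cl)
# mclust::adjustedRandIndex(ref, cl)       # adjusted Rand index (see below)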
Adjusted Rand Index. The Rand index does not account for agreement that arises purely by chance (its expected value under random partitions is not zero). Hubert and Arabie (1985) proposed the adjusted Rand index, which corrects the Rand index so that its expected value under random labelings is 0 while its maximum remains 1. See Meila (2003) for further discussion of the adjusted Rand index.