K-means

Similar documents
: p Previous Next First Last Back Forward 1

:

: Previous Next First Last Back Forward 1

Previous Next First Last Ba

: p Previous Next First Last Back Forward 1

Previous Next First Last Back Forward 1

spss.doc


,

Microsoft Word - 1神奇的矩阵2.doc

untitled

3.1 ( ) (Expectation) (Conditional Mean) (Median) Previous Next

1 2 / 3 1 A (2-1) (2-2) A4 6 A4 7 A4 8 A4 9 A ( () 4 A4, A4 7 ) 1 (2-1) (2-2) ()

4 / ( / / 5 / / ( / 6 ( / / / 3 ( 4 ( ( 2



!""#!$% & # &((! $% ) &((! %" & $!""# & # &((( )# &( &((! # &(((!*+ % *


《拍案惊奇》(中)

Vol. 15 No. 1 JOURNAL OF HARBIN UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb O21 A

tbjx0033ZW.PDF

[9] R Ã : (1) x 0 R A(x 0 ) = 1; (2) α [0 1] Ã α = {x A(x) α} = [A α A α ]. A(x) Ã. R R. Ã 1 m x m α x m α > 0; α A(x) = 1 x m m x m +

欢迎参加 《计量基础知识》培训班

9202reply-s.doc

ttian

! "#$%& $()*+#$, $(-.&,./.+#/(-.&01( &-#&(&$# (&2*(,#-3.,14& $ +()5(*-#5(-#/-/#(-1#&-+)(& :;<<= > A B?

!"# $%& %!"# $%& %!"#$%& %! ( )***%% ) $)! +**+),,* -)+.* )( ) +, +*.*)+..**! )$,*)+$))$!"!#

Microsoft Word - ZLI14A0-105

2 2 2 : (1) : A 0, A 1, A 2, (2) : (3) : (, ) : ?? ( A 1 ) ((A

% +$ )!#$ %"!# & #!$ %" " ( ) * $ %!+$ %" -! < % 2 > E B > +? F! = E H > =+!! E H2 > 3 / /!!$ *" ( %, -.!!/ + ( ) %!,! %!, - ) > 3 2 > #= =

育儿知识100问(二)

17

untitled


" #" #$$" "#$$% # & $%& ()*+,- #$$% " & " & ( % ( ( ( % & ( % #" #" #" #"

Microsoft Word - 林文晟3.doc

Ps22Pdf

优合会计考点直击卷子之财经法规答案——第八套

%! # # % % & # ( ) ( +, & +, +, & +, & +, +, &!


# ( + + # + # 6 +,! + # +! +, + # ( + ) ( + ( + ) + 7! + # + /8 + ) ( +! + #. + ( +, +! + # + # + + ( ! ( + ) ( + ) +, + ( + 9% +! +, + ( +


M ( ) K F ( ) A M ( ) 1815 (probable error) F W ( ) J ( ) n! M ( ) T ( ) L ( ) T (171

Ps22Pdf


08-01.indd

, 13.4

國立中山大學學位論文典藏.PDF

( CIP).:,3.7 ISBN TB CIP (3) ( ) ISBN O78 : 3.


G(z 0 + "z) = G(z 0 ) + "z dg(z) dz z! # d" λ "G = G(z 0 ) + #cos dg(z) ( & dz ) * nv #., - d+ - - r 2 sin cosds e / r # ddr 4.r 2 #cos! "G = G(z 0 )

1. ( )( ) A. B. C. D. 2. ( )( ) A. : B. : C. : D. : 3. ( )( ) A. : B. : C. : D. : 1 D : 2

上海市本科教学质量年度报告


!!!!"#$ " " %& ( " # " " " " " "$%%& " $%% " "!!

!# $#!#!%%& $# &% %!# (# )#! "

untitled

ABP

koji-13.dvi

: 459,. (2011),, Zhu (2008). Y = Xθ + ε, (1.1) Y = (y 1,..., y n ) T, ε = (ε 1,..., ε n ) T, θ = (θ 1,..., θ p ) T, X n p, X i X i, E(ε) = 0, Var (ε)

untitled

國家圖書館典藏電子全文

BB.3


成 都 诗 词 田 正 中 水 调 歌 头 感 丙 戌 金 秋 风 树 生 凉 意, 胸 次 觉 清 新 园 中 丹 桂 撑 月, 雏 菊 傲 霜 芬 情 系 南 飞 北 雁, 坐 爱 枫 林 醉 染, 秋 色 更 迷 人 歌 故 早 相 约, 览 胜 宝 宾 村 巨 龙 腾, 金 风 翥, 气 凌

4 A C n n, AA = A A, A,,, Hermite, Hermite,, A, A A, A, A 4 (, 4,, A A, ( A C n n, A A n, 4 A = (a ij n n, λ, λ,, λ n A n n ( (Schur λ i n


!!!" #$ %& ()#*+ %,!" #--. #! % %! % %" & $! % $" # - #+$/0 - -*,/0 ). %*- #)%* #)%, 9:;"74 < #)*+ < 9:;"74 #- = #*0>? A7BC""7 D #)*+ #)

( ) 001 ( CIP ) /. :, 2005 ISBN G25-53 CIP (2005) ( 147 : ) 890 mm 1240 mm ISBN


)

!! # % & ( )!!! # + %!!! &!!, # ( + #. ) % )/ # & /.

山东2014第四季新教材《会计基础》冲刺卷第三套


AU = U λ c 2 c 3 c n C C n,, n U 2 U2 C U 2 = B = b 22 b 23 b 2n b 33 b 3n b nn U = U ( U 2, U AU = = = ( ( U 2 U 2 U AU ( U2 λ λ d 2 d 3 d n b 22 b 2

1

2007年普通高等学校招生全国统一考试


892415H032006


cs p3.ps, page Preflight ( S indd )

A. B. C. D. 2. A. B. C. D. 3. A. 4 N B. 18 N C. 40 N D N 1

软件测试设计

untitled

东 亚 货 币 一 体 化 的 经 济 基 础 扩 展 性 研 究 关 最 优 货 币 区 研 究 的 基 础 上, 结 合 东 亚 实 际, 选 取 与 货 币 一 体 化 组 建 密 切 相 关 的 一 系 列 经 济 基 础 性 指 标, 指 标 的 选 取 应 遵 循 以 下 原 则 相 关


Undangan Finalis

山东华鲁恒升化工股份有限公司

Microsoft Word - Part2_P3C_p1-p20_Chi_ doc

Microsoft Word - 職業倫理與道德題庫.doc

就 构 成 了 盗 窃 罪 与 破 坏 交 通 设 施 罪 的 想 象 竞 合, 按 照 其 中 处 罚 较 重 的 犯 罪 处 罚 5. 答 案 :B 本 题 主 要 考 察 如 何 区 分 收 买 被 拐 卖 的 妇 女 儿 童 罪 与 拐 卖 妇 女 儿 童 罪 的 共 犯 问 题 ( 对 向

! # % & # % & ( ) % % %# # %+ %% % & + %, ( % % &, & #!.,/, % &, ) ) ( % %/ ) %# / + & + (! ) &, & % & ( ) % % (% 2 & % ( & 3 % /, 4 ) %+ %( %!


( ) t ( ) ( ) ( ) ( ) ( ) t-

悖论

! + +, ) % %.!&!, /! 0! 0 # ( ( # (,, # ( % 1 2 ) (, ( 4! 0 & 2 /, # # ( &

五花八门宝典(一).doc

當 地 情 形 還 不 熟 悉 4 得 勝 的 歡 似 虎 : 形 容 因 勝 利 而 得 意 忘 形 5 不 吃 無 工 之 食 : 比 喻 人 不 能 無 緣 無 故 接 受 優 待 或 贈 與 4. 請 根 據 文 意, 在 中 填 入 正 確 的 成 語 代 號 ( 甲 ) 優 游 自 在

保母人員丙級應檢資料第二部份 doc

考 查 知 识 点 肝 气 疏 泄 调 畅 气 机 的 作 用, 主 要 表 现 在 以 下 几 个 方 面 :(1) 促 进 血 液 与 津 液 的 运 行 输 布 ;(2) 促 进 脾 胃 的 运 化 功 能 和 胆 汁 分 泌 排 泄 ;(3) 调 畅 情 志 ;(4) 促 进 男 子 排 精

Transcription:

zwp@ustc.edu.cn Office: 1006 Phone: 63600565 http://staff.ustc.edu.cn/~zwp/ http://fisher.stat.ustc.edu.cn

1.1....................... 1 1.2............... 6 1.3.................... 11 1.3.1............... 12 1.3.2 K-means................. 19 1.3.3.................. 24 1.4.................. 30 1.5................. 38 Previous Next First Last Back Forward 1

1.1 data segmentation class discovery Previous Next First Last Back Forward 1

: : : :,,? Previous Next First Last Back Forward 2

2D 3D Previous Next First Last Back Forward 3

(clustering and classification),, Previous Next First Last Back Forward 4

: PCA, FA, MDS, Manifest learning, SOM. Previous Next First Last Back Forward 5

1.2 (similarity), s(, ) 1. 0 s(x, y) 1 2. s(x, x) = 1 3. s(x, y) = s(y, x) (dissimilarity)... ( ) d(, ) Previous Next First Last Back Forward 6

1..d(x, y) 0, x = y 2. d(x, x) = 0 3. d(x, y) = d(y, x) metric dissimilarity 4. d(x, y) d(x, z) + d(z, y).. 5. d(x, y) max{d(x, z), d(z, y)}. 5, d ultrametric dissimilarity x, y R p,,,. ( ). ( ) Previous Next First Last Back Forward 7

Minkowski: [ p d m (x, y) = x k y k m] 1/m = x y m k=1 Manhattan: ( city-block distance, box-car distance) d(x, y) = Euclidean: p x k y k = x y 1 k=1 d(x, y) = x y 2 maximum:(chebyshev distance) Canberra:( ) d(x, y) = max x i y i = x y d(x, y) = p k=1 x k y k x k + y k Previous Next First Last Back Forward 8

0-1 : x, y 1, x y 1 0 1 a b a+b 0 c d c+d a+c b+d n=a+b+c+d binary(jaccard) : d(x, y) = b + c no 0-0 match a + b + c Czekanowski : d(x, y) = b + c no 0-0 match 2a + b + c double 1-1 match. 12.1. : x, y, p q, Previous Next First Last Back Forward 9

χ 2. Coef. of contingency ( s ij = χ 2 ) 1/2 χ 2 + n Cramer s V contingency coef. s ij = n min{p 1, q 1} ( χ 2,. x i X i j X j : r ij = θ ij = [ n [ n n k=1 (x ik x i )(x jk x j ) k=1 (x ik x i ) ] 1/2 2 n k=1 (x jk x j ) 2 n k=1 x ikx jk k=1 x2 ik n k=1 x2 jk ] 1/2 ) 1/2 Previous Next First Last Back Forward 10

1.3 Previous Next First Last Back Forward 11

1.3.1 (Hierarchical clustering, ). (dissimilarity) (linkage) (agglomerative hierarchical method): ( ) (divisive hierarchical method): ( ). Previous Next First Last Back Forward 12

(Dendrogram) (dendrogram). ( )., 0, Previous Next First Last Back Forward 13

(Linkage criteria) cluster I cluster J single linkage D(I, J) = min i I,j J {d ij} complete linkage D(I, J) = average linkage 1 D(I, J) = n I + n J max i I,j J {d ij} i I,j J d ij. Previous Next First Last Back Forward 14

cluster I cluster J centroid linkage: D(I, J) = x I x J ward method: I J ( ) ( ), W I W J ;. M, W M, I J D(I, J) = W M W I W J Previous Next First Last Back Forward 15

: Single linkage.,. S.. Complete linkage,. ( ), ( ).. Average linkage.,. Centroid linkage ( ), Previous Next First Last Back Forward 16

1.., 2. ( ) (d ij ) 3. (linkage) 4. 5. 3-4, 6.,..,. cluster diana (Kaufman and Rousseeuw, 1990). Previous Next First Last Back Forward 17

, ( ),,, BRICH, BURE, ROCK, Chameleon. Previous Next First Last Back Forward 18

1.3.2 K-means,, k : k,,, ;.,., K-means Previous Next First Last Back Forward 19

K-means MacQueen (1967). : k,. 1. k 2. k C 1,..., C k, x i, i = 1,..., k k x i, i = 1,..., k, 3. k ESS = i=1 j C i x j x i 2 Previous Next First Last Back Forward 20

4., ESS. x i, i = 1,..., k 5. 3 4, (ESS ) K-medoids K-means : K-means, K-means, K-means K-means ( ) R base kmeans K-means, cluster pam(partitioning around medoids) K-medoids. Previous Next First Last Back Forward 21

K-means K-medoids,. CLARA (Clustering LARge Applications, Kaufman and Rousseeuw 1990) :, K-medoids CLARANS (Clustering Large Application based upon RANdomized Search, Ng and Han 1994). Previous Next First Last Back Forward 22

K-medoids 1. D = (d ij ) k 2. k (medoids) 3. ( k ) 4. l (l = 1,..., k), i 0 : i 0 = arg min d ij i C l j C l l 5. 3 4 Previous Next First Last Back Forward 23

1.3.3 (compatness): k-means, hierarchical clustering (connectivity): spectral clustering (Spectral clustering) K-means Previous Next First Last Back Forward 24

( )., K-means. V = (x 1,..., x n ), ( ) W, k,, Previous Next First Last Back Forward 25

, Gaussian kernel : x x W ij = e i j 2σ 2, σ 2 :, W (A, B) = i A,j B W ij A B. A 1,..., A k, cut(a 1,..., A k ) = 1 2 Ā A. k W (A i, Āi) i=1 Previous Next First Last Back Forward 26

: (Ncut, Shi and Malik, 2000): Ncut(A 1,..., A k ) = 1 2 k i=1 W (A i, Āi) vol(a i ) Previous Next First Last Back Forward 27

vol(a) = i A d i = i A n j=1 W ij. Ncut(A 1,..., A k ) NP-hard. Laplacian, Graph Laplacian: L = D W, D = diag(d 1,..., d n ). k(k > 2), h ij = 0. H = (h ij ), : min Ncut(A 1,..., A k ) A V H = D 1/2 T = Relaxation!!! 1, xi Aj, vol(aj ) min A 1,...,A k T race(h LH) st. H DH = I k min T race(t D 1/2 LD 1/2 T ), T R n k. st. T T = I k T opt L sym = D 1/2 LD 1/2 k. Previous Next First Last Back Forward 28

H opt = D 1/2 T opt L rw = D 1 L k. (Shi and Malik, 2000) n x 1,..., x n, k: : W Laplacian L = D W Lu = λdu k, U = [u 1,..., u k ] y i U i,i = 1,..., n, K-means y 1,..., y n C 1,..., C k A 1,..., A k, A i = {x j y j C i } Ng, Jordan, and Weiss (2002). Previous Next First Last Back Forward 29

1.4, k k,. (, )., Silhouette( ) Kaufman and Rousseeuw (1990),. Previous Next First Last Back Forward 30

i, a(i) i C i d(i, C) i C(C C i ) b(i) d(i, C) i Silhouette s(i) = Silhouetter b(i) a(i) max{a(i), b(i)} s = 1 n n sw i i=1 Previous Next First Last Back Forward 31

, 1 s(i) 1, s(i) 1 a(i) b(i). a(i) a(i) i, b(i) i, s(i) 1 i. s(i) -1 i s(i) 0 i, Kaufman and Rousseeuw : s > 0.5 s < 0.2 R cluster silhouette s(i). Previous Next First Last Back Forward 32

Previous Next First Last Back Forward 33

CH index K-means,. k, K-means (within-cluster variation): k W (k) = x j x i 2 i=1 C(j)=i W (k) k, W (k). (Between-cluster variation) : k B(k) = n i x i x 2 i=1 B(k) k, B(k) Previous Next First Last Back Forward 34

W (k) B(k), k: CH(k) = B(k)/(k 1) W (k)/(n k) ˆk = arg max CH(k) 2 k K max Previous Next First Last Back Forward 35

Gap Statistic W (k) k, k! Tibshirani et al. (2001) Gap Statistic : W (k) W (k), W (k) (, ),. Bootstrap, sd k = 1 B Gap n (k) = Enlog(W (k)) log(w (k)) 1 log(wb (k)) log(w (k)) B b b (W b (k) W (k)) 2, W (k) = 1 B b W b (k). Previous Next First Last Back Forward 36

s k = sd k 1 + 1/B, ˆk = inf{k : Gap(k) Gap(k + 1) s k+1 } Previous Next First Last Back Forward 37

1.5 (external criterion) (internal criterion). (reference, ). Purity, F-measure, Rand Statistics, Entropy.. : Davis-Bouldin, Dunn, Expected Density ρ : Elbow criterion, GAP statistics Previous Next First Last Back Forward 38

n, (Reference) C = {C1,..., Cl }. C = {C 1,..., C k } F-measure: (Precision): C j Ci / C j. (Recall): C j Ci / Ci C j C i F-measure:F ij (α) = ( ), α 1. 1+α 1 precision + recall α, F-measure l i=1 C i n max j=1,...,k {F ij} F-measure. Previous Next First Last Back Forward 39

Entropy: C C Entropy H(C) = C j [ C j Ci C j Ci ] log 2 n C C j C C j C i j C j Rand Index, n 11 = {(x i, x j ) x i, x j C i ; x i, x j C k} n 00 = {(x i, x j ) x i C i1, x j C i2 ; x i C k 1, x j C k 2 } n 10 = {(x i, x j ) x i, x j C i ; x i C k 1, x j C k 2 } n 01 = {(x i, x j ) x i C i1, x j C i2 ; x i, x j C k 2 } Rand index R = n 11 + n ( 00 n 2) R = 0, R = 1.. Previous Next First Last Back Forward 40

Adjusted Rand Index Rand index ( ), Hubert and Arabie (1985) Rand index ( : ), 0, 1. Meila (2003) Ajusted Rand index. Previous Next First Last Back Forward 41