XX X Vol. XX, No. X 2X X ACTA AUTOMATICA SINICA Month, 2X AIR=(G,I,R,A) GI R A A K Maulik 6 TP8 Unsupervised Classification and Recognition Using an Artificial Immune System based on Manifold Distance Gong Mao-Guo Jiao Li-Cheng Ma Wen-Ping Zhang Xiang-Rong Abstract In this study, a novel artificial immune system algorithm for unsupervised classification and recognition is proposed by using a novel manifold distance based dissimilarity measure which can measure the geodesic distance along the manifold. The new method formulas the immune response as a quaternion AIR=(G,I,R,A), where G denotes exterior stimulus or antigen, I denotes the set of valid antibodies, R denotes the set of reaction rules describing the interactions between antibodies, and A denotes the dynamical algorithm describing how the reaction rules are applied to antibody population. In order to solve unsupervised classification problems, the new method encodes each antibody as a sequence of real integer numbers representing the cluster representatives, and searches the optimal cluster representatives from a combinatorial optimization viewpoint using the dynamical algorithm A. The experimental results on six artificial datasets with different manifold structure and the USPS handwritten digit datasets show that the novel algorithm has the ability to identify complex non-convex clusters compared with the K-Means algorithm, a genetic algorithm-based clustering proposed by Maulik, and a evolutionary clustering algorithm with the manifold distance. Key words Artificial immune systems, manifold, unsupervised classification, clustering, pattern recognition. (Artificial Immune Systems, AIS) [] [2] [3 6] Farmer [7] 27--3 27-9-2 Received May 3, 27; in revised form September 2, 27 ( 6737) 863 ( 26AAZ7) 973 ( 26CB77) ( 4) Supported by the National Natural Science Foundation of China (Grant No. 6737), the National High Technology Research and Development Program (863 Program) of China (Grant No. 26AAZ7), the National Basic Research Program (973 Program) of China (Grant No. 26CB77) and the Graduate Innovation Fund of Xidian University (Grant No. 4).. 224 77. Institute of Intelligent Information Processing, PO Box 224, Xidian University, Xi an, 77, China 996 2 997 998 IEEE Systems Man and Cybernetics,, International Joint Conference on Artificial Intelligence (IJCAI), International Joint Conference on Neural Networks (IJCNN), IEEE Congress on Evolutionary Computation (CEC), Genetic and Evolutionary Computation Conference (GECCO) 22,
2 XX [2] K- [8] K- K- [9 2] K K K [3] Su Chou [3] (a nonmetric measure) Charalampidis [4] K- ( ) ( ) 6 K- [8] [] Maulik []. [6] 3 2 3 2 3 2 3 Fig..6.6...4.4.3 2.2.4.6.8 An illustration of that the Euclidian distance metric can not reflect the global consistency. x i x j L(x i, x j ) 3 L(x i, x j ) ρ dist(xi,xj) () dist(x i, x j ) x i x j ρ <
X 3 G = (V, E) V E = {W ij } 2. G = (V, E) p V l l = p p p p (p k, p k+ ), k < p P ij x i x j x i x j p D(x i, x j ) = min L (p k, p k+ ) (2) p P i,j k= L(a, b) : D(x i, x j ) = D(x j, x i ) : D(x i, x j ) : x i, x j, x k D(x i, x j ) D(x i, x k ) + D(x k, x j ) D(x i, x j ) = x i = x j.2 [6,7,8] ( T ) AIR=(G,I,R,A) G I T R A () G T (2) I I = {b, b 2, b 3,, b n } G n G b R b = b b 2 b l b b i ------- B = {b, b 2, b 3,, b m } b m I m (3) R R = {r, r 2, r 3,, r l } I r i R B = {b, b 2, b 3,, b n } r i R b + b 2 + + b n r i b + b 2 + + b m (3) n,m m r i (3) (3) n r i m (4) A A R I B 2 R A 2. K K K K
4 XX K K (6,9,38,64,9) 6 9 38 64 9 (6,9,38,64,9) (6,9,64,38,9) Maulik [] K m m K K m 2.2 x i, i =, 2,, n C j, j {, 2,, K} j = arg min (D(x i, u j )) (4) j=,2,,k u j j (4) Aff(b) = + C k C i C k D(i, µ k ) () C b µ k C k D(i, µ k ) C k i µ k 2.3 R R r r 2 r 3 2.3. r ( ) B = {b, b 2, b 3,, b n } r b + b 2 + + b n r {b + b 2 + + b q }+ {b 2 + b 2 2 + + b q 2 2 } + + {b n + b 2 n + + b qn n } (6) b j i = b i, i =, 2,, n; j =, 2,, q i, q i [, n c ] n c q i = b i B i (b i, b 2 i,, b qi i ) b i 2.3.2 r 2 r 2 B = {b, b 2, b 3,, b n } r 2 b + b 2 + + b n r 2 b + b 2 + + b n (7) (6, 9, 38, 64, 9) [, ) (6,9+floor((-9)*random+), 38, 64, 9) (6, 9-floor((9-)*random+), 38, 64, 9) random [, ) floor 2.3.3 r 3 r 3 B = {b, b 2,, b q, b 2, b 2 2, b q 2 2,, b n, b 2 n,, b qn n } r 3
X {b + b 2 + + b q } + {b 2 + b 2 2 + + b q 2 2 } + + {b n + b 2 n + + b qn n } r 3 b + b 2 + + b n (8) b i i =, 2,, n, j {, 2,, q i } b j i B i (b i, b 2 i,, b qi i ) B i (b i, b 2 i,, b qi i ) b j i b i = b j i p(b i = b j i ) = R = {r, r 2, r 3 } 2.4 A. 2. R = {r, r 2, r 3 } 2 Fig. 2 2 The flowchart of artificial immune unsupervised classification and recognition algorithm r r 2 r 3 2.2 2.3 [7] r 3 [7] 3 3. AIR 6 6 Lineblobs Size Spiral Square Sticks Threecircles 3 K- [8] (KM) [] (DSEC) Maulik [] (GAC) AIR.8. DSKM ρ (, e 8 ] ρ = e 2 Adjusted Rand Index [2] n Adjusted Rand Index R(U, V ) = P lk (n lk 2 ) [ P l (n l 2 ) P k (n k 2 )]/( n 2 ) [P 2 l (n l 2 ) + P k (n k 2 )] [ P l (n l 2 ) P k (n k 2 )]/( n 2 ) (9) n lk l k R(U, V ) [, ] 3 6 Adjusted Rand Index 3
6 XX.6.6...4.4.3.2.4.6.8.6.6...4.4.3.2.4.6.8.6.6...4.4.3.2.4.6.8.6.6...4.4.3.2.4.6.8 (a) Line-blobs 2 2 2 2 2 2 2 2 (b) Size
X 7 (c) Spiral (d) Square4
8 XX.8.8.6.6.4.4.2.2.4.6.8.2.2.4.6.8.8.8.6.6.4.4.2.2.4.6.8.2.2.4.6.8 (e) Sticks.8.6.4.2.2.4.6.8.8.6.4.2.2.4.6.8.8.6.4.2.8.6.4.2.2.4.6.8.2.4.6.8 (f) Three-circles Fig. 3 3 The typical implementation results on the artificial data sets obtained from AIR, DSEC, GAC and KM
X 9 Table Results of AIR, DSEC, GAC and KM on artificial datasets Adjusted Rand Index AIR DSEC GAC KM line-blobs.399.49 Size.922.97.924.92 Spiral.34.33 Square4.962.83.937.86 Sticks.44.4 Three-circles.33.44 3 Line-blobs Spiral Sticks Three-circles AIR GAC Size Square4 AIR DSEC GAC K 3.2 USPS USPS 9298 6 6 729 27 USPS Buffalo NIST {, 8} {3,, 8} {3, 8, 9} {, 2, 3, 4} {, 2, 4, 8} 3. 3 Adjusted Rand Index 2 2 {, 8} {3,, 8} {3, 8, 9} {, 2, 3, 4} {, 2, 4, 8} AIR DSEC KM AIR.826.964 Table 2 2 USPS Results of AIR, DSEC, GAC and KM on the USPS handwritten digit datasets Adjusted Rand Index AIR DSEC GAC KM {, 8}.92.97.82.69 {3,, 8}.826.776.69.44 {3, 8, 9}.87.669.693.4 {, 2, 3, 4}.97.87.736.646 {, 2, 4, 8}.964.92.783.673 3.3 [9] m Adjusted Rand Index R m Adjusted Rand Index b m = R m max k R k () m b m =, b m b m
XX m b m 4 b m 4 Fig. 4 Robustness of the compared algorithms 4 AIR.483 GAC AIR AIR b m AIR 4 AIR GAC KM References de Castro L N, Timmis J. Artificial Immune Systems: A New Computational Intelligence Approach. Berlin: Springer-Verlag, 22. 2 Jiao Li-Cheng, Du Hai-Feng, Liu Fang, Gong Mao-Guo. Immunological computation for optimization, learning and recognition. Beijing: Science Press, 26. (in Chinese) (,,,.. 26 6 ) 3 de Castro L N, Von Zuben F J. Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation, 22, 6(3): 239-2 4 Dasgupta D. Artificial Immune Systems and Their Applications. Berlin: Springer-Verlag, 999. Jiao Li-Cheng, Wang Lei. A novel genetic algorithm based on immunity. IEEE Transactions on Systems, Man and Cybernetics, Part A. 2, 3(): 2-6. 6 Gong Mao-Guo, Jiao Li-Cheng, Liu Fang, Du Hai-Feng. The quaternion model of artificial immune response. In: Proceedings of the fourth international conference on Artificial Immune Systems. Banff, Alberta, Canada, 2. Springer- Verlag, Lecture Notes in Computer Science, Vol. 3627, 2. 27-29. 7 Farmer J D, Packard N H, Perelson A S. The immune system, adaptation, and machine learning. Physica D, 986: 87-24. 8 MacQueen J B. Some methods for classification and analysis of multivariate observations. In: Proccedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 967. 28-297. 9 Hall L O, Ozyurt I B, Bezdek J C. Clustering with a genetically optimized approach. IEEE Transactions on Evolutionary Computation, 999, 3(2): 3-2. Maulik U, Bandyopadhyay S. Genetic algorithm-based clustering technique. Pattern Recognition, 2, 33(9): 4-46 Pan H, Zhu J, Han D. Genetic algorithms applied to multiclass clustering for gene expression data. Genomics, Proteomics & Bioinformatics, 23, (4): 279-287 2 Handl J. Knowles J. An evolutionary approach to multiobjective clustering. IEEE Transactions on Evolutionary Computation, 27, (): 6-76 3 Su M C, Chou C H. A modified version of the K-Means algorithm with a distance based on cluster symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 23(6): 674-68 4 Charalampidis D. A modified K-Means algorithm for circular invariant clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 27(2): 86-86 Gong Mao-Guo, Jiao Li-Cheng, Wang Ling, Bo Lie-Feng. Density-sensitive evolutionary clustering. In: Proceedings of the th Pacific-Asia Conference on Knowledge Discovery and Data Mining. Nanjing, China, 27. Springer-Verlag, Lecture Notes in Computer Science, Vol. 4426, 27. 7-4 6 Zhou D, Bousquet O, La T N, Weston J, Schölkopf B. Learning with Local and Global Consistency. Advances in Neural Information Processing Systems 6. Cambridge, MA, USA: MIT Press, 24. 32-328 7 Gong Mao-Guo, Du Hai-Feng, Jiao Li-Cheng. Optimal approximation of linear systems by artificial immune response. Science in China Series F Information Sciences, 26, 49(): 63-79
X 8 Gong Mao-Guo, Jiao Li-Cheng, Du Hai-Feng, Ma Wen- Ping. A novel evolutionary strategy based on artificial immune response for constrained optimizations. Chinese Journal of Computers, 27, 3(): 37-47 (in Chinese) (,,,.., 27, 3(): 37-47) 9 Geng X, Zhan D C, Zhou Z H. Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 2, 3(6): 98-7 979 3 23 24 9 IEEE.. 863. 3 SCI 2.. Email: gong@ieee.org (Gong Mao-Guo currently works as a lecturer under the Innovative Research Team of the Ministry of Education of China at Xidan University, Xi an, China. He is a member of IEEE. His research interests are broadly in the area of Computational Intelligence and Hybrid Intelligent Systems. The areas of special interests include artificial immune systems, evolutionary computation, data mining, optimization and some other related areas. He has published more than 3 papers in journals and conferences. Corresponding author of this paper.) 99. 982 984 99... Email: lchjiao@mail.xidian.edu.cn (Jiao Li-Cheng currently works as a professor under the Innovative Research Team of the Ministry of Education of China at Xidan University, Xi an, China. He received the B.S. degree from Shanghai Jiao Tong University, Shanghai, China, in 982, the M.S. and the Ph.D. degree from Xi an Jiao Tong University, Xi an, China, in 984 and in 99, respectively. Since April 23, he has been the Dean of School of Electronic Engineering and the director of Institute of Intelligent Information Processing at Xidian University. His current research interests include natural computation, signal and image processing and intelligent information processing.) Email: wpma@mail.xidian.edu.cn (Ma Wen-Ping Ph.D. candidate and Teaching Assistant of the Institute of Intelligent Information Processing at Xidian University. Her research interests include artificial immune systems, data mining and image processing.). 2 SCI 9 Email: xrzhang@mail.xidian.edu.cn (Zhang Xiang-Rong Lecturer of the Institute of Intelligent Information Processing at Xidian University. Her research interests include pattern recognition, Machine Learning, image processing and natural computation. She has published more than 2 papers in journals and conferences.)