hyshi@pku.edu.cn, 2018-06-09
Contents: PCA VS. RPCA
Sense embeddings and their nearest neighbors [1]:

star s1: stars, movie, song, MVP
star s2: stars, award, eagle, two-time
star s3: supergiant, constellation, aurigae
algorithm s1, algorithm s2: hash, algorithms, quick sort, recursive algorithms, optimization, public-key

[1] Neelakantan et al. 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In Proc. of EMNLP.
WordNet [1] / [2]: hypernymy-hyponymy relations (hypernyms: animal, country; hyponyms: cat, China).

[1] Miller. 1995. WordNet: a lexical database for English. Communications of the ACM.
[2] 1984.
Example senses: bank (financial institution; slope), net (trap; ( ) income).

$$\mathrm{score}(v^{w}_{s_1}, \mathrm{SynH}^{w}_{i}) = \sum_{v^{w}_{s_k} \in NN(v^{w}_{s_1})} \cos(v^{w}_{s_1}, v^{w}_{s_k}) \cdot \mathrm{isPossibleHypernym}(\mathrm{SynH}^{w}_{i}, v^{w}_{s_1})$$
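The pseudo multi-sense score $P_{\mathrm{pseudo}}(v^{w}_{s_0}, v^{w}_{s_1})$ is based on the cosine similarities between the nearest neighbors of the two senses,

$$\sum_{v_{s_0} \in NN(v^{w}_{s_0})} \; \sum_{v_{s_1} \in NN(v^{w}_{s_1})} \cos(v_{s_0}, v_{s_1}),$$

and is compared against a threshold $\theta$.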
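The neighbor-based score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sum of cosines is normalized to an average, a sense pair is flagged when the average exceeds `theta`, and the function names, the toy vectors, and the default `theta = 0.5` are all hypothetical rather than taken from the slides.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pseudo_score(neighbors_s0, neighbors_s1):
    """Average cosine over all neighbor pairs (the averaging is an assumption)."""
    total = sum(cosine(a, b) for a in neighbors_s0 for b in neighbors_s1)
    return total / (len(neighbors_s0) * len(neighbors_s1))

def is_pseudo_multi_sense(neighbors_s0, neighbors_s1, theta=0.5):
    """Flag the pair when the score exceeds theta (assumed decision rule)."""
    return pseudo_score(neighbors_s0, neighbors_s1) > theta

# Toy neighbor sets: nn_a and nn_b point the same way, nn_c is nearly orthogonal.
nn_a = [[1.0, 0.0], [0.9, 0.1]]
nn_b = [[1.0, 0.1], [0.8, 0.0]]
nn_c = [[0.0, 1.0], [-0.1, 1.0]]

print(is_pseudo_multi_sense(nn_a, nn_b))
print(is_pseudo_multi_sense(nn_a, nn_c))
```

Two senses whose neighborhoods overlap in direction receive a high score, while senses with unrelated neighborhoods score near zero.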
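(1) (cat s0 - dog s0, cat s1 - dog s1)
(2) (> 70%)

The matrix $M$ stacks, over all words $w$ and all sense pairs $i \neq j$, the difference vectors between senses of the same word:

$$M = \left[\, v^{w}_{i} - v^{w}_{j} \,\right]_{w;\; i \neq j}$$

For example, the rows of $M$ contributed by "cat" are $v^{0}_{cat} - v^{1}_{cat}$, $v^{1}_{cat} - v^{0}_{cat}$, ..., $v^{0}_{cat} - v^{2}_{cat}$, $v^{1}_{cat} - v^{2}_{cat}$.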
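As a rough sketch of how such a difference matrix could be assembled, the snippet below stacks the difference of every ordered pair of distinct senses of each word. The dictionary layout, the function name `build_difference_matrix`, and the toy vectors are assumptions made for illustration, and the sketch uses all sense pairs rather than any filtered subset.

```python
from itertools import permutations

def build_difference_matrix(sense_vectors):
    """Return the rows v_i - v_j for all ordered pairs i != j of each word's senses."""
    rows = []
    for word, senses in sense_vectors.items():
        for i, j in permutations(range(len(senses)), 2):
            rows.append([a - b for a, b in zip(senses[i], senses[j])])
    return rows

# Hypothetical 2-dimensional sense vectors: three senses of "cat", two of "dog".
toy = {
    "cat": [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    "dog": [[2.0, 1.0], [1.0, 1.0]],
}
M = build_difference_matrix(toy)
print(len(M))   # 3*2 ordered pairs for "cat" plus 2*1 for "dog"
print(M[0])     # cat sense 0 minus cat sense 1
```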
PCA: find a rank-$d$ matrix $L$ that solves

$$\min_{L} \|M - L\|_F \quad \text{subject to } L + E = M,\ \operatorname{rank}(L) = d$$

Robust PCA (RPCA) additionally separates a sparse component $S$:

$$\min \operatorname{rank}(L) + \lambda_1 \|S\|_0 + \lambda_2 \|E\|_F \quad \text{subject to } L + S + E = M,\ \operatorname{rank}(L) = d$$
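The rank-constrained PCA problem has a closed-form solution through the truncated SVD (the Eckart-Young theorem). The sketch below shows this with NumPy on synthetic data; the function name and the toy matrix are assumptions, and no mean-centering is applied because the formulation here is a plain low-rank approximation of $M$.

```python
import numpy as np

def low_rank_approximation(M, d):
    """Best rank-d approximation of M in Frobenius norm, plus the residual E."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = (U[:, :d] * s[:d]) @ Vt[:d, :]   # keep the d largest singular values
    E = M - L                            # so that L + E = M holds exactly
    return L, E

# Synthetic matrix: rank 3 plus small dense noise.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))
M = M + 0.01 * rng.normal(size=(20, 8))

L, E = low_rank_approximation(M, d=3)
print(np.linalg.matrix_rank(L), np.linalg.norm(E))
```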
RPCA [1], computed with repeated PCA:

M^(0) = M, t = 0, S = 0
WHILE
    PCA: L^(t) + E^(t) = M^(t)
    S^(t) = sparse part of E^(t)
    S = S + S^(t)
    M^(t+1) = M^(t) - S^(t)
    t = t + 1
return L^(t), S, E^(t)

M^(t), L^(t) [2]

[1] Wright et al. 2009. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. In Proc. of NIPS.
[2] 2010.
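The iteration can be sketched in NumPy as follows. Several details are assumptions and not taken from the slides: the sparse part of the residual is taken to be the entries whose magnitude exceeds a threshold `tau`, the loop stops when no such entries remain or after `max_iter` rounds, and all function and parameter names are hypothetical.

```python
import numpy as np

def rank_d_pca(M, d):
    """Rank-d approximation L of M via truncated SVD, with residual E = M - L."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    L = (U[:, :d] * s[:d]) @ Vt[:d, :]
    return L, M - L

def iterative_rpca(M, d, tau, max_iter=50):
    """Peel large residual entries into S, then refit PCA on what remains."""
    M_t = np.array(M, dtype=float)        # M^(0) = M (copied)
    S = np.zeros_like(M_t)
    for _ in range(max_iter):
        L_t, E_t = rank_d_pca(M_t, d)
        # Assumed sparsity rule: keep residual entries larger than tau.
        S_t = np.where(np.abs(E_t) > tau, E_t, 0.0)
        if not S_t.any():                 # assumed stopping rule
            break
        S += S_t
        M_t -= S_t
    else:
        # Loop ended without break: refit so that L_t + E_t matches M - S.
        L_t, E_t = rank_d_pca(M_t, d)
    return L_t, S, E_t

# Synthetic data: a rank-2 matrix with two planted outliers.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 10))
outliers = np.zeros_like(low_rank)
outliers[3, 4] = 10.0
outliers[17, 1] = -8.0
M = low_rank + outliers

L, S, E = iterative_rpca(M, d=2, tau=1.0)
print(np.allclose(L + S + E, M))
```

Whatever threshold is chosen, the three returned parts always add up to the input, which matches the decomposition into a low-rank part, a sparse part, and a residual.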
Find a transformation $T$ that minimizes

$$\min_{T} L(T) = \sum_{i=1}^{m} \frac{1}{2} \left( \left\| T v^{w_i}_{s_{i,0}} - \frac{v^{w_i}_{s_{i,0}} + v^{w_i}_{s_{i,1}}}{2} \right\|^2 + \left\| T v^{w_i}_{s_{i,1}} - \frac{v^{w_i}_{s_{i,0}} + v^{w_i}_{s_{i,1}}}{2} \right\|^2 \right)$$

For a subspace $U \subseteq \mathbb{R}^n$, $T$ satisfies

$$\begin{cases} T\alpha = 0, & \alpha \in U \\ T\alpha = \alpha, & \alpha \perp U \end{cases}$$

Applying $T$ to every sense vector gives $\tilde{V} = \{\tilde{v}^{w}_{s} = T v^{w}_{s} \mid v^{w}_{s} \in V\}$.
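A transformation with these two properties is the orthogonal projection onto the complement of $U$, which can be written as $T = I - QQ^{\top}$ for an orthonormal basis $Q$ of $U$. The sketch below builds such a $T$ with NumPy and checks both conditions; the function name, the use of QR to orthonormalize, and the toy subspace are assumptions for illustration.

```python
import numpy as np

def projection_removing_subspace(basis):
    """T = I - Q Q^T, where the columns of Q orthonormalize the columns of basis."""
    Q, _ = np.linalg.qr(basis)            # reduced QR: Q has orthonormal columns
    n = basis.shape[0]
    return np.eye(n) - Q @ Q.T

# Hypothetical subspace U of R^3, spanned by the two columns of `basis`.
basis = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 2.0]])
T = projection_removing_subspace(basis)

in_U = basis @ np.array([0.3, -1.2])          # an arbitrary vector inside U
orthogonal_to_U = np.array([1.0, -1.0, 0.0])  # orthogonal to both columns

# Two "senses" that differ only by a direction inside U.
u = basis[:, 0]
v0 = orthogonal_to_U + 0.5 * u
v1 = orthogonal_to_U - 0.5 * u

print(np.allclose(T @ in_U, 0.0))
print(np.allclose(T @ v0, T @ v1))
```

In this toy case both sense vectors are mapped to their mean, which is the situation in which each term of the loss $L(T)$ vanishes.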
Model                            WS-353   SCWS
[1]                              64.2     26.1
skip-gram (MSSG, 300D) [2]       70.9     57.3
skip-gram (NP-MSSG, 300D) [2]    69.1     59.8
NP-MSSG +                        68.8     62.2
NP-MSSG +                        69.2     63.7
NP-MSSG + (PCA)                  69.2     65.3
NP-MSSG + (RPCA)                 69.2     65.4
[3]                              69.5     62.4
MUSE [4]                         69.4     67.9

[1] Huang et al. 2012. Improving word representations via global context and multiple word prototypes. In Proc. of ACL.
[2] Neelakantan et al. 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In Proc. of EMNLP.
[3] Li and Jurafsky. 2015. Do Multi-Sense Embeddings Improve Natural Language Understanding? In Proc. of EMNLP.
[4] Lee and Chen. 2017. MUSE: Modularizing Unsupervised Sense Embeddings. In Proc. of EMNLP.
SentEval [1]: subjectivity classification (SUBJ), question classification (TREC), and paraphrase detection (MSRP).

BoW              SUBJ   TREC   MSRP
(NP-MSSG) [2]    91.0   78.4   70.0
+                91.2   83.2   70.6
+                91.0   84.2   70.0
+ (RPCA)         92.3   85.4   71.0

[1] Conneau and Kiela. 2018. SentEval: An Evaluation Toolkit for Universal Sentence Representations. In Proc. of LREC.
[2] Neelakantan et al. 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In Proc. of EMNLP.
PCA VS. RPCA: WordNet Synset

[Figure: four panels, one each for rank(L) = 1, 2, 3, 5, comparing PCA (-L) with RPCA (S); x-axis ticks from 2000 to 5000, y-axis ticks from 0.3 to 0.5.]
PCA VS. RPCA: rank(L) = 3

#   Words (sense indices)
1   after_{1,2}, eventually_{0,1}, whilst_{0,1}, again_{1,2}, finally_{1,2}
2   although_{1,2}, well_{2,3}, initially_{1,2}, more_{1,4}, both_{0,2}
3   Brian_{1,2}, February_{0,2}, Daniel_{1,2}, September_{2,7}, Frank_{0,2}

1   income_{2,4}, campaigns_{1,5}, age_{6,7}, development_{4,5}, goals_{2,6}
2   Berlin_{0,6}, Martin_{3,4}, Greek_{0,3}, Jan_{0,4/0,6}, name_{1,3}
3   quarterback_{3,9}, playoff_{3,9}, NBA_{0,1}, Houston_{1,3}, mayor_{0,6}