Understanding the Conjugate Gradient Method

Hailiang Zhao* (hliangzhao@zju.edu.cn)
College of Computer Science and Technology, Zhejiang University

July 10, 2020

Contents

1 Introduction
2 The Quadratic Form
3 The Steepest Descent Method
  3.1 The Direction and Optimal Step Size
  3.2 Eigenthings
    3.2.1 Eigenvalues and Spectral Radius
    3.2.2 Jacobi Iterations
  3.3 Convergence Analysis
    3.3.1 Instant Results
    3.3.2 General Convergence
4 Conjugate Directions Methods
  4.1 What is Conjugacy
  4.2 Gram-Schmidt Conjugation
  4.3 Optimality Analysis
  4.4 Conjugate Gradient Method
  4.5 Convergence Analysis
    4.5.1 Picking Perfect Polynomials
    4.5.2 Chebyshev Polynomials
  4.6 Complexity Analysis
5 Conjugate Gradient Method in Action
  5.1 Starting and Stopping Criterion
  5.2 Preconditioning
  5.3 Extending to General Cases
6 Postscript

* Hailiang is a first-year Ph.D. student of ZJU-CS. His homepage is http://hliangzhao.me.

1 Introduction

Like other line-search methods, the steepest descent method and the Conjugate Gradient method (CG) iterate by first picking a search direction and then a step size along it. CG is designed for problems whose Hessian is symmetric positive-definite; concretely, it is one of the most effective iterative methods for solving the linear system

    Ax = b,    (1.1)

where A ∈ R^{n×n} is a positive-definite matrix¹, b ∈ R^n is known, and x ∈ R^n is the unknown to be found. When A is dense, direct methods such as LU decomposition are usually the right tool; when A is large and sparse, iterative methods such as the Jacobi method, the Gauss-Seidel method, and CG are preferable. We will introduce the Jacobi method along the way, since the eigenanalysis it requires carries over to the analysis of CG. As we shall see, in exact arithmetic CG solves Ax = b in at most n iterations.

2 The Quadratic Form

A quadratic form is a scalar function of a vector x:

    f(x) = (1/2) x^T A x - b^T x + c,    (2.1)

where A ∈ R^{n×n}, x, b ∈ R^n, and c ∈ R. The gradient of x^T A x with respect to x is (A + A^T)x, which reduces to 2Ax when A is symmetric. Hence

    f'(x) = (1/2)(A^T + A)x - b,

and when A is symmetric this simplifies to f'(x) = Ax - b. Setting f'(x) = 0 shows that x* = A^{-1}b is a critical point of f. When A is moreover positive-definite², x* is the global minimizer: writing any other point as x = x* + e with a nonzero e ∈ R^n, and using Ax* = b together with the symmetry of A,

    f(x* + e) = (1/2)(x* + e)^T A (x* + e) - b^T (x* + e) + c = f(x*) + (1/2) e^T A e > f(x*).

¹ A ∈ R^{n×n} is positive-definite if x^T A x > 0 for every nonzero x ∈ R^n; see http://hliangzhao.me/math/math.pdf for background.
² The eigenvalues of a symmetric positive-definite matrix are all positive: if Ax = λx with x ≠ 0, then x^T A x = λ ‖x‖² > 0, hence λ > 0.
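The claim above is easy to check numerically. Below is a minimal NumPy sketch; the particular A, b, and c are illustrative choices, not values from the text. It verifies that x* = A^{-1}b attains a smaller value of f than randomly perturbed points around it.

```python
import numpy as np

# f(x) = (1/2) x^T A x - b^T x + c, the quadratic form (2.1).
def f(A, b, c, x):
    return 0.5 * x @ A @ x - b @ x + c

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # assumed symmetric positive-definite example
b = np.array([2.0, -8.0])                # assumed right-hand side
c = 0.0
x_star = np.linalg.solve(A, b)           # the critical point x* = A^{-1} b

# f(x* + e) = f(x*) + (1/2) e^T A e > f(x*) for every nonzero e.
rng = np.random.default_rng(0)
print(all(f(A, b, c, x_star) < f(A, b, c, x_star + rng.standard_normal(2))
          for _ in range(100)))          # True
```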

This shows that when A is a symmetric positive-definite matrix, solving Ax = b yields exactly the global minimizer of f(x): a quadratic optimization problem has been turned into the solution of a linear system. When A is symmetric positive-definite, the graph of f(x) is an upward-opening paraboloid. If A is not positive-definite, the global minimizer may fail to be unique. Fig. 2.1 shows how different choices of A ∈ R^{2×2} affect the graph of f(x).

Figure 2.1: The effect of different A on the graph of f(x). (a) The quadratic form of a positive-definite matrix. (b) Of a negative-definite matrix. (c) Of a singular, non-positive-definite matrix. (d) Of an indefinite matrix; in this case the solution is a saddle point, which neither the gradient method nor CG can handle.

3 The Steepest Descent Method

3.1 The Direction and Optimal Step Size

The steepest descent method is the gradient method: a line-search algorithm that takes the negative gradient as the iteration direction and determines the optimal step size by differentiation. Since the gradient is the direction in which the function value grows fastest, its negative is the direction in which the function value decreases fastest; this is why the gradient method is also called the method of steepest descent. We start from an arbitrary point x(0) and iterate to obtain x(1), x(2), ..., until the maximum number of iterations is reached or the error is small enough. In the i-th iteration the chosen direction is b - Ax(i).³ To simplify the exposition, define the following quantities:

- the error vector e(i) ≜ x(i) - x*, the distance between x(i) and the global minimizer x* at iteration i;
- the residual vector r(i) ≜ b - Ax(i), the distance between Ax(i) and b at iteration i.

The residual vector is exactly the direction of steepest descent, and

    r(i) = b - Ax(i) = Ax* - Ax(i) = -Ae(i).

This equivalent transformation will be used over and over again; keep it in mind. Accordingly, x is updated by

    x(i+1) ≜ x(i) + α r(i),    (3.1)

³ Do not forget that f'(x) = Ax - b holds only when A is a symmetric matrix. Whether or not A is symmetric, the gradient method picks b - Ax(i) as the iteration direction, so in general it cannot guarantee convergence; later we only give convergence rates for the cases where A is symmetric or symmetric positive-definite.

where the step size α is chosen, at every iteration i, to minimize f(x(i+1)). Regard f(x(i+1)) as a function φ(α) of α and set dφ/dα = 0. By the chain rule,

    dφ/dα = d f(x(i+1))/dα = (∇f(x(i+1)))^T (d x(i+1)/dα) = -r(i+1)^T r(i) = 0.    (3.2)

So (3.2) says that the optimal α makes the new residual orthogonal to the current search direction; Fig. 3.1 illustrates this.

Figure 3.1: One exact line search of steepest descent. (a) The quadratic form f(x). (b) Starting at x(0) = [-2, -2]^T, the search line in the x1-x2 plane. (c) f restricted to the search line, as a function of α. (d) At the minimizing α, the new gradient is orthogonal to the search direction.

Expanding (3.2) with r(i+1) = b - Ax(i+1) and x(i+1) = x(i) + α r(i),

    r(i+1)^T r(i) = (b - Ax(i+1))^T r(i)
                  = [b - A(x(i) + α r(i))]^T r(i)
                  = (b - Ax(i))^T r(i) - α (A r(i))^T r(i)
                  = r(i)^T r(i) - α r(i)^T A r(i) = 0,    (3.3)

which is solved by

    α = (r(i)^T r(i)) / (r(i)^T A r(i)).    (3.4)

Thus, given any starting point x(0), we repeatedly compute the step size by (3.4) and update x by (3.1) to obtain x(1), x(2), ..., stopping when r(k) = 0. This yields Algorithm 1.
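Before stating the full algorithm, here is a tiny numerical check of (3.2) and (3.4). The starting point x(0) = [-2, -2]^T is the one in Fig. 3.1; the system itself is the same illustrative example as before.

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # assumed example system
b = np.array([2.0, -8.0])
x = np.array([-2.0, -2.0])               # x(0), as in Fig. 3.1
r = b - A @ x                            # steepest descent direction
alpha = (r @ r) / (r @ A @ r)            # optimal step size (3.4)
x_next = x + alpha * r                   # update (3.1)
r_next = b - A @ x_next
print(np.isclose(r_next @ r, 0.0))       # True: the new residual obeys (3.2)
```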

Algorithm 1: Steepest Descent.
Input: A ∈ R^{n×n}, b ∈ R^n
Output: x, the solution of Ax = b, i.e., argmin_x f(x)
1  r(0) = b - Ax(0)
2  k = 0
3  while r(k) ≠ 0 do
4      k = k + 1
5      α(k) = (r(k-1)^T r(k-1)) / (r(k-1)^T A r(k-1))
6      x(k) = x(k-1) + α(k) r(k-1)
7      r(k) = b - Ax(k)
8  end while
9  return x(k)

Each iteration of Algorithm 1 performs two matrix-vector multiplications (step 5 and step 7), and these dominate the cost. One of them can be eliminated. Multiplying both sides of x(k) = x(k-1) + α(k) r(k-1) by -A and adding b gives

    b - Ax(k) = (b - Ax(k-1)) - α(k) A r(k-1),  i.e.,  r(k) = r(k-1) - α(k) A r(k-1).    (3.5)

With (3.5), the product A r(k-1) already computed for the step size can be reused to update the residual, so only one matrix-vector multiplication per iteration remains. The drawback is that the recurrence produces r(k) without any reference to b - Ax(k), so accumulated floating-point error may let x(k) drift away from the point the residual describes; the standard remedy is to recompute r(k) = b - Ax(k) from scratch every n iterations. The improved procedure, which also replaces the exact test r(k) = 0 with a tolerance on the squared residual norm, is Algorithm 2.

Algorithm 2: Steepest Descent in practice.
Input: A ∈ R^{n×n}, b ∈ R^n, tolerance ε, MaxIter
Output: an approximate solution of Ax = b
1  r(0) = b - Ax(0), δ(0) = r(0)^T r(0)
2  k = 0
3  while k < MaxIter and δ(k) > ε² δ(0) do
4      q = A r(k), α(k) = δ(k) / (r(k)^T q)
5      x(k+1) = x(k) + α(k) r(k)
6      if k is divisible by n then
7          /* Periodically update by the original formula */
8          r(k+1) = b - Ax(k+1)
9      else
10         /* Reduce complexity according to (3.5) */
11         r(k+1) = r(k) - α(k) q
12     end if
13     k = k + 1, δ(k) = r(k)^T r(k)
14 end while
15 return x(k)
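The following sketch implements Algorithm 2 in NumPy; the test system at the bottom is an assumed example, not taken from the text.

```python
import numpy as np

def steepest_descent(A, b, x0=None, eps=1e-8, max_iter=10_000):
    """Algorithm 2: steepest descent using the recurrence (3.5), with the
    residual recomputed exactly every n iterations to control rounding drift."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x                       # r(0) = b - A x(0)
    delta = r @ r
    delta0 = delta
    for k in range(max_iter):
        if delta <= eps**2 * delta0:
            break
        q = A @ r                       # the single matrix-vector product
        alpha = delta / (r @ q)         # optimal step size (3.4)
        x = x + alpha * r
        if (k + 1) % n == 0:
            r = b - A @ x               # periodic exact residual
        else:
            r = r - alpha * q           # cheap update (3.5)
        delta = r @ r
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])  # assumed SPD test problem
b = np.array([2.0, -8.0])
x = steepest_descent(A, b)
print(np.allclose(A @ x, b, atol=1e-6)) # True
```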

3.2 Eigenthings

3.2.1 Eigenvalues and Spectral Radius

Recall that a nonzero vector v is an eigenvector of B with eigenvalue λ if Bv = λv. Applying B to v repeatedly only rescales it: B^k v = λ^k v. Hence if |λ| < 1 then lim_{k→∞} B^k v = 0, while if |λ| > 1 then B^k v blows up. Fig. 3.2 and Fig. 3.3 illustrate the two cases.

Figure 3.2: With λ = 0.5, B^k v shrinks toward zero as k grows.

Figure 3.3: With λ = 2, B^k v diverges as k grows.

What if x is not an eigenvector of B? Suppose B ∈ R^{n×n} has n linearly independent eigenvectors⁴ {v_1, ..., v_n}. They form a basis, so any x can be expanded as x = Σ_{i=1}^{n} c_i v_i, and

    Bx = Σ_{i=1}^{n} c_i λ_i v_i,  B^k x = Σ_{i=1}^{n} c_i λ_i^k v_i.

For example, with c_i = 1 for all i, x = Σ_i v_i; if some eigenvalue satisfies |λ_i| > 1, the corresponding component of B^k x blows up and B^k x diverges, while B^k x → 0 if and only if |λ_i| < 1 for every i. Whether iterated application of B converges to zero is therefore governed by the eigenvalue of largest magnitude, called the spectral radius:

    ρ(B) ≜ max_i |λ_i|.    (3.6)

⁴ See http://hliangzhao.me/math/math.pdf.
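A two-line experiment makes the role of the eigenvalues concrete. The diagonal B below, with eigenvalues 0.5 and 2, is an assumed toy example matching Figs. 3.2 and 3.3.

```python
import numpy as np

# B has eigenvalues 0.5 and 2: the component of x along the eigenvector
# with |lambda| < 1 dies out, while the one with |lambda| > 1 blows up.
B = np.diag([0.5, 2.0])
x = np.array([1.0, 1.0])    # c_1 = c_2 = 1 in the eigenvector expansion
for k in range(1, 6):
    x = B @ x
    print(k, x)             # first entry halves, second doubles each step
```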

If B does not have n linearly independent eigenvectors, it is called a defective matrix. The conclusion above still holds for defective matrices (the argument goes through generalized eigenvectors and is more involved), so ρ(B) governs convergence in general.

3.2.2 Jacobi Iterations

The Jacobi method and the Gauss-Seidel method are classical iterative schemes for Ax = b. Assuming a_ii ≠ 0 for all i, row i of Ax = b can be solved for x_i:

    x_i = (b_i - Σ_{j≠i} a_ij x_j) / a_ii.    (3.7)

The Jacobi method turns (3.7) into an iteration that computes all new components from the old ones:

    x_i^{(k+1)} = (b_i - Σ_{j=1}^{i-1} a_ij x_j^{(k)} - Σ_{j=i+1}^{n} a_ij x_j^{(k)}) / a_ii.    (3.8)

The Gauss-Seidel method instead reuses the components already updated in the current sweep:

    x_i^{(k+1)} = (b_i - Σ_{j=1}^{i-1} a_ij x_j^{(k+1)} - Σ_{j=i+1}^{n} a_ij x_j^{(k)}) / a_ii.    (3.9)

Gauss-Seidel typically converges faster than Jacobi, but we focus on Jacobi here because its analysis transfers to CG. In matrix form, let D = diag(A), let -E be the strictly lower triangular part of A, and let -F be the strictly upper triangular part, so that A = D - E - F. Then

    Ax = b  ⟺  (D - E - F)x = b  ⟺  D^{-1}(D - E - F)x = D^{-1}b  ⟺  x = D^{-1}(E + F)x + D^{-1}b.

Writing B ≜ D^{-1}(E + F) and z ≜ D^{-1}b, the Jacobi iteration (3.8) becomes

    x(k+1) = B x(k) + z.    (3.10)

The solution x* of Ax = b is a stationary point of (3.10): if x(k) = x*, then x(k+1) = x*. Writing x(k) = x* + e(k),

    x(k+1) = B x(k) + z = B(x* + e(k)) + z = (Bx* + z) + B e(k) = x* + B e(k),

so e(k+1) = x(k+1) - x* = B e(k). Hence the Jacobi method converges from every starting point x(0) if and only if ρ(B) < 1, and the smaller ρ(B) is, the faster e(k) → 0. Note that B is determined by A alone; the Jacobi method does not require A to be symmetric or positive-definite.
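Here is a minimal sketch of the Jacobi iteration; the strictly diagonally dominant test matrix is an assumption chosen so that ρ(B) < 1 and the iteration converges.

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=10_000):
    """Jacobi iteration (3.10): x <- B x + z with B = D^{-1}(E + F)."""
    D = np.diag(A)                  # assumes a_ii != 0 for all i
    R = A - np.diag(D)              # the off-diagonal part, i.e., -(E + F)
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_new = (b - R @ x) / D     # componentwise update (3.8)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

A = np.array([[4.0, 1.0], [2.0, 5.0]])  # assumed diagonally dominant example
b = np.array([1.0, 2.0])
print(np.allclose(A @ jacobi(A, b), b, atol=1e-8))  # True
```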

3.3 Convergence Analysis

3.3.1 Instant Results

Suppose the error e(i) happens to be an eigenvector of A with eigenvalue λ_e. Then the residual r(i) = -Ae(i) = -λ_e e(i) is also an eigenvector of A with the same eigenvalue, and

    A r(i) = -λ_e A e(i) = λ_e r(i).    (3.11)

Substituting into the step size (3.4),

    α = (r(i)^T r(i)) / (r(i)^T A r(i)) = (r(i)^T r(i)) / (λ_e r(i)^T r(i)) = 1/λ_e.    (3.12)

Subtracting x* from both sides of (3.1),

    e(i+1) + x* = e(i) + x* + α r(i),  i.e.,  e(i+1) = e(i) + α r(i).    (3.13)

Therefore

    e(i+1) = e(i) + (1/λ_e) r(i) = e(i) - (1/λ_e) λ_e e(i) = 0,

that is, steepest descent converges to x* in a single iteration. Fig. 3.4 illustrates this situation.

Figure 3.4: For A = [[3, 2], [2, 6]], the eigenvectors of A are v1 = [1, 2]^T and v2 = [-2, 1]^T, with eigenvalues λ1 = 7 and λ2 = 2. When e(i) is an eigenvector of A, a single exact line search lands on x*.
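The one-step convergence result is easy to reproduce with the matrix of Fig. 3.4 (the right-hand side b is an assumed value):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])                # assumed right-hand side
x_star = np.linalg.solve(A, b)
v1 = np.array([1.0, 2.0])                # eigenvector of A, lambda_e = 7
x = x_star + v1                          # so that e(0) = v1
r = b - A @ x
alpha = (r @ r) / (r @ A @ r)            # equals 1/lambda_e = 1/7, by (3.12)
x = x + alpha * r
print(np.allclose(x, x_star))            # True: converged in one step
```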

In general, e(i) is not an eigenvector of A. Since A is symmetric, it has n orthonormal eigenvectors and can be diagonalized as

    A = P D P^{-1} = P D P^T,    (3.14)

where the columns of P are the orthonormal eigenvectors {v_1, ..., v_n} and D = diag(λ_1, ..., λ_n). Expand the error in this eigenbasis: there exist coordinates ξ_j such that

    e(i) = Σ_{j=1}^{n} ξ_j v_j.    (3.15)

Because the v_j are orthonormal, ‖e(i)‖² = Σ_{j=1}^{n} ξ_j², and r(i) = -Ae(i) = -Σ_{j=1}^{n} ξ_j λ_j v_j. Moreover,

    e(i)^T A e(i) = (Σ_j ξ_j v_j)^T (Σ_j ξ_j λ_j v_j) = Σ_{j=1}^{n} ξ_j² λ_j,    (3.16)

    r(i)^T A r(i) = (Σ_j ξ_j λ_j v_j)^T A (Σ_j ξ_j λ_j v_j) = (Σ_j ξ_j λ_j v_j)^T (Σ_j ξ_j λ_j² v_j) = Σ_{j=1}^{n} ξ_j² λ_j³,    (3.17)

and similarly r(i)^T r(i) = Σ_j ξ_j² λ_j². Substituting these into e(i+1) = e(i) + α r(i),

    e(i+1) = e(i) + ((r(i)^T r(i)) / (r(i)^T A r(i))) r(i) = e(i) + ((Σ_j ξ_j² λ_j²) / (Σ_j ξ_j² λ_j³)) r(i).    (3.18)

Now suppose all eigenvalues of A are equal: λ_j = λ for every j. Then Ae(i) = A Σ_j ξ_j v_j = Σ_j ξ_j λ v_j = λ e(i), and (3.18) collapses:

    e(i+1) = e(i) + ((λ² Σ_j ξ_j²) / (λ³ Σ_j ξ_j²)) (-A e(i)) = e(i) + (1/λ)(-λ e(i)) = 0.    (3.19)

Geometrically, when all eigenvalues of A coincide, the contours of f(x) are spheres, so the negative gradient at every point aims straight at the minimizer. In summary, steepest descent converges in one iteration when e(i) is an eigenvector of A, or when all eigenvalues of A are equal.

3.3.2 General Convergence

To analyze the general case, define the energy norm

    ‖e‖_A ≜ (e^T A e)^{1/2}.    (3.20)

By the identity f(x* + e) = f(x*) + (1/2) e^T A e from Section 2, minimizing ‖e(i)‖_A is equivalent to minimizing f(x(i)); Fig. 3.5 shows two vectors of equal energy norm.

Figure 3.5: Two vectors with the same energy norm.

From the definition of the energy norm,

    ‖e(i+1)‖²_A = e(i+1)^T A e(i+1)
                = (e(i) + α r(i))^T A (e(i) + α r(i))    [by (3.13)]
                = e(i)^T A e(i) + 2α r(i)^T A e(i) + α² r(i)^T A r(i)
                = ‖e(i)‖²_A - 2 (r(i)^T r(i))² / (r(i)^T A r(i)) + (r(i)^T r(i))² / (r(i)^T A r(i))    [by (3.4), and r(i)^T A e(i) = -r(i)^T r(i)]
                = ‖e(i)‖²_A - (r(i)^T r(i))² / (r(i)^T A r(i))
                = ‖e(i)‖²_A [1 - (r(i)^T r(i))² / ((r(i)^T A r(i))(e(i)^T A e(i)))]
                = ‖e(i)‖²_A [1 - (Σ_j ξ_j² λ_j²)² / ((Σ_j ξ_j² λ_j³)(Σ_j ξ_j² λ_j))]    [by (3.16) & (3.17)]
                = ‖e(i)‖²_A ω²,  where ω² ≜ 1 - (Σ_j ξ_j² λ_j²)² / ((Σ_j ξ_j² λ_j³)(Σ_j ξ_j² λ_j)).    (3.21)

To understand the worst-case behavior of steepest descent we only need an upper bound on ω²: the smaller ω, the faster the convergence. (Exactly why ω controls the rate is spelled out below.) Let us first analyze a simple case, A ∈ R^{2×2} symmetric positive-definite. As shown earlier, positive-definiteness makes both eigenvalues λ1, λ2 positive, and the eigenvectors v1, v2 are linearly independent; assume without loss of generality that λ1 > λ2. The spectral condition number of A is then κ ≜ λ1/λ2 ≥ 1. Recalling (3.15)⁵, define the slope of the error vector e(i), relative to the coordinate system spanned by the eigenvectors, as µ ≜ ξ2/ξ1.

⁵ In (3.15) we treat the v_j as basis vectors, so the ξ_j are the coordinates in that basis. Viewing v1 as the x-axis and v2 as the y-axis gives the slope µ by analogy.

With these definitions, ω² simplifies to

    ω² = 1 - (ξ1² λ1² + ξ2² λ2²)² / ((ξ1² λ1 + ξ2² λ2)(ξ1² λ1³ + ξ2² λ2³))
       = 1 - (κ² + µ²)² / ((κ + µ²)(κ³ + µ²)).    (3.22)

Thus ω can be viewed as a function of κ and µ (we only consider positive values). Looking at the left plot of Fig. 3.6, the smaller κ and µ are, the smaller ω is; moreover, ω is largest when µ = ±κ. This is visually intuitive, and it can also be verified by differentiating (3.22). The plot also explains how steepest descent can converge in a single iteration. When e(i) is an eigenvector, the error vector lies on a coordinate axis of the eigenvector coordinate system, so the slope µ is 0 or ∞, and in both cases ω = 0; when all eigenvalues of A are equal, κ = 1 and again ω = 0. We can further observe how ω varies with κ and µ. When κ is small, convergence is fast no matter what µ is, that is, no matter how the starting point is chosen: the eigenvalues in all directions are similar, so f(x) is nearly spherical and a good iteration direction is available at every point. When κ is large, the energy norms of different eigenvectors differ greatly, and one axis of f(x) is much longer than the other; imagine an ellipse x²/a² + y²/b² = 1 with very different a and b. If the starting point lies on a ridge, good directions are still easy to find and convergence is not slow. If the starting point lies in the valley, steepest descent can only zigzag slowly toward the valley floor, each iteration making little effective progress, and convergence is very slow. The right plot of Fig. 3.6 visualizes the iterations in the four situations.

Figure 3.6: Left: ω as a function of κ and µ. Right: steepest descent iterations in four situations: (a) large κ, small µ; (b) large κ, large µ; (c) small κ, small µ; (d) small κ, large µ.

Let us now compute the upper bound of ω in the worst case, µ = ±κ. For a symmetric positive-definite A, define the condition number as the largest ratio of eigenvalues,

    κ ≜ λmax / λmin.    (3.23)

When µ = ±κ we have µ² = κ², so (3.22) satisfies

    ω² ≤ 1 - 4κ⁴ / (κ⁵ + 2κ⁴ + κ³) = ((κ - 1)/(κ + 1))²,

and therefore

    ω ≤ (κ - 1)/(κ + 1).    (3.24)

This conclusion continues to hold for n > 2. Plotting (3.24) gives Fig. 3.7.

Figure 3.7: For symmetric positive-definite A, the worst-case ω grows with the condition number κ and approaches 1.

The bound on ω also bounds the decrease of f. Using f(x(i)) - f(x*) = (1/2) e(i)^T A e(i) (from Section 2) and applying (3.21) over i iterations,

    (f(x(i)) - f(x*)) / (f(x(0)) - f(x*)) = ((1/2) e(i)^T A e(i)) / ((1/2) e(0)^T A e(0)) = ‖e(i)‖²_A / ‖e(0)‖²_A ≤ ω^{2i}.    (3.25)

Since the worst-case ω of each iteration satisfies (3.24), iterating i times yields the convergence bound of steepest descent:

    ‖e(i)‖_A ≤ ((κ - 1)/(κ + 1))^i ‖e(0)‖_A,    (3.26)

and correspondingly

    (f(x(i)) - f(x*)) / (f(x(0)) - f(x*)) ≤ ((κ - 1)/(κ + 1))^{2i}.    (3.27)

4 Conjugate Directions Methods

4.1 What is Conjugacy

Steepest descent often steps along the same direction more than once (the zigzag of Fig. 3.6 (b)). It would be better to pick n mutually orthogonal search directions d(0), d(1), ..., d(n-1) and walk along each of them exactly once, with exactly the right step length, so that after n steps we are done. With search directions in hand, x is updated by

    x(i+1) ≜ x(i) + α(i) d(i).    (4.1)

If the step sizes are right, the total displacement from x(0) to x* decomposes over the n directions (compare (3.15)):

    -e(0) = x* - x(0) = α(0) d(0) + ... + α(n-1) d(n-1),    (4.2)

and the i-th iteration should eliminate the component of e(0) along d(i).

After i + 1 iterations, the first i + 1 components have been eliminated:

    e(i+1) = e(0) + Σ_{k=0}^{i} α(k) d(k) = -Σ_{k=i+1}^{n-1} α(k) d(k).    (4.3)

If the directions are mutually orthogonal, d(i)^T d(j) = 0 for i ≠ j, then

    d(i)^T e(i+1) = -Σ_{k=i+1}^{n-1} α(k) d(i)^T d(k) = 0.    (4.4)

Expanding d(i)^T e(i+1) = d(i)^T (e(i) + α(i) d(i)) = 0 and solving for the step size,

    α(i) = -(d(i)^T e(i)) / (d(i)^T d(i)).    (4.5)

The problem with (4.5) is that we do not know e(i): if we knew e(i) at iteration i, we could jump to x* = x(i) - e(i) directly and there would be nothing left to compute. The fix is to require the n directions d(0), d(1), ..., d(n-1) to be A-orthogonal instead of orthogonal. Two vectors p and q are A-orthogonal, or conjugate, if

    p^T A q = 0.    (4.6)

Intuitively, A-orthogonal vectors are vectors that become orthogonal once the space is stretched by A, as illustrated by Fig. 4.1.

Figure 4.1: (a) Pairs of A-orthogonal vectors p and q, which do not look orthogonal. (b) The same pairs in the stretched space, where they are orthogonal.

Methods that use A-orthogonal search directions together with suitably chosen step sizes α(i) are called conjugate directions methods.

Replacing orthogonality by A-orthogonality, (4.4) becomes the requirement

    d(i)^T A e(i+1) = 0,    (4.7)

and the corresponding step size, obtained from d(i)^T A (e(i) + α(i) d(i)) = 0 together with r(i) = -Ae(i), is

    α(i) = -(d(i)^T A e(i)) / (d(i)^T A d(i)) = (d(i)^T r(i)) / (d(i)^T A d(i)).    (4.8)

Unlike (4.5), the right-hand side of (4.8) involves only computable quantities. Moreover, (4.8) is exactly what an exact line search would produce: as in (3.2), minimizing f along d(i) in the update (4.1) requires

    d f(x(i+1))/dα = (∇f(x(i+1)))^T (d x(i+1)/dα) = -r(i+1)^T d(i) = d(i)^T A e(i+1) = 0,

which is precisely (4.7). In other words, a conjugate directions method performs an exact line search along each A-orthogonal direction.

It remains to verify that n such steps really reach x*. Analogous to (3.15), expand e(0) in the basis of the n linearly independent search directions:

    e(0) = Σ_{i=0}^{n-1} δ_i d(i).    (4.9)

Left-multiplying by d_j^T A and using A-orthogonality, d(i)^T A d(j) = 0 for i ≠ j, isolates δ_j:

    d_j^T A e(0) = Σ_i δ_i d_j^T A d(i) = δ_j d_j^T A d(j).

Adding terms that vanish by A-orthogonality, and using e(j) = e(0) + Σ_{i=0}^{j-1} α(i) d(i) from (4.3),

    δ_j = (d_j^T A e(0)) / (d_j^T A d(j))
        = (d_j^T A (e(0) + Σ_{i=0}^{j-1} α(i) d(i))) / (d_j^T A d(j))
        = (d_j^T A e(j)) / (d_j^T A d(j))
        = -(d_j^T r(j)) / (d_j^T A d(j)) = -α(j).    (4.10)

Hence δ_j = -α(j), and combining (4.3), (4.9) and (4.10), the error after i iterations is

    e(i) = e(0) + Σ_{j=0}^{i-1} α(j) d(j) = Σ_{j=0}^{n-1} δ_j d(j) - Σ_{j=0}^{i-1} δ_j d(j) = Σ_{j=i}^{n-1} δ_j d(j).    (4.11)

After n iterations every component of e(0) has been eliminated, so e(n) = 0: a conjugate directions method converges in at most n steps.

4.2 Gram-Schmidt Conjugation

What is still missing is a way to produce the A-orthogonal (conjugate) search directions d(i) required by (4.1) and (4.8). The classical tool for producing orthogonal vectors is the Gram-Schmidt process, and it adapts directly to conjugation. Recall the ordinary Gram-Schmidt process: given linearly independent vectors {u(0), ..., u(n-1)}, orthogonal vectors {q(0), ..., q(n-1)} are constructed by

    q(0) = u(0),
    q(1) = u(1) - ((u(1)^T q(0)) / (q(0)^T q(0))) q(0),
    ...
    q(n-1) = u(n-1) - Σ_{i=0}^{n-2} ((u(n-1)^T q(i)) / (q(i)^T q(i))) q(i).    (4.12)

That is, q(1) is u(1) minus its projection onto q(0) (the component of u(1) along q(0)), leaving only the component orthogonal to q(0); q(2) subtracts its projections onto q(0) and q(1); and so on. If each q is normalized as it is produced, then for the i-th input vector p(i), given the already-normalized q(0), ..., q(i-1), the update reads

    q(i) = p(i) - Σ_{k=0}^{i-1} (p(i)^T q(k)) q(k),  followed by  q(i) ← q(i) / (q(i)^T q(i))^{1/2}.    (4.13)

Replacing inner products by A-inner products turns Gram-Schmidt orthogonalization into Gram-Schmidt conjugation, which produces A-orthogonal directions from linearly independent inputs p(0), ..., p(n-1):

    ∀i: d(i) = p(i) - Σ_{k=0}^{i-1} ((p(i)^T A d(k)) / (d(k)^T A d(k))) d(k).    (4.14)

Denote the Gram-Schmidt coefficients by β_ik ≜ -(p(i)^T A d(k)) / (d(k)^T A d(k)), so that d(0) = p(0) and, for 0 < i < n,

    d(i) = p(i) + Σ_{k=0}^{i-1} β_ik d(k).    (4.15)

To recover β_ij, multiply (4.15) by A d(j) for some j < i; by A-orthogonality of the already-built directions,

    d(i)^T A d(j) = p(i)^T A d(j) + Σ_{k=0}^{i-1} β_ik d(k)^T A d(j) = p(i)^T A d(j) + β_ij d(j)^T A d(j).    (4.16)

Since A-orthogonality demands d(i)^T A d(j) = 0, (4.16) determines β_ij; the sketch below verifies the construction numerically before we continue.
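A direct, unoptimized sketch of Gram-Schmidt conjugation (4.14): it conjugates the standard basis of R² against the matrix of Fig. 3.4 (an illustrative choice) and checks A-orthogonality of the output.

```python
import numpy as np

def conjugate_directions(A, P):
    """Gram-Schmidt conjugation (4.14): turn the linearly independent
    columns of P into mutually A-orthogonal directions d(0), ..., d(n-1)."""
    n = P.shape[1]
    D = np.zeros_like(P, dtype=float)
    for i in range(n):
        d = P[:, i].astype(float)
        for k in range(i):
            # beta_ik = -(p(i)^T A d(k)) / (d(k)^T A d(k)), as in (4.17)
            beta = -(P[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d = d + beta * D[:, k]
        D[:, i] = d
    return D

A = np.array([[3.0, 2.0], [2.0, 6.0]])
D = conjugate_directions(A, np.eye(2))       # conjugate the standard basis
print(np.round(D[:, 0] @ A @ D[:, 1], 12))   # 0.0: the directions are A-orthogonal
```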

Solving (4.16) under d(i)^T A d(j) = 0 gives back

    β_ij = -(p(i)^T A d(j)) / (d(j)^T A d(j)),    (4.17)

so that substituting (4.17) into (4.15) recovers (4.14). The drawback of this construction is its cost: all previous search directions must be kept in memory, and building the full set takes O(n³) operations in general. The magic of CG lies in a clever choice of the input vectors p(i), namely p(i) = r(i), under which almost all of the β_ik vanish. Before getting there, we analyze the optimality of conjugate directions methods.

4.3 Optimality Analysis

Conjugate directions methods are optimal in the following sense: at every iteration i, the error e(i) has the smallest energy norm that the steps taken so far allow.⁶ Define the subspace spanned by the first i search directions as D_i ≜ span{d(0), ..., d(i-1)}. By (4.3),

    e(i) = e(0) + Σ_{j=0}^{i-1} α(j) d(j) ∈ e(0) + D_i.    (4.18)

Fig. 4.2 illustrates the case i = 2.

Figure 4.2: The error e(2) of a conjugate directions method lies in e(0) + D_2 and minimizes the energy norm over that set.

Claim: e(i) = argmin_{e ∈ e(0)+D_i} ‖e‖_A. Indeed, expanding ‖e(i)‖²_A with (4.11) and using A-orthogonality (the cross terms d(j)^T A d(k) with j ≠ k vanish),

    ‖e(i)‖²_A = e(i)^T A e(i) = (Σ_{j=i}^{n-1} δ_j d(j))^T A (Σ_{k=i}^{n-1} δ_k d(k)) = Σ_{j=i}^{n-1} (δ_j d(j))^T A (δ_j d(j)).    (4.19)

Only the components along the directions not yet walked contribute to ‖e(i)‖²_A; moving within e(0) + D_i can only change the components along d(0), ..., d(i-1), which would add nonnegative terms to the sum. Hence no point of e(0) + D_i has a smaller energy norm than e(i). This optimality is the key to the convergence analysis of CG.

⁶ Recall from Section 3.3.2 that minimizing the energy norm of the error at iteration i is equivalent to minimizing f.

Several useful orthogonality facts follow from (4.11). First, for any i and any j > i, multiply (4.11) written at iteration j by d(i)^T A (recall r(j) = -Ae(j)):

    d(i)^T r(j) = -d(i)^T A e(j) = -d(i)^T A Σ_{k=j}^{n-1} δ_k d(k) = -Σ_{k=j}^{n-1} δ_k d(i)^T A d(k) = 0,    (4.20)

by A-orthogonality of d(i) and d(k). Fact 1: for j ∈ {i+1, ..., n-1}, the residual r(j) is orthogonal to all earlier search directions. Second, multiply (4.15) by r(j) for j > i:

    d(i)^T r(j) = p(i)^T r(j) + Σ_{k=0}^{i-1} β_ik d(k)^T r(j)  ⟹  0 = p(i)^T r(j),    (4.21)

since both d-terms vanish by (4.20). Fact 2: for j ∈ {i+1, ..., n-1}, the residual r(j) is also orthogonal to all earlier Gram-Schmidt inputs p(0), ..., p(i). Third, set j = i in the same expansion of (4.15); now only the terms with k < i vanish by (4.20), leaving

    d(i)^T r(i) = p(i)^T r(i).    (4.22)

Fact 3: d(i) and p(i) have the same inner product with the current residual. Finally, just as Algorithm 2 used (3.5), the residual of a conjugate directions method admits a cheap recurrence. Multiplying x(k) = x(k-1) + α(k) d(k-1) by -A and adding b,

    b - Ax(k) = (b - Ax(k-1)) - α(k) A d(k-1),  i.e.,  r(k) = r(k-1) - α(k) A d(k-1).    (4.23)

The product A d(k-1) is needed anyway for the step size (4.8), so reusing it removes one of the two matrix-vector multiplications per iteration, at the price of computing r(k) without reference to b - Ax(k).

4.4 Conjugate Gradient Method

The Conjugate Gradient method is nothing but the conjugate directions method of Section 4.2 whose Gram-Schmidt inputs are the residuals: d(0), ..., d(n-1) are obtained by conjugating r(0), ..., r(n-1), i.e., p(i) = r(i). This choice is natural. By Fact 1, each residual is orthogonal to all previous search directions, so as long as r(i) ≠ 0 it is linearly independent of d(0), ..., d(i-1) in (4.15) and yields a new search direction; and if r(i) = 0, then x(i) is already the solution. With p(i) = r(i), Fact 2 in (4.21) says that the residuals are mutually orthogonal:

    ∀ i ≠ j:  r(i)^T r(j) = 0.    (4.24)

The real payoff of p(i) = r(i) is that Gram-Schmidt conjugation collapses. Take the inner product of r(i) with the recurrence (4.23) written for step j + 1:

    r(i)^T r(j+1) = r(i)^T r(j) - α(j) r(i)^T A d(j),  i.e.,  α(j) r(i)^T A d(j) = r(i)^T r(j) - r(i)^T r(j+1).

By the orthogonality (4.24) of the residuals, almost all terms vanish:

    r(i)^T A d(j) = (1/α(i)) r(i)^T r(i),   i = j;
                  = -(1/α(i-1)) r(i)^T r(i),   i = j + 1;
                  = 0,   otherwise.    (4.25)

Plugging into the Gram-Schmidt coefficients (4.17) with p(i) = r(i) (only j < i is relevant there),

    β_ij = (1/α(i-1)) (r(i)^T r(i)) / (d(i-1)^T A d(i-1)),   j = i - 1;
         = 0,   j < i - 1.    (4.26)

Almost all the β_ij vanish: only the previous search direction has to be kept, so old directions need not be stored and the per-iteration cost drops from O(n²) to O(m), where m is the number of nonzero entries of A. For j = i - 1, substituting α(i-1) from (4.8),

    β_ij = (d(i-1)^T A d(i-1)) / (d(i-1)^T r(i-1)) · (r(i)^T r(i)) / (d(i-1)^T A d(i-1))
         = (r(i)^T r(i)) / (d(i-1)^T r(i-1))
         = (r(i)^T r(i)) / (r(i-1)^T r(i-1)),    (4.27)

using (4.22) with p(i-1) = r(i-1) in the last step. Since only j = i - 1 survives, abbreviate β(i) ≜ β_{i+1, i}; the Gram-Schmidt recurrence (4.15) then reduces to

    d(i+1) = r(i+1) + β(i) d(i).    (4.28)

Putting (4.8), (4.23), (4.27) and (4.28) together yields the complete CG method, Algorithm 3.

4.5 Convergence Analysis

In exact arithmetic, CG terminates in at most n iterations. In floating-point arithmetic, however, rounding, and in particular the cancellation error fed in by the recursive residual update (4.23), gradually destroys the A-orthogonality of the search directions, so n iterations may not suffice. CG is therefore best viewed as a genuinely iterative method, and its convergence rate matters.

4.5.1 Picking Perfect Polynomials

By construction, the residuals span the same subspaces as the search directions:

    span{r(0), ..., r(i)} = span{d(0), ..., d(i)} = D_{i+1}.    (4.29)

From (4.23), r(i+1) is a combination of r(i) and A d(i), so each iteration extends the subspace by one application of A:

    D_{i+2} = D_{i+1} + A D_{i+1}.    (4.30)

Algorithm 3: CG.
Input: A ∈ R^{n×n}, b ∈ R^n, tolerance ε, MaxIter
Output: an approximate solution of Ax = b
1  k = 0, r(0) = b - Ax(0)
2  d(0) = r(0)    // Set d(0) as r(0)
3  δ(0) = r(0)^T r(0)
4  while k < MaxIter and δ(k) > ε² δ(0) do
5      q = A d(k), α(k) = δ(k) / (d(k)^T q)
6      x(k+1) = x(k) + α(k) d(k)
7      if k is divisible by n then
8          /* Periodically update by the original formula */
9          r(k+1) = b - Ax(k+1)
10     else
11         /* Reduce complexity according to (4.23) */
12         r(k+1) = r(k) - α(k) q
13     end if
14     δ(k+1) = r(k+1)^T r(k+1), β = δ(k+1)/δ(k)
15     d(k+1) = r(k+1) + β d(k)    // Set d(k+1) by (4.28)
16     k = k + 1
17 end while
18 return x(k)

Unrolling (4.30) from D_1 = span{d(0)} = span{r(0)} gives

    D_i = D_1 + A D_1 + A² D_1 + ... + A^{i-1} D_1
        = span{d(0), A d(0), ..., A^{i-1} d(0)}
        = span{r(0), A r(0), ..., A^{i-1} r(0)}
        = span{A e(0), A² e(0), ..., A^i e(0)},    (4.31)

where the last line uses r(0) = -Ae(0) (a span is insensitive to sign). A subspace of this form, the span of successive powers of A applied to a fixed vector, is called a Krylov subspace. Since e(i) ∈ e(0) + D_i, there exist coefficients ψ_k such that

    e(i) = (I + Σ_{k=1}^{i} ψ_k A^k) e(0),    (4.32)

where the ψ_k are determined by the step sizes α(k) and coefficients β(k) of the first i iterations. In other words, e(i) = P_i(A) e(0), where P_i is the degree-i polynomial P_i(λ) = 1 + Σ_{k=1}^{i} ψ_k λ^k, which satisfies P_i(0) = 1.
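The following is a minimal NumPy rendering of Algorithm 3; the random SPD test system is an assumed example.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, eps=1e-10, max_iter=None):
    """Algorithm 3: CG with the residual recurrence (4.23), periodic exact
    residuals, and the direction update (4.28)."""
    n = b.shape[0]
    max_iter = 10 * n if max_iter is None else max_iter
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x
    d = r.copy()                       # d(0) = r(0)
    delta = r @ r
    delta0 = delta
    k = 0
    while k < max_iter and delta > eps**2 * delta0:
        q = A @ d
        alpha = delta / (d @ q)        # step size (4.8) with p = r
        x = x + alpha * d
        if (k + 1) % n == 0:
            r = b - A @ x              # periodic exact residual
        else:
            r = r - alpha * q          # recurrence (4.23)
        delta_new = r @ r
        beta = delta_new / delta       # (4.27)
        d = r + beta * d               # (4.28)
        delta = delta_new
        k += 1
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)          # assumed symmetric positive-definite system
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))       # tiny residual
```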

As before, let v_1, ..., v_n be the orthonormal eigenvectors of A with eigenvalues λ_1, ..., λ_n, and expand the initial error as e(0) = Σ_{j=1}^{n} ξ_j v_j. Since P_i(A) v_j = P_i(λ_j) v_j,

    e(i) = Σ_{j=1}^{n} ξ_j P_i(λ_j) v_j,   A e(i) = Σ_{j=1}^{n} ξ_j P_i(λ_j) λ_j v_j,

and, using v_j^T v_j = 1,

    ‖e(i)‖²_A = Σ_{j=1}^{n} ξ_j² (P_i(λ_j))² λ_j.    (4.33)

In particular ‖e(0)‖²_A = Σ_{j=1}^{n} ξ_j² λ_j, corresponding to P_0(·) ≡ 1. By the optimality result of Section 4.3, at each iteration CG finds the polynomial P_i (subject to P_i(0) = 1) that minimizes ‖e(i)‖_A. From (4.33) this gives the bound

    ‖e(i)‖²_A = min_{P_i} Σ_{j=1}^{n} ξ_j² (P_i(λ_j))² λ_j ≤ min_{P_i} max_λ (P_i(λ))² Σ_{j=1}^{n} ξ_j² λ_j = min_{P_i} max_λ (P_i(λ))² ‖e(0)‖²_A,    (4.34)

where λ ranges over the eigenvalues of A. Consider the n = 2 example of Fig. 3.4, with eigenvalues λ1 = 7 and λ2 = 2. Fig. 4.3 plots the optimal polynomials of the first iterations together with the resulting values of max_λ (P_i(λ))²:

(a) i = 0: P_0(λ) = 1, so the energy norm has not decreased yet.
(b) i = 1: among all polynomials P_1(λ) = aλ + 1 with P_1(0) = 1, the best slope is a* = argmin_a max{(2a + 1)², (7a + 1)²}, which gives a* = -2/9; then P_1(λ) = -2λ/9 + 1 and max_λ |P_1(λ)| = 5/9.
(c) i = 2: a degree-two polynomial can pass through the three points (0, 1), (2, 0) and (7, 0) simultaneously, so P_2 vanishes on the whole spectrum and the error is eliminated exactly: CG converges in two iterations.

The same reasoning applies in general: a polynomial through (0, 1), (λ1, 0), (λ2, 0), ... needs one degree per distinct eigenvalue, so CG converges in at most as many iterations as A has distinct eigenvalues; in particular, duplicated eigenvalues (λ1 = λ2 = ...) speed convergence up, as suggested by Fig. 4.3 (d).

Figure 4.3: The optimal polynomials P_i with P_i(0) = 1 on the spectrum {2, 7}: (a) i = 0; (b) i = 1; (c) i = 2; (d) the case of repeated eigenvalues.

When the individual eigenvalues are unknown and only the range [λmin, λmax] is available, the best CG can guarantee is to make P_i small on the whole interval [λmin, λmax]. The polynomials that minimize the maximum magnitude over an interval are the Chebyshev polynomials

    T_i(ω) = (1/2) ((ω + √(ω² - 1))^i + (ω - √(ω² - 1))^i).    (4.35)

As Fig. 4.4 shows, |T_i(ω)| ≤ 1 for ω ∈ [-1, 1], where T_i oscillates between -1 and 1, while |T_i(ω)| grows as rapidly as possible, among degree-i polynomials bounded on [-1, 1], once ω ∉ [-1, 1]. Choose

    P_i(λ) = T_i((λmax + λmin - 2λ) / (λmax - λmin)) / T_i((λmax + λmin) / (λmax - λmin)).    (4.36)

The argument of the numerator maps [λmin, λmax] onto [-1, 1], and the denominator normalizes P_i so that P_i(0) = 1; this choice minimizes max_λ (P_i(λ))² over the interval.⁷ Fig. 4.4 also plots (4.36) for λmin = 2, λmax = 7 and i = 2, whose maximum over the interval is about 0.183. Compare Fig. 4.3 (c): when the spectrum is known exactly, the optimal P_2 is zero on it, so the Chebyshev choice is only the best interval-wide guarantee, not the true optimum.

Figure 4.4: The Chebyshev polynomials on [-1, 1], and the Chebyshev-based polynomial P_2(λ) of (4.36) on [λmin, λmax] = [2, 7].

⁷ We omit the proof that the Chebyshev construction is optimal for this min-max problem.

On [λmin, λmax] the numerator of (4.36) satisfies |T_i(·)| ≤ 1, so (4.34) yields

    ‖e(i)‖_A ≤ (T_i((λmax + λmin)/(λmax - λmin)))^{-1} ‖e(0)‖_A
             = (T_i((κ + 1)/(κ - 1)))^{-1} ‖e(0)‖_A
             = 2 [ ((√κ + 1)/(√κ - 1))^i + ((√κ - 1)/(√κ + 1))^i ]^{-1} ‖e(0)‖_A.    (4.37)

As i grows, the term ((√κ - 1)/(√κ + 1))^i tends to 0, so the convergence bound of CG is usually quoted as

    ‖e(i)‖_A ≤ 2 ((√κ - 1)/(√κ + 1))^i ‖e(0)‖_A.    (4.38)

Compare (4.37) with the steepest descent bounds (3.26)-(3.27): at i = 1 the two coincide, but for i > 1 CG is far better. Where the per-iteration factor of steepest descent degrades with κ (Fig. 3.7), that of CG degrades only with √κ; indeed, CG with condition number κ² has the same per-iteration factor as steepest descent with condition number κ. Fig. 4.5 plots the factor (√κ - 1)/(√κ + 1) of (4.38) as a function of κ.

Figure 4.5: The per-iteration convergence factor of CG as a function of the condition number κ.
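A quick empirical check of the bound (4.38); the spectrum below (hence κ = 100) is an assumed toy construction, and the loop is the bare CG recurrence of Algorithm 3 without the periodic residual refresh.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((30, 30)))
lam = np.linspace(1.0, 100.0, 30)          # assumed eigenvalues: kappa = 100
A = Q @ np.diag(lam) @ Q.T                 # SPD matrix with known spectrum
b = rng.standard_normal(30)
x_star = np.linalg.solve(A, b)

kappa = lam.max() / lam.min()
factor = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

x = np.zeros(30)
r = b - A @ x
d = r.copy()
delta = r @ r
e = x - x_star
e0_A = np.sqrt(e @ A @ e)                  # ||e(0)||_A
for i in range(1, 11):
    q = A @ d
    alpha = delta / (d @ q)
    x = x + alpha * d
    r = r - alpha * q                      # recurrence (4.23)
    delta_new = r @ r
    d = r + (delta_new / delta) * d        # (4.27) and (4.28)
    delta = delta_new
    e = x - x_star
    print(i, np.sqrt(e @ A @ e) / e0_A, "<=", 2 * factor**i)
```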

4.6 Complexity Analysis

Thanks to the recurrences (3.5) and (4.23), both steepest descent and CG are dominated by a single matrix-vector multiplication per iteration, so the per-iteration cost of both methods is O(m), where m is the number of nonzero entries of A. Suppose we require the error to satisfy ‖e(i)‖_A ≤ ϵ ‖e(0)‖_A. By (3.27), steepest descent achieves this within

    i ≤ ⌈ (1/2) κ ln(1/ϵ) ⌉    (4.39)

iterations, and by (4.38), CG achieves it within

    i ≤ ⌈ (1/2) √κ ln(2/ϵ) ⌉    (4.40)

iterations. The total time complexity is therefore O(mκ) for steepest descent and O(m√κ) for CG, and both need only O(m) space. The comparison becomes concrete for the linear systems arising from finite-difference discretizations on d-dimensional grids, whose condition number typically grows as κ = O(n^{2/d}): steepest descent then costs O(n²) in two dimensions and O(n^{5/3}) in three, whereas CG costs O(n^{3/2}) and O(n^{4/3}) respectively.

5 Conjugate Gradient Method in Action

Section 4 derived CG for symmetric positive-definite systems. This section collects the practical aspects: how to start and stop CG, how to accelerate it by preconditioning, and how to extend it beyond the symmetric positive-definite case.

5.1 Starting and Stopping Criterion

If a rough estimate of the solution is available, use it as the starting point x(0); otherwise start from x(0) = 0. Since f(x) has a unique minimizer when A is symmetric positive-definite, the starting point affects only how many iterations are needed, not where CG converges. As for stopping, Algorithm 3 terminates when the maximum number of iterations is reached or when the squared residual norm falls below ε² δ(0) (Step 4). This test also protects Step 14: if the residual became zero, the computation of β from (4.27) would divide by zero, but the loop exits before that can happen.

5.2 Preconditioning

Fig. 4.5 and (4.38) show that the smaller the condition number κ, the faster CG converges. Preconditioning accelerates CG by implicitly transforming the problem into an equivalent one with a smaller condition number. Let M be a symmetric positive-definite matrix that approximates A but is easier to invert. Instead of Ax = b, consider

    M^{-1} A x = M^{-1} b.    (5.1)

If κ(M^{-1}A) ≪ κ(A), we can solve (5.1) much faster, and (5.1) has the same solution as Ax = b. The catch is that M^{-1}A is in general neither symmetric nor positive-definite, even when both M and A are, so CG cannot be applied to (5.1) directly. The workaround is as follows. Since M is symmetric positive-definite, there exists a matrix E with E E^T = M (for example, from the Cholesky factorization of M). The matrices M^{-1}A and E^{-1}AE^{-T} have the same eigenvalues λ: if v is an eigenvector of M^{-1}A with eigenvalue λ, then

    (E^{-1}AE^{-T})(E^T v) = E^{-1}A(E^{-T}E^T)v    [E^{-T}E^T = I]
                           = (E^T E^{-T}) E^{-1}Av = E^T (E^{-T}E^{-1}) Av    [E^{-T}E^{-1} = M^{-1}]
                           = E^T M^{-1}Av = λ (E^T v),    (5.2)

so E^T v is an eigenvector of E^{-1}AE^{-T} with the same eigenvalue. Unlike M^{-1}A, the matrix E^{-1}AE^{-T} is symmetric positive-definite, so CG applies to the transformed version of Ax = b:

    E^{-1} A E^{-T} x̂ = E^{-1} b,  where x̂ = E^T x.    (5.3)

Solving (5.3) with CG for x̂ and mapping back x = E^{-T} x̂ yields the Transformed Preconditioned Conjugate Gradient method (TPCG), which is CG applied verbatim to (5.3):

    d̂(0) = r̂(0) = E^{-1}b - E^{-1}AE^{-T} x̂(0),
    α(i) = (r̂(i)^T r̂(i)) / (d̂(i)^T E^{-1}AE^{-T} d̂(i)),
    x̂(i+1) = x̂(i) + α(i) d̂(i),
    r̂(i+1) = r̂(i) - α(i) E^{-1}AE^{-T} d̂(i),
    β(i+1) = (r̂(i+1)^T r̂(i+1)) / (r̂(i)^T r̂(i)),
    d̂(i+1) = r̂(i+1) + β(i+1) d̂(i).    (5.4)

TPCG requires computing E explicitly, and its iterates x̂ live in the transformed space. Both inconveniences can be removed: substituting r̂(i) = E^{-1} r(i), d̂(i) = E^T d(i) and x̂ = E^T x into (5.4), and simplifying with E^{-T} E^{-1} = M^{-1}, the factor E disappears entirely:

    r(0) = b - Ax(0),  d(0) = M^{-1} r(0),
    α(i) = (r(i)^T M^{-1} r(i)) / (d(i)^T A d(i)),
    x(i+1) = x(i) + α(i) d(i),
    r(i+1) = r(i) - α(i) A d(i),
    β(i+1) = (r(i+1)^T M^{-1} r(i+1)) / (r(i)^T M^{-1} r(i)),
    d(i+1) = M^{-1} r(i+1) + β(i+1) d(i).    (5.5)

This is the Untransformed Preconditioned Conjugate Gradient method (UPCG). TPCG and UPCG produce the same iterates up to the transformation x = E^{-T} x̂, but UPCG never forms E or x̂: all it needs is the ability to apply M^{-1} to a vector, that is, to solve linear systems with M. The effectiveness of a preconditioner is a trade-off; intuitively, preconditioning reshapes the quadratic form f(x) to make it rounder. The perfect preconditioner is M = A, for which κ(M^{-1}A) = 1, but then applying M^{-1} means solving Mx = b, the original problem. Practical preconditioners sit in between, approximating A while staying cheap to solve with; a well-known example is incomplete Cholesky preconditioning, which uses a sparse approximate Cholesky factor of A. The same derivation applies to steepest descent, giving the Transformed/Untransformed Preconditioned Steepest Descent methods.
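A minimal sketch of UPCG (5.5). Diagonal (Jacobi) preconditioning, M = diag(A), is used as an illustrative choice of M; the M_inv argument is a hypothetical helper that applies M^{-1} to a vector.

```python
import numpy as np

def preconditioned_cg(A, b, M_inv, x0=None, eps=1e-10, max_iter=1000):
    """UPCG (5.5): CG in which every residual is passed through M^{-1}."""
    x = np.zeros(b.shape[0]) if x0 is None else x0.astype(float)
    r = b - A @ x
    s = M_inv(r)                     # s = M^{-1} r
    d = s.copy()                     # d(0) = M^{-1} r(0)
    delta = r @ s                    # r^T M^{-1} r
    delta0 = delta
    for _ in range(max_iter):
        if delta <= eps**2 * delta0:
            break
        q = A @ d
        alpha = delta / (d @ q)
        x = x + alpha * d
        r = r - alpha * q
        s = M_inv(r)
        delta_new = r @ s
        d = s + (delta_new / delta) * d
        delta = delta_new
    return x

rng = np.random.default_rng(2)
G = rng.standard_normal((40, 40))
A = G @ G.T + np.diag(np.linspace(1.0, 1000.0, 40))  # assumed badly scaled SPD matrix
b = rng.standard_normal(40)
x = preconditioned_cg(A, b, lambda v: v / np.diag(A))
print(np.linalg.norm(A @ x - b))     # small residual
```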

5.3 Extending to General Cases

When A is not symmetric, not positive-definite, or not even square, CG cannot be applied to Ax = b directly. What we can always do is seek the x minimizing the squared residual,

    min_x ‖Ax - b‖²,    (5.6)

whose stationarity condition is the system of normal equations

    A^T A x = A^T b.    (5.7)

The matrix A^T A is symmetric and, when the columns of A are linearly independent, positive-definite, so CG applies to (5.7). If A ∈ R^{m×n} with m > n, the system Ax = b is overconstrained and may admit no solution at all; solving (5.7) then produces the least-squares solution of (5.6). If Ax = b is underconstrained, it has infinitely many solutions, and a construction based on A A^T can be used analogously. When running CG on (5.7), one should never form A^T A explicitly: it is often dense, and its condition number is the square of that of A; instead, each matrix-vector product is computed as A^T (A d) by applying A and then A^T.

CG also extends to the minimization of general (non-quadratic) functions f. In nonlinear CG, the residual r(i) ≜ b - Ax(i) is replaced by the negative gradient, r(i) ≜ -f'(x(i)); the step size α(i) can no longer be computed in closed form and must be found by a line search; and several formulas for β(i) are in use. The two best known are Fletcher-Reeves (FR) and Polak-Ribière (PR):

    β_FR(i+1) = (r(i+1)^T r(i+1)) / (r(i)^T r(i)),
    β_PR(i+1) = (r(i+1)^T (r(i+1) - r(i))) / (r(i)^T r(i)).    (5.8)

FR converges provided the starting point is close enough to the minimizer, while PR is often faster in practice but can fail to converge in rare cases. Convergence of PR can be guaranteed by taking

    β(i+1) = max{ (r(i+1)^T (r(i+1) - r(i))) / (r(i)^T r(i)), 0 },    (5.9)

which amounts to restarting CG, discarding the accumulated search direction, whenever β_PR < 0. Nonlinear CG is far less well understood and well behaved than linear CG: the less f resembles a quadratic form, the more quickly the search directions lose conjugacy, and the guarantees of the linear theory no longer apply.

6 Postscript

These notes are organized from Jonathan Richard Shewchuk's An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. A PDF version of these notes is available at http://hliangzhao.me/math/CG.pdf.