Tmng-Power Optmzaton for Mxed-Radx Lng Adders by Integer Lnear Programmng Y Zhu Janhua Lu Haun Zhu and Chung-Kuan Cheng Department of Computer Scence & Engneerng Unversty of Calforna San Dego
Outlne Prefx Adder Problem Bacground & Prevous Wor Extensons Hgh-radx Lng Our Wor Area/Tmng/Power Models ILP Formulaton Expermental Results Future Wor 2
Bnary Addton Input two n-bt bnary numbers 0 and b n... bb one bt carry-n c 0 0 Output n-bt sum s... s s n 0 and one bt carry out c n Prefx Addton Carry generaton & propagaton Generate Propagate c + s g c g p + p ( a a b a b c b ) a n... a a 3
4 Prefx Addton Formulaton b a p a b g Preprocessng Postprocessng Prefx Computaton c p s c P G c + + 0 0 0 P P P G P G G +
Prefx Adder Prefx Structure Graph 4 3 2 a b gp gp generator Preprocessng GP GP - Prefx Computaton GP GP cell G 0 s p 4 3 2 Postprocessng sum generator 5
Hgh-Radx Adders Each cell has more than two fan-n s Pros less logc levels 6 levels (radx-2) vs. 3 levels (radx-4) for 64-bt addton Cons larger delay and power n each cell 6
Radx-3 Slansy & Kogge- Stone Adder Davd Harrs Logcal Effort of Hgher Valency Adders 7
8 Lng Adders b a p a b g Preprocessng Postprocessng Prefx Computaton c p s c P G c + 0 0 0 P P P G P G G + Prefx Lng ( ) * 0 * 0 + p d G d G s b a p a b g * * + + p p P g g G * 2 * * * G P G G + * 2 * * P P P * 0 G p c
An 8-bt Lng Adder 8 7 6 5 4 3 2 H 8 H 7 H 6 H 5 H 4 H 3 H 2 H 9
An 8-bt Lng Adder 8 7 6 5 4 3 2 H 8 H 7 H 6 H 5 H 4 H 3 H 2 H 0
Area Model Dstngush physcal placement from logcal structure but eep the bt-slce structure. Logcal level Bt poston Bt poston 8 7 6 5 4 3 2 8 7 6 5 4 3 2 Physcal level Logcal vew Physcal vew Compact placement
Tmng Model Cell delay calculaton d f + Effort Delay p Intrnsc Delay f g h Logcal Effort Electrcal Effort Cout/Cn (fanouts+wrelength) / sze Intrnsc propertes of the cell 2
Power Model Total power consumpton Dynamc power + Statc Power Statc power leaage current of devce P sta λ*#cells Dynamc power current swtchng capactance P dyn ρ C load ρ s the swtchng probablty ρ ( s the logcal level*) Ptotal Pdyn+ Psta Cload + λ # cells * Vanchayobon S etc Power-speed Trade-off n Parallel Prefx Crcuts 3
ILP Formulaton Overvew Structure varables GP cells Connectons (wres) Physcal postons Capactance varables Gate cap Vertcal wre cap Horzontal wre cap Power Obectve ILP ILOG CPLEX Tmng varables Input arrval tme Output arrval tme Optmal Soluton 4
Integer Lnear Programmng (ILP) ILP Lnear Programmng wth nteger varables. Dffcultes and technques Constrants are not lnear Lnearze usng pseudo lnear constrants Search Space too large Reduce search space Search s slow Add redundant constrants to speedup 5
6 Lnearzaton ) ( ) ( R h L h y y ) ( ) ( R l L l y y ) 2 2 ( 2) 2 ( R l L l y y ) ( ) ( R L y y f ) ( ) ( + (l) wr wl(h) y y L l R h f ) ( ) ( ) ( + wl(h) l wr y l R h ) ( ) ( ) ( + wl(h)) ( n l wr y l R h ) ( ) ( ) ( + + wl(h)) ( n l wr y l R h y L ) ( Lnearze Pseudo Lnear
Search Space Reducton Lng s adder separate odd and even bts Double the bt-wdth we are able to search 7
Redundant Constrants Cell () s nown to have logc level before wre connecton Assume load s MnLoad (fanout wth mnmum wre length) P ( ) MnLoad +λ Cell () has a path of length - Assume each cell along the path has MnLoad T ( ) ( PD+ LE MnLoad) 8
Experments 6-bt Unform Tmng 9
Mn-Power Radx-2 Adder 6 5 4 3 2 0 9 8 7 6 5 4 3 2 6 5 4 3 2 0 9 8 7 6 5 4 3 2 20
Mn-Power Radx-2&4 Adder 6 5 4 3 2 0 9 8 7 6 5 4 3 2 6 5 4 3 2 0 9 8 7 6 5 4 3 2 Radx-2 Cell Radx-4 Cell 2
Mn-Power Mxed-Radx Adder 6 5 4 3 2 0 9 8 7 6 5 4 3 2 6 5 4 3 2 0 9 8 7 6 5 4 3 2 Radx-2 Cell Radx-3 Cell Radx-4 Cell 22
Experments 6-bt Non-unform Tme ILP s able to handle non-unform tmngs Lng adders are most superor n ncreasng arrval tme faster carres 23
Experments 64-bt Herarchcal Structure Handle hgh bt-wdth applcatons 6x4 and 8x8 a 64 b 64 a 49 b 49 a 48 b 48 a 33 b 33 a 32 b 32 a 7 b 7 a 6 b 6............ a b Level ILP Bloc ILP Bloc ILP Bloc ILP Bloc... GP* 4949... GP* 3333... GP* 77... GP* GP* 6450 GP* 4834 GP* 328 GP* 62 Level 2 ILP Bloc..................... H 64 H 49 H 48 H 33 H 32 H 7 H 6 H 24
Experments 64-bt Herarchcal Structure TSL a 64-bt hgh-radx three-stage Lng adder V. Olobdza and B. Zeydel Energy-Delay Characterstcs of CMOS Adders n Hgh-Performance Energy-Effcent Mcroprocessor Desgn pp. 47-70 2006 25
ASIC Implementaton - Results 64-bt herarchcal desgn by ILP vs. fast carry loo-ahead adder by Synopsys Module Compler TSMC 90nm standard cell lbrary was used Method Area (nm 2 ) Delay (ns) Power (mw) MC 352.0644 5.47 ILP 3833 0.9425 2.54 ILP 3636 0.9607 2.353 ILP 34.278.973 26
Future Wor ILP formulaton mprovement Expected to handle 32 or 64 bt applcatons wthout herarchcal scheme Optmzng other computer arthmetc modules Comparator Multpler 27
Q & A Than You! 28
Prevous Wors Classcal prefx 8 7 adders 6 5 4 3 2 8 7 6 5 4 3 2 8 7 6 5 4 3 2 8 7 6 5 4 3 2 Brent-Kung Logcal levels 2log 2 n Max fanouts 2 Wre tracs 8 7 6 5 4 3 Slansy Logcal levels log 2 n Max fanouts n/2 Wre tracs 2 8 7 6 5 4 3 Kogge-Stone Logcal levels log 2 n Max fanouts 2 Wre tracs n/2 2 29
Experments 6-bt Unform Tmng (CPU Tme) CPU Tme for Optmal Lng Adders 00000 0000 CPU (sec) 000 00 0 Radx-2 Lng Radx-2&4 Lng Mxed-Radx Lng 5 0 5 20 25 Delay Smaller Delay Const -> Less accurate of LP 30
ASIC Implementaton ILP Formulaton ILOG CPLEX Optmal Soluton Synthess Program Power Prme Power TSMC 90nm Standard Cell Lbrary Verlog Fle Physcal Compler Placement Astro Routng Tmng Analyss Verlog Fle Module Compler Area Delay 3