28nm FPGA TeraFLOPS
White Paper WP-01142-1.0
September 2010
Altera Corporation, 101 Innovation Drive, San Jose, CA 95134, www.altera.com

Copyright 2010 Altera Corporation. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS, and STRATIX are Altera marks; see www.altera.com/common/legal.html.
Figure: multiplier counts by FPGA device (bar chart, vertical axis 0 to 4500). Values in parentheses are the increase over the preceding device.

Device      18x18 Multipliers   Single-Precision Floating-Point Multipliers   Double-Precision Floating-Point Multipliers
EP3SE110    896                 224                                           89
EP4SGX230   1288 (1.4X)         322 (1.4X)                                    128 (1.4X)
EP5SGSD8    4096 (3.2X)         2048 (6.4X)                                   512 (4X)
Figure: fused floating-point datapath. Inputs: Mantissa1, Mantissa2, Exponent1, Exponent2; outputs: Mantissa, Exponent. Labels:
Slightly larger, wider operands
Denormalize
Normalize
True floating-point mantissa (not just 1.0 to 1.99..)
Remove normalization
Do not apply special or error conditions here
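The figure labels above (remove normalization between operators, keep wider operands, normalize once at the end) can be illustrated with a small software analogy. This is only a sketch under stated assumptions: the names `decompose`, `fused_dot`, and `naive_dot` are hypothetical, and the unbounded Python integers stand in for the "slightly larger, wider operands" of the hardware, which are in reality of fixed width. It is not the actual datapath.

```python
import math

def decompose(x):
    """Split a float into an integer mantissa and exponent: x == m * 2**e."""
    frac, exp = math.frexp(x)              # x == frac * 2**exp, 0.5 <= |frac| < 1
    return int(frac * (1 << 53)), exp - 53

def fused_dot(xs, ys):
    """Dot product that stays un-normalized between operators.

    Assumption: unbounded integers replace the fixed-width wide mantissa.
    """
    acc, acc_exp = 0, 0
    for x, y in zip(xs, ys):
        mx, ex = decompose(x)
        my, ey = decompose(y)
        prod, prod_exp = mx * my, ex + ey  # multiply mantissas, add exponents
        # Align the two operands to the smaller exponent (denormalize).
        if prod_exp < acc_exp:
            acc <<= acc_exp - prod_exp
            acc_exp = prod_exp
        else:
            prod <<= prod_exp - acc_exp
        acc += prod                        # no rounding, no normalization here
    return math.ldexp(acc, acc_exp)        # single normalization at the end

def naive_dot(xs, ys):
    """Reference: round and normalize after every multiply and add."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

xs, ys = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
print(naive_dot(xs, ys), fused_dot(xs, ys))
```

In this input the per-operation rounding of `naive_dot` loses the small term, while the deferred normalization keeps it. A fixed-width hardware datapath would sit between these two extremes.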
            8x8      32x32    64x64    128x128
E SD F      57.60    9.40     5.33     2.29
            459.18   44.30    36.94    7.60
E HD F      10.38    2.73     1.65     1.27
            47.10    10.36    7.36     5.33

$z_{n+1} = z_n^2 + c$

Figure: block diagram of the iteration above. Labels: single (c) x single (c) Square; + c; Mag; CmpGE against the constant 4; MaxIter; count (int16); Finished (boolean); exit; point, qpoint, qcount.
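The iteration $z_{n+1} = z_n^2 + c$ shown on this page can be sketched in software as follows. The escape threshold of 4 on the squared magnitude and the limit of 20 iterations are read from the diagram labels and should be treated as assumptions; the function name `mandelbrot_count` is hypothetical.

```python
def mandelbrot_count(c, max_iter=20, escape_sq=4.0):
    """Iterate z <- z*z + c from z = 0 and count steps until escape.

    Returns the number of iterations performed when |z|^2 first reaches
    escape_sq, or max_iter if the point never escapes within the limit.
    """
    z = 0j
    for count in range(1, max_iter + 1):
        z = z * z + c                                  # complex square, then add c
        mag_sq = z.real * z.real + z.imag * z.imag     # squared magnitude
        if mag_sq >= escape_sq:                        # compare (>=) against 4
            return count
    return max_iter

for point in (0j, -1 + 0j, 1 + 0j, 2 + 0j):
    print(point, mandelbrot_count(point))
```

Comparing the squared magnitude against 4 avoids a square root, which matches the multiply, add, and compare structure suggested by the diagram.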
Math.h:
SIN, COS, TAN, ASIN, ACOS, ATAN, EXP, LOG, LOG10
POW(x,y), LDEXP, FLOOR, CEIL, F, SQRT, DIVIDE, 1/SQRT

LU, QR
1 TeraFLOPS

Figure: fused floating-point datapath on the DSP block (27-bit and 18-bit multiplier inputs, 64-bit accumulator). Inputs: Mantissa1, Mantissa2, Exponent1, Exponent2; outputs: Mantissa, Exponent. Labels:
Slightly larger, wider operands
Denormalize
True floating-point mantissa (not just 1.0 to 1.99..)
Normalize
Remove normalization
Do not apply special or error conditions here

18x18, 27x27, 36x36 seamless tradeoff
Greatly increased multiplier density
High fMAX with logic and routing reductions

FPGA resources:
703K LE
282K ALM
574K ALUT
1128K registers
4096 multipliers (18x18)
2048 multipliers (27x27)
55 Mb RAM (20k)
Matrix sizes      Vector size   ALMs     DSP   M9K   M144K   MemBits     fMAX (MHz)   Latency (cycles)   GFLOPS
8x8     8x8       8             3,367    32    26            14,986      420          209                6.30
16x16   16x16     8             3,585    32    27            55,562      421          611                6.32
32x32   32x32     16            6,301    64    76            339,718     419          2,172              13.00
64x64   64x64     32            11,822   128   80    16      2,382,318   388          8,353              24.45

64   ALUT   13.4K   21.6K   16.4K   28.9K
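The GFLOPS figures in the benchmark rows above are numerically consistent with a vector dot-product engine that performs V multiplies and V - 1 adds per clock, that is, (2V - 1) x fMAX. That reading is an inference from the numbers, not a statement taken from the text, and the helper name `vector_gflops` is hypothetical. A minimal check:

```python
def vector_gflops(vector_size, fmax_mhz):
    """Estimated throughput of a length-V multiply-add tree.

    Assumption: V multiplies plus (V - 1) adds complete every clock cycle.
    """
    flops_per_cycle = 2 * vector_size - 1
    return flops_per_cycle * fmax_mhz / 1000.0   # MHz -> GFLOPS

# (vector size, fMAX in MHz, GFLOPS as listed in the table)
ROWS = [(8, 420, 6.30), (8, 421, 6.32), (16, 419, 13.00), (32, 388, 24.45)]

for v, fmax, listed in ROWS:
    print(v, fmax, round(vector_gflops(v, fmax), 2), listed)
```

The computed values agree with the listed ones to within rounding in the second decimal place.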
13.4K ALUTs = 127 = 49 GFLOPS
574K / 13.4K = 43
43 × 49 GFLOPS = 2107 GFLOPS

16.4K registers = 127 = 49 GFLOPS
1128K / 16.4K = 69
69 × 49 GFLOPS = 3381 GFLOPS

64 multipliers (27x27)
2048 / 64 = 32
32 × 49 GFLOPS = 1568 GFLOPS

Multipliers (27x27) = 100%, ALUTs = 75%, registers = 46%
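The estimate above divides each device resource by the per-core cost and takes the tightest limit. The sketch below reproduces that arithmetic using only the figures on this page; the dictionary keys and the function name `core_limits` are hypothetical, and counts are rounded to the nearest integer because that is what the page's own numbers imply.

```python
# Per-core cost and throughput, as given on this page (K = thousands).
CORE = {"alut_k": 13.4, "reg_k": 16.4, "mult27": 64}
CORE_GFLOPS = 49

# Device totals, as given on this page.
DEVICE = {"alut_k": 574, "reg_k": 1128, "mult27": 2048}

def core_limits(core, device):
    """Number of cores each resource could support, rounded to nearest."""
    return {name: round(device[name] / core[name]) for name in device}

limits = core_limits(CORE, DEVICE)
peak = {name: n * CORE_GFLOPS for name, n in limits.items()}

# The scarcest resource sets the achievable core count.
binding = min(limits, key=limits.get)
utilization = {name: limits[binding] / n for name, n in limits.items()}

print(limits)       # cores supported per resource
print(peak)         # GFLOPS if only that resource were the limit
print(binding, {k: round(100 * v) for k, v in utilization.items()})
```

With these inputs the 27x27 multipliers are the binding resource, which is why the multiplier-limited figure is the one that bounds the device. The utilization percentages come out within about one point of those quoted, the difference being rounding.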
Stratix IV EP4SGX530
               Used      Available   Utilization (%)
               406,465   424,960     96
ALUT           308,521   424,960     73
Reg            294,579   424,960     69
M9K            1,280     1,280       100
M144K          64        64          100
DSP (18-bit)   896       1,024       88
fMAX           222.72 MHz
               4.5977 µs (0.3284 µs)

Stratix IV EP4SGX530
               Used      Available   Utilization (%)
               300,000   424,960     70
ALUT           224,000   424,960     53
Reg            210,000   424,960     49
M9K            1,280     1,280       100
M144K          64        64          100
DSP (18-bit)   896       1,024       88
fMAX           300+ MHz
               3.4 µs (0.24 µs)
1. Altera: www.altera.com/products/ip/dsp/arithmetic/maltfloatpoint.html
2. IEEE 754-2008: http://ieeexplore.ieee.org
3. Suleyman S. Demirsoy and Martin Langhammer, "Fused Datapath Floating Point Implementation of Cholesky Decomposition," Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 22-24, 2009: http://portal.acm.org/dl.cfm

Michael Parker, DSP IP, Altera

September 2010, version 1.0