Signal Processing on DSP Platforms
Lecture Outline Arithmetic Operations on C54x DSP Signal Processing on DSP Real-time Signal Processing on DSP * Please Refer to TMS320C54x Reference Set, Vol4: Applications Guide for details. The slides contain Licensed Materials obtained from TI C5000 Teaching ROM. The materials, together with the slides, should be used solely for internal academic and/or teaching needs.
Fixed-point & Floating-point DSP Floating-point offers: + Better Dynamic Range + Easier Algorithm and Software Design - More Complex and Slower Implementation
Fixed-point & Floating-point DSP Fixed-point DSP: 0x40000000 representing 1073741824 Floating-point DSP: 0x40000000 representing 2.0 (IEEE-754 Format)
Arithmetic on C54x DSP TI C54x DSPs are 16-bit fixed-point processors. In the C54x DSP core, numbers are represented and manipulated in integer format: unsigned: signed: 2 s complement The user should take responsibility for providing a meaningful result.
Arithmetic on C54x DSP Fixed-point Arithmetic Signed or unsigned Word length: overflow finite word length effects (NOT covered here) Extended-precision Arithmetic 32-bit/64-bit arithmetic
Sign Extension the SXM bit in ST1 on C54x: SXM =0 disable sign extension mode SXM =1 enable sign extension mode SXM =1 A=00,8000,0000H + 1000H A=FF,8000,1000H SXM =0 A=00,8000,0000H + 1000H A=00,8000,1000H -2147483648 + 4096 = -2147479552 2147483648 + 4096 = 2147487744
The following equation is the basis of many DSP algorithms y Overflow N 1 k = 0 ( n) = a( k) x( n k) Two problems arise when using signed and unsigned integers: Multiplication overflow. Addition overflow.
Overflow Multiplication Overflow 16-bit x 16-bit = 32-bit Example: using 4-bit representation 0 0 1 1 3 x 1 0 0 0 x 8 0 0 0 1 1 0 0 0 24 24 cannot be represented with 4-bits.
Overflow Addition Overflow 32-bit + 32-bit = 33-bit Example: using 4-bit representation 1 0 0 0 8 + 1 0 0 0 + 8 1 0 0 0 0 16 16 cannot be represented with 4-bits.
Overflow Overflow Solution Prevention rule: design your code with minimal possible overflows. Guard bits in accumulator could be useful for intermediate results. Example: 5 x 2-1 x 4 = 10 4 = 6 If accumulator has one extra bit called guard bit the final result will not overflow! The solutions for reducing the overflow problem are: Saturate the result. Use double precision result. Use fractional arithmetic.
Overflow Saturate the Result Unsigned numbers: If A x B 15 result = A x B If A x B > 15 result = 15 x 0 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 3 8 24 Saturated 1 1 1 1 15
Signed numbers: Overflow Saturate the Result If-8 A x B 7 result = A x B If A x B > 7 result = 7 If A x B < -8 result = -8 x 1 1 1 0 Saturated 0 0 1 1 1 0 0 0 1 0 0 0 1 0 0 0 3-8 -24-8
Overflow Double Precision Result For a 4-bit x 4-bit multiplication hold the result in an 8-bit location. Problems: Uses more memory for storing data. If the result is used in another multiplication the data needs to be represented into single precision format (e.g. prod = prod x sum). Results need to be scaled down if it is to be sent to an A to D converter.
Overflow Using Fractional Arithmetic If A and B are fractional then: A x B < min(a, B) i.e. The MPY result is less than the operands hence it will never overflow. But the ADD result may still overflow. The simplest way to avoid overflow is scaling. e.g. y( n) = N 1 i= 0 h x( n i) i If the input is in Q15 format, overflow can be avoided if: N 1 i = 0 h i 1
Fractional Arithmetic Definition: -2 0 2-1 2-2 -1 0.5 0.25 0 0 1 2 -(N (N-1) 1 For 16-bit representation: MAX = 1-2 -15 = 0.999969 MIN = -1-1 x < 1
Fractional Arithmetic Also called Q-format Q represents the Quantity of fractional bits Number following the Q indicates the number of bits that are used for the fraction. Q15 used in 16 bit DSP chip, resolution of the fraction will be 2^ 15 or 30.518e 6 Q15 means scaling by 1/2 15 Q15 means shifting to the right by 15 Example: how to represent 0.2625 in memory: Method 1 (Truncation): INT[0.2625*2 15 ] = INT[8601.6] = 8601 = 0010000110011001 Method 2 (Rounding): INT[0.2625*2 15 +0.5] = INT[8602.1] = 8602 = 0010000110011010
Fractional Arithmetic Q format Multiplication Product of two Q15 numbers is Q30. So we must remember that the 32-bit product has two bits in front of the binary point. Since NxN multiplication yields 2N-1 result Addition MSB sign extension bit Typically, only the most significant 15 bits (plus the sign bit) are stored back into memory, so the write operation requires a left shift by one. Q15 Q15 Extension sign bit Sign bit X 15 bits 15 bits 16-bit memory
Floating-point Arithmetic Single Precision Format s = sign bit e = exponent (8-bit biased : -127) m = mantissa (23-bit normalised fraction) value = (-1) sign * (1.mantissa) * 2 (exponent-127) Example: 0x41400000 representing? 12.0
Floating-point Arithmetic Addition ( ) E E A+ B = M 2 + M 2 = M + M 2 2 a b a b E E E a b b a a It is necessary to denormalize the smallest number (B) Its mantissa is multiplied by 2 Eb-Ea before being added to Ma. Loss of precision due to the rounding of the mantissa
Floating-point Arithmetic Multiplication A B= M M 2 Ea+ E a b b It is necessary to normalize M a M b 1 extra bit would be necessary to prevent overflow of E a +E b. 2m-1 bits are necessary to represent M a M b If M is truncated to m bits, the absolute error increases rapidly.
Floating-point Arithmetic Floating-point Arithmetic Implementation on a Fixed-point DSP Operands must first be converted to fixed-point numbers. For floating-point addition, shift the mantissa so that the exponents of the two operands match. Left-shift the lower-power operand by the difference between the two exponents. Add the exponents and normalize the result. For floating-point multiplication, add their mantissas, multiply the exponents, and normalize the resulting mantissa. Convert fixed-point values back to floating-point numbers.
Arithmetic Summary Unsigned integer: 2 N-1 2 0 x... 2 1 x x Signed integer: -2 N-1 2 0 x... 2 1 x x Signed fractional: -2 0 2 1 2 2 2 -(N-1) x x x x...
Arithmetic Summary Dynamic Range Largest Number (positive) Smallest Number (positive) Smallest Number (negative) Floating Point Single Precision 3.4 x 10 38 5.8 x 10-38 -3.4 x 10 38 2 16 Integer 16-bit 16-1 1-2 16 Fixed Point 32-bit 2 32 32-1 1-2 32 Fractional 16-bit 1-2 -15 2-15 -1 32-bit 1-2 31 2-31 -1
Arithmetic Summary Multiply by 2: Use shift left Divide by 2: Use shift right Division: Use subtraction and shift Sine, Cosine, Log: 1. Use Taylor series expansion 2. Use look up tables To convert a fractional number to hex: 1. Num x 2 15 2. Then convert to hex e.g: convert 0.5 to hex 1. 0.5 x 2 15 = 16384 2. (16384) dec = (0x4000) hex
Arithmetic Summary The final result of a calculation usually uses more than 16 bits (size of memory words). ACCUs use 32, 40, 56... Bits If we want to save the result in a single memory word, the question is: Which pack of N bits must be saved from accumulator? Possibility of overflow and underflow Overflow during accumulation or during saving. Guard bits ACCU High ACCU Low 39 32 31 16 15 0 16 bits to save
Signal Processing FIR Filter x(n) h 0 h 1 h 2 h N-2 h N-1 + + + + y(n) y( n) = N 1 i= 0 h x( n i i) x[n] h k y[n] N represents the filter input, represents the filter coefficients, represents the filter output, is the number of filter coefficients (order of the filter).
Signal Processing FIR Filter Design To fully design and implement a filter five steps are required: (1) Filter specification. (2) Coefficient calculation. (3) Structure selection. (4) Simulation (optional). (5) Implementation.
Signal Processing H(f) 1 pass-band stop-band fc : cut-off frequency (a) fs/2 f(norm) Filter Specification - Step 1 pass-band H(f) (db) Δ p 0-3 transition band stop-band pass-band ripple H(f) (linear) 1 + δ p 1 1 δ p Δ s stop-band ripple δ s fsb : stop-band frequency fc : cut-off frequency fpb : pass-band frequency (b) fs/2 f(norm)
Signal Processing FIR Filter Design: Coefficient Calculation There are several different methods available, the most popular are: Window method. Frequency sampling. Parks-McClellan. We can use MATLAB or other commercial tools to calculate the filter coefficients.
Signal Processing FIR Filter Design: Realization Structure Direct form structure for an FIR filter: x(n) z -1 z -1 z -1 b 1 b 0 b 2 b N-1 + + + y(n) y ( n) = b x( n) + b x( n ) +... + b x( n N 1) 0 1 1 N 1 +
Cascade structures: Signal Processing FIR Filter Design: Realization Structure x(n) b 0 + + + y(n) z -1 b 1,1 + z -1 b 2,1 + z -1 b M,1 + z -1 z -1 z -1 b 1,2 b 2,2 b M,2
FIR Filter Implementation (1) y( n) = x( n) h(0) + x( n 1) h(1) + + x[ n ( N 1)] h( N 1) We must compute a different sum of products for each n. For each time instant n, we always have the same set of [ h(0) h(n-1) ]. For each time instant n, we need only store the most recent N data samples, x[n],..., x[n (N-1)], sometimes called a sliding window. We can use different method to implement such a sliding window.
FIR Filter Implementation (2) Method I y( n) = x( n) h(0) + x( n 1) h(1) + + x[ n ( N 1)] h( N 1) pcoff h(n-1) h(n-2) x(n-n+1) x(n-n+2) oldest x(n-n+2) x(n-n+3) pdata h(1) h(0) x(n-1) x(n) newest x(n) x(n+1) Each time a MAC operation is done, the data of x(i) is copied to the higher address, replacing x(i-1). We can take advantage of MACD instruction to implement the above operations. The data buffer is of ripple style.
Method II y( n) = x( n) h(0) + x( n 1) h(1) + + x[ n ( N 1)] h( N 1) h(n-1) h(n-2) FIR Filter Implementation (3) pdata x() x(n-n+1) x(n) pdata x(n) x(n-n+1) x(n+1) Circular Circular h(1) h(0) pcoff x(n-2) x(n-1) x(n-2) x(n-1) After each iteration, only the oldest data is replaced by the incoming data. The pdata is progressed by 1 after each iteration. The data buffer is of circular style.
Real-time Signal Processing Example:audio signal filtering Tasks 1. sampling incoming audio signals 2. filtering 3. output processed audio signals
Real-time Signal Processing Polling Mode main() { while ( true) { sample ready? get one sample a ; filtering a ; output a ; } } Polling mode is generally not very efficient!
Real-time Signal Processing Polling Mode
Real-time Signal Processing Interrupt driven
Real-time Signal Processing Interrupt driven Suppose each time a sample is ready, the A/D will issue an interrupt. In the corresponding ISR, we read out the new sample from A/D, and write a result to D/A for output. The sample rate is the same as the interrupt frequency. Two issues are to be determined about the processing: How? sample-by-sample v.s. frame-by-frame Where? in the ISR v.s. in the background loop
Real-time Signal Processing sample-by-sample get one sample; filtering ; output the point; frame-by-frame (buffered) get one sample into A ; output one point from B ; if ( buffer A full ) { filtering A into B ; } It is generally more efficient to use frames. But it will cause additional latency.
Real-time Signal Processing Interrupt driven ISR Loaded main() { } return; interrupt() { get one sample; filtering ; output the point; } return_enable ; } global buffer A, B ; main() { return; } interrupt() { get one sample into A ; output one point from B ; if ( buffer A full ) { filtering A into B ; } return_enable ; } ISR heavily Loaded!
Real-time Signal Processing Interrupt driven ISR Loaded Possible Problems: interrupt 1 2 N filtering filtering 1/Fs Samples may be missing!
Real-time Signal Processing Interrupt driven Main Looping global buffer A, B ; main() { while ( true) { if ( buffer A full ) { copy A to A1 ; copy B1 to B ; filtering A1 into B1 ; } } } interrupt() { get one sample into A ; output one point from B ; return_enable ; }
Real-time Signal Processing Summary Use Interrupts as the Backbone of System. Put critical operations only in the ISR. Use background loops for time-consuming processing tasks. As the processing tasks increase, it is better to turn to multi-threaded style with well-defined thread scheduling, or, to say, a RTOS.
References 1. 数字信号处理器实验讲义 2. TI, TMS320C54x Reference Set, Vol1 5, 1999 3. 张雄伟, DSP 芯片的原理与开发应用, 电子工业出版社,2000 4. www.ti.com