Adaptive Playout Scheduling Using Time-scale Modification
Yi Liang, Nikolaus Färber, Bernd Girod, Balaji Prabhakar

Outline
- QoS concerns and tradeoffs
- Jitter adaptation as a playout scheduling scheme
- Packet scaling using an improved time-scale modification technique
- Loss concealment compatible with adaptive playout
- Performance comparison and audio demos

QoS Concerns at the Receiver
Over a best-effort network:
- Delay jitter: obstructs proper reconstruction of voice packets at the receiver
- Delay: impairs interactivity of conversations
- Packet loss: impairs speech quality

Playout Algorithm (1) - Fixed Deadline
- Use a buffer to absorb delay variations and play out voice packets at a fixed deadline (jitter absorption)
- Voice packets received after the deadline are discarded (late loss)
[Figure: packets 1 to 8 at the sender (packetization), at the receiver, and at playout, with the buffering delay and a late loss marked]

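The fixed-deadline rule can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the 20 ms packet spacing, the delay values, and the 70 ms playout delay are all made up for the example.

```python
def fixed_deadline_playout(send_times_ms, network_delays_ms, playout_delay_ms):
    """Schedule every packet at send time + a fixed playout delay.

    Returns (playout_times, late_flags). A packet whose arrival time is
    after its playout deadline is flagged as late (it would be discarded).
    """
    playout_times = []
    late_flags = []
    for sent, delay in zip(send_times_ms, network_delays_ms):
        deadline = sent + playout_delay_ms   # same offset for every packet
        arrival = sent + delay
        playout_times.append(deadline)
        late_flags.append(arrival > deadline)
    return playout_times, late_flags


# Hypothetical trace: eight packets sent every 20 ms, invented network delays.
send_times = [i * 20 for i in range(8)]
delays = [40, 45, 42, 90, 60, 41, 43, 44]
playout, late = fixed_deadline_playout(send_times, delays, playout_delay_ms=70)
late_loss_rate = sum(late) / len(late)
```

Only the packet delayed by 90 ms misses the 70 ms deadline in this trace; every other packet waits in the buffer until its deadline.
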
Buffer Delay vs. Late Loss
[Figure: fixed playout deadline over a delay trace, with buffering delay and late loss marked; labels: Delay, Playout, Jitter, Packet Loss]
Fixed playout deadline and jitter absorption:
- The playout rate is constant
- The tradeoff is between buffering delay and late loss

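To make the tradeoff concrete, the sketch below sweeps the fixed playout delay over an invented delay trace and reports the mean buffering delay of on-time packets together with the late loss rate. The helper name `tradeoff` and all numbers are assumptions for illustration only.

```python
def tradeoff(network_delays_ms, playout_delay_ms):
    """Return (mean buffering delay of on-time packets, late loss rate)."""
    on_time = [d for d in network_delays_ms if d <= playout_delay_ms]
    late_loss = 1 - len(on_time) / len(network_delays_ms)
    if on_time:
        # Buffering delay = time a packet waits between arrival and playout.
        mean_buffering = sum(playout_delay_ms - d for d in on_time) / len(on_time)
    else:
        mean_buffering = 0.0
    return mean_buffering, late_loss


# Hypothetical network delays in ms.
delays = [40, 45, 42, 90, 60, 41, 43, 44]
curve = {p: tradeoff(delays, p) for p in (50, 70, 100)}
```

A later deadline lowers the late loss rate but makes every on-time packet wait longer, which is the tradeoff the slide describes.
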
Playout Algorithm (2) - Adaptive Playout
- Monitor delay variation and adapt playout accordingly (jitter adaptation)
- Slow down playout when delay increases to avoid loss; speed up playout when delay decreases to reduce delay
[Figure: packets 1 to 8 at the sender (packetization), at the receiver, and at playout, with the buffering delay and the slow-down and speed-up intervals marked]

Adaptive Playout and Jitter Adaptation
[Figure: adaptive playout schedule over a delay trace, with buffering delay marked; labels: Delay, Playout, Jitter, Packet Loss]
Adaptive playout and jitter adaptation:
- Voice packets are scaled in a highly dynamic way
- The playout schedule is set according to the recorded past delays
- Improved tradeoff between buffering delay and late loss
- The playout rate is not constant

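The slides do not say how the schedule is derived from the recorded delays. One plausible sketch, assuming a sliding window of past delays and a target late loss rate, picks the smallest recorded delay that at most that fraction of the window exceeds. The class name, the window size, the default delay, and the target are hypothetical.

```python
from collections import deque


class DelayHistoryScheduler:
    """Choose a playout delay from a sliding window of past network delays."""

    def __init__(self, window=50, target_loss=0.05):
        self.history = deque(maxlen=window)  # oldest delays fall out automatically
        self.target_loss = target_loss

    def record(self, delay_ms):
        self.history.append(delay_ms)

    def playout_delay(self, default_ms=60.0):
        """Smallest recorded delay exceeded by at most target_loss of the window."""
        if not self.history:
            return default_ms
        ordered = sorted(self.history)
        allowed_late = int(self.target_loss * len(ordered))
        return ordered[len(ordered) - 1 - allowed_late]


# Hypothetical usage: one delay spike, then the network calms down.
scheduler = DelayHistoryScheduler(window=5, target_loss=0.2)
for d in [40, 42, 41, 43, 90]:
    scheduler.record(d)
after_spike = scheduler.playout_delay()
for d in [40, 40, 40, 40, 40]:
    scheduler.record(d)
after_calm = scheduler.playout_delay()
```

Because the window forgets old delays, the chosen playout delay follows the network: it grows when delays rise and shrinks again when they fall, and the difference is absorbed by scaling packets.
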
Packet Scaling (1)
[Figure: input packet (segments 0 1 2 3 4, with the pitch period and template segment marked) and output packet (segments 0/1 1/2 2/3 3 4)]
- Based on WSOLA [Verhelst 93]
- Improved to scale short individual voice packets
- In-and-out black-box operation, no algorithmic delay, smooth transitions
- Preserves pitch

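The following is a heavily simplified sketch of pitch-preserving packet scaling, not WSOLA itself and not the authors' improved variant. It assumes a packet is a list of samples, estimates the pitch period by normalized autocorrelation, and then inserts or removes exactly one pitch period with a linear cross-fade. All function names and the test signal are hypothetical.

```python
import math


def estimate_pitch_period(packet, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -2.0
    for lag in range(min_lag, max_lag + 1):
        a = packet[:len(packet) - lag]
        b = packet[lag:]
        energy = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        score = sum(u * v for u, v in zip(a, b)) / energy if energy > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def expand_by_one_period(packet, period):
    """Insert one pitch period; output has len(packet) + period samples."""
    if len(packet) < 2 * period:
        raise ValueError("packet must hold at least two pitch periods")
    blended = []
    for i in range(period):
        w = i / period  # fades from the second period back into the first
        blended.append((1 - w) * packet[period + i] + w * packet[i])
    return packet[:period] + blended + packet[period:]


def compress_by_one_period(packet, period):
    """Remove one pitch period; output has len(packet) - period samples."""
    if len(packet) < 2 * period:
        raise ValueError("packet must hold at least two pitch periods")
    blended = []
    for i in range(period):
        w = i / period  # fades from the first period into the second
        blended.append((1 - w) * packet[i] + w * packet[period + i])
    return blended + packet[2 * period:]


def tone(n, period=40):
    """Synthetic periodic test signal with two harmonics (illustration only)."""
    return math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)


packet = [tone(n) for n in range(160)]          # e.g. 20 ms at 8 kHz
period = estimate_pitch_period(packet, 20, 60)  # search range is an assumption
longer = expand_by_one_period(packet, period)
shorter = compress_by_one_period(packet, period)
```

Because whole pitch periods are added or dropped and the joins are cross-faded, the packet changes length without a change in pitch. Each packet is processed on its own, so no look-ahead into later packets is needed.
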
Packet Scaling (2)
- STD network delay = 20.9 ms
- Max. jitter = 112.0 ms
- STD total delay = 10.5 ms
- Packets scaled: 18.4%
- Scaling ratio: 50% to 200%
- DMOS: 4.5
DMOS scale (degradation is):
5 - inaudible
4 - audible but not annoying
3 - slightly annoying
2 - annoying
1 - very annoying

Loss Concealment
[Figure: packets i-2, i-1, i (lost), i+1, i+2, each of length L; concealed output built from i-2, i-1, i+1, i+2 with alignment found by correlation; labels: 2 L, 1.3 L]
- Based on [Stenger 96]
- Uses information from both sides of the loss; delay minimized to one packet
- Integrates nicely into the system when adaptive playout is used
Audio demo, 20% random packet loss: original, with loss, concealed

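The alignment step can be sketched as a search for the shift at which the audio before the gap best matches the start of the audio after it, followed by a cross-fade at that point. This covers only alignment and merging, not the stretching of the neighboring packets, and every name and parameter below is an assumption rather than the method of [Stenger 96].

```python
import math


def normalized_correlation(a, b):
    """Correlation of two equal-length sample lists, scaled to [-1, 1]."""
    energy = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
    return sum(u * v for u, v in zip(a, b)) / energy if energy > 0 else 0.0


def find_alignment(before, after, overlap, max_shift):
    """Shift into `before` whose next `overlap` samples best match the start of `after`."""
    target = after[:overlap]
    best_shift, best_score = 0, -2.0
    for shift in range(max_shift + 1):
        segment = before[shift:shift + overlap]
        if len(segment) < overlap:
            break
        score = normalized_correlation(segment, target)
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift


def splice(before, after, shift, overlap):
    """Join the two signals with a linear cross-fade starting at `shift`."""
    blended = []
    for i in range(overlap):
        w = i / overlap
        blended.append((1 - w) * before[shift + i] + w * after[i])
    return before[:shift] + blended + after[overlap:]


def tone(n, period=40):
    """Synthetic periodic test signal with two harmonics (illustration only)."""
    return math.sin(2 * math.pi * n / period) + 0.5 * math.sin(4 * math.pi * n / period)


before = [tone(n) for n in range(160)]      # stands in for audio preceding the lost packet
after = [tone(n + 13) for n in range(160)]  # next packet, 13 samples into a pitch period
shift = find_alignment(before, after, overlap=40, max_shift=39)
concealed = splice(before, after, shift, overlap=40)
```

Aligning by correlation before merging keeps the pitch periods in phase across the join, which avoids the audible discontinuity a blind splice would produce.
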
Comparison of Different Algorithms
1. Fixed playout deadline throughout the whole session.
2. Delay is estimated dynamically, but playout is adjusted only during silence periods [Ramjee 94, Moon 98].
3. Playout is estimated and adjusted dynamically, and packets are scaled within talkspurts using time-scale modification.

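For orientation, Algorithm 2 can be sketched as an autoregressive estimator in the style usually attributed to [Ramjee 94]: a running mean and a running variation of the network delay are updated on every packet, but the playout delay is changed only at the first packet of a talkspurt. The smoothing constant 0.998002 and the safety factor 4 are the values commonly quoted for this estimator family and should be treated as assumptions here, as should the class and parameter names.

```python
class TalkspurtPlayoutEstimator:
    """Update delay statistics per packet; change playout delay per talkspurt."""

    def __init__(self, alpha=0.998002, safety=4.0):
        self.alpha = alpha        # smoothing constant (assumed value)
        self.safety = safety      # multiplier on the variation estimate (assumed)
        self.mean = None          # running estimate of the network delay
        self.variation = 0.0      # running estimate of the delay variation
        self.playout_delay = None

    def update(self, delay_ms, first_in_talkspurt):
        """Feed one packet's network delay; return the playout delay in force."""
        if self.mean is None:
            self.mean = delay_ms
        else:
            self.mean = self.alpha * self.mean + (1 - self.alpha) * delay_ms
            self.variation = (self.alpha * self.variation
                              + (1 - self.alpha) * abs(self.mean - delay_ms))
        # The playout delay is frozen inside a talkspurt and only moves in silence.
        if first_in_talkspurt or self.playout_delay is None:
            self.playout_delay = self.mean + self.safety * self.variation
        return self.playout_delay


# Hypothetical usage with alpha = 0.5 so the arithmetic is easy to follow.
estimator = TalkspurtPlayoutEstimator(alpha=0.5)
first = estimator.update(40, first_in_talkspurt=True)
second = estimator.update(60, first_in_talkspurt=False)
third = estimator.update(50, first_in_talkspurt=True)
```

The second packet's higher delay changes the estimates but not the playout delay, because the talkspurt is still in progress. Algorithm 3 removes that restriction by scaling packets inside the talkspurt.
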
Performance Comparison
Traces measured between a host at Stanford and hosts in:
1) Chicago
2) Germany
3) MIT
4) China

Overall Performance
          Buff. delay   Loss rate   MOS
Alg. 2    55 ms         10%         2.6
Alg. 3                  4%          3.7
Original                            4.4

Quality score: Excellent 5, Good 4, Fair 3, Poor 2, Bad 1

Conclusions
- A small playout rate variation can be traded for lower delay and a lower loss rate
- Playout scaling depends on audio scaling; scaling of individual packets is almost inaudible
- The time-scale modification technique was improved to work on individual packets with minimum delay
- WSOLA-based loss concealment integrates nicely into the system
- Adaptive playout and jitter adaptation significantly reduce buffering delay and late loss, which improves overall performance