Acoustic Echo Cancellation

Updated on May 29, 2026

Abstract

Rapid advances in technology in recent years have changed the entire dimension of communication. Nowadays, people are increasingly interested in hands-free communication. In this situation, using a loudspeaker and, often, a high-gain microphone in place of the traditional telephone receiver is increasingly justified.

This allows several people to take part in a conversation at the same time, as in a teleconferencing scenario. Another benefit is that users have both hands free and can move around the room at ease.

However, significant acoustic coupling between the loudspeaker and the microphone generates a loud echo that makes conversation difficult. The remedy is to remove the echo with an echo suppression or echo cancellation algorithm.

The echo suppressor, however, has a major disadvantage: it supports only half-duplex communication, which allows only one speaker to talk at a time. This drawback motivated engineers to develop echo cancellers. An important property of echo cancellers is that they sustain full-duplex communication, which permits both speakers to talk at the same time. The three basic components of an echo canceller are an adaptive filter, a doubletalk detector and a non-linear processor (NLP). The adaptive filter generates a close replica of the echo and subtracts it from the microphone signal, which contains both the echo and the near-end user's speech.
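The adaptive-filter stage described above can be sketched as follows. The text does not name an adaptation rule, so this minimal sketch assumes the normalized least-mean-squares (NLMS) algorithm, a common choice for echo cancellers; the function name `nlms_echo_canceller`, its parameters (`num_taps`, `mu`, `eps`) and the toy echo path are hypothetical. The filter models the echo path from the far-end (loudspeaker) signal and subtracts its echo estimate from the microphone signal.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, mu=0.5, eps=1e-6):
    """Hypothetical NLMS echo canceller (illustrative sketch).

    far_end: samples sent to the loudspeaker (the reference signal).
    mic:     samples picked up by the microphone (echo + near-end speech).
    Returns (error, weights): the echo-cancelled signal and the final
    FIR estimate of the echo path.
    """
    weights = [0.0] * num_taps   # adaptive FIR estimate of the echo path
    history = [0.0] * num_taps   # most recent far-end samples, newest first
    error = []
    for x_k, d_k in zip(far_end, mic):
        history = [x_k] + history[:-1]          # shift in the newest sample
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        e_k = d_k - echo_estimate               # subtract the echo replica
        norm = eps + sum(x * x for x in history)
        step = mu * e_k / norm                  # normalized step size
        weights = [w + step * x for w, x in zip(weights, history)]
        error.append(e_k)
    return error, weights


# Usage: white-noise far-end signal and an assumed echo path with
# reflections delayed by 1 and 3 samples (no near-end speech).
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.0, 0.6, 0.0, 0.3]
mic = [sum(true_path[j] * far[k - j] for j in range(len(true_path)) if k - j >= 0)
       for k in range(len(far))]
err, w = nlms_echo_canceller(far, mic)
```

With no near-end speech and no noise, the weights converge to the assumed echo path and the residual `err` shrinks toward zero; in practice, adaptation is paused during doubletalk.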

The doubletalk detector senses doubletalk, which occurs when both users speak simultaneously, and halts adaptation of the filter in order to avoid divergence. To avoid clipping, a noise gate is used as the non-linear processor. The noise gate allows a threshold to be set, and all signals below the threshold are removed. This ensures that the residual echo is removed in the final stage. To date, real-time implementations of AEC have used both VLSI processors and DSP processors. Given the advances in computing, all the essential algorithms in this work are implemented in MATLAB.
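The text does not specify which doubletalk detector is used. As one hedged illustration, the classic Geigel detector declares doubletalk when the microphone sample is large compared with the recent peak of the far-end signal, since echo alone is assumed to be attenuated. The function name `geigel_doubletalk`, the window length and the threshold of 0.5 are assumptions for this sketch, not values from the text.

```python
def geigel_doubletalk(far_end, mic, window=16, threshold=0.5):
    """Hypothetical Geigel-style doubletalk detector (illustrative sketch).

    Declares doubletalk at sample k when
        |mic[k]| > threshold * max(|far_end[k - window + 1 .. k]|).
    Returns one boolean per microphone sample; True means "freeze the
    adaptive filter's weight update at this sample".
    """
    flags = []
    for k, d_k in enumerate(mic):
        start = max(0, k - window + 1)
        recent_peak = max((abs(x) for x in far_end[start:k + 1]), default=0.0)
        flags.append(abs(d_k) > threshold * recent_peak)
    return flags


# Usage: the echo alone is 0.3 times the far-end signal; near-end speech
# (a constant 0.9 offset here, purely for illustration) is added at samples 20..29.
far = [0.8, -0.8] * 20
mic = [0.3 * x for x in far]
for k in range(20, 30):
    mic[k] += 0.9
flags = geigel_doubletalk(far, mic)
```

Only the samples containing near-end speech are flagged, so an adaptive filter would skip its weight update there and keep adapting elsewhere.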

Acoustic Echo in Telephony

The advent of hands-free telephony and teleconferencing has enabled users to communicate without holding the device during the conversation. In such cases, however, several detrimental phenomena can significantly degrade the quality of the speech being communicated, and acoustic echo is perhaps the most troublesome of them. Acoustic echo is produced when the loudspeaker and microphone of the same device become acoustically coupled (fig. 1.2). When a cell phone's hands-free mode is off, the sound wave coming out of the speaker does not have sufficient power to be sensed by the microphone of the same phone.

In this scenario, the power of the speech wave at the speaker output is so low that we cannot hear the far-end talker without holding the phone's speaker near our ear (fig. 1.1). By the time the sound reaches the microphone of the same phone, it is practically imperceptible. Therefore, if the near-end talker speaks into the microphone, only his/her voice is sensed, and the far-end talker's speech played through the near-end speaker has no substantial effect.

Near-end talker in hands-free-off mode (no acoustic coupling)

Now assume that the near-end talker has switched the phone to hands-free mode. The sound wave coming out of the phone's speaker now affects the microphone adversely. In hands-free mode, the power of the sound emitted from the near-end speaker is several times larger than in the former case. After traveling some distance, the sound wave reaches the microphone and is picked up by it.

Therefore, along with the near-end talker's voice, an additional signal caused by the hands-free environment propagates through the channel and is received by the far-end talker. The received signal thus contains two components: one is the speech of the near-end talker, and the other is a delayed version of the far-end talker's own speech, arising from the hands-free environment.

Because of this phenomenon, the far-end talker hears his/her own voice as an echo after a considerable delay. Conditions may become more severe if the near-end talker is sitting in a room with several reflecting objects or poor acoustic treatment. It is known that if the interval between echoes of the near-end talker does not exceed 1/10 of a second, the echo can go unnoticed by the far-end talker. Moreover, since in a mobile communication environment the user can move anywhere in the room, the echoes differ from one location in the room to another. This dynamic behavior prompted engineers to model acoustic echo as time-varying by estimating the echo path.

Acoustic Echo Modeling

Echo is a phenomenon in which a delayed and distorted copy of an original sound is reflected back to the source. With rare exceptions, conversations take place in the presence of echoes: our speech is reflected off the floor, walls and other nearby objects. If a reflected wave arrives very shortly after the direct sound, it is perceived as spectral distortion or reverberation. When the leading edge of the reflected wave arrives a few tens of milliseconds after the original sound, however, it is perceived as a distinct echo. Since the advent of telephony, echoes have been an issue in communication networks.

In telephone networks, echoes can also be generated electrically by impedance mismatches at various points along the transmission medium. The most important parameter for echoes is the end-to-end delay, also known as latency: the time between the generation of a sound at one end of the call and its reception at the other end. The round-trip delay, the time a signal takes to reach the far end and return as an echo, is approximately twice the end-to-end delay. Echoes become annoying when the round-trip delay exceeds 30 ms, and such an echo is typically heard as a hollow sound. An echo must also be loud enough to be heard.
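The delay rule of thumb above reduces to simple arithmetic. This sketch encodes only the two statements in the text (round-trip delay is about twice the end-to-end delay, and echoes become annoying beyond roughly 30 ms round trip); the function names are hypothetical, and loudness is deliberately not modeled.

```python
def round_trip_delay_ms(end_to_end_ms):
    """Approximate round-trip delay as twice the one-way (end-to-end) delay."""
    return 2.0 * end_to_end_ms


def echo_delay_annoying(end_to_end_ms, limit_ms=30.0):
    """True when the approximate round-trip delay exceeds the ~30 ms limit."""
    return round_trip_delay_ms(end_to_end_ms) > limit_ms


print(round_trip_delay_ms(10.0), echo_delay_annoying(10.0))  # 20.0 False
print(round_trip_delay_ms(25.0), echo_delay_annoying(25.0))  # 50.0 True
```

A one-way latency of 10 ms stays under the limit, while 25 ms one way already produces a round trip well above it.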

Echoes weaker than 30 decibels (dB) are unlikely to be noticed. When the round-trip delay exceeds 30 ms and the echo strength exceeds 30 dB, however, echoes become progressively more severe. Not all echoes degrade voice quality, though. For telephone conversations to sound natural, callers must be able to hear themselves speaking. For this reason, a short, instantaneous echo, termed side tone, is deliberately inserted.

The side tone couples the caller's speech from the telephone mouthpiece to the earpiece so that the line sounds connected.[9] Mathematically, if x(t) is the original signal, one component of its echo can be represented as a·x(t - t1), where a is the attenuation factor and t1 is the delay incurred by the sound after reflecting from a surface. When multiple reflection paths exist, the composite signal at the microphone input can be written as

$$c(t) = x(t) + a_1 x(t - t_1) + a_2 x(t - t_2) + \cdots + a_n x(t - t_n) = x(t) + \sum_{i=1}^{n} a_i\, x(t - t_i) \tag{1}$$

where c(t) is the composite signal, x(t) is the original signal, a1, a2, …, an are the attenuations suffered by the sound along the corresponding paths, and t1, t2, …, tn are the corresponding delays. Since digital technology now prevails in almost all cases, a sound wave is represented by sampling the electrical voltage signal at or above the Nyquist rate (twice its highest frequency component) and quantizing it, so the composite signal without AEC becomes

$$c(k) = x(k) + a_1 x(k - k_1) + a_2 x(k - k_2) + \cdots + a_n x(k - k_n) + n(k) = x(k) + \sum_{i=1}^{n} a_i\, x(k - k_i) + n(k) \tag{2}$$

where k1, k2, …, kn are the delays expressed in samples and the additional term n(k) is the noise introduced by digitizing the analog voltage signal.
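Equation (2) can be evaluated directly. In this minimal sketch, each reflection is given as an (attenuation, delay in seconds) pair and converted to a sample delay k_i = round(t_i · f_s); the digitization noise n(k) is modeled as the rounding error from quantizing the result to a fixed step. The function name `composite_signal`, the 8 kHz sampling rate and the reflection values are illustrative assumptions.

```python
def composite_signal(x, reflections, fs, quant_step=None):
    """Evaluate c(k) = x(k) + sum_i a_i * x(k - k_i) + n(k) (equation (2)).

    x:           sampled original signal.
    reflections: list of (a_i, t_i) pairs, attenuation and delay in seconds.
    fs:          sampling rate in Hz; delays become k_i = round(t_i * fs).
    quant_step:  if given, the output is rounded to this step, and the
                 rounding error plays the role of the noise term n(k).
    """
    delays = [(a, round(t * fs)) for a, t in reflections]
    c = []
    for k in range(len(x)):
        value = x[k]
        for a, k_i in delays:
            if k - k_i >= 0:
                value += a * x[k - k_i]
        if quant_step is not None:
            value = round(value / quant_step) * quant_step
        c.append(value)
    return c


fs = 8000                                      # assumed sampling rate (Hz)
x = [1.0] + [0.0] * 39                         # unit impulse as the original signal
reflections = [(0.5, 0.001), (0.25, 0.0025)]   # hypothetical paths: 8 and 20 samples
c = composite_signal(x, reflections, fs)
c_q = composite_signal(x, reflections, fs, quant_step=0.1)
```

For an impulse input, `c` is 1.0 at sample 0, 0.5 at sample 8 and 0.25 at sample 20; the quantized version differs from it by at most half a quantization step at every sample.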

Room Impulse Response and Its Estimation

When a person speaks in front of a microphone in the open, with no objects nearby, practically no echo is observed, because sound traveling in the open is not reflected back. In a closed room (fig. 2.1), however, the microphone receives multiple signals, including the direct one. We may therefore assume that there exists a system whose input is the original speech signal of the near-end talker (NET) and whose output is the signal received by the microphone.

Acoustic modeling of a room to estimate the room impulse response

If a person produces an acoustic impulse in front of the microphone, the microphone does not receive just that direct impulse; it also receives the signals arriving along different paths after reflecting off different surfaces. The signal sensed by the microphone therefore looks like that shown in fig. 2.2: a series of time-delayed, attenuated impulses.
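The impulse experiment above can be sketched in a few lines. Assuming the room is a linear system made of a direct path plus discrete reflections (the model of equation (2) without noise), its impulse response h is a sparse train of delayed impulses, and the microphone signal for any input is the convolution of that input with h. The function names `impulse_response` and `convolve` and the reflection values are hypothetical.

```python
def impulse_response(reflections, length):
    """Build a sparse room impulse response of the given length.

    reflections: list of (attenuation, delay_in_samples) pairs.
    h[0] = 1.0 models the direct path (the x(k) term).
    """
    h = [0.0] * length
    h[0] = 1.0
    for a, k_i in reflections:
        if k_i < length:
            h[k_i] += a
    return h


def convolve(x, h):
    """Full linear convolution y = x * h, of length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, x_n in enumerate(x):
        for m, h_m in enumerate(h):
            y[n + m] += x_n * h_m
    return y


# Usage: feeding an acoustic impulse into the assumed room returns the
# impulse response itself, i.e. a series of delayed, attenuated impulses.
h = impulse_response([(0.6, 3), (0.35, 7), (0.2, 12)], length=16)
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
y = convolve(impulse, h)
```

Estimating the room impulse response amounts to recovering h from the input and the microphone output, which is what an adaptive filter does continuously as the echo path changes.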

Noise Gate as an NLP

A noise gate, which is a type of dynamic-range processor, is used as the NLP. Noise gates belong to the category of expanders. As the name suggests, an expander increases the dynamic range of a signal, so that low-level signals are attenuated significantly while higher-level signals are neither attenuated nor amplified. A noise gate takes this expansion to the extreme, greatly attenuating the input or eliminating it completely and leaving only silence. While expanders can be difficult to use effectively, noise gates are a simple and effective way of reducing the apparent noise level in audio signals.

The noise gate turns down the gain of an audio signal when the signal level falls below a threshold. The threshold must be high enough that the background noise falls below it, but not so high that the wanted audio is cut off unnecessarily. Noise gates are often used to remove noise or hiss that would otherwise be amplified.
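The gating rule above can be sketched as follows. This is a simplified illustration, not the MATLAB implementation used in this work: it assumes a peak envelope follower (instant attack, exponential release) so that the gate does not chop speech at every zero crossing, and the names `noise_gate`, `release` and `floor_gain` are hypothetical.

```python
def noise_gate(signal, threshold, release=0.99, floor_gain=0.0):
    """Hypothetical noise gate driven by a peak envelope follower.

    The envelope jumps to |s| immediately and otherwise decays by
    `release` per sample. While the envelope is below `threshold`, the
    output is scaled by `floor_gain` (0.0 = silence); otherwise the
    sample passes unchanged.
    """
    out = []
    envelope = 0.0
    for s in signal:
        envelope = max(abs(s), envelope * release)
        gain = 1.0 if envelope >= threshold else floor_gain
        out.append(s * gain)
    return out


# Usage: low-level residual echo followed by louder near-end speech.
residual_echo = [0.02, -0.01, 0.015]
speech = [0.5, -0.3, 0.02]
out = noise_gate(residual_echo + speech, threshold=0.1)
```

The residual echo is silenced, while the speech passes unchanged, including its quiet last sample, because the envelope keeps the gate open during the speech.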

References

[1] J. Benesty, T. Gansler, D.R. Morgan, M.M. Sondhi and S.L. Gay, “Advances in Network and Acoustic Echo Cancellation”, Springer-Verlag, 2001.

[2] J. Whitaker and K. B. Benson, "Standard Handbook of Audio Engineering", McGraw-Hill, 2nd Ed., 2001.

[3] https://www.ti.com/lsds/ti/dsp/overview.page