Spectral Subtraction for Eliminating Noise from Speech

A superior DSP technique removes noise from speech signals.

By Doug Hall, KF4KL
JPS Communications, PO Box 97757, Raleigh, NC 27624-7757

If you operate on any of the HF bands, you'll quickly notice that they all have one thing in common: noise. When signals are strong, noise isn't much of a problem, but as signal strength drops, the noise begins to mask the information the signal carries. The proliferation of low-cost DSP chips has encouraged efforts to build noise-reduction equipment based on digital signal-processing techniques.

The concept of removing noise from audio is basically quite simple: take the noisy signal and subtract the undesirable noise to leave a clean representation of the desired information. When the desired signal is speech, however, the problems can become quite complex. One commonly used noise-reduction technique relies on the fact that speech has a limited bandwidth, so the frequencies above and below the occupied region can be filtered out. While this may improve the S/N ratio, studies show that reducing a speech signal's bandwidth to even 1800 Hz can cause a 50% reduction in intelligibility. This is due in part to attenuation of the unvoiced, or fricative, portions of speech, as well as many of the sibilant ("s") sounds. To circumvent these time-domain problems, design engineers are turning to frequency-domain techniques.

When a signal can be described mathematically, several DSP-based options for noise removal become available. They range from simple digital band-pass filtering to more complex methods such as autocorrelation and adaptive filtering. These methods are relatively straightforward to implement and work quite well for signals that can be described this way, such as CW and the various data modes.
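For comparison, the simple digital band-pass filtering mentioned above can be sketched for a CW signal. This example is not from the article: the 8 kHz sample rate, the 700 Hz tone, the 600 to 800 Hz passband and the windowed-sinc (Hamming) design are all assumptions, and `bandpass_fir` and `power_db` are hypothetical helpers. Because white noise is spread across the whole 0 to 4 kHz band while the tone sits inside a 200 Hz passband, the filter should remove most of the noise power and leave the tone nearly untouched.

```python
import numpy as np

def bandpass_fir(f_lo, f_hi, fs, num_taps=401):
    """Windowed-sinc FIR band-pass filter with a passband from f_lo to f_hi (Hz)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2  # tap index centred on zero
    # An ideal band-pass response is the difference of two ideal low-pass responses.
    h = (2 * f_hi / fs) * np.sinc(2 * f_hi * n / fs) - (2 * f_lo / fs) * np.sinc(2 * f_lo * n / fs)
    return h * np.hamming(num_taps)  # taper the truncated response to reduce ripple

def power_db(x):
    """Mean power of x in dB."""
    return 10 * np.log10(np.mean(x ** 2))

# Hypothetical test signal: a 700 Hz CW tone buried in white noise, sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs  # one second of samples
rng = np.random.default_rng(0)
tone = 0.5 * np.sin(2 * np.pi * 700 * t)
noise = rng.normal(0.0, 1.0, t.size)

h = bandpass_fir(600, 800, fs)
# The filter is linear, so filtering tone and noise separately shows its effect on each.
tone_out = np.convolve(tone, h, mode="same")
noise_out = np.convolve(noise, h, mode="same")

snr_in = power_db(tone) - power_db(noise)
snr_out = power_db(tone_out) - power_db(noise_out)
print(f"S/N before: {snr_in:.1f} dB, after: {snr_out:.1f} dB")
```

The same narrowing works poorly for speech, which needs far more than 200 Hz of bandwidth to remain intelligible.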

However, if you can't easily describe the signal mathematically, noise reduction becomes much harder. Such is the case with human speech, an extremely complex waveform whose production we still don't completely understand. When speech is corrupted by noise, as often occurs in radio communication, reducing the noise may improve intelligibility or simply make the signal less stressful and tiring to listen to. Also, some systems that encode speech for transmission at low bit rates require clean speech, and any noise present at the input can cause errors in the encoded output.

Spectral Subtraction

One very effective frequency-domain method for enhancing noisy speech is spectral subtraction, which is based on short-time Fourier analysis. Even though a speech signal is not periodic, it can be handled by breaking it into short segments and processing each segment separately as if it were. This works because certain properties of speech, such as pitch and amplitude, change relatively slowly with time. As long as a segment is not overly long, these assumptions hold.
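The segmentation step can be sketched with NumPy as below. The frame length (256 samples, or 32 ms at an assumed 8 kHz sample rate), the 50% overlap and the Hann window are assumptions for illustration; the article says only that segments must not be overly long. `stft`, `istft` and `periodic_hann` are hypothetical names. With a periodic Hann window at 50% overlap the windows sum to one, so overlap-adding the inverse transforms reconstructs the signal wherever two frames overlap.

```python
import numpy as np

FRAME = 256        # samples per segment (assumed: 32 ms at 8 kHz)
HOP = FRAME // 2   # 50% overlap between successive segments

def periodic_hann(n):
    """Hann window whose 50%-overlapped copies sum exactly to one."""
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def stft(x, frame=FRAME, hop=HOP):
    """Window overlapping segments of x and return one short-time spectrum per row.

    Assumes len(x) >= frame.
    """
    w = periodic_hann(frame)
    n_frames = 1 + (len(x) - frame) // hop
    segments = np.stack([x[i * hop : i * hop + frame] * w for i in range(n_frames)])
    return np.fft.rfft(segments, axis=1)

def istft(spectra, length, frame=FRAME, hop=HOP):
    """Inverse-transform each spectrum and overlap-add the segments back into a signal."""
    segments = np.fft.irfft(spectra, n=frame, axis=1)
    out = np.zeros(length)
    for i, seg in enumerate(segments):
        out[i * hop : i * hop + frame] += seg
    return out

# Round trip on one second of test data at 8 kHz: with no processing in between,
# the overlapped windows sum to one, so the interior of the signal is unchanged.
x = np.random.default_rng(1).normal(size=8000)
Y = stft(x)
x_rec = istft(Y, len(x))
```

Any per-segment processing, such as the subtraction described next, would operate on the rows of `Y` before `istft` is applied.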

Let's consider a digitized speech segment $s(n)$ corrupted by uncorrelated noise $q(n)$. For the noisy speech signal $y(n) = s(n) + q(n)$, you can use the Fourier transform to obtain $Y(e^{j\omega}) = S(e^{j\omega}) + Q(e^{j\omega})$, where $Y$, $S$ and $Q$ represent the short-time spectra of $y$, $s$ and $q$. Now if you subtract the short-time noise spectrum from the noisy speech spectrum, the result is the uncorrupted original speech-segment spectrum $S(e^{j\omega})$. In practice, however, we have to deal with the fact that in radio communication the only signal available is the noisy speech signal, $Y(e^{j\omega})$. How can you isolate the noise? Speech contains pauses between words and syllables, and during these pauses the signal consists of noise alone, $Y(e^{j\omega}) = Q(e^{j\omega})$, so the noise can be measured then. Another fact worth noting is that the spectral characteristics of uncorrelated noise occurring just before a word or syllable are generally the same as they are during that word or syllable. In other words, the noise spectrum changes more slowly than the speech spectrum.
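Measuring the noise during pauses can be sketched as follows. This is an illustration under stated assumptions: the per-frame `is_pause` flags are taken as given (in practice they would come from a speech detector), the estimate is kept as a power spectrum and smoothed over successive pause frames with a hypothetical constant `alpha`, and it is simply held while speech is present, relying on the noise spectrum changing slowly. The function name `track_noise` is hypothetical.

```python
import numpy as np

def track_noise(spectra, is_pause, alpha=0.9):
    """Return a per-frame noise magnitude estimate |Q'(e^jw)|.

    spectra:  complex short-time spectra, one row per frame.
    is_pause: one boolean per frame, True where no speech is present (assumed given).
    alpha:    smoothing constant for successive pause frames (assumed value).
    """
    power = np.abs(spectra) ** 2
    estimate = np.zeros(spectra.shape[1])  # stays zero until the first pause is seen
    seen_pause = False
    out = np.empty_like(power)
    for i, (p, pause) in enumerate(zip(power, is_pause)):
        if pause:
            # During a pause Y = Q, so this frame's power spectrum measures the noise.
            estimate = p if not seen_pause else alpha * estimate + (1 - alpha) * p
            seen_pause = True
        out[i] = estimate  # during speech, hold the most recent pause estimate
    return np.sqrt(out)

# Hypothetical example: two pause frames with noise magnitude 2 in every bin,
# followed by two speech frames with much larger magnitudes.
spectra = np.array([[2.0, 2j], [-2.0, 2.0], [10.0, 9j], [8.0, 12.0]])
is_pause = [True, True, False, False]
noise_mag = track_noise(spectra, is_pause)
```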

Noting these effects, the equation for the total signal's spectrum can be rewritten as $Y(e^{j\omega}) = S(e^{j\omega}) + Q'(e^{j\omega}) + E(e^{j\omega})$, where $Q'(e^{j\omega})$ is the most recent estimate of the noise spectrum measured in the absence of speech, and $E(e^{j\omega})$ is the (hopefully small) error resulting from the use of this estimate. By manipulating this equation, you can develop a formula for the enhanced speech signal: $S'(e^{j\omega}) = Y(e^{j\omega}) - Q'(e^{j\omega})$, where $S'(e^{j\omega}) = S(e^{j\omega}) + E(e^{j\omega})$.
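The subtraction itself can be sketched per frame. The formula above subtracts spectra directly; because the phase of the noise in any given frame is unknown, a common practical form (assumed here, not necessarily the author's implementation) subtracts the noise magnitude estimate from the noisy magnitude, reuses the noisy phase, and clamps bins that would go negative. The name `spectral_subtract` and the `floor` parameter are hypothetical.

```python
import numpy as np

def spectral_subtract(Y, noise_mag, floor=0.0):
    """Estimate S'(e^jw) from a noisy short-time spectrum Y and a noise magnitude |Q'|.

    Magnitudes are subtracted and the noisy phase is reused. Bins that would go
    negative are clamped to floor * |Y| (floor=0 gives half-wave rectification).
    """
    mag = np.abs(Y)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    return clean_mag * np.exp(1j * np.angle(Y))

# Hypothetical bins: the first two keep 3 units of magnitude with their original
# phase; in the third the noise estimate exceeds the signal, so the bin is zeroed.
Y = np.array([5 + 0j, 5j, 1 + 0j])
noise_mag = np.array([2.0, 2.0, 3.0])
S_est = spectral_subtract(Y, noise_mag)
```

In a complete system this would be applied to every frame's spectrum, and the results converted back to the time domain by inverse FFT and overlap-add.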

Now, how do we determine when speech is present in the noisy speech segment $Y(e^{j\omega})$? There are several methods for detecting speech, ranging from simple threshold detectors to elaborate banks of filters and detectors. If we examine the characteristics of human speech, we will note that most speech energy is concentrated in the region below 800 Hz and that syllables generally occur at rates of less than 10 per second. Using this information, we can construct a detector that indicates the presence or absence of speech in a

[Figure: Noisy Speech]
