noisy signal to update the noise estimate, Q'.

Today's DSP technology allows us to design such a system and make it cost-effective. Fig 1 shows a diagram of a spectral-subtraction system that enhances noisy speech in real time. I implemented it in a commercial product whose key components include 40-MHz TMS320C26 DSP chips, 8 kbytes of external ROM and 16 kbytes of external SRAM. The algorithm, however, is also suitable for off-line (post-processing) operation on a personal computer.

The unit samples a noisy signal and converts it to the frequency domain using an FFT (fast Fourier transform). To preserve the phase of the input signal and reduce clicks and pops on the output, it uses a special windowing method based on overlap-added frames. The overlapping process provides a frame size somewhat less than the number of points in the FFT.
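The overlap-add idea can be illustrated with a short sketch. The article does not state the window shape or the amount of overlap, so the periodic Hann window and 50% overlap used here are assumptions, as are the function names. With that choice the overlapping windows sum to one, so frames that pass through unmodified rebuild the original samples without discontinuities at the frame edges.

```python
import math


def hann(n):
    """Periodic Hann window of length n (assumed window shape)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(signal, frame_len):
    """Cut the signal into Hann-windowed frames with 50% overlap (assumed)."""
    hop = frame_len // 2
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def overlap_add(frames, frame_len):
    """Sum the frames back together at their original offsets."""
    hop = frame_len // 2
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for i, value in enumerate(frame):
            out[k * hop + i] += value
    return out


signal = [math.sin(0.1 * i) for i in range(64)]
frames = split_frames(signal, 16)
rebuilt = overlap_add(frames, 16)
```

In a real system each frame would be transformed, modified and inverse-transformed between the split and the overlap-add. Only the first and last half-frame of the rebuilt signal differ from the input, because those samples are covered by a single window.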

The system also provides input samples to a speech detector that controls the noise reference update. When the detector determines that speech is not present in the input, it takes a "snapshot" of the current frame and stores it as the most recent noise estimate. The system then subtracts this estimate from the current frame. The noise estimate for a given frequency may well be larger than the total signal a moment later, in which case the subtraction yields a negative value. Because the power level at a given frequency can't be less than zero, the algorithm sets any such value to zero. The final step is to transform the resulting frame back into the time domain using the inverse FFT, then convert the samples into an analog signal with a digital-to-analog (D/A) converter.
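The subtraction step might look something like the sketch below. It assumes the frame has already been transformed into a list of complex FFT bins and that the noise estimate is kept as one power value per bin; the function names are hypothetical and this is not the code used in the product. Each bin keeps the phase of the noisy input, and a negative result is set to zero as described above.

```python
import cmath
import math


def snapshot_noise(frame_bins):
    """Store the per-bin power of a speech-free frame as the noise estimate."""
    return [abs(x) ** 2 for x in frame_bins]


def subtract_noise(frame_bins, noise_power):
    """Subtract the noise power from each bin, keeping the input phase."""
    cleaned = []
    for x, n_pow in zip(frame_bins, noise_power):
        residual = abs(x) ** 2 - n_pow
        if residual < 0.0:
            # Power cannot be negative, so clamp to zero.
            residual = 0.0
        cleaned.append(cmath.rect(math.sqrt(residual), cmath.phase(x)))
    return cleaned


# Three example bins and an assumed noise estimate for each.
bins = [3 + 4j, 1 + 0j, 0 + 2j]
noise = [9.0, 4.0, 0.0]
result = subtract_noise(bins, noise)
```

In this example the second bin holds less power than the noise estimate, so it comes out as zero rather than as a negative value.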

The spectral subtraction method works well whenever the original speech signal has a signal-to-noise ratio of around 0 dB or greater. The spectrum plots in Fig 2 show the results of processing a noisy speech signal with spectral subtraction enhancement. The original signal came from a Kenwood TS-850S transceiver and consisted of an SSB speech transmission corrupted with normal atmospheric noise. The processed signal clearly shows the effect of noise reduction.

While spectral subtraction works quite well to reduce white or atmospheric noise from radio transmissions, it really stands out as a superior reducer of impulse noise, such as automotive ignition noise, power line noise, computer noise, static, static crashes, etc. This method of noise reduction maintains a full audio bandwidth while removing noise, thus eliminating the "voice in a barrel" effect or audio "surging" noted with other noise-reduction techniques.

Other Potential Uses

Good applications for spectral subtraction include radio communication, where atmospheric noise, man-made noise and static interfere with voice transmissions, and nonradio communication, where acoustic noise from the speaker's surroundings can corrupt the speech signal. Speech-coding techniques aimed at bandwidth compression also benefit from the removal of noise using this method.

Spectral subtraction has limited application in the enhancement of nonspeech waveforms, such as music or high-speed data, because such signals don't usually provide the periods of silence necessary for the system to update the noise estimate. The short-time characteristics of these signals differ considerably from those of speech, as well. However, spectral subtraction can provide significant benefit for CW or RTTY signals, which do present momentary periods of silence. Of course, an appropriate signal detector must be written for these modes.
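As a rough illustration of what such a detector could be, the sketch below compares the average energy of a frame with a stored noise floor. This is only one simple approach, not the detector used in the product; the function names and the threshold ratio of 4 are assumptions chosen for the example.

```python
def frame_energy(samples):
    """Average energy (mean of squared samples) of one frame."""
    return sum(s * s for s in samples) / len(samples)


def signal_present(samples, noise_floor, ratio=4.0):
    """Return True when the frame energy exceeds the noise floor by `ratio`.

    `ratio` is an assumed threshold; a practical detector would need tuning.
    """
    return frame_energy(samples) > ratio * noise_floor


quiet = [0.1, -0.1] * 8   # key-up interval: noise only
keyed = [1.0, -1.0] * 8   # key-down interval: signal plus noise
floor = frame_energy(quiet)

quiet_detected = signal_present(quiet, floor)
keyed_detected = signal_present(keyed, floor)
```

When the detector returns False, the frame would be treated as noise only and could be used to refresh the noise estimate.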

Disadvantages of Spectral Subtraction

Even though spectral subtraction is a very powerful tool, it does have disadvantages. The ultimate performance depends upon the ability of the detector to discriminate between speech and noise, and positive detection of speech in noise is a subject unto itself. In addition, spectral subtraction is extremely computationally intensive and requires a processor capable of 5 to 10 MIPS (million instructions per second) to obtain acceptable real-time operation. Finally, the process allows artifacts into the resulting signal that some users find objectionable. As DSPs become faster and faster, better speech detectors can be constructed and many of these problems can be eliminated.


For many applications, this process is quite satisfactory. It is single-ended, meaning that it requires only one input signal: the noisy speech itself. It has a minimal effect on clean speech and provides far more intelligibility than simply bandlimiting the signal in an effort to improve the S/N ratio. Finally, it works well when the noise characteristics change with time, because it updates the noise estimate whenever speech is not present.
