Improved HRTF-Based Pseudostereophony

Ames Research Center, Moffett Field, California

An improved pseudostereophonic system utilizes digital filtering and head-related transfer functions (HRTFs) to afford tailorability and sound quality not available from older pseudostereophonic systems. Although the term "pseudostereophony" may not be widely known, the underlying concept has been studied and applied for more than four decades. Pseudostereophony is a family of techniques for deriving right and left channels of sound from a single-channel (monaurally recorded) source to give the listener an impression of sound coming from more than one direction.

Figure 1. Sounds Arriving in a Listener's Ears from various directions are measured, then used to compute head-related transfer functions.

The pinnae of human ears affect entering sound in a way that varies with the direction of incidence and thus gives the brain cues to the location of the source. These cues are in addition to the directional cues provided by differences between the times of arrival of signals. The effects of the pinnae can be quantified by HRTFs. In the time domain, an HRTF is an impulse response, as a function of direction of incidence, that is convolved with the incident acoustic signal. In the frequency domain, the HRTF is a magnitude and phase response, as a function of frequency of sound and direction of incidence, that multiplies the frequency-domain complex-amplitude representation of the incident acoustic signal.
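
The equivalence of the time-domain and frequency-domain descriptions can be checked numerically. The following is a minimal sketch in plain Python and is not part of the system described here; the input samples and the three-tap response `hrir_left` are made-up placeholder values, not measured HRTF data, and the helper names are hypothetical.

```python
import cmath

def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def dft(x, n):
    """Naive n-point DFT of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

# Placeholder data (assumed for illustration only).
signal = [1.0, 0.5, -0.25, 0.0, 0.75]
hrir_left = [0.9, 0.3, -0.1]

# Time domain: convolve the signal with the impulse response.
time_domain = convolve(signal, hrir_left)

# Frequency domain: multiply the spectra, then transform back.
n = len(signal) + len(hrir_left) - 1   # padding avoids circular wrap-around
product = [s * h for s, h in zip(dft(signal, n), dft(hrir_left, n))]
via_frequency = [v.real for v in idft(product)]

max_error = max(abs(a - b) for a, b in zip(time_domain, via_frequency))
```

Here `max_error` should be at the level of floating-point rounding, showing that the two routes give the same output for one ear and one direction of incidence.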

The present pseudostereophonic system utilizes HRTFs in the following way: First, the HRTFs of an average human listener are determined experimentally. Thereafter, the digital HRTFs are used, along with delays and optional filtering functions, to digitally synthesize right- and left-channel signals that make a listener perceive the sound as coming from multiple sources in different directions.

Figure 1 depicts an apparatus for measuring HRTFs for five directions of incidence. A human listener sits in an anechoic chamber, with a loudspeaker at ear height directly in front (azimuth 0°) and four other loudspeakers at ear height at azimuths of 90, 120, 240, and 270°. A source of sound is connected to the loudspeaker at 0° and is connected to the other loudspeakers through an amplifier and delay devices. Small probe microphones in the listener's ears measure the entering sound. In principle, as indicated by the dashed lines, the outputs of the right and left microphones could simply be fed to right and left loudspeakers, respectively, to generate sounds with perceived directional characteristics. Instead, the sounds measured by the microphones are used to compute the HRTFs, which are then used in the pseudostereophonic system depicted in Figure 2.

Figure 2. A Monaural Signal Is Digitally Filtered to synthesize right and left signals for a pseudostereophonic effect.

A monaural input signal is fed to an analog-to-digital converter (ADC). The digital signal is distributed on six lines. Lines 1 and 6 couple the signal directly to left and right digital summing devices, while each of lines 2 through 5 passes through an individual digital delay device that corresponds to one of the nonzero azimuth angles. The delays of these devices differ and are set as described below.

The outputs of all the delay devices are multiplied by a common gain factor that can be adjusted by the user. Next, each delayed and multiplied signal is processed by a right and a left finite-impulse-response (FIR) filter that approximates the right and left HRTF, respectively, for the azimuth angle. The resulting left and right HRTF outputs are summed in the left and right digital summing devices, respectively. The summed outputs for the left and right channels are then passed through separate digital-to-analog converters (DACs) to the left and right loudspeakers, respectively.
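
The signal flow described above (direct lines, delayed lines with a common gain, left and right FIR filters, and summation) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names are hypothetical, delays are expressed in samples rather than milliseconds, the gain is a linear multiplier, and the FIR taps in the example are placeholders rather than HRTF approximations.

```python
def fir_filter(x, taps):
    """Apply an FIR filter by direct convolution."""
    y = [0.0] * (len(x) + len(taps) - 1)
    for i, xi in enumerate(x):
        for j, tj in enumerate(taps):
            y[i + j] += xi * tj
    return y

def pseudostereo(mono, delays, hrirs, gain):
    """Synthesize (left, right) sample lists from a monaural sample list.

    mono   -- input samples (already digitized)
    delays -- one delay in samples per delayed line (assumed units)
    hrirs  -- one (left_taps, right_taps) pair per delayed line
    gain   -- common linear gain applied to every delayed line
    """
    longest = max(len(t) for pair in hrirs for t in pair)
    length = len(mono) + max(delays) + longest - 1
    left = [0.0] * length
    right = [0.0] * length

    # Direct lines: the undelayed signal goes straight to both summers.
    for i, s in enumerate(mono):
        left[i] += s
        right[i] += s

    # Delayed lines: delay, apply the common gain, filter, and sum.
    for d, (taps_l, taps_r) in zip(delays, hrirs):
        delayed = [0.0] * d + [gain * s for s in mono]
        for out, taps in ((left, taps_l), (right, taps_r)):
            for i, v in enumerate(fir_filter(delayed, taps)):
                out[i] += v
    return left, right

# Usage with a unit impulse, one delayed line, and placeholder one-tap filters.
left, right = pseudostereo([1.0, 0.0, 0.0], [2], [([1.0], [0.5])], 2.0)
```

With these placeholder values the delayed line appears two samples after the direct sound, at a different level in each channel. A full system would use one delay and one filter pair per nonzero azimuth angle.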

The delays, gains, and FIR-filter parameters can be chosen to obtain desired psychoacoustic effects. For example, the gain of the delayed signals on lines 2 through 5 can be set at 6 dB, in keeping with an empirical finding that this is the best gain for pseudostereophony with sounds of various types, including speech and music. Alternatively, the user can set the gain at zero for monaural listening, or can set the gain above 6 dB to obtain an exaggerated pseudostereophonic effect.
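
In a digital implementation, a gain quoted in decibels has to be converted to a multiplier. The sketch below assumes the usual amplitude convention (20 log10 of the ratio) and assumes the 6 dB figure is relative to the undelayed signal; the text does not state either point, and the function names are hypothetical.

```python
import math

def db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear multiplier (assumed convention)."""
    return 10.0 ** (gain_db / 20.0)

def linear_to_db(factor):
    """Convert a positive linear amplitude multiplier to dB."""
    return 20.0 * math.log10(factor)

recommended = db_to_linear(6.0)   # close to a factor of 2 in amplitude
```

Under this convention 0 dB is a multiplier of 1, so if the "zero" setting for monaural listening is read as silencing the delayed lines, it would correspond to a linear multiplier of 0 rather than to 0 dB.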

Criteria for setting delays are somewhat more complex. Parameters to be considered include (1) the interval between the undelayed sound and the first delay, (2) the intervals between succeeding delays, and (3) the time of the final delay, which depends on the previous delays. An important psychoacoustic consideration in choosing delays is that the signals remain below the level of echo disturbance; in this regard, the initial delay has been found to be the most important one. The intervals between successive delays should be within a range so that each delayed sound would not be heard as a separate sound. Typical acceptable values of initial delay are between 15 and 25 ms, and optimum intervals between successive delays are between 5 and 10 ms. In order for the use of HRTFs to yield a sensation of increased auditory spaciousness, the final delay should be at least 30 ms; this is consistent with findings from research into the effect of early reflections in concert halls.
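
The delay guidelines above lend themselves to a simple check. The sketch below is one possible way to encode them, with hypothetical function names; it treats the "typical" and "optimum" ranges quoted in the text as hard limits, which is a simplification, and it assumes equally spaced delays when building a schedule.

```python
def delay_schedule(initial_ms, interval_ms, count=4):
    """Build equally spaced delay times in ms (equal spacing is an assumption)."""
    return [initial_ms + k * interval_ms for k in range(count)]

def check_delays(delays_ms):
    """Return a list of guideline violations for ascending delay times in ms."""
    problems = []
    # Initial delay: typical acceptable values are 15 to 25 ms.
    if not 15 <= delays_ms[0] <= 25:
        problems.append("initial delay outside 15-25 ms")
    # Intervals between successive delays: optimum values are 5 to 10 ms.
    for earlier, later in zip(delays_ms, delays_ms[1:]):
        if not 5 <= later - earlier <= 10:
            problems.append(
                "interval %s-%s ms outside 5-10 ms" % (earlier, later))
    # Final delay: at least 30 ms for a sensation of spaciousness.
    if delays_ms[-1] < 30:
        problems.append("final delay below 30 ms")
    return problems

schedule = delay_schedule(20, 5)        # four delayed lines, as in the text
violations = check_delays(schedule)     # empty list when all guidelines hold
```

For four delayed lines, an initial delay of 20 ms with 5-ms intervals gives delays of 20, 25, 30, and 35 ms, which satisfies all three guidelines.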

Although the system as described thus far includes four delay and HRTF channels, it could also be constructed with more or fewer delay and HRTF channels. Other possible variations include the use of larger or smaller numbers of coefficients used to approximate the HRTFs in the FIR filters.

In comparison with older pseudostereophonic systems, this system generates sound that is less "colored"; that is, less altered in timbre. This system offers greater flexibility for synthesizing the sound at a wider range of listening positions. The two output channels of the present system can also be mixed to monaural output without the disturbing coloration effects that result from phase cancellation. Moreover, the sound gives an increased impression of spaciousness because the two output signals are decorrelated. Yet another advantage is that multiple inputs with different frequency responses can be distinguished from each other more easily in this system than in a monaural system.

This work was done by Durand R. Begault of Ames Research Center. For further information, access the Technical Support Package (TSP) free on-line at www.nasatech.com/tsp under the Electronics & Computers category.

This invention has been patented by NASA (U.S. Patent No. 5,173,944). Inquiries concerning nonexclusive or exclusive license for its commercial development should be addressed to