Human beings extract a lot of information about their environment using their ears.
To understand what information can be retrieved from sound, and how exactly that is
done, we need to look at how sounds are perceived in the real world. To do so, it is
useful to break the acoustics of a real-world environment into three components: the
sound source, the acoustic environment, and the listener:
1. The sound source: this is an object in the world that emits sound waves. Examples
are anything that makes sound: cars, humans, birds, closing doors, and so on.
Sound waves are created through a variety of mechanical processes. Once created,
the waves are usually radiated more strongly in some directions than in others. For
example, a mouth radiates more sound energy in the direction the face is pointing
than to the side of the face.
2. The acoustic environment: once a sound wave has been emitted, it travels through
an environment where several things can happen to it. It gets absorbed by the air,
high frequencies more so than low ones, by an amount that depends on factors such as
wind and air humidity. It can travel directly to a listener (direct path), bounce off
an object once before it reaches the listener (first order reflected path), bounce
twice (second order reflected path), and so on. Each time a sound reflects off an
object, the material the object is made of determines how much of each frequency
component of the sound wave gets absorbed, and how much gets reflected back into the
environment. Sounds can also pass through objects such as water or walls. Finally,
environment geometry such as corners, edges, and small openings has complex effects
on the physics of sound waves (diffraction, scattering).
3. The listener: this is a sound-receiving object, typically a "pair of ears". The
listener uses acoustic cues to interpret the sound waves that arrive at the ears, and
to extract information about the sound sources and the environment.
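The direct path and a first order reflected path described above can be illustrated
with a small calculation. The following is a minimal sketch, not a full acoustic
model: it assumes a single flat, perfectly reflecting floor, a fixed speed of sound
of 343 m/s, and it ignores air absorption and material effects. The function name
`path_delays` and the coordinates are hypothetical, chosen only for illustration.
The reflected path is found by mirroring the source across the floor, a common
shortcut often called the image-source method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C (assumption)


def path_delays(source, listener, floor_z=0.0):
    """Return (direct_delay, reflected_delay) in seconds.

    source and listener are (x, y, z) positions in metres. The reflected
    path models one bounce off a horizontal floor at height floor_z.
    """
    direct_distance = math.dist(source, listener)
    # Mirror the source across the floor plane. The straight line from this
    # mirrored "image" to the listener has the same length as the bounced path.
    image = (source[0], source[1], 2.0 * floor_z - source[2])
    reflected_distance = math.dist(image, listener)
    return direct_distance / SPEED_OF_SOUND, reflected_distance / SPEED_OF_SOUND


# Source and listener both 1.5 m above the floor, 4 m apart.
direct_delay, reflected_delay = path_delays((0.0, 0.0, 1.5), (4.0, 0.0, 1.5))
extra_delay_ms = (reflected_delay - direct_delay) * 1000.0
print(f"direct: {direct_delay * 1000:.2f} ms, reflected: {reflected_delay * 1000:.2f} ms")
print(f"the floor reflection arrives {extra_delay_ms:.2f} ms after the direct sound")
```

In this example the direct path is 4 m and the reflected path is 5 m, so the
reflection arrives roughly 2.9 ms after the direct sound.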
How Virtual Surround Works
A 3D audio system aims to digitally reproduce a realistic sound field. To achieve
the desired effect, a system needs to be able to re-create some or all of the
listening cues discussed in the previous chapter: IID (interaural intensity
difference), ITD (interaural time difference), outer ear effects, and so on. A
typical first step in building such a system is to capture the listening cues by
analyzing what happens to a single sound as it arrives at a listener from different
angles. Once captured, the cues are synthesized in a computer simulation for
verification.
What is an HRTF?
The majority of 3D audio technologies are at some level based on the concept of
HRTFs, or Head-Related Transfer Functions. An HRTF can be thought of as a set of two
audio filters (one for each ear) that contain all the listening cues that are
applied to a sound as it travels from the sound's origin (its source, or position
in space), through the environment, and arrives at the listener's ear drums. The
filters change depending on the direction from which the sound arrives at the
listener. The level of HRTF complexity necessary to create the illusion of
realistic 3D hearing is subject to considerable discussion and varies greatly across
technologies.
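Treating an HRTF as a pair of filters means that, for one direction, a mono sound
can be placed in space by filtering it once per ear. The sketch below shows the idea
with plain time-domain convolution. The impulse responses are made-up toy values,
not measured data, and the names `convolve` and `render_binaural` are hypothetical.
A real system would use measured filters with many more taps and would typically
use a faster convolution method.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with the left-ear and right-ear filters."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy filters for a source on the listener's right (illustrative values only):
# the right ear hears the sound first and at full level, while the left ear
# hears it two samples later (a crude ITD) and at half level (a crude IID).
hrir_right = [1.0, 0.0, 0.0]
hrir_left = [0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
print("left ear: ", left)
print("right ear:", right)
```

Changing the direction of the source amounts to swapping in a different filter
pair, which is why the filters are stored per direction.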
HRTF Analysis
The most common method of measuring the HRTF of an individual is to place
tiny probe microphones inside a listener's left and right ear canals, place a
speaker at a known location relative to the listener, play a known signal through
that speaker, and record the microphone signals. Comparing the recorded signals
with the original signal yields the impulse response for each ear, which gives
a single filter pair in the HRTF set. The speaker is then moved to a new location
and the process is repeated until an entire spherical map of filter sets has been
measured.
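The "comparing" step amounts to deconvolution: given the known signal and the
recording, solve for the filter that turns one into the other. The following is a
minimal sketch for one ear under idealized assumptions: no noise, a perfect speaker
and microphone, and a test signal whose first sample is non-zero. The function names
and all numeric values are hypothetical. Practical measurements usually use sweep or
noise-like test signals and more robust frequency-domain methods.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution, used here only to simulate a recording."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def deconvolve(recorded, excitation, ir_length):
    """Recover ir_length taps of h such that recorded = excitation * h.

    Solves the convolution equations one output sample at a time, which
    requires excitation[0] to be non-zero.
    """
    if not excitation or excitation[0] == 0:
        raise ValueError("excitation must start with a non-zero sample")
    h = []
    for n in range(ir_length):
        acc = recorded[n] if n < len(recorded) else 0.0
        # Subtract the contribution of the taps that are already known.
        for k in range(1, min(n, len(excitation) - 1) + 1):
            acc -= excitation[k] * h[n - k]
        h.append(acc / excitation[0])
    return h


# Simulate one measurement: a known test signal passes through an unknown
# ear filter (made-up values) and is picked up by the probe microphone.
excitation = [1.0, 0.5, -0.25]
true_ear_filter = [0.0, 0.75, 0.25, -0.125]
recorded = convolve(excitation, true_ear_filter)

estimated = deconvolve(recorded, excitation, len(true_ear_filter))
print("estimated ear filter:", estimated)
```

Repeating this for both ears and for every speaker position would build up the
map of filter pairs described above.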