Definition
Voice morphing is the gradual transformation of one speech signal into another.
Like image morphing, speech morphing aims to preserve the characteristics shared
by the starting and final signals while generating a smooth transition between
them. In image morphing, the in-between images all show one face smoothly
changing its shape and texture until it becomes the target face. A speech morph
should have the same property: one speech signal should change smoothly into
another, keeping the characteristics shared by the starting and ending signals
while smoothly changing the other properties. The main properties of concern in
a speech signal are its pitch and its envelope information. These two reside in
convolved form in the speech signal.
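In the usual source-filter view of speech (assumed here; the text does not spell it out), an excitation e[n] that carries the pitch is convolved with a vocal-tract response h[n] that carries the envelope. The log magnitude spectrum turns that convolution into a sum, and its inverse transform, the real cepstrum c[q], places the two parts in different quefrency (q) ranges:

```latex
s[n] = e[n] * h[n]
\quad\Longrightarrow\quad
\log\lvert S(\omega)\rvert = \log\lvert E(\omega)\rvert + \log\lvert H(\omega)\rvert

c[q] = \mathcal{F}^{-1}\bigl\{\log\lvert S(\omega)\rvert\bigr\}
     = \underbrace{\mathcal{F}^{-1}\bigl\{\log\lvert E(\omega)\rvert\bigr\}}_{\text{pitch: peak near } q = f_s / F_0}
     + \underbrace{\mathcal{F}^{-1}\bigl\{\log\lvert H(\omega)\rvert\bigr\}}_{\text{envelope: small } q}
```

Because the envelope varies slowly across frequency, it occupies low quefrencies, while a periodic excitation with fundamental F_0 sampled at f_s produces a peak near q = f_s/F_0 samples; keeping or zeroing the low-quefrency part separates the two.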
Hence an efficient method for separating and extracting each of them is
necessary. We have adopted a simple approach, cepstral analysis, to do this: the
pitch and formant (envelope) information of each signal is extracted using the
cepstrum.
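A minimal sketch of cepstral analysis on a single frame, written in pure Python so it runs without dependencies (numpy.fft would normally replace the hand-written FFT). The function names, the 60 to 400 Hz pitch search range and the 30-coefficient lifter cutoff are illustrative assumptions, not values taken from the text, and frame lengths must be powers of two.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum):
    """Inverse FFT computed with the conjugation trick."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def real_cepstrum(frame):
    """Inverse FFT of the log magnitude spectrum (the log turns the product E*H into a sum)."""
    log_mag = [math.log(abs(v) + 1e-12) for v in fft(frame)]  # floor avoids log(0)
    return [v.real for v in ifft(log_mag)]


def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """F0 estimate: the largest cepstral peak among the quefrencies of plausible pitches."""
    c = real_cepstrum(frame)
    qmin = int(fs / fmax)
    qmax = min(int(fs / fmin), len(c) // 2)
    q_peak = max(range(qmin, qmax), key=lambda q: c[q])
    return fs / q_peak


def cepstral_envelope(frame, n_lifter=30):
    """Smoothed log-magnitude envelope (bins 0..N/2): keep only low quefrencies."""
    c = real_cepstrum(frame)
    n = len(c)
    # Symmetric low-quefrency lifter: q in [0, n_lifter) and its mirror image.
    liftered = [c[q] if q < n_lifter or q > n - n_lifter else 0.0 for q in range(n)]
    return [v.real for v in fft(liftered)][: n // 2 + 1]
```

For a 512-sample pulse train with a 40-sample period at 8 kHz, the cepstral peak falls at quefrency 40, so `cepstral_pitch` reports 8000 / 40 = 200 Hz; `cepstral_envelope` gives the slowly varying curve whose peaks are the formants.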
The processing needed to obtain the morphed speech signal includes cross-fading
of the envelope information, Dynamic Time Warping to match the major signal
features (pitch), and signal re-estimation to convert the morphed speech signal
back into an acoustic waveform.
INTROSPECTION OF THE MORPHING PROCESS
Speech morphing can be achieved by transforming the signal from the
representation most people are familiar with, the acoustic waveform obtained by
sampling the analog signal, into another representation. To prepare the signal
for the transformation, it is split into a number of short 'frames' (sections of
the waveform), and the transformation is then applied to each frame. This
provides another way of viewing the signal information: the new representation,
said to be in the frequency domain, describes the average energy present in each
frequency band over the duration of the frame.
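The framing and per-frame transform can be sketched as follows, assuming a Hann window, 256-sample frames and 50% overlap (choices the text does not specify; the function names are hypothetical). A direct DFT keeps the sketch dependency-free; numpy.fft.rfft computes the same bins much faster. Squaring a magnitude gives the energy in that frequency band.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def split_into_frames(signal, frame_len, hop):
    """Cut the waveform into overlapping frames; a partial frame at the end is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """|DFT| of one Hann-windowed frame for bins 0..N/2 (direct O(N^2) DFT for clarity)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(signal, frame_len=256, hop=128):
    """Frequency-domain view of the signal: one magnitude spectrum per frame."""
    return [magnitude_spectrum(f) for f in split_into_frames(signal, frame_len, hop)]
```

Bin k of a frame corresponds to k * fs / frame_len Hz, so at fs = 8 kHz with 256-sample frames each bin spans 31.25 Hz and a 1 kHz tone peaks in bin 32 of every frame.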
Further analysis of this representation yields two pieces of information: the
pitch information and the overall envelope of the sound. A key element in the
morphing is the manipulation of the pitch information. If two signals with
different pitches were simply cross-faded, two separate sounds would most likely
be heard, because the mix would contain two distinct pitches and the auditory
system would perceive two different objects. A successful morph must therefore
exhibit a single, smoothly changing pitch throughout. The pitch information of
the two sounds is then compared to find the best match between the two signals'
pitches.
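This comparison, together with the stretching and compressing described in the next sentence, is what Dynamic Time Warping (named in the definition) performs. A minimal sketch, assuming each sound has already been reduced to one pitch value per frame and using the absolute pitch difference as the local cost; both the cost and the name `dtw_path` are assumptions of this sketch.

```python
def dtw_path(a, b):
    """Dynamic time warping between two per-frame pitch sequences.

    Returns the total alignment cost and the optimal path as (i, j) pairs,
    meaning frame i of sound A is matched with frame j of sound B.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # b[j-1] reused: B is stretched
                                 cost[i][j - 1],      # a[i-1] reused: A is stretched
                                 cost[i - 1][j - 1])  # both advance one frame
    # Backtrack from the end to recover the warping path (ties prefer the diagonal).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

A frame index that appears in several pairs of the returned path is the part of that sound being stretched; one that is skipped over is being compressed.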
To make this match, the signals are stretched and compressed in time so that
important sections of each signal line up. The two sounds can then be
interpolated to create the intermediate sounds in the morph. The final stage is
to convert the frames back into a normal waveform.
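A minimal sketch of the last two stages, assuming per-frame pitch values and envelopes from the analysis stage and a DTW path of (i, j) frame pairs. The names `interpolate_along_path` and `overlap_add` are hypothetical, linear interpolation is an assumed choice, and turning each morphed (pitch, envelope) pair into time-domain samples (the signal re-estimation step of the definition) is not shown; `overlap_add` covers only the final frames-to-waveform conversion.

```python
import math


def interpolate_along_path(path, pitch_a, pitch_b, env_a, env_b, alpha):
    """Intermediate frames of the morph, one per DTW-aligned pair (i, j).

    alpha = 0 reproduces sound A and alpha = 1 reproduces sound B. Pitch is
    interpolated linearly and the envelopes (equal-length lists, e.g. cepstrally
    smoothed log-magnitude spectra) are cross-faded bin by bin.
    """
    frames = []
    for i, j in path:
        pitch = (1 - alpha) * pitch_a[i] + alpha * pitch_b[j]
        envelope = [(1 - alpha) * ea + alpha * eb for ea, eb in zip(env_a[i], env_b[j])]
        frames.append((pitch, envelope))
    return frames


def hann(n):
    """Periodic Hann window; two copies offset by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Rebuild one waveform from time-domain frames: each frame starts hop
    samples after the previous one and overlapping samples are summed."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        for t, s in enumerate(frame):
            out[k * hop + t] += s
    return out
```

With Hann-windowed frames taken at half-frame hops, the overlapping windows sum to one, so `overlap_add` reproduces the original samples everywhere except the first and last half-frame.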