U.S. Air Force Seeks Voice-Transformation Technology

Voice transformation is one part of the Terminator's arsenal that the U.S. Air Force would like to have available. Researchers are being solicited to help ordinary human airmen disguise their voices—even to sound like another person altogether.

This could be accomplished with voice transformation algorithms. The same research is also meant to help detect voices that have been transformed.

The goal of this phase is to research techniques to analyze a person [sic] voice for voice transformation. While voice transformation have [sic] been around for awhile, the ability [sic] to transform a person's voice to a target voice is not yet solved. Parameters such as the speaking rate, stress, and intonation will provide broad parameters for modeling a person's voice. A finer grain analysis of a person's voice may also be performed by de-convolving an audio signal into its glottal pulse and vocal tract information.
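
As a rough illustration of what "de-convolving" a signal into glottal pulse and vocal tract information can mean, here is a minimal sketch using linear predictive coding (LPC), one common textbook approach. The solicitation does not say which method the Air Force has in mind, so the choice of LPC, the function names, and the synthetic test signal (an impulse train passed through a single made-up resonance) are all assumptions for demonstration only.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal (treated as zero outside its range)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin recursion: coefficients a[k] so that x[n] ~ sum_k a[k] * x[n-k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def all_pole_filter(excitation, coeffs):
    """Toy 'vocal tract': y[n] = e[n] + sum_k coeffs[k-1] * y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(c * y[n - k]
                         for k, c in enumerate(coeffs, start=1) if n - k >= 0))
    return y

def inverse_filter(x, coeffs):
    """Remove the estimated vocal tract, leaving an approximation of the excitation."""
    return [x[n] - sum(c * x[n - k]
                       for k, c in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(x))]

# Hypothetical test signal: one pulse every 80 samples through one resonance.
PERIOD = 80
excitation = [1.0 if n % PERIOD == 0 else 0.0 for n in range(800)]
true_tract = [2 * 0.95 * math.cos(math.pi / 8), -(0.95 ** 2)]
speech = all_pole_filter(excitation, true_tract)

est_tract = lpc(speech, order=2)               # vocal tract estimate
residual = inverse_filter(speech, est_tract)   # glottal pulse estimate
```

On this toy signal the estimated filter lands close to the one used to build it, and the residual is concentrated at the pulse positions. Real speech would need a higher filter order, short overlapping frames, and windowing.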

Transforming a speaker's voice so it is unrecognizable may be less difficult than you might think. Studies conducted in 1980 tested subjects on their ability to recognize a group of 53 voices, 29 of which were actually familiar to the listener. In that work, 31 percent of speakers could be identified from a single word and 66 percent from a single sentence, but only 83 percent from a full 30 seconds of speech. So even with a long sample, some voices (or some speakers) are simply hard to recognize consistently.

Formant spectra: The coarse structure of the different parts of speech. "Formant" refers to the regions of concentrated energy, prominent on a sound spectrogram, that collectively constitute the frequency spectrum of a speech sound. This is the most common target of voice transformation algorithms, which work by constructing a map between the formant spectra of the two voices.
Prosodic features: These are aspects of speech that vary from person to person, such as the fundamental pitch of the voice and timing (the patterns and rhythms of speech).
Mannerisms: This refers to word choices and preferred phrases and other high-level behaviors. For example, someone from New Jersey might imitate the voice of someone from Arkansas perfectly, but still fail to convince a listener owing to a failure to select the right phrases.
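
Of the features listed above, fundamental pitch is the easiest to measure directly. The following is a minimal sketch, assuming a simple autocorrelation search over candidate pitch periods; the function name, the 60 to 400 Hz search range, and the synthetic two-harmonic test tone are illustrative choices, not anything specified in the Air Force solicitation.

```python
import math

def estimate_pitch_hz(signal, sample_rate, fmin=60.0, fmax=400.0):
    """Return the pitch whose period (in samples) best lines the signal up with itself."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered

    def score(lag):
        # Unnormalized autocorrelation at this lag.
        return sum(signal[i] * signal[i + lag]
                   for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=score)
    return sample_rate / best_lag

# Hypothetical test tone: a 125 Hz fundamental plus a weaker second harmonic.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 125 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 250 * n / SAMPLE_RATE)
        for n in range(1024)]

pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
```

A transformation system would track this value frame by frame and then shift or rescale it toward the target speaker's range. Real recordings also call for voicing detection and smoothing, which this sketch leaves out.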

Bill Christensen catalogues the inventions, technology and ideas of science fiction writers at his website, Technovelgy. He is a contributor to Live Science.