Microsoft's Chief Research Officer, Rick Rashid, recently demonstrated software under development that translated his English remarks into spoken Mandarin Chinese.
Credit: Microsoft Research
Ever wondered what you'd sound like if you were fluent in Chinese, French or another language you don't know? New software that's in development might give you an idea. Microsoft has created a program designed to provide on-the-fly, spoken translations, in the user's own voice.
"We may not have to wait until the 22nd century for a usable equivalent of Star Trek's universal translator," Rick Rashid, Microsoft's chief research officer, wrote in a blog post Nov. 8. Microsoft's translator still makes errors at a noticeable rate, but it significantly improves on previous speech translators, Rashid said.
"The results can sometimes be humorous," he said. "Still, the technology has developed to be quite useful."
Rashid presented the software on Oct. 25, getting some of his remarks translated into Mandarin Chinese during a conference held in Tianjin, China. In a video Microsoft posted online, the software's Chinese voice doesn't sound exactly like Rashid, but it does have the same general tone.
One of the biggest challenges in making the software came in getting it to recognize what users say, Rashid said. Computer scientists have been working on this problem virtually since computers were invented, and the fruits of a generation of research include the automated systems that U.S. banks use for call-in customer service ("Please enter or say your account number now"). In those systems, the speech recognizer only has to understand digits and perhaps some menu options, such as "make a transfer" or "bank hours."
It's more difficult for computers to understand freewheeling conversation, however. Until recently, speech-recognizing programs could only understand 75 to 80 percent of the words a person might say during a conversation, Rashid said. Microsoft Research has been working on improving that rate, he said, by using deep neural networks, which are layered networks of simple computing units that act a little like the connections between cells in human and animal brains. Google used the same technique this summer to build a computer that taught itself to recognize cat pictures on the Internet.
Microsoft's speech recognizer can correctly identify 86 to 88 percent of the words in arbitrary speech, Rashid said. "While still far from perfect, this is the most dramatic change in accuracy since the introduction of hidden Markov modeling in 1979," he said, referring to a landmark moment in the history of speech-recognition research. Hidden Markov modeling, a statistical technique, allowed researchers to incorporate recordings from several people into their speech models, Rashid explained.
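To put those percentages in perspective, it helps to look at the words the software gets wrong rather than the ones it gets right. The short Python sketch below does that arithmetic using only the accuracy figures quoted above. The function names are made up for illustration, and the calculation assumes that "percent of words correctly identified" can be read as 100 minus a per-word error rate, which the article does not spell out.

```python
def error_rate(accuracy_percent):
    """Percent of words misrecognized, assuming error = 100 - accuracy."""
    return 100 - accuracy_percent


def relative_error_reduction(old_accuracy_percent, new_accuracy_percent):
    """Fraction by which the error rate shrinks going from old to new."""
    old_error = error_rate(old_accuracy_percent)
    new_error = error_rate(new_accuracy_percent)
    return (old_error - new_error) / old_error


# Accuracy ranges quoted in the article, in percent.
OLD_RANGE = (75, 80)  # earlier speech recognizers
NEW_RANGE = (86, 88)  # Microsoft's recognizer, per Rashid

# Most conservative comparison: best old figure vs. worst new figure.
smallest = relative_error_reduction(OLD_RANGE[1], NEW_RANGE[0])
# Most generous comparison: worst old figure vs. best new figure.
largest = relative_error_reduction(OLD_RANGE[0], NEW_RANGE[1])

print(f"Relative error reduction: {smallest:.0%} to {largest:.0%}")
```

Under that assumption, a jump from 75 to 80 percent accuracy to 86 to 88 percent means the error rate falls from 20 to 25 percent down to 12 to 14 percent, or somewhere between about 30 and 52 percent fewer misrecognized words.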
After identifying what the user is saying in English, Microsoft's translator then finds matching words in Chinese and rearranges the words to the grammatically correct order in Chinese, Rashid said.
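The two steps Rashid describes, matching words and then reordering them, can be illustrated with a toy Python sketch. This is not Microsoft's method, which the article does not detail. The five-word lexicon, the single reordering rule and all function names here are hypothetical, chosen only to show why word order has to change between the two languages.

```python
# Toy English-to-Chinese lexicon (hypothetical, for illustration only).
LEXICON = {"i": "我", "eat": "吃", "rice": "饭", "at": "在", "home": "家"}

# English words that start a location phrase in this toy example.
PREPOSITIONS = {"at"}


def match_words(sentence):
    """Step 1: pair each English word with a Chinese word from the lexicon."""
    return [(word, LEXICON[word]) for word in sentence.lower().split()]


def reorder_for_chinese(pairs):
    """Step 2: move a trailing location phrase to just after the subject.

    Assumes the subject is the first word and that everything from the
    preposition to the end of the sentence is the location phrase.
    """
    for i, (english, _) in enumerate(pairs):
        if english in PREPOSITIONS and i > 0:
            subject, rest, phrase = pairs[:1], pairs[1:i], pairs[i:]
            return subject + phrase + rest
    return list(pairs)


def translate(sentence):
    """Run both steps and return the Chinese words in their new order."""
    reordered = reorder_for_chinese(match_words(sentence))
    return [chinese for _, chinese in reordered]


print(" ".join(translate("I eat rice at home")))
```

In English, the location ("at home") comes at the end of the sentence. In Mandarin, it typically goes before the verb, so the matched words have to be shuffled into subject, location, verb, object order. A real translator has to learn many such patterns from data rather than rely on one hand-written rule.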
To train the translator to match his voice, Rashid had to record an hour of himself speaking in English, he said. The software also required a recording of a few hours of a native Chinese speaker.
"There is still much work to be done, but the technology is very promising, and we hope that in a few years we will have systems that can completely break down language barriers," Rashid said.