Baidu Research Achieves Speech Recognition Breakthrough With "Deep Speech"

New Scalable Deep Learning System Aims to Improve Speech Recognition in Noisy Environments


SUNNYVALE, CA--(Marketwired - Dec 18, 2014) - Baidu Research, a division of Baidu, Inc. (NASDAQ: BIDU), today announced initial results from its Deep Speech speech recognition system.

Deep Speech is a new speech recognition system built with the goal of improving accuracy in noisy environments (for example, restaurants, cars and public transportation), as well as in other challenging conditions, such as highly reverberant and far-field situations.

Key to the Deep Speech approach are a well-optimized recurrent neural network (RNN) training system that uses multiple GPUs and a set of novel data synthesis techniques that allowed Baidu researchers to efficiently obtain a large amount of varied data for training.
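As a rough illustration of the data synthesis idea, clean utterances can be overlaid with recorded background noise at a chosen signal-to-noise ratio to multiply the amount of varied training data. The sketch below is an assumption about how such a step might look, not Baidu's actual pipeline; the function name, sampling rate and SNR values are hypothetical.

```python
# Minimal sketch of noise-superposition data synthesis, assuming clean speech
# and noise are available as NumPy arrays at the same sampling rate.
# All names (synthesize_noisy, snr_db, etc.) are illustrative, not Baidu's API.
import numpy as np

def synthesize_noisy(clean, noise, snr_db=10.0, rng=None):
    """Overlay a random slice of `noise` onto `clean` at the requested SNR."""
    rng = rng or np.random.default_rng()
    # Pick a random noise segment as long as the clean utterance.
    start = rng.integers(0, len(noise) - len(clean))
    segment = noise[start:start + len(clean)].astype(np.float64)
    clean = clean.astype(np.float64)

    # Scale the noise so the mix has the desired signal-to-noise ratio.
    clean_power = np.mean(clean ** 2) + 1e-12
    noise_power = np.mean(segment ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * segment

# Example: generate several noisy variants of one utterance for training.
# clean_utt and street_noise would be loaded from audio files in practice.
rng = np.random.default_rng(0)
clean_utt = rng.standard_normal(16000)      # stand-in for 1 s of speech at 16 kHz
street_noise = rng.standard_normal(160000)  # stand-in for 10 s of background noise
noisy_variants = [synthesize_noisy(clean_utt, street_noise, snr_db=s) for s in (5, 10, 15)]
```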

Earlier this month, tests demonstrated the following:

  • Deep Speech outperformed previously published results on the widely studied Switchboard Hub5'00 benchmark, obtaining a 16.5% Word Error Rate.
  • Deep Speech also outperformed publicly available web APIs (Google Web Speech, wit.ai) as well as commercial systems (Bing Speech Services, Apple Dictation), especially on speech with noisy backgrounds. In noisy environments, Deep Speech outperformed all of these systems by more than 10% in Word Error Rate (defined below).
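For reference, Word Error Rate is the edit distance (substitutions, insertions and deletions) between the recognized word sequence and the reference transcript, divided by the number of words in the reference. The short sketch below illustrates the metric only; it is not tied to any of the systems or benchmarks above.

```python
# Minimal sketch of Word Error Rate: word-level edit distance divided by
# the number of words in the reference transcript.
def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution and one deletion against a four-word reference.
print(word_error_rate("turn on the radio", "turn the video"))  # 0.5 -> 50% WER
```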

Dr. Andrew Ng, Chief Scientist at Baidu, commented: "Deep learning, trained on a huge dataset -- over 100,000 hours of synthesized data -- is letting us achieve significant improvements in speech recognition. I'm excited by this progress, because I believe speech will transform mobile devices, as well as the Internet of Things. This is just the beginning."

Dr. Dan Jurafsky, Professor of Linguistics and Computer Science at Stanford University, said: "I am enthusiastic about Baidu's new methods for speech recognition, especially the use of elegant models that make the problem simpler and easier to engineer, together with GPUs for speed and scalability. The results suggest some really exciting near-term directions for speech recognition, especially for noisy environments and challenging speech tasks."

"Speech recognition in noisy and reverberant conditions is still a challenging task for state-of-the-art speech recognition systems. This recent work by Baidu Research has the potential to disrupt how speech recognition will be performed in the future," added Dr. Ian Lane, Assistant Research Professor of Engineering, Carnegie Mellon University. "Baidu's innovative work with GPU scaling and large data sets brings us a step closer to the vision of being able to converse naturally with smart devices, appliances, wearables and robots, even in noisy environments."

The Deep Speech results are described in the paper "Deep Speech: Scaling Up End-to-End Speech Recognition."

About Baidu Research
Baidu Research, based in Silicon Valley and Beijing, is led by Dr. Andrew Ng, Chief Scientist. Baidu Research comprises three interrelated labs: the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab, led by Dr. Adam Coates, Dr. Kai Yu and Dr. Tong Zhang, respectively. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, voice recognition, natural language processing and semantic intelligence (http://research.baidu.com).

Contact Information:

Media Contact:
Calisa Cole
Baidu Research
ccole@baidu.com

Dr. Andrew Ng, Chief Scientist, Baidu