Briefing
|
- Significant Text-to-Speech Improvement – Chinese search giant Baidu developed the second iteration of its text-to-speech (TTS) system, Deep Voice 2, which produces hundreds of hours of speech in hundreds of voices, up from the single voice speaking for 20 hours achieved by the original Deep Voice program
- Better Than Alternatives – Compared to alternative neural TTS systems, Deep Voice 1 and 2 generate speech indistinguishable from a human voice and run in real time
- Fast Learning – Learns to imitate a human voice from less than 30 minutes of recorded audio, compared to the many hours of training audio the original version required
- Multiple Voice and Accent System – Identifies characteristics shared across different voices to build a general model of the human voice, then tweaks that model to speak as different characters and in different accents
- Applications – Useful for voice assistants and text-to-speech applications, such as audio readers in e-books
|
Accelerator
|
|
Sector
|
Information Technology
|
Organization
|
Baidu
|
Source
|
-
Popper, B., "Baidu’s new text-to-speech system can master hundreds of accents",
-
David, E., "Baidu’s text-to-speech AI can replicate hundreds of accents",
-
Arik, S., et al., "Deep Voice: Real-time Neural Text-to-Speech",
-
"Deep Voice 2: Multi-Speaker Neural Text-to-Speech",
-
AcceleratingBiz analysis
|
Original Publication Date
|
May 24, 2017
|