I remember the days of trying to use Dragon NaturallySpeaking to turn my speech into text on Windows XP, and it was terrible. Fortunately, speech recognition technology has improved drastically, especially over the past few years. It is no longer just for businesses or super tech-savvy folks but also for everyday people with everyday devices.
According to NeoSpeech (a text-to-speech SaaS provider), the speech recognition market is forecast to grow from $3.7 billion a year to around $10 billion by 2022. This growth is also evident in the rise of cloud-connected devices such as the NVIDIA Spot and Amazon Echo.
Speech recognition platforms work by converting spoken words into text that an application can interpret (speech to text). The software takes that text command and performs the appropriate action, after which it transforms a text response into speech (text to speech, or TTS).
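To make that flow concrete, here is a minimal Python sketch of the three stages: speech to text, acting on the command, and text to speech. Every name in it (fake_speech_to_text, handle_command, fake_text_to_speech, run_pipeline) is a hypothetical placeholder I made up for illustration, and the two "fake" stages do no real audio processing. In a real app you would swap them for calls into an actual recognition or TTS engine.

```python
from typing import Callable

# All names here are hypothetical placeholders, not the API of any real toolkit.


def fake_speech_to_text(audio: bytes) -> str:
    """Stand-in recognizer: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def handle_command(command: str) -> str:
    """Map a recognized text command to a text response."""
    responses = {
        "hello": "Hello! How can I help?",
        "lights on": "Turning the lights on.",
    }
    return responses.get(command, "Sorry, I did not understand that.")


def fake_text_to_speech(text: str) -> bytes:
    """Stand-in synthesizer: returns the text as bytes instead of real audio."""
    return text.encode("utf-8")


def run_pipeline(
    audio: bytes,
    stt: Callable[[bytes], str] = fake_speech_to_text,
    act: Callable[[str], str] = handle_command,
    tts: Callable[[str], bytes] = fake_text_to_speech,
) -> bytes:
    """Run the three stages in order: speech -> text -> response -> speech."""
    command = stt(audio)   # speech to text
    reply = act(command)   # the application interprets the command
    return tts(reply)      # text to speech


if __name__ == "__main__":
    print(run_pipeline(b"  Lights ON "))
```

Because each stage is passed in as a plain function, the command-handling logic can be tested with ordinary strings long before a microphone or speaker is involved.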
The good news for developers is that multiple open-source speech recognition toolkits are currently available. Some of them are:
- Kaldi – Originally released in 2011 and actively maintained. Kaldi is written in C++.
- CMUSphinx – A group of several speech recognition systems developed at Carnegie Mellon University. Sphinx4 is written in Java, while PocketSphinx is written in C.
- Simon – A toolkit built on top of other speech recognition platforms, with an easier-to-use interface. Simon is written in C++.
Using these toolkits, developers can build a multitude of applications. TTS has been especially popular in the e-learning market. Apps that read stories aloud to children, or even help a migrant worker learn the basics of a new language, are all possible thanks to speech recognition platforms.
To Learn More: