Tools to create speech-enabled applications

The current version of the Jaivox library (version 0.7a, August 2014) can be used to improve the accuracy of speech recognizers (please see the note below about the new maintenance version on GitHub). We have developed a graphical tool for creating voice user interfaces. This tool, the Jaivox Application Generator (JAG version 0.2a), creates speech applications starting from text files containing questions and answers. JAG generates applications that can be used with the Jaivox library to produce new speech applications or to add voice interfaces to existing applications. JAG now also generates Android applications.
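The exact format of JAG's question-and-answer files is not described on this page. As a minimal sketch, assume a hypothetical plain-text file in which each question line starts with `Q:` and is followed by its answer on an `A:` line. The class below (`QaTable`, a hypothetical name, not part of Jaivox) loads such pairs and looks up the answer to a recognized sentence after normalizing case and punctuation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not the JAG input format or the Jaivox API.
// Assumes question lines start with "Q:" and the next "A:" line holds the answer.
class QaTable {
    private final Map<String, String> answers = new LinkedHashMap<>();

    // Build the table from the lines of a question/answer text file.
    static QaTable fromLines(List<String> lines) {
        QaTable table = new QaTable();
        String pendingQuestion = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("Q:")) {
                pendingQuestion = normalize(line.substring(2));
            } else if (line.startsWith("A:") && pendingQuestion != null) {
                table.answers.put(pendingQuestion, line.substring(2).trim());
                pendingQuestion = null;
            }
            // Blank lines and any other lines are ignored.
        }
        return table;
    }

    // Lower-case, replace punctuation with spaces and collapse whitespace,
    // so "Is the highway fast?" and "is the highway fast" match.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .replaceAll("[^a-z0-9 ]", " ")
                   .trim()
                   .replaceAll("\\s+", " ");
    }

    // Returns the stored answer, or null if the question is unknown.
    String answer(String recognized) {
        return answers.get(normalize(recognized));
    }

    int size() {
        return answers.size();
    }
}
```

In a real application the lines could come from `java.nio.file.Files.readAllLines(path)`, and the returned answer would be handed to the synthesizer.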

What is new?

There is a new maintenance version, available from the downloads page and also on GitHub. This version deals with API changes to the Google speech recognizer. Please see the speech project for the Jaivox library and the gui project for the JAG tool (August 12, 2014).

Rajesh has a new way to create hands-free Android apps using the Jaivox Application Generator. The demo is an app for following a recipe for banana bread: the user can keep asking questions, presumably while their hands are messy. The same technique can be applied to adapt instruction manuals to other hands-free situations (May 12, 2014).

An article describes working with noisy speech, using a standard noisy speech corpus from the University of Texas at Dallas. We show that recognition of noisy speech is poor, even with some of our correction methods (April 14, 2014).

The current version of the Jaivox library, 0.7, contains some routines for improving recognizer accuracy. Another article shows how to get 99% accuracy with Google's recognizer (March 10, 2014).

A new application describes how to align words in audio with a text transcript (February 12, 2014).
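This page does not say how the accuracy routines in Jaivox 0.7 work. One common technique for closed-domain applications, where the set of expected questions is known in advance, is to replace the recognizer's raw output with the nearest expected sentence. The hypothetical `SentenceSnapper` class below sketches that general technique using word-level Levenshtein distance; it is an illustration, not the library's own method.

```java
import java.util.List;
import java.util.Locale;

// Sketch of a general accuracy technique for closed-domain speech apps:
// snap the raw recognizer output to the closest expected sentence,
// measured by word-level edit distance. Not taken from the Jaivox library.
class SentenceSnapper {
    private final List<String> expected;

    SentenceSnapper(List<String> expected) {
        this.expected = expected;
    }

    // Lower-case, drop punctuation and split into words.
    static String[] words(String s) {
        String t = s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", " ").trim();
        return t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    // Minimum number of word insertions, deletions and substitutions
    // needed to turn a into b (two-row dynamic programming).
    static int wordDistance(String[] a, String[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1].equals(b[j - 1]) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    // Returns the expected sentence closest to the recognizer output,
    // or null if even the best match differs by more than maxEdits words.
    String snap(String recognized, int maxEdits) {
        String[] heard = words(recognized);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : expected) {
            int d = wordDistance(heard, words(candidate));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return bestDistance <= maxEdits ? best : null;
    }
}
```

The `maxEdits` threshold lets an application say it did not understand rather than answer the wrong question; a suitable value depends on sentence length and would need tuning.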

The easiest way to create a speech application is to use the new Jaivox Application Generator.

You can also use the command-line Jaivox generator and the Jaivox library to create speech-enabled applications with open source software. Users can talk to these applications and hear simple answers to their spoken questions. A tutorial walks through an example, similar to an in-car speech recognition application, in which users ask about various roads and whether they are fast or convenient. Use the download link to get the Jaivox software and to see where to download the other required software. All of the software is open source, so you can provide your applications to users without licensing fees. (The Jaivox software you download is under the Apache license.)

How to add speech recognition to your applications

In the picture below, the Generator creates three separate programs (called agents): one for speech recognition, one for speech production and a third for managing conversations. The three programs communicate through sockets, so they can run on the same computer or on different computers. All of the programs use functions from the Jaivox library. The red boxes in the picture indicate things included in the download, the blue boxes represent things that are generated, and the other boxes indicate your applications or third-party programs (such as Sphinx, an open source speech recognizer).
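The wire protocol the generated agents actually use is not described here. As a minimal sketch, assuming each message is a single line of UTF-8 text, the hypothetical `AgentLink` class below shows the shape of the exchange: an interpreter agent listens on a socket, and a recognizer agent connects, sends the text it heard and reads back the answer that would then go to the synthesizer agent. The names and message format are assumptions, not the Jaivox API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical sketch: two agents exchanging one line of UTF-8 text per connection.
// The names and the message format are assumptions, not the Jaivox protocol.
class AgentLink {

    // Interpreter agent: listens on a free loopback port and answers each incoming
    // line on a background thread. Call stop(...) on the returned socket to shut down.
    static ServerSocket startInterpreter(Function<String, String> interpret) {
        ServerSocket server = openLoopback();
        Thread worker = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket client = server.accept();
                     BufferedReader in = reader(client);
                     PrintWriter out = writer(client)) {
                    String heard = in.readLine();
                    out.println(heard == null ? "" : interpret.apply(heard));
                } catch (IOException e) {
                    // accept() fails once the server socket is closed, ending the loop;
                    // other per-client I/O errors are ignored in this sketch.
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    // Recognizer agent: sends what it heard and waits for the interpreter's answer,
    // which would then be passed on to the synthesizer agent.
    static String send(String host, int port, String recognized) {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = writer(socket);
             BufferedReader in = reader(socket)) {
            out.println(recognized);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void stop(ServerSocket server) {
        try {
            server.close();
        } catch (IOException ignored) {
            // nothing useful to do if closing fails
        }
    }

    private static ServerSocket openLoopback() {
        try {
            // Port 0 asks the OS for any free port. Agents on different machines
            // would instead bind a fixed port on a reachable address.
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoflush = true, so println sends the line immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

Because the only coupling between the two sides is a host, a port and a line of text, the same code works whether the agents share one machine or run on several, which is the property the paragraph above relies on.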

The Jaivox generator can also create non-agent applications, where the speech recognizer and synthesizer are combined with the interpreter/dialog manager in a single program. In this case, you need FreeTTS, a Java-based synthesizer; you can also run espeak as a separate program, or use a web-based TTS service such as Google's.

We also have several experimental speech recognizers for over twenty world languages. Their main limitation is the lack of large transcribed audio samples. For English, we use some large trained acoustic models that come with the Sphinx speech recognizer. Although we have tried Voxforge models for Spanish and other major languages, the results are not yet very good. However, we have used Google's recognizer for many languages with very good results.
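For the non-agent applications described above, where recognizer, synthesizer and interpreter live in one program, the hypothetical sketch below shows the basic listen, interpret, speak loop without sockets. The `Recognizer`, `Synthesizer` and `SpeechPipeline` types are illustrative assumptions, not Jaivox classes; a real build would plug in a Sphinx or Google recognizer and FreeTTS, espeak or a web TTS service behind them.

```java
import java.util.function.Function;

// Hypothetical sketch of a non-agent application: recognizer, interpreter and
// synthesizer are objects in one process rather than separate socket agents.
// These interfaces are illustrative assumptions, not Jaivox classes.
interface Recognizer {
    // Returns the next recognized utterance, or null when the session ends.
    String listen();
}

interface Synthesizer {
    // A real implementation might call FreeTTS, run espeak, or use a web TTS service.
    void speak(String text);
}

class SpeechPipeline {
    private final Recognizer recognizer;
    private final Function<String, String> interpreter; // the dialog manager
    private final Synthesizer synthesizer;

    SpeechPipeline(Recognizer recognizer, Function<String, String> interpreter, Synthesizer synthesizer) {
        this.recognizer = recognizer;
        this.interpreter = interpreter;
        this.synthesizer = synthesizer;
    }

    // Listen, interpret and speak in a loop; returns the number of turns handled.
    int run() {
        int turns = 0;
        String heard;
        while ((heard = recognizer.listen()) != null) {
            synthesizer.speak(interpreter.apply(heard));
            turns++;
        }
        return turns;
    }
}
```

Since each component sits behind a one-method interface, stubs can stand in for the real recognizer and synthesizer during testing, and swapping in real engines does not change the loop.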

(April 2014)