Language Recognizers

Jaivox has developed demonstration programs for several languages. The programs use Carnegie Mellon University's Sphinx tools.

For more details on how to make speech recognizers, see How To Make Recognizers.

For each language, Jaivox has developed a "model". A model is a way to tell Sphinx how to recognize speech from that language.

Speech recognizers are trained on one set of data and then tested on another. For each language, we train a speech recognizer on about 200 sentences of text and the corresponding audio recordings. We then use audio recordings of ten sentences in the same language to see whether Sphinx can use that training to recognize the text that was spoken.
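The idea of training on one set of sentences and testing on another can be sketched in a few lines of Python. This is only an illustration: the utterance names (utt_000 and so on) and the function split_utterances are hypothetical, and the sketch assumes the ten test sentences are kept apart from the roughly 200 training sentences.

```python
def split_utterances(utterance_ids, n_test=10):
    """Hold out the last n_test utterances for testing; train on the rest."""
    if n_test <= 0 or n_test >= len(utterance_ids):
        raise ValueError("n_test must be between 1 and len(utterance_ids) - 1")
    return utterance_ids[:-n_test], utterance_ids[-n_test:]


# Hypothetical utterance names: 200 for training plus 10 for testing.
ids = [f"utt_{i:03d}" for i in range(210)]
train_ids, test_ids = split_utterances(ids)
print(len(train_ids), "training utterances,", len(test_ids), "test utterances")
```

Keeping the two sets disjoint means the test measures whether the recognizer generalizes, rather than whether it remembers recordings it was trained on.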

You need the data downloaded from here and two tools from Sphinx. To train a speech recognizer, you need SphinxTrain. To recognize the ten sentences, you need Sphinx3. Sphinx3 can produce phonemes from audio recordings, which we then use to reconstruct the words that match those phonemes.
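To give a feel for what reconstructing words from phonemes involves, here is a minimal sketch in Python. It is not the method used by Jaivox or Sphinx3: it assumes a small pronunciation lexicon with exact matches only, and the function phonemes_to_words, the toy lexicon and the phoneme symbols are all hypothetical. A real recognizer also has to tolerate phoneme errors.

```python
def phonemes_to_words(phonemes, lexicon):
    """Segment a phoneme list into words using exact lexicon matches.

    lexicon maps a tuple of phonemes to a word. Returns the segmentation
    with the fewest words, or None if the sequence cannot be covered.
    """
    n = len(phonemes)
    # best[i] is a word list covering phonemes[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    max_len = max((len(key) for key in lexicon), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if best[start] is None:
                continue
            word = lexicon.get(tuple(phonemes[start:end]))
            if word is None:
                continue
            candidate = best[start] + [word]
            if best[end] is None or len(candidate) < len(best[end]):
                best[end] = candidate
    return best[n]


# Toy lexicon for illustration only (not taken from any Jaivox model).
toy_lexicon = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}
print(phonemes_to_words("HH AH L OW W ER L D".split(), toy_lexicon))
# prints ['hello', 'world']
```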

For each language, our downloaded data contains three subdirectories. For example, suppose you want to try the (Modern Standard) Arabic recognizer. You need the contents of the directory called "ar" (we use a two-letter code for each language).

ar contains three subdirectories:
1. etc - files that tell SphinxTrain how to train the speech recognizer
2. wav - the audio recordings used for training
3. speech2text - the files needed for testing
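Before training, it can help to confirm that the download has the layout listed above. The following is a minimal sketch; the function name missing_subdirs is hypothetical, and the path downloads/ar is only an example location for the downloaded data.

```python
from pathlib import Path

# The three subdirectories each language directory is expected to contain.
REQUIRED_SUBDIRS = ("etc", "wav", "speech2text")


def missing_subdirs(language_dir):
    """Return the names of required subdirectories that are not present."""
    root = Path(language_dir)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Example location only; adjust to wherever the data was unpacked.
    missing = missing_subdirs("downloads/ar")
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```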

We have developed our models using SphinxTrain-1.0 and Sphinx3. You need to download these from Carnegie Mellon's site.

As of February 11, 2011, SphinxTrain-1.0 was available from

Sphinx3 is an "older" tool, but it is necessary for our models.
We use Sphinx3 to "decode phoneme sequences" as described in

To train the Arabic model: Suppose the ar model files you have downloaded from here are in downloads/ar.

Create a new directory, say test, and copy our downloaded directory recomp into it.

Training Steps
1. Download SphinxTrain-1.0 to the directory test
2. Build the programs in it. This creates the executable files needed for training.
3. Make sure that your path contains Perl.
4. cd to the test/SphinxTrain-1.0 directory and enter
perl scripts_pl/ ar
5. This will create a new directory test/ar
6. Copy all the files from downloads/ar/etc to test/ar/etc.
7. Copy all the files from downloads/ar/wav to test/ar/wav
8. Copy all the files from downloads/ar/speech2text to test/ar/speech2text. You will need to create this directory in test/ar; the etc and wav directories will already exist in test/ar before you copy the files.
9. cd to test/ar then enter perl scripts_pl/
10. If everything goes well, the training will be completed in a few minutes.
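The three copy steps (6 to 8) can also be scripted. The sketch below is one way to do it in Python, assuming Python 3.8 or later and the directory names used in the steps above; the function copy_model_files is hypothetical and is not part of SphinxTrain or the Jaivox download.

```python
import shutil
from pathlib import Path


def copy_model_files(download_dir, task_dir,
                     subdirs=("etc", "wav", "speech2text")):
    """Copy the downloaded model files into the SphinxTrain task directory."""
    for name in subdirs:
        src = Path(download_dir) / name
        dst = Path(task_dir) / name
        # dirs_exist_ok=True (Python 3.8+) merges into the existing etc and
        # wav directories and creates speech2text if it does not exist yet.
        # Files with the same name in the destination are overwritten.
        shutil.copytree(src, dst, dirs_exist_ok=True)


# Example call, using the paths from the steps above:
# copy_model_files("downloads/ar", "test/ar")
```

Run it after step 5, once test/ar exists, and then continue with step 9.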

To test the Arabic model:
1. cd to test/ar/speech2text
2. run
3. If everything goes well, you should recreate the file arrecomp.txt, which shows the recognition results for the ten recordings in the ar/speech2text directory.
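One way to check the result is to compare the recreated arrecomp.txt with the copy that came with the download. The sketch below makes two assumptions that are not stated above: that the original file is still available (for example in downloads/ar/speech2text) and that both files are UTF-8 text. The function differing_lines is hypothetical.

```python
from itertools import zip_longest
from pathlib import Path


def differing_lines(expected_path, actual_path, encoding="utf-8"):
    """Return (line number, expected, actual) for every line that differs.

    Trailing whitespace is ignored, and a line missing from the shorter
    file is treated as an empty line.
    """
    expected = Path(expected_path).read_text(encoding=encoding).splitlines()
    actual = Path(actual_path).read_text(encoding=encoding).splitlines()
    diffs = []
    pairs = zip_longest(expected, actual, fillvalue="")
    for number, (exp, act) in enumerate(pairs, start=1):
        if exp.rstrip() != act.rstrip():
            diffs.append((number, exp, act))
    return diffs


# Example call, assuming these locations:
# differing_lines("downloads/ar/speech2text/arrecomp.txt",
#                 "test/ar/speech2text/arrecomp.txt")
```

An empty list means the two files agree line by line.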