Creating Spanish Applications

This article describes a Spanish application created to work with Sphinx. You can get more accurate recognition results by using the Jaivox libraries in combination with Google's web-based recognizer, but most of the details in this article are not relevant to a web-based recognizer.

Jaivox includes an application that is similar to other applications but where users interact in Spanish. It uses the Voxforge Spanish model.

All the programs and data files are included in the apps/spanish directory of the Jaivox library. The application also needs the Spanish Voxforge model from Sphinx Sourceforge.

The main thing added by Jaivox is the dialog or conversation model. Dialogs are needed to utilize speech recognition in applications.

Quick Start

You can check the dialogs created by Jaivox using the files in the apps/spanish/es_data directory. To see the questions that can be asked by the user, please see road.sent in this directory.
The dialog can be tested without Sphinx or the Spanish audio model from Voxforge. You will be typing in questions and getting the answers on the screen.

Run
terminalTest

which allows you to type questions. Type in questions in road.sent (and variations on those questions); the program will produce responses as in a conversation.

If you have installed the Voxforge audio models and Sphinx, you can test recognition with similar questions spoken into the microphone. To do so, run
liveTest
from the same directory (apps/spanish/es_data). The program here is similar to apps/1st, connecting to data contained in road.txt.

Creating Spanish applications

Jaivox dialogs can be created in any language. The process is illustrated here with a Spanish example, but the same approach should work for other languages, especially those with models available from Voxforge.

The Jaivox download includes a tutorial example in apps/test. Here we modify that example to make a similar one in Spanish.

The test example is generated using
java com.jaivox.tools.Jvgen test/test.conf

The file test.conf refers to various other files:

  • road.txt: some data about roads
  • road.dlg: the dialog model, which includes errors.dlg
  • road.spec: data specs that describe the data and values for grammar tags
  • common_en.txt: a list of common words in English
  • penn.txt: a list of common grammar tags (for example, NN means "noun")

Corresponding to these, the Spanish example translates some of the above files. We did not change road.txt, so the road names remain in English.

  • road_es.dlg: Spanish dialog model, which includes the Spanish errors_es.dlg
  • road_es.spec: data specs and grammar tags in Spanish
  • common_es.dlg: common words in Spanish

These are referenced in a configuration file spanish.conf.

You do not need to generate the files, since they are already generated and saved in apps/es_saved. If you want to recreate the Spanish example, the first step is to create the program using
java com.jaivox.tools.Jvgen spanish/spanish.conf
This creates the spanish/es directory and several files within that directory.

We modified the programs in spanish/es to create spanish/es_saved. In the following, assume you are working in the es_saved directory.

Some further modifications of the files here are necessary because we need to use a different audio model for Spanish.

We are using
voxforge-es-0.1.1
which can be downloaded from various places, including Sphinx Sourceforge.

To follow the arrangement we have used for the WSJ audio model, we organize some of the files here a bit differently.

Some information about using this model can be found on a CMU page about using models trained with SphinxTrain.

In this case, a trained model is included in the download. To organize files for our application, we first place the audio model voxforge_es_sphinx.cd_cont_1500 in our classpath. This directory is found in the voxforge-es-0.1.1/model_parameters directory of the files downloaded for the Spanish model.

We add the dictionary voxforge_es_sphinx.dic contained originally in voxforge-es-0.1.1/etc to a directory voxforge_es_sphinx.cd_cont_1500/dict to be consistent with the WSJ model's arrangement.
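The two file-arrangement steps above can also be scripted. The following is a minimal sketch, not part of Jaivox: the class name SpanishModelLayout and the method installDictionary are hypothetical, and only the directory and file names come from the text above. It copies the dictionary from the etc directory of the download into a dict directory inside the audio model.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not part of Jaivox or Sphinx) for arranging the Spanish model files.
public class SpanishModelLayout {

    // downloadRoot: the unpacked voxforge-es-0.1.1 directory.
    // modelDir: the voxforge_es_sphinx.cd_cont_1500 directory that is on the classpath.
    // Returns the path of the installed dictionary.
    public static Path installDictionary(Path downloadRoot, Path modelDir) throws IOException {
        Path source = downloadRoot.resolve("etc").resolve("voxforge_es_sphinx.dic");
        Path dictDir = modelDir.resolve("dict");
        Files.createDirectories(dictDir); // no error if the directory already exists
        Path target = dictDir.resolve("voxforge_es_sphinx.dic");
        return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java SpanishModelLayout <voxforge-es-0.1.1 dir> <model dir>");
            return;
        }
        Path installed = installDictionary(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Dictionary installed at " + installed);
    }
}
```

Copying the file by hand works equally well; the sketch only records the expected layout in one place.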

We modify the live.xml generated by Jvgen to create live_es.xml, which refers to the dictionary and other parameters corresponding to the Spanish audio model. Several places are changed; you can compare a generated live.xml with live_es.xml to see all the differences. In particular, the loader has to be configured to use the model created by Voxforge. Specifically, the loader should have a vectorLength parameter set to the value 13. Without this setting, Sphinx throws an array index out of bounds exception.
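Because a missing vectorLength setting only shows up as an exception at run time, it can help to check the configuration file beforehand. The sketch below is an illustration, not part of Jaivox or Sphinx. It assumes the configuration declares settings as property elements with name and value attributes, which you should confirm against your own live_es.xml; the class name VectorLengthCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical checker: looks for a named property anywhere in a configuration file.
public class VectorLengthCheck {

    // Returns the value of the first <property name="..."> that matches, or null if none does.
    // Assumption: settings are written as <property name="..." value="..."/> elements.
    public static String findProperty(String xml, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            if (propertyName.equals(p.getAttribute("name"))) {
                return p.getAttribute("value");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java VectorLengthCheck <path to live_es.xml>");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String value = findProperty(xml, "vectorLength");
        System.out.println(value == null ? "vectorLength is not set" : "vectorLength = " + value);
    }
}
```

Running it against live_es.xml should report the value 13 if the file was edited as described above.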

Corresponding to the use of live_es.xml, liveTest.java is modified to use this xml file instead of live.xml.

We have included the language model here, but if you want to create it yourself, you can run

sh lmgen.sh

(assuming you have all the components needed for the language model and sphinxbase.)

To compile live.java

sh complive.sh

To run the program

sh runlive.sh

There are a few problems that need to be solved at this point.

1. The Spanish dictionary does not contain many of the words used in the dialogs. These include simple words as well as foreign words like the street names.

2. We are using FreeTTS for synthesizing the answers. However, FreeTTS does not come with a Spanish voice. Even Festival, as it is released now, does not come with a Spanish voice, though it is possible to get an old version of a Castilian Spanish speaker's voice for Festival.

3. The example here has to be adapted to use the data properly, in the same way that the test example was adapted in the "1st" example.
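For the first problem, it is useful to know exactly which dialog words the dictionary lacks. The sketch below is one possible way to list them and is not part of Jaivox. It assumes the usual Sphinx dictionary layout, where each line starts with a word followed by its phones and alternate pronunciations are written as word(2). It also assumes that comparing in lower case is acceptable. The class name MissingWords and the sample data in main (including the road name Elmwood) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: reports words used in sentences that have no dictionary entry.
public class MissingWords {

    // Collects the head word of each dictionary line, dropping variant markers such as "(2)".
    public static Set<String> dictionaryWords(List<String> dicLines) {
        Set<String> words = new HashSet<>();
        for (String line : dicLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String head = trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
            int paren = head.indexOf('(');
            if (paren > 0) {
                head = head.substring(0, paren);
            }
            words.add(head);
        }
        return words;
    }

    // Splits each sentence on non-letter characters and returns the words not in the dictionary.
    public static SortedSet<String> missingWords(List<String> sentences, Set<String> dictionary) {
        SortedSet<String> missing = new TreeSet<>();
        for (String sentence : sentences) {
            // \p{L} matches any Unicode letter, so accented words are kept whole
            for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!token.isEmpty() && !dictionary.contains(token)) {
                    missing.add(token);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Made-up sample data; in practice read the .dic file and road.sent instead.
        Set<String> dict = dictionaryWords(Arrays.asList("camino k a m i n o", "es e s"));
        List<String> sentences = Arrays.asList("Es Elmwood un camino?");
        System.out.println(missingWords(sentences, dict));
    }
}
```

Each word reported this way needs a pronunciation entry added to the dictionary before Sphinx can recognize it.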

Using festival Spanish voice

The current version of Festival does not seem to come with a Spanish voice, but an older version is currently (April 2013) available on the web from the Centre for Speech Technology Research at the University of Edinburgh.

What you need is the file
festvox_ellpc11k.tar.gz

Listing the contents of this archive should show the following files, which you get after unpacking it:

$ tar -ztvf festvox_ellpc11k.tar.gz
-rw-r--r-- awb/cstr    2061153 1998-07-08 12:14 festival/lib/voices/spanish/el_diphone/group/ellpc11k.group
-rw-r--r-- awb/cstr      10892 1999-06-18 14:12 festival/lib/voices/spanish/el_diphone/festvox/el_diphone.scm
-rw-r--r-- awb/cstr      21903 1999-06-10 01:28 festival/lib/voices/spanish/el_diphone/festvox/spanlex.scm
-rw-r--r-- awb/cstr       3439 1999-06-10 01:27 festival/lib/voices/spanish/el_diphone/festvox/spanint.scm
-rw-r--r-- awb/cstr       8834 1999-06-18 14:13 festival/lib/voices/spanish/el_diphone/festvox/sptoken.scm
-rw-r--r-- awb/cstr       2038 1999-06-10 01:32 festival/lib/voices/spanish/el_diphone/COPYING

As superuser, copy the unpacked spanish directory to /usr/share/festival/lib/voices, creating /usr/share/festival/lib/voices/spanish, which contains the directory el_diphone.

You can use apps/spanish/agents.conf to generate an agent-based system that uses the Spanish voice for Festival. The generated files should be modified to work with the Voxforge Spanish corpus.

We have all the required modifications already in spanish/agents_saved.

The files generated for agents_saved/sphinx and agents_saved/festival are modified slightly. For agents_saved/sphinx/sphinxTest.java, we copy over the live_es.xml from the es_saved directory to use instead of the generated road.config.xml.

In the agents_saved/festival directory, we modify CxxResponder.cc to include an initialization for the Spanish voice.

In CxxResponder::handleFestival:

	if (!festival_initialized) {
		festival_initialize (load_init_files, heap_size);
		festival_eval_command("(voice_el_diphone)"); // added this line
		festival_initialized = 1;
	}

Now compile and execute the agents in each of the directories in agents_saved.

First start the interpreter with: java interTest
Then start festival with: ./festivaltest
Then start the sphinx server with: java sphinxTest

(Run each in its own window, as in the Jaivox tutorial; see the end of the tutorial for how to test the agents.)

Assuming that something is recognized by sphinx, a message will be sent to inter, which will then send a response to festival. You should then be able to hear the response in a male (Castilian) Spanish voice.

Translation notes

The following notes are regarding the translations done in the various files to convert from English to Spanish.

In answer_es.txt, I changed "I guess" to "Creo que," which actually means "I think/believe that." "I guess" simply doesn't sound like something that would be said very often in Spanish. I also included the phrases meaning both "It seems like" and "it seems to me like"; there is no difference between "it looks like" and "it seems like" (in this context) in Spanish. I deleted "apparently," because while Spanish does have an equivalent to that, it is not a word that seems to be terribly common in Spanish.

Instead of "that is the case," I have "eso es" (which means 'that's right') and "es así" (literally 'it is like that'), since a literal translation of this phrase would sound somewhat odd in Spanish. Instead of "the answer is yes," I have simply "sí"; a literal translation "la respuesta es sí" is also possible, but I would have had to perform a Google search to figure that out. "Creo que sí" (I think so) is surely much more common than "Creo que la respuesta es sí" (I think the answer is yes) anyway (but if desired, we may also add in "la respuesta es sí"). "That is true" may be either "es verdad" or "es cierto." (Note that I did not actually translate "that" in any of these sentences).

I added "es equivocado" meaning 'it is wrong'. I also added "No entiendo su pregunta" (I do not understand your question) as well as "No puedo entender su pregunta" (I cannot understand your question). Instead of "Sorry about being so dense," I included the phrases for: "I'm sorry, I cannot figure it out," "I'm sorry but I do not understand," and "I'm sorry but I do not know." For "Well, may be I am not figuring it out right," I have something that means "It is possible that I am wrong," and instead of "Really sorry about this," I have both "I am very sorry" and "I am sorry, I am not sure."

Instead of "is there another way to ask what you need" and "perhaps you can reformulate the question," I wrote the Spanish equivalents of: "is there another way to ask for what you are looking for," "is there another way to ask the question," and "may you be able to repeat the question another way." Under "oneitem," all of the Spanish expressions mean something along the lines of 'maybe' or 'probably', except the last one, which expresses certainty.

English has these two terms "for example" and "for instance"; Spanish does not. Therefore, I included only one Spanish term (por ejemplo) to cover both. The questions under "askanother" are identical to those in "followup"; if desired, the last question in this section may be deleted, since it is asking the listener to repeat the question. Finally, for "well, I don't know," I translated "well" as both "pues" and "bueno" (because I have heard both used that way), and the rest is simply "no sé" ('I don't know'). The expression after that ("lo siento pero no sé") means 'I am sorry but I don't know'.

In road_es.spec, I added a tag WPS. WP modifies singular nouns, whereas WPS modifies plural nouns. I included this tag in order to ensure that "cuáles" is selected to modify only plural nouns and that "cuál" is selected to modify only singular ones. I split "NN" into two categories, "NN-M" for masculine singular nouns (currently only "camino" meaning 'road' or 'path') and "NN-F" for feminine singular nouns (everything else under NN).

Similarly, I have "NNS-M" and "NNS-F" instead of just "NNS"; "JJM-P" for masculine singular positive adjectives, "JJF-P" for feminine singular positive adjectives, "JJM-PS" for masculine plural positive adjectives, "JJF-PS" for feminine plural positive adjectives, and something similar with "JJM-N," "JJF-N," "JJM-NS," and "JJF-NS"; and "RBR-M," "RBR-F," "RBR-MS," "RBR-FS," "RBS-M," "RBS-F," "RBS-MS," "RBS-FS." I also have "JJM-PB" for the adjective "buen" meaning 'good' (masculine singular), which precedes the noun instead of following it (compare "bueno," which would be the same word occurring after the noun).
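To make the naming scheme concrete, here is a small sketch that decodes the noun and adjective tags described above and checks whether a pair agrees in gender and number. It only illustrates how the tag names are put together and is not how Jaivox itself processes tags. The class name AdjectiveTag is hypothetical, and the special tag "JJM-PB" is deliberately left out of the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical decoder for the Spanish adjective tags (JJM-P, JJF-PS, JJM-N, ...).
public final class AdjectiveTag {

    // JJ + gender (M or F) + "-" + polarity (P or N) + optional S for plural
    private static final Pattern ADJECTIVE = Pattern.compile("JJ([MF])-([PN])(S?)");
    // NN + optional S for plural + "-" + gender (M or F)
    private static final Pattern NOUN = Pattern.compile("NN(S?)-([MF])");

    public final boolean feminine;
    public final boolean plural;
    public final boolean positive;

    private AdjectiveTag(boolean feminine, boolean plural, boolean positive) {
        this.feminine = feminine;
        this.plural = plural;
        this.positive = positive;
    }

    public static AdjectiveTag parse(String tag) {
        Matcher m = ADJECTIVE.matcher(tag);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an adjective tag: " + tag);
        }
        return new AdjectiveTag(m.group(1).equals("F"), m.group(3).equals("S"), m.group(2).equals("P"));
    }

    // A noun tag and an adjective tag agree when both gender and number match.
    public static boolean agrees(String nounTag, String adjectiveTag) {
        Matcher noun = NOUN.matcher(nounTag);
        if (!noun.matches()) {
            throw new IllegalArgumentException("not a noun tag: " + nounTag);
        }
        AdjectiveTag adjective = parse(adjectiveTag);
        boolean nounPlural = noun.group(1).equals("S");
        boolean nounFeminine = noun.group(2).equals("F");
        return nounPlural == adjective.plural && nounFeminine == adjective.feminine;
    }

    public static void main(String[] args) {
        System.out.println(agrees("NN-M", "JJM-P"));   // true: both masculine singular
        System.out.println(agrees("NNS-F", "JJM-PS")); // false: genders differ
    }
}
```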

In road_es.dlg, I made no distinction between "are NNS JJ-N" and "are the NNS JJ-N," because these would be expressed the same way in Spanish, and I took care to distinguish masculine from feminine nouns and singular from plural. (I also did not repeat the question "are the NNS JJ-N," which appears twice in the English version).

Also, Spanish word order in questions is a little bit more flexible than English word order, so I included two possible word orders. I did not really translate "do the NNS get JJ-N at this time," because as far as I can tell, this is the same in Spanish as "are NNS JJ-N at this time."

Since it is not clear how formal the system needs to be (and levels of formality vary not only with the social situation but also geographically in the Spanish-speaking world), I include both formal and informal versions of the questions (at least in some cases; in other questions, I simply include the formal version). Note also that I am assuming the proper nouns are masculine, especially since they do not sound even remotely like actual road names in Spanish-speaking countries (and masculine is sort of the default gender in Romance languages).

I didn't translate "is NNP DT NN," which won't work in the English either because DT is not specified as a tag. And instead of "is NNP a JJS-P NN," I translated "is NNP *the* JJS-P NN," which makes more sense in English (not just Spanish).

In addition to "WP is a JJ-P NN" and "WP are the JJ-P NNS," I have the equivalent of "WP are some JJ-P NNS" in Spanish, and instead of "WP NN JJ-P," I have the equivalent of "WP NN is JJ-P." In addition to things like "what other JJ-P NN," I have things like "what other NN is JJ-P" and "is there another JJ-P NN." I also added "another NN besides NNP," "what other NN besides NNP," and "what other NN is JJ besides NNP" (where "JJ" is intended to mean both JJ-P and JJ-N). Instead of "any other JJ NN other than NNP," in Spanish, I have the equivalents of "another JJ NN besides NNP" and "is there another JJ NN besides NNP." For "anything ELS," I have "anything else," and I also added "and apart from that" and "what else is there."

Then under "intro," instead of "I guess," "it looks like," "it seems like," and "apparently," I have only "it looks/seems like" and "apparently." (Perhaps these were the only expressions that I thought seemed appropriate in this context?).

In dstates_es.dlg, I added one line that means, "(apology1) An error has come up. Please try again later." ("apology1 se ha producido un error, por favor vuelva a intentarlo más tarde ;") The translation in "repeatyn," as you can probably tell at a glance, is not literal or exactly equivalent to the English. Instead of "again, are you asking," in Spanish, I have the equivalent of "I would like to repeat the question to make sure that I understand it. Are you asking..." Instead of "once more, is your question," in Spanish, I have the equivalent of "I don't know whether I understand the question. Let me repeat it. Is your question..." Also, both "yes, there are more" and "yes, there is more" are translated with the same phrase in Spanish. I have six expressions under yes.continue; I guess I could also include "está bien" which is another way of saying "OK." (There's only one way to say 'yes', though). "No" translates both "nope" and "nah" (and I guess also "not"), and instead of "not," I have "No, that is false."

Instead of "the question is what i asked," I have the equivalents of "that IS what I asked" and "that was my question" (expressed two different ways in Spanish). Instead of "I apologize" and "sorry for this," I have something that means something like "forgive/excuse me" and "please forgive/excuse me." Instead of "I must apologize" and "my apologies," I have "I beg pardon" (or something along those lines) and "I am really sorry." And a more literal translation of the most extreme apologies would be "a thousand pardons" and "I really am very sorry." In the last line of this file, the last sentence specifically says that the system is having TECHNICAL problems. Hopefully, that is not a problem.