2017-08-23 10:50:32 +0200
I keep hitting the lack of TTS facilities in Sailfish while developing for it. From a developer's point of view, we would like an API that lets us synthesize voice prompts in given languages (to WAV or live) and query the installed languages. Such an API is currently not available to us. In this post, I would like to summarize what I have found so far, in the hope that it is useful for others.
In general, after looking into the area a bit, it seems that open-source TTS for Linux has a long way to go. See https://opensource.com/life/15/8/interview-ken-starks-texas-linux-fest for some background information.
What makes our situation on SFOS rather complicated is that we are expected to have TTS for many languages, as on other mobile platforms. As mentioned by many others, while espeak does support many languages, its voice output is rather poor, to put it mildly.
At present, we have reasonable coverage for English via Mimic (based on Flite) and for a few other languages (de, es, fr, it) via PicoTTS (both are available at openrepos). These are tools that generate a WAV file from text. Playing the WAV file is the responsibility of the app requesting it.
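To illustrate the "text in, WAV out" workflow, here is a rough Python sketch of how an app could wrap these command-line tools. The function names are made up for this post, and the flags (`mimic -t/-o`, `pico2wave -l/-w`) are what I believe the upstream tools accept; the packages on openrepos may differ, so check `--help` on the device first.

```python
import subprocess


def build_wav_command(engine, text, wav_path, lang="en-US"):
    """Return the argv list for a TTS tool that writes a WAV file.

    Assumption: 'mimic' and 'pico2wave' are on PATH and accept the
    flags used below (verify with --help on your build).
    """
    if engine == "mimic":
        # Mimic (Flite-based): -t takes the text, -o the output file.
        return ["mimic", "-t", text, "-o", wav_path]
    if engine == "pico":
        # PicoTTS: -l selects the language, -w the output WAV file.
        return ["pico2wave", "-l", lang, "-w", wav_path, text]
    raise ValueError("unknown TTS engine: %s" % engine)


def synthesize_to_wav(engine, text, wav_path, lang="en-US"):
    """Run the tool; playing the resulting WAV is left to the caller."""
    subprocess.run(build_wav_command(engine, text, wav_path, lang), check=True)
    return wav_path
```

An app would call something like `synthesize_to_wav("pico", "Links abbiegen", "/tmp/prompt.wav", lang="de-DE")` and then hand the file to its own audio player.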
As you can see, many languages are missing. On Linux, we could also use MaryTTS (http://mary.dfki.de/), which uses Java to generate speech. As highlighted by Ken Starks (see link above), it's probably the best tool available right now for many languages, which is no surprise since it uses a unit selection technique for many of them. The RAM requirements are probably significant (expect 500 MB RAM, get surprised if less), but phones do get more RAM these days. As for CPU requirements, no idea - haven't tested it. We would need Java (non-GUI) to run it, but that's probably possible as well.
Now coming back to the API: Linux has Speech Dispatcher, which seems to be an interface between TTS-requiring apps and TTS synthesis engines. Speech Dispatcher is what Qt Speech (5.9) uses, as far as I can see. Maybe that could be a solution for us as well, allowing us to specify in one place the preferred TTS synthesizer as well as the preferred voice (male/female, voice model). In theory, it would be possible to make a GUI for managing the voices and engines. Note that some voices could be rather large (100+ MB).
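To give an idea of what "one place to pick engine and voice" could look like from an app's side, here is a small Python sketch around `spd-say`, the command-line client shipped with Speech Dispatcher. The helper names are hypothetical. The options used (`-o` output module, `-l` language, `-t` voice type) are the ones I know from desktop Linux; whether Speech Dispatcher can be packaged for SFOS at all is an open question, so treat this as an assumption.

```python
import subprocess


def build_spd_say_command(text, language=None, voice_type=None, module=None):
    """Return the argv list for speaking `text` via Speech Dispatcher.

    Assumption: spd-say is installed and accepts -o/-l/-t as on
    desktop Linux (verify with `spd-say --help`).
    """
    cmd = ["spd-say"]
    if module:
        cmd += ["-o", module]        # synthesis engine (output module)
    if language:
        cmd += ["-l", language]      # language code, e.g. "de"
    if voice_type:
        cmd += ["-t", voice_type]    # e.g. "male1" or "female1"
    cmd.append(text)
    return cmd


def say(text, **options):
    """Speak the text live; the daemon routes it to the chosen engine."""
    subprocess.run(build_spd_say_command(text, **options), check=True)
```

The point is that the app only names a language and a voice type, while the choice of engine can stay in the system-wide Speech Dispatcher configuration.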
There are also several companies working in the area and, maybe, that is the way to do it. Several companies have developed TTS solutions for Linux, ARM included. It looks to me that they prefer a B2B model, but I haven't been in touch with any of them. Maybe someone at Jolla could contact them and ask whether they would be interested in selling their solution to people running SFOS? Ideally, it should allow users of all devices (ported, SFOS from Jolla, SFOS from RU) to purchase the software and languages separately.
With the current developments, I think that TTS is becoming a necessity and an expected way for a device to communicate in certain situations.
TTS is missing. Who doesn't remember how handy it was to listen to SMS in the car on classic Symbian? See also https://together.jolla.com/question/31059/feature-request-announce-caller-name/
pan tau ( 2015-02-22 05:44:35 +0200 )
Also, the voice navigation in Here Maps gives street names if a TTS is properly configured in the system (and interfaced with Android apps).
Federico ( 2017-08-24 17:48:43 +0200 )