The media industry is a major source of growth for speech technologies. It also constitutes a real-life technological testing ground.
The publishing industry exemplifies this growing demand, with the rapid expansion of digital books over the last few years. Voxygen is committed to providing a full suite of modules to vocalize digital books of all kinds. This effort revolves largely around bringing to market voices capable of conveying diverse expressive styles.
Building on our expertise in designing natural voices, we are striving to expand our catalog of multi-expressive voices. Their tone can be altered at will, simply by inserting tags into the text feed. We also develop pre-processing modules that can isolate and format specified textual items. Beyond raw text and depending on the type of book, these items may include mathematical equations, chemical formulas or computer code.
Voxygen also enhances the reading experience through synchronous text highlighting, letting the user know at a glance what point the voice has reached. This functionality has proved especially beneficial to those learning how to read and to foreign-language students.
Of course, the ability to faultlessly vocalize any kind of text carries over to the web, where it can be applied to any kind of content, from news reports to RSS feeds and blogs.
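To illustrate how synchronous highlighting can work in principle, here is a minimal sketch in Python. It assumes the speech engine can supply a start and end time for each spoken word; the `timings` format and the `word_at` helper are hypothetical names chosen for this illustration, not part of any Voxygen interface.

```python
from bisect import bisect_right

def word_at(timings, t):
    """Return the index of the word being spoken at time t (in seconds).

    `timings` is an assumed list of (start, end, word) tuples sorted by
    start time. Returns None during silence or outside the recording.
    """
    starts = [start for start, _end, _word in timings]
    # Find the last word that starts at or before t.
    i = bisect_right(starts, t) - 1
    if i < 0:
        return None
    _start, end, _word = timings[i]
    # Highlight the word only while it is still being spoken.
    return i if t < end else None
```

A reading application would call such a function on every playback tick and highlight the word at the returned index, clearing the highlight when the result is `None`.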
Voxygen-powered TTS also finds its way into more playful applications, such as electronic greeting cards that are read out loud by a voice of the user's choosing.
Finally, radio and TV are showing growing interest in TTS. Here again, our technology serves to vocalize a variety of content, including upcoming programs, breaking news, commercials and classifieds.
Furthermore, the market for automatic transcription of audio recordings is now mature, and Voxygen offers a range of services that help unlock the real value of this content.
The available technology makes it possible to:
- segment the audio stream into coherent segments (music, microphone speech, speaker, telephone speech, music + voice)
- perform speaker segmentation and tracking within the speech segments
- train speaker-specific models in order to perform speaker identification
- perform a full transcription of the speech segments, using acoustic models adapted to the type of speech (telephone or microphone) as well as language models adapted to the style and domain of the speech to be indexed
These tools can be used to automatically index large amounts of audio/video content (files, web content, etc.), to automatically extract named entities linked to a specific taxonomy or existing ontology defined by the client, and so on.
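As a rough illustration of the indexing step, the following Python sketch assumes that segmentation, speaker identification and transcription have already produced a list of time-stamped, speaker-labeled segments. The `Segment` record and the `build_index` and `search` helpers are hypothetical names invented for this example; they show one simple way to build a keyword index, not the method actually used in Voxygen's services.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    """One transcribed speech segment (assumed output of the pipeline)."""
    source: str    # file or URL the audio came from
    start: float   # start time in seconds
    end: float     # end time in seconds
    speaker: str   # label from speaker identification
    text: str      # transcription of the segment

def build_index(segments):
    """Map each lowercased word to the positions of the segments containing it."""
    index = defaultdict(set)
    for i, seg in enumerate(segments):
        for token in re.findall(r"\w+", seg.text.lower()):
            index[token].add(i)
    return index

def search(index, segments, query):
    """Return the segments that contain every word of the query."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [segments[i] for i in sorted(hits)]
```

Because each hit keeps its source, time span and speaker label, a search result can point directly to the moment in the recording where the words were spoken.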