Transcribing And Aligning Conversational Speech: A Hybrid Pipeline Applied To French Conversations
Workshop on Automatic Speech Recognition and Understanding (ASRU).
Hiroyoshi Yamasaki, #Jérôme Louradour, #Julie Hunter, Laurent Prévot
The appearance of “hands-free” voice interfaces in technology, ranging from device controls to search, marks the beginning of a long-term trend in digital productivity tools and puts speech recognition and transcription (along with natural language understanding) at the forefront of AI research. At LINAGORA Labs, we are continually improving the algorithms behind our speech-to-text engine and speech generation models. Our primary focus is French, though we are working to extend our offerings to a variety of European languages while upholding GDPR standards for user privacy and data autonomy.
Our approach to keyword spotting focuses on developing reliable, multi-platform, small-footprint open-source software to detect wake words in streams of spoken language. To this end, we develop a training methodology for easily producing detection models for customized keywords, along with packaged, ready-to-use implementations of these models for target platforms. This methodology is based on a rigorous comparison of state-of-the-art data processing algorithms, including neural network architectures, to strike a balance between computational cost and accuracy.
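Detecting a wake word in a stream usually involves turning frame-level detector outputs into discrete trigger events. A common post-processing step in small-footprint keyword spotters, sketched here as an assumption rather than as LinTO's actual implementation, is to average per-frame keyword posteriors from a neural detector over a short sliding window, fire when the smoothed score crosses a threshold, and then wait out a refractory period so a single utterance triggers only once. The function `detect_wake_word` and its `window`, `threshold`, and `refractory` parameters are hypothetical.

```python
from collections import deque
from typing import Iterable, List


def detect_wake_word(posteriors: Iterable[float], window: int = 5,
                     threshold: float = 0.8, refractory: int = 10) -> List[int]:
    """Return the frame indices at which a wake word is detected.

    Hypothetical sketch: `posteriors` holds, for each audio frame, the
    probability (0..1) that the keyword is being spoken, as output by some
    small neural detector that is not shown here.
    """
    recent = deque(maxlen=window)  # sliding window of the latest posteriors
    detections: List[int] = []
    cooldown = 0                   # frames left before another detection is allowed
    for t, p in enumerate(posteriors):
        recent.append(p)
        if cooldown > 0:
            cooldown -= 1
            continue
        smoothed = sum(recent) / len(recent)  # moving average damps single-frame spikes
        if len(recent) == window and smoothed >= threshold:
            detections.append(t)
            cooldown = refractory  # suppress repeated triggers for the same utterance
    return detections


# Ten frames of background, eight frames of keyword, ten frames of background.
stream = [0.1] * 10 + [0.95] * 8 + [0.1] * 10
print(detect_wake_word(stream))  # prints [14]
```

Raising `window` or `threshold` in such a scheme trades fewer false triggers for slower or missed detections, which is one of the knobs a comparison of detection pipelines would tune.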
To respond to requests such as “Turn on the lights in the meeting room” and questions such as “How is the weather in Toulouse?”, our LinTO assistant uses high-performance command models tailored to specific business use cases. By curating a small, targeted vocabulary, we reduce the size of the command model's language model component, which improves both its accuracy and its computational efficiency. The resulting compact, customized command models can then be embedded in IoT devices, so that voice data stays as close as possible to its source, thus respecting user privacy.
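To illustrate why a curated vocabulary yields a smaller language model, the toy sketch below estimates an unsmoothed maximum-likelihood bigram model from a handful of command sentences. It is an assumption for illustration, not the LinTO command-model format: the command list, the function names, and the choice of a bigram model are all hypothetical. Because only word pairs that occur in the commands receive non-zero probability, the model stays tiny and out-of-domain requests score zero.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical command corpus for a meeting-room assistant.
COMMANDS = [
    "turn on the lights",
    "turn off the lights",
    "turn on the lights in the meeting room",
    "how is the weather in toulouse",
]


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def build_bigram_lm(sentences: List[str]) -> Dict[str, Dict[str, float]]:
    """Unsmoothed maximum-likelihood bigram model: P(cur | prev) = c(prev, cur) / c(prev)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return {
        prev: {cur: c / sum(following.values()) for cur, c in following.items()}
        for prev, following in counts.items()
    }


def sentence_prob(lm: Dict[str, Dict[str, float]], sentence: str) -> float:
    """Product of bigram probabilities; 0.0 as soon as an unseen bigram occurs."""
    tokens = tokenize(sentence)
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= lm.get(prev, {}).get(cur, 0.0)
    return prob


lm = build_bigram_lm(COMMANDS)
vocabulary = {w for s in COMMANDS for w in s.split()}
print(len(vocabulary))                          # 12 distinct words
print(sentence_prob(lm, "turn on the lights"))  # about 0.2
print(sentence_prob(lm, "play some music"))     # 0.0: outside the command domain
```

A deployable command model would add smoothing or an explicit grammar, but the point carries over: the fewer words and word sequences a domain allows, the smaller the model and the less room it leaves for confusable hypotheses.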
Available in both streaming and offline versions, our large-vocabulary models are designed to transcribe extended open-domain dialogue, with a focus on spontaneous, multi-party conversations of the sort encountered in business meetings. These interactions pose a number of challenges for even advanced speech recognition systems, which are generally trained on grammatically correct text and speech: noisy recording conditions, high levels of disfluency (e.g., hesitations, repetitions, false starts), and overlapping speech. Our LinSTT system uses a hybrid speech recognition model, in which the acoustic model combines a Deep Neural Network with Hidden Markov Models and is paired with a separate language model.
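In a hybrid system, the neural network estimates posterior probabilities p(s | x_t) of HMM states given each acoustic frame, and these are divided by the state priors p(s) to obtain scaled likelihoods that an HMM decoder can use in place of emission probabilities. The minimal sketch below is a generic textbook illustration, not the LinSTT implementation: it shows that conversion followed by Viterbi decoding over a two-state left-to-right HMM. The posteriors, priors, and transition probabilities are made-up toy values, and the language model, which a full recognizer would combine into the search, is omitted.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def scaled_log_likelihoods(posteriors: Sequence[Sequence[float]],
                           priors: Sequence[float]) -> List[List[float]]:
    """Hybrid trick: log p(x_t | s) = log p(s | x_t) - log p(s), up to a per-frame constant."""
    return [[safe_log(p) - safe_log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]


def viterbi(log_emis: List[List[float]], log_trans: List[List[float]],
            log_init: List[float]) -> List[int]:
    """Most likely HMM state sequence given per-frame log emission scores."""
    n_states = len(log_init)
    delta = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for frame in log_emis[1:]:
        new_delta, ptrs = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this frame.
            best = max(range(n_states), key=lambda r: delta[r] + log_trans[r][s])
            ptrs.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + frame[s])
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


# Toy values: DNN posteriors over 2 states for 4 frames, uniform priors,
# and a left-to-right HMM that starts in state 0 and never returns to it.
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
priors = [0.5, 0.5]
log_trans = [[safe_log(0.6), safe_log(0.4)], [safe_log(0.0), safe_log(1.0)]]
log_init = [safe_log(1.0), safe_log(0.0)]

print(viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init))
# prints [0, 0, 1, 1]
```

Dividing by the priors matters when some states are much more frequent than others in the training data; with the uniform priors used here it only shifts every score by the same constant and leaves the best path unchanged.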
Text-to-speech technology aims to develop artificial voices for speech-based user interactions. Our research and development focuses on creating natural-sounding models with the latest technologies and on embedding server-side text-to-speech services in our products. Further challenges include balancing voice quality against speech generation speed while handling large volumes of concurrent API requests, and developing voices for a variety of languages.
The 1st International Workshop on Language Technology Platforms (IWLTP)
#Ilyes Rebai, #Kate Thompson, #Sami Benhamiche, #Zied Selami, #Damien Lainé, #Jean-Pierre Lorré
20th Annual Conference of the International Speech Communication Association (Interspeech)
Abdelwahab Heba, Thomas Pellegrini, #Jean-Pierre Lorré, Régine André-Obrecht
The 5th International Conference on Statistical Language and Speech Processing (SLSP)
Abdelwahab Heba, Thomas Pellegrini, Tom Jorquera, Régine André-Obrecht, #Jean-Pierre Lorré
The 21st International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES2017)
#Ilyes Rebai, Yessine BenAyed, Walid Mahdi, #Jean-Pierre Lorré