The appearance of “hands-free” vocal interfaces in technology, ranging from device controls to search, marks the beginning of a long-term trend in digital productivity tools and puts speech recognition and transcription (along with natural language understanding) at the forefront of AI research. At LINAGORA Labs, we are continually improving the algorithms behind our speech-to-text engine and speech generation models. The French language is our primary focus, though we are working to expand our offerings to a variety of European languages while maintaining GDPR standards for user privacy and data autonomy.
Our approach to keyword spotting focuses on developing reliable, multi-platform, small-footprint open-source software to detect wake words in streams of spoken language. To this end, we develop a training methodology to easily produce detection models for customized keywords, along with packaged, ready-to-use implementations of these models for target platforms. This training methodology is based on a rigorous comparison of state-of-the-art algorithms for data processing, including neural network architectures, to achieve a balance between runtime performance and detection accuracy.
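A common way to turn a per-frame keyword classifier into a streaming detector is to smooth its posteriors over a short window and fire when the smoothed score crosses a threshold. The sketch below illustrates this idea only; the function name, window size, and threshold are illustrative assumptions, not LinTO's actual parameters.

```python
from collections import deque

def detect_keyword(frame_scores, window=5, threshold=0.8):
    """Return the index of the first frame at which the moving average of
    the last `window` per-frame keyword posteriors reaches `threshold`,
    or None if the keyword is never detected."""
    recent = deque(maxlen=window)            # sliding window of recent scores
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= threshold:
            return i                         # smoothed score crossed threshold
    return None
```

Smoothing over a window rather than thresholding a single frame is what keeps false-alarm rates low on noisy audio, at the cost of a few frames of detection latency.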
To respond to requests such as “Turn on the lights in the meeting room” and questions such as “How is the weather in Toulouse?”, our LinTO assistant uses high-performance command models tailored to specific business use cases. By curating a small, targeted vocabulary, we optimize the accuracy and computational efficiency of the command model by reducing the size of its language model component. The resulting smaller, customized command models can then be embedded in IoT devices in a way that allows the voice data to remain as close as possible to its source, thus respecting user privacy.
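With a small, curated vocabulary, mapping a recognized utterance to an intent can be as simple as a keyword table. The minimal sketch below assumes a hypothetical two-intent grammar; the intent names and keywords are illustrative and do not reflect LinTO's actual command grammar.

```python
# Hypothetical command grammar: each intent is triggered when all of its
# keywords appear in the recognized utterance.
COMMANDS = {
    "lights_on": {"turn", "lights"},
    "weather": {"weather"},
}

def parse_command(utterance):
    """Return the first intent whose keywords all occur in the utterance,
    or None if no intent matches."""
    words = {w.strip("?!.,") for w in utterance.lower().split()}
    for intent, keywords in COMMANDS.items():
        if keywords <= words:                # all keywords present
            return intent
    return None
```

A real command model would do this matching inside the decoder via a restricted grammar, but the effect is the same: a tiny vocabulary keeps both the search space and the error surface small.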
Available in both streaming and offline versions, our large vocabulary models are designed to transcribe extended open-domain dialogue with a focus on handling spontaneous, multi-party conversations of the sort encountered in business meetings. These interactions pose a number of challenges for advanced speech recognition systems, which are generally trained on grammatically correct text and speech: noisy recording conditions, high levels of disfluency (e.g. hesitations, repetitions, false starts), and overlapping speech. Our LinSTT system exploits a hybrid DNN-HMM speech recognition model, combining a Deep Neural Network acoustic model with Hidden Markov Models for sequence modeling, together with a separate language model.
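In hybrid decoding, competing hypotheses are typically ranked by their acoustic log-likelihood plus a weighted language-model log-probability. The toy sketch below shows only that score combination; the hypothesis texts, scores, and the 0.7 weight are illustrative values, not LinSTT's configuration.

```python
def rescore(hypotheses, lm_weight=0.7):
    """Rank ASR hypotheses by combined score.

    hypotheses: list of (text, acoustic_logp, lm_logp) tuples.
    Returns the list sorted best-first by
    acoustic_logp + lm_weight * lm_logp."""
    def combined(h):
        _, acoustic_logp, lm_logp = h
        return acoustic_logp + lm_weight * lm_logp
    return sorted(hypotheses, key=combined, reverse=True)
```

The language-model weight is what lets a slightly worse acoustic match win when it is far more plausible as language, which matters for disfluent, noisy meeting speech.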
Text-to-speech technology aims to develop artificial voices for speech-based user interactions. Our research and development focuses on creating natural-sounding models using the latest technologies and deploying server-side, embedded text-to-speech services in our products. Additional challenges include achieving a balance between voice quality and speech generation speed while handling large volumes of concurrent API requests, as well as developing voices for a variety of languages.
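One standard way to keep a synthesis server responsive under bursts of concurrent API requests is to cap the number of jobs running at once, queuing the rest. The asyncio sketch below assumes a placeholder `synthesize` coroutine standing in for a real TTS call, and an illustrative limit of two concurrent jobs.

```python
import asyncio

async def synthesize(text, sem):
    """Stand-in for a real TTS call; the semaphore caps concurrency."""
    async with sem:
        await asyncio.sleep(0.01)    # placeholder for actual audio generation
        return f"audio<{text}>"

async def handle_requests(texts, max_concurrent=2):
    """Serve a burst of requests, at most `max_concurrent` at a time."""
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(synthesize(t, sem) for t in texts))
```

Because `asyncio.gather` preserves request order, callers still receive their audio in the order they asked, even though synthesis is throttled internally.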