The ability to reliably transcribe spoken conversation makes it possible to use transcribed data for tasks that require advanced language understanding, ranging from automatic summarization, to more fruitful dialogues with artificial assistants, to fully situated interactions in which assistants exploit information from the visual context during conversation. In collaboration with academic and industrial research partners, our team is strongly invested in developing innovative models of language understanding. These models draw on our solid experience in machine learning and on a hybrid approach to research that brings linguistic expertise to bear on machine learning algorithms.

Text Mining

Our research in text mining aims to provide smart features for a variety of LINAGORA products, including Twake, OpenPaaS and the LinTO platform. By extracting and structuring data from these platforms (emails, chats and conversation transcripts), we are able to train algorithms for advanced features. These include automatic email classification, which sorts emails into predefined folders; smart reply, which relies on an ontology describing a set of intentions and a set of templates to propose possible responses to emails; and priority inbox, which draws on a set of generic linguistic rules and statistics about the user’s email interactions to predict the urgency of an email. Our current focus is on developing similar smart features for chat threads and meeting transcripts.
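
To make the smart reply idea more concrete, here is a minimal Python sketch of how an ontology of intentions paired with reply templates could be used to propose responses. It is an illustration only, not the implementation used in our products: the intent names, cue patterns, templates and function names are all hypothetical, and simple regular expressions stand in for whatever intent detection a real system would use.

```python
import re
from dataclasses import dataclass


@dataclass
class Intent:
    """One entry of a toy intent ontology (hypothetical structure)."""
    name: str
    cues: list[str]       # regular expressions that signal the intent
    templates: list[str]  # candidate replies associated with the intent


# Hypothetical ontology: every intent, cue and template here is invented.
ONTOLOGY = [
    Intent(
        name="propose_meeting",
        cues=[r"\bare you (free|available)\b", r"\bcan we meet\b"],
        templates=[
            "Yes, that works for me.",
            "Sorry, I am not available then. Could we pick another time?",
        ],
    ),
    Intent(
        name="request_document",
        cues=[r"\b(could|can) you send\b", r"\bplease (send|share)\b"],
        templates=[
            "Here is the document you asked for.",
            "I will send it to you shortly.",
        ],
    ),
]


def detect_intents(email_body: str) -> list[Intent]:
    """Return every intent with at least one cue matching the email."""
    text = email_body.lower()
    return [
        intent
        for intent in ONTOLOGY
        if any(re.search(cue, text) for cue in intent.cues)
    ]


def suggest_replies(email_body: str) -> list[str]:
    """Collect the templates attached to each detected intent."""
    replies: list[str] = []
    for intent in detect_intents(email_body):
        replies.extend(intent.templates)
    return replies
```

Calling `suggest_replies("Hi, are you free on Tuesday?")` returns the two templates attached to the hypothetical `propose_meeting` intent, while an email that matches no cue yields an empty list, so no reply is proposed.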

Automatic Summarization

Our research on automatic summarization has led to improved models of lexical importance and discourse similarity that allow us to more reliably identify the utterances most central to a conversation. To extend this work to models capable of producing detailed summaries and meeting minutes, we are currently working on algorithms to track how utterances relate to one another in a conversation. For example, does an utterance answer a question that was asked, explain something that was said, or correct or disagree with an argument that was put forward? Identifying such relations involves integrating our work on summarization with our work on dialogue and discourse modeling.

Dialogue Modeling

Drawing on our team’s expertise in modeling discourse structure, our research on dialogue extends work on discourse parsing for text and chat to multi-party spoken conversation. Progress in discourse parsing is greatly hindered by a dearth of annotated conversational data and by the linguistic expertise required to exploit it. We are currently tackling both problems with a weak supervision approach in which expert annotators study a small but representative sample of data and write labeling rules that are then used to automatically annotate large data sets. This approach makes it easy to incorporate heterogeneous sources of information that can be useful for dialogue modeling, from discursive cues to acoustic information.
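
The idea of labeling rules can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the rule names, the relation labels, the `speaker`, `text` and `duration` fields, and the thresholds are hypothetical, and a plain majority vote stands in for whatever model is actually used to combine noisy rule outputs. Each rule inspects a pair of consecutive utterances and either proposes a label or abstains.

```python
from collections import Counter

ABSTAIN = None  # returned by a rule that has no opinion on the pair


def rule_question_answer(prev, curr):
    """Discursive cue: a question followed by another speaker's turn."""
    if prev["text"].rstrip().endswith("?") and prev["speaker"] != curr["speaker"]:
        return "answer"
    return ABSTAIN


def rule_because(prev, curr):
    """Discursive cue: a turn opening with a causal connective."""
    if curr["text"].lower().startswith(("because", "since")):
        return "explanation"
    return ABSTAIN


def rule_disagreement(prev, curr):
    """Discursive cue: a turn opening with a corrective marker."""
    if curr["text"].lower().startswith(("no,", "actually", "i disagree")):
        return "correction"
    return ABSTAIN


def rule_short_feedback(prev, curr):
    """Mixed cue: a very short turn (duration in seconds, an acoustic
    feature) made up of a feedback word. The 1.0 s threshold is arbitrary."""
    word = curr["text"].strip().lower().rstrip(".!")
    if curr["duration"] < 1.0 and word in {"yeah", "okay", "right", "mm-hmm"}:
        return "acknowledgement"
    return ABSTAIN


RULES = [rule_question_answer, rule_because, rule_disagreement, rule_short_feedback]


def weak_label(prev, curr):
    """Combine rule outputs by majority vote; abstain if no rule fires.
    Ties go to the label that was proposed first."""
    votes = []
    for rule in RULES:
        label = rule(prev, curr)
        if label is not ABSTAIN:
            votes.append(label)
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]
```

Applied to a large unannotated corpus, a function like `weak_label` produces noisy labels that can then serve as training data for a statistical model. Because each rule is just a function over utterance features, a rule based on timing or prosody can sit alongside one based on lexical cues.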

Multimodal Dialogue

A final aspect of our work on language concerns the multimodality of face-to-face conversations, and even video conferences, in which gestures or other meaningful movements, as well as objects and actions visible in the context, can be semantically relevant. Understanding how the nonlinguistic context adds content to a conversation and, conversely, how the content of a conversation can help us understand what is going on in the visual scene, will be crucial for developing models of conversation sophisticated enough to support natural conversation between humans on the one hand and assistants or embodied agents, such as collaborative robots, on the other.

Related Projects

LinTO
COCOBOTS
OpenPaaS



Conversational Programming for Collaborative Robots

ICRA Workshop on Collaborative Robots and Work of the Future

Maike Paetzel-Prüsmann, #Julie Hunter, Kranti Chalamalasetti, #Kate Thompson, Alexandros Nicolaou, Ozan Güngör, David Schlangen and Nicholas Asher


FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)

Moussa Kamal Eddine, #Guokan Shang, Antoine Tixier, Michalis Vazirgiannis


Political Communities on Twitter: Case Study of the 2022 French Presidential Election

PoliticalNLP, LREC 2022

Hadi Abdine, Yanzhu Guo, #Virgile Rennard, Michalis Vazirgiannis



Spoken Language Understanding for Abstractive Meeting Summarization

Institut Polytechnique de Paris

#Guokan Shang

#Language, #LinTO

Weakly Supervised Discourse Segmentation for Multiparty Oral Conversation

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Lila Gravellier, #Julie Hunter, Philippe Muller, Thomas Pellegrini, Isabelle Ferrané

#Language, #LinTO, #SUMM-RE, #Speech


LinTO Platform: A Smart Open Voice Assistant for Business Environments

The 1st International Workshop on Language Technology Platforms (IWLTP)

#Ilyes Rebai, #Kate Thompson, #Sami Benhamiche, #Zied Sellami, #Damien Lainé, #Jean-Pierre Lorré

#Speech, #Language, #LinTO

Modelling Structures for Situated Discourse

Dialogue & Discourse, vol. 11 (1): 89-121

Nicholas Asher, #Julie Hunter, #Kate Thompson

#Language, #LinTO

Speaker-change Aware CRF for Dialogue Act Classification

The 28th International Conference on Computational Linguistics (COLING)

#Guokan Shang, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis, #Jean-Pierre Lorré

#Language, #LinTO

Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding

The 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP)

#Guokan Shang, Antoine Jean-Pierre Tixier, Michalis Vazirgiannis, #Jean-Pierre Lorré

#Language, #LinTO


Meeting Intents Detection Based on Ontology for Automatic Email Answering

IC 2019: Journées francophones d’Ingénierie des Connaissances

Manon Cassier, #Zied Sellami, #Jean-Pierre Lorré

#Language, #OpenPaaS::NG

Weak Supervision for Learning Discourse Structure

Conference on Empirical Methods in Natural Language Processing (EMNLP)

Sonia Badene, #Kate Thompson, #Jean-Pierre Lorré, Nicholas Asher

#Language, #LinTO

Learning Multi-Party Discourse Structure Using Weak Supervision

The 25th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue)

Sonia Badene, #Kate Thompson, #Jean-Pierre Lorré, Nicholas Asher

#LinTO, #Language

Data Programming for Learning Discourse Structure

The 57th Annual Meeting of the Association for Computational Linguistics (ACL)

Sonia Badene, #Kate Thompson, #Jean-Pierre Lorré, Nicholas Asher

#LinTO, #Language

Apprentissage faiblement supervisé de la structure discursive (Weakly Supervised Learning of Discourse Structure)

Conférence sur le Traitement Automatique des Langues Naturelles (TALN)

Sonia Badene, #Kate Thompson, #Jean-Pierre Lorré, Nicholas Asher

#LinTO, #Language


Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization

The 56th Annual Meeting of the Association for Computational Linguistics (ACL)

#Guokan Shang, Wensi Ding, Zekun Zhang, Antoine J.-P. Tixier, Polykarpos Meladianos, Michalis Vazirgiannis, #Jean-Pierre Lorré

#LinTO, #Language

Blog Posts

Next Word Prediction: A Complete Guide

Construction of 360° images dataset for image recognition

Data augmentation for Natural Language Understanding

Ontology-based Meeting Intents Detection for Automatic Email Answering

Pourquoi modéliser la conversation orale spontanée reste un défi de taille ? (Why Modeling Spontaneous Spoken Conversation Remains a Major Challenge)