SUMM-RE aims to combine expertise in theories of discourse interpretation with recent developments in distant supervision to improve the automatic production of meeting summaries and minutes from spoken data. Its guiding hypothesis is that by exploiting information about discourse relations and the rich structures determined by relations between utterances, we can significantly improve models for abstractive summarization.
To test this hypothesis, SUMM-RE will begin by building a 100-hour audio-video corpus of multi-party, meeting-like interactions in French. Then, building on prior work by SUMM-RE members, we will extend the Snorkel data programming paradigm to automatically annotate both the SUMM-RE corpus and the AMI corpus, a large meeting-style corpus in English, with discourse structure. The automatically annotated data will then be used to improve algorithms for producing both short topic summaries and more detailed meeting minutes.
These algorithms will in turn be integrated into LINAGORA’s semi-automatic summarization tool to significantly improve the output for its users. All project results (corpus and algorithms) will be released under an open-source license as part of LINAGORA’s LinTO/Conversation Manager offering.
ANR PRCI project (ANR-20-CE23-0017)
15 December 2020 – 14 May 2024