is a French company specializing in open source software. One of its current projects is the development of LinTO, an open source smart voice assistant. LinTO helps employees organize and run meetings: thanks to its Natural Language Understanding (NLU) system, it can respond to voice commands.
When training a natural language recognition engine, one of the major problems we have to tackle is data scarcity. Improving these systems generally requires a large training corpus, but building one by hand is time-consuming and labor-intensive. In this article, we describe the method we developed to build a data augmentation module that automatically generates alternative commands from a small existing French corpus.