As a part of a R&D team at Linagora, I have been working on several Speech based technologies involving Voice Activity Detection (VAD) for different projects such as OpenPaaS:NG to develop an active speaker detection algorithm or within the Linto project (The open-source intelligent meeting assistant) to detect Wake-Up-Word and vocal activity.
Speech is the the most natural and fundamental mean of communication that we (humans) use everyday to exchange information. Furthermore with an average of 100 to 160 words spoken per minute this is the most efficient way to share data – far exceeding typing (~40 words per minute).
From now on we’ll be working in an other domain. The frequency domain.
They return an array of values in opposition to the previous features.
Here some example of our use of VAD.