Speech Recognition: A Comprehensive Contactless Lip Reading and Acoustic Analysis Dataset



Empowering Speech Recognition: A New Frontier in Technology

Sophisticated new analysis of the physical processes that create the sounds of speech could help empower people with speech impairments and create new applications for voice recognition technologies, researchers say. Engineers and physicists from the University of Glasgow led the research, which used a wide spectrum of wireless sensing devices to scrutinize the internal and external muscle movements of volunteers as they talked. The Glasgow team is making the data from 400 minutes of analysis freely available to other researchers to aid the development of new technologies based on speech recognition.

Those future technologies could help people with speech impairments or voice loss by using sensors to read their lip and facial movements and provide them with a synthesized voice. The dataset could enable voice-controlled devices like smartphones to read users' lips as they speak silently, enabling silent speech recognition. It could also enhance voice recognition, improving the quality of video and phone calls in noisy environments. It could even improve security for banking or confidential transactions by analyzing users' unique facial movements, like a fingerprint, before unlocking sensitive stored information.

The researchers discuss how they conducted their multi-modal analysis of speech formation in a paper titled "A comprehensive multimodal dataset for contactless lip reading and acoustic analysis," published in the journal Scientific Data. To gather their data, the researchers asked 20 volunteers to speak a series of vowel sounds, single words and entire sentences while complex scans of their facial movements and recordings of their voices were collected. The team used two different radar technologies, impulse radio ultra-wideband (IR-UWB) and frequency-modulated continuous wave (FMCW), to image the movement of the volunteers' facial skin as they spoke, along with the movements of their tongue and larynx.
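To give a sense of how FMCW radar turns reflections into motion data: a transmitted chirp mixed with its echo produces a low-frequency "beat" tone whose frequency is proportional to the target's range, so an FFT of the beat signal yields a range profile. The sketch below simulates this for a single point target; all parameters (bandwidth, chirp duration, target range) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical parameters (not from the paper): a wideband chirp of the
# kind used for short-range sensing of small facial movements.
c = 3e8          # speed of light (m/s)
B = 4e9          # chirp bandwidth (Hz)
T = 1e-3         # chirp duration (s)
fs = 2e6         # ADC sample rate for the beat signal (Hz)
R_true = 0.30    # simulated target: a face 30 cm from the radar (m)

# A point target at range R produces a beat tone at f_b = 2*R*B / (c*T).
f_beat = 2 * R_true * B / (c * T)

t = np.arange(0, T, 1 / fs)
beat = np.cos(2 * np.pi * f_beat * t)

# Range profile: FFT of the beat signal; each frequency bin maps to a range.
spectrum = np.abs(np.fft.rfft(beat))
freqs = np.fft.rfftfreq(len(beat), 1 / fs)
R_est = freqs[np.argmax(spectrum)] * c * T / (2 * B)

print(f"estimated range: {R_est * 100:.1f} cm")  # recovers ~30 cm
```

In practice, the fine skin and larynx displacements of interest are far smaller than the range resolution (c/2B, a few centimetres here), so they are typically tracked through the phase of the relevant range bin across successive chirps rather than through the bin index itself.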

The University of Glasgow researchers collaborated with colleagues at the University of Dundee and University College London to synchronize and compile the dataset, which they call RVTALL for the radio frequency…

Published by TechWizard