Sub-audible speech is a new form of human communication that uses tiny neural impulses (electromyographic, or EMG, signals) in the human vocal tract instead of audible sounds. These EMG signals arise from commands sent by the brain’s speech center to the tongue and larynx muscles that produce audible sounds. Sub-audible speech relies on EMG signals intercepted before an audible sound is produced; in many instances, these signals allow the corresponding word or sound to be inferred. When sub-audible speech is received and appropriately processed, producing recognizable sounds is no longer necessary. Further, background noise and intelligibility barriers, such as accents in audible speech, no longer hinder communication.

Neural signals are consistent because the (sub-audible) speaker and the listener rely on a similar communication mechanism. This approach relies on the fact that the muscle control signals for audible speech must be highly repeatable in order to be understood by others. These audible and sub-audible signals are intercepted and analyzed before the air pressure they control generates any sound. The signals are then fed into a neural network pattern classifier, so that near-silent or sub-audible speech, which occurs when a person “talks to himself or herself,” can be processed. In this case, the tongue and throat muscles still respond, at a lower intensity, as if a word or phrase (referred to collectively herein as a “word”) were to be made audible, with few or no external movement cues present.
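As a rough illustration of the classifier stage, the sketch below trains a small feed-forward network (one tanh hidden layer, softmax output) with plain gradient descent in NumPy, then uses it to label unseen inputs. It is not the authors' network: the architecture, learning rate, 16-element feature vectors, the class name `TinyMLP`, and the vocabulary `WORDS` are all hypothetical, and Gaussian clusters stand in for real per-word EMG features.

```python
import numpy as np


class TinyMLP:
    """Hypothetical one-hidden-layer network (tanh hidden units, softmax
    output) trained by full-batch gradient descent on mean cross-entropy."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def _forward(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        logits = h @ self.W2 + self.b2
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return h, p / p.sum(axis=1, keepdims=True)

    def fit(self, X, y, epochs=300, lr=0.5):
        """Training phase: adjust weights from labeled feature vectors."""
        onehot = np.eye(self.b2.size)[y]
        for _ in range(epochs):
            h, p = self._forward(X)
            d_logits = (p - onehot) / len(X)               # d(loss)/d(logits)
            d_h = (d_logits @ self.W2.T) * (1.0 - h ** 2)  # backprop through tanh
            self.W2 -= lr * (h.T @ d_logits)
            self.b2 -= lr * d_logits.sum(axis=0)
            self.W1 -= lr * (X.T @ d_h)
            self.b1 -= lr * d_h.sum(axis=0)
        return self

    def predict(self, X):
        """Recognition phase: index of the most probable word for each row."""
        return self._forward(X)[1].argmax(axis=1)


# Synthetic stand-in for per-word feature vectors (not real EMG data).
WORDS = ["stop", "go", "left"]  # hypothetical vocabulary
rng = np.random.default_rng(1)
centers = rng.normal(0.0, 3.0, (len(WORDS), 16))
y_train = rng.integers(0, len(WORDS), 150)
X_train = centers[y_train] + rng.normal(0.0, 1.0, (150, 16))
y_test = rng.integers(0, len(WORDS), 30)
X_test = centers[y_test] + rng.normal(0.0, 1.0, (30, 16))

net = TinyMLP(n_in=16, n_hidden=32, n_classes=len(WORDS)).fit(X_train, y_train)
accuracy = np.mean(net.predict(X_test) == y_test)
print([WORDS[i] for i in net.predict(X_test[:5])], accuracy)
```

Keeping `fit` and `predict` separate mirrors the split between a training phase and a word recognition phase; with real data, the synthetic clusters would be replaced by feature vectors extracted from recorded windows.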

In one alternative, EMG signals are measured on the side of a subject’s throat, near the larynx, and under the chin near the tongue, to pick up and analyze surface signals generated by the tongue (so-called electropalatogram, or EPG, signals). This approach uses a training phase and a subsequent word recognition phase. In the training phase, the beginning and end of a sub-audible speech pattern (SASP) are first determined for each spoken instance of a word in a database. Each instance is provided and processed within a window 1 to 4 seconds long (preferably about 1.5 seconds). A signal processing transform is applied to obtain a sub-sequence of transform parameter values, which become entries in a matrix. The two matrix axes may represent scale factors and the time intervals within a window. The matrix is tessellated into groups of cells (e.g., rectangular or other shapes); each cell is represented by a feature value, and the cell features are rearranged as a vector. Weighted sums of the vector components are formed and subsequently used as comparison indices. In the word recognition phase, a SASP containing an unknown word is provided and sampled as in the training phase.
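The training-phase feature pipeline can be sketched in Python with NumPy. This is a sketch under stated assumptions, not the patented method: the energy-based endpoint detector, the Haar-like multi-scale filter standing in for the unspecified signal processing transform, the 2 kHz sampling rate, the 2-by-4 rectangular cells, mean-per-cell feature values, and the random weight matrix are all hypothetical choices, as are the function names `find_endpoints`, `scale_time_matrix`, `tessellate`, and `comparison_indices`.

```python
import numpy as np

FS = 2000  # hypothetical sampling rate (Hz); a 1.5 s window is 3000 samples


def find_endpoints(signal, frame=50, threshold_ratio=0.2):
    """Hypothetical energy-based endpointing: return the start and end sample
    of the span of frames whose RMS exceeds threshold_ratio * max RMS."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) // frame
    rms = np.sqrt(np.mean(signal[: n * frame].reshape(n, frame) ** 2, axis=1))
    active = np.flatnonzero(rms > threshold_ratio * rms.max())
    if active.size == 0:
        raise ValueError("no activity found in window")
    return int(active[0] * frame), int((active[-1] + 1) * frame)


def scale_time_matrix(segment, scales=(2, 4, 8, 16), n_intervals=32):
    """Stand-in transform: rows are scales, columns are time intervals.
    Each entry is the mean magnitude of a Haar-like difference filter of
    width 2*s, averaged over one time interval of the segment."""
    segment = np.asarray(segment, dtype=float)
    rows = []
    for s in scales:
        kernel = np.concatenate([np.ones(s), -np.ones(s)]) / (2 * s)
        response = np.abs(np.convolve(segment, kernel, mode="same"))
        rows.append([part.mean() for part in np.array_split(response, n_intervals)])
    return np.array(rows)


def tessellate(matrix, cell_rows=2, cell_cols=4):
    """Split the matrix into rectangular cells, represent each cell by its
    mean value, and rearrange the cell features as a flat vector."""
    n_r, n_c = matrix.shape
    return np.array([
        matrix[r:r + cell_rows, c:c + cell_cols].mean()
        for r in range(0, n_r, cell_rows)
        for c in range(0, n_c, cell_cols)
    ])


def comparison_indices(vector, weights):
    """Each row of `weights` forms one weighted sum of the vector components."""
    return weights @ vector


# Usage on a synthetic 1.5 s window: silence, a noisy "word" burst, silence.
rng = np.random.default_rng(0)
recording = np.zeros(int(1.5 * FS))
recording[1000:2000] = rng.normal(0.0, 1.0, 1000)

start, end = find_endpoints(recording)
matrix = scale_time_matrix(recording[start:end])
features = tessellate(matrix)
weights = rng.normal(size=(3, features.size))  # hypothetical weights
indices = comparison_indices(features, weights)
print(start, end, matrix.shape, features.shape, indices.shape)
```

Mean-per-cell features keep the vector short (16 values for a 4-by-32 matrix with 2-by-4 cells); other cell shapes or per-cell statistics would fit the description equally well. In the recognition phase, the same functions would be applied to the window containing the unknown word.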

This work was done by C. Charles Jorgensen of Ames Research Center and Bradley J. Betts of Computer Sciences Corporation. NASA invites companies to inquire about partnering opportunities and licensing this patented technology. Contact the Ames Technology Partnerships Office at 1-855-627-2249 or ARC-TechTransfer. Refer to ARC-15519-1.

NASA Tech Briefs Magazine

This article first appeared in the February, 2016 issue of NASA Tech Briefs Magazine.