Using brain-scanning technology, artificial intelligence, and speech synthesizers, scientists have converted brain patterns into intelligible verbal speech, an advance that could ultimately give a voice to those who have none.
It's a shame that Stephen Hawking is not alive to see this, as he might have gotten a real kick out of it. The new speech system, developed by researchers at Columbia University's Neural Acoustic Processing Lab in New York City, is something the late physicist might have benefited from.
Hawking had amyotrophic lateral sclerosis (ALS), a motor neurone disease that took away his verbal speech, but he continued to communicate using a computer and a speech synthesizer. With a cheek switch attached to his glasses, Hawking could preselect words on a computer, which were then read out by the speech synthesizer. It was a bit tedious, but it allowed Hawking to produce about a dozen words per minute.
But imagine if Hawking had not had to manually select and trigger the words. In fact, some individuals, whether they have ALS or locked-in syndrome or are recovering from a stroke, may not have the motor skills needed to control a computer, even with just a twitch of the cheek. Ideally, an artificial speech system would capture a person's thoughts directly to generate speech, eliminating the need to control a computer.
The new research, published today in Scientific Reports, brings us an important step closer to this goal. However, instead of capturing a person's inner thoughts to reconstruct speech, it uses the brain patterns generated while listening to speech.
To develop such a speech neuroprosthesis, neuroscientist Nima Mesgarani and his colleagues combined recent advances in deep learning with speech synthesis technologies. Their resulting brain-computer interface, while still rudimentary, picked up brain patterns directly from the auditory cortex, which were then decoded by an AI-based vocoder, or speech synthesizer, to produce intelligible speech. The speech sounded very robotic, but nearly three quarters of listeners could recognize the content. It is an exciting advance that could ultimately help people who have lost their ability to speak.
To be clear, Mesgarani's neuroprosthetic device does not translate an individual's covert speech, that is, the thoughts in our heads, also called imagined speech, directly into words. Unfortunately, the science is not there yet. Instead, the system captured a person's distinctive cognitive responses as they listened to recordings of people speaking. A deep neural network was then able to decode, or translate, these patterns so that the system could reconstruct the speech.
"This study continues a recent trend in the application of deep-learning techniques for decoding neural signals," Andrew Jackson, a professor of neural interfaces at Newcastle University who was not involved in the new study, told Gizmodo. "In this case, the neural signals are recorded from the surface of the human brain during epilepsy surgery. Participants listen to different words and phrases that are read by actors. Neural networks are trained to learn the relationship between brain signals and sounds, and as a result can reconstruct intelligible reproductions of the words and phrases based only on the brain signals."
Epilepsy patients were selected for the study because they often have to undergo brain surgery. With the help of Ashesh Dinesh Mehta, a neurosurgeon at the Northwell Health Physician Partners Neuroscience Institute and co-author of the new study, Mesgarani recruited five volunteers for the experiment. The team used invasive electrocorticography (ECoG) to measure neural activity while the patients listened to continuous speech sounds. For example, the patients listened to speakers reciting digits from zero to nine. Their brain patterns were then fed into the AI-enabled vocoder, which produced the synthesized speech.
The results sounded very robotic but were quite intelligible. In tests, listeners were able to correctly identify the spoken digits about 75 percent of the time. They could even tell whether the speaker was male or female. Not bad, and a result that even came as a "surprise" to Mesgarani, as he told Gizmodo in an email.
The researchers have made recordings of the synthesized speech available. They tested different techniques, and the best result was achieved by combining deep neural networks with the vocoder.
Using a speech synthesizer in this context, as opposed to a system that matches and recites previously recorded words, was important to Mesgarani. As he explained to Gizmodo, there is more to speech than putting the right words together.
"Since the goal of this work is to restore speech communication to those who have lost their ability to speak, we wanted to learn the direct mapping from the brain signal to the speech sound itself," he told Gizmodo. "It is also possible to decode phonemes [distinct units of sound] or words. However, speech contains much more information than just the content, such as the speaker [with their distinct voice and style], intonation, emotional tone, and so on. Therefore, our goal in this paper was to recover the sound itself."
Looking ahead, Mesgarani wants to synthesize more complicated words and sentences and to collect brain signals from people who are simply thinking about or imagining the act of speaking.
Jackson was impressed by the new study, but he said it is still not clear whether this approach will apply directly to brain-computer interfaces.
"In the paper, the decoded signals reflect actual words heard by the brain. To be useful, a communication device would need to decode words that the user imagines," Jackson told Gizmodo. "Although the brain areas involved in hearing, speaking, and imagining speech often overlap, we still do not know exactly how similar the associated brain signals will be."
William Tatum, a neurologist at the Mayo Clinic who was also not involved in the new study, said the research is important in that it is the first to use artificial intelligence to reconstruct speech from the brain waves involved in generating known acoustic stimuli. The significance is notable, "because it advances the application of deep learning in the next generation of better-designed speech-producing systems," he told Gizmodo. He said, however, that the sample size of participants was too small, and that using data taken directly from the human brain during surgery is not ideal.
Another limitation of the study is that the neural networks, in order to do more than reproduce words from zero to nine, would need to be trained on a large number of brain signals from each participant. The system is patient-specific because we all produce different brain patterns when we listen to speech.
"It will be interesting in the future to see how well decoders trained for one person generalize to other people," Jackson said. "It's a bit like early speech recognition systems that had to be individually trained by the user, unlike today's technologies like Siri and Alexa, which can make sense of anyone's voice, again using neural networks. Only time will tell if these technologies could one day do the same for brain signals."
There is undoubtedly still much work to do. However, the new paper is an encouraging step toward implantable speech neuroprosthetics.