A speech BCI decodes the neural signals associated with attempted or imagined speech (the brain's commands to the tongue, lips, jaw, larynx, and respiratory muscles) and translates them into text or synthesized audio. Speech BCIs represent the frontier of BCI communication, achieving speeds of 62-78 WPM in recent demonstrations, which begin to approach natural conversational rates (roughly 150-160 WPM) and far exceed what cursor-based typing BCIs can achieve.
Neural Basis
Speech production involves coordinated activity across multiple cortical areas:
- Ventral premotor cortex / Broca's area: Speech planning and sequencing
- Primary motor cortex (ventral/lateral): Commands to articulatory muscles (tongue, lips, jaw, larynx)
- Supplementary motor area: Speech initiation and sequencing
- Somatosensory cortex: Sensory feedback from articulatory organs
Speech BCIs typically record from the ventral portion of precentral gyrus (motor cortex face/mouth area) using either intracortical electrodes or high-density ECoG grids.
Key Demonstrations
- Moses et al. (2021): UCSF/Chang lab decoded attempted speech from ECoG in a patient with anarthria (inability to speak due to brainstem stroke). Achieved about 15 WPM with a 50-word vocabulary; the first demonstration of real-time decoding of full words and sentences from the speech motor cortex of a person unable to speak.
- Willett et al. (2023): Stanford/BrainGate decoded attempted speech from intracortical recordings in motor cortex at 62 WPM, with a 9.1% word error rate (WER; see the sketch after this list) on a 50-word vocabulary and 23.8% on a 125,000-word vocabulary, using a language model to convert decoded phonemes to words. The fastest intracortical speech BCI reported at the time.
- Metzger et al. (2023): UCSF decoded attempted speech from ECoG at 78 WPM using neural decoding coupled with language-model postprocessing. Also decoded articulatory movements to animate a digital avatar, including facial expressions for emotional communication.
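Word error rate (WER), the metric quoted above, is the word-level edit distance (substitutions + deletions + insertions) between the decoded and reference sentences, divided by the number of reference words. A minimal, self-contained computation; the example sentence is illustrative, not from any of the studies:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

print(wer("i want a glass of water", "i want glass of water please"))  # 2/6 ≈ 0.33
```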
Decoding Pipeline
A typical speech BCI pipeline (minimal code sketches for the feature-extraction, phoneme-decoding, and language-model steps follow the list):
- Neural recording: Capture activity from speech motor cortex during attempted speech
- Feature extraction: Extract relevant neural features (spike rates, high-gamma power)
- Phoneme/articulatory decoding: RNN or transformer maps neural features to phonemes or articulatory gestures
- Language model: A language model (n-gram, RNN, or LLM) corrects errors and produces fluent text
- Output: Display text and/or drive a speech synthesizer for audio output
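To make the feature-extraction step concrete, here is a sketch of one widely used ECoG feature: the high-gamma analytic amplitude, binned into decoding windows. The band edges (70-150 Hz), sampling rate, and bin width are illustrative assumptions, not parameters from any specific study:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

FS = 1000      # Hz, assumed ECoG sampling rate
BIN_MS = 20    # assumed feature bin width

def high_gamma_features(ecog: np.ndarray) -> np.ndarray:
    """ecog: (n_channels, n_samples) -> (n_bins, n_channels) binned features."""
    # Bandpass to the high-gamma range, then take the Hilbert envelope.
    b, a = butter(4, [70, 150], btype="bandpass", fs=FS)
    envelope = np.abs(hilbert(filtfilt(b, a, ecog, axis=1), axis=1))
    # Average the envelope within fixed-width bins; transpose to time-major.
    samples_per_bin = FS * BIN_MS // 1000
    n_bins = envelope.shape[1] // samples_per_bin
    trimmed = envelope[:, : n_bins * samples_per_bin]
    return trimmed.reshape(envelope.shape[0], n_bins, samples_per_bin).mean(axis=2).T
```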
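For the phoneme-decoding step, a common pattern is a recurrent network emitting per-bin phoneme probabilities, trained with CTC loss so that neural frames need not be pre-aligned to phoneme labels. Layer sizes, the 40-phoneme inventory, and the random tensors below are assumptions for illustration, not a specific published architecture:

```python
import torch
import torch.nn as nn

N_FEATURES = 256   # e.g., channels x feature types per bin (assumed)
N_PHONEMES = 40    # assumed phoneme inventory; CTC blank added below

class PhonemeDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(N_FEATURES, 512, num_layers=3, batch_first=True)
        self.out = nn.Linear(512, N_PHONEMES + 1)  # +1 for the CTC blank

    def forward(self, x):                    # x: (batch, time, features)
        h, _ = self.gru(x)
        return self.out(h).log_softmax(-1)   # (batch, time, phonemes+1)

model = PhonemeDecoder()
ctc = nn.CTCLoss(blank=N_PHONEMES)
feats = torch.randn(8, 200, N_FEATURES)          # 8 trials, 200 time bins
targets = torch.randint(0, N_PHONEMES, (8, 30))  # phoneme label sequences
loss = ctc(model(feats).transpose(0, 1),         # CTC expects (time, batch, classes)
           targets,
           input_lengths=torch.full((8,), 200),
           target_lengths=torch.full((8,), 30))
loss.backward()
```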
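And for the language-model step, the simplest version is reranking n-best candidate sentences by a weighted sum of decoder and LM scores; production systems typically integrate the LM into beam search instead. The weight alpha and the toy vocabulary-based LM here are hypothetical stand-ins:

```python
def rescore(hypotheses, decoder_scores, lm_log_prob, alpha=0.5):
    """Pick the hypothesis maximizing decoder score + alpha * LM score.

    hypotheses     : list of candidate sentences (str)
    decoder_scores : decoder log-likelihood for each candidate
    lm_log_prob    : callable str -> language-model log-probability
    """
    scored = [(d + alpha * lm_log_prob(h), h)
              for h, d in zip(hypotheses, decoder_scores)]
    return max(scored)[1]

# Toy usage with a stand-in LM that penalizes out-of-vocabulary words.
vocab = {"i", "want", "a", "glass", "of", "water"}
toy_lm = lambda s: -float(sum(w not in vocab for w in s.split()))
best = rescore(["i want a glass of water", "eye wand a glass of water"],
               [-4.1, -3.9], toy_lm)
print(best)  # "i want a glass of water"
```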
Significance
Speech BCIs have the potential to restore real-time conversational communication for people with ALS, locked-in syndrome, and brainstem stroke: conditions that destroy the ability to speak while leaving cognitive function intact. The rapid progress from 15 WPM (2021) to 78 WPM (2023) suggests that natural-rate conversational BCIs are within reach.