
NeuraLSP: Non-invasive Neural Language Decoding

Updated 4 February 2026
  • NeuraLSP is a non-invasive neural language decoding framework that transforms EEG-recorded handwriting attempts into coherent sentences using a dual-stage, curriculum-guided approach.
  • It integrates a CNN-based letter classifier and a fine-tuned BART LLM to achieve over 30% Top-1 letter accuracy while generating fluent sentences.
  • The system outperforms previous BCI methods by enabling full-alphabet recognition and natural interaction, paving the way for practical, real-time assistive communication.

NeuraLSP is a two-stage, curriculum-based framework for non-invasive neural language decoding that translates EEG-recorded handwriting attempts into coherent sentences. The system integrates a neural letter classifier with a curriculum-guided LLM, enabling high-fidelity translation of neural activity into text across the full English alphabet. NeuraLSP represents the first non-invasive brain-computer interface (BCI) approach to achieve both full-alphabet letter recognition and fluent sentence synthesis, advancing practical BCI communication technologies for individuals with speech or motor impairments (Jiang et al., 29 Jan 2025).

1. System Architecture and Data Flow

NeuraLSP operates through a dual-module pipeline that transforms EEG signals, captured during naturalistic “paper-writing” hand movements, into textual sentences via a staged neural decoding and generative AI process:

  • Stage 1: Neural Letter Classifier. Raw EEG data and the matched handwriting trajectory are encoded as feature representations. A convolutional neural network (CNN) analyzes EEG power spectral density (PSD) features, while a ResNet18 backbone processes (x, y) trajectory data on a 28×28 temporal grid. A cross-modal encoder learns to align EEG and trajectory representations using a contrastive loss:

$$\text{loss}_{CL} = 1 - \cos^2(\theta_{EEG}, \theta_{Traj})$$

The output is a softmax probability vector over the 26 alphabet letters.
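The squared-cosine alignment term can be illustrated in a few lines of NumPy. This is a minimal sketch of the formula as stated, not the paper's implementation:

```python
import numpy as np

def contrastive_loss(z_eeg: np.ndarray, z_traj: np.ndarray) -> float:
    """Squared-cosine contrastive loss between EEG and trajectory
    embeddings: loss_CL = 1 - cos^2(theta_EEG, theta_Traj)."""
    cos = np.dot(z_eeg, z_traj) / (np.linalg.norm(z_eeg) * np.linalg.norm(z_traj))
    return 1.0 - cos ** 2

# Aligned embeddings give near-zero loss; orthogonal ones give the maximum of 1.
z = np.array([1.0, 2.0, 3.0])
print(contrastive_loss(z, 2 * z))                                   # ≈ 0.0
print(contrastive_loss(np.array([1.0, 0.0]), np.array([0.0, 1.0]))) # → 1.0
```

Note that squaring the cosine makes the loss direction-invariant: antiparallel embeddings also incur zero loss.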

  • Stage 2: Curriculum-Based LLM. The top-K letter probability sequences are fed to a pretrained BART-based sequence-to-sequence model, fine-tuned on progressively noisier letter sequences. Curriculum learning (CL) stages expose the model to letter-level error rates $c_i$ ranging from 10% to 90%, teaching it to denoise and robustly correct realistic EEG decoding errors.

The overall process can be summarized as: EEG + trajectory → CNN/ResNet18 → letter distribution → LLM (BART) → sentence.

2. Signal Acquisition, Preprocessing, and Feature Extraction

  • EEG Acquisition:

64-channel EEG (10–20 system), 1000 Hz sampling, synchronized to pen-down events during Wacom tablet handwriting.

  • Preprocessing:

Band-pass (1–70 Hz) and notch (50/100/150 Hz) filtering, independent component analysis (ICA) for artifact rejection, average re-referencing, and baseline correction. Epoched data spans [–1, +3] s around handwriting onset.

  • Feature Extraction:

EEG epochs are transformed via fast Fourier transform (FFT) into PSD(f) per channel.

$$\text{PSD}(f) = \frac{1}{N} \left| \sum_{t=0}^{N-1} x(t)\, e^{-i 2\pi f t / N} \right|^2$$

Handwriting trajectories are min-max normalized onto a 28×28 grid, with intensity modulated by temporal order.
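The two feature paths can be sketched as follows. `psd_features` follows the stated PSD formula (using a one-sided FFT, since EEG is real-valued); `rasterize_trajectory` is a hypothetical rendering of the min-max normalization with temporal-order intensity, as the paper does not specify the exact scheme:

```python
import numpy as np

def psd_features(epoch: np.ndarray) -> np.ndarray:
    """Per-channel power spectral density: PSD(f) = |X(f)|^2 / N.
    epoch: (channels, samples) EEG segment."""
    n = epoch.shape[-1]
    spectrum = np.fft.rfft(epoch, axis=-1)  # one-sided FFT for real signals
    return (np.abs(spectrum) ** 2) / n

def rasterize_trajectory(xy: np.ndarray, size: int = 28) -> np.ndarray:
    """Min-max normalize (x, y) pen samples onto a size x size grid,
    with pixel intensity increasing with temporal order (an assumption)."""
    lo, hi = xy.min(axis=0), xy.max(axis=0)
    norm = (xy - lo) / np.maximum(hi - lo, 1e-9)  # guard degenerate axes
    idx = np.clip((norm * (size - 1)).round().astype(int), 0, size - 1)
    grid = np.zeros((size, size))
    t_weights = np.linspace(0.1, 1.0, len(xy))  # later samples drawn brighter
    for (cx, cy), w in zip(idx, t_weights):
        grid[cy, cx] = max(grid[cy, cx], w)
    return grid
```

For a 64-channel, 1000-sample epoch, `psd_features` returns a 64×501 array of non-negative power values, one row per channel.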

3. Neural Decoding Pipeline and Mathematical Formulation

  • EEG Encoder:

Given $X \in \mathbb{R}^{T \times C}$, the encoder $f_{enc}$ produces $p \in \mathbb{R}^{26}$:

$$p = f_{enc}(X;\, \theta_{enc})$$

  • Loss Functions:

Neural decoding training combines cross-entropy for letter classification and contrastive loss for EEG–trajectory alignment:

$$\text{loss}_{total} = 0.35\,\text{loss}_{CE} + 0.65\,\text{loss}_{CL}$$

with

$$\text{loss}_{CE} = -\sum_{c=1}^{26} y_c \log p_c$$
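A minimal NumPy sketch of the cross-entropy term and the stated 0.35/0.65 weighting, for illustration only:

```python
import numpy as np

def cross_entropy(p: np.ndarray, y: np.ndarray) -> float:
    """loss_CE = -sum_c y_c log p_c over the 26 letter classes.
    p: softmax probabilities; y: one-hot target."""
    return float(-np.sum(y * np.log(p + 1e-12)))  # epsilon avoids log(0)

def total_loss(loss_ce: float, loss_cl: float) -> float:
    """Weighted combination used for Stage 1 training."""
    return 0.35 * loss_ce + 0.65 * loss_cl
```

A uniform prediction over 26 classes yields `cross_entropy` of ln 26 ≈ 3.26, the ceiling a letter classifier must improve on.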

  • Letter Probabilities and Curriculum Sampling:

For each letter $\omega$, class probabilities are averaged over its $N_\omega$ samples:

$$C^*_\omega = \frac{1}{N_\omega} \sum_{i=1}^{N_\omega} C_{i,\omega}$$

Letter tokens are then sampled proportionally within the top-K for robust LLM input.
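Proportional sampling within the top-K can be sketched with a hypothetical helper (not the authors' code):

```python
import numpy as np

def sample_top_k(p: np.ndarray, k: int = 3, rng=None) -> int:
    """Sample a letter index from the top-K classes, proportional to
    their renormalized probabilities."""
    rng = rng or np.random.default_rng()
    top = np.argsort(p)[-k:]       # indices of the K most probable letters
    q = p[top] / p[top].sum()      # renormalize within the top-K
    return int(rng.choice(top, p=q))
```

Restricting sampling to the top-K retains some decoder uncertainty for the LLM to resolve while discarding the long tail of implausible letters.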

  • Curriculum Learning:

The training corpus $D$ is divided into $M$ stages, each with letter corruption rate $c_i = c_{\min} + (i-1)\,\Delta c$.
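The staged schedule and a simple corruption step might look like the following; the `corrupt` noise model (uniform random substitution) is an assumption, since the paper does not specify how erroneous letters are drawn:

```python
import random
import string

def stage_rates(c_min: float, delta_c: float, m: int) -> list:
    """Corruption rate for stage i: c_i = c_min + (i - 1) * delta_c."""
    return [c_min + (i - 1) * delta_c for i in range(1, m + 1)]

def corrupt(letters: str, rate: float, rng: random.Random) -> str:
    """Replace each letter with a random one with probability `rate`,
    mimicking EEG letter-decoding errors (assumed noise model)."""
    return "".join(
        rng.choice(string.ascii_lowercase)
        if ch.isalpha() and rng.random() < rate else ch
        for ch in letters
    )

# Three stages from 10% to 90%, matching the reported c = {10%, 50%, 90%}.
print(stage_rates(0.10, 0.40, 3))
```

Early stages with mild corruption let the LLM first learn the mapping from clean letter strings to sentences before facing realistic error rates.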

  • LLM Fine-tuning:

The BART model is optimized to recover target sentences given noisy letter input sequences:

$$L_{LM} = -\sum_{t=1}^{T} \log P(y_t \mid \hat{x},\, y_{<t})$$
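Given the model's probability of each gold token under teacher forcing, this objective reduces to a per-token negative log-likelihood sum, sketched here for illustration:

```python
import numpy as np

def sequence_nll(token_probs: list) -> float:
    """L_LM = -sum_t log P(y_t | x_hat, y_<t), where token_probs holds
    the model's probability of each gold token under teacher forcing."""
    return float(-np.sum(np.log(np.asarray(token_probs) + 1e-12)))

# A confident model (high per-token probabilities) yields a small loss.
print(sequence_nll([0.9, 0.8, 0.95]))
```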

4. Dataset, Training Protocols, and Evaluation

  • Participants and Data Collection:

Thirty-two healthy right-handed English speakers (final n = 28) each wrote all 26 letters 25 times (650 trials per subject, ≈2 hours per session) with simultaneous EEG and digital trajectory capture.

  • Training Procedures:

Stage 1 neural classifiers (CNN, LSTM, and Transformer variants) were trained subject-wise with the Adam optimizer (lr $10^{-3}$), batch size 64, 50 epochs. Stage 2 LLMs (BART-base/large) were fine-tuned on 1,320 prompt-response samples with curriculum noise rates $c = \{10\%, 50\%, 90\%\}$, AdamW (lr $3 \times 10^{-5}$), batch size 16, 10 epochs.

| Stage | Model           | Top-1 Letter Acc. | BLEU-4 | CER   | WER   |
|-------|-----------------|-------------------|--------|-------|-------|
| 1     | CNN w/ CL       | 33.1% ± 11.5      | —      | —     | —     |
| 2     | BART-large (CL) | —                 | 44.4%  | 38.9% | 46.7% |
  • CNN with CL outperforms LSTM (≈25%) and Transformer (≈21%) backbones on Top-1 accuracy.
  • BART-large with curriculum produces higher BLEU-4 and ROUGE-L, and lower CER and WER versus non-curriculum and baseline methods.
  • Top-K sampling analysis indicates $K = 3$ optimizes the trade-off between diversity and noise reduction.

5. Comparison to Previous BCI Systems

  • Non-invasive EEG spellers (e.g., ICA+EEGNet air-writing) were limited to recognizing only 9 symbols at ≈44% accuracy.
  • Invasive ECoG-based systems achieve >90% Top-1 accuracy for all 26 letters but require surgical implantation.
  • NeuraLSP surpasses prior non-invasive approaches by (a) achieving >30% Top-1 letter accuracy and (b) generating coherent, fluent sentences through an integrated LLM generative pipeline.

6. Discussion, Limitations, and Future Research

  • Neurophysiological Insights:

Gamma-band activity (>30 Hz) in prefrontal and parietal cortices is most informative for discrimination among letters. Dimensionality reduction (UMAP) reveals that letters with similar motor patterns (e.g., “TWY,” “BFE”) form distinct neural clusters.

  • Usability:

User studies report that the “paper-writing” protocol is natural and virtually fatigue-free, with average per-letter latency ≈200 ms and complete sentence synthesis requiring an additional ≈100 ms.

  • Limitations:

All training and evaluation were within-subject; cross-subject transfer was not addressed. Only offline, trial-wise sentence decoding was demonstrated; continuous online operation remains untested. The modest dataset size per participant (650 trials) constrains model robustness, and LLM fine-tuning is computationally intensive.

  • Prospective Directions:

Plans include online EEG collection for end-to-end, real-world BCI validation; domain-adaptive transfer learning; multimodal integration (EMG, fNIRS); and lightweight, on-device LLM distillation to enable low-latency, portable decoding.

7. Significance and Implications

NeuraLSP demonstrates that curriculum-supervised neural letter classifiers, when fused with a generative LLM framework, enable full-alphabet, non-invasive neural language decoding with sentence-level coherence (Jiang et al., 29 Jan 2025). This paradigm addresses key limitations of both prior non-invasive and invasive BCI paradigms by supporting scalable, user-friendly language communication, and establishes a foundation for future research into multi-modal, curriculum-robust neural decoding systems in practical assistive communication contexts.
